Catalog
5 evals — Code
This listing mirrors `catalog/evals.yaml`, the same file the CLI reads. Live entries run end-to-end today; Building and Roadmap entries show exactly what is coming, and contributions to them are welcome.
| Eval | Category | Paper | License | Status |
|---|---|---|---|---|
| HumanEval+ | Code | | | Live |
| LiveCodeBench | Code | | | Live |
| BigCodeBench | Code | | | Roadmap |
| RepoBench | Code | | | Roadmap |
| SWE-Lancer | Code | | | Roadmap |
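The sketch below shows how the Category and Status columns could be used to filter the catalog, for example to list only the evals that run end-to-end today. It does not read the real `catalog/evals.yaml`. It uses a hypothetical in-memory mirror of the five Code entries above. The `EvalEntry` fields, the lowercase status strings, and the `filter_entries` and `runnable_today` helpers are all illustrative assumptions, not the CLI's actual schema or API.

```python
from dataclasses import dataclass


# Hypothetical mirror of catalog/evals.yaml entries; the field names and
# status values ("live", "building", "roadmap") are assumptions.
@dataclass(frozen=True)
class EvalEntry:
    name: str
    category: str
    status: str


# The five Code entries from the table above.
CATALOG = [
    EvalEntry("HumanEval+", "Code", "live"),
    EvalEntry("LiveCodeBench", "Code", "live"),
    EvalEntry("BigCodeBench", "Code", "roadmap"),
    EvalEntry("RepoBench", "Code", "roadmap"),
    EvalEntry("SWE-Lancer", "Code", "roadmap"),
]


def filter_entries(entries, category=None, status=None):
    """Return entries matching the given category and/or status (None matches any)."""
    return [
        e
        for e in entries
        if (category is None or e.category == category)
        and (status is None or e.status == status)
    ]


def runnable_today(entries):
    """Names of entries that run end-to-end today, i.e. status 'live'."""
    return [e.name for e in filter_entries(entries, status="live")]


if __name__ == "__main__":
    print(len(filter_entries(CATALOG, category="Code")))
    print(runnable_today(CATALOG))
```

With the entries above, filtering by `category="Code"` yields all five rows, and `runnable_today` returns only HumanEval+ and LiveCodeBench.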