AGI·EVALS
Catalog

2 evals — Agent / Tool use

This page shows the same catalog/evals.yaml that the CLI reads. Live means an eval runs end-to-end today; Building and Roadmap entries show what is coming, and contributions are welcome.

| Eval | Status |
| --- | --- |
| SWE-bench Verified | Building |
| SWE-bench Lite | Building |