AGI·EVALS
Catalog

3 evals — Agent / Tool use

This page is backed by the same catalog/evals.yaml the CLI reads. Live means the eval runs end-to-end today; building and roadmap entries show what is coming, and contributions to them are welcome.

Eval                                   Status
τ-bench                                Live
GAIA                                   Live
Berkeley Function-Calling Leaderboard  Live