Catalog
3 evals — Agent / Tool use
This is the same catalog/evals.yaml the CLI reads. Live means the eval runs end-to-end today; building and roadmap entries show what is coming, and contributions to them are welcome.
| Eval | Category | Paper | License | Status |
|---|---|---|---|---|
| τ-bench | Agent / Tool use | | | Live |
| GAIA | Agent / Tool use | | | Live |
| Berkeley Function-Calling Leaderboard | Agent / Tool use | | | Live |