ToolBench
Roadmap
Tool-use over thousands of real REST APIs with a pass-rate judge.
Status
This eval is catalogued and on the roadmap. The protocols are stable, so implementing it amounts to writing an EvalRunner and adding a catalog entry.