AgentBoard
Roadmap
Fine-grained progress-rate metrics over partially solved agent tasks.
Status
This eval is catalogued and on the roadmap. The protocols are stable, so implementing it amounts to writing an EvalRunner and adding a catalog entry.
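A progress rate gives a partially solved task partial credit instead of a flat 0. Below is a minimal sketch of how such a metric could be computed. It assumes the subgoal-based formulation from the AgentBoard paper as we understand it: at each step, score the fraction of annotated subgoals the current state satisfies, then take the running maximum over steps. The state representation, the `achieved` matcher, and the function names are hypothetical. They are not part of the EvalRunner protocol or any catalog API.

```python
from typing import Callable, FrozenSet, List, Sequence

# Hypothetical state representation: the set of facts true in the environment.
State = FrozenSet[str]
Matcher = Callable[[State, str], bool]


def subgoal_score(state: State, subgoals: Sequence[str], achieved: Matcher) -> float:
    """Fraction of subgoals satisfied in a single state, in [0.0, 1.0]."""
    if not subgoals:
        raise ValueError("at least one subgoal is required")
    hits = sum(1 for g in subgoals if achieved(state, g))
    return hits / len(subgoals)


def progress_rate(trajectory: Sequence[State], subgoals: Sequence[str],
                  achieved: Matcher) -> List[float]:
    """Running progress rate r_t = max over i <= t of subgoal_score(s_i)."""
    rates: List[float] = []
    best = 0.0
    for state in trajectory:
        # Keep the best score reached so far, so the curve never decreases.
        best = max(best, subgoal_score(state, subgoals, achieved))
        rates.append(best)
    return rates


if __name__ == "__main__":
    subgoals = ["has_key", "door_open", "has_treasure"]
    trajectory = [
        frozenset(),
        frozenset({"has_key"}),
        frozenset({"has_key", "door_open"}),
        frozenset({"door_open"}),  # the agent drops the key: score falls to 1/3
    ]
    rates = progress_rate(trajectory, subgoals, lambda s, g: g in s)
    print([round(r, 3) for r in rates])  # [0.0, 0.333, 0.667, 0.667]
```

Taking the running maximum keeps the metric monotone: an agent that later undoes progress still keeps credit for the best state it reached. The task above never succeeds, yet it ends with a progress rate of 2/3.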