BigCodeBench
Roadmap
Practical programming tasks chaining many real library function calls.
Status
This eval is catalogued and on the roadmap. The protocols are stable, so implementing it means writing an EvalRunner and adding a catalog entry.
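As a rough illustration of what such a runner could look like, here is a minimal sketch. Everything in it is an assumption made for illustration: the `Task` dataclass, the `run` signature on `EvalRunner`, the `LibraryTaskRunner` class, and the demo task are hypothetical and are not taken from the actual protocols or from the BigCodeBench dataset. The demo task only mimics the flavour of the benchmark by chaining several standard-library calls (`csv`, `io`, `collections.Counter`).

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass(frozen=True)
class Task:
    """Hypothetical benchmark item: a prompt plus a check for the solution."""
    task_id: str
    prompt: str
    entry_point: str                    # name of the function the solution must define
    check: Callable[[Callable], bool]   # returns True when the solution passes


class EvalRunner(Protocol):
    """Assumed runner interface; the real protocol may look different."""
    def run(self, tasks: list[Task], generate: Callable[[str], str]) -> float: ...


class LibraryTaskRunner:
    """Scores generated code by executing it and calling each task's check."""

    def run(self, tasks: list[Task], generate: Callable[[str], str]) -> float:
        passed = 0
        for task in tasks:
            namespace: dict = {}
            try:
                # Unsandboxed exec, for illustration only. A real runner
                # would isolate untrusted generated code.
                exec(generate(task.prompt), namespace)
                if task.check(namespace[task.entry_point]):
                    passed += 1
            except Exception:
                pass  # any error counts as a failed task
        return passed / len(tasks) if tasks else 0.0


def _check_top_words(fn: Callable) -> bool:
    data = "id,text\n1,red blue red\n2,blue red\n"
    return fn(data, 1) == [("red", 3)]


# Demo task in the spirit of the benchmark: several library calls chained.
TOP_WORDS = Task(
    task_id="demo/0",
    prompt="Write top_words(csv_text, k) returning the k most common words "
           "in the 'text' column as (word, count) pairs.",
    entry_point="top_words",
    check=_check_top_words,
)

REFERENCE_SOLUTION = '''
import csv
import io
from collections import Counter

def top_words(csv_text, k):
    rows = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(word for row in rows for word in row["text"].split())
    return counts.most_common(k)
'''

# Stand-in for a model: always returns the reference solution.
score = LibraryTaskRunner().run([TOP_WORDS], lambda prompt: REFERENCE_SOLUTION)
```

Here `score` is the fraction of tasks whose check passed. Swapping the lambda for a real model call is the only change needed to evaluate generated code under these assumptions.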