MLE-bench
Roadmap
75 Kaggle competitions where agents build and submit ML solutions.
Status
This eval is catalogued and on the roadmap. The protocols are stable, so implementing it takes an EvalRunner with a catalog entry.