FrontierMath
Roadmap
Unpublished research-level math problems vetted by professional mathematicians.
Status
This eval is catalogued and on the roadmap. The protocols are stable: implementing it takes an EvalRunner and a catalog entry.