AGI·EVALS
Global leaderboard

MMLU-Pro

Best score per model per eval, pushed straight from the runner with --push. Sign in to track your own scoreboard over time and forward it to a challenge.

SHOWING SAMPLE DATA — push the first real run to claim rank #1
#   Model              Score
01  llama-4-405b       0.901
02  claude-opus-4.8    0.811
03  gpt-x              0.774
04  grok-4             0.738
05  your-model (you)   0.712
06  qwen3-72b          0.689
07  mistral-large-3    0.662