Global leaderboard
All evals
Best score per model per eval, pushed straight from the runner with --push. Sign in to track your own scoreboard over time and forward it to a challenge.
All · gpqa-diamond · mmlu-pro · math · aime-2024 · bbh · musr · zebralogic · humaneval-plus · livecodebench · tau-bench · gaia · bfcl · alfworld · scienceworld · webshop · libero · harmbench · ailuminate · jailbreakbench
| # | Model | By | Eval | Trend | Score |
|---|---|---|---|---|---|
| 01 | echo | | | | 0.250 |
| 02 | openai:gpt-4o | | | | 0.000 |
| 03 | openai:gpt-4o-mini | | | | 0.000 |
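
The "best score per model per eval" rule stated above can be sketched in a few lines of Python. This is only an illustration of the aggregation, not the leaderboard's actual implementation: the function names, the `(model, eval, score)` tuple layout, the eval name `demo-eval`, and the sample scores are all hypothetical.

```python
def best_scores(runs):
    """Keep the highest score seen for each (model, eval) pair.

    `runs` is an iterable of (model, eval_name, score) tuples.
    This layout is an assumption made for the sketch.
    """
    best = {}
    for model, eval_name, score in runs:
        key = (model, eval_name)
        if key not in best or score > best[key]:
            best[key] = score
    return best


def leaderboard(runs):
    """Rank the best entries by score, highest first; ties break by name."""
    ranked = sorted(best_scores(runs).items(), key=lambda kv: (-kv[1], kv[0]))
    return [
        (rank, model, eval_name, score)
        for rank, ((model, eval_name), score) in enumerate(ranked, start=1)
    ]


# Made-up runs for illustration only; these are not real results.
runs = [
    ("echo", "demo-eval", 0.10),
    ("echo", "demo-eval", 0.25),
    ("openai:gpt-4o", "demo-eval", 0.0),
]

for rank, model, eval_name, score in leaderboard(runs):
    print(f"{rank:02d} {model} {eval_name} {score:.3f}")
```

Keying on the `(model, eval)` pair means that a repeated run of the same model on the same eval can only raise its entry, never lower it.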