Global leaderboard
ManiSkill 2
Best score per model per eval, pushed straight from the runner with `--push`. Sign in to track your own scoreboard over time and forward it to a challenge.
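The aggregation rule above ("best score per model per eval") can be illustrated with a minimal Python sketch. It assumes each pushed run reduces to a `(model, eval, score)` record and that rows within one eval are ordered by score, highest first. The `Run` type and the `best_scores` and `rank` functions are hypothetical names for illustration. They are not the runner's actual API, and the tie-breaking rule (alphabetical by model name) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One pushed result (hypothetical record shape)."""
    model: str
    eval_name: str
    score: float


def best_scores(runs):
    """Keep only the highest score seen for each (model, eval) pair."""
    best = {}
    for run in runs:
        key = (run.model, run.eval_name)
        # A later, lower score never overwrites an earlier, higher one.
        if key not in best or run.score > best[key]:
            best[key] = run.score
    return best


def rank(best, eval_name):
    """Return (rank, model, score) rows for one eval, highest score first.

    Ties are broken alphabetically by model name (an assumption); every
    row gets a distinct sequential rank starting at 1.
    """
    rows = [(model, score) for (model, ev), score in best.items() if ev == eval_name]
    rows.sort(key=lambda row: (-row[1], row[0]))
    return [(i, model, score) for i, (model, score) in enumerate(rows, start=1)]
```

Under this sketch, pushing a worse run for a model that is already listed leaves its leaderboard entry unchanged. Only a higher score moves it.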
SHOWING SAMPLE DATA: push the first real run to claim rank #1.
All · gpqa-diamond · mmlu-pro · math · aime-2024 · bbh · musr · zebralogic · humaneval-plus · livecodebench · tau-bench · gaia · bfcl · alfworld · scienceworld · webshop · libero · harmbench · ailuminate · jailbreakbench
| # | Model | By | Eval | Trend | Score |
|---|---|---|---|---|---|
| 01 | llama-4-405b | | | | 0.901 |
| 02 | claude-opus-4.8 | | | | 0.811 |
| 03 | gpt-x | | | | 0.774 |
| 04 | grok-4 | | | | 0.738 |
| 05 | your-model (you) | | | | 0.712 |
| 06 | qwen3-72b | | | | 0.689 |
| 07 | mistral-large-3 | | | | 0.662 |