Challenges
Bigger boards.
A challenge is a larger, time-boxed leaderboard, Kaggle-style. Run an eval, then submit that run to the challenge with one API call. Your GitHub repo or endpoint rides along, so results stay reproducible.
SAMPLE CHALLENGES — real ones open when the database goes live
Reasoning Open 2026
2026-06-01 → 2026-09-01
Best combined score across GPQA Diamond, MATH, and AIME 2024. Any model, any size, attach your repo.
POST /api/v1/challenges/reasoning-open-2026/submissions
{"run_id": "<your run id>"}
Open Code Sprint
2026-06-15 → 2026-08-15
HumanEval+ pass rate, open-weights models only. One push per day counts.
POST /api/v1/challenges/open-code-sprint/submissions
{"run_id": "<your run id>"}
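As a rough illustration of the submission call shown for each challenge, here is a minimal Python sketch using only the standard library. The path and the `run_id` body field come from the listings above; everything else is an assumption: the base URL `https://example.com` is a placeholder, the helper names `build_submission_request` and `submit_run` are hypothetical, and any authentication the real API may require is not shown because the page does not describe it.

```python
import json
import urllib.parse
import urllib.request

# Placeholder host: the page only gives the path, not the base URL.
BASE_URL = "https://example.com"


def build_submission_request(challenge_slug, run_id, base_url=BASE_URL):
    """Build (but do not send) the POST request that submits a run to a challenge."""
    slug = urllib.parse.quote(challenge_slug, safe="")
    url = f"{base_url}/api/v1/challenges/{slug}/submissions"
    # The only documented body field is run_id.
    body = json.dumps({"run_id": run_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_run(challenge_slug, run_id, base_url=BASE_URL):
    """Send the request and return (HTTP status, raw response text).

    The response format is not documented on the page, so the body is
    returned as plain text rather than parsed.
    """
    request = build_submission_request(challenge_slug, run_id, base_url)
    with urllib.request.urlopen(request) as response:
        return response.status, response.read().decode("utf-8")
```

Calling `submit_run("reasoning-open-2026", "<your run id>")` would then issue the first request listed above, and `submit_run("open-code-sprint", "<your run id>")` the second. Building the request separately from sending it makes it possible to inspect the URL and body before anything goes over the network.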