Challenges

Bigger boards.

A challenge is a larger, time-boxed leaderboard, Kaggle-style. Run an eval, then forward the run with one API call. Your GitHub repo or endpoint is attached to the submission, so results stay reproducible.

SAMPLE CHALLENGES — real ones open when the database goes live

Reasoning Open 2026

2026-06-01 → 2026-09-01

Best combined score across GPQA Diamond, MATH, and AIME 2024. Any model, any size; attach your repo.

gpqa-diamond math aime-2024

POST /api/v1/challenges/reasoning-open-2026/submissions
{"run_id": "<your run id>"}

Open Code Sprint

2026-06-15 → 2026-08-15

HumanEval+ pass rate, open-weights models only. Only one submission per day counts.

humaneval-plus

POST /api/v1/challenges/open-code-sprint/submissions
{"run_id": "<your run id>"}

How submissions work →