GAIA
Live
Real-world assistant questions that require tool use, web browsing, and multi-step reasoning.
Run it
# CLI
agi-evals run gaia --model openai:gpt-4o-mini --push
# SDK
from agi_evals import load_runner, run_eval
from agi_evals.adapters import OpenAIAdapter
report = run_eval(
    load_runner("gaia"),
    OpenAIAdapter("gpt-4o-mini"),
    concurrency=8,
)
print(report.score, report.failure_counts)
A bundled sample makes this eval runnable offline out of the box. For real numbers, point data_path= at the full upstream dataset.
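As a rough mental model only, here is what a call like run_eval(..., concurrency=8) might do under the hood. The internals of agi_evals are not described on this page, so everything below is an assumption. Report, normalize, grade, run_eval_sketch and the failure labels ("no_answer", "wrong_answer", "error") are all hypothetical names. In this sketch, concurrency caps how many questions are sent to the model at once. report.score is read as a pass rate, and report.failure_counts as a tally of failure categories. A simplified normalized exact-match check stands in for the real GAIA scorer.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Report:
    # Hypothetical stand-in for the report object returned by run_eval.
    score: float
    failure_counts: Counter = field(default_factory=Counter)


def normalize(text: str) -> str:
    # Simplified normalization (not GAIA's real scorer): lowercase, collapse whitespace.
    return " ".join(text.lower().split())


def grade(answer: Optional[str], expected: str) -> Optional[str]:
    # Return None on a pass, otherwise a hypothetical failure label.
    if answer is None:
        return "no_answer"
    if normalize(answer) != normalize(expected):
        return "wrong_answer"
    return None


def run_eval_sketch(
    tasks: list[tuple[str, str]],
    model: Callable[[str], Optional[str]],
    concurrency: int = 8,
) -> Report:
    def attempt(task: tuple[str, str]) -> Optional[str]:
        question, expected = task
        try:
            return grade(model(question), expected)
        except Exception:
            # A crash or timeout counts as a failure instead of aborting the run.
            return "error"

    # At most `concurrency` model calls run at the same time.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        outcomes = list(pool.map(attempt, tasks))

    failures = Counter(o for o in outcomes if o is not None)
    passed = len(outcomes) - sum(failures.values())
    score = passed / len(outcomes) if outcomes else 0.0
    return Report(score=score, failure_counts=failures)


# Usage with a fake model: one correct answer, one wrong answer, one simulated timeout.
tasks = [("Capital of France?", "Paris"), ("2 + 2?", "4"), ("Largest planet?", "Jupiter")]


def fake_model(question: str) -> Optional[str]:
    if question == "Largest planet?":
        raise TimeoutError("simulated timeout")
    return {"Capital of France?": " paris ", "2 + 2?": "5"}[question]


report = run_eval_sketch(tasks, fake_model, concurrency=2)
print(report.score, report.failure_counts)
# score is 1/3; failure_counts tallies one wrong_answer and one error
```

A thread pool fits this sketch because model calls are I/O-bound. A semaphore around asyncio tasks would be an equally reasonable way to enforce the same cap.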