AGI·EVALS

SWE-bench Lite

Building

300-issue subset of SWE-bench for cheaper, faster iteration.

Status

A runner for this eval is in progress. The protocols are stable; implementing it takes an EvalRunner plus a catalog entry.
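As a rough illustration of what "an EvalRunner with a catalog entry" could look like, here is a minimal Python sketch. The page does not show the real interface, so everything here is an assumption made for illustration: the `CatalogEntry` fields, the `EvalResult` type, the `run` method signature, and the `resolved_rate` helper are all hypothetical. The runner body is a placeholder that does not execute any agent.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


@dataclass(frozen=True)
class CatalogEntry:
    # Hypothetical catalog metadata; the field names are illustrative only.
    slug: str
    title: str
    category: str
    description: str


@dataclass(frozen=True)
class EvalResult:
    # Hypothetical per-issue outcome: did the agent's patch resolve the issue?
    instance_id: str
    resolved: bool


class EvalRunner(Protocol):
    # Assumed shape of the runner protocol; the real one may differ.
    entry: CatalogEntry

    def run(self, instance_ids: Iterable[str]) -> list[EvalResult]: ...


class SWEBenchLiteRunner:
    # Text values are taken from this page; the slug is made up.
    entry = CatalogEntry(
        slug="swe-bench-lite",
        title="SWE-bench Lite",
        category="Agent / Tool use",
        description="300-issue subset of SWE-bench for cheaper, faster iteration.",
    )

    def run(self, instance_ids: Iterable[str]) -> list[EvalResult]:
        # Placeholder: a real runner would apply the agent's patch and run
        # the issue's tests. Here every instance is reported as unresolved.
        return [EvalResult(instance_id=i, resolved=False) for i in instance_ids]


def resolved_rate(results: Iterable[EvalResult]) -> float:
    # Fraction of instances marked resolved; 0.0 for an empty result set.
    results = list(results)
    return sum(r.resolved for r in results) / len(results) if results else 0.0
```

Modelling the runner as a structural `Protocol` means a class only needs an `entry` attribute and a `run` method to satisfy it, which fits the page's description of the work as one runner plus one catalog entry.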