SWE-bench Lite
Building
300-issue subset of SWE-bench for cheaper, faster iteration.
Status
A runner for this eval is in progress. The protocols are stable; implementing it amounts to writing an EvalRunner and adding a catalog entry.