SWE-bench Lite
Status: Building
300-issue subset of SWE-bench for cheaper, faster iteration.
Runner in progress
SWE-bench Lite is catalogued but not runnable yet, so there are no usage docs — we do not document what does not run. The fact sheet below is sourced from the paper; the protocols it will implement are stable today.
- Paper: SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
- Citation: Jimenez et al., 2023, arXiv:2310.06770 (Lite subset)
- License: MIT
How an eval goes live
- Implement an EvalRunner against the stable protocols.
- Bundle a small real-schema sample so it runs offline.
- Point the catalog entry's runner at the class.
- Ship its docs in the same change — required to flip live.
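The four steps above can be sketched roughly as follows. This is an illustration only, not the package's API: the `EvalRunner` protocol shape, the `Sample` fields, the `CATALOG` dictionary, and the `go_live` helper are all hypothetical names invented for this sketch, and the bundled records are placeholders rather than real SWE-bench instances.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Protocol


@dataclass(frozen=True)
class Sample:
    """One benchmark instance (hypothetical minimal schema)."""
    instance_id: str
    problem_statement: str


class EvalRunner(Protocol):
    """Hypothetical stand-in for the stable runner protocol."""

    def load_samples(self) -> Iterable[Sample]: ...
    def score(self, sample: Sample, prediction: str) -> bool: ...


# Step 2: a tiny bundled sample so the runner works offline.
# Placeholder records, not real SWE-bench data.
_BUNDLED = json.dumps([
    {"instance_id": "example__repo-1", "problem_statement": "placeholder issue text"},
    {"instance_id": "example__repo-2", "problem_statement": "another placeholder"},
])


class SweBenchLiteRunner:
    """Step 1: a runner that satisfies the (assumed) protocol structurally."""

    def load_samples(self) -> Iterable[Sample]:
        return [Sample(**row) for row in json.loads(_BUNDLED)]

    def score(self, sample: Sample, prediction: str) -> bool:
        # Placeholder check: real scoring would apply the patch and run tests.
        return bool(prediction.strip())


# Step 3: a hypothetical catalog entry whose runner is not yet set.
CATALOG = {"swe-bench-lite": {"status": "building", "runner": None, "docs": None}}


def go_live(catalog: dict, name: str, runner_cls: type, docs_path: str) -> dict:
    """Step 4: return a new catalog with the entry live; docs are mandatory."""
    if not docs_path:
        raise ValueError("docs are required to flip an eval live")
    entry = dict(catalog[name], runner=runner_cls, docs=docs_path, status="live")
    return {**catalog, name: entry}
```

In this sketch, `go_live` returns a new catalog instead of mutating the old one, so an entry cannot end up half-updated with a runner but no docs.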
pip install agi-evals