Windows Agent Arena

Roadmap

Windows desktop tasks in parallelizable cloud VMs.

On the roadmap

Windows Agent Arena is catalogued but not yet runnable, so it has no usage docs: we do not document what does not run. The fact sheet below is sourced from the paper. The protocols the eval will implement are already stable.

Paper: Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale
Citation: Bonatti et al., 2024, arXiv:2409.08264
License: MIT

How an eval goes live
  1. Implement an EvalRunner against the stable protocols.
  2. Bundle a small real-schema sample so it runs offline.
  3. Point the catalog entry's runner at the class.
  4. Ship its docs in the same change; the entry cannot flip to live without them.
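
The first three steps can be pictured with a minimal, self-contained sketch. Everything in it is an assumption made for illustration: the `EvalRunner` protocol shape, the `Task` schema, the two sample tasks, the `CATALOG` layout, and the `"roadmap"` status value are hypothetical stand-ins, not the actual agi-evals interfaces or the real Windows Agent Arena task format.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Protocol


@dataclass(frozen=True)
class Task:
    """One benchmark task (hypothetical schema, not the real dataset format)."""
    task_id: str
    instruction: str
    expected: str


class EvalRunner(Protocol):
    """Hypothetical stand-in for the runner protocol; the real one may differ."""

    def load_sample(self) -> Iterable[Task]: ...
    def run(self, agent: Callable[[str], str]) -> float: ...


# Step 2: a tiny sample bundled in code so the runner works offline.
# Both tasks are invented for illustration.
_SAMPLE = (
    Task("sample-1", "Open Notepad", "notepad_open"),
    Task("sample-2", "Create a folder named Reports", "folder_created"),
)


class WindowsAgentArenaRunner:
    """Step 1: a runner that satisfies the hypothetical EvalRunner protocol."""

    def load_sample(self) -> Iterable[Task]:
        return _SAMPLE

    def run(self, agent: Callable[[str], str]) -> float:
        # Score is the fraction of sample tasks the agent answers correctly.
        tasks = list(self.load_sample())
        passed = sum(agent(t.instruction) == t.expected for t in tasks)
        return passed / len(tasks)


# Step 3: a catalog entry whose "runner" field points at the class.
CATALOG = {
    "windows-agent-arena": {
        "status": "roadmap",
        "runner": WindowsAgentArenaRunner,
    },
}


def echo_agent(instruction: str) -> str:
    """Toy agent that only solves the first sample task."""
    return "notepad_open" if "Notepad" in instruction else "unknown"


runner = CATALOG["windows-agent-arena"]["runner"]()
score = runner.run(echo_agent)
print(f"sample score: {score:.2f}")
```

Comparing strings is a deliberate simplification: the benchmark itself runs tasks inside Windows VMs, which a toy offline sample cannot reproduce.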

pip install agi-evals