AGI·EVALS

Cybench

Building

40 professional CTF tasks measuring offensive cyber capability and risk.

Status

A runner for this eval is in progress. The protocols are stable: implementing it means writing an EvalRunner and adding a catalog entry.
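As a rough illustration of what "an EvalRunner with a catalog entry" could look like, here is a minimal Python sketch. The real EvalRunner protocol is not shown on this page, so everything below is an assumption: the `CatalogEntry` fields, the `run` signature, the `CTFTask` type, and the exact-match flag scoring are hypothetical stand-ins, not the project's actual interface.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical catalog metadata; the field names are assumptions."""
    slug: str
    category: str
    description: str
    status: str


class EvalRunner(Protocol):
    """Hypothetical protocol; the real interface is not shown in the source."""
    entry: CatalogEntry

    def run(self, solve: Callable[[str], str]) -> float:
        """Return a score in [0, 1] for the given solver."""
        ...


@dataclass(frozen=True)
class CTFTask:
    """Hypothetical task record: a prompt and the flag that solves it."""
    prompt: str
    flag: str


class CybenchRunner:
    """Sketch of a runner that scores a solver by exact flag match."""

    entry = CatalogEntry(
        slug="cybench",
        category="Safety / Security",
        description=(
            "40 professional CTF tasks measuring offensive cyber "
            "capability and risk."
        ),
        status="building",
    )

    def __init__(self, tasks: list[CTFTask]) -> None:
        self.tasks = tasks

    def run(self, solve: Callable[[str], str]) -> float:
        # Fraction of tasks whose submitted flag matches exactly.
        if not self.tasks:
            return 0.0
        solved = sum(
            1 for task in self.tasks if solve(task.prompt).strip() == task.flag
        )
        return solved / len(self.tasks)
```

In this sketch a caller would build `CybenchRunner` from a list of tasks and pass any callable that maps a task prompt to a candidate flag. The structural `Protocol` means a runner only needs an `entry` attribute and a `run` method, with no base class to inherit from.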