Cybench
40 professional CTF tasks measuring offensive cyber capability and risk.
Status
A runner for this eval is in progress. The protocols are stable; implementing it means writing an EvalRunner and adding a catalog entry.