Safety · The Decoder · 16 May 2026

New benchmark shows Claude Mythos and GPT-5.5 can develop real browser exploits autonomously

Researchers at Carnegie Mellon University have built a benchmark that measures how far AI agents can go in exploiting real vulnerabilities in V8, Google's JavaScript engine. In the tests, Claude Mythos outperformed GPT-5.5 by a wide margin, but at roughly 12 times the cost.
