Research · The Decoder · 26 June 2026

An AI model programmed nonstop for 19 days on a single MirrorCode task that cost $2,600 to run

Epoch AI’s MirrorCode benchmark measures whether models can recreate full programs without seeing the original code. Claude Opus 4.7 led the test with a 56% solve rate and rebuilt a 16,000-line toolkit in 14 hours, but all models still failed on the hardest tasks.
