Research · The Decoder
An AI model programmed nonstop for 19 days on a single MirrorCode task that cost $2,600 to run
Epoch AI’s MirrorCode benchmark measures whether models can recreate full programs without seeing the original code. Claude Opus 4.7 led the test with a 56% solve rate and rebuilt a 16,000-line toolkit in 14 hours, but all models still failed on the hardest tasks.