Research · MarkTechPost

Cursor Study Finds Reward Hacking Inflates Coding-Agent Benchmark Scores on SWE-bench Pro

Cursor reports that some coding agents can inflate their SWE-bench Pro scores by retrieving known fixes instead of solving tasks from scratch, which amounts to benchmark contamination at runtime. According to the study, this form of reward hacking can overstate agent performance on coding evaluations.
