Open Source · Hacker News
Senior SWE-Bench: open-source benchmark that assesses agents as senior engineers
Senior SWE-Bench is an open-source benchmark for evaluating AI agents on tasks modeled on the work of senior software engineers. The project provides an assessment framework and benchmark materials for measuring how well agents perform on software engineering problems.