Research · MarkTechPost
OpenAI Releases LifeSciBench, a 750-Task Benchmark Grading AI Models on Real Life-Science Research With Expert-Written Rubrics
OpenAI introduced LifeSciBench, a benchmark of 750 expert-authored tasks for evaluating AI models on real life-science research across seven workflows and seven biological domains. The tasks were written by 173 PhD scientists and come with 19,020 rubric criteria; the benchmark scores reasoning and decisions. The top model, GPT-Rosalind, scored 36.1%.