Research · Hugging Face Blog
ITBench-AA: Frontier Models Score Below 50% on the First Benchmark for Agentic Enterprise IT Tasks — by Artificial Analysis and IBM
Artificial Analysis and IBM release ITBench-AA, the first benchmark for agentic enterprise IT tasks. Frontier AI models score below 50% on this new evaluation, highlighting current limitations in automated IT operations.