Research · Hugging Face Blog

ITBench-AA: Frontier Models Score Below 50% on the First Benchmark for Agentic Enterprise IT Tasks — by Artificial Analysis and IBM

Artificial Analysis and IBM release ITBench-AA, the first benchmark for agentic enterprise IT tasks. Frontier AI models score below 50% on this new evaluation, highlighting current limitations in automated IT operations.
