Tools · Hugging Face Blog
EVA-Bench Data 2.0: 3 Domains, 121 Tools, 213 Scenarios
Hugging Face announced EVA-Bench Data 2.0, which expands the benchmark to 3 domains, 121 tools, and 213 scenarios for evaluating tool use and agent performance. The larger test set lets evaluations cover more tasks and environments.