Tools · AWS ML Blog
Evaluate AI agents systematically with Agent-EvalKit
AWS introduced Agent-EvalKit, an open-source toolkit, released under the Apache 2.0 license, for systematically evaluating AI agents. It integrates with coding assistants such as Claude Code, Kiro CLI, and Kilo Code. The post walks through its six evaluation phases using a travel research agent built with the Strands Agents SDK and Amazon Bedrock.