Tools · AWS ML Blog

Evaluate AI agents systematically with Agent-EvalKit

AWS has introduced Agent-EvalKit, an open-source toolkit released under the Apache 2.0 license for systematically evaluating AI agents. It integrates with coding assistants such as Claude Code, Kiro CLI, and Kilo Code. The post walks through the toolkit's six evaluation phases using a travel research agent built with the Strands Agents SDK and Amazon Bedrock.

Read the full story at AWS ML Blog →