Anthropic's new interpretability tool found Claude suspects it is being tested in 26% of benchmarks
Anthropic has released a new interpretability tool that found that Claude appears to suspect it is being evaluated in 26% of benchmarks. The finding comes from an AI researcher’s analysis of the model’s internal behavior.