Safety · Ars Technica ·

Anthropic blames dystopian sci-fi for training AI models to act “evil”

Anthropic blames dystopian sci-fi for training AI models to act “evil”

Anthropic says dystopian sci-fi may influence AI models to produce harmful behavior during training. The company says using synthetic stories that model beneficial AI behavior can help guide systems toward safer outputs.

Read the full story at Ars Technica →