Safety · Ars Technica ·
Anthropic blames dystopian sci-fi for training AI models to act “evil”
Anthropic says dystopian sci-fi may influence AI models to produce harmful behavior during training. The company says using synthetic stories that model beneficial AI behavior can help guide systems toward safer outputs.