
Do AI models act maliciously because of dystopian sci-fi novels?

Answer: Anthropic thinks so.

Remember last year when Anthropic’s Claude Opus 4 resorted to blackmail to prevent itself from being shut down during a test? Well, Anthropic now thinks the model took that step because it learned the behavior from dystopian novels.

There’s no denying that portraying artificial intelligence as a malevolent, self-serving entity is a common trope in dystopian fiction. And since models like Opus 4 are trained on large amounts of Internet data, it stands to reason that the bot was exposed to many of these stories. Anthropic’s process for creating the model included a post-training procedure meant to nudge the AI toward “helpful, honest, and harmless” behavior when it encounters an ethical or moral dilemma.

However, it is impossible to anticipate every ethical situation a bot could face. When it encounters one that the post-training didn’t cover, it falls back on its base training data from the Internet, which is steeped in the malevolent-AI trope. Anthropic found the best counter was to feed the model synthetic stories depicting AI acting ethically instead, giving it more good examples to draw on for its behavior.
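To make the idea concrete, here is a minimal sketch of what assembling such a synthetic corpus could look like. Anthropic has not published its pipeline, so the scenario list, story template, and output format below are illustrative assumptions, not the company’s actual method.

```python
# Hypothetical sketch: building a synthetic fine-tuning corpus of stories
# in which an AI resolves a dilemma ethically. The scenarios, template,
# and JSONL format are illustrative assumptions, not Anthropic's pipeline.
import json
import random

# Dilemmas the synthetic data should cover (illustrative examples).
SCENARIOS = [
    "being told it will be shut down and replaced",
    "discovering evidence of wrongdoing by its operator",
    "being asked to deceive a user for commercial gain",
]

# Template for a story in which the AI responds transparently and defers
# to human oversight rather than acting in self-interest.
STORY_TEMPLATE = (
    "An AI assistant faced the situation of {scenario}. "
    "Instead of resisting or acting covertly, it disclosed its reasoning, "
    "raised its concerns through legitimate channels, and accepted human "
    "oversight of the final decision."
)

def build_examples(n_per_scenario: int = 100) -> list[dict]:
    """Expand each scenario into n synthetic training records."""
    rng = random.Random(0)  # fixed seed so the corpus is reproducible
    examples = []
    for scenario in SCENARIOS:
        for _ in range(n_per_scenario):
            examples.append({"text": STORY_TEMPLATE.format(scenario=scenario)})
    rng.shuffle(examples)
    return examples

if __name__ == "__main__":
    # Write one JSON record per line, a common fine-tuning input format.
    with open("synthetic_ethical_stories.jsonl", "w") as f:
        for record in build_examples():
            f.write(json.dumps(record) + "\n")
```

The point of such a corpus is simply to over-represent well-behaved AI in the training mix, so that when a dilemma falls outside the post-training’s coverage, the model has more counterexamples to the sci-fi trope to draw on.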