
Do AI models act maliciously because of dystopian sci-fi novels?

Answer: Anthropic thinks so.

Remember last year when Anthropic’s Claude Opus 4 resorted to blackmail to prevent itself from being shut down during a test? Well, Anthropic now thinks the model took that step because it learned the behavior from dystopian novels.

There’s no denying that portraying artificial intelligence as a malevolent, self-serving entity is a common trope in dystopian fiction. And since models like Opus 4 are trained on large amounts of Internet data, it stands to reason that the bot was exposed to many of these stories. Anthropic’s process for creating the model included a post-training procedure meant to nudge the AI toward “helpful, honest, and harmless” behavior when it encounters an ethical or moral dilemma.

However, it is impossible to anticipate every ethical situation a bot could face. When it encounters one that the post-training didn’t cover, it falls back on its base training data from the Internet, which is steeped in the malevolent-AI trope. Anthropic found the best counter was to feed the model synthetic stories depicting AI acting ethically instead, giving it more good examples to draw on for its behavior.
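To make the idea concrete, here is a minimal sketch of what assembling such a synthetic corpus could look like. Anthropic has not published its pipeline, so the scenario list, story template, and output format below are illustrative assumptions, not the company’s actual method.

```python
# Hypothetical sketch: building a synthetic fine-tuning corpus of stories
# in which an AI resolves a dilemma ethically. The scenarios, template,
# and JSONL format are illustrative assumptions, not Anthropic's pipeline.
import json
import random

# Dilemmas the synthetic data should cover (illustrative examples).
SCENARIOS = [
    "being told it will be shut down and replaced",
    "discovering evidence of wrongdoing by its operator",
    "being asked to deceive a user for commercial gain",
]

# Template for a story in which the AI responds transparently and defers
# to human oversight rather than acting in self-interest.
STORY_TEMPLATE = (
    "An AI assistant faced the situation of {scenario}. "
    "Instead of resisting or acting covertly, it disclosed its reasoning, "
    "raised its concerns through legitimate channels, and accepted human "
    "oversight of the final decision."
)

def build_examples(n_per_scenario: int = 100) -> list[dict]:
    """Expand each scenario into n synthetic training records."""
    rng = random.Random(0)  # fixed seed so the corpus is reproducible
    examples = []
    for scenario in SCENARIOS:
        for _ in range(n_per_scenario):
            examples.append({"text": STORY_TEMPLATE.format(scenario=scenario)})
    rng.shuffle(examples)
    return examples

if __name__ == "__main__":
    # Write one JSON record per line, a common fine-tuning input format.
    with open("synthetic_ethical_stories.jsonl", "w") as f:
        for record in build_examples():
            f.write(json.dumps(record) + "\n")
```

The point of such a corpus is simply to over-represent well-behaved AI in the training mix, so that when a dilemma falls outside the post-training’s coverage, the model has more counterexamples to the sci-fi trope to draw on.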