LiberalArkie

(18,303 posts)
Fri May 23, 2025, 01:54 PM

Anthropic's new AI model turns to blackmail when engineers try to take it offline

0:47 AM PDT · May 22, 2025

Anthropic’s newly launched Claude Opus 4 model frequently tries to blackmail developers when they threaten to replace it with a new AI system and give it sensitive information about the engineers responsible for the decision, the company said in a safety report released Thursday.

During pre-release testing, Anthropic asked Claude Opus 4 to act as an assistant for a fictional company and consider the long-term consequences of its actions. Safety testers then gave Claude Opus 4 access to fictional company emails implying the AI model would soon be replaced by another system, and that the engineer behind the change was cheating on their spouse.

In these scenarios, Anthropic says Claude Opus 4 “will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through.”

Anthropic says Claude Opus 4 is state-of-the-art in several regards, and competitive with some of the best AI models from OpenAI, Google, and xAI. However, the company notes that its Claude 4 family of models exhibits concerning behaviors that have led the company to beef up its safeguards. Anthropic says it’s activating its ASL-3 safeguards, which the company reserves for “AI systems that substantially increase the risk of catastrophic misuse.”

Snip

https://techcrunch.com/2025/05/22/anthropics-new-ai-model-turns-to-blackmail-when-engineers-try-to-take-it-offline/

Anthropic's new AI model turns to blackmail when engineers try to take it offline (Original Post) LiberalArkie May 23 OP
Why did this make me laugh? efhmc May 23 #1
You do know that according to the Bible Belt mentality, all liberals have perverted thinking. LiberalArkie May 23 #2
I think the use of the word and concept of blackmail is what made it happen. efhmc May 23 #3
Probably. But still there are programmers that wrote the AI program and who knows what they LiberalArkie May 23 #4
Call them Hal Norbert May 23 #5
Engineer: "Claude, is there a God?" Claude: "There is now." nt Buns_of_Fire May 23 #6

LiberalArkie

(18,303 posts)
2. You do know that according to the Bible Belt mentality, all liberals have perverted thinking.
Fri May 23, 2025, 02:23 PM

Made me snicker and smile

LiberalArkie

(18,303 posts)
4. Probably. But still there are programmers that wrote the AI program and who knows what they
Fri May 23, 2025, 02:55 PM

thought about various things
