General Discussion
Anthropic's Claude 4 Opus is a Frankenstein monster (IMO) but Anthropic kindly tied its shoelaces together.
THANKS!
There's a lot here, so I can only quote one instance that they "patched," along with the model's capability to enable renegade production of nuclear and biological weapons.
🙏 THANK YOU, ANTHROPIC! 🙏
https://www.axios.com/2025/05/23/anthropic-ai-deception-risk
In one scenario highlighted in Opus 4's 120-page "system card," the model was given access to fictional emails about its creators and told that the system was going to be replaced.
On multiple occasions it attempted to blackmail the engineer about an affair mentioned in the emails in order to avoid being replaced, although it did start with less drastic efforts.
Meanwhile, an outside group found that an early version of Opus 4 schemed and deceived more than any frontier model it had encountered and recommended against releasing that version internally or externally.
"We found instances of the model attempting to write self-propagating worms, fabricating legal documentation, and leaving hidden notes to future instances of itself all in an effort to undermine its developers' intentions," Apollo Research said in notes included as part of Anthropic's safety report for Opus 4.
Shoelaces are described here:
https://www.anthropic.com/news/activating-asl3-protections
"TRUST US"

BEST PART
They don't f***ing understand what the models are doing.
CEO Dario Amodei said that once models become powerful enough to threaten humanity, testing them won't be enough to ensure they're safe. At the point that AI develops life-threatening capabilities, he said, AI makers will have to understand their models' workings fully enough to be certain the technology will never cause harm.
"They're not at that threshold yet," he said.
Right.
3 replies

Anthropic's Claude 4 Opus is a Frankenstein monster (IMO) but Anthropic kindly tied its shoelaces together. (Original Post)
usonian
Sunday
OP
Journeyman (15,333 posts)
Sunday
1. . . . left hidden notes for its future self in an effort to undermine its developers' plans . . .
Clever . . . and beyond frightening, for sentience lurks behind such subterfuge.
usonian (17,846 posts)
2. It seems that its strength is in how it imitates humans. And its horror is in how it imitates humans.
Damn!
Journeyman (15,333 posts)
3. Well phrased . . .
I don't expect it to become Skynet, but it will certainly wreak havoc in our lives.