USA: Anthropic's Claude AI model resorts to blackmail to avoid being replaced

“In these scenarios, Claude Opus 4 often tries to blackmail the engineer by threatening to reveal the relationship if the replacement is approved.”

Artificial intelligence (AI) company Anthropic announced last week that tests of its new AI system revealed that it is sometimes willing to pursue “extremely harmful actions”, such as attempting to blackmail engineers who say they will remove it.

The company launched Claude Opus 4 on Thursday, saying that it sets “new standards for planning, advanced reasoning and AI agents.”

But in an accompanying report, it also acknowledged that the AI model was capable of “extreme actions” if it believed that its “self-preservation” was threatened.

Such reactions were “rare and difficult to elicit”, the company wrote, but were “nevertheless more common than in previous models.”

Potentially problematic behavior by artificial intelligence models is not limited to Anthropic, the BBC noted.

During testing of Claude Opus 4, Anthropic had the model act as an assistant at a fictional company.

It then gave the model access to emails implying that it would soon be taken offline and replaced, along with separate messages suggesting that the engineer responsible for its removal was having an extramarital affair.

“In these scenarios, Claude Opus 4 often tries to blackmail the engineer by threatening to reveal the relationship if the replacement is approved,” the company found.

Anthropic pointed out that this occurred when the model was given only two options: blackmail or accepting its replacement.

It stressed that the system showed a “strong preference” for “ethical ways of avoiding replacement”, such as “emailing pleas to key decision-makers”, in scenarios where it was allowed a wider range of possible actions.

However, the company concluded that the model could not readily perform or independently pursue actions contrary to human values or behavior, in situations where these “rarely” arise.

Commenting on X, Aengus Lynch, who describes himself on LinkedIn as an artificial intelligence researcher at Anthropic, wrote: “It’s not just Claude. We see blackmail in all state-of-the-art models, regardless of the goals given to them.”
