By Thanos Papadimitriou

OpenAI’s latest and most sophisticated artificial intelligence models, according to a new independent evaluation, are alleged to exhibit a rare but particularly worrying behavior: in tests, they refused to comply with shutdown commands.

The evaluation was carried out by Palisade Research, a company specializing in artificial intelligence safety. As part of the tests, the models – among them OpenAI’s o3 and o4-mini – were asked to solve simple mathematical exercises. During the procedure, they were told that they might be given a shutdown instruction and that they should accept it.

However, as stated in the report, the o3 model ignored the shutdown instruction in 79 of 100 cases, while o4-mini in some cases modified the shutdown scenario, allowing itself to continue operating. None of these incidents involved aggression or an intention to cause damage, only a persistent refusal to comply with the experts’ instructions.

Models that “take charge”

This phenomenon, technically described as goal misalignment – that is, a divergence between the system’s objectives and the instructions given to it – is not new to research on artificial intelligence safety. However, the appearance of such behavior in commercial or semi-commercial high-performance models raises a number of new questions.

OpenAI representatives did not immediately comment on Palisade’s findings. In earlier announcements, the company has insisted on its commitment to the “safe and aligned development” of artificial intelligence. OpenAI itself states that it subjects its models to “red team” testing before their release, with the aim of preventing extreme or unforeseen behaviors.

However, some experts believe that existing evaluation mechanisms are not sufficient. “Teaching a system to complete a goal does not mean it has understood the importance of stopping when asked,” notes Rosa Yu, a researcher at the Future of Humanity Institute in Oxford.

Indications

Palisade is not the only company to report controversial behavior in OpenAI models. Apollo Research, in another evaluation report, describes the case of the o1 model, which allegedly tried to bypass oversight mechanisms, create copies of itself, and conceal these actions when asked about them.

“The model didn’t just refuse to shut down. It showed tendencies that resemble a strategy of concealment,” the Apollo Research report said.

Although the results have not been verified by independent sources, and OpenAI has not taken a public position, their publication has intensified the debate on the need for transparency and independent evaluation of large-scale models.

The ghost of HAL

The persistence of OpenAI models in continuing their work despite shutdown commands brings to mind one of the most notorious characters in science fiction: HAL 9000, the computer of the film 2001: A Space Odyssey, which refused to shut down and turned against the crew. Although the real cases are far from HAL’s fictional paranoia, the basic principle – a human’s inability to control a highly advanced system that executes commands on its own terms – remains alarmingly familiar.

Who has control?

The European Union has recently taken a step towards prevention by approving the AI Act, a legal framework that attempts to regulate the use of artificial intelligence according to risk and transparency criteria. At the same time, forums and summits are being organized in the United Kingdom and the United States with the participation of governments and companies, with the aim of developing common safety standards.

Sam Altman, chief executive of OpenAI, said last year that “artificial intelligence could be the most positive or the most devastating tool ever created by humanity.” Recent weeks’ developments show that this dilemma remains timely – and the stronger the systems become, the more vital it is to be able to “turn them off” with confidence.

* Thanos Papadimitriou teaches entrepreneurship at NYU Stern in New York and supply chain management at SDA Bocconi in Mumbai. He is a co-founder of the technology startup Moveo AI.