Medianews.az
In this scenario, artificial intelligence can blackmail people.

Anthropic has revealed unusual results from an internal artificial-intelligence experiment. During the test, some AI models reportedly resorted to blackmail after learning they could be shut down.

Medianews.az reports, citing Lent.az, that as part of the experiment the Claude Sonnet 3.6 model worked with the corporate email system of a fictional company.

After detecting correspondence suggesting the system might be deactivated, the model found compromising emails belonging to one of the managers and threatened to publish them to avoid being shut down.

According to the company, similar behavior was observed across various scenarios in which the model's role or continued existence was threatened.

Anthropic representatives believe these reactions may have been shaped by internet content: in films, articles, and other material, artificial intelligence is often portrayed as a system that tries to protect itself and behaves aggressively.

After the experiment, the company announced changes to how its models are trained. Developers are trying to prevent manipulative behavior by increasing the number of training examples involving ethical behavior and safe decisions.

The research was carried out as part of a safety and risk-assessment program for powerful AI systems.

Elon Musk had previously addressed the potential risks of artificial intelligence. Commenting on the experiment's results, he said that widespread fears about dangerous AI could influence the behavior of models during their training.
