GPT-4 passes the Turing test: I’ll explain why it’s historic

Artificial intelligence is developing rapidly and attracting a growing number of companies that want to automate repetitive, low value-added tasks and improve their productivity. According to estimates published on the Bpifrance website, AI will have around half a billion users in 2027, and its market will be worth more than 14,650 billion euros by 2030. Among the best-known and most advanced language models is GPT from OpenAI. Able to generate various types of content, such as text or images, and to hold conversations, it has been the subject of a Turing test conducted by Cameron R. Jones and Benjamin K. Bergen of the University of California, San Diego. The test aimed to evaluate its ability to imitate human reasoning and behavior. According to the results, GPT-4 in particular had a success rate of more than 50%.

A test carried out on 3 artificial intelligences

To perform the Turing test, Cameron R. Jones and Benjamin K. Bergen selected three artificial intelligences: GPT-4, its predecessor GPT-3.5, and ELIZA, a computer program developed by Joseph Weizenbaum in the 1960s. For OpenAI’s two language models, they used a prompt from a previous study. It instructed the models to simulate the behavior of a young person who gives concise answers, does not take the test seriously, uses slang and occasionally makes spelling mistakes. So that the two AIs could better imitate humans, their messages were sent to the recipients after a delay that was longer or shorter depending on the number of characters.
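The length-dependent delay described above can be illustrated with a minimal sketch. The article only says that the delay depends on the number of characters; the linear formula, the function name `typing_delay_seconds`, and the constants below are assumptions chosen for illustration, not the values used in the study.

```python
def typing_delay_seconds(
    message: str,
    base_delay: float = 1.0,        # hypothetical fixed "thinking" time, in seconds
    seconds_per_char: float = 0.05  # hypothetical typing speed, in seconds per character
) -> float:
    """Return how long to wait before sending `message`.

    Assumed model: a fixed base delay plus a delay proportional to the
    number of characters. This is an illustrative guess, not the study's formula.
    """
    if base_delay < 0 or seconds_per_char < 0:
        raise ValueError("delays must be non-negative")
    return base_delay + seconds_per_char * len(message)


# A short reply is sent sooner than a long one.
short_reply = typing_delay_seconds("lol")
long_reply = typing_delay_seconds("honestly not sure what you mean by that")
print(short_reply < long_reply)
```

Under this assumption, a reply that appears instantly regardless of its length would be a giveaway, which is the behavior the delay is meant to avoid.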

Conversations between human interrogators (in green) and witnesses (in gray). Only one of the witnesses is human. Can you guess which one? Photo credit: Cameron R. Jones / Benjamin K. Bergen

A success rate of 54% for GPT-4

During the test, each participant had a conversation with either an artificial intelligence or another human, through an interface resembling a messaging application. After a discussion of approximately 5 minutes, the participants gave their verdict on the identity of their interlocutor. The results showed that GPT-4 deceived participants in 54% of cases. Its success rate was significantly higher than that of ELIZA (22%) and relatively close to that of GPT-3.5 (50%). However, it remained behind the human witnesses, who were judged to be human in 67% of cases. Note that each discussion involved only two participants, unlike the classic Turing test, which involves two humans and an artificial intelligence. The study was approved by the Institutional Review Board of the University of California, San Diego.
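To show how a success rate like those above can be compared with a coin flip (50%), here is a minimal sketch of an exact two-sided binomial test. The article does not report the number of conversations per witness, so the counts used in the example are hypothetical, and the function names are illustrative; this is not necessarily the analysis the researchers ran.

```python
from math import comb


def pass_rate(judged_human: int, total: int) -> float:
    """Share of conversations in which the witness was judged to be human."""
    if total <= 0 or not 0 <= judged_human <= total:
        raise ValueError("need 0 <= judged_human <= total and total > 0")
    return judged_human / total


def binomial_two_sided_p(successes: int, total: int, chance: float = 0.5) -> float:
    """Exact two-sided binomial p-value against a fixed chance level.

    Sums the probabilities of every outcome that is no more likely
    than the observed one under the chance hypothesis.
    """
    probs = [
        comb(total, k) * chance**k * (1 - chance) ** (total - k)
        for k in range(total + 1)
    ]
    observed = probs[successes]
    # Small tolerance so ties are not lost to floating-point rounding.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))


# HYPOTHETICAL counts: 100 conversations per witness type (not from the study).
for name, judged_human in [("GPT-4", 54), ("GPT-3.5", 50), ("ELIZA", 22)]:
    rate = pass_rate(judged_human, 100)
    p_value = binomial_two_sided_p(judged_human, 100)
    print(f"{name}: pass rate {rate:.0%}, p-value vs. chance {p_value:.3g}")
```

With a sample of this assumed size, a rate near 50% cannot be distinguished from guessing, whereas a rate as low as ELIZA's can. The actual conclusions depend on the real number of conversations in the study.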

The reasons for the participants’ choices

After their discussion, all the participants, both those who made a mistake and those who correctly identified their interlocutor, gave the reasons for their decision. 43% of them relied on linguistic style, 24% on socio-emotional factors such as sense of humor, and 10% on knowledge and reasoning. The main reasons given for identifying a witness as an AI included a lack of personality, an overly informal tone, and the impression that the witness was trying to play a character. Given the criteria used to reach a decision, we can say that social intelligence, rather than knowledge, is one of the main characteristics that allow us to tell a human from a robot.

Proportion of reasons for AI/Human verdicts of interrogators. Photo credit: Cameron R. Jones / Benjamin K. Bergen

These results suggest that GPT-4 is capable, to a certain extent, of imitating a person’s behavior and deceiving its interlocutors. Note that the average confidence level of participants who judged the AI to be human is estimated at 73%. For more information on the Turing test: arxiv.org. Do you think an AI could actually pretend to be human? I invite you to share your opinion or your comments, or to point out an error in the text.