An efficient algorithm to spot scientific articles generated by AI


Developed to spot scientific articles generated by ChatGPT, this tool relies on differences in how fake and real articles use certain typical expressions. Depending on the discipline, its effectiveness ranges from 80% to 94%.

It didn’t take long for ChatGPT to be used to produce fake scientific papers. Some researchers have even shown, through tests, that the tool can invent the data to prove a false result.

As a result, other researchers are working on techniques to detect such misconduct. Two computer science and data specialists from the State University of New York at Binghamton (United States) and the Hefei University of Technology (China) present one in the journal Scientific Reports. It is based on the analysis of what are called “bigrams”.

Bigrams are the typical two-word expressions found in scientific vocabulary, such as mental health, climate change, nervous breakdown, clinical trials, scientific literature and health condition (this work focuses on English terms).
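To make the notion concrete, here is a minimal Python sketch that lists the adjacent word pairs in a sentence and counts them. The function name `extract_bigrams`, the simple tokenization rule and the sample sentence are assumptions made for illustration; they are not taken from the researchers' code.

```python
import re
from collections import Counter


def extract_bigrams(text):
    """Return every pair of adjacent words in `text`, lowercased.

    Assumption: a "word" is a run of letters, optionally hyphenated.
    Sentence boundaries are ignored to keep the sketch short.
    """
    words = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    return list(zip(words, words[1:]))


# Hypothetical sample sentence, not drawn from either corpus.
sample = "Clinical trials on mental health cite climate change and mental health."
counts = Counter(extract_bigrams(sample))

# The most frequent pair in the sample and how often it occurs.
print(counts.most_common(1))
```

A real pipeline would probably also filter out pairs containing common function words ("on", "and"), so that only expressions such as "mental health" or "clinical trials" remain.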

Three biomedical fields

The researchers worked on three biomedical fields (depression, cancer and Alzheimer’s disease) and created two corpora of articles: one made up of real scientific articles retrieved from the PubMed database by keyword, the other made up of texts generated by ChatGPT (version 3.5).

The generated texts were obtained from prompts using the same keywords as the real articles and covering the same topics. They were of the same average length (200 to 250 words) and structured like a legitimate article (title, authors, abstract, etc.), so as to be comparable to the texts of the first corpus. The generation was set to produce 20 articles at a time.


The two researchers designed an algorithm, called xFakeSci, which measures two things: the number of bigrams, and the connections between bigrams, namely the terms they have in common. As it turns out, the texts produced by ChatGPT use significantly fewer bigrams than[…]
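The two quantities described here can be illustrated with a short Python sketch. It is not the xFakeSci implementation: the function name `bigram_stats`, the tokenization, and the choice to treat two bigrams as connected whenever they share a word are all assumptions made for illustration.

```python
import re
from itertools import combinations


def bigram_stats(text):
    """Return (number of distinct bigrams, number of connected bigram pairs).

    Assumption: two bigrams are "connected" when they have at least
    one word in common.
    """
    words = re.findall(r"[a-z]+", text.lower())
    bigrams = set(zip(words, words[1:]))
    links = sum(
        1
        for a, b in combinations(sorted(bigrams), 2)
        if set(a) & set(b)  # shared word between the two pairs
    )
    return len(bigrams), links


# Hypothetical sample text, not drawn from either corpus.
print(bigram_stats("mental health care and health policy"))  # (5, 8)
```

Computing these two numbers for a text of unknown origin, and comparing them with the values observed in each corpus, is the kind of comparison the article describes. The pairwise loop grows quadratically with the number of bigrams, which is acceptable for texts of 200 to 250 words.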

- sciencesetavenir.fr
