The collaborative online encyclopedia is facing a proliferation of articles artificially created by chatbots. Faced with this threat to the reliability of its information, Wikipedia’s moderation teams are developing new strategies for detecting and verifying suspicious content.
It’s a silent battle playing out behind the scenes of the largest collaborative online encyclopedia. The rise of generative artificial intelligence (AI) confronts Wikipedia with a major challenge: the site must now contend with a proliferation of articles entirely created or partially modified by chatbots like ChatGPT. At stake is the reliability of its information.
On the English-language version of Wikipedia alone, the pace is dizzying: a new page is created every minute. Amid this continuous flow of contributions, the encyclopedia’s teams detect dozens of artificially generated texts and images every day. The situation has pushed contributors to create specialized brigades, such as “WikiProject AI Cleanup”, tasked with tracking down this suspicious content.
The Amberlisihar affair: when AI invents a ghost fortress
The Amberlisihar Fortress affair perfectly illustrates the scale of the problem. For almost a year, Wikipedia readers could read the detailed history of this 600-year-old Ottoman fortress. The 2,000-word article described its historic battles and multiple renovations in convincing detail, all supported by apparently solid references.
Impressive documentation, except for one detail: the fortress never existed. The whole thing had been generated by an artificial intelligence that skillfully blended fiction with real historical figures to give its story an air of authenticity.
“The real problem with ChatGPT lies in its relationship to sources,” analyzes Ilyas Lebleu, one of the founders of WikiProject AI Cleanup. “Artificial intelligence regularly invents references that do not exist, making verification particularly complex. How can we differentiate an authentic but rare ancient work from an entirely fabricated source?” The question is all the more pressing given that problematic contributions are not limited to the creation of fictitious articles.
Volunteer expertise versus AI’s stylistic markers
The moderation teams have also discovered numerous cases of existing articles “enriched” with approximations. Ilyas Lebleu cites the revealing example of an Iranian village: “ChatGPT had added a bucolic description of a picturesque agricultural village. However, the geographical reality was quite different: the locality sits in the heart of a mountainous desert area.” This tendency of AI to generate formulaic descriptions that ignore real-world context poses a major challenge for the encyclopedia.
Faced with this threat, Wikipedia’s volunteer teams have developed advanced linguistic expertise. In particular, they have identified stylistic markers characteristic of AI-generated text. “Certain expressions, such as ‘rich cultural heritage’, too subjective for an encyclopedia, crop up again and again in artificial productions,” explains Ilyas Lebleu.
Some contributors deliberately create disinformation, but others act in good faith. “These are often people who are not very familiar with how Wikipedia works and who, seeing a tool that generates content, tell themselves it is perfect for expanding the encyclopedia,” explains Ilyas Lebleu, while pointing out the other side of the coin: “With ChatGPT, we can generate ten articles in ten minutes, but those articles will probably contain dozens, even hundreds, of errors, approximations, and false citations that will have to be cleaned up.”
A community divided over the use of generative AI
On Wikipedia, the debate over artificial intelligence rages on. The online encyclopedia’s community is divided over the use of text created by chatbots, and three camps have emerged. Purists call for an outright ban; moderates suggest simply labeling AI-generated content; and between the two, some contributors doubt that these artificial texts can really be controlled at all.
While waiting for a consensus, Wikipedia rejects the vast majority of texts created by AI. The reason is simple: their sources cannot be verified, and verifiability is a golden rule of the encyclopedia.
The crucial issue of verifying sources
This phenomenon reveals a broader problem: the lack of effective regulation of online information. “Artificial intelligence only amplifies a pre-existing problem: the massive and uncontrolled circulation of unverified information on the Internet,” underlines Thomas Huchon, a journalist specializing in the study of disinformation.
Until generative AI is regulated, experts advise readers to be more vigilant, starting by systematically checking the sources cited at the bottom of each article. A large number of verifiable sources generally indicates more reliable information.
Pascal Wassmer