AI is very bad at history if this study is to be believed

Researchers have developed a new benchmark called Hist-LLM. It tests three leading large language models (LLMs): GPT-4 (OpenAI), Llama (Meta) and Gemini (Google). The study measures the accuracy of their answers against the Seshat Global History Databank, a history database named after the Egyptian goddess of wisdom.

AI shows its limits in the field of history

The results were presented last month at the NeurIPS conference, and researchers from the Complexity Science Hub (CSH), an Austrian research institute, found them disappointing. The best-performing model, GPT-4 Turbo, reached only 46% accuracy.

Maria del Rio-Chanona, co-author of the study and professor of computer science at University College London, said: “The main finding of this study is that LLMs, while impressive, still lack the depth of understanding needed for advanced history. They perform well for basic facts, but for more nuanced, doctoral-level historical questions, they’re not up to it yet.”

Speaking to TechCrunch, the researchers shared examples of the models' historical errors. For example, GPT-4 Turbo incorrectly claimed that scale armor was present during a specific period in ancient Egypt. In fact, it did not appear there until 1,500 years later.

But why do LLMs struggle with historical questions when they excel at programming? According to Maria del Rio-Chanona, the models tend to extrapolate from the most prominent historical data and have trouble retrieving lesser-known historical facts.

Peter Turchin, who led the study and is a CSH faculty member, stresses that these results show LLMs cannot yet replace humans in certain fields.

However, the researchers believe that LLMs have the potential to assist historians in the future. Their benchmark still needs to be improved with more data from underrepresented regions and more complex questions.

“Overall, while our results highlight areas where LLMs need to improve, they also highlight the potential of these models to aid historical research,” the study concludes.
