AI and libraries: the test of 6

1. Catalog (finally!) the collections…

In the heart of Brussels, the Royal Library of Belgium (KBR) preserves nearly eight million items across eight hectares. “Every floor has its own smell,” says Sophie Vandepontseele, director of contemporary collections, sniffing the air. Each month, more than 3,000 print books arrive through legal deposit. The 17-story book tower, which looks like something out of Ghostbusters, is almost full. And as the first quarter of the 21st century draws to a close, a work does not really exist unless it is cataloged online. Yet “we discovered a few years ago that half of our collections were not identified online,” the librarian points out. In 2022, KBR therefore launched an application with Microsoft that photographs the first pages of each work and (this is where artificial intelligence comes in) extracts its metadata at a glance. The document can then be indexed. This year, KBR is extending the application to legal deposit. “The goal is to reduce the time between a book’s arrival here and its cataloging to a few weeks.”

2. … then put them online… and protect them from AI!

Another KBR project: digitizing 100,000 documents with Google by 2026. Today, only 10% of Belgium’s national collections are digitized, so the project should help them reach a wider audience. Library and Archives Canada, for its part, uses the Transkribus software to recognize handwritten text and convert it into digital text that can be read online. “The pilot project targeted documents created by the government department formerly known as Indian Affairs. Approximately five million pages of this collection are accessible on the Canadiana website,” says Leslie Weir, head of the institution. “This project highlights our organization’s deep commitment to reconciliation with Indigenous peoples, as well as our important role as a steward and source of valuable records.” This stewardship role matters all the more now that AI can generate new content that distorts reality. A library can detect these distortions by checking them against the original documents it holds, which it takes care to store in multiple copies in case data is altered or lost.

3. Promote digitized content

Cataloging lets people know that a document exists; digitizing it makes it accessible from any Internet-connected device. But how do you find your way through such a mass of material? The National Institute for Research in Digital Sciences and Technologies (Inria) and the National Audiovisual Institute (Ina) have developed GallicaSnoop, a tool whose name combines Gallica, the digital library of the Bibliothèque nationale de France, with the English word “snoop”. With its lynx eyes, it spots similarities among hundreds of thousands of images. This lets researchers compare iconography over time, locate the monkeys that populate the margins of medieval manuscripts, or find contraptions invented for walking on water… The same idea has been applied to historical newspapers in the NewsEye project: converting image pixels into digital text, identifying the names of people and places, generating keywords and filters… The work was coordinated by the University of La Rochelle, with the national libraries of France, Austria and Finland and other European universities.

4. Preserve physical heritage

AI works by identifying similarities across thousands of pieces of data. Once it has found the broad patterns that seem to structure that data, such as grammatical rules, it can make predictions. This is the principle behind Dalgocol, a university project that drew on millions of BnF documents to predict their state of deterioration by cross-referencing their metadata. This metadata records each document’s medium, the treatments it has undergone… This substantial upfront work saves time later on: it produces a schedule showing which documents lying dormant in the stacks should be given priority for conservation.

5. Help create reliable content

The way conversational agents ingest information, sort it and then generate text is highly opaque. How can we trust them? One answer is to ask libraries, as guarantors of reliable knowledge, to feed AI. This is what the Bibliothèque nationale de France and Ina are doing for a consortium of companies: Mistral AI is developing a large open-source French-language model drawing on the millions of items in their collections, Giskard is responsible for evaluating the reliability and security of the output, and Artefact is making the whole thing usable by businesses. A similar project is in the works at the Royal Library of Denmark, as its director of digital transformation, Cecile Christensen, explains: the country’s web archives could be used to build a large Danish language model, which could in turn power conversational agents offering an alternative to the American ChatGPT, for example. The model would be “open source and transparent”, says Christensen, a law graduate who is in discussions with Sweden and Norway. “It will always be biased, because every choice involves bias, but they will be our biases!” She concludes: “This would allow our library to enter a new era and play its full role in our democracy.”

“Let’s stop giving in to the siren song of techno-solutionism”

“Let’s not fantasize about what AI can do for libraries when it comes to producing bibliographic records or recommending content. National libraries did not need to wait for the rise of AI to fulfill their missions. Whatever time these tools might save on such tasks does not offset their carbon footprint or their development and maintenance costs. Less energy-hungry technologies already exist to make document searches easier. A well-designed interface and efforts to make libraries and their collections more visible across the Web are more than enough to help users find documents, and libraries have been working on this for several years. Most generative AI services, such as ChatGPT, Gemini and Midjourney, are provided by web giants whose infrastructure relies on data centers that consume electricity, as well as water for cooling. The alternative is to use AI tools installed directly on our own machines, such as Jan.AI, which do not need to communicate with servers hosted elsewhere. This also lets us keep control of our data. Let’s stop giving in to the siren song of techno-solutionism. Let’s make room for ethical, thoughtful and civic-minded digital technology. What will make us more attractive or modern in our users’ eyes is not racing to adopt the latest fashionable technology, but our ability to step back, to support people, and to get them thinking about the impact and consequences of digital technology on our society.”
