Are you sure you want to use a chatbot as a search engine?

After a prototype unveiled last July, OpenAI is officially launching a search engine within ChatGPT. The tool relies on a special version of GPT-4o and draws on results from other search engines (presumably Bing), as well as content from news and media organizations with which OpenAI has partnerships. Instead of a list of links, it answers queries in natural language, embedding snippets and sources that users can click to learn more. Users can also refine their search by chatting with the tool.
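
As a rough illustration of the kind of pipeline described above (retrieve results from a search backend, then have a language model write a cited answer), here is a minimal sketch. The function names, prompt, and data shapes are assumptions for the example, not OpenAI's actual implementation or API.

```python
# Hypothetical sketch of an "answer with sources" pipeline: fetch search
# results, then ask a language model to write an answer that cites them.

def search_backed_answer(query: str, search_fn, llm_fn, k: int = 5) -> str:
    """Return a natural-language answer that cites the retrieved sources."""
    results = search_fn(query)[:k]  # e.g. dicts with 'title', 'url', 'snippet'
    context = "\n".join(
        f"[{i + 1}] {r['title']} ({r['url']}): {r['snippet']}"
        for i, r in enumerate(results)
    )
    prompt = (
        "Answer the question using only the numbered sources below, "
        "citing them as [1], [2], ... after each claim.\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )
    return llm_fn(prompt)  # the model drafts the answer with inline citations
```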

Of course, ChatGPT users did not wait for this new tool to ask it the kinds of questions they previously put to search engines, which is to say, to Google. And of course, OpenAI is not the first company to answer queries directly (Google's quick answers), to combine a large language model with a search engine (Bing/Copilot), or to cite sources in its answers (Perplexity.ai). However, the launch of ChatGPT Search formalizes this use within the pioneering and most popular tool.

From intermediary to information source

Its launch is therefore a good occasion to examine what is at stake in this emerging use. In other words, what changes when a conversational interface that cites sources is used as a search engine? First, the power granted to the search engine shifts. With their lists of results, traditional search engines have authority over which references to consult: “here are the sites where you will find what you are looking for”. With conversational interfaces, the search engine now has authority over the information itself: “here is the information you are looking for, here is the answer to your question”.

This is problematic when we know that large language models can invent information and that the conversational interface builds user confidence. “The fact that the information absorbed by the models also allows them to generate apparently relevant and coherent texts does not make them trustworthy sources of information – even if it seems like a conversation makes people more inclined to trust them,” explain researchers from the University of Washington in a scientific article on the issue (Situating Search).

These conversational search engines also have an impact on the diversity of information sources. With traditional search engines, links that did not appear on the first page of results already tended to be neglected by users. With ChatGPT Search or Perplexity, these lower-ranked sources disappear completely.

Delegation to algorithms

In their aforementioned article, the researchers also highlight the variety of users, uses and reasons for using a search engine: sometimes we know exactly what we are looking for, sometimes we want to explore what is being said or learn more about a subject, and sometimes we want to pick out the sources we trust most.

Conversational search engines struggle to support these uses. By synthesizing information, these new tools do much of the work for users, who no longer have to scan and select results or reformulate their queries. This delegation reduces cognitive load, but it also impoverishes users' practices and tactics.

“We should seek to build tools that help users find and make sense of information rather than tools that claim to do everything for them,” the researchers conclude.

The most attractive answers are the least sourced

Like Perplexity.ai, ChatGPT Search indicates the sources its answers are based on. For many users, these citations, and the possibility of verifying the information at the source, are decisive arguments in favor of these tools.

Except that this sourcing of information is not reliable. According to a comparative study (Evaluating Verifiability in Generative Search Engines) by Stanford University researchers covering several tools (Bing Chat, NeevaAI, Perplexity.ai, YouChat), only half of the statements in the responses are fully supported by the sources cited (citation recall). And, in the other direction, one citation in four does not actually support the statement it is attached to (citation precision).
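
To make these two metrics concrete, here is a small sketch of how citation recall and citation precision can be computed from per-statement support judgments. The data structure and numbers are invented for the example; they are not the study's evaluation format.

```python
# Citation recall: share of statements fully supported by their cited sources.
# Citation precision: share of individual citations that support their statement.
# The judgments below are invented for illustration only.

statements = [
    {"fully_supported": True,  "citations_support": [True, True]},
    {"fully_supported": False, "citations_support": [True, False]},
    {"fully_supported": True,  "citations_support": [True]},
    {"fully_supported": False, "citations_support": [False]},
]

recall = sum(s["fully_supported"] for s in statements) / len(statements)

all_citations = [c for s in statements for c in s["citations_support"]]
precision = sum(all_citations) / len(all_citations)

print(f"citation recall:    {recall:.0%}")    # 50% in this toy example
print(f"citation precision: {precision:.0%}") # 67% in this toy example
```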

This lack of reliability is all the more worrying since simply indicating sources reinforces confidence – who actually takes the time to check the source of each statement? “We believe these results are unacceptable for systems that are quickly becoming a popular tool for answering queries and which already have millions of users, especially considering that the responses generated often appear informative and useful,” write the researchers.

Another result of their study is even more problematic: the perceived usefulness of the answers is inversely correlated with the precision of the citations. In other words, the less the statements are actually supported by the cited sources, the more fluent and useful users judge them to be. The researchers' explanation: the most reliable tools tend to copy or paraphrase the statements found in their sources, at the expense of fluency and usefulness, whereas tools that stray from their sources have more latitude to generate fluent responses that come across as informative and useful.
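
As a toy illustration of what “inversely correlated” means here, the sketch below computes the correlation between citation precision and perceived-utility ratings. The numbers are invented for the example, not data from the study.

```python
# Toy illustration of an inverse correlation between citation precision and
# perceived utility. All numbers below are invented for the example.

from statistics import correlation  # Pearson correlation coefficient (Python 3.10+)

citation_precision = [0.9, 0.8, 0.6, 0.4, 0.2]  # share of citations supporting their statement
perceived_utility = [2.8, 3.0, 3.4, 3.9, 4.3]   # hypothetical 1-5 user ratings

# A value close to -1 means: the lower the precision, the higher the rating.
print(round(correlation(citation_precision, perceived_utility), 2))
```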

More generally, the researchers note that “existing generative search engines struggle to handle queries that cannot be answered extractively (e.g., that require aggregating information from multiple sources) and to appropriately weight citations of varying relevance (content selection).”

The two research papers mentioned in this article:
Chirag Shah and Emily M. Bender (2022). Situating Search.
Nelson F. Liu, Tianyi Zhang and Percy Liang (2023). Evaluating Verifiability in Generative Search Engines.
