Learning from an AI overview provides less knowledge than a web search

In a series of experiments, researchers found that individuals who learn about a topic from summaries generated by large language models acquire shallower knowledge than those who learn through standard web searches. Those who learned from large language models felt less invested in producing advice, and their advice was sparser and less original than advice written by people who learned through web searches. The research was published in PNAS Nexus.

Large language models (LLMs) are artificial intelligence systems designed to interpret and generate human language by learning statistical patterns from large collections of text. They are typically based on deep learning architectures and can track context and relationships between words across long passages. The most popular large language models today include those developed by OpenAI (the GPT series used by ChatGPT), Google (Gemini), Anthropic (Claude), and Meta (LLaMA).

The development of large language models has progressed rapidly over the past decade due to advances in computing power, the availability of large datasets, and improvements in training algorithms. While early models focused primarily on simple text prediction, modern models can perform complex reasoning, summarization, translation, and dialogue. Training typically involves two main stages: extensive pre-training on general text, followed by fine-tuning on more specific tasks or with human feedback.

These models are widely used in applications such as chatbots, virtual assistants, search engines, and automated customer support. In education and research, they assist with writing, coding, literature review, and data exploration. In business and industry, they are used for document analysis, marketing content generation, and decision support. Despite their usefulness, large language models can produce errors, biased output, or misleading information because they do not truly understand the world and rely on patterns learned from their training material.

Study authors Shiri Melumad and Jin Ho Yun point out that many people use LLM-generated summaries of various materials as learning tools. When learning from an LLM summary, however, users no longer need to go through the effort of gathering and synthesizing information from different sources on their own. The study authors hypothesized that the lower effort required to assemble knowledge from an LLM summary may limit the depth of knowledge users gain compared to learning through traditional web search. They further expected that this shallower knowledge would lead people to invest less in giving advice based on it, making the advice sparser and less original. Such advice would, in turn, be seen as less helpful and less persuasive.

The study authors conducted a series of experiments to test elements of this model. The first experiment involved 1,104 participants recruited through Prolific. They were asked to imagine that a friend was asking for advice on how to plant a vegetable garden. One group of participants learned about the topic through Google Search, and the other group learned from ChatGPT. Participants then wrote their advice.

The second experiment involved 1,979 participants recruited through Prolific. It followed the same design as the first experiment, but participants were restricted to entering only one query. This query did not run a live search or generate a new response. Instead, all participants were shown the same information, presented either as a series of linked websites or as a ChatGPT-style summary.

The third experiment was similar to Experiment 1, but the two groups used either Google search or Google’s “AI Overview” (rather than ChatGPT), so that the platform was held constant. Participants were asked to give advice on living a healthier lifestyle. Participants in the fourth experiment rated various features of the advice generated in the third experiment.

The results of these experiments showed that participants who used LLM summaries spent less time learning and reported learning fewer new things. They put less thought into their advice and spent less time writing it. As a result, they felt less ownership of the advice they gave. Overall, this supports the idea that learning from an LLM summary results in shallower learning and less investment in acquiring and using knowledge.

Participants who learned from web searches and websites produced richer advice with more original content. Their advice texts were longer, less similar to each other, and more semantically unique.
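This summary does not spell out how similarity between advice texts was measured. As a rough illustration only, the idea of texts being "less similar to each other" can be sketched with a simple word-count comparison. The function names and the sample advice texts below are hypothetical, and a word-overlap measure like this is an assumed stand-in rather than the authors' method.

```python
import math
import re
from collections import Counter
from itertools import combinations


def bag_of_words(text):
    """Lowercase the text and count how often each word occurs."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (0 = no shared words)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_pairwise_similarity(texts):
    """Average cosine similarity over every pair of texts in a group."""
    vectors = [bag_of_words(t) for t in texts]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("need at least two texts")
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)


# Hypothetical advice texts, invented for illustration only.
group_a = [
    "Pick a sunny spot and water the plants regularly.",
    "Choose a sunny spot and water your plants regularly.",
]
group_b = [
    "Test the soil before buying any seeds.",
    "Raised beds keep rabbits away from lettuce.",
]

print("group A:", round(mean_pairwise_similarity(group_a), 2))
print("group B:", round(mean_pairwise_similarity(group_b), 2))
```

A lower average for a group means its members wrote advice that overlaps less in wording. Capturing the "semantic" uniqueness mentioned above would require a meaning-aware representation rather than raw word counts.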

“Because LLM summaries reduce the need to discover and synthesize information from original sources (steps essential for deep learning), the theory is that users may gain shallower knowledge than when learning from web links. When they then formulate advice on the topic, this shows up as advice that is sparser, less original, and less likely to be adopted by recipients. Results of seven experiments support these predictions, and we show that these differences occur even when LLM summaries are enriched with real-time web links. Therefore, learning from LLM syntheses (rather than web links) may limit the development of deeper and more original knowledge,” the study authors concluded.

This study contributes to the scientific understanding of how people learn using LLMs. Although the first experiment relied on a hypothetical scenario (advice to a friend), later experiments confirmed that the results held even when the topic was personally relevant to the participants.

However, the experiments relied on paid participants. It is likely that these participants were motivated primarily by the payment for taking part rather than by the quality of the advice they provided. Results may differ in real-world learning situations, where people feel responsible for their learning outcomes and have a personal stake in the quality of the advice they produce.

The paper, “Experimental evidence of the effects of large language models versus web search on depth of learning,” was authored by Shiri Melumad and Jin Ho Yun.
