Evidence of an AI learning penalty

Are we missing the big story of what AI means for human capital? I posed this question last year in my closing remarks at the AI and Human Capital conference. Startling new evidence suggests we may be. A recent study in China found that when students started using generative AI on their own, their homework scores improved, but their exam scores plummeted.

Many stakeholders, including the World Bank, are carefully designing and evaluating AI tools to enhance education and health services. My team is funding a series of research pilots, and I’m part of the working group that put together the Generative AI Evaluation Playbook.

But while we are making careful interventions, generative AI is rapidly permeating society, and its effects remain largely unmeasured. Anyone with a smartphone is already using AI via search engines, and smartphone ownership is almost ubiquitous in most countries, even among teenagers. New research suggests that the impact of this unrestricted use of AI could undermine the benefits gained from controlled applications. Daron Acemoglu pointed this study out to me during a recent visit to the World Bank and quoted it in a New York Times roundtable.

The study follows students in grades 7 to 12 in a county in China for two and a half years starting in 2022. The authors compare changes in outcomes for students who begin using AI tools on their own with changes for those who do not, exploiting variation in the timing of adoption to identify causal relationships.
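To make the logic of that comparison concrete, here is a minimal difference-in-differences sketch. It is not the authors' estimator, which exploits staggered adoption dates and is more involved. The records, names, and scores below are invented toy values, and "before" and "after" are assumed to be defined relative to when a student adopts AI.

```python
from statistics import mean

# Hypothetical toy records: (student_id, adopted_ai, period, exam_score).
# These numbers are invented for illustration and are NOT data from the study.
records = [
    ("a", True, "before", 70.0), ("a", True, "after", 56.0),
    ("b", True, "before", 80.0), ("b", True, "after", 64.0),
    ("c", False, "before", 72.0), ("c", False, "after", 71.0),
    ("d", False, "before", 78.0), ("d", False, "after", 79.0),
]


def group_mean(adopted, period):
    """Average exam score for one group (adopters or not) in one period."""
    return mean(
        score for _, a, p, score in records if a == adopted and p == period
    )


def diff_in_diff():
    """Change among adopters minus change among non-adopters."""
    change_adopters = group_mean(True, "after") - group_mean(True, "before")
    change_others = group_mean(False, "after") - group_mean(False, "before")
    return change_adopters - change_others


print(diff_in_diff())  # -15.0 with the toy numbers above
```

Subtracting the change among non-adopters removes anything that moved everyone's scores over the same period, such as a harder exam. It does not remove shocks that hit only the adopters.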

They found three notable effects of students’ use of AI.

Time spent on homework fell by 30%.
Homework scores rose by 18%.
Scores on the monthly exams plummeted by 20% after five months, and scores on the two high-stakes entrance exams dropped by 18% and 24%.

The figure at the end of this blog shows these results. These findings are for average scores across all subjects; the effect was largest in the social sciences, followed by STEM, and then languages.

These findings suggest that students used AI as a crutch for homework, improving their homework scores but learning significantly less.

The study appears to support concerns that students using AI are bypassing tasks essential to learning. We know that learning, like physical training, requires effort. If you want to build muscle, you don’t bring a forklift to the gym. And with freely available AI tools, anyone with an internet connection can get a metaphorical homework forklift.

One way to measure the size of an effect is to compare it to the overall variation in student performance. By this measure, the decrease in monthly exam scores after five months is 1.4 standard deviations (SD). This is an extremely large effect by the standards of education research. As a point of comparison, a major study on AI-based tutoring in Nigeria found a learning improvement of 0.31 SD. That is less than a quarter of the size of the negative effect the China study estimates for unrestricted use of AI.
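The arithmetic behind that comparison can be sketched in a few lines. The 1.4 SD and 0.31 SD figures are the ones quoted above. The mean score of 70 and the SD of 10 are hypothetical values, chosen only so that a 20% drop works out to 1.4 SD; the actual mean and SD are not reported here.

```python
def standardized_effect(change_in_score, score_sd):
    """Express a change in scores in units of the score's standard deviation."""
    return change_in_score / score_sd


# Hypothetical values (assumptions, not figures from the study).
mean_score = 70.0
score_sd = 10.0

# A 20% drop from the hypothetical mean, expressed in SD units.
drop = -0.20 * mean_score
illustrative_effect = standardized_effect(drop, score_sd)

# Effect sizes as quoted in the text.
china_effect_sd = -1.4
nigeria_effect_sd = 0.31

# Size of the Nigeria gain relative to the China loss.
ratio = nigeria_effect_sd / abs(china_effect_sd)

print(f"Illustrative effect: {illustrative_effect:.1f} SD")  # -1.4 SD
print(f"Nigeria gain relative to China loss: {ratio:.2f}")   # 0.22
```

A ratio of about 0.22 is what "less than a quarter" refers to.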

One potential challenge to a causal interpretation is that some other shock may have both prompted students to start using AI and reduced their academic performance. For example, in some parts of China, many children no longer live with their parents, who have migrated for work. As the recent World Bank Human Capital Report highlights, the academic performance of these children has declined significantly. Could some students in this study have started using AI in response to their parents’ migration? If so, AI might be blamed for learning loss that was actually caused by parental absence.

The authors point out that such a shock is unlikely to be the whole story behind the study’s findings, as students who do not use AI rarely experience a 20 percent drop in test scores. Nevertheless, they cannot rule out the possibility that such a shock played some (probably very small) role. As a result, the actual effects of AI may be slightly smaller than reported in the paper. I hope the authors will consider this issue further as the paper goes through peer review.

This careful and rigorous study could serve as a wake-up call to parents and educators about the potential risks of AI in learning. More studies like this are needed in other settings to understand how well these findings generalize.

The implications of this research extend beyond education. As AI becomes integrated into everyday life, people are increasingly using it for health information, skills development, career advice, and psychological support, often outside the structured applications that researchers can most easily evaluate (because such interventions can be randomized). AI’s greatest impact on human capital is likely to come not from the tools we intentionally deploy, but from the ways people choose to use the technology themselves. Understanding this broader story of AI “in the wild” should be a core priority for human capital research if we are to realize the potential of AI while protecting people from its risks.