Human study: AI coding assistance reduces developer skill proficiency by 17%

Anthropic recently published a randomized controlled trial showing that developers who used AI coding assistance scored 17% lower on comprehension tests than those who coded manually, while the productivity gains did not reach statistical significance. An analysis of how the 52 mostly junior engineers interacted with the AI revealed a clear disparity: those who used it for conceptual questions scored above 65%, while those who outsourced code generation to it scored below 40%.

A randomized controlled trial by Anthropic researchers investigated how an AI coding assistant affects skill development when learning new tools. The 52 participants, mostly junior engineers with at least one year of weekly Python experience, all learned Trio, an asynchronous programming library unfamiliar to them. Both the control and AI-assisted groups completed two coding tasks, followed by quizzes on debugging, code reading, and conceptual understanding.

The AI group finished about two minutes earlier, but the difference was not statistically significant. Quiz scores tell a different story: the AI group averaged 50%, while the manual coding group averaged 67%, with the largest gap on debugging questions.
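The significance language above can be made concrete with Welch's two-sample t-test, sketched below using only the standard library. The per-participant scores are hypothetical values centered on the reported group means (50% vs. 67%); the study's raw data are not reproduced here.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t-statistic for two independent samples with unequal variances."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = math.sqrt(va / len(a) + vb / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se

# Hypothetical quiz scores, purely illustrative of the method.
ai_scores = [40, 50, 60]        # mean 50
manual_scores = [60, 67, 74]    # mean 67

t = welch_t(ai_scores, manual_scores)
# |t| well above ~2 suggests the gap is unlikely to be noise
# (the exact threshold depends on the degrees of freedom).
print(round(t, 2))
```

The same logic explains why the two-minute speed difference could fail to reach significance: a small mean gap relative to the spread of completion times yields a small t-statistic.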

(Image: comparison of participants' completion times and quiz scores across AI-assistance conditions, including high skill, hybrid, conceptual, low skill, progressive, and repeated patterns.)

In a Hacker News thread, siliconc0w captured the central tension.

You’re trading off learning and performance for increased productivity that you don’t always get.

Another commenter, AstroBen, raised generational concerns.

Will we see a future where juniors are unable to acquire the skills and experience to do their jobs adequately and are completely dependent on AI?

The outcome was determined more by how developers interacted with the AI than by whether they used it at all. Low-scoring patterns, averaging below 40%, included fully delegating code generation to AI, progressive dependence in which developers gradually handed all work to AI, and iterative AI debugging in which developers relied on AI to solve problems rather than to clarify them. High-scoring patterns, averaging 65% or above, shared active cognitive engagement: asking follow-up questions after code generation, pairing generated code with explanations, and using AI only for conceptual questions while coding independently. Hacker News commenter AstroBen put it this way:

AI is very useful as a tutor.

The pattern echoes independent academic research. A 2024 peer-reviewed study by Jošt, Taneski, and Karakatič of the University of Maribor, published in Applied Sciences, ran a 10-week experiment with 32 undergraduate students learning React and found nearly identical results: LLM use for code generation and debugging correlated significantly and negatively with final grades, while LLM use for explanations showed no significant negative effect. The authors conclude that this latter form of use “can potentially help, rather than hinder, student performance.”

Medium contributor Tom Smykowski argues that the study measures the learning of new libraries specifically, not programming ability in general, writing that it shows:

It’s not about how AI will impact programmers in general, but how the use of AI will impact learning new things.

Medium contributor Guru Prasad framed the core tension as cognitive engagement versus cognitive offloading, rather than AI versus no AI.

This finding lines up with a previous Anthropic observational study showing that AI can reduce task completion time by 80% on tasks for which developers already have the relevant skills. The researchers suggest that AI may accelerate productivity on established skills while simultaneously hindering the acquisition of new ones, though they acknowledge that the study measured comprehension immediately after each task rather than tracking long-term skill development.

Anthropic recommends deploying AI tools with intentional design choices that support engineer learning, noting that productivity gains may come at the expense of the debugging and verification skills needed to oversee AI-generated code. Leading LLM providers such as Anthropic and OpenAI offer dedicated learning modes designed to prioritize understanding over delegation, such as Claude Code’s learn-and-explain mode and ChatGPT’s learning mode.
