Eliminating errors in AI may be impossible – what that means for healthcare use

Over the past decade, the success of AI has generated unbridled enthusiasm and bold claims, even as users frequently run into AI-induced errors. AI-powered digital assistants can misinterpret a conversation in embarrassing ways, chatbots can hallucinate facts, and, as I have experienced firsthand, AI-based navigation tools can even guide drivers through cornfields, all without ever registering a mistake.

People tolerate these mistakes because the technology makes certain tasks more efficient. But more and more people are championing the use of AI in fields where mistakes are costly, such as healthcare, and in some cases with limited human supervision. For example, a bill introduced in the U.S. House of Representatives in early 2025 would allow AI systems to prescribe medications autonomously. Since then, health researchers and lawmakers have debated whether such an arrangement is feasible or wise.

It remains to be seen exactly how such AI prescribing would work if this or a similar bill became law. But it raises the question of how much error AI developers can tolerate in their tools, and of the consequences if those tools lead to harmful outcomes, up to and including a patient's death.

As a researcher who studies complex systems, I investigate how the various components of a system interact to produce unpredictable outcomes. Part of my work focuses on exploring the limits of science, and more specifically AI.

Over the past 25 years, I have worked on projects such as adjusting traffic lights, improving bureaucracy, and detecting tax evasion. Although these systems are highly effective, they are by no means perfect.

Especially in the case of AI, errors can be an unavoidable consequence of the way the system operates. Research in my lab suggests that certain characteristics of the data used to train AI models play a role. This situation is unlikely to change, no matter how much time, effort, and money researchers invest in improving AI models.

No one, not even AI, is perfect

Alan Turing, considered the father of computer science, once observed that "if a machine is expected to be infallible, it cannot also be intelligent." This is because learning is an important part of intelligence, and people usually learn from their mistakes. In my research, I see this tug-of-war between intelligence and infallibility at play.

In a study published in July 2025, my colleagues and I showed that it may be impossible to sort a given dataset perfectly into distinct categories. In other words, a dataset can carry a minimum error simply because elements of different categories overlap. On some datasets, of the kind at the core of many AI systems, AI will never perform better than chance.

Characteristics of different dog breeds can overlap, making it difficult for some AI models to distinguish between them. MirasWonderland/iStock via Getty Images Plus

For example, a model trained on a dataset of millions of dogs that records only age, weight, and height could probably distinguish between a Chihuahua and a Great Dane with perfect accuracy. However, individuals of different breeds can fall within the same age, weight, and height ranges, so the model could make mistakes when distinguishing between, say, an Alaskan Malamute and a Doberman Pinscher.
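To make the overlap idea concrete, here is a minimal sketch in Python using made-up synthetic "breed" measurements, not the study's data; the breed names, sizes, and numbers are assumptions chosen for illustration. Because the two groups' heights and weights overlap by construction, even a well-trained classifier tops out noticeably below 100% accuracy:

# Illustrative sketch with synthetic data (an assumption, not the study's dataset).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000  # individuals per hypothetical breed

# Two hypothetical breeds whose height (cm) and weight (kg) distributions overlap.
breed_a = rng.normal(loc=[60.0, 35.0], scale=[6.0, 5.0], size=(n, 2))
breed_b = rng.normal(loc=[66.0, 40.0], scale=[6.0, 5.0], size=(n, 2))

X = np.vstack([breed_a, breed_b])
y = np.array([0] * n + [1] * n)  # 0 = breed A, 1 = breed B

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"Test accuracy: {model.score(X_test, y_test):.2f}")  # stuck well below 1.00

No amount of extra tuning closes that gap, because some individuals from the two groups are genuinely indistinguishable on these features.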

This property is called classifiability, and my students and I began researching it in 2021. Using data from more than 500,000 students who attended the National Autonomous University of Mexico between 2008 and 2020, we wanted to answer a seemingly simple question: Can AI algorithms predict which students will complete their college degree on time – within three, four, or five years of starting their studies, depending on their major?

We tested several common AI classification algorithms and also developed our own. None was perfect. The best ones, even those developed specifically for this task, achieved an accuracy of only about 80%, meaning at least one in five students was misclassified. We noticed that many students with identical grades, age, gender, socioeconomic status, and other characteristics nevertheless had different outcomes: some finished on time while others did not. No algorithm can make perfect predictions in such situations.

You might think that more data would increase predictability, but more data usually brings diminishing returns. Each additional 1% of accuracy can require, for example, 100 times more data. There will simply never be enough students to improve the model's performance much further.
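As a rough, back-of-the-envelope illustration, treating the "100 times more data per extra 1% of accuracy" figure above as a fixed rule of thumb (an assumption for the sake of arithmetic, not a measured law), the required numbers explode almost immediately:

# Illustrative arithmetic only; the 100x-per-percentage-point rule is the
# article's example treated here as a fixed scaling assumption.
samples = 500_000   # roughly the size of the student dataset described above
accuracy = 0.80     # the ceiling the best algorithms reached

for _ in range(3):
    samples *= 100   # 100 times more data ...
    accuracy += 0.01 # ... buys about one more percentage point
    print(f"~{accuracy:.0%} accuracy would need on the order of {samples:,} students")

# Prints: ~81% needs 50,000,000; ~82% needs 5,000,000,000; ~83% needs 500,000,000,000.

Since no university will ever enroll billions of students, the model's accuracy is effectively capped.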

Additionally, many unexpected changes can occur in the lives of students and their families after their first year of college, such as job loss, death, and pregnancy, and these can affect whether they finish on time. So even with data on an infinite number of students, the predictions would still contain errors.

Limits of prediction

More generally, it is complexity that limits predictions. The word complex comes from the Latin plexus, which means interwoven. The components that make up a complex system are intertwined, and the interactions between them determine what happens to them and how they behave.

Therefore, studying the elements of a system in isolation can yield misleading insights not only about those elements but also about the system as a whole.

For example, consider a car driving around town. If you know how fast it is going, you can in theory predict where it will be at a particular time. However, in real traffic, that speed is determined by interactions with other vehicles on the road. The details of these interactions unfold only in the moment and cannot be known in advance, so you can accurately predict where your car will end up only a few minutes ahead, at best.

AI is already playing a huge role in healthcare.

Not with my health

These same principles apply to prescribing medications. Different conditions or diseases can produce the same symptoms, and people with the same condition or disease can have different symptoms. For example, a fever can be caused by a respiratory or a digestive illness. Likewise, a cold may cause a cough, but it doesn't always.

This means there is significant overlap in medical datasets, which limits AI's ability to avoid errors.

It's true that humans make mistakes too. But if an AI misdiagnoses a patient, as it inevitably will at some point, the situation leads to a legal impasse: it is not clear who or what is responsible if a patient is harmed. The pharmaceutical company? The software developer? The insurer? The pharmacy?

In many situations, neither humans nor machines alone are the best choice for a given task. "Centaurs," or "hybrid intelligence," meaning combinations of humans and machines, tend to perform better than either working alone. Doctors could certainly use AI to help decide which drugs might suit different patients depending on their medical history, physiological details, and genetic makeup. Researchers are already studying this approach in precision medicine.

But common sense and the precautionary principle suggest that it is premature to let AI prescribe drugs without human oversight. And the fact that mistakes can be built into the technology may mean that human oversight will be required whenever human health is at stake.


