AI has already brought positive results in the medical field and is expected to bring even more. But it’s important to proceed at a thoughtful and deliberate pace when implementing AI tools, especially in life-or-death medical settings, warns Shengpu Tang, assistant professor of computer science at Emory University.
Tang and his colleagues found flaws in many peer-reviewed studies that used an AI technique known as reinforcement learning to guide sepsis treatment, at least in theory.
The journal npj Digital Medicine published their findings.
Through simulation experiments, they demonstrated problems with data preprocessing and indexing techniques commonly used in sepsis treatment research. These flawed techniques can introduce slight time skews that cause AI agents to violate the arrow of time and use future events to predict the past.
If the model's test data is skewed in the same way, the problem stays hidden, the researchers warn.
"This flaw hides behind 'inflated' performance metrics that look great on paper but fail in practice."
Shengpu Tang, Assistant Professor of Computer Science, Emory University
The researchers showed that if these flawed sepsis treatment systems were deployed in healthcare settings, they would recommend over- or under-treating patients in nearly half of patient states.
“We found that the majority of papers using reinforcement learning to analyze sepsis treatment over the past decade, including our own work, made this time-skew error,” said Tang, lead author of the paper.
Tang and his colleagues have developed a simple fix for this flaw. The fix represents a fundamental change in the way problems are formulated in reinforcement learning for healthcare.
Their simulation experiments, based on real clinical data, showed that if the flaw is not corrected, reinforcement learning algorithms designed to guide sepsis treatment neither increase nor decrease patient mortality.
However, the simulations showed that eliminating the time-skew flaw reduced patient mortality by 8-10%.
"We hope this study serves as a wake-up call and a roadmap for building safer and more reliable reinforcement learning models for the clinical bedside," Tang said.
Co-authors of the paper include Sonali Parbhoo, assistant professor of electrical and electronic engineering at Imperial College London; Jenna Wiens, professor of computer science and engineering at the University of Michigan; and Jiayu Yao, who worked on the paper as a postdoctoral researcher at Columbia University.
The high cost of sepsis
Sepsis is a serious medical condition that occurs when an infection triggers a life-threatening chain reaction within the body. Hospitalized patients are often particularly vulnerable because their immune systems may be weakened. According to the Centers for Disease Control and Prevention, one in three adults who die in a hospital had sepsis during that hospitalization.
Some health systems already use AI tools to help monitor patients' risk of developing sepsis. The algorithms behind these predictive tools are often developed with a machine learning technique known as supervised learning. During training, the model is fed a large dataset of vital signs and other statistics from patients who did or did not develop sepsis. The trained model can then be deployed in real-world settings to alert healthcare professionals to high-risk patients.
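To make the supervised setup concrete, here is a minimal sketch of a sepsis risk classifier, assuming plain logistic regression trained by gradient descent. The features (a standardized heart rate and lactate value), the toy data, and the 0.5 alert threshold are hypothetical illustrations, not taken from any deployed tool. Each training example is a single snapshot with a single yes-or-no label, which is what separates this task from the treatment problem described next.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic-regression weights with per-example gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of the log loss with respect to the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def risk(w, b, x):
    """Predicted probability that a patient with features x develops sepsis."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical standardized features: [heart_rate_z, lactate_z].
# Label 1 = developed sepsis, 0 = did not.
X = [[-1.0, -0.5], [-0.5, -1.0], [-0.8, 0.0], [1.0, 0.8], [0.7, 1.2], [1.2, 0.5]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(X, y)

ALERT_THRESHOLD = 0.5  # hypothetical cutoff for notifying clinicians
new_patient = [1.5, 1.5]
print(risk(w, b, new_patient) >= ALERT_THRESHOLD)
```

Real tools use far richer features and models; the point here is only the shape of the problem: one input snapshot in, one risk score out.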
Building on the success of risk prediction tools, computer scientists want to take AI a step further and help guide treatment protocols. Prompt treatment after diagnosis is important to prevent tissue damage, organ failure, and death.
However, unlike risk assessment, recommending treatment protocols requires synchronizing a series of datasets in a dynamic environment. These include the types of treatment, such as intravenous fluids, antibiotics, blood pressure medications, and surgery; the dose or intensity of treatment; the treatment period; the patient's vital signs before and after treatment; and survival and mortality rates.
Different learning frameworks
Reinforcement learning is needed to handle this dynamic environment, in which a series of decisions must be made over time without a single predefined yes-or-no answer. For example, reinforcement learning is used to train AI algorithms to play turn-based games such as chess. The AI agent observes the board and selects a move, and then its opponent moves. The configuration of the board keeps changing, and the process repeats in discrete rounds.
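The observe, act, and respond cycle can be sketched with a much simpler turn-based game than chess. The sketch below assumes a toy version of Nim (players alternately remove one to three stones, and whoever takes the last stone wins) with hypothetical `agent_policy` and `opponent_policy` functions. It records the trajectory of states, actions, and rewards that a reinforcement learning algorithm would learn from. Note that the reward arrives only at the end of the sequence, not after each move.

```python
def agent_policy(pile):
    # Leave the opponent a multiple of 4 when possible; otherwise take 1.
    return pile % 4 or 1

def opponent_policy(pile):
    # A naive opponent that always removes one stone.
    return 1

def play_episode(start_pile, agent, opponent):
    """Alternate agent and opponent moves; return (state, action, reward) steps."""
    pile = start_pile
    trajectory = []
    while True:
        state = pile                        # the agent observes the "board"
        action = min(agent(state), pile)    # ...and selects a move
        pile -= action
        if pile == 0:                       # agent took the last stone: win
            trajectory.append((state, action, 1))
            return trajectory
        pile -= min(opponent(pile), pile)   # the opponent responds
        if pile == 0:                       # opponent took the last stone: loss
            trajectory.append((state, action, -1))
            return trajectory
        trajectory.append((state, action, 0))  # game continues, no reward yet

print(play_episode(10, agent_policy, opponent_policy))
```

Starting from 10 stones, the agent's three moves produce the trajectory `[(10, 2, 0), (7, 3, 0), (3, 3, 1)]`: no single move is labeled right or wrong, and only the final outcome carries a reward.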
In recent years, reinforcement learning algorithms have been applied to a range of decision-making tasks in healthcare. These algorithms analyze past treatment sequences to identify patterns associated with favorable outcomes. The learned decision rules map these patterns to recommended treatments as the patient's condition changes. At each discrete time step, the agent observes the patient's physiological state and selects a treatment, and the patient's condition then evolves into a new state.
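Learning from past treatment sequences is usually framed as offline (batch) reinforcement learning. As a minimal sketch, assuming tabular Q-learning repeated over a fixed log of transitions (a simplification chosen for illustration, not the method of any particular study discussed here), the code below estimates the value of each state-action pair and reads off a greedy treatment recommendation. The two-state, two-action example data are hypothetical.

```python
from collections import defaultdict

def batch_q_learning(transitions, n_actions, gamma=0.99, alpha=0.5, sweeps=200):
    """Tabular Q-learning repeated over a fixed log of transitions.

    transitions: list of (state, action, reward, next_state, done)
    """
    q = defaultdict(float)  # (state, action) -> estimated return
    for _ in range(sweeps):
        for s, a, r, s_next, done in transitions:
            if done:
                target = r
            else:
                target = r + gamma * max(q[(s_next, b)] for b in range(n_actions))
            q[(s, a)] += alpha * (target - q[(s, a)])
    return q

def greedy_action(q, state, n_actions):
    """Recommend the action with the highest estimated value in this state."""
    return max(range(n_actions), key=lambda a: q[(state, a)])

# Hypothetical log. States: 0 = unstable, 1 = stabilized.
# Actions: 0 = no fluids, 1 = give IV fluids. Reward: +1 survival, -1 death.
log = [
    (0, 1, 0.0, 1, False),     # fluids given, patient stabilized
    (1, 0, 1.0, None, True),   # stabilized patient survived
    (0, 0, -1.0, None, True),  # no fluids, patient died
]
q = batch_q_learning(log, n_actions=2)
print(greedy_action(q, 0, n_actions=2))
```

Because the algorithm only ever sees the logged pairs of states and actions, anything wrong with how those pairs were assembled flows straight into the learned recommendations.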
As a graduate student, Tang worked on a 2020 paper that used reinforcement learning to study best practices for sepsis treatment.
After completing this work, Tang began to suspect that the data preprocessing techniques commonly used in reinforcement learning might not provide the most accurate results in a medical setting. He and his colleagues began investigating the problem thoroughly and discovered flaws.
A surprising insight
Unlike standard reinforcement learning benchmarks that operate on well-defined trajectories, healthcare applications often involve events sampled irregularly over time. For example, data entry for electronic medical records may or may not occur in real time.
To prepare these data for reinforcement learning, measurements of the patient's condition and the treatment actions taken are sliced into windows of equal length and indexed by discrete time step. Those indices are then used to form state-action pairs.
The problem arises because the AI agent sees the patient's condition as a summary of vital signs that can only be computed at the end of each time window. The treatment actions recorded in that same window, however, were decided at its beginning or partway through, before that summary existed.
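The windowing step can be sketched as follows, assuming 4-hour windows, vital signs summarized by their mean, and treatment doses summed within each window. These choices, the function name `bin_into_windows`, and the example readings are hypothetical. The last line forms state-action pairs the way the article describes, by matching indices.

```python
from collections import defaultdict
from statistics import mean

def bin_into_windows(vitals, doses, window_hours, n_windows):
    """Slice irregularly timed events into equal-length windows.

    vitals: list of (time_in_hours, measurement)
    doses:  list of (time_in_hours, amount)
    Returns one state summary and one total action per window.
    """
    vital_bins = defaultdict(list)
    dose_bins = defaultdict(float)
    for t, value in vitals:
        idx = int(t // window_hours)  # discrete time index of this event
        if 0 <= idx < n_windows:
            vital_bins[idx].append(value)
    for t, amount in doses:
        idx = int(t // window_hours)
        if 0 <= idx < n_windows:
            dose_bins[idx] += amount
    # The summary needs every reading in the window, so it only exists once
    # the window has closed. Empty windows are left as None in this sketch.
    states = [mean(vital_bins[i]) if vital_bins[i] else None for i in range(n_windows)]
    actions = [dose_bins[i] for i in range(n_windows)]
    return states, actions

# Hypothetical heart-rate readings and IV fluid doses (mL), in hours since admission.
vitals = [(0.5, 80), (3.0, 90), (4.5, 100), (7.9, 110)]
doses = [(1.0, 500.0), (5.0, 250.0)]
states, actions = bin_into_windows(vitals, doses, window_hours=4, n_windows=2)
naive_pairs = list(zip(states, actions))  # pairs state t with action t
print(naive_pairs)
```

Here the 500 mL dose given at hour 1 ends up paired with a heart-rate summary that includes the reading taken at hour 3, two hours after the dose was given.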
“The patient may have received a pill in the middle of the window, or started an IV much earlier in the window, but the AI agent assumes that the decision to administer these treatments was driven by a summary of the patient’s condition, which is only determined at the end of the window,” Tang explains.
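The timing problem Tang describes can be checked mechanically. The sketch below assumes equal-length windows indexed from zero and uses hypothetical helpers: `could_precede_state` asks whether an action logged in one window might have been given before the state summary of another window existed, and `count_misaligned_pairs` tallies such pairs. Under same-index pairing, every pair fails the check.

```python
def state_available_at(state_idx, window_hours):
    # A window's summary can only be computed when the window closes.
    return (state_idx + 1) * window_hours

def could_precede_state(state_idx, action_idx, window_hours):
    # An action logged in window `action_idx` may have been given as early
    # as the start of that window.
    earliest_action_time = action_idx * window_hours
    return earliest_action_time < state_available_at(state_idx, window_hours)

def count_misaligned_pairs(pairs, window_hours):
    """Count (state_idx, action_idx) pairs where the action may predate the state."""
    return sum(could_precede_state(s, a, window_hours) for s, a in pairs)

n_windows, window_hours = 6, 4
same_index_pairs = [(t, t) for t in range(n_windows)]
print(count_misaligned_pairs(same_index_pairs, window_hours))
```

All six same-index pairs are flagged, because each action's window opens before the corresponding state summary can be computed.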
Tang and colleagues began investigating other papers using reinforcement learning to train models for sepsis treatment and found that 80% of them used flawed methods.
They also identified a simple fix for this flaw: shifting the action index backward by one time step, so that each state is paired with the action taken after that state was observed, restores the correct temporal alignment.
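One way to express that fix in code, assuming per-window lists of states and actions, is to pair the state of window t with the action recorded in window t + 1. The function name `align_pairs` is hypothetical, and this sketch simply drops the first window's action and the last window's state at the episode boundaries; how the paper handles those boundaries is not described here.

```python
def align_pairs(states, actions):
    """Pair the state summarized over window t with the action taken in window t + 1.

    Shifting the action index back by one step means each action is matched
    with a state summary that was already available when the action began.
    """
    if len(states) != len(actions):
        raise ValueError("states and actions must cover the same windows")
    return list(zip(states[:-1], actions[1:]))

states = [85, 105, 98]         # hypothetical mean heart rate per window
actions = [500.0, 250.0, 0.0]  # hypothetical IV fluid volume (mL) per window
print(align_pairs(states, actions))
```

The change is a one-line shift, which is why a flaw this small could spread unnoticed through so many studies.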
Spreading the word
It appears that the developers assumed that the data management techniques used to train supervised learning models could also be applied to reinforcement learning models.
"Many people don't stop to think about how indexing works in different settings," says Tang. "It's important to think carefully about preprocessing and indexing data to avoid mistakes, rather than just working on 'autopilot.'"
As a computer scientist dedicated to developing AI tools that effectively support the decision-making process of healthcare professionals, Tang advocates for a measured pace of adoption of these tools.
“I’m an old-school guy,” he says. “I think AI moves too fast in some cases and needs to be scrutinized more closely.”
Although the paper focuses on sepsis treatment, Tang is concerned that the same flaw could occur in many other reinforcement learning models.
"People seem to be making the same mistakes over and over again," Tang said. "We want to bring this issue to the attention of more AI researchers and developers, both those focused on healthcare and those working on broader applications, and make sure they are aware of it."
Source:
Journal reference:
Tang, S., et al. (2026). Beating the beat: The impact of temporal misalignment in reinforcement learning for sepsis treatment. npj Digital Medicine. DOI: 10.1038/s41746-026-02625-2. https://www.nature.com/articles/s41746-026-02625-2
