As part of a national trend that emerged during the pandemic, many NYU Langone Health patients began using electronic health record (EHR) tools to ask their doctors questions, refill prescriptions, and check test results. Many of these digital inquiries arrived through a communications tool called In Basket, which is built into NYU Langone's EHR system, Epic.
Physicians have always spent time managing EHR messages, but in recent years the number of messages they receive each day has grown by more than 30% annually, according to an article by Paul A. Testa, MD, chief medical information officer at NYU Langone Health. Dr. Testa writes that it is not uncommon for physicians to receive more than 150 In Basket messages in a single day. Because health systems are not designed to handle this kind of traffic, physicians end up spending long hours after work sifting through messages. This messaging burden is frequently cited as a factor in why roughly half of physicians report feeling burned out.
Now, a new study led by researchers at NYU Grossman School of Medicine shows that AI tools can draft answers to patients' EHR queries as accurately as human medical professionals, and with greater "empathy." The findings highlight these tools' potential to significantly reduce physicians' In Basket burden and improve communication with patients, provided that human providers review the AI drafts before they are sent.
NYU Langone has been testing the capabilities of generative artificial intelligence (genAI), in which a computer algorithm generates likely choices for the next word in a sentence based on how people have used those words in context across the internet. As a result of this next-word prediction, a genAI chatbot can respond to questions in persuasive, human-like language. In 2023, NYU Langone licensed a "private instance" of GPT-4, a successor to the engine behind the well-known chatbot ChatGPT, which lets doctors experiment with real patient data while adhering to data privacy rules.
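The next-word-prediction idea described above can be illustrated with a toy sketch. This is not how GPT-4 works internally (which uses a large neural network over token probabilities); it simply shows the core intuition of predicting the most likely next word from observed usage, using a tiny made-up corpus:

```python
from collections import defaultdict

# A tiny, invented corpus standing in for "how people use words on the internet".
corpus = "the doctor answered the message and the doctor reviewed the draft".split()

# Count how often each word follows each other word (a bigram model).
followers = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(corpus, corpus[1:]):
    followers[prev][nxt] += 1

def predict_next(word):
    """Return the word most often seen after `word` in the corpus, or None."""
    candidates = followers.get(word)
    if not candidates:
        return None
    return max(candidates, key=candidates.get)

print(predict_next("the"))  # "doctor" follows "the" most often in this corpus
```

A real large language model replaces these raw counts with probabilities learned by a neural network over far longer contexts, but the generation loop is the same in spirit: repeatedly pick a plausible next word and append it.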
Published online July 16 in JAMA Network Open, the new study examined draft responses that GPT-4 generated to patients' In Basket queries and asked primary care physicians to compare them with the actual human responses to those messages.
“Our findings suggest that chatbots can reduce provider workload by enabling efficient and empathetic responses to patient concerns. We found that an EHR-integrated AI chatbot using patient-specific data can produce messages of comparable quality to human providers.”
William Small, MD, lead study author, clinical assistant professor, Department of Medicine, NYU Grossman School of Medicine
In the study, 16 family physicians rated 344 randomly assigned pairs of AI and human responses to patient messages on accuracy, relevance, completeness, and tone, and indicated whether they would use the AI response as a first draft or have to start over crafting the patient message. This was a blinded study, so physicians did not know whether the response they were reviewing was generated by a human or an AI tool.
The research team found no statistical difference between the generative AI and human providers' responses in accuracy, completeness, or relevance. The generative AI responses scored 9.5% higher than the human providers' in understandability and tone. Additionally, the AI responses were more than twice as likely (125% more likely) to be perceived as empathetic and 62% more likely to use language conveying positivity (potentially related to hopefulness) and connectedness ("we're in this together").
Meanwhile, the AI's responses were 38% longer and 31% more likely to use complex language, suggesting the tool needs further training. According to a standard measure of readability called the Flesch-Kincaid score, humans responded to patients' questions at a sixth-grade reading level, while the AI's writing was at an eighth-grade level.
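The Flesch-Kincaid grade level mentioned above is computed from sentence length and word length: 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. A rough sketch is below; note that the syllable counter is a naive vowel-group heuristic (real readability tools use dictionaries and more careful rules), and the example sentences are invented:

```python
import re

def count_syllables(word):
    # Naive heuristic: count runs of consecutive vowels, with a minimum of one.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text):
    """Approximate Flesch-Kincaid grade level of `text`."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / sentences)
            + 11.8 * (syllables / len(words))
            - 15.59)

simple = "The cat sat on the mat."
complex_ = ("The physician subsequently communicated comprehensive "
            "recommendations regarding medication adherence.")
print(flesch_kincaid_grade(simple))   # low grade level: short words, short sentence
print(flesch_kincaid_grade(complex_)) # much higher: long, polysyllabic words
```

Longer sentences and more syllables per word both push the grade level up, which is why the AI's longer, more complex drafts scored two grade levels above the human responses.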
The researchers argued that the chatbot's use of patients' personal information, rather than general information from the internet, more closely reflects how the technology would be used in the real world. Future studies will be needed to confirm whether private data specifically improved the AI tool's performance.
“This study demonstrates that AI tools can draft high-quality responses to patient requests,” said corresponding author Devin Mann, MD, senior director of informatics innovation at NYU Langone's Medical Center Information Technology (MCIT). “Pending physician review and approval, genAI message drafts could in the near future be comparable to human-generated responses in quality, communication style, and usability,” added Dr. Mann, who is also a professor in the Department of Population Health and the Department of Medicine.
In addition to Dr. Small and Dr. Mann, other study authors from NYU Langone are Beatrix Brandfield-Harvey, BS; Zoe Jonassen, PhD; Soumik Mandal, PhD; Elizabeth R. Stevens, MPH, PhD; Vincent J. Major, PhD; Erin Rostrario; Adam C. Szerencsy, DO; Simon A. Jones, PhD; Yindalon Aphinyanaphongs, MD, PhD; and Stephen B. Johnson, PhD. Additional authors are Oded Nov, MS, PhD, of NYU Tandon School of Engineering, and Batia Mishan Wiesenfeld, PhD, of NYU Stern School of Business.
This work was funded by National Science Foundation grants 1928614 and 2129076, and by Swiss National Science Foundation grants P500PS_202955 and P5R5PS_217714.