Researchers at the University of Oxford have benchmarked an artificial intelligence (AI) tool that can automatically remove personal information from patients' electronic health records (EHRs), in a key step towards enabling large-scale confidential medical research.
As healthcare continues to become more digital, the wealth of information stored in millions of electronic health records (EHRs) provides a valuable resource. These routinely collected data facilitate advances in research, education, and quality improvement. However, as interest grows in using EHRs to train deep learning models aimed at improving patient outcomes, questions have arisen as to whether current anonymization methods are robust enough to fully protect patient privacy.
Dr Rachel Kuo, NIHR Postdoctoral Fellow at the University of Oxford, said: “Patient confidentiality is essential to building public trust in medical research. Manually editing personally identifiable information, such as patient names and locations, is time-consuming and expensive. Automated anonymization may alleviate this burden, but we must ensure that the software meets acceptable performance standards.”
The study, published in iScience, was a collaboration between Dr Rachel Kuo and Professor Dominic Furniss from the Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, and Dr Andrew Soltan and Professor David Eyre from the Department of Oncology and the Oxford Institute for Population Health and Big Data. It evaluated the ability of large language models (LLMs) and specialized software tools to detect and remove patient names, dates, medical record numbers, and other identifiers from real-world records without changing clinical content.
The first step was to establish a human benchmark. The team manually anonymized 3,650 medical records, comparing and revising the annotations until they had a complete reference set. They then compared two task-specific anonymization software tools (Microsoft Azure and AnonCAT) with five general-purpose LLMs (GPT-4, GPT-3.5, Llama-3, Phi-3, and Gemma) at redacting identifiable information.
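A benchmark of this kind is typically scored by comparing each tool's redactions against the human-annotated reference set. The sketch below is illustrative only and not the study's actual evaluation pipeline; the span format and example data are assumptions.

```python
# Hypothetical sketch: scoring a redaction tool against a human-annotated
# gold standard. Spans are (start, end) character offsets of identifiers;
# the records and offsets here are invented for illustration.

def score_redactions(gold_spans, predicted_spans):
    """Compute precision, recall, and F1 over sets of identifier spans."""
    gold = set(gold_spans)
    pred = set(predicted_spans)
    tp = len(gold & pred)   # identifiers correctly redacted
    fp = len(pred - gold)   # over-redaction (clinical text removed)
    fn = len(gold - pred)   # leaked identifiers (the privacy risk)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

# Example: two of three gold identifiers found, plus one spurious redaction.
gold = [(0, 10), (25, 33), (50, 58)]
pred = [(0, 10), (25, 33), (70, 75)]
p, r, f1 = score_redactions(gold, pred)
```

Recall is the safety-critical number here: a false negative means an identifier leaked through, whereas a false positive merely removes some clinical text.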
Microsoft's Azure anonymization service had the best overall performance, closely matching human reviewers. GPT-4 also showed strong performance, demonstrating that modern language models can accurately remove identifiers with minimal fine-tuning or task-specific training.
“One of our most promising findings is that complex AI models do not need to be retrained from scratch,” explained Dr Soltan, NIHR Academic Clinical Lecturer.
“We found that some models worked well straight out of the box, while others saw their performance boosted by simple techniques. For a general-purpose model, that meant showing it just a few examples of what a properly anonymized record looks like. For the dedicated software, one model learned the nuances of our hospital data, such as the format of telephone extensions, after fine-tuning on just a small sample. This is exciting because it suggests a realistic path for hospitals to adopt these technologies without manually labelling thousands of patient notes.”
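The few-shot approach Dr Soltan describes can be pictured as assembling a prompt that shows the model a handful of worked redactions before the record to be processed. The example records, placeholder tags, and instruction wording below are invented for illustration; no particular model API is assumed.

```python
# Minimal sketch of few-shot prompting for redaction with a general-purpose
# LLM. The examples and [NAME]/[DATE]/[ID] conventions are hypothetical.

FEW_SHOT_EXAMPLES = [
    ("Seen by Dr Jones on 03/04/2021, MRN 1234567.",
     "Seen by Dr [NAME] on [DATE], MRN [ID]."),
    ("Mrs Smith of 12 High Street reports knee pain.",
     "Mrs [NAME] of [ADDRESS] reports knee pain."),
]

def build_redaction_prompt(record: str) -> str:
    """Assemble a prompt showing what a properly anonymized record looks like."""
    parts = ["Redact all personal identifiers, keeping clinical content intact.\n"]
    for original, redacted in FEW_SHOT_EXAMPLES:
        parts.append(f"Record: {original}\nRedacted: {redacted}\n")
    parts.append(f"Record: {record}\nRedacted:")  # model completes this line
    return "\n".join(parts)

prompt = build_redaction_prompt("Mr Brown attended clinic on 12/06/2022.")
```

The appeal of this pattern, as the article notes, is that it requires only a few labelled examples rather than thousands of manually annotated notes.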
But the study also revealed risks. Some models produced “hallucinations”, outputting text that was not present in the original record, or in some cases introducing fabricated medical details. “While some large language models perform well, others can produce erroneous or misleading text,” Dr Soltan explained. “This behaviour poses risks in clinical practice, so careful validation is essential before deployment.”
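One simple automated safeguard against hallucination is to check that every word in a model's redacted output, placeholders aside, actually occurs in the source record. The sketch below assumes a bracketed-placeholder convention and is far cruder than the validation a real deployment would need.

```python
# Hedged sketch: flagging text in a redacted output that never appeared in
# the original record. The [NAME]-style placeholder convention is assumed.
import re

PLACEHOLDER = re.compile(r"\[[A-Z]+\]")

def hallucinated_tokens(original: str, redacted: str) -> list:
    """Return words in the redacted text absent from the original,
    ignoring redaction placeholders."""
    source_tokens = set(re.findall(r"\w+", original.lower()))
    cleaned = PLACEHOLDER.sub(" ", redacted)
    return [t for t in re.findall(r"\w+", cleaned)
            if t.lower() not in source_tokens]

# A faithful redaction passes; one with invented clinical detail is flagged.
ok = hallucinated_tokens("Mr Brown has knee pain.", "Mr [NAME] has knee pain.")
bad = hallucinated_tokens("Mr Brown has knee pain.", "[NAME] has severe arthritis.")
```

A check like this catches fabricated words but not omissions or reordering, which is why the researchers stress human oversight alongside any automated test.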
The researchers concluded that automating anonymization can significantly reduce the time and cost required to prepare clinical data for research while maintaining patient privacy in accordance with data protection regulations. “This study shows that AI can be a powerful ally in protecting patient confidentiality,” Professor Eyre said. “However, human judgment and strong governance must remain at the heart of any system that handles patient data.”
This research was supported by the National Institute for Health and Care Research (NIHR), Microsoft Research UK, Cancer Research UK, EPSRC, and the NIHR Oxford Biomedical Research Centre.
