AI model NYUTron shows amazing accuracy in clinical tasks with handwritten doctor’s notes

AI News


In a recent article published in a magazine NatureResearchers use electronic health record (EHR) unstructured clinical notes to train NYUTron, a large-scale language model of medical language, and then perform five clinical and operational predictive tasks evaluated ability.

Research: Health system-wide language models are universal predictive engines. Image credit: Elnur / ShutterstockResearch: Health system-wide language models are universal predictive engines. Image credit: Elnur / Shutterstock

Background

Initially, all the information needed to make a medical decision is highly dispersed in the patient’s medical record, for example prescriptions, laboratory and imaging reports. Physicians summarize all relevant information from this pool of information into handwritten notes to document and summarize patient care.

Existing clinical predictive models rely on structured inputs derived from patient EHRs or clinician inputs, complicating data processing, model development, and deployment. As a result, most medical predictive models have been trained, validated and published, but have never been used in real clinical practice and are often considered a ‘last mile problem’.

Artificial intelligence (AI)-based large-scale language models (LLMs), on the other hand, rely on reading and interpreting human language. Researchers therefore theorized that LLMs could solve the last-mile problem by reading handwritten physician notes. In this way, these LLMs can facilitate medical decision-making in a wide range of clinical and operational settings.

About research

In this study, researchers leveraged recent advances in LLM-based systems to develop NYUTron and prospectively evaluated its effectiveness in performing the following five clinical and operational predictive tasks:

  1. 30-day all-cause readmission
  2. In-hospital mortality
  3. Prediction of comorbidity index
  4. Length of stay (LOS)
  5. Insurance denial prediction

In addition, the researchers performed a detailed analysis of readmission prediction, the likelihood that patients for any reason will seek readmission within 30 days of discharge. Specifically, he performed five additional evaluations in retrospective and prospective settings. For example, the team evaluated the scaling properties of his NYUTron and compared it to other models using several fine-tuned data points.

In a retrospective evaluation, they directly compared six physicians at different seniority levels to the NYUTron. In a prospective evaluation that will run from January 2022 to his April, the team tested his NYUTron in an accelerated format. They loaded it into an inference engine that interfaced with the EHR and read the discharge note duly signed by the treating physician.

a, We contacted the NYU Langone EHR for two datasets. The pre-training dataset, NYU Notes, contains 10 years of inpatient clinical notes (387,144 patients, 4.1 billion words).  There are 5 refinement datasets. Each contains 1-10 years of inpatient clinical notes (55,791-413,845 patients, 51-87 million words) with task-specific labels (2-4 classes).  b. Using the MLM task, we pre-trained a 109-million-parameter BERT-like LLM called NYUTron over the EHR to create a pre-trained model of the medical language contained within the EHR.  c. The pre-trained model was then fine-tuned on a specific task (eg, 30-day all-cause readmission prediction) and validated on retained retrospective data.  d. Finally, the fine-tuned model was compressed into a fast format and loaded into the inference engine. The inference engine works with the NYU Langone EHR to read discharge notes signed by the treating physician.

be, We contacted the NYU Langone EHR for two datasets. The pre-training dataset, NYU Notes, contains 10 years of inpatient clinical notes (387,144 patients, 4.1 billion words). There are 5 refinement datasets. Each contains 1-10 years of inpatient clinical notes (55,791-413,845 patients, 51-87 million words) with task-specific labels (2-4 classes). bA 109-million-parameter BERT-like LLM called NYUTron was pre-trained across the EHR using the MLM task to create a pre-trained model of the medical language contained within the EHR. cThe pre-trained model was then fine-tuned on a specific task (e.g. 30-day all-cause readmission prediction) and validated on retained retrospective data. dFinally, the fine-tuned model is compressed into a high-speed format and loaded into an inference engine that interfaces with the NYU Langone EHR to read discharge notes signed by the treating physician.

result

Compared to the previous conventional model, the NYUTron’s overall area under the curve (AUC) ranged from 78.7% to 94.9%, with an improvement of up to 14.7% in terms of AUC. Additionally, the authors demonstrated the benefits of pre-training his NYUTron using clinical texts. This allowed for greater versatility through fine-tuning and ultimately full deployment in single-arm prospective trials.

Readmission prediction is well studied in the published literature on medical informatics. In a retrospective evaluation, NYUTron outperformed physicians, with a median false positive rate (FPR) of 11.11% for both his NYUTron and physicians. However, the median true positive rate (TPR) was higher for NYUTron than physicians, 81.72% vs. 50%.

NYUTron predicted 2692 readmissions out of 3271 (82.30% recall) with an accuracy of 20.58% and an AUC of 78.7% in a prospective assessment. A panel of 6 physicians then randomly evaluated 100 readmission cases collected by NYUTron and found that some predictions by NYUTron were clinically relevant and preventable from readmission. I discovered that it could have been done.

Interestingly, 27 of the NYUTron-predicted readmissions were preventable, and patients predicted to be readmitted were six times more likely to die in hospital. In addition, 3 of the 27 preventable readmissions had enteritis. Enteritis is a bacterial infection that frequently occurs in hospitals. Clostridioides difficile. Of note, 1 in 11 infected people over the age of 65 will die.

The researchers used 24 NVIDIA A100 GPUs with 40 GB of VRAM for 3 weeks for NYUTron pre-training and 8 A100 GPUs for 6 hours per run for fine-tuning. Researchers generally do not have access to this amount of calculations. However, research data showed that a high-quality dataset for fine-tuning is more valuable than pre-training. Based on experimental results, the authors recommended using local finetuning when users have limited computational power.

Furthermore, in this study, the researchers used decoder-based architectures such as Bidirectional Encoder Representation with Transformers (BERT) to demonstrate the benefits of fine-tuning medical data and general text for LLM studies. emphasized the need for a domain shift from medical texts to medical texts.

Conclusion

In summary, the results of the current study suggested the feasibility of using LLM as a prediction engine for a range of medical (clinical and operational) prediction tasks. The authors also noted that physicians may over-rely on NYUTron’s predictions, potentially leading to fatal outcomes, which is a genuine ethical concern. Research findings therefore underscore the need to optimize human-AI interactions and assess sources of bias and unexpected failures.

In this regard, the researchers recommended different interventions depending on the patient’s risk predicted by NYUTron. For example, a follow-up call may suffice for a patient with a low risk of readmission within 30 days. However, early discharge is strictly a “no” for high-risk patients. More importantly, while operational forecasting can be fully automated, all patient-related interventions must be performed strictly under physician supervision. Nevertheless, LLM offers a unique opportunity to seamlessly integrate into healthcare workflows, even in large healthcare systems.



Source link

Leave a Reply

Your email address will not be published. Required fields are marked *