UNM researchers develop machine learning technique to reveal hidden histories of self-harm

Deep within the labyrinth of electronic health records (EHRs), important information about a patient's mental health often sits silently, hidden and difficult to access. A groundbreaking study by the University of New Mexico School of Medicine has uncovered a significant and alarming gap: clinically documented histories of self-harm often escape traditional medical coding systems. Analyzing the electronic health records of more than 1.3 million veterans treated within the Veterans Health Administration (VHA), researchers found that diagnosis codes, long relied upon by clinicians and health care systems to identify and quantify health conditions, captured only a quarter of clinically documented cases of self-harm. This discrepancy reveals serious flaws in the way health systems measure and respond to mental health needs.
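To make the "quarter" figure concrete: if codes capture a fraction r of chart-documented cases, a code-based count understates the documented total by a factor of about 1/r, so r = 0.25 means roughly four documented cases for every coded one. The short Python sketch below uses hypothetical counts, not figures from the study, and the function names are illustrative. It also assumes the capture rate applies uniformly to every patient, which the study's emphasis on non-random coding suggests is only an approximation.

```python
def capture_rate(n_coded: int, n_documented: int) -> float:
    """Fraction of chart-documented self-harm cases that also carry a diagnosis code."""
    if n_documented <= 0:
        raise ValueError("n_documented must be positive")
    return n_coded / n_documented


def implied_documented(n_coded: int, rate: float) -> float:
    """Scale a code-based count up by an assumed capture rate.

    Assumption (hypothetical): the same capture rate holds for every patient.
    """
    if not 0 < rate <= 1:
        raise ValueError("rate must be in (0, 1]")
    return n_coded / rate


# Hypothetical example: 250 of 1,000 chart-documented cases are coded.
rate = capture_rate(250, 1000)
print(rate, 1 / rate)                  # 0.25 4.0
print(implied_documented(5000, rate))  # 20000.0
```

Under these assumptions, a registry that counts 5,000 coded cases would correspond to about 20,000 documented cases.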

At the heart of this investigation is a disturbing realization: relying on diagnosis codes alone substantially underestimates the prevalence of self-harm, an essential risk factor for predicting future suicide and guiding treatment interventions. Dr. Christophe Lambert, the study's principal investigator and an expert in translational informatics, emphasized that this "visibility gap" hinders not only research accuracy but also clinical vigilance and resource allocation. Traditional coding is efficient and standardized, but the trade-off is that details recorded only in nuanced narrative notes within the EHR leave many patients' critical medical histories out of immediate view.

The study, published in the Journal of Medical Internet Research, used an advanced machine learning framework to penetrate this opacity. Unlike traditional approaches that require distinct case and control groups, the team introduced a technique known as Positive and Unlabeled Learning Selected Not At Random (PULSNAR). The technique is suited to messy real-world data, where the absence of a diagnosis code does not guarantee that the condition itself is absent. PULSNAR models the probability that a given patient has a documented but uncoded history of self-harm, capturing subtle patterns from both coded records and the unstructured clinical notes typical of physician documentation.
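For readers unfamiliar with positive-unlabeled (PU) learning, the Python sketch below illustrates the classic Elkan-Noto correction under the simpler "selected completely at random" (SCAR) assumption, which PULSNAR is designed to relax; it is not PULSNAR's implementation. Coded patients are the labeled positives, and everyone else is unlabeled. A model g(x) estimates the probability that a record is labeled given its features; averaging g over the labeled records estimates c, the chance that a true case gets coded, and dividing the labeled count by c estimates the number of true cases. The synthetic one-feature data, the histogram standing in for a trained classifier, and all function names are hypothetical.

```python
import random
from collections import defaultdict


def simulate_pu_data(n_pos=1000, n_neg=3000, label_freq=0.3, seed=0):
    """Hypothetical synthetic cohort with one numeric feature per patient.

    True cases cluster around +2 and non-cases around -2. Each true case is
    labeled ("coded") independently with probability label_freq (SCAR);
    non-cases are never labeled. Returns (feature, is_labeled) pairs, so the
    true case status is hidden from the estimator.
    """
    rng = random.Random(seed)
    data = [(rng.gauss(2.0, 1.0), rng.random() < label_freq) for _ in range(n_pos)]
    data += [(rng.gauss(-2.0, 1.0), False) for _ in range(n_neg)]
    return data


def estimate_prevalence_scar(data, bin_width=0.5):
    """Elkan-Noto-style prevalence estimate under the SCAR assumption.

    g(x) ~ P(labeled | x) is approximated by the labeled fraction in each
    feature bin (a stand-in for a trained classifier). The mean of g over
    labeled records estimates c = P(labeled | true case), so the number of
    true cases is roughly n_labeled / c.
    """
    totals, labeled = defaultdict(int), defaultdict(int)
    for x, is_labeled in data:
        b = int(x // bin_width)  # histogram bin index
        totals[b] += 1
        labeled[b] += int(is_labeled)
    n_labeled = sum(labeled.values())
    # c_hat = (1 / n_labeled) * sum over labeled records of g(bin)
    c_hat = sum(labeled[b] * labeled[b] / totals[b] for b in totals) / n_labeled
    return (n_labeled / c_hat) / len(data), c_hat


if __name__ == "__main__":
    data = simulate_pu_data()
    coded = sum(is_labeled for _, is_labeled in data) / len(data)
    prevalence, c_hat = estimate_prevalence_scar(data)
    print(f"coded fraction:       {coded:.3f}")
    print(f"estimated c:          {c_hat:.3f}")
    print(f"estimated prevalence: {prevalence:.3f}  (true value 0.25)")
```

On this synthetic data only about 7.5% of records are labeled, while the estimated prevalence should land near the true 25%. The estimate is rough: it reuses the same data for g and c, and overlap between cases and non-cases biases c slightly downward.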

Self-harm is not just a tragic event in itself. A history of self-harm signals a persistent risk of later psychiatric crises, and that risk is harder to manage when the history is not visible in the EHR, particularly alongside co-occurring conditions such as depression, post-traumatic stress disorder (PTSD), bipolar disorder, substance use disorder, and traumatic brain injury. These overlapping conditions demand timely and complete visibility into a patient's medical history to inform both individualized treatment plans and system-wide mental health strategies. Unfortunately, even tools designed to summarize a patient's clinical picture, such as problem lists, are inconsistent and incomplete. The study found that only about 22.6% of veterans with a coded history of self-harm had this information reflected in their problem list, further limiting what clinicians on the front lines can readily see.

The implications of these gaps extend beyond individual clinical encounters to health services research and policy development. Misclassification or underestimation of self-harm caused by incomplete coding can distort epidemiological estimates and the allocation of limited mental health resources. Because some patients' records in this study contained more than 500,000 lines of clinical notes, it is unrealistic to expect individual clinicians to sift through them during routine office visits. Relying on coded data makes large-scale analysis feasible, but it risks overlooking important patients whose self-harm history is documented only in narrative notes.

The machine learning approach employed in this study marks a pivotal advance in health informatics. By learning from patients whose records carry a self-harm diagnosis code and inferring likely but uncoded cases among the rest, PULSNAR accounts for the selective, non-random nature of medical coding. The method produces probabilistic estimates that closely match expert chart review, suggesting it is a powerful tool for closing gaps in mental health documentation. The model picks up subtle indicators scattered throughout medical records that traditional coding often misses, such as risk factors, patterns of injury, and behaviors consistent with self-harm.
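Why non-random coding matters can be seen with simple arithmetic. Classic PU corrections that assume cases are coded "completely at random" divide the number of coded cases by a single estimated coding frequency. If, hypothetically, one subtype of self-harm is coded half the time and another only a tenth of the time, that single pooled constant is dominated by the well-coded subtype and the total is undercounted, whereas correcting each subgroup separately recovers it. The Python sketch below assumes the subtypes and their coding rates are known exactly, which they are not in practice, and its function names are illustrative. It shows the motivation for modeling selection not at random; it is not PULSNAR's algorithm.

```python
def pooled_scar_count(labeled_counts, label_freqs):
    """Single-constant correction: one pooled coding frequency for everyone.

    With an idealized classifier, the SCAR estimate of c is the mean coding
    frequency over coded records, i.e. weighted by how many records in each
    subgroup were coded, so well-coded subgroups dominate it.
    """
    n_labeled = sum(labeled_counts)
    c_pooled = sum(n * c for n, c in zip(labeled_counts, label_freqs)) / n_labeled
    return n_labeled / c_pooled


def stratified_count(labeled_counts, label_freqs):
    """Per-subgroup correction: divide each subgroup's coded count by its own frequency."""
    return sum(n / c for n, c in zip(labeled_counts, label_freqs))


# Hypothetical: 1,000 true cases in each of two subtypes, coded at 50% and 10%.
coded = [500, 100]
freqs = [0.5, 0.1]
print(round(pooled_scar_count(coded, freqs)))  # 1385 (undercount)
print(round(stratified_count(coded, freqs)))   # 2000 (matches the true total)
```

Here the pooled coding frequency works out to 260 / 600, about 0.43, so the single-constant correction recovers only about 1,385 of the 2,000 hypothetical cases.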

First author Praveen Kumar explained that these uncoded patterns often remain buried in clinicians' notes, hidden from the structured data fields that algorithms and reviewers typically examine. This study examined only cases in which self-harm was documented in narrative form but not coded. A broader challenge is uncovering instances where self-harm can only be inferred indirectly from associated symptoms and treatment patterns, a frontier that will require patient engagement and the integration of data from beyond the EHR.

This research was a collaborative effort that brought together interdisciplinary expertise in medical informatics, psychiatry, computer science, economics, and statistics across multiple institutions, including the Raymond G. Murphy Veterans Affairs Medical Center and Vanderbilt University. That combination made it possible to build a robust analytical framework designed for the challenges of real-world clinical data. The paper highlights how accurately measuring mental health history can strengthen suicide prevention efforts, enhance clinical decision-making, and enrich the scientific basis for public health interventions.

This study is part of a larger research initiative that uses positive-unlabeled learning methods to uncover uncoded conditions within medical records. Previously, the team applied similar techniques to identify uncoded opioid use disorder, and ongoing projects extend the approach to other elusive conditions such as PTSD, depression, bipolar disorder, and sleep disorders. Together, these efforts aim to uncover "hidden diseases" that traditional health data infrastructure often misses.

Although the PULSNAR approach is not yet intended for front-line clinical deployment, given validation requirements and ethical considerations, it has the potential to complement existing suicide and overdose reporting tools. As a scalable, data-driven supplement to standardized coding systems with known limitations, it could help healthcare organizations more reliably identify patients with a documented but uncoded history of self-harm and better target interventions and resources.

At a time when the mental health crisis is escalating and health systems grapple with an increasingly complex data ecosystem, this study highlights the value of innovative computational techniques for uncovering critical insights hidden in plain sight. Integrating machine learning with clinical expertise offers an important path toward turning overwhelming volumes of clinical data into actionable knowledge that improves patient safety and quality of care.

Ultimately, these findings challenge the status quo and call for a rethinking of how health systems capture and use mental health information. Dr. Lambert reflected on this systemic challenge, stating that a history of self-harm is "too important to be buried in a record that is impractical to review line by line during routine clinical practice." The researchers' work offers a ray of hope for a future in which technology enhances human judgment, allowing clinicians and researchers to fully understand and address areas of mental health long obscured by limited access to documents and data.

Subject of research: People

Article title: Detecting uncoded self-harm from veterans' electronic health records using positive-unlabeled learning: A retrospective cohort study

News publication date: June 4, 2026

Web reference:

Journal of Medical Internet Research article
DOI: 10.2196/89071
PULSNAR method description

Keywords: computer modeling, self-harm, electronic health records, machine learning, positive-unlabeled learning, mental health documentation, veterans health care, health informatics

Tags: Limitations of Clinical Coding, Electronic Health Record Analysis, Improving the Accuracy of Mental Health Data, Machine Learning in Healthcare, Mental Health Documentation Challenges, Natural Language Processing in Medicine, Detecting Self-Harm in Veterans, Suicide Risk Prediction Methods, Translational Informatics in Healthcare, Underreporting of Self-Harm, Veterans Mental Health Research, Veterans Health Administration Data