Towards smarter diagnosis of artificial joint infections

Machine Learning


According to a systematic review published in Orthopedic Research Journal, machine learning models generally showed high performance in diagnostic and predictive tasks related to prosthetic joint infection after total hip or total knee arthroplasty. However, most models lack external validation and were developed using retrospective, single-center data, raising questions about their real-world applicability.

Prosthetic joint infection (PJI) occurs in up to 1.7% of patients within 2 years after arthroplasty and is associated with significant morbidity, decreased quality of life, prolonged hospital stay, and increased healthcare costs. The 5-year mortality rate for patients with PJI after total hip arthroplasty has been reported to be as high as 21%.

“Diagnosis of PJI remains a challenge due to the limitations of current diagnostic criteria,” the researchers wrote. “Machine learning offers a data-driven approach to improving diagnostic accuracy, potentially enabling earlier and more accurate identification and ensuring timely and appropriate treatment.”


How the review was conducted

The researchers searched PubMed and Embase for studies that applied machine learning to PJI-related clinical problems in hip and knee arthroplasty. While many studies focused on diagnosis, others addressed related tasks such as early prediction of infection, recurrence, and surgical outcomes.

After screening 583 records, a total of 12 studies met inclusion criteria. Sample sizes ranged from 20 to 17,165 procedures, all using retrospective datasets. Only one study included external validation.

Model inputs were diverse and included patient demographics (11 studies), comorbidities (10), serological markers (7), synovial fluid analysis (4), microbiology (3), and imaging (3). A total of 23 different machine learning approaches were evaluated, including linear models, tree-based methods, support vector machines, k-nearest neighbors, naive Bayes, and deep learning models.
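As a rough illustration of how such tabular models are built (a minimal sketch only: the synthetic data, feature names such as age, BMI, CRP, and ESR, and the label-generating rule below are assumptions for demonstration, not the actual study variables), two of the model families named above can be trained and compared like this:

```python
# Illustrative sketch only: synthetic data; feature names (age, BMI, CRP, ESR)
# are assumed stand-ins for the demographic and serological inputs described.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([
    rng.normal(65, 10, n),    # age (demographics)
    rng.normal(28, 5, n),     # BMI (comorbidity proxy)
    rng.gamma(2.0, 5.0, n),   # CRP, mg/L (serological marker)
    rng.gamma(2.0, 10.0, n),  # ESR, mm/h (serological marker)
])
# Synthetic label: infection risk loosely tied to the inflammatory markers
p = 1 / (1 + np.exp(-(0.05 * X[:, 2] + 0.02 * X[:, 3] - 1.5)))
y = rng.binomial(1, p)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
for model in (LogisticRegression(max_iter=1000),
              RandomForestClassifier(random_state=0)):
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(type(model).__name__, round(auc, 3))
```

The same pattern extends to the other model families the review covers (support vector machines, k-nearest neighbors, naive Bayes), since scikit-learn exposes them through the same fit/predict interface.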


Diagnostic performance

Model performance was most commonly evaluated using the area under the receiver operating characteristic curve (AUC). Reported AUC values ranged from 0.68 to 0.993, spanning acceptable to excellent performance.
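As a brief illustration of the metric (made-up labels and scores, not data from the review), AUC can be computed directly from predicted probabilities and true outcomes:

```python
# Minimal AUC illustration with made-up labels/scores (not data from the review)
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1, 0, 1, 1, 0]                     # 1 = infected
y_score = [0.1, 0.3, 0.8, 0.7, 0.4, 0.9, 0.6, 0.2]   # model probabilities

auc = roc_auc_score(y_true, y_score)
print(round(auc, 3))  # 1.0 here: every infected case outranks every non-infected one
```

An AUC of 0.5 corresponds to random guessing; 1.0 means the model ranks every positive case above every negative one, which is why values near 0.993 are described as excellent.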

Examples of high-performance approaches include:

  • Decision tree model for preoperative diagnosis (AUC up to 0.993)

  • Meta-learning model for revision surgery evaluation (AUC up to 0.988)

  • Intraoperative prediction model at second-stage revision (AUC up to 0.968)

  • Image-based model (AUC 0.957 for knees, 0.906 for hips)

In one study on intraoperative diagnosis, the model achieved 100% specificity and higher sensitivity than traditional standards.

However, the authors noted that in some cases the models were trained and evaluated against the same consensus diagnostic criteria (e.g., the Musculoskeletal Infection Society [MSIS] and International Consensus Meeting [ICM] definitions), which could overestimate real-world performance.


Key limitations and next steps

Despite the promising results, the overall quality of the studies was moderate, and several limitations were consistent across the literature. Most models were developed using retrospective, single-center data with relatively short follow-up periods. External validation was rare, and model interpretability was often limited.

The authors also highlighted potential methodological concerns, such as variability in input features and outcome definitions and the risk of circularity when models are trained and tested against the same diagnostic framework.

“Machine learning models show great promise,” the researchers wrote. “However, further research is required to ensure robustness and clinical applicability.”

They emphasized the importance of multicenter studies using standardized and diverse datasets, along with rigorous external validation and more transparent modeling approaches.

For full researcher disclosures, please visit onlinelibrary.wiley.com.

Source: Orthopedic Research Journal


