- The Sharp/van der Heijde (SvdH) method for analyzing X-ray images is the standard method for measuring joint space narrowing and bone erosion in rheumatoid arthritis and requires a well-trained reader.
- Recent advances in machine learning and artificial intelligence have the potential to automate SvdH scoring.
- In this initial study, a machine learning system called autoscoRA produced SvdH scores that showed good to excellent agreement with experienced human readers.
Researchers say a machine learning system for analyzing radiographs of rheumatoid arthritis (RA) patients was able to generate Sharp/van der Heijde (SvdH) scores, a standard method for quantifying joint space narrowing and bone erosion, with accuracy comparable to that of human readers.
The system, called autoscoRA, matched human reading scores for joint space narrowing in more than 95% of images of hands and feet, said Thomas Deimel, MD, PhD, of the Medical University of Vienna in Austria, and colleagues.
Performance in scoring erosion was more variable, the group reported in Arthritis and Rheumatism, but the level of agreement was still considered good. Only 6.3% of hand images and 11.0% of foot X-ray images had a score difference of more than 1 point with the SvdH method.
Another finding in favor of autoscoRA came from a test in which images were scored by a first human reader, autoscoRA, and a second human reader. AutoscoRA matched the first reader in terms of total score with an intraclass correlation of 0.94, whereas the second human reader’s score matched the first reader with a correlation of 0.86. When scoring individual joints, autoscoRA readings for joint space narrowing differed by more than 1 point from those for the first reader in less than 3% of cases, whereas for the second reader these differences were present in approximately 10% of the images.
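For readers who want to see how agreement figures of this kind are typically computed, here is a minimal sketch. It assumes a two-way random-effects, absolute-agreement, single-rater intraclass correlation (often written ICC(2,1)); the article does not say which ICC form the researchers used, and the function names are hypothetical.

```python
from statistics import mean


def icc_2_1(ratings):
    """ICC(2,1) for a table of scores: one row per image, one column per reader.

    Assumption: two-way random effects, absolute agreement, single rater.
    The study may have used a different ICC form.
    """
    n = len(ratings)       # number of images
    k = len(ratings[0])    # number of readers
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Partition the total sum of squares into image, reader, and residual parts.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def share_differing_by_more_than(scores_a, scores_b, tolerance=1):
    """Fraction of paired scores whose absolute difference exceeds the tolerance."""
    pairs = list(zip(scores_a, scores_b))
    return sum(abs(a - b) > tolerance for a, b in pairs) / len(pairs)
```

Passing one row per image, with the first reader's total score and autoscoRA's total score as the two columns, would yield a figure of the same kind as the 0.94 reported above. The second function corresponds to the "more than 1 point" comparisons for individual joints.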
“For erosion scores, the automated system’s performance numerically closely matched that of a second human reader, but visual inspection showed that the former may make more consistent predictions,” Deimel et al. added.
One of the problems with standard SvdH scoring is that inter-reader (and even intra-reader) reliability is only so-so. Consistency is therefore a desirable goal for any method of radiographic analysis. For one thing, a consistent system's errors are more likely to be systematic, which makes them easier to recognize and correct than errors that occur randomly.
Another reason to prefer automated systems is cost and efficiency. SvdH scoring requires considerable training and, as a result, experienced readers are in short supply, especially outside of major referral centers. Reading itself takes time and requires specialized staff, which is expensive. “Automated systems like autoscoRA directly address the feasibility gap and provide a scalable and reproducible solution to convert images into reliable, structured outcome data,” Deimel et al. wrote.
AutoscoRA has been in development for some time. Deimel gave a preliminary presentation on this at the 2020 Rheumatology Society Meeting. This new study includes more images and additional analysis to better define the system’s potential.
The researchers used a large archive of hand and foot X-rays taken from 769 RA patients seen at the Medical University of Vienna. Patients had a total of 3,437 office visits and more than 12,000 X-rays. Approximately 60% of the images were used for training, 20% for validation, and 20% as a “test set.” Comparative tests with autoscoRA and human readers were performed on this latter set.
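A 60/20/20 split like the one described can be sketched as follows. This version assigns whole patients to a single subset so that images from the same person never appear in both training and test data; that grouping is an assumption, since the article does not describe how the images were allocated, and the function and variable names are hypothetical.

```python
import random


def split_by_patient(image_to_patient, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split images into train/validation/test, keeping each patient in one subset.

    image_to_patient maps an image identifier to a patient identifier.
    Assumption: the split is done at the patient level; the study's actual
    procedure is not described in the article.
    """
    patients = sorted(set(image_to_patient.values()))
    random.Random(seed).shuffle(patients)  # seeded, so the split is reproducible

    n_train = round(fractions[0] * len(patients))
    n_val = round(fractions[1] * len(patients))

    subset_of = {}
    for i, patient in enumerate(patients):
        if i < n_train:
            subset_of[patient] = "train"
        elif i < n_train + n_val:
            subset_of[patient] = "validation"
        else:
            subset_of[patient] = "test"

    splits = {"train": [], "validation": [], "test": []}
    for image, patient in image_to_patient.items():
        splits[subset_of[patient]].append(image)
    return splits
```

Because patients contribute different numbers of images, a patient-level split only approximates the target percentages at the image level, which would be consistent with the "approximately" in the description above.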
In addition to scoring individual radiographs, the study also examined serial images from 54 patients from a total of 237 visits. This allowed the researchers to examine how well autoscoRA can quantify disease progression. Agreement with human readers averaged 70% across different progression definitions (i.e., cutoffs for the change in erosion and joint space narrowing scores over time). “Overall performance appears to be relatively stable across the cutoff range,” the researchers wrote.
Deimel et al. emphasized that before autoscoRA is considered for routine clinical use, additional “external validation” with images from other centers is needed, and more focus is needed on the system’s ability to assess progression over time. But in the meantime, the researchers suggested it could have near-term applications in clinical trials and in the analysis of large image collections such as registries and other observational patient cohorts.
