AI could help streamline breast imagers' workflows, but tweaks are needed to lower its recall rate, according to a study published September 3 in the American Journal of Roentgenology.
An AI system achieved near-perfect negative predictive values (NPVs) in a real-world, population-based study of both digital mammography and digital breast tomosynthesis (DBT), but its recall rate was almost double that of radiologists, wrote a team led by Iris Chen of the University of California, Los Angeles.
“The findings support the potential of AI to aid radiologists' workflow efficiency,” the Chen team wrote. “However, there is a need for strategies to address frequent false-positive results, especially in the intermediate-risk category.”
Researchers continue to explore AI's potential to make radiologists' workloads more efficient. Doing so could cut the time radiologists spend interpreting large volumes of images and lower the risk of burnout.
Previous studies suggest that treating the AI's intermediate-risk category as a positive result could lead to more cancers being detected. However, recalling every case in this category could also produce more false positives.
Chen and colleagues compared the diagnostic performance of a commercially available AI system (Transpara v1.7.1, ScreenPoint Medical) with that of radiologists. The AI system's results were classified as positive at two different thresholds: elevated risk only, or intermediate or elevated risk. The team focused on NPV and recall rates.
Craniocaudal (left) and mediolateral oblique (right) digital mammography images of the left breast were assessed by the interpreting radiologist as BI-RADS category 1, consistent with a negative result. The AI system flagged an asymmetry in the outer breast on the craniocaudal view (circle) and classified the exam as intermediate risk, which counts as a negative or a positive result depending on the threshold used to classify AI results. The patient was not diagnosed with breast cancer within one year after the screening exam, consistent with a negative outcome according to the study's reference standard. Thus, the radiologist's interpretation was a true negative. The AI result was a true negative when only the elevated-risk category was defined as positive, and a false positive when both the intermediate-risk and elevated-risk categories were defined as positive. The annotation was not generated by the AI system but was reproduced by the study authors based on the AI output coordinates. Image courtesy of ARRS.
The study also included interpretations by 11 breast radiologists with one to 40 years of post-training experience. The radiologists did not use AI for interpretation, and the researchers did not track the number of breast imaging exams each radiologist interpreted.
The digital mammography cohort included 26,693 exams in 20,409 women with a mean age of 58 years. AI classified 58.2% of the exams as low risk, 27.7% as intermediate risk, and 14% as elevated risk.
The DBT cohort included 4,824 exams in 4,379 women with a mean age of 61.3 years. AI classified 68.1% of the exams as low risk, 19.8% as intermediate risk, and 12.1% as elevated risk.
In both cohorts, AI interpretations yielded high NPVs at both the elevated-risk threshold and the intermediate/elevated-risk threshold. However, AI also produced higher recall rates than radiologists did, with the intermediate/elevated-risk threshold producing the highest recall rates.
Performance of radiologists without AI and of the AI system at different thresholds

| Measure | Digital mammography: Radiologist | Digital mammography: AI (elevated risk) | Digital mammography: AI (intermediate/elevated risk) | DBT: Radiologist | DBT: AI (elevated risk) | DBT: AI (intermediate/elevated risk) |
|---|---|---|---|---|---|---|
| Sensitivity | 88.6% | 74.4% | 94% | 83.8% | 78.4% | 89.2% |
| Specificity | 93.3% | 86.3% | 58.6% | 93.7% | 88.4% | 68.5% |
| Recall rate | 7.2% | 14% | 41.8% | 6.9% | 12.1% | 31.9% |
| NPV | 99.9% | 99.8% | 99.9% | 99.9% | 99.8% | 99.8% |
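
The AI recall rates in the table follow directly from the risk-category shares reported for each cohort: every exam in a category that counts as positive is recalled. The short Python sketch below illustrates that arithmetic. The percentages come from the article, while the dictionary keys and the function name `ai_recall_rate` are hypothetical and chosen only for this illustration.

```python
# Percent of exams in each AI risk category, as reported in the article.
# The keys are illustrative labels, not names used by the study or the vendor.
CATEGORY_SHARES = {
    "digital_mammography": {"low": 58.2, "intermediate": 27.7, "elevated": 14.0},
    "dbt": {"low": 68.1, "intermediate": 19.8, "elevated": 12.1},
}

# Categories treated as a positive (recalled) result under each threshold.
THRESHOLDS = {
    "elevated_only": ("elevated",),
    "intermediate_or_elevated": ("intermediate", "elevated"),
}


def ai_recall_rate(shares, positive_categories):
    """Return the percent of exams recalled: the summed shares of positive categories."""
    return round(sum(shares[c] for c in positive_categories), 1)


if __name__ == "__main__":
    for cohort, shares in CATEGORY_SHARES.items():
        for name, categories in THRESHOLDS.items():
            print(f"{cohort} / {name}: {ai_recall_rate(shares, categories)}%")
```

For digital mammography this gives 14.0% and 41.7%, and for DBT 12.1% and 31.9%. The 0.1-point gap from the reported 41.8% is presumably due to rounding of the published category shares.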
The high NPVs, together with the large share of mammograms that AI classified as low risk, suggest that AI could streamline radiologists' work by triaging those exams, the researchers noted. This could let radiologists focus their attention on complex cases rather than negative exams.
“This approach could significantly improve workflow efficiency, reduce interpretive fatigue, and better allocate medical resources,” the study authors wrote.
They also offered two possible reasons why AI interpretations led to higher recall rates: the AI system cannot incorporate data from prior imaging exams into its assessment, and for DBT it processes only the tomosynthesis images and cannot access synthetic mammography images.
“Large-scale prospective studies are needed to understand the optimal approach to integrating radiologists and AI, particularly in the context of intermediate-risk results, to reduce false-positive recalls while maintaining high cancer detection rates,” the authors concluded.
