Radiology lung cancer risk model using AI


Machine learning-based lung cancer risk prediction models may improve radiology-based lung cancer screening by discriminating slightly better than traditional statistical approaches, according to data from a large prospective cohort in China.

Rationale for risk prediction with a focus on radiology

Because lung cancer screening relies heavily on radiological imaging, particularly low-dose CT, accurate pre-screening risk stratification is essential to optimize radiology resources and identify the individuals most likely to benefit. Risk prediction models applied before imaging can help refine eligibility criteria and improve screening efficiency. However, few studies have investigated machine learning-based risk models in China, where population-specific risk factors may influence screening outcomes. This study evaluated whether machine learning algorithms can improve lung cancer risk prediction compared with traditional logistic regression.

Study design and model development

Researchers analyzed data from 11,708 participants enrolled in a prospective cohort within the Guangzhou Lung Care Project. Using stratified random sampling, they divided the dataset into a training set containing 70% of participants and a validation set containing 30%. Key predictor variables were selected using least absolute shrinkage and selection operator (LASSO) regression. Two lung cancer risk prediction models were then developed on the training set: one used logistic regression and the other used extreme gradient boosting (XGBoost). Model performance was evaluated on the validation set using the area under the curve (AUC) as a measure of discriminatory ability.
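The stratified 70/30 split described above can be illustrated with a minimal sketch. It is not the authors' code: the function name stratified_split, the seed, and the toy cohort with a 2% outcome rate are assumptions made purely for illustration. The point is that sampling within each outcome class keeps the proportion of cases roughly the same in the training and validation sets.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_idx, valid_idx), sampling separately within each class."""
    rng = random.Random(seed)

    # Group participant indices by outcome label (the strata).
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, valid_idx = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        cut = round(len(members) * train_fraction)
        train_idx.extend(members[:cut])
        valid_idx.extend(members[cut:])
    return sorted(train_idx), sorted(valid_idx)


# Hypothetical toy cohort: 1,000 participants, 20 of whom develop lung cancer.
labels = [1] * 20 + [0] * 980
train_idx, valid_idx = stratified_split(labels)

train_cases = sum(labels[i] for i in train_idx)
valid_cases = sum(labels[i] for i in valid_idx)
print(len(train_idx), train_cases)  # training set size and number of cases
print(len(valid_idx), valid_cases)  # validation set size and number of cases
```

With a rare outcome such as lung cancer, a purely random split could leave the validation set with very few cases, which is one reason stratification is commonly preferred.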

Model performance and relevance to radiological screening

In the validation set, the logistic regression model achieved an area under the curve of 0.647 (95% CI: 0.574 to 0.720). The XGBoost model showed slightly better discrimination, with an area under the curve of 0.658 (95% CI: 0.589 to 0.727). Although the absolute difference was small (0.011) and the confidence intervals overlap, the machine learning model was reported to show better robustness and predictive accuracy, suggesting potential value when integrated into radiology screening pathways to guide referrals for diagnostic imaging.
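For readers unfamiliar with how an area under the curve and its 95% confidence interval can be obtained, the sketch below shows one common approach. It is an assumption, not the method reported in the study: the AUC is computed as the probability that a randomly chosen case receives a higher risk score than a randomly chosen non-case, and the interval comes from a percentile bootstrap. The function names, the simulated scores, and the number of resamples are all hypothetical.

```python
import random


def auc(y_true, scores):
    """AUC as P(score of a case > score of a non-case); ties count as 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(y_true, scores, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Small hand-checkable example: 3 of the 4 case/non-case pairs are ranked correctly.
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75

# Simulated validation set (hypothetical): 20 cases, 180 non-cases.
rng = random.Random(42)
y = [1] * 20 + [0] * 180
risk = [rng.gauss(0.5 if label else 0.0, 1.0) for label in y]
point = auc(y, risk)
lower, upper = bootstrap_ci(y, risk)
print(round(point, 3), round(lower, 3), round(upper, 3))
```

When two models are compared on the same validation set, intervals that overlap as widely as those reported here indicate that a small difference in AUC should be interpreted with caution.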

A further finding of clinical relevance is that childhood exposure to cooking fuels was identified as an important risk factor for lung cancer. This variable has rarely been included in previous models. It may be particularly relevant in populations where early childhood exposure to solid fuels is common, and it may influence the long-term lung cancer risk that is assessed before radiological testing.

Impact on radiology practice

The findings indicate that a lung cancer risk prediction model based on the XGBoost algorithm may better support risk assessment at the screening stage than logistic regression alone. Incorporating such models into radiology screening programs may enhance selection of high-risk individuals, improve efficiency of imaging services, and support more targeted use of low-dose CT. Although further validation is required before routine clinical implementation, this study highlights the growing role of machine learning in radiology-driven cancer prevention strategies.

Reference

Zhang T et al. Construction of a lung cancer screening risk prediction model based on machine learning algorithm. J Evid Based Med. 2026:e70104.


