According to the World Health Organization, HIV remains a major global public health issue. ART reduces the HIV viral load in PLWH to undetectable levels and restores CD4+ T cell counts to normal levels, significantly reducing AIDS mortality [24]. Over the past few years, PLWH survival rates have improved significantly, with causes of death shifting from AIDS-related to non-AIDS-related [25]. Although current treatments have significantly extended the lifespan of PLWH, early prediction of disease progression, identification of high-risk factors, and early intervention to mitigate risk may further improve quality of life and survival time. Survival analysis is therefore particularly important for accurately identifying the risk factors that affect survival.
Machine learning models offer powerful feature extraction and strong individual-level prediction, and they are widely used in survival analysis for a variety of diseases. Survival prediction models built with machine learning techniques can achieve higher accuracy than traditional survival analysis methods. Most traditional statistical methods have limited efficiency and struggle to capture complex nonlinear relationships. For example, Cox proportional hazards models, which are typically suited to low-dimensional data, assume a linear relationship between the covariates and the log hazard. Medical data are often vast, high-dimensional, and complex, and processing them at scale often requires high-performance computing. Machine learning algorithms handle high-dimensional data well and can capture complex nonlinear relationships and interactions between variables. A meta-analysis shows that machine learning models improve performance when predicting PLWH survival [4].
We have developed a highly accurate machine learning-based model to predict the long-term survival risk of PLWH. This study quantitatively compared the accuracy of traditional statistical analysis and machine learning methods in predicting PLWH survival, based on a cohort with approximately 20 years of monitoring. Nomograms focus on interpreting the overall model; to understand the model better, we also performed multivariate Cox proportional hazards analysis. We tested four machine learning models: XGBoost, RF, SVM, and MLP. After parameter tuning and 5-fold cross-validation, the RF model performed best, followed by the SVM model. The XGBoost and MLP models performed well in the test cohort but less well in the training cohort. Additionally, we trained DeepSurv, DeepHit, and Random Survival Forest models (Table S4). However, like the deep learning-based MLP model, the DeepSurv model may be at risk of overfitting because the model is complex relative to the data (Figure S10). The DeepHit and Random Survival Forest models showed average training results (Figures S11 and S12). The performance of the multivariate Cox proportional hazards model was comparable to that of the XGBoost and MLP models.
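The comparison of training and held-out performance described above can be illustrated with a minimal sketch of 5-fold cross-validation. This is not the study's code: the function names (`k_fold_indices`, `auc`, `cross_validate`, `fit_direction_scorer`), the choice of AUC as the metric, and the synthetic data are all assumptions made for illustration. A large gap between the training and validation scores is one common sign of overfitting.

```python
import random
from statistics import mean

def k_fold_indices(n, k=5, seed=0):
    """Shuffle range(n) and split it into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly;
    tied pairs count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def cross_validate(fit, X, y, k=5, seed=0):
    """Return (mean training AUC, mean validation AUC) over k folds.
    fit(X_train, y_train) must return a function mapping one row to a score."""
    train_auc, val_auc = [], []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        train_auc.append(auc([predict(X[i]) for i in train], [y[i] for i in train]))
        val_auc.append(auc([predict(X[i]) for i in held_out], [y[i] for i in held_out]))
    return mean(train_auc), mean(val_auc)

def fit_direction_scorer(X_train, y_train):
    """Toy stand-in for a real learner (hypothetical): score by feature 0,
    signed so that higher scores point toward the positive class."""
    pos = [row[0] for row, label in zip(X_train, y_train) if label == 1]
    neg = [row[0] for row, label in zip(X_train, y_train) if label == 0]
    sign = 1.0 if mean(pos) >= mean(neg) else -1.0
    return lambda row: sign * row[0]

# Synthetic, perfectly separable data (not study data).
X = [[float(i)] for i in range(100)]
y = [1 if i >= 50 else 0 for i in range(100)]
train_score, val_score = cross_validate(fit_direction_scorer, X, y, k=5)
print(f"train AUC={train_score:.3f}, validation AUC={val_score:.3f}")
```

In practice the toy scorer would be replaced by each candidate model, and the two averages compared model by model.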
Based on the model comparison, we developed an RF machine learning model to predict the survival of PLWH after diagnosis. Unlike traditional approaches that rely on single indicators and empirical judgment, this approach may give medical professionals a dynamic, personalized risk assessment tool. Our model integrates multiple prognostic factors to generate continuous survival risk scores for PLWH, making it directly applicable in clinical settings. The model can also identify high-risk individuals who are frequently overlooked by traditional approaches, which encourages early and aggressive intervention to reduce the risk of deterioration. In clinical practice, we believe that data-driven decision support systems will enable more accurate allocation of medical resources and improve management efficiency and treatment outcomes for PLWH.
Selecting a machine learning model often requires weighing the interpretability of the model against the accuracy of its predictions [26]. RF models have low interpretability but high predictive power [27]. In this study, we used SHAP values to increase the interpretability of the RF model [26]. Compared with nomograms, SHAP value analysis can also explain outcomes for PLWH with different prognoses. Age at diagnosis, receipt of ART, and recent CD4+ T cell count had a significant effect on both survival and death outcomes in PLWH. Age at diagnosis, receipt of ART, and recent CD4+ T cell count can therefore be identified as core variables affecting long-term survival. These results may encourage clinicians to regularly and qualitatively monitor highly weighted prognostic factors after a patient is diagnosed with HIV. Force plots enable dynamic risk stratification and are simple to use, which helps clinicians track and manage PLWH more effectively. Furthermore, survival prediction models for PLWH based on machine learning can help promote access to care. In areas where expert experience is lacking, such models can help primary care providers make better decisions.
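To make concrete what a SHAP value attributes to each factor, the following minimal sketch computes exact Shapley values for a single prediction by enumerating every feature coalition. It is an illustration only, not the study's analysis: the `toy_risk` function, its coefficients, the baseline profile, and the patient values are hypothetical, and practical SHAP tools use faster model-specific algorithms rather than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction.
    Features outside a coalition are set to their baseline value."""
    n = len(x)

    def value(coalition):
        row = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(row)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

def toy_risk(row):
    """Hypothetical risk score; the coefficients are made up for illustration."""
    age_at_diagnosis, on_art, recent_cd4 = row
    return 0.02 * age_at_diagnosis - 0.5 * on_art - 0.001 * recent_cd4

feature_names = ["age_at_diagnosis", "on_art", "recent_cd4"]
baseline = [40.0, 1.0, 500.0]   # hypothetical reference profile
patient = [60.0, 0.0, 150.0]    # hypothetical individual
phi = shapley_values(toy_risk, patient, baseline)
for name, contribution in zip(feature_names, phi):
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the individual's score and the baseline score, which is the property a force plot visualizes for each person.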
Our study has some limitations. First, we included only PLWH from a single area of Henan Province, China, so the model's predictions may be biased when applied to other regions. The distribution of PLWH characteristics in this area is similar to the overall situation in Henan Province, and the survival of PLWH in Henan Province can represent the national situation to some extent; nevertheless, a nationwide multicenter epidemiological survey of PLWH should be conducted to improve the generalizability of the model. Second, our study is a retrospective cohort study, and further prospective cohort studies are required to validate the model.
