A scoping review of robustness concepts for machine learning in healthcare
