Topol, E. J. High-performance medicine: the convergence of human and artificial intelligence. Nat. Med. 25, 44–56 (2019).
Center for Devices and Radiological Health. Artificial Intelligence and Machine Learning (AI/ML)-Enabled Medical Devices. FDA (2023).
Zhang, Z. et al. Pathologist-level interpretable whole-slide cancer diagnosis with deep learning. Nat. Mach. Intell. 1, 236–245 (2019).
Esteva, A. et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 542, 115–118 (2017).
Pham, T.-C., Luong, C.-M., Hoang, V.-D. & Doucet, A. AI outperformed every dermatologist in dermoscopic melanoma diagnosis, using an optimized deep-CNN architecture with custom mini-batch logic and loss function. Sci. Rep. 11, 17485 (2021).
Wu, E. et al. How medical AI devices are evaluated: limitations and recommendations from an analysis of FDA approvals. Nat. Med. 27, 582–584 (2021).
Nagendran, M. et al. Artificial intelligence versus clinicians: systematic review of design, reporting standards, and claims of deep learning studies. BMJ 368, m689 (2020).
Finlayson, S. G. et al. Adversarial attacks on medical machine learning. Science 363, 1287–1289 (2019).
Zech, J. R. et al. Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: A cross-sectional study. PLOS Med. 15, e1002683 (2018).
Bui, P. & Liu, Y. Using AI to help find answers to common skin conditions (Google). https://blog.google/technology/health/ai-dermatology-preview-io-2021/.
Zhang, J. M., Harman, M., Ma, L. & Liu, Y. Machine Learning Testing: Survey, Landscapes and Horizons. IEEE Trans. Softw. Eng. 48, 1–36 (2022).
High Level Expert Group on AI. Assessment List for Trustworthy Artificial Intelligence (ALTAI) for self-assessment. https://digital-strategy.ec.europa.eu/en/library/assessment-list-trustworthy-artificial-intelligence-altai-self-assessment (2020).
Lekadir, K. et al. FUTURE-AI: International consensus guideline for trustworthy and deployable artificial intelligence in healthcare. Preprint at https://doi.org/10.48550/arXiv.2309.12325 (2024).
DeGrave, A. J., Janizek, J. D. & Lee, S.-I. AI for radiographic COVID-19 detection selects shortcuts over signal. Nat. Mach. Intell. 3, 610–619 (2021).
Balendran, A., Benchoufi, M., Evgeniou, T. & Ravaud, P. Algorithmovigilance, lessons from pharmacovigilance. npj Digit. Med. 7, 1–6 (2024).
Arksey, H. & O’Malley, L. Scoping studies: towards a methodological framework. Int. J. Soc. Res. Methodol. 8, 19–32 (2005).
Munn, Z. et al. Systematic review or scoping review? Guidance for authors when choosing between a systematic or scoping review approach. BMC Med. Res. Methodol. 18, 143 (2018).
Nyanchoka, L. et al. A scoping review describes methods used to identify, prioritize and display gaps in health research. J. Clin. Epidemiol. 109, 99–110 (2019).
Kyung, S. et al. Improved performance and robustness of multi-task representation learning with consistency loss between pretexts for intracranial hemorrhage identification in head CT. Med. Image Anal. 81, 102489 (2022).
Valliani, A. A. et al. Robust Prediction of Non-home Discharge After Thoracolumbar Spine Surgery With Ensemble Machine Learning and Validation on a Nationwide Cohort. World Neurosurg. 165, e83–e91 (2022).
Huo, J., Wu, L. & Zang, Y. Development and Validation of a Robust Immune-Related Prognostic Signature for Gastric Cancer. J. Immunol. Res. 2021, 5554342 (2021).
Zhang, W. et al. A Novel and Robust Prognostic Model for Hepatocellular Carcinoma Based on Enhancer RNAs-Regulated Genes. Front. Oncol. 12, 849242 (2022).
Guan, Y. et al. Assessment of the timeliness and robustness for predicting adult sepsis. iScience 24, 102106 (2021).
Khoshnevisan, F. & Chi, M. Unifying Domain Adaptation and Domain Generalization for Robust Prediction Across Minority Racial Groups. in Machine Learning and Knowledge Discovery in Databases. Research Track (eds. Oliver, N., Pérez-Cruz, F., Kramer, S., Read, J. & Lozano, J. A.) 521–537 (Springer International Publishing, Cham, 2021). https://doi.org/10.1007/978-3-030-86486-6_32.
Lu, Y. et al. Robust Speech and Natural Language Processing Models for Depression Screening. in 2020 IEEE Signal Processing in Medicine and Biology Symposium (SPMB) 1–5 (2020). https://doi.org/10.1109/SPMB50085.2020.9353611.
Malafaia, M., Silva, F., Neves, I., Pereira, T. & Oliveira, H. P. Robustness Analysis of Deep Learning-Based Lung Cancer Classification Using Explainable Methods. IEEE Access 10, 112731–112741 (2022).
O’Brien, M., Bukowski, J., Hager, G., Pezeshk, A. & Unberath, M. Evaluating neural network robustness for melanoma classification using mutual information. in Medical Imaging 2022: Image Processing vol. 12032 173–177 (SPIE, 2022).
Joel, M. Z. et al. Using Adversarial Images to Assess the Robustness of Deep Learning Models Trained on Diagnostic Images in Oncology. JCO Clin. Cancer Inform. 6, e2100170 (2022).
Ma, L. & Liang, L. A regularization method to improve adversarial robustness of neural networks for ECG signal classification. Comput. Biol. Med. 144, 105345 (2022).
Wang, K., Wang, G., Chen, N. & Chen, T. How Robust is Your Automatic Diagnosis Model? in 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) 877–884 (2019). https://doi.org/10.1109/BIBM47256.2019.8983217.
Çallı, E. et al. Deep learning with robustness to missing data: A novel approach to the detection of COVID-19. PLoS ONE 16, e0255301 (2021).
Ramoni, M., Sebastiani, P. & Dybowski, R. Robust outcome prediction for intensive-care patients. Methods Inf. Med. 40, 39–45 (2001).
Liang, P. P. et al. MULTIBENCH: Multiscale Benchmarks for Multimodal Representation Learning.
Potapenko, I. et al. Detection of oedema on optical coherence tomography images using deep learning model trained on noisy clinical data. Acta Ophthalmol. (Copenh.) 100, 103–110 (2022).
Ju, L. et al. Improving Medical Images Classification With Label Noise Using Dual-Uncertainty Estimation. IEEE Trans. Med. Imaging 41, 1533–1546 (2022).
Peng, T. et al. Noise Robust Learning with Hard Example Aware for Pathological Image classification. in 2020 IEEE 6th International Conference on Computer and Communications (ICCC) 1903–1907 (2020). https://doi.org/10.1109/ICCC51575.2020.9344937.
Hekler, A. et al. Effects of Label Noise on Deep Learning-Based Skin Cancer Classification. Front. Med. 7, 177 (2020).
Oakden-Rayner, L. Exploring Large-scale Public Medical Image Datasets. Acad. Radiol. 27, (2019).
Kurian, N. C., Meshram, P. S., Patil, A., Patel, S. & Sethi, A. Sample Specific Generalized Cross Entropy for Robust Histology Image Classification. in 2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI) 1934–1938 (2021). https://doi.org/10.1109/ISBI48211.2021.9434169.
Saab, K. et al. Reducing Reliance on Spurious Features in Medical Image Classification with Spatial Specificity. in Proceedings of the 7th Machine Learning for Healthcare Conference 760–784 (PMLR, 2022).
Wang, X. et al. ChestX-Ray8: Hospital-Scale Chest X-Ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases. in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 3462–3471 (IEEE Computer Society, 2017). https://doi.org/10.1109/CVPR.2017.369.
Weinstein, J. N. et al. The Cancer Genome Atlas Pan-Cancer Analysis Project. Nat. Genet. 45, 1113–1120 (2013).
Zhang, H. et al. Re-thinking and Re-labeling LIDC-IDRI for Robust Pulmonary Cancer Prediction. in Medical Image Learning with Limited and Noisy Data (eds. Zamzmi, G. et al.) 42–51 (Springer Nature Switzerland, Cham, 2022). https://doi.org/10.1007/978-3-031-16760-7_5.
Pan, S., Sheng, B., He, G., Li, H. & Xue, G. BAW: learning from class imbalance and noisy labels with batch adaptation weighted loss. Multimed. Tools Appl. 81, 13593–13610 (2022).
Hajiabadi, H., Babaiyan, V., Zabihzadeh, D. & Hajiabadi, M. Combination of loss functions for robust breast cancer prediction. Comput. Electr. Eng. 84, 106624 (2020).
Qayyum, A., Qadir, J., Bilal, M. & Al-Fuqaha, A. Secure and Robust Machine Learning for Healthcare: A Survey. IEEE Rev. Biomed. Eng. 14, 156–180 (2021).
Freiesleben, T. & Grote, T. Beyond generalization: a theory of robustness in machine learning. Synthese 202, 109 (2023).
Castro, D. C., Walker, I. & Glocker, B. Causality matters in medical imaging. Nat. Commun. 11, 3673 (2020).
Peters, M. D. J. et al. Guidance for conducting systematic scoping reviews. JBI Evid. Implement. 13, 141–146 (2015).
Tricco, A. C. et al. PRISMA Extension for Scoping Reviews (PRISMA-ScR): Checklist and Explanation. Ann. Intern. Med. 169, 467–473 (2018).
Balendran, A. Machine learning robustness concepts in healthcare: a scoping review protocol. https://osf.io/xrqpb/?view_only=945f3c9f8f7346869418ebf5f788ed3f.
Funk, M. J. et al. Doubly Robust Estimation of Causal Effects. Am. J. Epidemiol. 173, 761–767 (2011).
Ishii, S. & Ljunggren, D. A Comparative Analysis of Robustness to Noise in Machine Learning Classifiers. (2021).
Arcaini, P., Bombarda, A., Bonfanti, S. & Gargantini, A. Dealing with Robustness of Convolutional Neural Networks for Image Classification. in 2020 IEEE International Conference On Artificial Intelligence Testing (AITest) 7–14 (IEEE, Oxford, UK, 2020). https://doi.org/10.1109/AITEST49225.2020.00009.
Ren, L.-R., Gao, Y.-L., Liu, J.-X., Zhu, R. & Kong, X.-Z. L2,1-Extreme Learning Machine: An Efficient Robust Classifier for Tumor Classification. Comput. Biol. Chem. 89, 107368 (2020).
Abdelhack, M. et al. A Modulation Layer to Increase Neural Network Robustness Against Data Quality Issues.
Iori, M. et al. Mortality Prediction of COVID-19 Patients Using Radiomic and Neural Network Features Extracted from a Wide Chest X-ray Sample Size: A Robust Approach for Different Medical Imbalanced Scenarios. Appl. Sci. 12, 3903 (2022).
Adnan, N., Najnin, T. & Ruan, J. A Robust Personalized Classification Method for Breast Cancer Metastasis Prediction. Cancers 14, 5327 (2022).
Suter, Y. et al. Radiomics for glioblastoma survival analysis in pre-operative MRI: exploring feature robustness, class boundaries, and machine learning techniques. Cancer Imaging 20, 55 (2020).
Cai, L. et al. Robust phase-based texture descriptor for classification of breast ultrasound images. Biomed. Eng. OnLine 14, 26 (2015).
Park, Y. & Ho, J. C. Tackling Overfitting in Boosting for Noisy Healthcare Data. IEEE Trans. Knowl. Data Eng. 33, 2995–3006 (2021).
Clancy, K. et al. Deep learning for identifying breast cancer malignancy and false recalls: a robustness study on training strategy. in Medical Imaging 2019: Computer-Aided Diagnosis vol. 10950 20–25 (SPIE, 2019).
Vargason, T. et al. Classification of autism spectrum disorder from blood metabolites: Robustness to the presence of co-occurring conditions. Res. Autism Spectr. Disord. 77, 101644 (2020).
Moen, T., Ferrero, A. & McCollough, C. Robustness of Textural Features to Predict Stone Fragility Across Computed Tomography Acquisition and Reconstruction Parameters. Acad. Radiol. 26, 885–892 (2019).
Massafra, R. et al. Robustness Evaluation of a Deep Learning Model on Sagittal and Axial Breast DCE-MRIs to Predict Pathological Complete Response to Neoadjuvant Chemotherapy. J. Pers. Med. 12, 953 (2022).