Machine learning in rare disease

  • Schaefer, J., Lehne, M., Schepers, J., Prasser, F. & Thun, S. The use of machine learning in rare diseases: a scoping review. Orphanet J. Rare Dis. 15, 145 (2020).

  • Decherchi, S., Pedrini, E., Mordenti, M., Cavalli, A. & Sangiorgi, L. Opportunities and challenges for machine learning in rare diseases. Front. Med. 8, 747612 (2021).

  • Li, A. et al. Unsupervised analysis of transcriptomic profiles reveals six glioma subtypes. Cancer Res. 69, 2091–2099 (2009).

  • Senate and House of Representatives of the United States of America in Congress. Orphan Drug Act (1983).

  • Agarwal, V. et al. Learning statistical models of phenotypes using noisy labeled training data. J. Am. Med. Inform. Assoc. 23, 1166–1173 (2016).

  • Frénay, B. & Verleysen, M. Classification in the presence of label noise: a survey. IEEE Trans. Neural Netw. Learn. Syst. 25, 845–869 (2014).

  • Toh, T. S., Dondelinger, F. & Wang, D. Looking beyond the hype: applied AI and machine learning in translational medicine. EBioMedicine 47, 607–615 (2019).

  • Clarke, R. et al. The properties of high-dimensional data spaces: implications for exploring gene and protein expression data. Nat. Rev. Cancer 8, 37–49 (2008).

  • Altman, N. & Krzywinski, M. The curse(s) of dimensionality. Nat. Methods 15, 399–400 (2018).

  • Johnson, W. E., Li, C. & Rabinovic, A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8, 118–127 (2007).

  • Leek, J. T. svaseq: removing batch effects and other unwanted noise from sequencing data. Nucleic Acids Res. 42, e161 (2014).

  • Robinson, M. D. & Oshlack, A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 11, R25 (2010).

  • Kobak, D. & Berens, P. The art of using t-SNE for single-cell transcriptomics. Nat. Commun. 10, 5416 (2019).

  • Dorrity, M. W., Saunders, L. M., Queitsch, C., Fields, S. & Trapnell, C. Dimensionality reduction by UMAP to visualize physical and genetic interactions. Nat. Commun. 11, 1537 (2020).

  • Chellappa, R. & Turaga, P. Feature selection. In Computer Vision: a Reference Guide 1–5 (Springer International, 2020).

  • Chen, C.-H., Härdle, W. & Unwin, A. Handbook of Data Visualization (Springer, 2008).

  • Jolliffe, I. T. & Cadima, J. Principal component analysis: a review and recent developments. Philos. Trans. A Math. Phys. Eng. Sci. 374, 20150202 (2016).

  • McInnes, L., Healy, J. & Melville, J. UMAP: uniform manifold approximation and projection for dimension reduction. Preprint at arXiv https://doi.org/10.48550/arXiv.1802.03426 (2018).

  • Nguyen, L. H. & Holmes, S. Ten quick tips for effective dimensionality reduction. PLoS Comput. Biol. 15, e1006907 (2019).

  • Wattenberg, M., Viégas, F. & Johnson, I. How to use t-SNE effectively. Distill 1, https://doi.org/10.23915/distill.00002 (2016).

  • Way, G. P., Zietz, M., Rubinetti, V., Himmelstein, D. S. & Greene, C. S. Compressing gene expression data using multiple latent space dimensionalities learns complementary biological representations. Genome Biol. 21, 109 (2020).

  • de Souto, M. C. P., Costa, I. G., de Araujo, D. S. A., Ludermir, T. B. & Schliep, A. Clustering cancer gene expression data: a comparative study. BMC Bioinformatics 9, 497 (2008).

  • Kothari, S. et al. Removing batch effects from histopathological images for enhanced cancer diagnosis. IEEE J. Biomed. Health Inform. 18, 765–772 (2014).

  • Dwivedi, S. K., Tjärnberg, A., Tegnér, J. & Gustafsson, M. Deriving disease modules from the compressed transcriptional space embedded in a deep autoencoder. Nat. Commun. 11, 856 (2020).

  • Fertig, E. J., Ding, J., Favorov, A. V., Parmigiani, G. & Ochs, M. F. CoGAPS: an R/C++ package to identify patterns and biological process activity in transcriptomic data. Bioinformatics 26, 2792–2793 (2010).

  • Quellec, G., Lamard, M., Conze, P.-H., Massin, P. & Cochener, B. Automatic detection of rare pathologies in fundus photographs using few-shot learning. Med. Image Anal. 61, 101660 (2020).

  • Arvaniti, E. & Claassen, M. Sensitive detection of rare disease-associated cell subsets via representation learning. Nat. Commun. 8, 14825 (2017).

  • Chaabane, I., Guermazi, R. & Hammami, M. Enhancing techniques for learning decision trees from imbalanced data. Adv. Data Anal. Classif. 14, 677–745 (2020).

  • Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).

  • Köpcke, F. et al. Evaluating predictive modeling algorithms to assess patient eligibility for clinical trials from routine data. BMC Med. Inform. Decis. Mak. 13, 134 (2013).

  • Banerjee, J. et al. Integrative analysis identifies candidate tumor microenvironment and intracellular signaling pathways that define tumor heterogeneity in NF1. Genes 11, 226 (2020).

  • Colbaugh, R., Glass, K., Rudolf, C. & Tremblay, M. Learning to identify rare disease patients from electronic health records. AMIA Annu. Symp. Proc. 2018, 340–347 (2018).

  • Heisele, B., Serre, T., Pontil, M. & Poggio, T. Component-based face detection. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition I (CVPR, 2001).

  • Kasinski, A. & Schmidt, A. The architecture of the face and eyes detection system based on cascade classifiers. In Computer Recognition Systems 2 (ed. Kurzynski, M. et al.) 124–131 (Springer, 2007).

  • Mikolov, T., Chen, K., Corrado, G. & Dean, J. Efficient estimation of word representations in vector space. Preprint at arXiv https://doi.org/10.48550/arXiv.1301.3781 (2013).

  • Han, S., Williamson, B. D. & Fong, Y. Improving random forest predictions in small datasets from two-phase sampling designs. BMC Med. Inform. Decis. Mak. 21, 322 (2021).

  • Ambert, K. H. & Cohen, A. M. A system for classifying disease comorbidity status from medical discharge summaries using automated hotspot and negated concept detection. J. Am. Med. Inform. Assoc. 16, 590–595 (2009).

  • Chawla, N. V., Bowyer, K. W., Hall, L. O. & Kegelmeyer, W. P. SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002).

  • More, A. Survey of resampling techniques for improving classification performance in unbalanced datasets. Preprint at arXiv https://doi.org/10.48550/arXiv.1608.06048 (2016).

  • Goodfellow, I., Bengio, Y. & Courville, A. Deep Learning (MIT, 2016).

  • Futoma, J., Simons, M., Doshi-Velez, F. & Kamaleswaran, R. Generalization in clinical prediction models: the blessing and curse of measurement indicator variables. Crit. Care Explor. 3, e0453 (2021).

  • Okser, S. et al. Regularized machine learning in the genetic prediction of complex traits. PLoS Genet. 10, e1004754 (2014).

  • Zou, H. & Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. B Stat. Methodol. 67, 301–320 (2005).

  • Founta, K. et al. Gene targeting in amyotrophic lateral sclerosis using causality-based feature selection and machine learning. Mol. Med. 29, 12 (2023).

  • Torang, A., Gupta, P. & Klinke, D. J. 2nd. An elastic-net logistic regression approach to generate classifiers and gene signatures for types of immune cells and T helper cell subsets. BMC Bioinformatics 20, 433 (2019).

  • Dincer, A. B., Celik, S., Hiranuma, N. & Lee, S.-I. DeepProfile: deep learning of cancer molecular profiles for precision medicine. Preprint at bioRxiv https://doi.org/10.1101/278739 (2018).

  • Kingma, D. P. & Welling, M. Auto-encoding variational Bayes. Preprint at arXiv https://doi.org/10.48550/arXiv.1312.6114 (2013).

  • Sánchez Fernández, I. et al. Deep learning in rare disease. Detection of tubers in tuberous sclerosis complex. PLoS ONE 15, e0232376 (2020).

  • Mungall, C. J. et al. The Monarch Initiative: an integrative data and analytic platform connecting phenotypes to genotypes across species. Nucleic Acids Res. 45, D712–D722 (2017).

  • Himmelstein, D. S. et al. Systematic integration of biomedical knowledge prioritizes drugs for repurposing. eLife 6, e26726 (2017).

  • Callahan, T. J., Tripodi, I. J., Hunter, L. E. & Baumgartner, W. A. A framework for automated construction of heterogeneous large-scale biomedical knowledge graphs. Preprint at bioRxiv https://doi.org/10.1101/2020.04.30.071407 (2020).

  • Percha, B. & Altman, R. B. A global network of biomedical relationships derived from text. Bioinformatics 34, 2614–2624 (2018).

  • Orphanet https://www.orpha.net/consor/cgi-bin/index.php (2023).

  • Queralt-Rosinach, N. et al. Structured reviews for data and knowledge-driven research. Database 2020, baaa015 (2020).

  • Moon, C. et al. Learning drug–disease–target embedding (DDTE) from knowledge graphs to inform drug repurposing hypotheses. J. Biomed. Inform. 119, 103838 (2021).

  • Li, X. et al. Improving rare disease classification using imperfect knowledge graph. BMC Med. Inform. Decis. Mak. 19, 238 (2019).

  • Sosa, D. N. et al. A literature-based knowledge graph embedding method for identifying drug repurposing opportunities in rare diseases. In Biocomputing 2020 463–474 (World Scientific, 2019).

  • Shen, F. et al. Rare disease knowledge enrichment through a data-driven approach. BMC Med. Inform. Decis. Mak. 19, 32 (2019).

  • Rao, A. et al. Phenotype-driven gene prioritization for rare diseases using graph convolution on heterogeneous networks. BMC Med. Genomics 11, 57 (2018).

  • Köhler, S. et al. The Human Phenotype Ontology in 2021. Nucleic Acids Res. 49, D1207–D1217 (2021).

  • Rolland, T. et al. A proteome-scale map of the human interactome network. Cell 159, 1212–1226 (2014).

  • Martens, M. et al. WikiPathways: connecting communities. Nucleic Acids Res. 49, D613–D621 (2021).

  • Pan, S. J. & Yang, Q. A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22, 1345–1359 (2010).

  • Lee, S.-I. et al. A machine learning approach to integrate big data for precision medicine in acute myeloid leukemia. Nat. Commun. 9, 42 (2018).

  • Mao, W., Zaslavsky, E., Hartmann, B. M., Sealfon, S. C. & Chikina, M. Pathway-level information extractor (PLIER) for gene expression data. Nat. Methods 16, 607–610 (2019).

  • Taroni, J. N. et al. MultiPLIER: a transfer learning framework for transcriptomics reveals systemic features of rare disease. Cell Syst. 8, 380–394 (2019).

  • Greene, D., NIHR BioResource, Richardson, S. & Turro, E. Phenotype similarity regression for identifying the genetic determinants of rare diseases. Am. J. Hum. Genet. 98, 490–499 (2016).

  • Wu, M. C. et al. Rare-variant association testing for sequencing data with the sequence kernel association test. Am. J. Hum. Genet. 89, 82–93 (2011).

  • Ionita-Laza, I., Capanu, M., De Rubeis, S., McCallum, K. & Buxbaum, J. D. Identification of rare causal variants in sequence-based studies: methods and applications to VPS13B, a gene involved in Cohen syndrome and autism. PLoS Genet. 10, e1004729 (2014).

  • Greene, D., NIHR BioResource, Richardson, S. & Turro, E. A fast association test for identifying pathogenic variants involved in rare diseases. Am. J. Hum. Genet. 101, 104–114 (2017).

  • Boycott, K. M., Vanstone, M. R., Bulman, D. E. & MacKenzie, A. E. Rare-disease genetics in the era of next-generation sequencing: discovery to translation. Nat. Rev. Genet. 14, 681–691 (2013).

  • Wright, C. F., FitzPatrick, D. R. & Firth, H. V. Paediatric genomics: diagnosing rare disease in children. Nat. Rev. Genet. 19, 253–268 (2018).

  • Adams, D. R. & Eng, C. M. Next-generation sequencing to diagnose suspected genetic disorders. N. Engl. J. Med. 379, 1353–1362 (2018).

  • Byrd, J. B., Greene, A. C., Prasad, D. V., Jiang, X. & Greene, C. S. Responsible, practical genomic data sharing that accelerates research. Nat. Rev. Genet. 21, 615–629 (2020).

  • Rieke, N. et al. The future of digital health with federated learning. NPJ Digit. Med. 3, 119 (2020).

  • Yan, Y. et al. A continuously benchmarked and crowdsourced challenge for rapid development and evaluation of models to predict COVID-19 diagnosis and hospitalization. JAMA Netw. Open 4, e2124946 (2021).

  • Lundberg, S. M. et al. Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Nat. Biomed. Eng. 2, 749–760 (2018).

  • Zhou, G., Zhang, J., Su, J., Shen, D. & Tan, C. Recognizing names in biomedical texts: a machine learning approach. Bioinformatics 20, 1178–1190 (2004).

  • Blitzer, J., McDonald, R. & Pereira, F. Domain adaptation with structural correspondence learning. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (eds. Jurafsky, D. & Gaussier, E.) 120–128 (Association for Computational Linguistics, 2006).

  • Wang, C. & Mahadevan, S. Heterogeneous domain adaptation using manifold alignment. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence 2 (ed. Walsh, T.) 1541–1546 (AAAI, 2011).

  • Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902 (2019).

  • Collado-Torres, L. et al. Reproducible RNA-seq analysis using recount2. Nat. Biotechnol. 35, 319–321 (2017).

  • Kuhn, M. & Johnson, K. Applied Predictive Modeling (Springer, 2013).

  • Davis, J. & Goadrich, M. The relationship between precision–recall and ROC curves. In Proceedings of the 23rd International Conference on Machine Learning (eds. Cohen, W. W. & Moore, A.) 233–240 (Association for Computing Machinery, 2006).

  • Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning (Springer, 2001).

  • Shin, H.-C. et al. Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Trans. Med. Imaging 35, 1285–1298 (2016).
