Machine learning for microbiologists | Nature Reviews Microbiology


  • Bishop, C. M. Pattern Recognition and Machine Learning (Springer, 2006).

  • Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction 2nd edn (Springer Science & Business Media, 2009).

  • James, G., Witten, D., Hastie, T. & Tibshirani, R. An Introduction to Statistical Learning: with Applications in R (Springer Science & Business Media, 2013).

  • Murphy, K. P. Probabilistic Machine Learning: Advanced Topics (MIT Press, 2022).

  • Goodswen, S. J. et al. Machine learning and applications in microbiology. FEMS Microbiol. Rev. 45, fuab015 (2021).

  • Topçuoğlu, B. D., Lesniak, N. A., Ruffin, M. T. 4th, Wiens, J. & Schloss, P. D. A framework for effective application of machine learning to microbiome-based classification problems. mBio 11, e00434-20 (2020). This work focuses on applying machine learning to microbiome data for disease prediction, highlighting the important trade-off between model complexity and interpretability, and emphasizing the need for rigorous methodology towards more reproducible machine learning usage in microbiome research.

  • Wang, Q., Garrity, G. M., Tiedje, J. M. & Cole, J. R. Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl. Environ. Microbiol. 73, 5261–5267 (2007).

  • Parks, D. H., MacDonald, N. J. & Beiko, R. G. Classifying short genomic fragments from novel lineages using composition and homology. BMC Bioinformatics 12, 328 (2011).

  • Rosen, G. L., Reichenberger, E. R. & Rosenfeld, A. M. NBC: the Naive Bayes Classification tool webserver for taxonomic classification of metagenomic reads. Bioinformatics 27, 127–129 (2011).

  • McHardy, A. C., Martín, H. G., Tsirigos, A., Hugenholtz, P. & Rigoutsos, I. Accurate phylogenetic classification of variable-length DNA fragments. Nat. Methods 4, 63–72 (2007).

  • Patil, K. R., Roune, L. & McHardy, A. C. The PhyloPythiaS web server for taxonomic assignment of metagenome sequences. PLoS ONE 7, e38581 (2012).

  • Gregor, I., Dröge, J., Schirmer, M., Quince, C. & McHardy, A. C. PhyloPythiaS+: a self-training method for the rapid reconstruction of low-ranking taxonomic bins from metagenomes. PeerJ 4, e1603 (2016).

  • Vervier, K., Mahé, P., Tournoud, M., Veyrieras, J.-B. & Vert, J.-P. Large-scale machine learning for metagenomics sequence classification. Bioinformatics 32, 1023–1032 (2016). This work introduces a machine learning-based approach for tackling the taxonomic binning step, using a supervised approach that balances accuracy and speed and outperforms alignment-based methods.

  • Diaz, N. N., Krause, L., Goesmann, A., Niehaus, K. & Nattkemper, T. W. TACOA — taxonomic classification of environmental genomic fragments using a kernelized nearest neighbor approach. BMC Bioinformatics 10, 56 (2009).

  • Sczyrba, A. et al. Critical assessment of metagenome interpretation — a benchmark of metagenomics software. Nat. Methods 14, 1063–1071 (2017).

  • Davis, J. J. et al. Antimicrobial resistance prediction in PATRIC and RAST. Sci. Rep. 6, 27930 (2016).

  • Arango-Argoty, G. et al. DeepARG: a deep learning approach for predicting antibiotic resistance genes from metagenomic data. Microbiome 6, 23 (2018).

  • Kavvas, E. S. et al. Machine learning and structural analysis of Mycobacterium tuberculosis pan-genome identifies genetic signatures of antibiotic resistance. Nat. Commun. 9, 4306 (2018).

  • Moradigaravand, D. et al. Prediction of antibiotic resistance in Escherichia coli from large-scale pan-genome data. PLoS Comput. Biol. 14, e1006258 (2018).

  • Rahman, S. F., Olm, M. R., Morowitz, M. J. & Banfield, J. F. Machine learning leveraging genomes from metagenomes identifies influential antibiotic resistance genes in the infant gut microbiome. mSystems 3, e00123-17 (2018).

  • Freund, Y. & Schapire, R. E. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55, 119–139 (1997).

  • Baldi, P. Deep Learning in biomedical data science. Annu. Rev. Biomed. Data Sci. 1, 181–205 (2018).

  • Hannigan, G. D. et al. A deep learning genome-mining strategy for biosynthetic gene cluster prediction. Nucleic Acids Res. 47, e110 (2019).

  • Weimann, A. et al. From genomes to phenotypes: Traitar, the microbial trait analyzer. mSystems 1, e00101-16 (2016). This work uses machine learning to predict 67 microbial phenotypic traits from genome sequences, facilitating the analysis of large-scale microbial genomic data.

  • Thomas, A. M. et al. Metagenomic analysis of colorectal cancer datasets identifies cross-cohort microbial diagnostic signatures and a link with choline degradation. Nat. Med. 25, 667–678 (2019).

  • Wirbel, J. et al. Meta-analysis of fecal metagenomes reveals global microbial signatures that are specific for colorectal cancer. Nat. Med. 25, 679–689 (2019).

  • Poore, G. D. et al. Microbiome analyses of blood and tissues suggest cancer diagnostic approach. Nature 579, 567–574 (2020).

  • Pasolli, E., Truong, D. T., Malik, F., Waldron, L. & Segata, N. Machine learning meta-analysis of large metagenomic datasets: tools and biological insights. PLoS Comput. Biol. 12, e1004977 (2016).

  • Qin, J. et al. A metagenome-wide association study of gut microbiota in type 2 diabetes. Nature 490, 55–60 (2012).

  • Ghensi, P. et al. Strong oral plaque microbiome signatures for dental implant diseases identified by strain-resolution metagenomics. NPJ Biofilms Microbiomes 6, 47 (2020).

  • Salosensaari, A. et al. Taxonomic signatures of cause-specific mortality risk in human gut microbiome. Nat. Commun. 12, 2671 (2021).

  • Kartal, E. et al. A faecal microbiota signature with high specificity for pancreatic cancer. Gut 71, 1359–1372 (2022).

  • Asnicar, F. et al. Microbiome connections with host metabolism and habitual diet from 1,098 deeply phenotyped individuals. Nat. Med. 27, 321–332 (2021).

  • Lee, K. A. et al. Cross-cohort gut microbiome associations with immune checkpoint inhibitor response in advanced melanoma. Nat. Med. 28, 535–544 (2022).

  • McCulloch, J. A. et al. Intestinal microbiota signatures of clinical response and immune-related adverse events in melanoma patients treated with anti-PD-1. Nat. Med. 28, 545–556 (2022).

  • Routy, B. et al. Gut microbiome influences efficacy of PD-1-based immunotherapy against epithelial tumors. Science 359, 91–97 (2018).

  • Gopalakrishnan, V. et al. Gut microbiome modulates response to anti–PD-1 immunotherapy in melanoma patients. Science 359, 97–103 (2018).

  • Derosa, L. et al. Intestinal Akkermansia muciniphila predicts overall survival in advanced non-small cell lung cancer patients treated with anti-PD-1 antibodies: results a phase II study. J. Clin. Oncol. 39, 9019 (2021).

  • Davar, D. et al. Fecal microbiota transplant overcomes resistance to anti-PD-1 therapy in melanoma patients. Science 371, 595–602 (2021).

  • Baruch, E. N. et al. Fecal microbiota transplant promotes response in immunotherapy-refractory melanoma patients. Science 371, 602–609 (2021).

  • Palma, S. I. C. J. et al. Machine learning for the meta-analyses of microbial pathogens’ volatile signatures. Sci. Rep. 8, 3360 (2018).

  • Ianiro, G. et al. Variability of strain engraftment and predictability of microbiome composition after fecal microbiota transplantation across different diseases. Nat. Med. 28, 1913–1923 (2022). This study uses machine learning to develop predictive models for selecting optimal donors for faecal microbiota transplantation, making personalized microbiome-targeted treatments more effective.

  • Smillie, C. S. et al. Strain tracking reveals the determinants of bacterial engraftment in the human gut following fecal microbiota transplantation. Cell Host Microbe 23, 229–240.e5 (2018).

  • Schmidt, T. S. B. et al. Drivers and determinants of strain dynamics following fecal microbiota transplantation. Nat. Med. 28, 1902–1912 (2022).

  • Arumugam, M. et al. Enterotypes of the human gut microbiome. Nature 473, 174–180 (2011).

  • Ravel, J. et al. Vaginal microbiome of reproductive-age women. Proc. Natl Acad. Sci. USA 108, 4680–4687 (2011).

  • Koren, O. et al. A guide to enterotypes across the human body: meta-analysis of microbial community structures in human microbiome datasets. PLoS Comput. Biol. 9, e1002863 (2013).

  • Knights, D. et al. Rethinking ‘enterotypes’. Cell Host Microbe 16, 433–437 (2014).

  • Costea, P. I. et al. Enterotypes in the landscape of gut microbial community composition. Nat. Microbiol. 3, 8–16 (2018).

  • Gao, L. L., Bien, J. & Witten, D. Selective inference for hierarchical clustering. J. Am. Stat. Assoc. https://doi.org/10.1080/01621459.2022.2116331 (2022).

  • Karcher, N. et al. Analysis of 1321 Eubacterium rectale genomes from metagenomes uncovers complex phylogeographic population structure and subspecies functional adaptations. Genome Biol. 21, 138 (2020).

  • Hamady, M. & Knight, R. Microbial community profiling for human microbiome projects: tools, techniques, and challenges. Genome Res. 19, 1141–1152 (2009).

  • Edgar, R. C. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26, 2460–2461 (2010).

  • Rognes, T., Flouri, T., Nichols, B., Quince, C. & Mahé, F. VSEARCH: a versatile open source tool for metagenomics. PeerJ 4, e2584 (2016).

  • Pasolli, E. et al. Extensive unexplored human microbiome diversity revealed by over 150,000 genomes from metagenomes spanning age, geography, and lifestyle. Cell 176, 1–14 (2019).

  • Konstantinidis, K. T. & Tiedje, J. M. Genomic insights that advance the species definition for prokaryotes. Proc. Natl Acad. Sci. USA 102, 2567–2572 (2005).

  • Nguyen, N.-P., Warnow, T., Pop, M. & White, B. A perspective on 16S rRNA operational taxonomic unit clustering using sequence similarity. NPJ Biofilms Microbiomes 2, 16004 (2016).

  • Jain, C., Rodriguez-R, L. M., Phillippy, A. M., Konstantinidis, K. T. & Aluru, S. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat. Commun. 9, 5114 (2018).

  • Murray, C. S., Gao, Y. & Wu, M. Re-evaluating the evidence for a universal genetic boundary among microbial species. Nat. Commun. 12, 4059 (2021).

  • Rodriguez-R, L. M., Jain, C., Conrad, R. E., Aluru, S. & Konstantinidis, K. T. Reply to: ‘Re-evaluating the evidence for a universal genetic boundary among microbial species’. Nat. Commun. 12, 4060 (2021).

  • Li, W. & Godzik, A. cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658–1659 (2006).

  • Bahram, M. et al. Structure and function of the global topsoil microbiome. Nature 560, 233–237 (2018).

  • Spang, A. et al. Complex archaea that bridge the gap between prokaryotes and eukaryotes. Nature 521, 173–179 (2015).

  • Human Microbiome Project Consortium. Structure, function and diversity of the healthy human microbiome. Nature 486, 207–214 (2012).

  • Xiao, L. et al. A catalog of the mouse gut metagenome. Nat. Biotechnol. 33, 1103–1108 (2015).

  • Qin, J. et al. A human gut microbial gene catalogue established by metagenomic sequencing. Nature 464, 59–65 (2010).

  • Chen, C. et al. Expanded catalog of microbial genes and metagenome-assembled genomes from the pig gut microbiome. Nat. Commun. 12, 1106 (2021).

  • Steinegger, M. & Söding, J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat. Biotechnol. 35, 1026–1028 (2017).

  • Vanni, C. et al. Unifying the known and unknown microbial coding sequence space. eLife 11, e67667 (2022).

  • Apweiler, R. et al. UniProt: the universal protein knowledgebase. Nucleic Acids Res. 32, D115–D119 (2004).

  • Almeida, A. et al. A unified catalog of 204,938 reference genomes from the human gut microbiome. Nat. Biotechnol. 39, 105–114 (2021).

  • Abdi, H. & Williams, L. J. Principal component analysis. Wiley Interdiscip. Rev. Comput. Stat. 2, 433–459 (2010).

  • Davis, T. D., Gerry, C. J. & Tan, D. S. General platform for systematic quantitative evaluation of small-molecule permeability in bacteria. ACS Chem. Biol. 9, 2535–2544 (2014).

  • Suchodolski, J. S. et al. The fecal microbiome in dogs with acute diarrhea and idiopathic inflammatory bowel disease. PLoS ONE 7, e51907 (2012).

  • Mishiro, T. et al. Oral microbiome alterations of healthy volunteers with proton pump inhibitor. J. Gastroenterol. Hepatol. 33, 1059–1066 (2018).

  • Vázquez-Baeza, Y., Pirrung, M., Gonzalez, A. & Knight, R. EMPeror: a tool for visualizing high-throughput microbial community data. Gigascience 2, 16 (2013).

  • van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).

  • Becht, E. et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nat. Biotechnol. 37, 38–44 (2018).

  • Howick, V. M. et al. The Malaria Cell Atlas: single parasite transcriptomes across the complete Plasmodium life cycle. Science 365, eaaw2619 (2019).

  • Kuchina, A. et al. Microbial single-cell RNA sequencing by split-pool barcoding. Science 371, eaba5257 (2021).

  • Yatsunenko, T. et al. Human gut microbiome viewed across age and geography. Nature 486, 222–227 (2012).

  • Rousk, J. et al. Soil bacterial and fungal communities across a pH gradient in an arable soil. ISME J. 4, 1340–1351 (2010).

  • Aagaard, K. et al. A metagenomic approach to characterization of the vaginal microbiome signature in pregnancy. PLoS ONE 7, e36466 (2012).

  • Blattman, S. B., Jiang, W., Oikonomou, P. & Tavazoie, S. Prokaryotic single-cell RNA sequencing by in situ combinatorial indexing. Nat. Microbiol. 5, 1192–1201 (2020).

  • Jeckel, H. & Drescher, K. Advances and opportunities in image analysis of bacterial cells and communities. FEMS Microbiol. Rev. 45, fuaa062 (2020).

  • Geier, B. et al. Spatial metabolomics of in situ host–microbe interactions at the micrometre scale. Nat. Microbiol. 5, 498–510 (2020).

  • Le Chatelier, E. et al. Richness of human gut microbiome correlates with metabolic markers. Nature 500, 541–546 (2013).

  • Li, H. Microbiome, metagenomics, and high-dimensional compositional data analysis. Annu. Rev. Stat. Appl. 2, 73–94 (2015).

  • Gloor, G. B., Macklaim, J. M., Pawlowsky-Glahn, V. & Egozcue, J. J. Microbiome datasets are compositional: and this is not optional. Front. Microbiol. 8, 2224 (2017).

  • Bermingham, M. L. et al. Application of high-dimensional feature selection: evaluation for genomic prediction in man. Sci. Rep. 5, 10312 (2015).

  • Zeller, G. et al. Potential of fecal microbiota for early-stage detection of colorectal cancer. Mol. Syst. Biol. 10, 766 (2014).

  • Zackular, J. P., Rogers, M. A. M., Ruffin, M. T. 4th & Schloss, P. D. The human gut microbiome as a screening tool for colorectal cancer. Cancer Prev. Res. 7, 1112–1121 (2014).

  • Wong, S. H. et al. Quantitation of faecal Fusobacterium improves faecal immunochemical test in detecting advanced colorectal neoplasia. Gut 66, 1441–1448 (2017).

  • Xie, Y.-H. et al. Fecal Clostridium symbiosum for noninvasive detection of early and advanced colorectal cancer: test and validation studies. EBioMedicine 25, 32–40 (2017).

  • Kostic, A. D. et al. Fusobacterium nucleatum potentiates intestinal tumorigenesis and modulates the tumor-immune microenvironment. Cell Host Microbe 14, 207–215 (2013).

  • Rubinstein, M. R. et al. Fusobacterium nucleatum promotes colorectal carcinogenesis by modulating E-cadherin/β-catenin signaling via its FadA adhesin. Cell Host Microbe 14, 195–206 (2013).

  • Bourgon, R., Gentleman, R. & Huber, W. Independent filtering increases detection power for high-throughput experiments. Proc. Natl Acad. Sci. USA 107, 9546–9551 (2010).

  • Hua, J., Tembe, W. D. & Dougherty, E. R. Performance of feature-selection methods in the classification of high-dimension data. Pattern Recognit. 42, 409–424 (2009).

  • Fan, J. & Lv, J. Sure independence screening for ultrahigh dimensional feature space. J. R. Stat. Soc. Ser. B Stat. Methodol. 70, 849–911 (2008).

  • Guyon, I., Weston, J., Barnhill, S. & Vapnik, V. Gene selection for cancer classification using support vector machines. Mach. Learn. 46, 389–422 (2002).

  • Radovic, M., Ghalwash, M., Filipovic, N. & Obradovic, Z. Minimum redundancy maximum relevance feature selection approach for temporal gene expression data. BMC Bioinformatics 18, 9 (2017).

  • Forslund, K. et al. Disentangling type 2 diabetes and metformin treatment signatures in the human gut microbiota. Nature 528, 262–266 (2015). This study underlines the importance of considering the influence of medication in machine learning-based microbiome analysis. In particular, it shows the effects of metformin on the gut microbiome of individuals with type 2 diabetes, highlighting the need to distinguish microbial signatures of diseases from medication.

  • Hacılar, H., Nalbantoğlu, O. U. & Bakir-Güngör, B. in 2018 3rd Int. Conf. Computer Science and Engineering (UBMK) 434–438 (IEEE, 2018).

  • Flemer, B. et al. The oral microbiota in colorectal cancer is distinctive and predictive. Gut 67, 1454–1463 (2018).

  • Yachida, S. et al. Metagenomic and metabolomic analyses reveal distinct stage-specific phenotypes of the gut microbiota in colorectal cancer. Nat. Med. 25, 968–976 (2019).

  • Maimon, O. & Rokach, L. (eds) Data Mining and Knowledge Discovery Handbook (Springer, 2010).

  • Lever, J., Krzywinski, M. & Altman, N. Model selection and overfitting. Nat. Methods 13, 703–704 (2016). This work highlights the importance of accurately assessing model performance to avoid overfitting. Approaches that use validation sets, test sets and cross-validation are especially important when data are limited.

  • Lever, J., Krzywinski, M. & Altman, N. Classification evaluation. Nat. Methods 13, 603–604 (2016). This work highlights the importance of selecting appropriate evaluation metrics when assessing the performance of classification models in the context of medical diagnosis. It also emphasizes the impact of class imbalance and the use of specific metrics for imbalanced data sets.

  • Ange, B. A., Symons, J. M., Schwab, M., Howell, E. & Geyh, A. Generalizability in epidemiology: an investigation within the context of heart failure studies. Ann. Epidemiol. 14, 600–601 (2004).

  • He, Y. et al. Regional variation limits applications of healthy gut microbiome reference ranges and disease models. Nat. Med. 24, 1532–1535 (2018).

  • Renson, A. et al. Sociodemographic variation in the oral microbiome. Ann. Epidemiol. 35, 73–80.e2 (2019).

  • Sinha, R. et al. Assessment of variation in microbial community amplicon sequencing by the Microbiome Quality Control (MBQC) project consortium. Nat. Biotechnol. 35, 1077–1086 (2017).

  • Soneson, C., Gerster, S. & Delorenzi, M. Batch effect confounding leads to strong bias in performance estimates obtained by cross-validation. PLoS ONE 9, e100335 (2014).

  • Riester, M. et al. Risk prediction for late-stage ovarian cancer by meta-analysis of 1525 patient samples. J. Natl Cancer Inst. 106, dju048 (2014).

  • Zhang, Y., Bernau, C., Parmigiani, G. & Waldron, L. The impact of different sources of heterogeneity on loss of accuracy from genomic prediction models. Biostatistics 21, 253–268 (2018). This work examines the impact of different types of heterogeneity on the validation accuracy of omics-based prediction models across data sets and provides insights into the challenges of validating prediction models in the presence of study heterogeneity.

  • Bernau, C. et al. Cross-study validation for the assessment of prediction algorithms. Bioinformatics 30, i105–i112 (2014).

  • Moreno-Indias, I. et al. Statistical and machine learning techniques in human microbiome studies: contemporary challenges and solutions. Front. Microbiol. 12, 635781 (2021). This work highlights the growing importance of statistical and machine learning techniques in human microbiome studies and challenges posed by the heterogeneity of microbiome data, and emphasizes the potential of machine learning in disease diagnosis, biomarker identification and prediction while addressing issues such as data standardization, overfitting and model interpretability.

  • Tonkovic, P. et al. Literature on applied machine learning in metagenomic classification: a scoping review. Biology 9, 453 (2020).

  • Feng, Q. et al. Gut microbiome development along the colorectal adenoma–carcinoma sequence. Nat. Commun. 6, 6528 (2015).

  • Pasolli, E. et al. Accessible, curated metagenomic data through ExperimentHub. Nat. Methods 14, 1023 (2017).

  • Méheust, R., Burstein, D., Castelle, C. J. & Banfield, J. F. The distinction of CPR bacteria from other bacteria based on protein family content. Nat. Commun. 10, 4173 (2019).

  • Brown, C. T. et al. Unusual biology across a group comprising more than 15% of domain bacteria. Nature 523, 208–211 (2015).

  • Anantharaman, K. et al. Thousands of microbial genomes shed light on interconnected biogeochemical processes in an aquifer system. Nat. Commun. 7, 13219 (2016).

  • Castelle, C. J. et al. Genomic expansion of domain archaea highlights roles for organisms from new phyla in anaerobic carbon cycling. Curr. Biol. 25, 690–701 (2015).

  • Probst, A. J. et al. Genomic resolution of a cold subsurface aquifer community provides metabolic insights for novel microbes adapted to high CO2 concentrations. Environ. Microbiol. 19, 459–474 (2017).

  • Yu, J. et al. Metagenomic analysis of faecal microbiome as a tool towards targeted non-invasive biomarkers for colorectal cancer. Gut 66, 70–78 (2017).

  • Eid, F.-E., ElHefnawi, M. & Heath, L. S. DeNovo: virus–host sequence-based protein–protein interaction prediction. Bioinformatics 32, 1144–1150 (2015).

  • Calderone, A., Licata, L. & Cesareni, G. VirusMentha: a new resource for virus–host protein interactions. Nucleic Acids Res. 43, D588–D592 (2015).

  • Weis, C. et al. Direct antimicrobial resistance prediction from clinical MALDI-TOF mass spectra using machine learning. Nat. Med. 28, 164–174 (2022).

  • Wirbel, J. et al. Microbiome meta-analysis and cross-disease comparison enabled by the SIAMCAT machine learning toolbox. Genome Biol. 22, 93 (2021).

  • Vujkovic-Cvijin, I. et al. Host variables confound gut microbiota studies of human disease. Nature 587, 448–454 (2020).

  • Hernán, M. A. The C-word: scientific euphemisms do not improve causal inference from observational data. Am. J. Public. Health 108, 616–619 (2018). This work emphasizes the importance of using the term ‘causal’, in particular when analysing data from observational studies, and highlights the need to distinguish between association and causation and address confounding factors properly.
