Deep learning based attention enhanced phylogenetic radial basis function networks (AE-PRBFN) for genomic codon usage classification across species

Machine Learning




