Hershberg, R. & Petrov, D. A. Selection on codon bias. Annu. Rev. Genet. 42(1), 287–299 (2008).
Google Scholar
Plotkin, J. B. & Kudla, G. Synonymous but not the same: The causes and consequences of codon bias. Nat. Rev. Genet. 12(1), 32–42 (2011).
Google Scholar
Murray, E. E., Lotzer, J. & Eberle, M. Codon usage in plant genes. Nucleic Acids Res. 17(2), 477–498 (1989).
Google Scholar
D. sequencing, assembly Barry Kerrie 5 Lucas Susan 5 Harmon-Smith Miranda 5 Lail Kathleen 5 Tice Hope 5 Schmutz (Leader) Jeremy 4 Grimwood Jane 4 McKenzie Neil 7 Bevan Michael W. michael. bevan@ bbsrc. ac. uk 7 k, G. analysis, annotation Haberer Georg 16 Spannagl Manuel 16 Mayer (Leader) Klaus 16 Rattei Thomas 17 Mitros Therese 6 Rokhsar Dan 6 Lee Sang-Jik 18 Rose Jocelyn KC 18 Mueller Lukas A. 19 York Thomas L. 19, and C. genomics Salse (Leader) Jerome 27 Murat Florent 27 Abrouk Michael 27 Haberer Georg 16 Spannagl Manuel 16 Mayer Klaus 16 Bruggmann Remy 13 Messing Joachim 13 You Frank M. 8 Luo Ming-Cheng 8 Dvorak Jan 8, “Genome sequencing and analysis of the model grass brachypodium distachyon,” Nature, 463 (7282), 763–768, 2010.
Libbrecht, M. W. & Noble, W. S. Machine learning applications in genetics and genomics. Nat. Rev. Genet. 16(6), 321–332 (2015).
Google Scholar
Sheng, J., She, X., Liu, X., Wang, J. & Hu, Z. Comparative analysis of codon usage patterns in chloroplast genomes of five Miscanthus species and related species. PeerJ 9, e12173. https://doi.org/10.7717/peerj.12173 (2021).
Google Scholar
Zou, J. et al. A primer on deep learning in genomics. Nat. Genet. 51(1), 12–18. https://doi.org/10.1038/s41588-018-0295-5 (2019).
Google Scholar
Chen, Y., Li, Y., Narayan, R., Subramanian, A., & Xie, X. (2016). Gene expression inference with deep learning. Bioinformatics, 32 (12), 1832–1839. https://doi.org/10.1093/bioinformatics/btw074
Google Scholar
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems (5998–6008).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596(7873), 583–589. https://doi.org/10.1038/s41586-021-03819-2 (2021).
Google Scholar
Zhang, R., Liu, X., Wang, Y., Chen, Z. & Li, H. MCA framework: A novel multi-dimensional cooperative analysis framework for Alzheimer’s disease diagnosis. J. King Saud Univ. – Comput. Inf. Sci. 37(10), 353. https://doi.org/10.1007/s44443-025-00344-4 (2025).
Google Scholar
Wang, H., Liu, Y., Zhang, X., Zhao, J., & Li, Q. (2025). LDSL framework: A lightweight dual-stream learning framework for wheat disease detection. Plant Methods, 21(1), 1–19. https://doi.org/10.1186/s13007-025-01455-9
Google Scholar
Gardner, J. D., Baker, J., Venditti, C. & Organ, C. L. Phylogenetically informed predictions outperform predictive equations in real and simulated data. Nat. Commun. 16(1), 6130. https://doi.org/10.1038/s41467-025-61036-1 (2025).
Google Scholar
Mauro, V. & Chappell, S. A. A critical analysis of codon optimization in human therapeutics. Trends Mol. Med. 20(11), 604–613 (2014).
Google Scholar
S. Schnable et al., “The B73 maize genome: Complexity, diversity, and dynamics,” science, 326 (5956), 1112–1115, 2009.
Buhmann, M. D. Radial basis functions. Acta Numer. 9, 1–38 (2000).
Google Scholar
Orr, M. J. Regularization in the selection of radial basis function centers. Neural Comput. 7(3), 606–623 (1995).
Google Scholar
Ghosh, J. & Nag, A. “An overview of radial basis function networks,.” In Radial basis function networks 2: new advances in design 1–36 (2001).
Angermueller, C., Pärnamaa, T., Parts, L. & Stegle, O. Deep learning for computational biology. Mol. Syst. Biol. 12(7), 878 (2016).
Google Scholar
Ma, Y., Zhang, L., Chen, H., Wang, Z. & Liu, Q. ECMRN: Efficient cross-modal reparameterization network for RGB-D tasks via prompt tuning. Knowl.-Based Syst. 298, 114321. https://doi.org/10.1016/j.knosys.2025.114321 (2025).
Google Scholar
Lloyd, S. Least squares quantization in PCM. IEEE Trans. Inf. Theory 28(2), 129–137 (1982).
Google Scholar
Hotopp, J. C. D. Horizontal gene transfer between bacteria and animals. Trends Genet. 27(4), 157–163 (2011).
Google Scholar
D. M. Powers, “Evaluation: From precision, recall and f-measure to ROC, informedness, markedness and correlation,” arXiv preprint arXiv:2010.16061, 2020.
Chicco, D. & Jurman, G. The advantages of the matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics 21, 1–13 (2020).
Google Scholar
Boughorbel, S., Jarray, F. & El-Anbari, M. Optimal classifier for imbalanced data using matthews correlation coefficient metric. PLoS One 12(6), e0177678 (2017).
Google Scholar
Saito, T. & Rehmsmeier, M. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS One 10(3), e0118432 (2015).
Google Scholar
F. Pedregosa et al., “Scikit-learn: Machine learning in python,” the Journal of machine Learning research, 12, 2825–2830, 2011.
Google Scholar
G. Varoquaux, “Cross-validation failure: Small sample sizes lead to large error bars,” Neuroimage, 180, 68–77, 2018.
Google Scholar
G. Eraslan et al., “Single-nucleus cross-tissue molecular reference maps toward understanding disease gene function,” Science, 376 (6594), eabl4290, 2022.
Google Scholar
Hallee, L. & Khomtchouk, B. B. Machine learning classifiers predict key genomic and evolutionary traits across the kingdoms of life. Sci. Rep. 13(1), 2088 (2023).
Google Scholar
Ma, X.-K. et al. CIRCexplorer3: A clear pipeline for direct comparison of circular and linear RNA expression. Genomics Proteomics Bioinformatics 17(5), 511–521 (2019).
Google Scholar
Kellogg, E. A. Flowering plants. monocots Vol. 13 (Springer, 2016).
Lowe, D. & Broomhead, D. Multivariable functional interpolation and adaptive networks. Complex Syst. 2(3), 321–355 (1988).
Google Scholar
Sokolova, M. & Lapalme, G. A systematic analysis of performance measures for classification tasks. Inf. Process. Manag. 45(4), 427–437. https://doi.org/10.1016/j.ipm.2009.03.002 (2009).
Google Scholar
Rumelhart, D. E., Hinton, G. E. & Williams, R. J. Learning representations by back-propagating errors. Nature 323(6088), 533–536 (1986).
Google Scholar
A. Field, Discovering statistics using IBM SPSS statistics. Sage publications limited, 2024.
M. Pagel, “Inferring the historical patterns of biological evolution,” Nature, 401 (6756), 877–884, 1999.
Google Scholar
I. W. G. S. C. (IWGSC). Et al., “Shifting the limits in wheat research and breeding using a fully annotated reference genome. Science 361(6403), eaar7191 (2018).
Google Scholar
Purugganan, M. D. Evolutionary insights into the nature of plant domestication. Curr. Biol. 29(14), R705–R714 (2019).
Google Scholar
Student, “The probable error of a mean,” Biometrika, 1–25, 1908.
Moody, J. & Darken, C. J. Fast learning in networks of locally-tuned processing units. Neural Comput. 1(2), 281–294 (1989).
Google Scholar
D. Kingma, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
A. Vaswani et al., “Attention is all you need,” Advances in neural information processing systems, 30, 2017.
Felsenstein, J. Phylogenies and the comparative method. Am. Nat. 125(1), 1–15 (1985).
Google Scholar
Efron, B. The jackknife, the bootstrap and other resampling plans (SIAM, 1982).
Google Scholar
C. M. Bishop and N. M. Nasrabadi, Pattern recognition and machine learning, 4. Springer, 2006.
A. Scheben and D. Hojsgaard, “Can we use gene-editing to induce apomixis in sexual plants?” Genes, 11 (7), 781, 2020.
Google Scholar
S. K. Sahu, M. Waseem, and M. M. Aslam, “Editorial: Bioinformatics, big data 2023, 2023, doi: https://doi.org/10.3389/fpls.2023.1271305.
D. Arthur and S. Vassilvitskii, “K-means++: The advantages of careful seeding,” Stanford, 2006.
Quince, C., Walker, A. W., Simpson, J. T., Loman, N. J. & Segata, N. Shotgun metagenomics, from sampling to analysis. Nat. Biotechnol. 35(9), 833–844 (2017).
Google Scholar
L. McInnes, J. Healy, and J. Melville, “Umap: Uniform manifold approximation and projection for dimension reduction,” arXiv preprint arXiv:1802.03426, 2018.
I. Goodfellow, Y. Bengio, A. Courville, and Y. Bengio, Deep learning, 1. MIT press Cambridge, 2016.
Prechelt, L. Early stopping-but when? In Neural networks: Tricks of the trade 55–69 (Springer, 2002).
R. Kohavi et al., “A study of cross-validation and bootstrap for accuracy estimation and model selection,” in Ijcai, Montreal, Canada, 1995, 1137–1145.
Hasin, Y., Seldin, M. & Lusis, A. Multi-omics approaches to disease. Genome Biol. 18, 1–15 (2017).
Google Scholar
Meyer, R. S. & Purugganan, M. D. Evolution of crop species: Genetics of domestication and diversification. Nat. Rev. Genet. 14(12), 840–852 (2013).
Google Scholar
Vicario, S., Moriyama, E. N. & Powell, J. R. Codon usage in twelve species of drosophila. BMC Evol. Biol. 7, 1–17 (2007).
Google Scholar
Kamilaris, A., Kartakoullis, A. & Prenafeta-Boldú, F. X. A review on the practice of big data analysis in agriculture. Comput. Electron. Agric. 143, 23–37 (2017).
Google Scholar
Sharp, M. & Li, W.-H. The codon adaptation index-a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res. 15(3), 1281–1295 (1987).
Google Scholar
A. Şen, K. Kargar, E. Akgün, and M. Ç. Pınar, “Codon optimization: A mathematical programing approach,” Bioinformatics, 36 (13), 4012–4020, 2020.
Google Scholar
Gustafsson, C., Govindarajan, S. & Minshull, J. Codon bias and heterologous protein expression. Trends Biotechnol. 22(7), 346–353 (2004).
Google Scholar
Rocha, E. Codon usage bias from tRNA’s point of view: Redundancy, specialization, and efficient decoding for translation optimization. Genome Res. 14(11), 2279–2286 (2004).
Google Scholar
Hallee, L. & Khomtchouk, B. B. Machine learning classifiers predict key genomic and evolutionary traits across the kingdoms of life. Sci. Rep. 13(1), 2088. https://doi.org/10.1038/s41598-023-28965-7 (2023).
Google Scholar
LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521(7553), 436–444 (2015).
Google Scholar
C. R. Harris, K. J. Millman, S. J. van der Walt, R. Gommers, Virtanen, D. Cournapeau, E. Wieser, J. Taylor, S. Berg, N. J. Smith, et al., “Array programming with NumPy,” Nature, 585 (7825), 357–362, Sep. 2020. doi: https://doi.org/10.1038/s41586-020-2649-2.
Google Scholar
F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, Prettenhofer, R. Weiss, V. Dubourg, et al., “Scikit-learn: Machine learning in Python,” Journal of Machine Learning Research, 12, 2825–2830, 2011.
Google Scholar
Cock, J. A. et al. Biopython: Freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25(11), 1422 (2009).
Google Scholar
J. Cock et al., “Biopython: Freely available python tools for computational molecular biology and bioinformatics,” Bioinformatics, 25 (11), 1422, 2009.
Google Scholar
D. W. Mount and D. W. Mount, Bioinformatics: Sequence and genome analysis, 564. Cold spring harbor laboratory press Cold Spring Harbor, NY, 2001.
B. Alberts, “Molecular biology of the cell 4th edition,” (No Title), 2002.
A. M. Lesk, Introduction to bioinformatics. Oxford university press, 2019.
Goulet, D. R. et al. Codon optimization using a recurrent neural network. J. Comput. Biol. 30(1), 70–81 (2023).
Yates, A. D. et al. Ensembl genomes 2022: An expanding genome resource for non-vertebrates. Nucleic Acids Res. 50(D1), D996–D1003 (2022).
Google Scholar
Sueoka, N. Directional mutation pressure and neutral molecular evolution. Proc. Natl. Acad. Sci. U. S. A. 85(8), 2653–2657. https://doi.org/10.1073/pnas.85.8.2653 (1988).
Google Scholar
Quax, T. E., Claassens, N. J., Söll, D. & van der Oost, J. Codon bias as a means to fine-tune gene expression. Mol. Cell 59(2), 149–161 (2015).
Google Scholar
Jolliffe, I. T. & Cadima, J. Principal component analysis: A review and recent developments. Philos. Trans. R. Soc. Lond. A Math. Phys. Eng. Sci. 374(2065), 20150202 (2016).
Google Scholar
Ringnér, M. What is principal component analysis?. Nat. Biotechnol. 26(3), 303–304 (2008).
Google Scholar
E. et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nat. Biotechnol. 37(1), 38–44 (2019).
R. Leinonen, H. Sugawara, M. Shumway, and I. N. S. D. Collaboration, “The sequence read archive,” Nucleic acids research, 39 (suppl_1), D19–D21, 2010.
Google Scholar
Compeau, E., Pevzner, A. & Tesler, G. How to apply de bruijn graphs to genome assembly. Nat. Biotechnol. 29(11), 987–991 (2011).
Google Scholar
S. Haykin, Neural networks and learning machines, 3/e. Pearson Education India, 2009.
A. Ng et al., “Sparse autoencoder,” CS294A Lecture notes, 72 (2011), 1–19, 2011.
Hinton, G. E. & Salakhutdinov, R. R. Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006).
Google Scholar
Nakamura, Y., Gojobori, T. & Ikemura, T. Codon usage tabulated from international DNA sequence databases: Status for the year 2000. Nucleic Acids Res. 28(1), 292. https://doi.org/10.1093/nar/28.1.292 (2000).
Google Scholar
Yang, Z. & Nielsen, R. Mutation-selection models of codon substitution and their use to estimate selective strengths on codon usage. Mol. Biol. Evol. 25(3), 568–579 (2008).
Google Scholar
T. Hastie, R. Tibshirani, J. Friedman, et al., “The elements of statistical learning.” Citeseer, 2009.
S. S. Shapiro and M. B. Wilk, “An analysis of variance test for normality (complete samples),” Biometrika, 52 (3–4), 591–611, 1965.
Google Scholar
Zaidi, S.-A. et al. New plant breeding technologies for food security. Science 363(6434), 1390–1391 (2019).
Google Scholar
Smith, S. A. & Brown, J. W. Constructing a broadly inclusive seed plant phylogeny. Am. J. Bot. 105(3), 302–314 (2018).
Google Scholar
Salzberg, S. L. Next-generation genome annotation: We still struggle to get it right. Genome Biol. 20(1), 92 (2019).
Google Scholar
