Feature learning augmented with sampling and heuristics (FLASH) improves model performance and biomarker identification




