Convergence of machine learning and genomics for precision oncology

  • Suehnholz, S. P. et al. Quantifying the expanding landscape of clinical actionability for patients with cancer. Cancer Discov. 14, 49–65 (2023).

  • Horak, P. & Fröhling, S. Measuring progress in precision oncology. Cancer Discov. 14, 18–19 (2024).

  • Chakravarty, D. et al. OncoKB: a precision oncology knowledge base. JCO Precis. Oncol. 1, 1–16 (2017).

  • Griffith, M. et al. CIViC is a community knowledgebase for expert crowdsourcing the clinical interpretation of variants in cancer. Nat. Genet. 49, 170–174 (2017).

  • Reardon, B. et al. Integrating molecular profiles into clinical frameworks through the Molecular Oncology Almanac to prospectively guide precision oncology. Nat. Cancer 2, 1102–1112 (2021).

  • Luchini, C., Lawlor, R. T., Milella, M. & Scarpa, A. Molecular tumor boards in clinical practice. Trends Cancer 6, 738–744 (2020).

  • Gladstone, B. P. et al. Systematic review and meta-analysis of molecular tumor board data on clinical effectiveness and evaluation gaps. NPJ Precis. Oncol. 9, 96 (2025).

  • Nichetti, F. et al. Real-world outcomes of molecular tumor board treatment recommendations. JCO Precis. Oncol. 9, e2400387 (2025).

  • The AACR Project GENIE Consortium et al. AACR Project GENIE: powering precision medicine through an international consortium. Cancer Discov. 7, 818–831 (2017).

  • Pugh, T. J. et al. AACR project GENIE: 100,000 cases and beyond. Cancer Discov. 12, 2044–2057 (2022).

  • Wang, S. & Ye, K. Deep-learning based representation and recognition for genome variants — from SNVs to structural variants. Natl Sci. Rev. 11, nwae335 (2024).

  • Poplin, R. et al. A universal SNP and small-indel variant caller using deep neural networks. Nat. Biotechnol. 36, 983–987 (2018). This paper, which introduced DeepVariant, helped drive the proliferation of machine learning in bioinformatics by demonstrating that traditional heuristic and statistical approaches to variant calling could be outperformed by a deep neural network.

  • Krusche, P. et al. Best practices for benchmarking germline small-variant calls in human genomes. Nat. Biotechnol. 37, 555–560 (2019).

  • AlDubayan, S. H. et al. Detection of pathogenic variants with germline genetic testing using deep learning vs standard methods in patients with prostate cancer and melanoma. JAMA 324, 1957–1969 (2020).

  • Zook, J. M. et al. An open resource for accurately benchmarking small variant and reference calls. Nat. Biotechnol. 37, 561–566 (2019).

  • Olson, N. D. et al. PrecisionFDA Truth Challenge V2: calling variants from short and long reads in difficult-to-map regions. Cell Genom. 2, 100129 (2022). This paper illustrates the methodological shift of variant callers towards using machine learning while also highlighting challenge areas for future developers.

  • Mandiracioglu, B. et al. ECOLE: learning to call copy number variants on whole exome sequencing data. Nat. Commun. 15, 132 (2024).

  • Popic, V. et al. Cue: a deep-learning framework for structural variant discovery and genotyping. Nat. Methods 20, 559–568 (2023).

  • Behera, S. et al. Comprehensive genome analysis and variant detection at scale using DRAGEN. Nat. Biotechnol. 43, 1177–1191 (2024).

  • Yi, R., Chang, P.-C., Baid, G. & Carroll, A. Learning from data-rich problems: a case study on genetic variant calling. Preprint at https://doi.org/10.48550/arXiv.1911.05151 (2019).

  • Scheffler, K. et al. Somatic small-variant calling methods in Illumina DRAGEN™ Secondary Analysis. Preprint at bioRxiv https://doi.org/10.1101/2023.03.23.534011 (2023).

  • Park, J. et al. Accurate somatic small variant discovery for multiple sequencing technologies with DeepSomatic. Nat. Biotechnol. https://doi.org/10.1038/s41587-025-02839-x (2025).

  • Betschart, R. O. et al. Comparison of calling pipelines for whole genome sequencing: an empirical study demonstrating the importance of mapping and alignment. Sci. Rep. 12, 21502 (2022).

  • Roy, S. et al. Standards and guidelines for validating next-generation sequencing bioinformatics pipelines. J. Mol. Diagn. 20, 4–27 (2018).

  • van de Haar, J. et al. ESMO recommendations on clinical reporting of genomic test results for solid cancers. Ann. Oncol. 35, 954–967 (2024).

  • Eilbeck, K. et al. The Sequence Ontology: a tool for the unification of genome annotations. Genome Biol. 6, R44 (2005).

  • den Dunnen, J. T. et al. HGVS recommendations for the description of sequence variants: 2016 update. Hum. Mutat. 37, 564–569 (2016).

  • Holmes, J. B., Moyer, E., Phan, L., Maglott, D. & Kattman, B. SPDI: data model for variants and applications at NCBI. Bioinformatics 36, 1902–1907 (2020).

  • Wang, M. et al. hgvs: a python package for manipulating sequence variants using HGVS nomenclature: 2018 update. Hum. Mutat. 39, 1803–1813 (2018).

  • Lefter, M. et al. Mutalyzer 2: next generation HGVS nomenclature checker. Bioinformatics 37, 2811–2817 (2021).

  • van Giffen, B., Herhausen, D. & Fahse, T. Overcoming the pitfalls and perils of algorithms: a classification of machine learning biases and mitigation methods. J. Bus. Res. 144, 93–106 (2022).

  • Singh, D. & Singh, B. Investigating the impact of data normalization on classification performance. Appl. Soft Comput. 97, 105524 (2020).

  • Sayers, E. W. et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 50, D20–D26 (2022).

  • Freeman, P. J., Hart, R. K., Gretton, L. J., Brookes, A. J. & Dalgleish, R. VariantValidator: accurate validation, mapping, and formatting of sequence variation descriptions. Hum. Mutat. 39, 61–68 (2018).

  • Freeman, P. J. et al. Standardizing variant naming in literature with VariantValidator to increase diagnostic rates. Nat. Genet. 56, 2284–2286 (2024).

  • McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol. 17, 122 (2016).

  • Wagner, A. H. et al. The GA4GH Variation Representation Specification: a computational framework for variation representation and federated identification. Cell Genom. 1, 100027 (2021). This paper shows that VRS enables semantically precise, computable variant representation that facilitates further downstream bioinformatic applications and machine learning models.

  • Chen, S. et al. A genomic mutational constraint map using variation in 76,156 human genomes. Nature 625, 92–100 (2024).

  • Arbesfeld, J. A. et al. Mapping MAVE data for use in human genomics applications. Genome Biol. 26, 179 (2025).

  • Pagel, K. A. et al. Integrated informatics analysis of cancer-related variants. JCO Clin. Cancer Inform. 4, 310–317 (2020).

  • de Bruijn, I. et al. Genome Nexus: a comprehensive resource for the annotation and interpretation of genomic variants in cancer. JCO Clin. Cancer Inform. 6, e2100144 (2022).

  • Richards, S. et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet. Med. 17, 405–424 (2015).

  • Durkie, M. et al. ACGS Best Practice Guidelines for Variant Classification in Rare Disease (ACGS, 2024).

  • Horak, P. et al. Standards for the classification of pathogenicity of somatic variants in cancer (oncogenicity): joint recommendations of Clinical Genome Resource (ClinGen), Cancer Genomics Consortium (CGC), and Variant Interpretation for Cancer Consortium (VICC). Genet. Med. 24, 986–998 (2022).

  • Jaganathan, K. et al. Predicting splicing from primary sequence with deep learning. Cell 176, 535–548.e24 (2019).

  • Brandes, N., Goldman, G., Wang, C. H., Ye, C. J. & Ntranos, V. Genome-wide prediction of disease variant effects with a deep protein language model. Nat. Genet. 55, 1512–1522 (2023).

  • Cheng, J. et al. Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science 381, eadg7492 (2023). This paper introduces DeepMind’s AlphaMissense, a transformative deep learning model for missense variant effect prediction that was rigorously evaluated for its utility in pathogenicity assessment.

  • Kurtovic-Kozaric, A. et al. Comprehensive evaluation of AlphaMissense predictions by evidence quantification for variants of uncertain significance. Front. Genet. 15, 1487608 (2024).

  • Muiños, F., Martínez-Jiménez, F., Pich, O., Gonzalez-Perez, A. & Lopez-Bigas, N. In silico saturation mutagenesis of cancer genes. Nature 596, 428–432 (2021).

  • Demajo, S. et al. Identification of clonal hematopoiesis driver mutations through in silico saturation mutagenesis. Cancer Discov. 14, 1717–1731 (2024).

  • Vihinen, M. Problems in variation interpretation guidelines and in their implementation in computational tools. Mol. Genet. Genom. Med. 8, e1206 (2020).

  • Fayer, S. et al. Closing the gap: systematic integration of multiplexed functional data resolves variants of uncertain significance in BRCA1, TP53, and PTEN. Am. J. Hum. Genet. 108, 2248–2258 (2021).

  • Rubin, A. F. et al. MaveDB 2024: a curated community database with over seven million variant effects from multiplexed functional assays. Genome Biol. 26, 13 (2025).

  • Arafeh, R., Shibue, T., Dempster, J. M., Hahn, W. C. & Vazquez, F. The present and future of the cancer dependency map. Nat. Rev. Cancer 25, 59–73 (2025).

  • Brixi, G. et al. Genome modeling and design across all domains of life with Evo 2. Preprint at bioRxiv https://doi.org/10.1101/2025.02.18.638918 (2025).

  • Avsec, Ž. et al. AlphaGenome: advancing regulatory variant effect prediction with a unified DNA sequence model. Preprint at bioRxiv https://doi.org/10.1101/2025.06.25.661532 (2025).

  • Li, M. M. et al. Standards and guidelines for the interpretation and reporting of sequence variants in cancer. J. Mol. Diagn. 19, 4–23 (2017).

  • Mateo, J. et al. A framework to rank genomic alterations as targets for cancer precision medicine: the ESMO Scale for Clinical Actionability of Molecular Targets (ESCAT). Ann. Oncol. 29, 1895–1902 (2018).

  • He, M. M. et al. Variant Interpretation for Cancer (VIC): a computational tool for assessing clinical impacts of somatic variants. Genome Med. 11, 53 (2019).

  • Li, Q. et al. CancerVar: an artificial intelligence-empowered platform for clinical interpretation of somatic mutations in cancer. Sci. Adv. 8, eabj1624 (2022).

  • Ruzicka, J. et al. Clinical evaluation of an AI system for streamlined variant interpretation in genetic testing. Preprint at medRxiv https://doi.org/10.1101/2025.02.04.25321641 (2025).

  • Lammert, J. et al. Large language models for precision oncology: clinical decision support through expert-guided learning. J. Clin. Oncol. 42, e13609 (2024).

  • Klein, H. et al. MatchMiner: an open-source platform for cancer precision medicine. NPJ Precis. Oncol. 6, 69 (2022). The authors introduce a clinical trial matching platform and a structured format for enrolment criteria to facilitate clinical trial matching for precision oncology, addressing a historically intractable problem within the field.

  • Lotter, W. et al. Artificial intelligence in oncology: current landscape, challenges, and future directions. Cancer Discov. 14, 711–726 (2024).

  • Wong, C. et al. Scaling clinical trial matching using large language models: a case study in oncology. In Proc. 8th Machine Learning for Healthcare Conference 846–862 (PMLR, 2023).

  • Jin, Q. et al. Matching patients to clinical trials with large language models. Nat. Commun. 15, 9074 (2024).

  • Cerami, E. et al. MatchMiner-AI: an open-source solution for cancer clinical trial matching. Preprint at https://doi.org/10.48550/arXiv.2412.17228 (2024).

  • Reisle, C. et al. Evaluating language models for biomedical fact-checking: a benchmark dataset for cancer variant interpretation verification. Preprint at bioRxiv https://doi.org/10.1101/2025.09.10.675443 (2025).

  • Lewis, P. et al. Retrieval-augmented generation for knowledge-intensive NLP tasks. In Advances in Neural Information Processing Systems 33, 9459–9474 (Curran Associates, 2020).

  • Jun, H. et al. Implementing a context-augmented large language model to guide precision cancer medicine. Preprint at medRxiv https://doi.org/10.1101/2025.05.09.25327312 (2025).

  • Schick, T. et al. Toolformer: language models can teach themselves to use tools. In Advances in Neural Information Processing Systems 36, 68539–68551 (Curran Associates, 2023).

  • Yao, S. et al. ReAct: synergizing reasoning and acting in language models. Preprint at https://doi.org/10.48550/arXiv.2210.03629 (2023).

  • Gao, S. et al. TxAgent: an AI agent for therapeutic reasoning across a universe of tools. Preprint at https://doi.org/10.48550/arXiv.2503.10970 (2025).

  • Ferber, D. et al. Development and validation of an autonomous artificial intelligence agent for clinical decision-making in oncology. Nat. Cancer 6, 1337–1349 (2025). This study is one of the most prominent illustrations of agentic AI systems being applied to precision oncology to support a wide array of clinical decision-making tasks.

  • Benary, M. et al. Leveraging large language models for decision support in personalized oncology. JAMA Netw. Open 6, e2343689 (2023).

  • Verlingue, L. et al. Artificial intelligence in oncology: ensuring safe and effective integration of language models in clinical practice. Lancet Reg. Health Eur. 46, 101064 (2024).

  • Elemento, O., Khozin, S. & Sternberg, C. N. The use of artificial intelligence for cancer therapeutic decision-making. NEJM AI 2, AIra2401164 (2025).

  • Deng, J. et al. ImageNet: a large-scale hierarchical image database. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 248–255 (IEEE, 2009).

  • Yang, K., Qinami, K., Fei-Fei, L., Deng, J. & Russakovsky, O. Towards fairer datasets: filtering and balancing the distribution of the people subtree in the ImageNet hierarchy. In Proc. 2020 Conference on Fairness, Accountability, and Transparency 547–558 (ACM, 2020).

  • Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).

  • Acebedo, A. et al. Collaborating across sectors in service of open science, precision oncology, and patients: an overview of the AACR Project GENIE (Genomics Evidence Neoplasia Information Exchange) Biopharma Collaborative (BPC). ESMO Real World Data Digit. Oncol. 7, 100097 (2025).

  • Painter, C. A. et al. The Angiosarcoma Project: enabling genomic and clinical discoveries in a rare cancer through patient-partnered research. Nat. Med. 26, 181–187 (2020).

  • Crowdis, J. et al. A patient-driven clinicogenomic partnership for metastatic prostate cancer. Cell Genom. 2, 100169 (2022).

  • Lee, E., Jung, S. Y., Hwang, H. J. & Jung, J. Patient-level cancer prediction models from a nationwide patient cohort: model development and validation. JMIR Med. Inform. 9, e29807 (2021).

  • Placido, D. et al. A deep learning algorithm to predict risk of pancreatic cancer from disease trajectories. Nat. Med. 29, 1113–1122 (2023).

  • Buk Cardoso, L. et al. Machine learning for predicting survival of colorectal cancer patients. Sci. Rep. 13, 8874 (2023).

  • Moon, I. et al. Machine learning for genetics-based classification and treatment response prediction in cancer of unknown primary. Nat. Med. 29, 2057–2067 (2023).

  • Jee, J. et al. Automated real-world data integration improves cancer outcome prediction. Nature 636, 728–736 (2024). This paper shows how MSKCC leveraged its data warehouse to develop a machine learning model that predicts clinical outcomes, a paradigm that will continue to define clinicogenomic discovery in the near term.

  • Rieke, N. et al. The future of digital health with federated learning. NPJ Digit. Med. 3, 1–7 (2020).

  • Sheller, M. J. et al. Federated learning in medicine: facilitating multi-institutional collaborations without sharing patient data. Sci. Rep. 10, 12598 (2020).

  • Pati, S. et al. Federated learning enables big data for rare cancer boundary detection. Nat. Commun. 13, 7346 (2022).

  • Brauneck, A. et al. Federated machine learning in data-protection-compliant research. Nat. Mach. Intell. 5, 2–4 (2023).

  • Ogier du Terrail, J. et al. Federated learning for predicting histological response to neoadjuvant chemotherapy in triple-negative breast cancer. Nat. Med. 29, 135–146 (2023).

  • Stark, Z. et al. A call to action to scale up research and clinical genomic data sharing. Nat. Rev. Genet. 26, 141–147 (2024). This study outlines several steps towards data sharing and harmonization that can yield clinicogenomic datasets spanning thousands of patients with cancer, supporting biological discovery and machine learning models that generalize across institutions.

  • Fiume, M. et al. Federated discovery and sharing of genomic data using Beacons. Nat. Biotechnol. 37, 220–224 (2019). This study describes the GA4GH Beacon protocol for federated data sharing, which has become closely associated with federated learning within genomics.

  • Elhussein, A., Baymuradov, U., Elhadad, N., Natarajan, K. & Gürsoy, G. A framework for sharing of clinical and genetic data for precision medicine applications. Nat. Med. 30, 3578–3589 (2024).

  • Cho, H. et al. Secure and federated genome-wide association studies for biobank-scale datasets. Nat. Genet. 57, 809–814 (2025).

  • Hanser, T. et al. Data-driven federated learning in drug discovery with knowledge distillation. Nat. Mach. Intell. 7, 423–436 (2025).

  • Riba, M. et al. The 1+Million Genomes Minimal Dataset for Cancer. Nat. Genet. 56, 733–736 (2024).

  • Kehl, K. L. et al. Assessment of deep natural language processing in ascertaining oncologic outcomes from radiology reports. JAMA Oncol. 5, 1421–1429 (2019).

  • Kehl, K. L. et al. Natural language processing to ascertain cancer outcomes from medical oncologist notes. JCO Clin. Cancer Inform. 4, 680–690 (2020).

  • Sushil, M. et al. CORAL: expert-curated oncology reports to advance language model inference. NEJM AI 1, AIdbp2300110 (2024).

  • Hoes, L. R. et al. Patients with rare cancers in the Drug Rediscovery Protocol (DRUP) benefit from genomics-guided treatment. Clin. Cancer Res. 28, 1402–1411 (2022).

  • Helland, Å. et al. Improving public cancer care by implementing precision medicine in Norway: IMPRESS-Norway. J. Transl. Med. 20, 225 (2022).

  • Mohammad, S. F. H. et al. The evolution of precision oncology: the ongoing impact of the Drug Rediscovery Protocol (DRUP). Acta Oncol. 63, 34885 (2024).

  • Nikolski, M. et al. Roadmap for a European cancer data management and precision medicine infrastructure. Nat. Cancer 5, 367–372 (2024).

  • Sweeney, S. M. et al. Challenges to using big data in cancer. Cancer Res. 83, 1175–1182 (2023).

  • Seligson, N. D. et al. Recommendations for patient similarity classes: results of the AMIA 2019 Workshop on Defining Patient Similarity. J. Am. Med. Inform. Assoc. 27, 1808–1812 (2020). This study provides a conceptual roadmap for the development and implementation of patient similarity approaches within medicine broadly.

  • Allam, A., Dittberner, M., Sintsova, A., Brodbeck, D. & Krauthammer, M. Patient similarity analysis with longitudinal health data. Preprint at https://doi.org/10.48550/arXiv.2005.06630 (2020).

  • Jia, Z., Zeng, X., Duan, H., Lu, X. & Li, H. A patient-similarity-based model for diagnostic prediction. Int. J. Med. Inf. 135, 104073 (2020).

  • Navaz, A. N. et al. A novel patient similarity network (PSN) framework based on multi-model deep learning for precision medicine. J. Pers. Med. 12, 768 (2022).

  • Wang, N. et al. Sequential data-based patient similarity framework for patient outcome prediction: algorithm development. J. Med. Internet Res. 24, e30720 (2022).

  • Savcisens, G. et al. Using sequences of life-events to predict human lives. Nat. Comput. Sci. 4, 43–56 (2023). This study illustrates the power of sequence models to capture temporal relationships while maintaining interpretability.

  • Manuilova, I. et al. Identifications of similarity metrics for patients with cancer: protocol for a scoping review. JMIR Res. Protoc. 13, e58705 (2024).

  • Elmarakeby, H. A. et al. Biologically informed deep neural network for prostate cancer discovery. Nature 598, 348–352 (2021).

  • Osipov, A. et al. The molecular twin artificial-intelligence platform integrates multi-omic data to predict outcomes for pancreatic adenocarcinoma patients. Nat. Cancer 5, 299–314 (2024).

  • Najgebauer, H. et al. CELLector: genomics-guided selection of cancer in vitro models. Cell Syst. 10, 424–432.e6 (2020).

  • Sinha, R., Luna, A., Schultz, N. & Sander, C. A pan-cancer survey of cell line tumor similarity by feature-weighted molecular profiles. Cell Rep. Methods 1, 100039 (2021).

  • Zhao, Y. et al. CUP-AI-Dx: a tool for inferring cancer tissue of origin and molecular subtype using RNA gene-expression data and artificial intelligence. EBioMedicine 61, 103030 (2020).

  • Vibert, J. et al. Identification of tissue of origin and guided therapeutic applications in cancers of unknown primary using deep learning and RNA sequencing (TransCUPtomics). J. Mol. Diagn. 23, 1380–1392 (2021).

  • Darmofal, M. et al. Deep-learning model for tumor-type prediction using targeted clinical genomic sequencing data. Cancer Discov. 14, 1064–1081 (2024).

  • Bick, A. G. et al. Genomic data in the All of Us Research Program. Nature 627, 340–346 (2024).

  • Subhashini, R. & Kumar, V. J. S. Evaluating the performance of similarity measures used in document clustering and information retrieval. In Proc. First International Conference on Integrated Intelligent Computing 27–31 (IEEE, 2010).

  • Parimbelli, E., Marini, S., Sacchi, L. & Bellazzi, R. Patient similarity for precision medicine: a systematic review. J. Biomed. Inform. 83, 87–96 (2018).

  • Cross, J. L., Choma, M. A. & Onofrey, J. A. Bias in medical AI: implications for clinical decision-making. PLoS Digit. Health 3, e0000651 (2024). This study outlines several biases that must be considered, especially by model developers, for AI to be applied successfully within medicine broadly.

  • Collins, G. S. et al. Evaluation of clinical prediction models (part 1): from development to external validation. BMJ 384, e074819 (2024).

  • Hantel, A. et al. Perspectives of oncologists on the ethical implications of using artificial intelligence for cancer care. JAMA Netw. Open 7, e244077 (2024).

  • Dai, L., Zhu, H. & Liu, D. Patient similarity: methods and applications. Preprint at https://doi.org/10.48550/arXiv.2012.01976 (2020).

  • Aldrighetti, C. M., Niemierko, A., Van Allen, E., Willers, H. & Kamran, S. C. Racial and ethnic disparities among participants in precision oncology clinical studies. JAMA Netw. Open 4, e2133205 (2021).

  • Kamran, S. C. et al. Tumor mutations across racial groups in a real-world data registry. JCO Precis. Oncol. 5, 1654–1658 (2021).

  • Cheung, A. T. M. et al. Racial and ethnic disparities in a real-world precision oncology data registry. NPJ Precis. Oncol. 7, 1–6 (2023).

  • Lundberg, S. M. et al. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2, 56–67 (2020).

  • Kehl, K. L. et al. Shareable artificial intelligence to extract cancer outcomes from electronic health records for precision oncology research. Nat. Commun. 15, 1–11 (2024).

  • Ehrmann, D. E., Joshi, S., Goodfellow, S. D., Mazwi, M. L. & Eytan, D. Making machine learning matter to clinicians: model actionability in medical decision-making. NPJ Digit. Med. 6, 1–5 (2023).

  • Vaccaro, M., Almaatouq, A. & Malone, T. When combinations of humans and AI are useful: a systematic review and meta-analysis. Nat. Hum. Behav. 8, 2293–2303 (2024).

  • Riley, R. D. et al. Evaluation of clinical prediction models (part 2): how to undertake an external validation study. BMJ 384, e074820 (2024).

  • Riley, R. D. et al. Evaluation of clinical prediction models (part 3): calculating the sample size required for an external validation study. BMJ 384, e074821 (2024).

  • la Roi-Teeuw, H. M. et al. Don’t be misled: 3 misconceptions about external validation of clinical prediction models. J. Clin. Epidemiol. 172, 111387 (2024).

  • Petersen, C. et al. Recommendations for the safe, effective use of adaptive CDS in the US healthcare system: an AMIA position paper. J. Am. Med. Inform. Assoc. 28, 677–684 (2021).

  • Ong, J. C. L. et al. Medical ethics of large language models in medicine. NEJM AI 1, AIra2400038 (2024).

  • Ghassemi, M., Oakden-Rayner, L. & Beam, A. L. The false hope of current approaches to explainable artificial intelligence in health care. Lancet Digit. Health 3, e745–e750 (2021). This critical review encourages model developers to focus on model validation instead of interpretability.

  • Gilbert, S. & Kather, J. N. Guardrails for the use of generalist AI in cancer care. Nat. Rev. Cancer 24, 357–358 (2024).

  • Zhou, L. et al. Larger and more instructable language models become less reliable. Nature 634, 61–68 (2024).

  • Singhal, K. et al. Large language models encode clinical knowledge. Nature 620, 172–180 (2023).

  • Lipkova, J. & Kather, J. N. The age of foundation models. Nat. Rev. Clin. Oncol. 21, 769–770 (2024).

  • Okun, S. A., Lu, D., Sew, K., Subramaniam, A. & Lockwood, W. W. MET activation in lung cancer and response to targeted therapies. Cancers 17, 281 (2025).

  • Rodon, J. et al. Genomic and transcriptomic profiling expands precision cancer medicine: the WINTHER trial. Nat. Med. 25, 751–758 (2019).

  • Vaske, O. M. et al. Comparative tumor RNA sequencing analysis for difficult-to-treat pediatric and young adult patients with cancer. JAMA Netw. Open 2, e1913968 (2019).

  • Wong, M. et al. Whole genome, transcriptome and methylome profiling enhances actionable target discovery in high-risk pediatric cancer. Nat. Med. 26, 1742–1753 (2020).

  • Yates, J. & Van Allen, E. M. New horizons at the interface of artificial intelligence and translational cancer research. Cancer Cell 43, 708–727 (2025).

  • Rehm, H. L. et al. GA4GH: international policies and standards for data sharing across genomic research and healthcare. Cell Genom. 1, 100029 (2021).

  • Shick, A. A. et al. Transparency of artificial intelligence/machine learning-enabled medical devices. NPJ Digit. Med. 7, 1–4 (2024).

  • Bonneville, R. et al. Landscape of microsatellite instability across 39 cancer types. JCO Precis. Oncol. 1, 1–15 (2017).

  • Nguyen, L. et al. Pan-cancer landscape of homologous recombination deficiency. Nat. Commun. 11, 5584 (2020).

  • Jia, P. et al. MSIsensor-pro: fast, accurate, and matched-normal-sample-free detection of microsatellite instability. Genom. Proteom. Bioinform. 18, 65–71 (2020).

  • Niu, B. et al. MSIsensor: microsatellite instability detection using paired tumor-normal sequence data. Bioinformatics 30, 1015–1016 (2014).

  • Ziegler, J. et al. A deep multiple instance learning framework improves microsatellite instability detection from tumor next generation sequencing. Nat. Commun. 16, 136 (2025). This paper presents a deep learning model that increases performance of MSI detection relative to status quo bioinformatic tools while also enabling tissue conservation.

  • Sztupinszki, Z. et al. Migrating the SNP array-based homologous recombination deficiency measures to next generation sequencing data of breast cancer. NPJ Breast Cancer 4, 1–4 (2018).

  • Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).

  • Rosenthal, R., McGranahan, N., Herrero, J., Taylor, B. S. & Swanton, C. deconstructSigs: delineating mutational processes in single tumors distinguishes DNA repair deficiencies and patterns of carcinoma evolution. Genome Biol. 17, 31 (2016).

  • Díaz-Gay, M. et al. Assigning mutational signatures to individual samples and individual somatic mutations with SigProfilerAssignment. Bioinformatics 39, btad756 (2023).

  • Gulhan, D. C., Lee, J. J.-K., Melloni, G. E. M., Cortés-Ciriano, I. & Park, P. J. Detecting the mutational signature of homologous recombination deficiency in clinical samples. Nat. Genet. 51, 912–919 (2019).

  • Laprovitera, N. et al. Cancer of unknown primary: challenges and progress in clinical management. Cancers 13, 451 (2021).

  • Belenkaya, R. et al. Extending the OMOP common data model and standardized vocabularies to support observational cancer research. JCO Clin. Cancer Inform. 5, 12–20 (2021).
