Convergence of machine learning and genomics for precision oncology

Suehnholz, S. P. et al. Quantifying the expanding landscape of clinical actionability for patients with cancer. Cancer Discov. 14, 49–65 (2023).

Horak, P. & Fröhling, S. Measuring progress in precision oncology. Cancer Discov. 14, 18–19 (2024).

Chakravarty, D. et al. OncoKB: a precision oncology knowledge base. JCO Precis. Oncol. 1, 1–16 (2017).

Griffith, M. et al. CIViC is a community knowledgebase for expert crowdsourcing the clinical interpretation of variants in cancer. Nat. Genet. 49, 170–174 (2017).

Reardon, B. et al. Integrating molecular profiles into clinical frameworks through the Molecular Oncology Almanac to prospectively guide precision oncology. Nat. Cancer 2, 1102–1112 (2021).

Luchini, C., Lawlor, R. T., Milella, M. & Scarpa, A. Molecular tumor boards in clinical practice. Trends Cancer 6, 738–744 (2020).

Gladstone, B. P. et al. Systematic review and meta-analysis of molecular tumor board data on clinical effectiveness and evaluation gaps. NPJ Precis. Oncol. 9, 96 (2025).

Nichetti, F. et al. Real-world outcomes of molecular tumor board treatment recommendations. JCO Precis. Oncol. 9, e2400387 (2025).

The AACR Project GENIE Consortium et al. AACR Project GENIE: powering precision medicine through an international consortium. Cancer Discov. 7, 818–831 (2017).

Pugh, T. J. et al. AACR project GENIE: 100,000 cases and beyond. Cancer Discov. 12, 2044–2057 (2022).

Wang, S. & Ye, K. Deep-learning based representation and recognition for genome variants — from SNVs to structural variants. Natl Sci. Rev. 11, nwae335 (2024).

Poplin, R. et al. A universal SNP and small-indel variant caller using deep neural networks. Nat. Biotechnol. 36, 983–987 (2018). This paper introduced DeepVariant and helped bring machine learning into mainstream bioinformatics by demonstrating that a deep neural network could outperform traditional heuristic and statistical approaches to variant calling.

Krusche, P. et al. Best practices for benchmarking germline small-variant calls in human genomes. Nat. Biotechnol. 37, 555–560 (2019).

AlDubayan, S. H. et al. Detection of pathogenic variants with germline genetic testing using deep learning vs standard methods in patients with prostate cancer and melanoma. JAMA 324, 1957–1969 (2020).

Zook, J. M. et al. An open resource for accurately benchmarking small variant and reference calls. Nat. Biotechnol. 37, 561–566 (2019).

Olson, N. D. et al. PrecisionFDA Truth Challenge V2: calling variants from short and long reads in difficult-to-map regions. Cell Genom. 2, 100129 (2022). This paper illustrates the methodological shift of variant callers towards machine learning while also highlighting open challenges for future developers.

Mandiracioglu, B. et al. ECOLE: learning to call copy number variants on whole exome sequencing data. Nat. Commun. 15, 132 (2024).

Popic, V. et al. Cue: a deep-learning framework for structural variant discovery and genotyping. Nat. Methods 20, 559–568 (2023).

Behera, S. et al. Comprehensive genome analysis and variant detection at scale using DRAGEN. Nat. Biotechnol. 43, 1177–1191 (2024).

Yi, R., Chang, P.-C., Baid, G. & Carroll, A. Learning from data-rich problems: a case study on genetic variant calling. Preprint at https://doi.org/10.48550/arXiv.1911.05151 (2019).

Scheffler, K. et al. Somatic small-variant calling methods in Illumina DRAGEN™ Secondary Analysis. Preprint at bioRxiv https://doi.org/10.1101/2023.03.23.534011 (2023).

Park, J. et al. Accurate somatic small variant discovery for multiple sequencing technologies with DeepSomatic. Nat. Biotechnol. https://doi.org/10.1038/s41587-025-02839-x (2025).

Betschart, R. O. et al. Comparison of calling pipelines for whole genome sequencing: an empirical study demonstrating the importance of mapping and alignment. Sci. Rep. 12, 21502 (2022).

Roy, S. et al. Standards and guidelines for validating next-generation sequencing bioinformatics pipelines. J. Mol. Diagn. 20, 4–27 (2018).

van de Haar, J. et al. ESMO recommendations on clinical reporting of genomic test results for solid cancers. Ann. Oncol. 35, 954–967 (2024).

Eilbeck, K. et al. The Sequence Ontology: a tool for the unification of genome annotations. Genome Biol. 6, R44 (2005).

den Dunnen, J. T. et al. HGVS recommendations for the description of sequence variants: 2016 update. Hum. Mutat. 37, 564–569 (2016).

Holmes, J. B., Moyer, E., Phan, L., Maglott, D. & Kattman, B. SPDI: data model for variants and applications at NCBI. Bioinformatics 36, 1902–1907 (2020).

Wang, M. et al. hgvs: a python package for manipulating sequence variants using HGVS nomenclature: 2018 update. Hum. Mutat. 39, 1803–1813 (2018).

Lefter, M. et al. Mutalyzer 2: next generation HGVS nomenclature checker. Bioinformatics 37, 2811–2817 (2021).

van Giffen, B., Herhausen, D. & Fahse, T. Overcoming the pitfalls and perils of algorithms: a classification of machine learning biases and mitigation methods. J. Bus. Res. 144, 93–106 (2022).

Singh, D. & Singh, B. Investigating the impact of data normalization on classification performance. Appl. Soft Comput. 97, 105524 (2020).

Sayers, E. W. et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 50, D20–D26 (2022).

Freeman, P. J., Hart, R. K., Gretton, L. J., Brookes, A. J. & Dalgleish, R. VariantValidator: accurate validation, mapping, and formatting of sequence variation descriptions. Hum. Mutat. 39, 61–68 (2018).

Freeman, P. J. et al. Standardizing variant naming in literature with VariantValidator to increase diagnostic rates. Nat. Genet. 56, 2284–2286 (2024).

McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol. 17, 122 (2016).

Wagner, A. H. et al. The GA4GH Variation Representation Specification: a computational framework for variation representation and federated identification. Cell Genom. 1, 100027 (2021). This paper shows that VRS enables semantically precise, computable variant representation that facilitates further downstream bioinformatic applications and machine learning models.

Chen, S. et al. A genomic mutational constraint map using variation in 76,156 human genomes. Nature 625, 92–100 (2024).

Arbesfeld, J. A. et al. Mapping MAVE data for use in human genomics applications. Genome Biol. 26, 179 (2025).

Pagel, K. A. et al. Integrated informatics analysis of cancer-related variants. JCO Clin. Cancer Inform. 4, 310–317 (2020).

de Bruijn, I. et al. Genome Nexus: a comprehensive resource for the annotation and interpretation of genomic variants in cancer. JCO Clin. Cancer Inform. 6, e2100144 (2022).

Richards, S. et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet. Med. 17, 405–424 (2015).

Durkie, M. et al. ACGS Best Practice Guidelines for Variant Classification in Rare Disease (ACGS, 2024).

Horak, P. et al. Standards for the classification of pathogenicity of somatic variants in cancer (oncogenicity): joint recommendations of Clinical Genome Resource (ClinGen), Cancer Genomics Consortium (CGC), and Variant Interpretation for Cancer Consortium (VICC). Genet. Med. 24, 986–998 (2022).

Jaganathan, K. et al. Predicting splicing from primary sequence with deep learning. Cell 176, 535–548.e24 (2019).

Brandes, N., Goldman, G., Wang, C. H., Ye, C. J. & Ntranos, V. Genome-wide prediction of disease variant effects with a deep protein language model. Nat. Genet. 55, 1512–1522 (2023).

Cheng, J. et al. Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science 381, eadg7492 (2023). This paper introduces DeepMind’s AlphaMissense, a transformative deep learning model for missense variant effect prediction, and rigorously evaluates its utility for pathogenicity assessment.

Kurtovic-Kozaric, A. et al. Comprehensive evaluation of AlphaMissense predictions by evidence quantification for variants of uncertain significance. Front. Genet. 15, 1487608 (2024).

Muiños, F., Martínez-Jiménez, F., Pich, O., Gonzalez-Perez, A. & Lopez-Bigas, N. In silico saturation mutagenesis of cancer genes. Nature 596, 428–432 (2021).

Demajo, S. et al. Identification of clonal hematopoiesis driver mutations through in silico saturation mutagenesis. Cancer Discov. 14, 1717–1731 (2024).

Vihinen, M. Problems in variation interpretation guidelines and in their implementation in computational tools. Mol. Genet. Genom. Med. 8, e1206 (2020).

Fayer, S. et al. Closing the gap: systematic integration of multiplexed functional data resolves variants of uncertain significance in BRCA1, TP53, and PTEN. Am. J. Hum. Genet. 108, 2248–2258 (2021).

Rubin, A. F. et al. MaveDB 2024: a curated community database with over seven million variant effects from multiplexed functional assays. Genome Biol. 26, 13 (2025).

Arafeh, R., Shibue, T., Dempster, J. M., Hahn, W. C. & Vazquez, F. The present and future of the cancer dependency map. Nat. Rev. Cancer 25, 59–73 (2025).

Brixi, G. et al. Genome modeling and design across all domains of life with Evo 2. Preprint at bioRxiv https://doi.org/10.1101/2025.02.18.638918 (2025).

Avsec, Ž. et al. AlphaGenome: advancing regulatory variant effect prediction with a unified DNA sequence model. Preprint at bioRxiv https://doi.org/10.1101/2025.06.25.661532 (2025).

Li, M. M. et al. Standards and guidelines for the interpretation and reporting of sequence variants in cancer. J. Mol. Diagn. 19, 4–23 (2017).

Mateo, J. et al. A framework to rank genomic alterations as targets for cancer precision medicine: the ESMO Scale for Clinical Actionability of Molecular Targets (ESCAT). Ann. Oncol. 29, 1895–1902 (2018).

He, M. M. et al. Variant Interpretation for Cancer (VIC): a computational tool for assessing clinical impacts of somatic variants. Genome Med. 11, 53 (2019).

Li, Q. et al. CancerVar: an artificial intelligence-empowered platform for clinical interpretation of somatic mutations in cancer. Sci. Adv. 8, eabj1624 (2022).

Ruzicka, J. et al. Clinical evaluation of an AI system for streamlined variant interpretation in genetic testing. Preprint at medRxiv https://doi.org/10.1101/2025.02.04.25321641 (2025).

Lammert, J. et al. Large language models for precision oncology: clinical decision support through expert-guided learning. J. Clin. Oncol. 42, e13609 (2024).

Klein, H. et al. MatchMiner: an open-source platform for cancer precision medicine. NPJ Precis. Oncol. 6, 69 (2022). The authors introduce a clinical trial matching platform and a structured format for enrolment criteria to facilitate clinical trial matching for precision oncology, addressing a historically intractable problem within the field.

Lotter, W. et al. Artificial intelligence in oncology: current landscape, challenges, and future directions. Cancer Discov. 14, 711–726 (2024).

Wong, C. et al. Scaling clinical trial matching using large language models: a case study in oncology. In Proc. 8th Machine Learning for Healthcare Conference 846–862 (PMLR, 2023).

Jin, Q. et al. Matching patients to clinical trials with large language models. Nat. Commun. 15, 9074 (2024).

Cerami, E. et al. MatchMiner-AI: an open-source solution for cancer clinical trial matching. Preprint at https://doi.org/10.48550/arXiv.2412.17228 (2024).

Reisle, C. et al. Evaluating language models for biomedical fact-checking: a benchmark dataset for cancer variant interpretation verification. Preprint at bioRxiv https://doi.org/10.1101/2025.09.10.675443 (2025).

Lewis, P. et al. Retrieval-augmented generation for knowledge-intensive NLP tasks. In Advances in Neural Information Processing Systems 33, 9459–9474 (Curran Associates, 2020).

Jun, H. et al. Implementing a context-augmented large language model to guide precision cancer medicine. Preprint at medRxiv https://doi.org/10.1101/2025.05.09.25327312 (2025).

Schick, T. et al. Toolformer: language models can teach themselves to use tools. In Advances in Neural Information Processing Systems 36, 68539–68551 (Curran Associates, 2023).

Yao, S. et al. ReAct: synergizing reasoning and acting in language models. Preprint at https://doi.org/10.48550/arXiv.2210.03629 (2023).

Gao, S. et al. TxAgent: an AI agent for therapeutic reasoning across a universe of tools. Preprint at https://doi.org/10.48550/arXiv.2503.10970 (2025).

Ferber, D. et al. Development and validation of an autonomous artificial intelligence agent for clinical decision-making in oncology. Nat. Cancer 6, 1337–1349 (2025). This study is one of the most prominent illustrations of agentic AI systems being applied to precision oncology to support a wide array of clinical decision-making tasks.

Benary, M. et al. Leveraging large language models for decision support in personalized oncology. JAMA Netw. Open 6, e2343689 (2023).

Verlingue, L. et al. Artificial intelligence in oncology: ensuring safe and effective integration of language models in clinical practice. Lancet Reg. Health Eur. 46, 101064 (2024).

Elemento, O., Khozin, S. & Sternberg, C. N. The use of artificial intelligence for cancer therapeutic decision-making. NEJM AI 2, AIra2401164 (2025).

Deng, J. et al. ImageNet: a large-scale hierarchical image database. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 248–255 (IEEE, 2009).

Yang, K., Qinami, K., Fei-Fei, L., Deng, J. & Russakovsky, O. Towards fairer datasets: filtering and balancing the distribution of the people subtree in the ImageNet hierarchy. In Proc. 2020 Conference on Fairness, Accountability, and Transparency 547–558 (ACM, 2020).

Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).

Acebedo, A. et al. Collaborating across sectors in service of open science, precision oncology, and patients: an overview of the AACR Project GENIE (Genomics Evidence Neoplasia Information Exchange) Biopharma Collaborative (BPC). ESMO Real World Data Digit. Oncol. 7, 100097 (2025).

Painter, C. A. et al. The Angiosarcoma Project: enabling genomic and clinical discoveries in a rare cancer through patient-partnered research. Nat. Med. 26, 181–187 (2020).

Crowdis, J. et al. A patient-driven clinicogenomic partnership for metastatic prostate cancer. Cell Genom. 2, 100169 (2022).

Lee, E., Jung, S. Y., Hwang, H. J. & Jung, J. Patient-level cancer prediction models from a nationwide patient cohort: model development and validation. JMIR Med. Inform. 9, e29807 (2021).

Placido, D. et al. A deep learning algorithm to predict risk of pancreatic cancer from disease trajectories. Nat. Med. 29, 1113–1122 (2023).

Buk Cardoso, L. et al. Machine learning for predicting survival of colorectal cancer patients. Sci. Rep. 13, 8874 (2023).

Moon, I. et al. Machine learning for genetics-based classification and treatment response prediction in cancer of unknown primary. Nat. Med. 29, 2057–2067 (2023).

Jee, J. et al. Automated real-world data integration improves cancer outcome prediction. Nature 636, 728–736 (2024). This paper shows how MSKCC leveraged its data warehouse to develop a machine learning model that predicts clinical outcomes, a paradigm that will continue to define clinicogenomic discovery in the near term.

Rieke, N. et al. The future of digital health with federated learning. NPJ Digit. Med. 3, 1–7 (2020).

Sheller, M. J. et al. Federated learning in medicine: facilitating multi-institutional collaborations without sharing patient data. Sci. Rep. 10, 12598 (2020).

Pati, S. et al. Federated learning enables big data for rare cancer boundary detection. Nat. Commun. 13, 7346 (2022).

Brauneck, A. et al. Federated machine learning in data-protection-compliant research. Nat. Mach. Intell. 5, 2–4 (2023).

Ogier du Terrail, J. et al. Federated learning for predicting histological response to neoadjuvant chemotherapy in triple-negative breast cancer. Nat. Med. 29, 135–146 (2023).

Stark, Z. et al. A call to action to scale up research and clinical genomic data sharing. Nat. Rev. Genet. 26, 141–147 (2024). This study outlines several steps towards data sharing and harmonization that can enable clinicogenomic datasets of thousands of patients with cancer, supporting biological discovery and machine learning models that generalize across institutions.

Fiume, M. et al. Federated discovery and sharing of genomic data using Beacons. Nat. Biotechnol. 37, 220–224 (2019). This study describes the GA4GH Beacon protocol for federated data discovery and sharing, which has become synonymous with federated approaches in genomics.

Elhussein, A., Baymuradov, U., Elhadad, N., Natarajan, K. & Gürsoy, G. A framework for sharing of clinical and genetic data for precision medicine applications. Nat. Med. 30, 3578–3589 (2024).

Cho, H. et al. Secure and federated genome-wide association studies for biobank-scale datasets. Nat. Genet. 57, 809–814 (2025).

Hanser, T. et al. Data-driven federated learning in drug discovery with knowledge distillation. Nat. Mach. Intell. 7, 423–436 (2025).

Riba, M. et al. The 1+Million Genomes Minimal Dataset for Cancer. Nat. Genet. 56, 733–736 (2024).

Kehl, K. L. et al. Assessment of deep natural language processing in ascertaining oncologic outcomes from radiology reports. JAMA Oncol. 5, 1421–1429 (2019).

Kehl, K. L. et al. Natural language processing to ascertain cancer outcomes from medical oncologist notes. JCO Clin. Cancer Inform. 4, 680–690 (2020).

Sushil, M. et al. CORAL: expert-curated oncology reports to advance language model inference. NEJM AI 1, AIdbp2300110 (2024).

Hoes, L. R. et al. Patients with rare cancers in the Drug Rediscovery Protocol (DRUP) benefit from genomics-guided treatment. Clin. Cancer Res. 28, 1402–1411 (2022).

Helland, Å. et al. Improving public cancer care by implementing precision medicine in Norway: IMPRESS-Norway. J. Transl. Med. 20, 225 (2022).

Mohammad, S. F. H. et al. The evolution of precision oncology: the ongoing impact of the Drug Rediscovery Protocol (DRUP). Acta Oncol. 63, 34885 (2024).

Nikolski, M. et al. Roadmap for a European cancer data management and precision medicine infrastructure. Nat. Cancer 5, 367–372 (2024).

Sweeney, S. M. et al. Challenges to using big data in cancer. Cancer Res. 83, 1175–1182 (2023).

Seligson, N. D. et al. Recommendations for patient similarity classes: results of the AMIA 2019 Workshop on Defining Patient Similarity. J. Am. Med. Inform. Assoc. 27, 1808–1812 (2020). This study provides a conceptual roadmap for the development and implementation of patient similarity approaches within medicine broadly.

Allam, A., Dittberner, M., Sintsova, A., Brodbeck, D. & Krauthammer, M. Patient similarity analysis with longitudinal health data. Preprint at https://doi.org/10.48550/arXiv.2005.06630 (2020).

Jia, Z., Zeng, X., Duan, H., Lu, X. & Li, H. A patient-similarity-based model for diagnostic prediction. Int. J. Med. Inf. 135, 104073 (2020).

Navaz, A. N. et al. A novel patient similarity network (PSN) framework based on multi-model deep learning for precision medicine. J. Pers. Med. 12, 768 (2022).

Wang, N. et al. Sequential data-based patient similarity framework for patient outcome prediction: algorithm development. J. Med. Internet Res. 24, e30720 (2022).

Savcisens, G. et al. Using sequences of life-events to predict human lives. Nat. Comput. Sci. 4, 43–56 (2023). This study clearly illustrates the power of sequence models to capture temporal relationships while maintaining interpretability.

Manuilova, I. et al. Identifications of similarity metrics for patients with cancer: protocol for a scoping review. JMIR Res. Protoc. 13, e58705 (2024).

Elmarakeby, H. A. et al. Biologically informed deep neural network for prostate cancer discovery. Nature 598, 348–352 (2021).

Osipov, A. et al. The molecular twin artificial-intelligence platform integrates multi-omic data to predict outcomes for pancreatic adenocarcinoma patients. Nat. Cancer 5, 299–314 (2024).

Najgebauer, H. et al. CELLector: genomics-guided selection of cancer in vitro models. Cell Syst. 10, 424–432.e6 (2020).

Sinha, R., Luna, A., Schultz, N. & Sander, C. A pan-cancer survey of cell line tumor similarity by feature-weighted molecular profiles. Cell Rep. Methods 1, 100039 (2021).

Zhao, Y. et al. CUP-AI-Dx: a tool for inferring cancer tissue of origin and molecular subtype using RNA gene-expression data and artificial intelligence. EBioMedicine 61, 103030 (2020).

Vibert, J. et al. Identification of tissue of origin and guided therapeutic applications in cancers of unknown primary using deep learning and RNA sequencing (TransCUPtomics). J. Mol. Diagn. 23, 1380–1392 (2021).

Darmofal, M. et al. Deep-learning model for tumor-type prediction using targeted clinical genomic sequencing data. Cancer Discov. 14, 1064–1081 (2024).

Bick, A. G. et al. Genomic data in the All of Us Research Program. Nature 627, 340–346 (2024).

Subhashini, R. & Kumar, V. J. S. Evaluating the performance of similarity measures used in document clustering and information retrieval. In Proc. First International Conference on Integrated Intelligent Computing 27–31 (IEEE, 2010).

Parimbelli, E., Marini, S., Sacchi, L. & Bellazzi, R. Patient similarity for precision medicine: a systematic review. J. Biomed. Inform. 83, 87–96 (2018).

Cross, J. L., Choma, M. A. & Onofrey, J. A. Bias in medical AI: implications for clinical decision-making. PLoS Digit. Health 3, e0000651 (2024). This study outlines several biases that must be considered, especially by model developers, for AI to be applied successfully in medicine.

Collins, G. S. et al. Evaluation of clinical prediction models (part 1): from development to external validation. BMJ 384, e074819 (2024).

Hantel, A. et al. Perspectives of oncologists on the ethical implications of using artificial intelligence for cancer care. JAMA Netw. Open 7, e244077 (2024).

Dai, L., Zhu, H. & Liu, D. Patient similarity: methods and applications. Preprint at https://doi.org/10.48550/arXiv.2012.01976 (2020).

Aldrighetti, C. M., Niemierko, A., Van Allen, E., Willers, H. & Kamran, S. C. Racial and ethnic disparities among participants in precision oncology clinical studies. JAMA Netw. Open 4, e2133205 (2021).

Kamran, S. C. et al. Tumor mutations across racial groups in a real-world data registry. JCO Precis. Oncol. 5, 1654–1658 (2021).

Cheung, A. T. M. et al. Racial and ethnic disparities in a real-world precision oncology data registry. NPJ Precis. Oncol. 7, 1–6 (2023).

Lundberg, S. M. et al. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2, 56–67 (2020).

Kehl, K. L. et al. Shareable artificial intelligence to extract cancer outcomes from electronic health records for precision oncology research. Nat. Commun. 15, 1–11 (2024).

Ehrmann, D. E., Joshi, S., Goodfellow, S. D., Mazwi, M. L. & Eytan, D. Making machine learning matter to clinicians: model actionability in medical decision-making. NPJ Digit. Med. 6, 1–5 (2023).

Vaccaro, M., Almaatouq, A. & Malone, T. When combinations of humans and AI are useful: a systematic review and meta-analysis. Nat. Hum. Behav. 8, 2293–2303 (2024).

Riley, R. D. et al. Evaluation of clinical prediction models (part 2): how to undertake an external validation study. BMJ 384, e074820 (2024).

Riley, R. D. et al. Evaluation of clinical prediction models (part 3): calculating the sample size required for an external validation study. BMJ 384, e074821 (2024).

la Roi-Teeuw, H. M. et al. Don’t be misled: 3 misconceptions about external validation of clinical prediction models. J. Clin. Epidemiol. 172, 111387 (2024).

Petersen, C. et al. Recommendations for the safe, effective use of adaptive CDS in the US healthcare system: an AMIA position paper. J. Am. Med. Inform. Assoc. 28, 677–684 (2021).

Ong, J. C. L. et al. Medical ethics of large language models in medicine. NEJM AI 1, AIra2400038 (2024).

Ghassemi, M., Oakden-Rayner, L. & Beam, A. L. The false hope of current approaches to explainable artificial intelligence in health care. Lancet Digit. Health 3, e745–e750 (2021). This critical review encourages model developers to focus on model validation instead of interpretability.

Gilbert, S. & Kather, J. N. Guardrails for the use of generalist AI in cancer care. Nat. Rev. Cancer 24, 357–358 (2024).

Zhou, L. et al. Larger and more instructable language models become less reliable. Nature 634, 61–68 (2024).

Singhal, K. et al. Large language models encode clinical knowledge. Nature 620, 172–180 (2023).

Lipkova, J. & Kather, J. N. The age of foundation models. Nat. Rev. Clin. Oncol. 21, 769–770 (2024).

Okun, S. A., Lu, D., Sew, K., Subramaniam, A. & Lockwood, W. W. MET activation in lung cancer and response to targeted therapies. Cancers 17, 281 (2025).

Rodon, J. et al. Genomic and transcriptomic profiling expands precision cancer medicine: the WINTHER trial. Nat. Med. 25, 751–758 (2019).

Vaske, O. M. et al. Comparative tumor RNA sequencing analysis for difficult-to-treat pediatric and young adult patients with cancer. JAMA Netw. Open 2, e1913968 (2019).

Wong, M. et al. Whole genome, transcriptome and methylome profiling enhances actionable target discovery in high-risk pediatric cancer. Nat. Med. 26, 1742–1753 (2020).

Yates, J. & Van Allen, E. M. New horizons at the interface of artificial intelligence and translational cancer research. Cancer Cell 43, 708–727 (2025).

Rehm, H. L. et al. GA4GH: international policies and standards for data sharing across genomic research and healthcare. Cell Genom. 1, 100029 (2021).

Shick, A. A. et al. Transparency of artificial intelligence/machine learning-enabled medical devices. NPJ Digit. Med. 7, 1–4 (2024).

Bonneville, R. et al. Landscape of microsatellite instability across 39 cancer types. JCO Precis. Oncol. 1, 1–15 (2017).

Nguyen, L. et al. Pan-cancer landscape of homologous recombination deficiency. Nat. Commun. 11, 5584 (2020).

Jia, P. et al. MSIsensor-pro: fast, accurate, and matched-normal-sample-free detection of microsatellite instability. Genom. Proteom. Bioinform. 18, 65–71 (2020).

Niu, B. et al. MSIsensor: microsatellite instability detection using paired tumor-normal sequence data. Bioinformatics 30, 1015–1016 (2014).

Ziegler, J. et al. A deep multiple instance learning framework improves microsatellite instability detection from tumor next generation sequencing. Nat. Commun. 16, 136 (2025). This paper presents a deep learning model that improves MSI detection relative to standard bioinformatic tools while also conserving tissue.

Sztupinszki, Z. et al. Migrating the SNP array-based homologous recombination deficiency measures to next generation sequencing data of breast cancer. NPJ Breast Cancer 4, 1–4 (2018).

Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).

Rosenthal, R., McGranahan, N., Herrero, J., Taylor, B. S. & Swanton, C. deconstructSigs: delineating mutational processes in single tumors distinguishes DNA repair deficiencies and patterns of carcinoma evolution. Genome Biol. 17, 31 (2016).

Díaz-Gay, M. et al. Assigning mutational signatures to individual samples and individual somatic mutations with SigProfilerAssignment. Bioinformatics 39, btad756 (2023).

Gulhan, D. C., Lee, J. J.-K., Melloni, G. E. M., Cortés-Ciriano, I. & Park, P. J. Detecting the mutational signature of homologous recombination deficiency in clinical samples. Nat. Genet. 51, 912–919 (2019).

Laprovitera, N. et al. Cancer of unknown primary: challenges and progress in clinical management. Cancers 13, 451 (2021).

Belenkaya, R. et al. Extending the OMOP common data model and standardized vocabularies to support observational cancer research. JCO Clin. Cancer Inform. 5, 12–20 (2021).
