Open and sustainable AI: challenges, opportunities and the road ahead in the life sciences

Machine Learning



