OpenBind’s first data and model release marks a milestone in AI-powered drug discovery


Researcher Jasmine Aschenbrenner loads a sample into Diamond Light Source’s crystallography beamline.


Credit: Stuart March-DNDi

The UK-led OpenBind initiative has reached a major milestone with the release of its first publicly available dataset and predictive AI model, a breakthrough step in using artificial intelligence to accelerate the discovery of new drugs. The release shows that engineering the generation of AI-ready data is not only feasible but essential for advancing AI tools in scientific fields plagued by data scarcity. With this release, both high-quality standardized experimental data and newly trained predictive models, OpenBind v1, are freely accessible to researchers around the world and can be used immediately to discover treatments and power next-generation AI models.

Although AI has brought about a step change in the accuracy of protein structure predictions, its impact on drug discovery has been limited by a global lack of reliable experimental data that measure, in atomic detail, how drug molecules bind to disease-associated proteins. OpenBind aims to fill this critical gap. The collaboration between structural biologists and AI experts, led by Diamond Light Source and supported in its foundational phase by the Department of Science, Innovation and Technology (DSIT), is the first effort to generate such datasets openly, continuously and at industrial scale, designed specifically for AI.

This first release shows that OpenBind’s pipeline is now up and running and has generated 800 high-quality measurements in just seven months. Previously, creating and releasing datasets of this size took years. The integrated operation will combine automated chemistry, robust binding measurements and high-throughput crystallography at Diamond’s XChem fragment screening facility with engineered data release processes and AI model training on the UK’s Isambard-AI computing cluster. This lays the foundation for transformative advances in drug discovery, with future data tranches planned to address global health challenges such as COVID-19, malaria, dengue, Zika and cancer, where the rapid development of new treatments remains critical.

“AlphaFold2 revolutionized protein structure prediction by leveraging decades of experimental data on protein structures in the PDB,” said Professor Mohammed AlQuraishi of Columbia University. “While no equivalent dataset yet exists, OpenBind aims to create one, and in the process create the next generation of computational tools for modeling drug-protein interactions.”

The initial dataset also reflects valuable lessons from the earlier experimental cycles of this effort. Standardized workflows, strong metadata practices, and high levels of automation have proven critical to ensuring the consistency and reproducibility needed for AI, while also highlighting opportunities to further streamline data processing and increase release frequency.

Dr Fergus Imrie from the University of Oxford said: “High-quality experimental data is essential for developing new and improved AI models, and this first data release shows that OpenBind is laying the foundations for this. The lessons from these early cycles are already helping us improve the speed, consistency, and reproducibility of our pipeline, which will be important as OpenBind grows.”

Professor Frank von Delft, Principal Beamline Scientist at Diamond Light Source, said: “Such rapid progress would not have been possible without the contributions of our consortium members and operations team. Thanks to their expertise and commitment, we have been able to reach this ambitious milestone. We will now put into practice the lessons learned during this foundational phase to strengthen our long-term operations that link the mass production of AI data with active discovery projects.”

Building on this foundation, OpenBind will expand to include more targets, larger chemistry series, and deeper datasets, alongside community blind challenges to validate AI models on newly generated experimental data. Ultimately, OpenBind aims to create a global open data engine that can support the development of faster, more accurate, and more equitable treatments.

-end-

Detailed information:

About OpenBind
OpenBind is a UK-led collaboration coordinated by Diamond Light Source, bringing together experts in structural biology, chemistry, biophysics, AI and global health drug discovery. The initiative builds on the experience of leading open science consortia such as COVID Moonshot and ASAP, and works closely with global health partners to help prioritize targets related to diseases such as malaria and tuberculosis.

Team expertise
The consortium includes experts in:

  • High-throughput X-ray crystallography
  • Automated microscale chemistry
  • Protein-ligand binding assays
  • Machine learning model development
  • FAIR data curation and large data infrastructure

This cross-disciplinary approach is core to OpenBind’s ability to generate high-quality, AI-ready datasets.

To see the members of the consortium, please visit the website: https://openbind.uk/team/


Disclaimer: AAAS and EurekAlert! are not responsible for the accuracy of news releases posted to EurekAlert! by contributing institutions or for the use of any information through the EurekAlert system.


