AI Advances Sustainable Scientific Research

By its very nature, cutting-edge scientific research is a resource-intensive process, consuming electricity, water, and, in many cases, specialist chemicals and equipment. There’s an element of trial and error in exploring any new idea, and behind every important discovery is a string of failed experiments. But while these failures and wrong turns are a crucial part of the process, they carry a cost, not just in time and money, but in the resources consumed in the pursuit of this knowledge.

Scientific progress should not come at the cost of sustainability, and AI tools are transforming the way many researchers approach their work, whether that means streamlining the experimental process and tailoring project design, or optimizing lab operations and reducing waste.

In this article, we will look at a spectrum of tools and programs helping researchers from across the sciences work more sustainably. 

AI-powered electronic lab notebooks

A typical synthetic chemistry project can involve hundreds, if not thousands, of individual experiments, each necessitating detailed analysis and characterization. Maintaining thorough and up-to-date records remains a hefty organizational task, although the introduction of electronic lab notebooks (ELNs) over the last 20 years has already gone a long way towards streamlining this onerous but essential job.

Intended to replace traditional paper records, ELNs create a digital entry for each experiment, storing methods, data, and analyses in an easily searchable and machine-readable format. These digitized records foster ready collaboration between teams and provide the perfect input for machine learning models, which can prompt the user for missing information or highlight duplicate experiments.
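As a rough illustration of the kind of check an ELN can automate, the minimal sketch below fingerprints experiment records so that repeated entries can be flagged; the record fields and normalization steps are assumptions made for this example rather than any particular ELN's schema.

```python
import hashlib

def experiment_fingerprint(record: dict) -> str:
    """Build a stable fingerprint from the fields that define an experiment.

    The field names (reaction, solvent, temperature_c) are illustrative;
    a real ELN would canonicalize the reaction string before hashing.
    """
    key = "|".join([
        record["reaction"].strip(),
        record["solvent"].strip().lower(),
        f"{record['temperature_c']:.1f}",
    ])
    return hashlib.sha256(key.encode()).hexdigest()

def find_duplicates(records: list[dict]) -> dict[str, list[int]]:
    """Group the indices of records that share a fingerprint."""
    groups: dict[str, list[int]] = {}
    for i, rec in enumerate(records):
        groups.setdefault(experiment_fingerprint(rec), []).append(i)
    return {fp: idxs for fp, idxs in groups.items() if len(idxs) > 1}

# Two entries that differ only in capitalization are flagged as duplicates.
notebook = [
    {"reaction": "CCO>>CC=O", "solvent": "Acetone", "temperature_c": 25.0},
    {"reaction": "CCO>>CC=O", "solvent": "acetone", "temperature_c": 25.0},
]
print(find_duplicates(notebook))
```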

However, for computational chemist Professor Jonathan Hirst, this interactive interface presented an interesting opportunity to go beyond basic record-keeping functionality and begin to challenge users on the design of their experiments. In 2023, his team at the University of Nottingham launched AI4Green, the first electronic lab notebook with a central focus on sustainability.1 The software integrates the core functions of an ELN with a panel of simple apps and AI tools to calculate the green metrics of a planned reaction and propose sustainable alternatives where appropriate.

The user first sketches the intended reaction, adding key details such as reagents and quantities to the accompanying table. The software then automatically populates the rest of the table with hazards and chemical data imported from external databases, highlighting any particular safety or sustainability concerns. The associated summary also evaluates various other aspects of the reaction, including the reagents, temperature, catalyst recovery, and isolation method, prompting the user to consider each component and how it influences the sustainability.
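One of the simplest green metrics such a summary can report is atom economy, the molecular weight of the desired product expressed as a percentage of the combined molecular weight of all reactants. The snippet below is a minimal sketch of that calculation, not AI4Green's own implementation.

```python
def atom_economy(product_mw: float, reactant_mws: list[float]) -> float:
    """Atom economy (%) = MW of desired product / sum of reactant MWs x 100.

    A higher value means more of the starting material ends up in the
    product rather than as waste.
    """
    total_reactant_mw = sum(reactant_mws)
    if total_reactant_mw <= 0:
        raise ValueError("Reactant molecular weights must sum to a positive value")
    return 100.0 * product_mw / total_reactant_mw

# Illustrative example: an esterification where water is the only by-product.
# Acetic acid (60.05) + ethanol (46.07) -> ethyl acetate (88.11) + water (18.02)
print(f"{atom_economy(88.11, [60.05, 46.07]):.1f}%")  # ~83%
```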

In particular, solvents are a key target of these interventions—it’s estimated that as much as 90% of waste produced during pharmaceutical manufacture consists of solvent, the majority of which is ultimately incinerated.2,3 “We have some nice solvent selection tools where you can find a similar solvent with similar properties which is more environmentally friendly, and you can compare pairs of solvents side by side using our flash cards,” said Hirst.4
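The underlying idea of such a solvent-selection tool can be sketched very simply: describe each solvent by a short vector of physical properties, then look for nearby solvents that carry a better sustainability ranking. The property values and greenness scores below are invented placeholders, not AI4Green's data.

```python
import numpy as np

# Illustrative property vectors: [boiling point (degC), dielectric constant, logP],
# plus an invented greenness score (higher = preferable). Real tools draw these
# values from curated solvent-selection guides.
SOLVENTS = {
    "dichloromethane": {"props": [40, 8.9, 1.3], "green_score": 2},
    "ethyl acetate":   {"props": [77, 6.0, 0.7], "green_score": 8},
    "2-MeTHF":         {"props": [80, 6.2, 1.1], "green_score": 7},
    "toluene":         {"props": [111, 2.4, 2.7], "green_score": 5},
}

def greener_alternatives(query: str, solvents: dict = SOLVENTS) -> list[tuple[str, float]]:
    """Rank greener solvents by similarity (Euclidean distance on scaled properties)."""
    names = list(solvents)
    props = np.array([solvents[n]["props"] for n in names], dtype=float)
    scaled = (props - props.mean(axis=0)) / props.std(axis=0)
    q = scaled[names.index(query)]
    query_score = solvents[query]["green_score"]
    candidates = [
        (n, float(np.linalg.norm(scaled[i] - q)))
        for i, n in enumerate(names)
        if n != query and solvents[n]["green_score"] > query_score
    ]
    return sorted(candidates, key=lambda pair: pair[1])

print(greener_alternatives("dichloromethane"))  # nearest greener options first
```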

Post-experiment, AI4Green also enables the user to evaluate their reactions and draw trends from their data to identify a greener solvent for future iterations, specifically using the interactive principal component analysis (PCA) tool.5 PCA is a data representation technique that simplifies complex data sets into a more readable form; chemists often use this method to visualize the relationship between different solvent properties. Usually, this is a one-way process based on existing data, but Hirst’s interactive tool lets the researcher incorporate their own empirical observations into this representation. “The user can drag points together and then the model will update the plot in a mathematically rigorous fashion within those constraints to help them identify a green alternative solvent from the data that they have,” he explained.
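For readers unfamiliar with PCA in this setting, the sketch below shows the conventional, non-interactive step: standardizing a table of solvent properties and projecting it onto two principal components so that similar solvents sit close together. The property matrix is a made-up placeholder, and the drag-to-update, constrained behaviour Hirst describes is not reproduced here.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Illustrative solvent property matrix:
# columns = [boiling point (degC), dielectric constant, logP, viscosity (cP)]
solvent_names = ["water", "ethanol", "acetone", "toluene", "ethyl acetate"]
properties = np.array([
    [100, 80.1, -1.4, 0.89],
    [78,  24.5, -0.3, 1.07],
    [56,  20.7, -0.2, 0.31],
    [111,  2.4,  2.7, 0.56],
    [77,   6.0,  0.7, 0.42],
])

# Standardize so every property contributes on the same scale, then project
# onto the two directions of greatest variance.
scores = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(properties))

for name, (pc1, pc2) in zip(solvent_names, scores):
    print(f"{name:14s} PC1={pc1:+.2f}  PC2={pc2:+.2f}")
```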

AI is becoming an increasingly important part of these supporting add-ons, and since its launch, AI4Green has incorporated a number of other tools developed for chemists, including AstraZeneca’s open-source AI route-scouting software, which helps users evaluate different pathways when planning a new chemical synthesis.6,7 The group is now working on a machine learning model for lifecycle analysis, which will ultimately give chemists a much broader overview of the impacts of their choices, from reagent sourcing through to purification.

But while the coding behind the software is complex, the user interface isn’t. Hirst has designed the ELN with the needs of the average organic chemist in mind. “We’ve deployed it live on the cloud so anyone with a web browser can register and use it with no specific skills required at all – we’ve worked hard to make it intuitive,” he said. “The biggest barrier is really mindset and chemists’ willingness to do something different from what they’ve done before. I think creating a more open-source environment is going to be one of the critical ways to drive sustainability and help people develop confidence in AI suggestions.” 

AI apps assist scientists in making new materials

Taking things up a notch, AI applications can more actively guide the direction of research, helping scientists target the most productive experiments. Data-driven models can dramatically accelerate the discovery process, spotting patterns, analyzing data, and identifying the most impactful variables. Such tools are particularly valuable where there is a huge amount of theoretical research space to explore.

The field of sustainable concrete design is a perfect example. Concrete, and particularly the cement binder, is a huge contributor to global CO2 emissions, with an estimated 8% of the annual total coming from the manufacture of cement products.8 Research groups around the world are investigating alternative formulations to reduce or replace this problematic ingredient with greener alternatives (including fly ash, biochar, and even coffee grounds), balancing sustainability considerations against mechanical properties and practical factors like cost.9,10,11 However, the sheer volume of possible combinations, in addition to the extended experimental times needed to validate properties such as compressive strength, restricts the rate of progress in this area.

Sequential learning, which combines a machine learning model with a decision-making rule to extrapolate from initial data, was therefore an obvious tool to streamline this process, reasoned materials informatician Christoph Völker, now head of industrial AI at Iteratec. In 2021, while based at Bundesanstalt für Materialforschung und -prüfung in Germany, Völker and his team developed a sequential learning program to evaluate alkali-activated binders as an alternative to cement and found several suitable candidates within just 11 experiments.12

But despite this success, and subsequent applications of the same methodology, relatively few researchers in the sustainable concrete space adopted AI-augmented approaches.13 The challenge, suggested Völker, is that actually building and coding these models is a technically demanding task, a barrier that hinders many researchers from employing artificial intelligence in their own work.

Aiming to democratize these tools, Völker’s team therefore developed their sequential learning program into an open-source, user-friendly app called SLAMD (Sequential Learning App for Materials Discovery).14 The user first outlines the desired properties of their concrete formulation, inputting existing experimental results and relevant literature as basic training data. The program then evaluates this initial information and suggests the most promising experiments to try next, according to various factors weighted by the researcher. Once these experiments are complete, the data is fed back into the system for a second iteration, which provides even more targeted suggestions.

This cycle of inputting training data, machine learning analysis, and validation by experiment rapidly focuses the investigation on the most impactful variables and optimizes the approach to reduce the overall volume of experiments. Crucially, this smaller experimental burden not only accelerates discovery, but also decreases the environmental impact of the research process itself, requiring fewer resources, less power, and less money. 
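A minimal sketch of this loop, under assumptions of my choosing rather than SLAMD's actual implementation, is shown below: a Gaussian-process surrogate is fitted to the experiments run so far, and an expected-improvement rule picks the next candidate formulation to test. The design space, the target property, and the run_experiment stand-in are all invented for illustration.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)

# Hypothetical design space: binder fractions of [cement, fly ash, biochar].
candidates = rng.dirichlet(np.ones(3), size=200)

def run_experiment(x: np.ndarray) -> float:
    """Stand-in for a real lab measurement (e.g., 28-day compressive strength)."""
    return 40 - 25 * x[0] + 10 * x[1] + 5 * x[2] + rng.normal(0, 0.5)

# Seed the loop with a handful of initial experiments (the training data).
X = candidates[:5].copy()
y = np.array([run_experiment(x) for x in X])

for _ in range(10):
    gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
    mu, sigma = gp.predict(candidates, return_std=True)

    # Expected improvement over the best result observed so far.
    best = y.max()
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - best) / sigma
    ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

    next_x = candidates[np.argmax(ei)]           # most promising candidate
    X = np.vstack([X, next_x])
    y = np.append(y, run_experiment(next_x))     # "validate by experiment"

print(f"Best formulation after {len(y)} experiments: {X[np.argmax(y)]} -> {y.max():.1f}")
```

In practice, each call to run_experiment would be a physical test, so the value of the loop lies in how few of those calls are needed.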

Generative AI and digital twins accelerate clinical trials

The time, money, and sustainability gains provided by AI solutions are most significant for large and experimentally complex studies, such as clinical trials. “Trials are the last stage of drug discovery and the majority of the drug development costs will go to this part,” said Jimeng Sun, a computer science professor at the University of Illinois Urbana-Champaign and co-founder of Keiji AI.

Over the course of the pharmaceutical development process, an initial screen of hundreds or thousands of compounds is whittled down to a final pool of just three or four candidates, but even this small handful is too expensive and too slow to thoroughly investigate in vivo.

Fortunately, advances in AI are already informing these critical scientific and financial decisions, with prediction and analytics models guiding the design and direction of clinical trials more efficiently than ever before. “Trial outcome prediction plays a role in what industry calls portfolio management, i.e., prioritizing which candidate is most likely to work. The actual experiment is still necessary, but this is a more systematic way to determine where to invest and which trials to run,” explained Sun.

Traditionally, decision makers would look at the historical track record for trials of similar types of drugs and benchmark from that figure. Machine learning methods, on the other hand, combine information from multiple different sources and use this more complex spread of data to determine which factors are most significant for trial success.

Sun’s team employed this approach in their first prediction model, HINT (Hierarchical Interaction Network), which used training data from over 8,000 past trials to predict the outcome of more than 3,400 recent drug studies.15

“We leveraged multiple sources of data—the molecular structure, the disease indications, the trial protocol—and then we augmented that with some knowledge that’s related to drug discovery, for example, wet lab properties, historical track records, etc,” explained Sun. “All this information was put together through a graph neural network machine learning model which considers the interaction of all these components to finally make a trial prediction.”
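A heavily simplified sketch of that multi-source idea is given below. It fuses invented feature blocks with an off-the-shelf gradient-boosting classifier rather than HINT's graph neural network, and every feature and label is synthetic, purely to show the shape of the workflow.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n_trials = 500

# Invented feature blocks standing in for the data sources described above.
molecule = rng.normal(size=(n_trials, 8))    # e.g., molecular descriptors
disease  = rng.normal(size=(n_trials, 4))    # e.g., indication embeddings
protocol = rng.normal(size=(n_trials, 6))    # e.g., enrolment size, phase, duration

X = np.hstack([molecule, disease, protocol])  # simple feature fusion

# Synthetic outcomes loosely driven by a mix of the blocks, just for demonstration.
logits = molecule[:, 0] + 0.5 * disease[:, 1] - 0.8 * protocol[:, 2]
y = (logits + rng.normal(scale=0.5, size=n_trials) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier().fit(X_train, y_train)
print(f"Held-out accuracy on synthetic data: {model.score(X_test, y_test):.2f}")
```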

Notably, this initial model correctly predicted the success of Merck’s Sitagliptin (diabetes) and Bayer’s Aflibercept (glaucoma). It also anticipated the costly failures of the promising drugs Entresto (heart failure) and Fevipiprant (asthma), which cost an estimated $240 million in unsuccessful trials.15 The team later developed a second iteration of this model called SPOT (Sequential Predictive mOdeling of clinical Trial outcome), which weights input data according to time and therefore aligns predictions more closely with the structures and protocols of modern trials.16

Building on the success of HINT and SPOT, they most recently reported the Clinical Trial Outcome (CTO) benchmark, which establishes a next-generation dataset with over 125,000 trials, richer multimodal features, and continuous temporal updates.17 This enables more robust, forward-looking evaluation of trial outcome prediction models under real-world distribution shifts, setting a new standard for scalable and deployable clinical AI, said Sun.

But even with a solid prediction of success, the cost and difficulty of recruiting suitable patients for a trial can still delay advances in healthcare. A robust phase three trial requires a few thousand patients, split roughly 50:50 between treatment and control arms. Studying each individual patient consequently amounts to a huge investment in both time and money, and the challenge is further compounded for rare or aggressive diseases, where even finding sufficient patients for a valid trial can take years. “In these cases, sometimes the control arm is not even implemented at all because the patients are just so rare. And of course, for individual patients on the trials, they will probably prefer the treatment arms,” said Sun. “But from a scientific point of view, we do still need the control arms.”

One emerging solution is the digital twin, a dynamic virtual replica of a particular individual patient that can simulate that person’s health trajectory under different treatment regimens. “This is especially useful for control arms. Instead of using a real patient as a control arm, we can simulate the patient’s trajectory and compare that directly to the same patient receiving treatment,” explained Sun. “This reduces the trial recruitment process. You don’t need as many patients so it can speed up the trials and also put the patients on the new treatment options.”

In 2023, Sun’s team reported their first digital patient twin method, TWIN, a generative model that combines historic data from electronic health records (including details of prescribed medications, existing treatment regimens, and any adverse effects) and uses these to simulate various patient outcomes under different conditions.18 “To train a digital twin model, you code based on a cohort of patients to see what happened and the model learns from that large number,” Sun explained. “Once it’s trained, the application is then used for individual patients to simulate what will happen to that patient.”
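The simulation step itself can be sketched in a few lines: fit a simple next-visit model on a cohort, then roll it forward from one patient's baseline to produce treated and untreated trajectories. The linear dynamics, the invented measurements, and the fixed treatment effect below bear no relation to TWIN's generative model; they only illustrate the idea of a counterfactual control arm.

```python
import numpy as np

rng = np.random.default_rng(42)

# Invented cohort: 200 patients, 12 visits, 3 measurements per visit
# (e.g., a lab value, a symptom score, a vital sign).
n_patients, n_visits, n_feats = 200, 12, 3
cohort = rng.normal(size=(n_patients, n_visits, n_feats))

# "Training": fit a linear next-visit model, state_next ~ state_prev @ A.
prev = cohort[:, :-1, :].reshape(-1, n_feats)
nxt = cohort[:, 1:, :].reshape(-1, n_feats)
A, *_ = np.linalg.lstsq(prev, nxt, rcond=None)

def simulate(baseline: np.ndarray, steps: int, treated: bool) -> np.ndarray:
    """Roll the fitted model forward from a single patient's baseline visit."""
    traj = [baseline]
    for _ in range(steps):
        state = traj[-1] @ A
        if treated:
            state = state + np.array([0.3, -0.2, 0.0])  # invented treatment effect
        traj.append(state + rng.normal(scale=0.05, size=n_feats))
    return np.array(traj)

baseline = cohort[0, 0]
control_twin = simulate(baseline, steps=11, treated=False)   # simulated control arm
treated_twin = simulate(baseline, steps=11, treated=True)
print("Final-visit difference (treated - control):", treated_twin[-1] - control_twin[-1])
```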

This simulation approach is already being trialed by pharma companies, and Sun hopes this method can help developers design and refine their trials in the future. He is currently looking at reducing the training data demand for producing these patient-specific models. “At the moment, if you want to simulate a patient trajectory, you need patient-level data to do that,” said Sun. “We’re working on whether we can leverage publications about other trials with aggregate statistics. Ultimately, we want to reverse engineer a digital twin model that can produce individual patient-level data from the statistics of a cohort.”

 

Regardless of research field, there are tools available at all levels of complexity, making the integration of AI into scientific workflows accessible to everyone. This isn’t just relevant for researchers—departmental support staff can also incorporate these tools and systems into their own work, for example, introducing ELNs to teaching labs or developing a digital twin of building systems to optimize heating, lighting, and air flow. As more people familiarize themselves with these tools, the transformative impact of AI in science will grow, leading to a more efficient and sustainable future for research. 


