Machine learning ensemble selects quantum chemistry methods, predicting their errors relative to CCSD(T)/CBS to within 0.1 kcal/mol

Accurately calculating interactions between molecules is important for many scientific fields, but these calculations often require large amounts of computational resources. Georgia Tech’s Austin M. Wallace, C. David Sherrill, and Giri P. Krishnan are tackling this challenge with a new framework that selects the most suitable computational method for a given task. Their approach employs machine learning models to predict how different quantum chemistry methods perform relative to highly accurate but computationally expensive benchmarks. This allows researchers to identify efficient and reliable methods that significantly reduce computational cost, with prediction errors of less than 0.1 kcal/mol. The team’s work also shows that machine learning can do more than improve efficiency: it can expose the relationships between different computational theories and improve our understanding of molecular interactions.

This study addresses the challenge of choosing a computational method that balances accuracy against computational cost when assessing how molecules interact. The team trained an ensemble of Δ-ML models on features extracted from a pre-trained atom-pairwise neural network. Each model predicts the error of a given method, that is, the difference between its result and that of another method, including the highly accurate but computationally expensive “gold standard”: coupled cluster with single, double, and perturbative triple excitations at the estimated complete basis set limit, CCSD(T)/CBS. The core of the approach is to predict the difference between levels of theory rather than the absolute interaction energies directly, reusing information from the pre-trained network to reduce computational load and improve predictive power.
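The Δ-learning idea described above can be sketched in a few lines. This is a toy illustration only: the energies are synthetic, the single scalar feature stands in for the features the team extracts from a pre-trained atom-pairwise network, and a straight-line fit stands in for their models. The error is also measured on the same points used for fitting, so the numbers say nothing about real performance.

```python
# Toy delta-learning sketch with synthetic numbers (not data from the paper).
# A cheap method's interaction energy is corrected by a learned estimate of
# its error relative to a reference method.

def fit_line(xs, ys):
    """Ordinary least squares fit of y ~ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_absolute_error(predicted, reference):
    """Average of |predicted - reference| over all systems."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)


# Hypothetical one-dimensional feature per dimer (an assumption made for
# brevity; the study uses neural-network features instead).
features = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
e_reference = [-1.20, -2.10, -3.30, -4.05, -5.20, -6.10]  # kcal/mol, made up
e_cheap = [-0.90, -1.55, -2.50, -3.05, -3.90, -4.60]      # kcal/mol, made up

# The training target is the error of the cheap method, not the energy itself.
deltas = [c - r for c, r in zip(e_cheap, e_reference)]
slope, intercept = fit_line(features, deltas)

# Corrected energy = cheap energy minus the predicted error.
e_corrected = [c - (slope * x + intercept) for c, x in zip(e_cheap, features)]

mae_before = mean_absolute_error(e_cheap, e_reference)
mae_after = mean_absolute_error(e_corrected, e_reference)
print(f"MAE before correction: {mae_before:.3f} kcal/mol")
print(f"MAE after correction:  {mae_after:.3f} kcal/mol")
```

Learning the error rather than the energy is the design choice that matters here: the error is usually a smaller and smoother quantity than the interaction energy itself, so a simple model can capture it.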

The researchers demonstrated the framework on an extended BioFragment dataset consisting of interaction energies of common biomolecular fragments and small organic dimers, which allows testing across a variety of chemical systems. The models consistently achieved mean absolute errors below 0.1 kcal/mol, regardless of the quantum chemistry method being evaluated and even when comparing very different theoretical approaches. This level of accuracy improves the ability to predict interaction energies reliably without resorting to prohibitively expensive calculations. Furthermore, by analyzing all of the Δ-ML models, the scientists identified groupings of methods that align with established theoretical expectations, providing evidence that machine learning models can effectively learn corrections from one level of theory to another. In particular, the models can use low-cost calculations to predict results comparable to those of higher-level techniques, providing a path to balance accuracy and efficiency.

By combining error prediction and calculation time estimation, this framework allows users to choose the appropriate method based on the required accuracy and available resources, rather than relying on chemical intuition. The ensemble’s predictions of each method’s error relative to the gold standard, CCSD(T)/CBS, stay below 0.1 kcal/mol in mean absolute error regardless of the method used. The researchers acknowledge that expanding the dataset with greater chemical diversity and fewer levels of theory could further increase the generalizability of the framework.
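One way to picture the selection step is as a filter followed by a minimum: keep the methods whose predicted error is within tolerance, then take the cheapest. The sketch below assumes exactly that rule; the `MethodEstimate` class, the `select_method` function, the method labels, and all numbers are hypothetical and do not come from the study.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MethodEstimate:
    name: str                 # placeholder label, not a real level of theory
    predicted_error: float    # predicted error vs. the reference, kcal/mol
    predicted_seconds: float  # estimated run time for the target system


def select_method(candidates: List[MethodEstimate],
                  max_error: float) -> Optional[MethodEstimate]:
    """Return the fastest candidate whose predicted error is within max_error."""
    acceptable = [m for m in candidates if m.predicted_error <= max_error]
    if not acceptable:
        return None  # nothing meets the tolerance; the caller must relax it
    return min(acceptable, key=lambda m: m.predicted_seconds)


# Illustrative values only.
candidates = [
    MethodEstimate("method_A", 1.20, 5.0),
    MethodEstimate("method_B", 0.35, 60.0),
    MethodEstimate("method_C", 0.30, 400.0),
    MethodEstimate("method_D", 0.05, 9000.0),
]

choice = select_method(candidates, max_error=0.5)
print(choice.name if choice else "no method meets the tolerance")
```

Tightening `max_error` pushes the choice toward more expensive methods, which is the accuracy versus cost trade-off the framework is meant to make explicit.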

The study showed that these machine learning models can accurately estimate errors across a wide range of levels of theory and identify the computationally efficient approach for a desired accuracy using only a subset of the available data. Experiments on an extended BioFragment dataset containing interaction energies of common biomolecular fragments and small organic dimers confirmed the accuracy of the framework. Analyzing all of the machine learning models, the team found groupings of methods that are consistent with established theoretical expectations, evidence that these models can effectively learn corrections between any two levels of theory. The result gives researchers a practical tool for navigating the complex landscape of computational chemistry methods.
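The article does not say how the method groupings were identified. As one plausible illustration, the sketch below assumes a symmetric table of pairwise mean absolute differences between methods and groups together any methods linked by a difference at or below a threshold (connected components). The `group_methods` function, the method labels, and the numbers are all hypothetical.

```python
def group_methods(names, pairwise_mae, threshold):
    """Group methods linked by a pairwise MAE at or below threshold.

    pairwise_mae maps frozenset({a, b}) to the mean absolute difference
    between methods a and b, so the lookup is symmetric.
    """
    unvisited = set(names)
    groups = []
    for start in names:
        if start not in unvisited:
            continue
        unvisited.remove(start)
        group, stack = [], [start]
        while stack:  # depth-first walk over "close enough" neighbours
            current = stack.pop()
            group.append(current)
            for other in names:
                if (other in unvisited
                        and pairwise_mae[frozenset((current, other))] <= threshold):
                    unvisited.remove(other)
                    stack.append(other)
        groups.append(sorted(group))
    return groups


# Illustrative values only (kcal/mol).
names = ["method_A", "method_B", "method_C", "method_D"]
pairwise_mae = {
    frozenset(("method_A", "method_B")): 0.04,
    frozenset(("method_A", "method_C")): 0.90,
    frozenset(("method_A", "method_D")): 1.10,
    frozenset(("method_B", "method_C")): 0.85,
    frozenset(("method_B", "method_D")): 1.05,
    frozenset(("method_C", "method_D")): 0.06,
}

groups = group_methods(names, pairwise_mae, threshold=0.1)
print(groups)
```

With these made-up values the first two methods end up in one group and the last two in another, mimicking the kind of family structure (for example, methods sharing a theoretical foundation) that the analysis is said to recover.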

Accurately predicting the performance of different techniques allows scientists to choose the most appropriate approach for their specific needs, balancing accuracy and computational feasibility. The framework’s ability to map between 80 different levels of theory offers insight into the relationships between these techniques and may lead to the development of more accurate and efficient computational methods. The results support the potential of machine learning to accelerate and improve molecular modeling, with implications for fields such as drug discovery and materials science.

👉 More information
🗞 Δ-ML ensemble for selecting quantum chemistry methods for calculating intermolecular interactions
🧠ArXiv: https://arxiv.org/abs/2511.17753
