Machine learning with quadrupole moments improves molecular electrostatic potential prediction on the QM9 dataset

Accurately describing the electrostatic properties of molecules is critical to understanding how they interact, and researchers are increasingly turning to machine learning to speed up these calculations. Kadri Muuga, Lisanne Knijff and Chao Zhang from the Department of Chemistry, Ångström Laboratory, at Uppsala University investigated whether machine learning models trained on dipole and quadrupole moments can effectively predict the electrostatic potential of molecules. Their work shows that incorporating the quadrupole contribution significantly increases the accuracy of these models and improves the prediction of intermolecular interactions. The finding, validated on both the QM9 and SPICE datasets, highlights the role of quadrupole moments in developing faster and more efficient computational methods for modeling chemical systems and exploring large regions of chemical space. The research provides a route to rapid access to the electrostatic potential of molecules, which could benefit fields such as drug discovery and materials science.

The team trained PiNet2, an equivariant convolutional neural network architecture, on dipole and quadrupole moments. Analysis of the established QM9 dataset shows that incorporating quadrupole contributions into machine learning models significantly improves the recovery of molecular electrostatic potentials (MEPs) compared with models that use only dipole moments. The trend is further supported by results on the SPICE dataset, which covers a fairly wide range of organic chemistry space. The study highlights quadrupole moments as a key target for machine learning models designed to give rapid access to MEPs, which provide a concise way to describe molecular interactions.

Summary

This research paper describes the development and validation of PhysNet, a novel machine learning model designed to predict multiple molecular properties directly from atomic structure. Unlike many existing approaches that primarily focus on energy and forces and require separate post-processing steps for other properties, PhysNet can predict energy, interatomic forces, dipole moments, and partial charges simultaneously within a single unified framework. This direct and integrated predictive capability represents a major advance in the application of machine learning to molecular modeling.

The model is trained and evaluated using two main datasets. The first is QM9, which contains approximately 134,000 small organic molecules with quantum-mechanically calculated properties and serves as a standard benchmark in the field. The second is a larger and more diverse SPICE dataset, consisting of more than 270,000 entries including drug-like molecules and peptides, particularly relevant to biomolecular applications. The PhysNet architecture is specifically designed to handle the tensor-like nature of molecular properties, ensuring that physical symmetries and interactions are properly captured. Training relies on high-quality reference data generated from density functional theory calculations using the ωB97M-V functional with the def2-SVP basis set, and model optimization is performed using the Adam algorithm.

In terms of performance, PhysNet achieves state-of-the-art or highly competitive accuracy across both datasets, as measured by the root mean square error of energy, force, dipole moment, and partial charge. The model performs particularly well in predicting dipole moments and atomic charges, which are often difficult properties for machine learning potentials. PhysNet also shows good transferability and generalizes effectively to molecular systems not included in the training set. Comparisons with established techniques such as ANI-1x, SchNet, and DimeNet++ further highlight the robustness and competitiveness of the approach.

Overall, this work introduces a powerful and versatile machine learning potential that significantly expands the range of directly predictable molecular properties. PhysNet provides a more complete and physically meaningful description of molecular systems by integrating dipole moment and partial charge predictions with energies and forces. Furthermore, the introduction and use of the SPICE dataset represents a valuable contribution to the community, providing a rich resource for the development and benchmarking of next-generation machine learning models in computational chemistry and materials science.

Quadrupole moments enhance electrostatic potential prediction

Scientists have made significant progress in predicting molecular electrostatic potentials (MEPs) using machine learning (ML) models. This study focused on the ability of PiNet2, an equivariant convolutional architecture, to infer MEPs from the dipole and quadrupole moments of molecules. Experiments revealed that incorporating quadrupole contributions into the ML model significantly improves MEP recovery compared with models relying only on dipole moments. The improvement is observed consistently on both the QM9 and SPICE datasets, demonstrating the robustness of the approach. The team evaluated seven PiNet2-based ML models that differed in their use of atomic charge (AC) and atomic dipole (AD) predictions and of dipole and quadrupole moments as training targets.
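To make the link between atomic charge (AC) and atomic dipole (AD) predictions and the MEP concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the MEP is evaluated in atomic units as a sum of point-charge and point-dipole contributions centred on the atoms; the function name `mep_at_point` and its arguments are hypothetical.

```python
import math


def mep_at_point(point, positions, charges, dipoles=None):
    """Electrostatic potential at `point` (atomic units).

    Assumed model: each atom i carries a point charge q_i and,
    optionally, a point dipole mu_i located at its position.
    """
    potential = 0.0
    for i, (pos, q) in enumerate(zip(positions, charges)):
        # Vector from the atom to the evaluation point and its length.
        d = [p - a for p, a in zip(point, pos)]
        r = math.sqrt(sum(c * c for c in d))
        # Monopole term: q / r.
        potential += q / r
        if dipoles is not None:
            # Dipole term: (mu . d) / r^3.
            mu = dipoles[i]
            potential += sum(m * c for m, c in zip(mu, d)) / r**3
    return potential
```

For a unit charge at the origin, `mep_at_point((2.0, 0.0, 0.0), [(0.0, 0.0, 0.0)], [1.0])` returns 0.5. In this picture the ML model only has to supply the per-atom charges and dipoles; the potential itself follows from classical electrostatics.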

The results show that the AC-DQ model, which fits AC predictions to both dipole and quadrupole moments with a loss weight ratio of 1:1, delivers a significant improvement. The AC-DQ-dw100 variant uses a 100:1 dipole-to-quadrupole loss weight ratio to further optimize performance. The results confirm that regressing molecular multipoles, which are experimentally observable quantities, provides a general strategy for rapidly inferring MEPs from ML models without direct training on MEPs or electron density. For the QM9 dataset, which consists of 133,885 stable organic molecules, the researchers calculated dipole and quadrupole moments at the B3LYP/6-31G(2df,p) level of theory.
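The loss weighting described above can be illustrated with a short sketch. The exact loss used in the study is not given here, so this assumes a weighted sum of mean squared errors on the dipole and quadrupole components; `molecular_dipole`, `mean_squared_error`, `multipole_loss` and the weight parameters are hypothetical names.

```python
def molecular_dipole(positions, charges):
    """Dipole vector mu_a = sum_i q_i * r_ia built from atomic point charges."""
    return [sum(q * pos[a] for pos, q in zip(positions, charges))
            for a in range(3)]


def mean_squared_error(pred, ref):
    """Mean squared error over two equally sized flat sequences."""
    return sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref)


def multipole_loss(dipole_pred, dipole_ref, quad_pred, quad_ref,
                   dipole_weight=1.0, quad_weight=1.0):
    """Assumed form: weighted sum of dipole and quadrupole MSE terms.

    quad_pred and quad_ref are flat sequences of quadrupole components.
    """
    return (dipole_weight * mean_squared_error(dipole_pred, dipole_ref)
            + quad_weight * mean_squared_error(quad_pred, quad_ref))
```

Under this assumption, a 1:1 setting corresponds to `dipole_weight=1.0, quad_weight=1.0`, and a 100:1 dipole-to-quadrupole ratio to `dipole_weight=100.0, quad_weight=1.0`, which pushes the optimizer to keep dipole errors small while still fitting the quadrupole.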

After data filtering, 9,723 molecules were excluded due to mismatches, leaving the remainder available for model training. The Gaussian quadrupole moments are traceless and were scaled by a factor of 3 to match the quadrupole equations used in the study. Tests show that the same large PiNet2 model size that previously succeeded in predicting dipoles is also optimal for this combined approach. Extending the work, the team applied the methodology to a SPICE 2.0 dataset of 91,420 organic structures. The data were filtered to include only the lowest-energy conformers, providing a chemical space complementary to QM9. The tests show that a medium-sized model is sufficient for training on the SPICE dataset, highlighting the adaptability of the approach. The technique offers rapid and accurate MEP prediction, with potential applications in drug discovery, materials science, and computational chemistry.
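The factor of 3 mentioned for the traceless quadrupole moments reflects two common conventions for the same tensor. The sketch below is an illustration under the assumption that the two conventions differ only by that overall factor; it is not the study's code, and `quadrupole_from_charges` and its `scale` parameter are hypothetical.

```python
def quadrupole_from_charges(positions, charges, scale=3.0):
    """Traceless quadrupole tensor from atomic point charges.

    scale=3.0: Q_ab = sum_i q_i * (3 r_a r_b - |r|^2 delta_ab)
    scale=1.0: the same tensor divided by 3.
    """
    Q = [[0.0] * 3 for _ in range(3)]
    for pos, q in zip(positions, charges):
        r2 = sum(c * c for c in pos)
        for a in range(3):
            for b in range(3):
                delta = 1.0 if a == b else 0.0
                # Removing r2 * delta / 3 makes the tensor traceless.
                Q[a][b] += scale * q * (pos[a] * pos[b] - r2 * delta / 3.0)
    return Q
```

For a linear arrangement with charges +1 at z = ±1 and -2 at the origin, the `scale=3.0` convention gives diagonal elements close to (-2, -2, 4), which sum to zero, while `scale=1.0` gives one third of those values. Mixing the two conventions between reference data and model output would bias the quadrupole loss, which is why a consistent rescaling matters.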

Quadrupole moments enhance electrostatic potential prediction

This study demonstrates that machine learning models built on equivariant convolutional architectures can effectively infer molecular electrostatic potentials (MEPs). Importantly, the inclusion of quadrupole moments significantly improves the accuracy of MEP recovery compared with models relying only on dipole moments, a result observed consistently on both the QM9 and SPICE datasets. The results suggest that the quadrupole moment is a more effective primary target for machine learning-based atomic charge models than the dipole moment alone. While dipole moments traditionally dominate the understanding of neutral molecules, incorporating quadrupole information enhances the predictive power of models designed to reproduce MEPs.
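One way to see why quadrupole information matters for neutral molecules is the standard far-field multipole expansion of the potential, written here in atomic units. This is textbook electrostatics offered as context, not a formula taken from the study; the point-charge form of the quadrupole tensor uses the factor-of-3 traceless convention.

```latex
V(\mathbf{r}) \approx \frac{q}{r}
  + \frac{\boldsymbol{\mu}\cdot\mathbf{r}}{r^{3}}
  + \frac{1}{2}\sum_{\alpha,\beta}
    \frac{Q_{\alpha\beta}\, r_{\alpha} r_{\beta}}{r^{5}},
\qquad
Q_{\alpha\beta} = \sum_i q_i
  \left( 3\, r_{i\alpha} r_{i\beta}
  - |\mathbf{r}_i|^{2}\, \delta_{\alpha\beta} \right)
```

For a neutral molecule the total charge q vanishes, so the dipole term leads and the quadrupole term is the next correction. For molecules whose dipole moment is zero by symmetry, such as CO2 or benzene, the quadrupole term is the leading contribution.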

Like dipoles, quadrupole moments can be measured experimentally, and they require far less data storage than the full charge density. The study therefore provides a practical approach to rapidly accessing MEPs, a critical element in solvent and electrolyte design. The authors acknowledge that optimizing the quadrupole moments may slightly reduce the accuracy of the predicted dipole moments. Future research could explore strategies to improve both properties simultaneously. This research was supported by funding from the European Research Council, the Wallenberg Initiative Materials Science for Sustainability, the Swedish Research Council, and computational resources provided by the Swedish National Academic Infrastructure for Supercomputing.

👉 More information
🗞 Molecular electrostatic potentials from machine learning models for dipole and quadrupole prediction
🧠ArXiv: https://arxiv.org/abs/2601.10320


