Datasets based on phonon information improve machine learning predictions of material properties



Predicting material properties with machine learning is a powerful and cost-effective alternative to traditional computational methods, but the accuracy of these models depends on the quality of the data used to train them. Pol Benítez, Cibrán López, Edgardo Saucedo, and Claudio Cazorla from the Polytechnic University of Catalonia, along with Teruyasu Mizoguchi from the University of Tokyo, investigated how the way training data is generated affects model performance. Their work shows that models trained on datasets informed by the physics of lattice vibrations, known as phonons, consistently outperform models trained on randomly generated data, even when using fewer data points. The finding challenges the assumption that larger datasets always lead to better predictions, suggests efficient strategies for building high-quality training data, and has implications for accelerating materials discovery in areas such as energy conversion. The team's explainability analysis further reveals that physically informed models give more weight to chemically meaningful bonds, which underlines the value of building physical principles into data generation for both accuracy and understanding.

Predicting anharmonic material properties using machine learning

The study applies machine learning to the prediction of material properties, focusing on antiperovskite materials with potential applications in energy storage and solar cells. The researchers trained graph neural networks on data from ab initio calculations and used them to predict the properties of these materials, often more efficiently than traditional methods allow. The work examines silver-based chalcohalide antiperovskites and offers a route to faster materials discovery and design by avoiding the cost of time-consuming conventional calculations.

Data diversity improves prediction of material properties

The researchers show that the quality and physical relevance of training data are paramount for accurately predicting material properties with graph neural networks. They built two kinds of datasets, one from randomly generated atomic configurations and one informed by lattice vibrations, and trained models on each to predict electronic and mechanical properties under realistic conditions. Models trained on the physically informed datasets, which were constructed from lattice vibration calculations, consistently outperformed models trained on random data, even with fewer data points. They also gave more weight to chemically meaningful bonds when predicting property variations, which highlights the importance of physically guided data generation.
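The contrast between the two kinds of training data can be illustrated with a toy model. The sketch below is not the authors' workflow, which relies on first-principles phonon calculations for real crystals. It assumes a periodic one-dimensional chain of identical atoms joined by harmonic springs, classical thermal statistics, and arbitrary units; every function name and parameter is hypothetical. It draws one set of displacements from the chain's vibrational modes and another set of uncorrelated random displacements with the same overall amplitude, then compares their potential energies.

```python
import numpy as np


def chain_force_constants(n_atoms, k=1.0):
    """Force-constant matrix of a periodic 1D chain with nearest-neighbour springs."""
    eye = np.eye(n_atoms)
    return k * (2 * eye - np.roll(eye, 1, axis=1) - np.roll(eye, -1, axis=1))


def phonon_informed_displacements(K, mass, kT, n_samples, rng):
    """Sample displacements from the classical harmonic thermal distribution."""
    eigvals, eigvecs = np.linalg.eigh(K / mass)  # eigenvalues are squared mode frequencies
    keep = eigvals > 1e-8                        # drop the zero-frequency rigid translation
    sigma = np.sqrt(kT / eigvals[keep])          # thermal spread of each mode amplitude
    q = rng.normal(size=(n_samples, int(keep.sum()))) * sigma
    return q @ eigvecs[:, keep].T / np.sqrt(mass)


def random_displacements(n_atoms, rms, n_samples, rng):
    """Sample uncorrelated Gaussian displacements with a fixed RMS amplitude."""
    return rng.normal(scale=rms, size=(n_samples, n_atoms))


def potential_energy(K, u):
    """Harmonic potential energy 0.5 * u^T K u for each sampled configuration."""
    return 0.5 * np.einsum("si,ij,sj->s", u, K, u)


rng = np.random.default_rng(0)
n_atoms, mass, kT, n_samples = 16, 1.0, 0.01, 2000  # arbitrary illustrative units

K = chain_force_constants(n_atoms)
u_phonon = phonon_informed_displacements(K, mass, kT, n_samples, rng)

# Give the random set the same overall displacement amplitude for a fair comparison.
rms = np.sqrt(np.mean(u_phonon**2))
u_random = random_displacements(n_atoms, rms, n_samples, rng)

e_phonon = potential_energy(K, u_phonon).mean()
e_random = potential_energy(K, u_random).mean()
print(f"mean energy, phonon-informed: {e_phonon:.4f}")
print(f"mean energy, random:          {e_random:.4f}")
```

In this toy setting the phonon-informed samples average close to the equipartition value of kT/2 per vibrational mode, while random displacements of the same amplitude put far more energy into stiff, short-wavelength distortions that a material at that temperature would rarely visit. That is one plausible reading of why physically guided sampling can use a training budget more efficiently, though the toy model does not by itself establish the paper's result.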

Data quality trumps size in materials prediction

Using graph neural networks to predict the properties of antiperovskite materials, the researchers found that data quality matters more than dataset size. They generated a comprehensive dataset of atomic configurations of silver chalcohalides that captures their thermal motion at realistic temperatures. Models trained on datasets informed by lattice vibrations achieved consistently high accuracy and robustness, even with fewer data points. These models also placed greater emphasis on the chemically meaningful bonds that govern bandgap fluctuations, directly linking predictive performance to physical interpretability.

Physically informed data improves material predictions

The study shows that the performance of graph neural network models in materials science depends strongly on data quality rather than on dataset size alone. The researchers compared a model trained on randomly generated atomic configurations with one trained on data informed by lattice vibrations, which give a physically realistic representation of atomic motion. The physically informed model consistently outperformed the model trained on random data when predicting material properties, and it prioritized chemically meaningful bonds in making its predictions. Both results point to the importance of incorporating physical principles into data generation strategies.

👉 More information
🗞 Why physics still matters: Improving machine learning predictions of material properties with phonon-informed datasets
🧠ArXiv: https://arxiv.org/abs/2511.15222


