In this section, we verify our proposed method with several numerical experiments under different tasks and data availability conditions. The data involved in this section are taken from20, comprising 20 (P, V, T, E) points from previous ab initio calculations6 and 25 (V, P) points from the simulation of the Hugoniot experiment (Sharma et al. 2024)20. Though one can choose arbitrary network structures in our proposed method, simple Multi-Layer Perceptrons (MLPs) are selected as P-Net and E-Net for all the following experiments. Compared to PINNs or DeepONets, this choice is more appropriate given the sparse data at hand and its simpler architecture and training. Code for the experiments can be found at our GitHub page: https://github.com/dykuang/DL4EOS (commit b43e688). Unless otherwise specified in specific experiments, the Adam optimizer40 is used for training with an initial learning rate of 0.001. All data are passed as one batch and the maximum number of epochs is set to 10000 (training takes less than a minute on a single laptop CPU: AMD Ryzen 7 PRO 7840HS @ 3.80 GHz). The loss weights for \(Loss_P\) and \(Loss_E\) are both set to 1.0, while the weight for the physical regularization loss \(Loss_{phy}\) is set to 0.001. Early stopping is adopted to avoid overfitting: training stops once both \(Loss_P\) and \(Loss_E\) drop below 0.0001, or when the maximum number of epochs is reached.
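The loss weighting and early-stopping rule described above can be sketched as follows. The weight values, the 0.0001 tolerance, and the epoch budget are the ones quoted in the text; the function names themselves are illustrative, not the repository's API:

```python
# Loss weights and stopping criteria as quoted in the text.
W_P, W_E, W_PHY = 1.0, 1.0, 1e-3
TOL, MAX_EPOCHS = 1e-4, 10_000

def total_loss(loss_P, loss_E, loss_phy):
    """Weighted sum of the pressure, energy and physics-regularization losses."""
    return W_P * loss_P + W_E * loss_E + W_PHY * loss_phy

def should_stop(loss_P, loss_E, epoch):
    """Early stopping: both supervised losses below TOL, or budget spent."""
    return (loss_P < TOL and loss_E < TOL) or epoch >= MAX_EPOCHS
```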
Supervised joint learning with P–V–T–E data
As the first part of the numerical experiments, we would like to establish confidence in the effectiveness of the proposed method on supervised tasks. We perform two sets of numerical experiments here using the aforementioned 20 (P, V, T, E) data points. Since this supervised task has E information available, one only needs to replace the quantities from the Hugoniot (with H as the superscript) in Algorithm 1 (in SI Appendix A) with the corresponding quantities from the data at hand. The first experiment follows the classical 5-fold cross validation (5CV) procedure, where the total data are randomly and evenly split into 5 non-overlapping parts (folds). Each time, 4 of the 5 folds are used for training and the remaining fold serves as the test set. The mean and standard deviation of the prediction errors on the test set are collected over the 5 folds for better statistical comparison among methods.
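The 5CV partition described above amounts to a random, even, non-overlapping split. A minimal sketch (our own helper, not the repository's code) for the 20 data points:

```python
import numpy as np

def five_fold_splits(n_points=20, seed=0):
    """Random, even, non-overlapping 5-fold partition: each fold serves once
    as the test set while the remaining four folds form the training set."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_points), 5)
    return [(np.concatenate(folds[:k] + folds[k + 1:]), folds[k])
            for k in range(5)]
```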
As for the methods included in the benchmark for comparison, we choose the Birch-Murnaghan (BM) and Vinet models1,30,31 for pressure (P) prediction and the Mie-Grüneisen (MG) model2,29 for energy (E) prediction as representatives of the traditional semi-empirical physical framework. For these semi-empirical methods, least-squares regression is applied to the training data to obtain the EOS parameters, which are then used to calculate P or E on the test set1,2,29. We used EosFit741,42, version 7.6, for the fitting process. This free software can be downloaded at http://www.rossangel.com/home.htm. Details of the actual fit can be found in SI Appendix B, C, D and the Supplementary Tables (D1: fitted thermal parameters at each fold from the classical semi-physical models; D2: predictions by semi-physical models at each fold; D3: summarized performance of the compared methods). As a representative of recent machine learning approaches, we pick the physics-informed Gaussian process regression proposed in20 for comparison. In their work, a two-stage fitting scheme is adopted, where an ordinary polynomial regression (PR) with degree \(d=3\) is performed at the preparation stage. The residual is then modeled by their physics-informed Gaussian process regression.
We summarize the prediction performance on the test set in Table 1 for selected methods, with their variations, from the classical semi-empirical models and recent machine learning models. A more detailed collection of all experiments we have performed can be found in the supplementary material (Table D3). Additionally, prediction errors on a log scale from different methods are shown in Fig. 3. We would like to highlight several observations from our numerical results:
-
The performance of the physics-informed Gaussian process regression in20 depends sensitively on the choice of degree d. \(d = 3\) yields the best test accuracy on the dataset used, but such a parameter is usually hard to determine beforehand in practice. In comparison, our proposed neural network does not rely on the first-stage polynomial regression; on its own it already provides satisfactory predictive performance. Even when combined with polynomial regression, as seen in Figure E2 from SI Appendix E, it shows the most robust performance with respect to the degree d.
-
Traditional semi-empirical physical models require particular prior formulas and adequate domain expertise to start with, while EOSNN, as a data-driven approach, can still function properly without relying on these formulas (though they can be conveniently enforced in the framework if needed). The classical and commonly adopted BM and Vinet models1 demonstrate noticeably larger standard deviations in pressure prediction in our 5CV experiments when compared to other methods. Among all models adopted for comparison, the Vinet model appears to have the lowest mean test error in pressure prediction. We further performed a one-sided paired Student’s t-test between our proposed method and the Vinet case and find that the difference is NOT statistically significant, with a p-value \(= 0.1716\). (Student’s t-test is a commonly used hypothesis test from statistics for assessing whether two distributions have the same mean based on gathered observations. We use the scipy.stats package from Python for the relevant calculations in this paper. The p-value tells how likely it is that the observed data could have occurred under the null hypothesis. In this case, the null hypothesis is set as “the two methods perform similarly in terms of mean error.” In common statistical practice, a significance level \(\alpha = 0.05\) or 0.01 is set first, and a p-value smaller than \(\alpha\) leads to rejecting the null hypothesis. A one-sided test is usually the more appropriate choice than a two-sided test when a directional comparison is desired.) The same conclusion of no statistically significant difference holds when comparing our method with the \(d=3\) cases of both polynomial regression and polynomial regression + Gaussian process.
-
In terms of the prediction performance on the energy E, the paired Student’s t-test suggests our method provides a statistically significant (at significance level \(\alpha = 0.01\)) lower error than the adopted MG approach with a p-value \(= 0.002 < \alpha\).
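The one-sided paired t-test used in these comparisons can be reproduced with scipy.stats (version ≥ 1.6 for the `alternative` keyword). The per-fold error values below are made-up placeholders, not the paper’s numbers:

```python
import numpy as np
from scipy import stats

# Illustrative per-fold test errors (placeholders, NOT the paper's results)
err_ours  = np.array([0.10, 0.12, 0.09, 0.11, 0.10])
err_other = np.array([0.20, 0.25, 0.18, 0.23, 0.19])

# One-sided paired t-test: null hypothesis = "equal mean error",
# alternative = "our method has the lower mean error"
t_stat, p_value = stats.ttest_rel(err_ours, err_other, alternative="less")
```

With these placeholder errors the test rejects the null hypothesis at any common significance level; with the paper’s actual fold errors for the Vinet comparison it does not (p = 0.1716).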
Through the benchmark results and the aforementioned observations, we conclude that our proposed method strikes the best balance between the flexibility of data-driven approaches and the interpretability of traditional semi-empirical models. It also demonstrates more robust behavior with respect to the choice of prior knowledge and model parameters (such as the degree d in polynomial regression). It can also be easily extended to incorporate further physical priors or domain knowledge when needed, while still being able to learn directly from data, as shown later in the Case Study section where different physical priors are injected during training.

Prediction performance among different models when data in the lower pressure/temperature region are used for training and the rest for testing. Left: different train/test partitions shown in the P-T space; from region I to IV, the set of data points used for training expands. Middle: test error on pressure in terms of RMSE when data belonging to different regions are used during training. Right: same plot as the middle but showing the test error on energy predictions.
The second evaluation framework splits the data into two parts according to temperature and pressure: a low temperature and low pressure (LTLP) region and the region beyond. We intend to compare the different methods’ extrapolation capability when they are trained with data located in the LTLP subset only. The results are gathered in Fig. 4 (the detailed numerics behind this figure are included in Table F2 from Appendix F in the Supplementary Information). Apart from the traditional semi-empirical models, we only include PR(\(d=3\)) + GP and PR(\(d=3\)) in the results for better visualization, as they show the best performance in their respective categories. More details, including the exact fitted parameters for the benchmarked semi-physical models, are contained in the Supplementary Tables (D4: fitted thermal parameters on each region from the classical semi-physical models; D5: predictions on test data by semi-physical models when fitted on each region; D6: summarized performance of the compared methods per region). The performance on the test points outside the LTLP region measures each method’s capability of extrapolation.
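Such a region-based split reduces to a boolean mask over the (P, T) conditions. A minimal sketch (the thresholds are illustrative, not the paper’s exact region boundaries):

```python
import numpy as np

def region_mask(P, T, p_max, t_max):
    """Training mask for an LTLP-style region: keep only points whose
    pressure and temperature both lie below the given cut-offs. Expanding
    (p_max, t_max) mimics growing the training region from I to IV."""
    return (np.asarray(P) <= p_max) & (np.asarray(T) <= t_max)
```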
We can observe from the gathered results that predictions generally improve as the training data expand from region I to IV, except when predicting E with MGD. This may be because the physics at high pressure (>500 GPa) and high temperature (>6000 K) differ from those in the lower pressure/temperature domain considered during training6, so the MGD method, as a semi-empirical model, does not extrapolate well from the training domain to the targeted test domain. Methods such as PR or PR + GP show quite high error in predicting pressure when region I or II is used for training. Compared with other methods, they exhibit more sensitive behavior when the training data are limited. The semi-empirical MG method demonstrates less optimal energy predictions compared to the data-driven approaches. We also notice that the machine learning methods (including GP and EOSNN) all underperform the traditional methods in pressure prediction when the training data come from region III or region IV. This may be due to the data-driven nature of these methods, since the data at hand in this high pressure/high temperature region (pressure > 500 GPa and temperature > 5500 K) are sparse, as seen in Fig. 4; more data would help improve performance. Our proposed method achieves the best performance when data are most limited, demonstrates the most stable behavior, and delivers decent accuracy on BOTH pressure and energy predictions in every region, comparable to the best that the other models provide. Combined with the previous 5CV experiment, this result suggests that our method can effectively leverage the available data while still respecting the underlying physical principles, and is thus able to generalize well to unseen data, even when trained with limited data.
Partially supervised learning with Hugoniot

Prediction of the deterministic model. Top row from left to right are surfaces \(P=P(V,T), E=E(V,P)\) and \(E=E(V,T)\) on the considered (V, T) domain respectively. Bottom Left: True Pressure versus Predicted Pressure. Bottom Right: True Energy versus Predicted Energy.
In this section, we face a more challenging data availability condition where one only has static P–V–T data and P–V observations gathered along the Hugoniot, without any further information about the energy E. Different from the task in the previous section, where the learning of both P and E is supervised, the E-learning problem here is mostly unsupervised or partially supervised. Briefly speaking, we would like to infer energy information off-Hugoniot (a surface) from restricted observations on-Hugoniot (a curve). The absence of E information off-Hugoniot, as well as the joint learning from multiple data sources, makes the problem even more challenging. Classical approaches would require careful calibration and severe modification for this prediction task, while our proposed method can easily accommodate this data condition. The training strategy follows Algorithm 1, where \(E^H\) along the Hugoniot is calculated according to:
$$\begin{aligned} E^H = E_0 + \frac{1}{2}(P^H+P_0)(V_0-V^H). \end{aligned}$$
(7)
with \(E_0 = -8.9431 \, eV/atom, P_0 = 12 \, GPa\) and \(V_0 = 5.6648 \, \mathring{A}^3/atom\) following20.
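Eq. (7) with the reference state above can be sketched directly. Since P is in GPa, V in Å³/atom and E in eV/atom, the P·V product needs a unit conversion; the factor below (1 GPa·Å³ ≈ 6.2415×10⁻³ eV) is our addition for a dimensionally consistent sketch, not stated in the text:

```python
# 1 GPa * 1 Å^3 = 1e-21 J; dividing by the electron charge gives eV.
EV_PER_GPA_A3 = 1e-21 / 1.602176634e-19  # ≈ 6.2415e-3 eV per GPa·Å³

def hugoniot_energy(P_H, V_H, E0=-8.9431, P0=12.0, V0=5.6648):
    """Rankine-Hugoniot energy relation, Eq. (7):
    E^H = E0 + (1/2)(P^H + P0)(V0 - V^H), with the reference state from the text."""
    return E0 + 0.5 * (P_H + P0) * (V0 - V_H) * EV_PER_GPA_A3
```

At the reference state (P = P0, V = V0) the relation returns E0 by construction, which provides a quick sanity check.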
Once the model is properly trained, it can be used to query P and E at arbitrary (V, T) locations of interest in the considered domain. Figure 5 (top row) provides a visualization of the predicted surfaces P(V, T), E(V, P) and E(V, T). Specifically, we examine the performance of E prediction at the 20 off-Hugoniot locations where true values are available for assessment. The percentage error in terms of RMSE is shown in the bottom row of Fig. 5. Furthermore, metrics commonly used for gauging goodness-of-fit are gathered in Table 2 (top row of the EOSNN category, no-regularization case). For better comparison, we also perform the energy estimation using the traditional MG EOS model, whose prediction performance is also gathered in Table 2. The details of this estimation using the MG EOS model are provided in SI Appendix C. Sharma et al.’s original Gaussian process method in20 only reports a training method for supervised tasks where E off the Hugoniot and T along the Hugoniot are known; hence we did not include it in the comparison for this partially supervised task.

Empirical density histograms of partial derivatives calculated from learned surfaces. Sign constraints set as regularizations in the algorithm are validated from observations with \(\frac{\partial E}{\partial V}\big |_T< 0\), \(\frac{\partial E}{\partial T}\big |_V > 0\), \(\frac{\partial P}{\partial V}\big |_T < 0\) and \(\frac{\partial P}{\partial T}\big |_V > 0\) shown in the figure. 20 bins are used for making the plot.
Specifically, we also include in Table 2 the special case where the 20 energy E values off the Hugoniot are revealed to the MG EOS model during training (top row in the MG category, annotated with “E known”). This row can be viewed as the best possible case for the MG EOS model, with optimal performance. From the table, one can observe that EOSNN outperforms the traditional MG-formulation-based method (the “E unknown” case in Table 2) in E prediction. The predicted E from EOSNN correlates quite well (high \(\rho\) value and very low p-value) with the true, unknown values. The \(R^2\) is not as high, but is still satisfactory with a value greater than 0.6 under this challenging task. We will demonstrate shortly, in the case study section, that this value can be massively improved once the model has more physical priors injected during training. Interestingly and surprisingly, with this physical information as regularization during its joint learning process, and without knowing any true E values off the Hugoniot, EOSNN is able to produce predictions comparable to the traditional MG EOS model fitted with true E values. Figure 6 shows the histogram density plot for the partial derivatives set as constraints during training. One can observe that these numerical constraints are effective in controlling the prediction’s behavior.
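The sign constraints above can be turned into a differentiable hinge penalty. A numpy finite-difference sketch on a regular (V, T) grid, standing in for the automatic-differentiation regularizer used during training:

```python
import numpy as np

def sign_penalty(P_grid, E_grid, dV, dT):
    """Hinge penalty for dP/dV|_T < 0, dP/dT|_V > 0, dE/dV|_T < 0 and
    dE/dT|_V > 0, estimated by finite differences on a regular (V, T) grid
    (axis 0 = V, axis 1 = T). Returns 0 when all constraints hold."""
    relu = lambda x: np.maximum(x, 0.0)
    dP_dV = np.gradient(P_grid, dV, axis=0)
    dP_dT = np.gradient(P_grid, dT, axis=1)
    dE_dV = np.gradient(E_grid, dV, axis=0)
    dE_dT = np.gradient(E_grid, dT, axis=1)
    # penalize positive slopes along V and negative slopes along T
    return float((relu(dP_dV) ** 2 + relu(-dP_dT) ** 2
                  + relu(dE_dV) ** 2 + relu(-dE_dT) ** 2).mean())
```

A surface that already satisfies the constraints contributes nothing to the loss, so the penalty only activates where the learned surfaces bend the wrong way.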
Probabilistic model with uncertainty quantification
As discussed previously, the proposed method can incorporate uncertainty estimation naturally and conveniently. As a demonstration following the basic setting of the previous section, we add zero-mean Gaussian noise (standard deviation \(\sigma = 0.05\)) to all the data after MinMax normalization during training (including both inputs and outputs), i.e. \(X_{r} = X + \epsilon\), \(\epsilon \sim N(0, \sigma ^2)\), where \(X_r\) represents each quantity (after normalization) in \((V^S, T^S, P^S)\) and \((V^H, P^H)\). This simulates scenarios in real applications where noise is present in experimental data. The top row of Fig. 7 shows the probabilistic model’s predictions against the true values, where each dot represents the mean value and the short bars represent the estimated standard deviation. In the bottom row of Fig. 7, we provide a visualization of the model-estimated uncertainty on the square \(V-T\) domain of interest.
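The two ingredients of this probabilistic setting — noise injection on normalized data and Monte-Carlo dropout at inference — can be sketched as follows. The helper names are ours, and `stochastic_forward` stands in for a hypothetical model call that applies a fresh dropout mask on every invocation:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(X, sigma=0.05, rng=rng):
    """Simulated measurement noise on MinMax-normalized data: X_r = X + eps,
    eps ~ N(0, sigma^2)."""
    return X + rng.normal(0.0, sigma, size=np.shape(X))

def mc_dropout_predict(stochastic_forward, x, n_samples=200):
    """Keep dropout active at inference, run repeated stochastic passes, and
    report the sample mean as the prediction and the sample standard
    deviation as the uncertainty estimate."""
    samples = np.stack([stochastic_forward(x) for _ in range(n_samples)])
    return samples.mean(axis=0), samples.std(axis=0)
```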

Prediction of the proposed probabilistic model (left column for the pressure, right column for the energy). Top: true versus predicted values. Bottom: the uncertainty of prediction, indicated by the standard deviation of the output on the considered V-T domain. The size of each dot scales linearly with the prediction error at that location, i.e. the larger the dot, the larger the error.
As one can see from Fig. 7, the produced uncertainty is reasonable and agrees with common logic on the following aspects:
-
The uncertainty is higher in regions where the observed data are more sparse. The local uncertainty pattern appears “noisy”; this is a natural result of the compound effects of sparse data, the noise added to inputs when sampling data for training, and the model’s own uncertainty estimation via dropout39 (the dropout rate is set to 0.1 for this experiment).
-
The area of relatively higher uncertainty (red region) spreads wider in the E prediction than in the P prediction. Part of the reason is that much less information about E than about P is available during training in this task.
-
The uncertainty can help infer (to some extent) the likely prediction error. Particularly in the energy prediction, where the true energies were NOT revealed during training, the predicted uncertainty from our probabilistic model still correlates quite well with the error (indicated by the marker size). This is supported by the observation that larger dots generally scatter in regions with higher uncertainties.
Beyond providing a quantitative estimate of the random nature of physical patterns, these uncertainties can also help researchers design and conduct more data-efficient experiments via active learning approaches43. For example, one can plan to collect more data at the pressure and temperature conditions showing the largest uncertainties in the current estimates. By repeating this process iteratively, one can gradually improve the model’s performance and reduce the uncertainty in the regions of interest. This is particularly useful in experimental settings where data collection is expensive or time-consuming.
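One acquisition step of this uncertainty-guided loop can be sketched in a few lines; the function is our illustrative helper, not part of the released code:

```python
import numpy as np

def next_measurements(candidates_VT, pred_std, k=1):
    """Propose the k candidate (V, T) conditions with the largest predicted
    standard deviation as the next experiments to run, after which the model
    would be retrained and the loop repeated."""
    order = np.argsort(pred_std)[::-1]  # indices sorted by descending std
    return [candidates_VT[i] for i in order[:k]]
```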
Case study – including more physical priors

Performance of prediction when the lower bound of \(C_V\) on the domain of interest is regularized during model training. Top left: prediction quality (black: \(R^2\), blue: RMSE) when different lower bounds are used in the regularization during training; the lower bound set at 1.5 gives the highest \(R^2\) and lowest RMSE among all values considered. Bottom left: empirical density of \(C_V\) on the data used. Right column: predictions of P (top) and E (bottom) when using the lower bound \(C_V =1.5 \, (eV/atom/K/6000)\) in the regularization term during the learning process.
In most applications, researchers tend to have certain physical priors from experiments or experience. These priors, once effectively quantified and used as extra regularization terms during training, can generally help improve prediction quality, especially in scenarios where data are limited28,44. In this section, the base task and settings for the following experiments are the same as in the previous section dealing with the partially supervised learning task with the Hugoniot, i.e. given \(P-V-T\) observations from a static thermal experiment and \(P-V\) observations along the Hugoniot curve, one needs to predict the surfaces \(P = P(V,T)\) and \(E = E(V,T)\). E information is entirely unknown at the 20 data points used in training, making this part of the E-learning problem unsupervised. Though challenging, the problem also provides a nice testing ground for studying how different forms of physical priors affect prediction quality in the absence of abundant data. On the other hand, in thermodynamic studies, concepts such as the specific heat \(C_V\) play an important role in many analyses. Following this motivation, we will focus on exploring how different priors on \(C_V\), used as proper regularization, can help improve the prediction quality of \(E = E(V,T)\).
In the previous sections, the lower bound of \(C_V\) was set to 0 globally, which is only a very rough prior since this lower bound is usually a positive number away from 0. Following this observation, as a first example we try different positive lower bounds during training, with values [0.5, 1.0, 1.5, 2.0, 2.5] (eV/atom/K/6000) respectively, and track the \(R^2\) score as the goodness of fit for the prediction of E as well as the RMSE.
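Such a lower-bound prior translates naturally into a one-sided hinge penalty on the estimated specific heat (\(\partial E / \partial T\) at fixed V). A minimal illustrative stand-in for the regularization term, with our own function name:

```python
import numpy as np

def cv_lower_bound_penalty(cv_est, c_min=1.5):
    """Hinge penalty that is zero wherever the estimated C_V (dE/dT at fixed
    V, in eV/atom/K/6000) already exceeds the prior lower bound c_min, and
    grows quadratically below it."""
    return float(np.mean(np.maximum(c_min - np.asarray(cv_est), 0.0) ** 2))
```

Setting `c_min = 0` recovers the rough positivity prior used in the earlier sections, while larger values encode progressively stronger priors.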
As one can observe from the left column of Fig. 8, the best prediction in terms of the considered metrics is obtained when the lower bound of 1.5 is used as regularization during training. This result is consistent with the true distribution of \(C_V\), suggesting that appropriate physical priors (though rough) can indeed improve prediction through proper regularization. To show this, we also prepare the empirical density of the numerically estimated \(C_V\) (this information is unknown during model training) in Fig. 8, from which we can confirm that 1.5 is the value closest to the true lower bound among those considered (an ablation study on the regularization strength is included in Table F3, Appendix F in the Supplementary Information). As the lower bound grows from 0 toward 1.5, the prediction performance improves. Once the lower bound passes 1.5 toward larger values, the prediction performance degrades because the constraint violates the underlying truth, which may confuse the model’s learning process and make training less effective.
With this successful validation, we further demonstrate a second example as additional support, different from the previous case: here one has more precise priors, but only at some (not all) points. In this experiment, we first select 5 data points and estimate their \(C_V\) values (not entirely random: points spread evenly across the considered temperatures are usually expected and preferred for better performance; readers can explore other selections easily with our code). These 5 estimated values participate in an extra regularization term during training. In this experiment, the regularization term takes the common MSE loss and the regularization strength is set to \(5\times 10^{-3}\) (an ablation study on the regularization strength is included in Table F4, Appendix F in the Supplementary Information). We collect the errors of \(C_V\) at all 20 points in a plot in the right panel of Fig. 9. It can be noticed that the prediction from the model with the aforementioned regularization yields smaller errors even at the remaining points beyond the five selected locations. This can be credited to the fact that the proposed joint learning strategy helps relate local points to global surface properties. As for the comparison of the predicted E at the 20 (V, T) locations whose true energy values were not revealed during training, the metrics are gathered in Table 2 as the bottom row. These results help validate that the proposed artificial neural network framework can incorporate physical priors conveniently, and that physical priors added as proper regularization can indeed improve prediction quality.

Prediction quality in terms of \(C_V\) when different numbers of points with \(C_V\) priors are enforced as regularization during training. Left: the 20 points (diamonds) used in this work20 together with the full set of data (dots) as in6. Red dashed lines ( \(\text {-}\,\,\text {-}\)) are drawn to guide the eye, linking points with the same volume. The indices follow the order of the data points as used in20. Right: errors of the estimated \(C_V\) from the predicted EOS surface when the selected points’ \(C_V\) values participate in the regularization (points 6, 11, 13, 14, 20, indicated in red on the left panel).
It is worth mentioning that other physical quantities of interest, such as the bulk modulus \(K_T\) or the Grüneisen parameter (\(\gamma\)), can be regularized similarly and conveniently within our proposed framework. One can check Figures E9 and E10 in SI Appendix E and our GitHub repo for more details. In both cases, we again observe improved prediction quality with proper regularization.
