Physics-consistent machine learning with output projection onto physical manifolds

Framework and workflow

A schematic workflow of our approach is illustrated in Fig. 1. Both systems under study are described by a set of coupled ODEs, and their numerical solution is used as the ground truth. We then train two models: an artificial NN data-driven model (Fig. 1a), and a model where the physics priors are included in the loss function, hereafter denoted as loss-based PINN (Fig. 1b). Specifically, our method is applied as follows: (i) one input vector in provided to the already trained NN model and a prediction vector f (x; Θ) is obtained; (ii) p is initilized with the output vector f (x; Θ); (iii) the constrained optimization problem is solved by Eq. (1) and the result of the optimization is the corrected output vector p. This way, p is a point that lies on the physical manifold and thus, by enforcing the physics constraint post-training at inference time, we guarantee that the final prediction is physics-consistent (Fig. 1c). Figure 1c also makes clear the intuition behind the projection method. The same process is applied to each point prediction of the loss-based PINN.

**Fig. 1: Schematic workflow of the approach in this study.**

The NN and the loss-based PINN models are used here as base models to demonstrate the proposed projection method. Note, however, that our approach is model-agnostic and can be applied to the predictions of any ML model (e.g., support vector regression or linear regression), as it requires only the physical laws, defined by g(x, p), and a point prediction from the base model, denoted by f (x; Θ). Moreover, it is worth emphasizing that our loss-based PINN, where the physical constraints are very general physical principles, corresponds to problems where the physics laws are more general than in the usual application of PINNs and where PINNs have seldom been tested.

Our analysis focuses on comparing the accuracy of the results before and after the projection of the models’ predictions. Additionally, we analyze the robustness of our method across varying model complexity and training, and discuss the physical insight into the underlying physics.

Case study 1: spring-mass system

This example highlights the ability of the projection method to handle sequential predictions and how it corrects the trajectories as they gradually deviate from the target values.

System description

The system consists of two masses, m₁ and m₂, connected in series by two springs with spring constants k₁ and k₂, and natural lengths L₁ and L₂, respectively. The first spring is connected to a fixed wall at one end and to mass m₁ at the other, while the second spring connects the two masses. The masses are restricted to moving along the x-axis, and there is no friction. The positions of m₁ and m₂ along the x-axis are denoted by x₁ and x₂, and their velocities by v₁ and v₂, respectively. The forces exerted by the springs on the masses are determined by the displacements from their equilibrium positions, following Hooke’s law.

The differential equations of motion are given by

$$\left\{\begin{array}{l}{m}_{1}{\ddot{x}}_{1}=-{k}_{1}({x}_{1}-{L}_{1})+{k}_{2}({x}_{2}-{x}_{1}-{L}_{2})\quad \\ {m}_{2}{\ddot{x}}_{2}=-{k}_{2}({x}_{2}-{x}_{1}-{L}_{2})\quad \hfill\end{array}\right.,$$

(2)

where ${\ddot{x}}_{1}$ and ${\ddot{x}}_{2}$ denote the accelerations of masses m₁ and m₂. The time-evolution of the system is determined from the solution of these equations, with the initial conditions

$$\left\{\begin{array}{l}{x}_{1}(0)={x}_{0,1},\quad {x}_{2}(0)={x}_{0,2}\quad \\ {v}_{1}(0)={v}_{0,1},\quad {v}_{2}(0)={v}_{0,2}\quad \end{array}\right..$$

(3)

Since there is no friction, the mechanical energy, E, defined by the sum of the kinetic energy of the masses and the elastic potential energy, is conserved. Hereafter we consider m₁ = 1 kg, m₂ = 1 kg, k₁ = 5 N m⁻¹, k₂ = 2 N m⁻¹, L₁ = 0.5 m, and L₂ = 0.5 m.

Data generation

The artificial NN data-driven model takes as input a state vector characterizing the state of the system at the current instant t (i.e., defining the positions and velocities of both masses at time t), and predicts the state vector at time t + Δt, for fixed Δt. This procedure enables the recursive determination of the complete trajectory from a given initial condition, with time-resolution Δt = 50 ms, as a sequence of transitions between states. We used an energy threshold ${E}_{\max }=$ 5 J to create a set of allowed states for the system, corresponding to the range of positions and velocities the masses can have with E < E_max. Eq. (2) was solved using the classical fourth-order Runge–Kutta method (RK4), with arbitrary initial conditions (3) ensuring that $E < {E}_{\max }$.

Except otherwise noted, the dataset consists of N = 100,000 arbitrarily generated input state vectors within the energy threshold and the corresponding next states, determined by performing a single Runge–Kutta computation up to time Δt (not to be confused with the Runge–Kutta time-step). To reduce the impact of the difference in feature magnitude on the model and make the training process more stable, we applied the min–max normalization. Consequently, all features were scaled to the range [−1, 1].

Trajectory prediction

The trajectory of the system is defined by the prediction of its state along 165 sequential time steps, i.e., 8.25 s. The general physical law considered in the loss-based PINN and in the projection method is energy conservation, which translates into the residual \({{{\mathcal{R}}}}_{1}=E
(4)

where a_si and b_si are the stoichiometric coefficients of species s, as they appear on the left- and right-hand sides of a reaction i, respectively. In addition, the average gas temperature T_g and the gas temperature near the tube wall T_nw are calculated from the gas thermal balance equation, the reduced electric field E/N, where N is the gas density, from the quasi-neutrality condition, the electron drift velocity v_d and temperature T_e as integrals over the electron energy distribution function, and the electron density from the discharge current. Further details are given in ref. ²⁵.

Data generation

The models include three input features (P, I, and R) and the 17 outputs just described. We generated the datasets with the LisbOn KInetics Chemistry+Boltzmann (LoKI-B+C) simulation tool^25,26. Except otherwise noted, the dataset comprises N = 1000 uniformly distributed values across the three input features within their specified boundaries. Similarly to the previous case, we applied the min-max normalization to the features, scaling them to the range [−1, 1]. Moreover, we applied a log-transformation to the features demonstrating skewness in their distribution. Further details on data generation and preprocessing are given in the “Data pre-processing” subsection in the “Methods” section.

Prediction of the steady-state plasma properties

The general physical laws constraining the system include: the ideal gas law, relating the gas pressure (input) with the species densities and the gas temperature (outputs); the imposed discharge current (input), expressing electric charge conservation and relating its input value with the tube radius (input), and with the drift velocity and electron density (outputs); and the quasi-neutrality law, relating the electron density (output) with the positive and negative ion densities (outputs). These laws can be represented by the residuals in Eqs. (5)–(7), respectively,

$${{{\mathcal{R}}}}_{1}=P-{\sum}_{i}[{X}_{i}]{k}_{{\rm {B}}}{T}_{{\rm {g}}}$$

(5)

$${{{\mathcal{R}}}}_{2}=I-e{n}_{{\rm {e}}}{v}_{{\rm {d}}}\pi {R}^{2}$$

(6)

$${{{\mathcal{R}}}}_{3}={n}_{{\rm {e}}}-{\sum}_{i}[{X}_{i}^{+}]+{\sum}_{j}[{X}_{j}^{-}]$$

(7)

where [X_i] is the density of species i, ∑_i[X_i] is the total gas density given by the sum of all the species densities, k_B is the Boltzmann constant, e is the charge of the electron, ${\sum }_{i}[{X}_{i}^{+}]$ is the sum of the positively charged ion population, and ${\sum }_{j}[{X}_{j}^{-}]$ is the sum of the negatively charged ion population.

The test-set results of the four models on the prediction of the 17 output variables and on the three physical laws are shown in Fig. 4a, b, respectively, expressed as root mean squared errors (RMSE). Applying the projection method to NN predictions reduces the RMSE of compliance with physical laws by more than 9 orders of magnitude, while improving predictive accuracy of three of the output variables: O₂(X), ${{{\rm{O}}}}_{2}^{+}$, and n_e. As it was the case in the spring-mass system, imposing very general physical laws is ineffective in guiding the loss-based PINN model, which is not able to reduce the RMSE of compliance with physical laws (Fig. 4b) nor of the predicted outputs, when compared with the NN model.

**Fig. 4: Steady-state feature estimation in the low-temperature reactive plasma system.**

Figure 4c, d provides insight into the physical interpretation of the results, by representing the main species in the mixture and examining the effects of each physical constraint individually. Ground-state O₂(X) molecules constitute 88.3% of the total mixture, and the prediction of their density improves primarily due to the ideal gas law constraint (5), which explicitly accounts for species densities. Similarly, the electron density n_e benefits from the imposition of the discharge current constraint (6), as it appears directly in the expression of this law. Finally, ${{{\rm{O}}}}_{2}^{+}$ constitutes <0.0001% of the gas mixture and, as expected, remains unaffected by the ideal gas law (5). It is the dominant positive ion by at least 2 orders of magnitude²⁵, making its prediction inherently sensitive to the quasi-neutrality condition (7). However, this constraint alone is insufficient to correct the prediction, as the electron density sets an “anchor” for the total ion density. Therefore, the accuracy of the ${{{\rm{O}}}}_{2}^{+}$ density in the projection method is coupled with the prediction of n_e, and both the discharge current (6) and quasi-neutrality (7) laws are required to improve the model’s performance regarding this output.

Robustness of the projection method

With sufficient training time and a sufficiently complex architecture, a neural network can produce predictions that closely approximate the target values, reducing (or even eliminating) the need for post-training corrections. Conversely, a poorly trained model may yield predictions that deviate significantly from the manifold defined by the physical laws, making the projection operation ineffective or leading to ambiguities associated with multiple possible solutions. In this section, we analyze the robustness of the projection method with respect to model complexity, quantified by the number of NN parameters (i.e., weights and biases) and the size of the training dataset. The results focus on the low-temperature reactive plasma system described in the “Case study 2: low-temperature reactive plasma” subsection in the “Results and discussion” section.

Ablation study

We start by comparing the errors before and after applying the projection operation to the NN predictions, considering models with different complexities. In order to guarantee a comparable analysis between the models, we considered 18 different architectures, each with 2 hidden layers and a number of neurons ranging from 1 to 1000 in the hidden layers. Both hidden layers have the same number of neurons. For each architecture, 1 NN model was trained with a dataset consisting of N = 1000 data points.

Figure 5a.i shows the RMSE of the predictions of the NN and of the projection method as a function of the number of parameters in the model. These values are calculated by averaging the errors across the 17 outputs. The “RMSE variation rate” is defined as the relative change in RMSE after performing the projection and is considered negative if the RMSE decreases. Figure 5a.ii provides a similar analysis, but for the 3 outputs that improved the most after applying the projection operation, as opposed to the mean of all outputs. The graphs show that the NN projection consistently achieves lower RMSE than the standard NN. The improvement is particularly pronounced for O₂(X), ${{{\rm{O}}}}_{2}^{+}$ and n_e predictions, in simpler models with fewer parameters (~10²). As the number of parameters increases to 10⁵, both methods reach a performance plateau, and the advantage of the projection method becomes marginal, though it still yields slightly better predictions. These results suggest that the projection method is particularly beneficial for more constrained, and hence computationally more efficient, NN architectures.

**Fig. 5: Comparative analysis of the low-temperature reactive plasma system’s model performance before and after applying the projection operation to the NN outputs.**

It is instructive to analyze the predictions of pressure-dependent trends for fixed discharge current and tube radius. Figure 5b.i–iii illustrates how NN architectures of increasing complexity predict n_e as a function of pressure for I = 30 mA and R = 12 mm, and benchmarks these predictions to LoKI simulation values. In the simplest architecture (Fig. 5b.i), the predictions of the NN deviate significantly from the targets in both magnitude and trend (RMSE = 2.55 × 10⁻²). Despite this poor initial performance, the projection successfully aligns the predictions with the LoKI targets (RMSE = 0.53 × 10⁻²), albeit with a visible lack of smoothness. In the intermediate case (Fig. 5b.ii), the projection method smooths the noisy NN curve, reducing the RMSE in ~57%. Finally, in the most complex architecture (Fig. 5b.iii), the NN starts to align with the target values, indicating that sufficient model complexity can compensate for the absence of explicit physical information, though at the cost of increased computational resources.

Small samples

We now compare the errors before and after applying the projection operation to the NN predictions, considering training datasets with different sizes. In order to guarantee a comparable analysis between the models, an architecture with 2 hidden layers and 50 neurons in each layer was used. To mitigate randomness associated with a specific dataset sample, 20 random samples are drawn for each dataset size from a larger dataset. The NN model is then trained on each sample, and the trained models are evaluated on a test set to obtain errors for each of the 17 outputs. The results are shown in Fig. 6a.i, where each point represents the mean error across the 20 samples for each dataset size, considering the average performance on the 17 output predictions. Furthermore, 11 different dataset sizes were analyzed, ranging from 20 to 2500 observations. The aggregate RMSE across all 17 output quantities decreases consistently as the dataset size grows, with both methods showing similar rates of improvement. The projection method maintains a persistent advantage of approximately −2.4% to −2.8% in RMSE variation rate compared with the NN.

**Fig. 6: Comparative analysis of the low-temperature reactive plasma system’s model performance before and after applying the projection operation to the NN outputs.**

Figure 6a.ii depicts a similar RMSE analysis as in Fig. 6a.i, but for the three most relevant individual outputs. A similar conclusion as in the previous section can be drawn: the projected predictions outperform the non-projected counterparts, particularly for smaller dataset sizes, where the model benefits the most from the addition of physical constraints. This is further confirmed by inspection of Fig. 6b, representing the predictions of the electron density as a function of pressure, for the same discharge current and tube radius as in Fig. 5b. With limited training data (Fig. 6b.i), the NN predicts a nonphysical behavior at higher pressures, with a sharp decrease above 800 Pa that contradicts the expected saturation pattern. In contrast, the projection method maintains physically sound predictions across the entire pressure range, effectively compensating for the scarce training data. As expected, the accuracy of the NN progressively improves when trained on larger datasets (Fig. 6b.ii, iii), while the projection method’s predictions remain more accurate and consistently aligned with physical expectations. These results demonstrate the role of the projection method as an efficient procedure to include physical constraints into the predictions in scenarios where scarce data is available, maintaining physically meaningful predictions and substantial error reductions with limited training samples.

Computation time

To evaluate the computational performance of our approach, we compare the total computation time required by both the NN and the NN projection methods. Figure 7a depicts the mean RMSE of the predictions for O₂(X), ${{{\rm{O}}}}_{2}^{+}$, and n_e as a function of the time required to train the models of varying complexity (described in the “Ablation study” subsection in the “Results and discussion” section) and to evaluate their predictions in a test set. For the NN projection method, the total computation time includes the NN training and inference times, as well as the additional time required to project the outputs onto the physically consistent manifold. The curves correspond to an exponential fitting function to the data and are merely a guide to the eye. We observe that models with longer training times benefit the most from the projection method. Moreover, on average, the projection step takes only 1.75 s for a dataset of 500 observations, which corresponds to a modest 4.33% increase in the total computation time when compared with the NN model. This slight increase in computation time of the projection method comes with a reduction of the RMSE by an average of 32.08%, with improvements reaching up to 63.86% in simpler architectures that require more epochs to converge.

Fig. 7: Comparative analysis of the mean RMSE of the predictions of O₂(X), ${{\rm{O}}}_{2}^{+}$ and n_e before and after applying the projection operation to the NN outputs as a function of the total computation time. These results refer to the low-temperature reactive plasma system case study.

Figure 7b illustrates the mean RMSE of the predictions as a function of the total time required to generate data sets of different sizes (described in the “Small samples: subsection in the “Results and discussion” section), train the models, and evaluate their predictions in a test set. As expected, increasing the dataset size leads to lower RMSE values for both models, but at the cost of significantly longer computation times. Clearly, the NN projection is computationally more efficient than the purely data-driven NN, despite the additional projection step. As an example, the NN trained with 200 observations achieved an RMSE of 0.050, requiring a total computation time of 4913 s. In contrast, applying the projection method to the predictions of the NN trained with only 50 observations resulted in a comparable RMSE of 0.048, but with a significantly reduced computation time of 1338 s. This represents a reduction in computational cost by a factor of ~3.7 while maintaining a nearly identical predictive accuracy.