Application of physics encoded neural networks to improve predictability of properties of complex multi-scale systems

Many complex foods are in fact dispersions, which are composed of fluid or solid particles dispersed in a fluid. Their flow behaviour under deformation is expressed by the viscosity. Texture, consumer acceptance, and processing conditions are strongly related to the viscosity. To optimize formulation, it is important to get insights in the dispersion’s viscosity and its controlling factors. Similarly, this also holds for non-food materials like e.g. paints, coatings, and cosmetics. Due to the complex nature of the dispersions, the viscosity depends on the rate of deformation. For our purpose, we focus here on shear deformation.

The shear rate-dependent viscosity depends on the structure of the dispersions and the interactions between the dispersed particles. The interactions control the assembly of (primary) particles into clusters. During deformation of the complex fluid, formation and breakdown of these clusters depend on time, shear rate, particle concentrations, cluster sizes, and inter-particle and inter-cluster interactions and temperature. Many studies have been published to describe the shear-rate dependent viscosity of complex dispersions^20,21,22,23. Models to describe this kind of shear rate-dependent viscosity should contain the structural dynamics. One such model is the constitutive model by Quemada and coworkers^{24,25,26,27,28}, which describes the rheology of complex colloidal systems in a large range of volume fractions using key physical parameters. Because the model fits very well with viscosities of food dispersions^20,21,23,29 (see also Fig. 1), we used the Quemada model to study the predictive power of different NNs, and hybrid neural networks containing physics, for predicting the shear rate dependent viscosity of complex dispersions.

Quemada model

The Quemada model describes the viscosity of complex fluids in terms of an effective volume fraction, which depends on the properties of the primary particles, including diffusion coefficients, interaction parameters, packing fraction of particles in a cluster, and shear rate. Here, we will focus on the semi-stationary regime, where the system is in equilibrium at a certain shear rate. The starting point of the model is the relation between the viscosity of the fluid $\eta$ and the effective volume fraction $\phi _e$. This relation is given by^{24,25,26,27,28}

$$\begin{aligned} \eta = \eta _f \left( 1-\frac{\phi _e}{\phi _m} \right) ^{-2} \end{aligned}$$

(1)

with $\eta _f$ the viscosity of the continuous phase, and $\phi _m$ the maximum volume fraction. The effective volume fraction is given by^{24,25,26,27,28}

$$\begin{aligned} \phi _e = \phi _{pA}/\varphi + (\phi _p-\phi _{pA}) = \phi _p \left( 1 + CS\right) \end{aligned}$$

(2)

with $\phi _p$ the volume fraction of the primary particles, $C=1/\varphi -1$, a compactness factor where $\varphi$ is the volume of the particles in a cluster divided by the volume of that cluster. S is a structural parameter defined as the ratio between the volume fraction of primary particles in the cluster $\phi _{pA}$ and the total volume fraction of primary particles $\phi _p$^{24,25,26,27,28}

$$\begin{aligned} S = \frac{\phi _{pA}}{\phi _p} \end{aligned}$$

(3)

The structural parameter follows a certain kinetic reaction scheme, which, in its basic form, reads^{24,25,26,27,28}

$$\begin{aligned} \frac{\textrm{d}S}{\textrm{d}t} = \kappa _D (S_0-S) – \kappa _h (S-S_\infty ) \end{aligned}$$

(4)

with $\kappa _D$ and $\kappa _h$ characteristic relaxation rates of Brownian (diffusion) and hydrodynamic (shear stress) forces. $S_0$ and $S_\infty$ correspond to the value of the structural parameter at zero and infinite shear rate. Additional terms related to particle interactions can be added²⁸. The steady-state value of the structural parameter at a certain shear-rate $\dot{\gamma }$ is then given by^{24,25,26,27,28}

$$\begin{aligned} S = \frac{S_0 + \theta S_\infty }{1+\theta } \end{aligned}$$

(5)

with

$$\begin{aligned} \theta = \frac{\kappa _h}{\kappa _D}=\frac{\dot{\gamma }}{a^2/D_p}=Pe=\frac{6\pi \eta _fa^3\dot{\gamma }}{k_BT} \end{aligned}$$

(6)

and a is the size of the primary particle, $D_p$ the diffusion coefficient of the primary particles, $k_B$ Boltzmann’s constant, T temperature, and Pe the Péclet number. The above equations give the fluid’s viscosity in the stationary state as a function of the key parameters^{24,25,26,27,28}

$$\begin{aligned} \eta = \eta (\dot{\gamma }, \phi _p, S_0, S_\infty , C, \eta _f, a, T) \end{aligned}$$

(7)

Figure 1 shows an example of a measured flow curve of a pea protein stabilized oil-in-water emulsion, with an oil volume fraction of 0.5. The figure also shows the fit with the Quemada model with $\dot{\gamma }_c = \frac{6\pi \eta _fa^3}{k_BT}=4.7$ (corresponding to an average droplet size of $a=1$ $\mu$m), $CS_0=0.25$ (indicating that less than about 25% of the oil droplets are flocculated at zero shear rate) and $CS_\infty =0$ (indicating that all clusters broke up at a high shear rate).

Neural networks and physics-encoded neural networks

To investigate the predictive power of NNs and PeNNs, shear rate dependent viscosity data was generated with the Quemada model and used to train and test NNs and PeNNs with different architectures. In order to align with physical experiments, we aim to predict $y=\eta$ from a set of input parameters $\{x_i\}$. In an experimental setting, $\eta$ is measured as a function of the shear rate $\dot{\gamma }$, so $\dot{\gamma }\in \{x_i\}$. Various other input characteristics of the dispersion can be measured or are known. For example, when preparing a sample, the volume fraction $\phi _p$ of the primary particles is known, like the volume fraction of oil in an emulsion. Other parameters can be measured and estimated using various different, often indirect, techniques, like e.g. the particle and cluster size distributions from light scattering techniques and inter-particle forces from DLVO theory.

The NNs and PeNNs were built using Python (version 3.9.16) in combination with the open source platform for machine learning TensorFlow (version 2.11.0), which uses Keras (version 2.11.0) as the high-level API. Data was generated and used to train various NNs and PeNNs with different architectures. The shear rate $\dot{\gamma }$ and viscosity $\eta$ can vary over several orders of magnitude. To train, validate, and test the NNs and PeNNs, we used $\log \dot{\gamma }$ and $\log \eta$, so that all input and output parameters are in the order of 1 to 10, and no normalization of the input and output data is needed. We performed some extra calculations using batch normalization within the neural network layers. However, it turned out that the performance of the networks did not improve.

Different architectures of NN’s were investigated, varying in number of hidden layers and number of neurons per layer. The rectified linear (ReLu) activation function was used in all layers except for the final layer, for which we used the linear activation function because we are dealing here with a regression problem^30,31,32 and not a classification problem. As default we used the Adam optimization algorithm to train the NN’s, which is generally accepted to be one of the most efficient algorithms in machine learning. We used a learning rate of $10^{-3}$ with a decay of $5\cdot 10^{-6}$, mean squared error (mse) as loss function or performance metric, and 20% of the learning data for validation. The training was stopped when the loss function of the validation set has stopped improving in 500 epochs using the build-in TensorFlow callback function Early-Stopping.

We also performed hyper parameter tuning by investigating different NN’s with different complexities, optimization routines like Nadam, AdamW, and Lion, learning rates, and activation functions.

We investigated two cases to compare. In case one, we predict the viscosity as a function of two input parameters, while in case two, we predict the viscosity as a function of four input parameters.

Performance of the NNs and PeNNs was assessed using the performance metrics coefficient of determination $R^2$ and the mean square error (mse) between predicted and ground truth viscosity values of the test set, for unbiased evaluation. To check to what extent the performance of the studied networks differ, a graphical comparison of the metrics was performed. We also includied a Friedman test to quantify the performance differences, similar as done by Zamri et al³³. In addition, we compared the predicted and ground truth viscosities graphically.

Case 1: networks with 2 input parameters and 1 output parameter

For case one, two different NNs were studied (see Fig. 2). One with an architecture consisting of two hidden dense layers. The first layer has $n_1=32$ neurons and is densely connected to the $n_{in}=2$ input nodes, while the second layer has $n_2=8$ neurons. Another NN studied has an architecture consisting of three hidden dense layers. The first, second and third layer has $n_1=128$, $n_2=32$ and $n_3=8$ neurons, respectively. For both NNs, the first layer is densely connected to the $n_{in}=2$ input nodes, while the last layer is densely connected with the output layer having $n_{out}=1$ neuron.

The PeNN consists of 3 layers, each corresponding to a physical quantity, in other words, the activation functions are completely physics-based. In this sense, this PeNN is actually totally dominated by physics, to illustrate the importance of physics in a NN. The first layer corresponds to the structure parameter S. It has one input $x=\log \dot{\gamma }$ and activation function $S_{act}= 1/(10^xw + 1)$. Here, w is a trainable parameter and should correspond to $\frac{6\pi \eta _fa^3}{k_BT}$ as can be simply derived from Eqs. 5 and 6. The second layer corresponds to the effective volume fraction $\phi_e$ having two inputs, being the output of the S-layer and the input $\phi_p$. The activation of the $\phi_e$-layer is $\phi _{e,act}= x_1(x_2w + 1)$, with input $\varvec{x} = [x_1, x_2]$ and w a trainable parameter corresponding to C (equation 2). The third layer corresponds to the viscosity with activation function $\eta_{act} = -2\log (1-xw)+b$, where w and b are trainable parameters, corresponding to $1/\phi_m$ and $\log \eta _f$, respectively, and x is the input equal to the output of the effective-volume-fraction-layer.

In case one, only the shear rate $\dot{\gamma }$ and particle volume fraction $\phi _p$ were varied and taken as input parameters for the constitutive model to generate the steady-state fluid viscosity $\eta$. Other parameters were taken constant, being $T=293$ K, $a=5$ nm, $C=2$, $S_0=1$, $S_\infty =0$, and $\eta _f=10^{-3}$ Pa.s.

Case 2: networks with 4 input parameters and 1 output parameter

In case 2, also $S_0$ and $S_\infty$ were varied to generate flow curve data to train NNs and PeNNs with architectures, as depicted in Fig. 3. These parameters correspond to the structure parameter at zero and infinite shear rate, respectively. In this sense, this PeNN is not totally dominated by physics. In general, parameters are difficult to asses as they can be related to handling history, inter-particle forces, amount of protein denaturation, pH, salt concentration, etc. Here, we took two representative input parameters $p_1$ and $p_2$ between 0 and 1.

Similar as above, two different NNs were studied with architectures consisting of two and three dense hidden layers consisting of 32-8 and 128-32-8 neurons. For both, the first layer is densely connected to the $n_{in}=4$ input nodes, while the last layer is densely connected with the output layer having $n_{out}=1$ neuron.

The PeNN consist of 6 layers, of which the first 3 are a dense connected NN with outputs that should mimic the structure factor at low and high shear rate $S_0$ and $S_\infty$, respectively. The following and last 3 layers correspond each to the physical quantities, as explained above. The only exception is that the S-layer now has three inputs $\varvec{x} = [x_1 x_2 x_3]$ corresponding to $[S_0 S_\infty \log \dot{\gamma }$] and physics-based activation function $S_{act}= (x_1 +10^{x_3}wx_2)/(10^{x_3}w + 1)$. Here, again w is a trainable parameter and should correspond to $\frac{6\pi \eta _fa^3}{k_BT}$. The last layers are the same as described above.

Data sets

For both cases 1 and 2, various data sets were generated and used to train and test the performance of the NNs and PeNNs, varying in number of input parameters (as discussed above) as well as varying in number of data points per input parameter. The generated data was split into a training set (75%, randomly chosen) used to train NNs and PeNNs, and a test-holdout set (25%) to test the performance after training. During training, 20% of the set was used for validation.

It is noted that in rheology experiments, in general the number of data points for the shear rate $\dot{\gamma }$ is much larger than that for a parameter like the volume fraction of the primary particles $\phi _p$. This is because it is rather simple to obtain 100 or more data points ($\eta (\dot{\gamma })$) in a viscosity measurement for just one sample (with a certain $\phi _p$). For better comparison with real life experiments we therefore generated the data sets in a similar way: for each of n different $\phi _p$’s (with $\phi _p \in \{\phi _1 \ldots \phi _n\}$), N different $\log \dot{\gamma }$ (with $\dot{\gamma }\in \{\dot{\gamma }_1 \ldots \dot{\gamma }_N\}$ were chosen as input parameters, with $N>>n$. This yields $n\times N$ data triplets ($\eta (\phi _p,\dot{\gamma })$ for case 1, and $n\times N \times n_{p1} \times n_{p2}$ data quintets ($\eta (\phi _p,\dot{\gamma }, p_1, p_2)$ for case 2. Here $n_{p1}$ and $n_{p2}$ are the number of input parameters $p_1$ and $p_2$ respectively. Here we choose $n_{p1}=n_{p2}=2$. The generated data set was randomly split into a training set (75%, thus 0.75nN and $0.75nNn_{p1}n_{p2}$ data points for case 1 and 2, respectively) and a test-holdout set (0.25nN and $0.25nNn_{p1}n_{p2}$ data points for case 1 and 2, respectively). In general, the test-holdout set is used to check the performance of a NN to unseen data. Although the NN did not see the data triplets ($\eta (\phi _p,\dot{\gamma })$ of the test set, it did see numerous data with $\phi _p\in \{\phi _1 \ldots \phi _n\}$. In order to check to what extend the NNs and PeNNs can also generalize to unseen volume fractions, thus how they perform for $\phi _p \notin \{\phi _1 \ldots \phi _n\}$, another set, referred to as the test set, was created by choosing randomly $n_{tst}=10^4$ input parameters $(\phi _p,\log \dot{\gamma })$ with $\phi _p$ between $\phi _{p,min}=0.01$ and $\phi _{p,max}=0.2$ and $\log \dot{\gamma }$ between $\log \dot{\gamma }_{min}=-5$ and $\log \dot{\gamma }_{max}=3$. This test set ($10^4$ data points for case 1 and 2) was used for an unbiased comparison of the performance of the networks.