Given high dimensional spatio-temporal trajectory data acquired through ABM simulations, the main steps of the framework are summarized as follows (see also Fig. 4, for a schematic):
-
a.
Discover low-dimensional latent spaces, on which the emergent dynamics can be described at the mesoscopic or the macroscopic scale.
-
b.
Identify, via machine-learning, black-box mesoscopic IPDEs, ODEs, or (after further dimensional reduction), macroscopic mean-field SDEs.
-
c.
Locate tipping points by exploiting numerical bifurcation analysis of the different surrogate models.
-
d.
Use the identified (NN-based) surrogate mean-field SDEs to perform rare-event analysis (uncertainty quantification) for the catastrophic transitions. This is done here in two ways: (i) performing repeated brute-force simulations around the tipping points, (ii) for this effectively 1D problem, using explicit statistical mechanical (Feynman-Kac) formulas for escape time distributions.

At the first step, and depending on the scale of interest, we discover via Diffusion Maps latent spaces using, mesoscopic fields (probability density functions (pdf) and corresponding spatial derivatives) with the aid of Automatic Relevance Determination (ARD); or macroscopic mean-field quantities, such as statistical moments of the probability density function. At the second step, on the constructed latent spaces, we solve the inverse problem of identifying the evolutionary laws, as IPDEs for the mesosopic field scale, or mean-field SDEs for the macroscopic scale. Finally, at the third step, based on the constructed surrogate models, we perform system level analysis, such as numerical integration at a lower computational cost, numerical bifurcation analysis for the detection and characterization of tipping points, and rare event analysis (uncertainty quantification) for the catastrophic transitions occurring in the neighborhood of the tipping points.
In what follows, we present the elements of the methodology. For further details about the methodology and implementation, see in the Supplementary information (SI).
Discovering low-dimensional latent spaces
The computational modeling of complex systems featuring a multitude of interacting agents poses a significant challenge due to the enormous number of potential states that such systems can have. Thus, a fundamental step, for the development of ROMs that are capable of effectively capturing the collective behavior of ensembles of agents is the discovery of an embedded, in the high-dimensional space, low-dimensional manifold and an appropriate set of variables that can usefully parametrize it.
Let’s denote, by \({{{{{{{{\bf{X}}}}}}}}}_{k}\in {{\mathbb{R}}}^{D}\), k = 1, 2, … the high-dimensional state of the ABM at time t. The goal is then to project/map the high-dimensional data onto lower-dimensional latent manifolds \({{{{{{{\mathcal{M}}}}}}}}\subset {{\mathbb{R}}}^{D}\), that can be defined by a set of coarse-scale variables. The hypothesis of the existence of this manifold is related to the existence of useful ROMs and vice versa.
Here, to discover such a set of coarse-grained coordinates for the latent space, we used DMAPs26,28,54 (see section D1 of the Supplementary Information (SI) for a brief description of the DMAPs algorithm).
For both ABMs, we have some a priori physical insight for the mesoscopic description. For the financial ABM, one can for example use the probability density function (pdf) \(\rho (X)dx={\mathbb{P}}(X
(10)
where F(i), i = 1, 2, …m are m nonlinear integro-differential operators; u(x, t) = [u(1)(x, t), …, u(m)(x, t)] is the vector containing the spatio-temporal fields, \({{{{{{{{\mathcal{D}}}}}}}}}^{{{{{{{{\boldsymbol{\nu }}}}}}}}}{{{{{{{\bf{u}}}}}}}}({{{{{{{\bf{x}}}}}}}},t)\) is the generic multi-index ν-th order spatial derivative at time t:
$$\begin{array}{ll}&{{{{{{{{\mathcal{D}}}}}}}}}^{{{{{{{{\boldsymbol{\nu }}}}}}}}}{{{{{{{\bf{u}}}}}}}}({{{{{{{\bf{x}}}}}}}},\, t):=\left\{\frac{{\partial }^{| {{{{{{{\boldsymbol{\nu }}}}}}}}| }{{{{{{{\bf{u}}}}}}}}({{{{{{{\bf{x}}}}}}}},t)}{\partial {x}_{1}^{{\nu }_{1}}\cdots \partial {x}_{d}^{{\nu }_{d}}},{\nu }_{1},\ldots,{\nu }_{d}\ge 0\right\},\\ &{{{{{{{\rm{where}}}}}}}}\,| {{{{{{{\boldsymbol{\nu }}}}}}}}|={\nu }_{1}+{\nu }_{2}+\cdots+{\nu }_{d},\,\end{array}$$
(11)
\({I}_{1}^{(i)},{I}_{2}^{(i)},\ldots \,\) are a collection of integral features on subdomains \({{{\Omega }}}_{1}^{(i)},{{{\Omega }}}_{2}^{(i)},\cdots \subseteq {{\Omega }}\):
$${I}_{j}^{(i)}({{{{{{{\bf{u}}}}}}}})={\int}_{{{{\Omega }}}_{j}^{(i)}}{K}_{j}^{(i)}({{{{{{{\bf{x}}}}}}}},\, {{{{{{{\bf{u}}}}}}}}({{{{{{{\bf{x}}}}}}}},\, t))d{{\Omega }},\,j=1,2,\ldots ;$$
(12)
\({K}_{j}^{(i)}:{{\mathbb{R}}}^{d}\times {{\mathbb{R}}}^{m}\mapsto {{\mathbb{R}}}^{d}\) are nonlinear maps and \({{{{{{{\boldsymbol{\varepsilon }}}}}}}}\in {{\mathbb{R}}}^{p}\) denotes the (bifurcation) parameters of the system. The right-hand-side of the i-th IPDE depends on say, a number of γ(i) variables and on bifurcation parameters from the set of features:
$${{{{{{{{\mathcal{S}}}}}}}}}^{(i)}= \left\{{{{{{{{\bf{x}}}}}}}},\, {{{{{{{\bf{u}}}}}}}}({{{{{{{\bf{x}}}}}}}},\, t),{{{{{{{\mathcal{D}}}}}}}}{{{{{{{\bf{u}}}}}}}}({{{{{{{\bf{x}}}}}}}},\, t),{{{{{{{{\mathcal{D}}}}}}}}}^{{{{{{{{\bf{2}}}}}}}}}{{{{{{{\bf{u}}}}}}}}({{{{{{{\bf{x}}}}}}}},\, t),\ldots,\right.\\ \left.{{{{\mathcal {D}}}}}^{{{{{{{{\boldsymbol{\nu }}}}}}}}}{{{{{{{\bf{u}}}}}}}}({{{{{{{\bf{x}}}}}}}},\, t),{I}_{1}^{(i)},{I}_{2}^{(i)},\ldots,{{{{{{{\boldsymbol{\varepsilon }}}}}}}}\right\}.$$
(13)
At each spatial point xq, q = 1, 2, …, M and time instant ts, s = 1, 2, …, N, a single sample point (an observation) in the set \({{{{{{{{\mathcal{S}}}}}}}}}^{(i)}\) for the i-th IPDE can be described by a vector \({Z}_{j}^{(i)}\equiv {Z}_{(q,s)}^{(i)}\in {{\mathbb{R}}}^{{\gamma }^{(i)}}\), with j = q + (s − 1)M. Here, we assume that such mesoscopic IPDEs in principle exist, but they are not available in closed-form. Henceforth, we aim to learn the macroscopic laws by employing a Feedforward Neural Network (FNN), in which the effective input layer is constructed by a finite stencil (sliding over the computational domain), mimicking convolutional operations where the applied “filter” involves values of our field variable(s) u(i) on the stencil, and returns features \({Z}_{j}^{(i)}\in {{{{{{{{\mathcal{S}}}}}}}}}^{(i)}\) of these variables at the stencil center-point, i.e., spatial derivatives as well as (local or global) integrals (see Fig. 5a for a schematic).

a Feedforward Neural Network (FNN). the input is constructed by convolution operations, i.e., a combination of sliding Finite Difference (FD) stencils, and, integral operators, for learning mesoscopic models in the form of IPDEs (Eq. (10)); the inputs to the RHS of the IPDE are the features in Eq. (13). b A schematic of the neural network architecture, inspired by numerical stochastic integrators, used to construct macroscopic models in the form of mean-field SDEs.
The FNN is fed with the sample points \({Z}_{j}^{(i)}\in {{\mathbb{R}}}^{{\gamma }^{(i)}}\) and its output is an approximation of the time derivative ut. In sections D2 and D3 of the Supplementary Information (SI) we describe two different approaches (gradient descent and random projections) to train such a FNN. For alternative operator-learning approximation methods, see e.g., ref. 74 and ref. 51.
Remark on learning mean-field ODEs
The proposed framework can be also applied for the simpler task of learning a system of m ODEs in terms of the m variables u = (u(1), u(2), …, u(m)), thus learning i = 1, …, m functions:
$$\frac{d{u}^{(i)}
(14)
where \({{{{{{{\bf{p}}}}}}}}\in {{\mathbb{R}}}^{k}\) denotes the parameter vector.
Feature selection with automatic relevance determination
Training the FNN with the “full” set of inputs \({{{{{{{{\mathcal{S}}}}}}}}}^{(i)}\subseteq {{\mathbb{R}}}^{{\gamma }^{(i)}}\), described in Eq. (13), consisting of all local mean field values as well as all their coarse-scale spatial derivatives (up to some order ν) is prohibitive due to the curse of dimensionality. Therefore, one important task for the training of the FNN is to extract a few “relevant”/dominant variable combinations. Towards this aim, we used Automatic Relevance Determination (ARD) in the framework of Gaussian processes regression (GPR)29. The approach assumes that the collection of all observations \({{{{{{{{\bf{Z}}}}}}}}}^{(i)}=({Z}_{1}^{(i)},{Z}_{2}^{(i)},\ldots,{Z}_{MN}^{(i)})\), of the features \({z}_{l}\in {{{{{{{{\mathcal{S}}}}}}}}}^{(i)}\), are a set of random variables whose finite collections have a multivariate Gaussian distribution with an unknown mean (usually set to zero) and an unknown covariance matrix K. This covariance matrix is commonly formulated by a Euclidean distance-based kernel function k in the input space, whose hyperparameters are optimized based on the training data. Here, we employ a radial basis kernel function (RBF), which is the default kernel function in Gaussian process regression, with ARD:
$$K_{jh}^{(i)}=k(Z^{(i)}_j,Z^{(i)}_h,{{{{{\boldsymbol{\theta}}}}}}^{(i)})=\theta_0^{(i)} \exp\left( -\frac{1}{2}\sum\limits_{l=1}^{\gamma^{(i)}} \frac{z_{l,j}-z_{l,h}}{\theta_l^{(i)}}\right);$$
(15)
\({{{{{{{{\boldsymbol{\theta }}}}}}}}}^{(i)}=[{\theta }_{0}^{(i)},{\theta }_{1}^{(i)},\ldots,{\theta }_{{\gamma }^{(i)}}^{(i)}]\) are a (γ(i) + 1)-dimensional vector of hyper-parameters. The optimal hyperparameter set \({\tilde{{{{{{{{\boldsymbol{\theta }}}}}}}}}}^{(i)}\) can be obtained by minimizing a negative log marginal likelihood over the training data set (Z(i), Y(i)), with inputs the observation Z(i) of the set \({{{{{{{{\mathcal{S}}}}}}}}}^{(i)}\) and corresponding desired output given by the observation Y(i) of the time derivative \({u}_{t}^{(i)}\):
$${\tilde{{{{{\boldsymbol{\theta}}}}}}}^{(i)}={{\arg}\min}_{{{{{{\boldsymbol{\theta}}}}}}^{(i)}} -\log{p({{{{{\mathbf{Y}}}}}}^{(i)}|{{{{{\mathbf{Z}}}}}}^{(i)},{{{{{\boldsymbol{\theta}}}}}}^{(i)})}.$$
(16)
As can be seen in Eq. (15), a large value of θl nullifies the difference between target function values along the l-th dimension, allowing us to designate the corresponding zl feature as “insignificant”. Practically, in order to build a reduced input data domain, we define the normalized effective relevance weights \({W}_{r}^{(i)}(\cdot )\) of each feature input \({z}_{l}\in {{{{{{{{\mathcal{S}}}}}}}}}^{(i)}\), by taking:
$${\bar{W}}_{r}^{(i)}({z}_{l})=exp(-{\tilde{\theta }}_{l}^{(i)}),\,{W}_{r}^{(i)}({z}_{l})=\frac{{\bar{W}}_{r}^{(i)}({z}_{l})}{{\sum }_{l}{\bar{W}}_{r}^{(i)}({z}_{l})}.$$
(17)
Thus, we define a small tolerance tol in order to disregard the components such that \({W}_{r}^{(i)}({z}_{l}) \, < \, tol\). The remaining selected features (\({W}_{r}^{(i)}({z}_{l})\ge tol\)) can still successfully (for all practical purposes) parametrize the approximation of the right-hand-side of the underlying IPDE.
Macroscopic mean-field SDEs via neural networks
Here, we present our approach for the construction of embedded surrogate models in the form of mean-field SDEs. Under the assumption that we are close (in phase- and parameter space) to a previously located tipping point, we can reasonably assume that the effective dimensionality of the dynamics can be reduced to the corresponding normal form. We already have some qualitative insight on the type of the tipping point, based for example on the numerical bifurcation calculations that located it (e.g., for the financial ABM from the analytical FP IPDE, from our surrogate IPDE, or from the EF analysis52,56), while for the epidemic ABM from the EF analysis in ref. 59. For both these two particular problem, we have found that the tipping point corresponds to a saddle-node bifurcation.
Given the nature of the bifurcation (and the single variable corresponding normal form) we identify a one-dimensional SDE, driven by a Wiener process, from data. We note that learning higher-order such SDEs, or SDEs based on the more general Lévy process and the Ornstein-Uhlenbeck process75, is straightforward.
For a diffusion process with drift, say Xt = {xt, t > 0}, the drift, μ(xt) and diffusivity σ2(xt) coefficients over an infinitesimally small-time interval dt, are given by:
$$\begin{array}{ll} &\mu(x_t)={\lim}_{\delta t \rightarrow 0} \frac{1}{\delta t}{\mathbb{E}}(\delta x_t|X_t=x_t),\\ &\sigma^2(x_t)={\lim}_{\delta t \rightarrow 0} \frac{1}{\delta t}{\mathbb{E}}(\delta x_t^2|X_t=x_t),\end{array}$$
(18)
where, δxt = xt+δt − xt.
The 1D SDE driven by a Wiener process Wt reads:
$$d{x}_{t}=\mu ({x}_{t};\varepsilon )dt+\sigma ({x}_{t};\varepsilon )d{W}_{t}.$$
(19)
Here, for simplicity, we assume that the one-dimensional parameter ε, enters into the dynamics, via the drift and diffusivity coefficients. Note that the parameter ε can be either the parameter g introduced in Eq. (3) or the parameter λ for the epidemic ABM (representing the probability that a susceptible individual may get infected). Our goal is to identify the functional form of the drift μ(x, ε) and the diffusivity σ(x, ε) given noisy data close to the tipping point via machine learning. For the training, the data might be collected from either long-time trajectories or short bursts initialized at scattered snapshots, as in the EF framework. These trajectories form our data set of input-output pairs of discrete-time maps. A data point in the collected data set can be written as \(({x}_{0}^{(k)},{h}^{(k)},{x}_{1}^{(k)}{\varepsilon }^{(k)})\), where \({x}_{0}^{(k)}\) and \({x}_{1}^{(k)}\) measures two consecutive states at \({t}_{0}^{(k)}\) and \({t}_{1}^{(k)}\) with (small enough) time step \({h}^{(k)}={t}_{1}^{(k)}-{t}_{0}^{(k)}\) and ε(k) is the parameter value for this pair. Based on the above formulation, going to \({x}_{1}^{(k)}\) from \({x}_{0}^{(k)}\) by:
$${x}_{1}^{(k)}={x}_{0}^{(k)}+\int\nolimits_{{t}_{0}^{(k)}}^{{t}_{1}^{(k)}}\mu ({x}_{t};{\varepsilon }^{(k)})dt+\int\nolimits_{{t}_{0}^{(k)}}^{{t}_{1}^{(k)}}\sigma ({x}_{t};{\varepsilon }^{(k)})d{W}_{t}.$$
(20)
Here, for the numerical integration of the above equation to get stochastic realizations of \({x}_{1}^{(k)}\), we assume that the Euler-Maruyama numerical scheme can be used, reading:
$${x}_{1}^{(k)} \, \approx \, {x}_{0}^{(k)}+{h}^{(k)}\mu ({x}_{0}^{(k)};{\varepsilon }^{(k)})+\sigma ({x}_{0}^{(k)};{\varepsilon }^{(k)}){{\Delta }}{W}^{(k)},$$
(21)
where \({{\Delta }}{W}^{(k)}={W}_{{t}_{1}^{(k)}}-{W}_{{t}_{0}^{(k)}}\in {\mathbb{R}}\) is a one-dimensional random variable, normally distributed with expected value zero and variance h(k).
Considering the point \({x}_{1}^{(k)}\) as a realization of a random variable X1, conditioned on \({x}_{0}^{(k)}\) and h(k), drawn by a Gaussian distribution of the form:
$$\begin{array}{ll}&{p}_{{X}_{1}}({x}_{1}^{(k)})={\mathbb{P}}\left({X}_{1}={x}_{1}^{(k)}| {X}_{0}={x}_{0}^{(k)},{h}^{(k)}\right) \sim \\ & \sim {{{{{{{\mathcal{N}}}}}}}}\left({x}_{0}^{(k)}+{h}^{(k)}\mu ({x}_{0}^{(k)};{\varepsilon }^{(k)}),{h}^{(k)}\sigma {({x}_{0}^{(k)};{\varepsilon }^{(k)})}^{2}\right),\end{array}$$
(22)
we approximate the drift \(\mu ({x}_{0}^{(k)};{\varepsilon }^{(k)})\) and diffusivity \(\sigma ({x}_{0}^{(k)};{\varepsilon }^{(k)})\) functions by simultaneously training two neural networks, denoted as μθ and σθ, respectively. This training process involves minimizing the loss function:
$$\begin{array}{ll} &{{{{{\mathcal{L}}}}}}(\theta|x_0^{(k)},x_1^{(k)},h^{(k)}) :=\\ &=\sum\limits_{k} \frac{(x_1^{(k)} – x_0^{(k)} – h^{(k)}\mu_{\theta}(x_0^{(k)};\varepsilon^{(k)}))^2}{h^{(k)}\sigma_{\theta}(x_0^{(k)};\varepsilon^{(k)})^2}+\\ &+{\log}|h^{(k)}\sigma_{\theta}(x_0^{(k)};\varepsilon^{(k)})^2|. \end{array}$$
(23)
which is derived in order to maximize the log-likelihood of the data and where θ denotes the trainable parameters (e.g., weights and biases of the neural networks μθ and σθ). A schematic of the Neural Network, based on Euler-Maruyama, is shown in Fig. 5b.
Locating tipping points via our surrogate models
In order to locate the tipping point, based on either the mesoscopic IPDE or the embedded mean-field 1D SDE model, we construct the corresponding bifurcation diagram in its neighborhood, using pseudo-arc-length continuation as implemented in numerical bifurcation packages. For the identified SDE, we used its deterministic part, i.e., the drift term, to perform continuation. The required Jacobian of the activation functions of the neural network is computed by symbolic differentiation. Note that this is just a validation step (we already know the location and nature of the tipping point).
Rare-event analysis/UQ of catastrophic shifts
Given a sample space Ω, an index set of times T = {0, 1, 2, … } and a state space S, the first passage time, also known as mean exit time or mean escape time, of a stochastic process xt: Ω × T ↦ Son a measurable subset A ⊆ S is a random variable which can be defined as
$$\tau (\omega ):=\inf \{t\in T| {x}_{t}(\omega )\in S\setminus A\},$$
(24)
where ω is a sample out of the space Ω. One can define the mean escape time from A, which works as the expectation of τ(ω):
$$\langle \tau \rangle :={\mathbb{E}}[\tau (\omega )].$$
(25)
For a n-dimensional stochastic process, as it is the ABM under study, S is typically set to be \({{\mathbb{R}}}^{n}\), and A is usually a bounded subset of \({{\mathbb{R}}}^{n}\). In the case of our local 1D SDE model, the subset A reduces to an open interval (a, b), with the initial condition of this stochastic process x0 also chosen in this interval.
We discuss two ways for quantifying the uncertainty of the occurrence of those rare-events. The first, presented in the main text, involves direct computational “cheap” temporal simulations of the 1D SDE, where one gets an empirical probability distribution; the second, presented only in section F4 of the Supplementary Information (SI), is a closed-form expression, based on statistical mechanics76, for the mean escape time (assuming an exponential distribution of escape times).
Computation of escape times based on temporal simulations of the 1D SDE
We perform numerical integration of multiple stochastic trajectories of the SDE to obtain an estimation of the desired mean escape time. The algorithm for estimating the escape times of a one-dimensional stochastic process with initial condition x0 on the interval (a, b) can be described as follows:
-
1.
Given a fixed time step h > 0, we perform i + 1 numerical integration steps of the SDE until a stopping condition, i.e., xt exits, for the first time ti+1, the interval at a or b.
-
2.
Record the time τ = ti, which corresponds to a realization of the escape time.
-
3.
Repeat the above steps 1–2 for K iterations, and each time collect the observed escape time.
-
4.
Compute the statistical mean value of the collected escape times that corresponds to an approximation of the mean escape time.
In our case, the initial condition x0 was set equal to the stable steady state, and the termination condition a or b at which we consider the dynamics escaped/exploded.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
