Using deep learning to capture gravel soil microstructure and hydraulic characteristics

Generation effects of 3-D models

3-D models

Figure 3 shows three randomly chosen realizations for each of the three groups of real gravel soil models; the real models were randomly selected from the training set and measure 64 × 64 × 64 voxels. The figure gives a visual impression of the variability within each group. The realizations in Fig. 3 closely resemble the real soil models in their characteristics and patterns, which indicates that training captured the essential features of the original data and that the trained models can generate realistic, representative samples consistent with the properties of the real gravel soil.

Several methods are used to evaluate the generated results: multi-dimensional scaling (MDS)^26,27,28 and morphological descriptors, namely the two-point probability function²⁹, the two-point cluster function³⁰ and the variogram¹⁶. In addition, the validity of the adopted methods can be assessed by comparing the flow behavior of the generated samples with that of actual samples. The reasons for adopting these approaches are explained below.

Multi-dimensional scaling (MDS) inspection

The quality of the generated results can be evaluated with various methods proposed by numerous researchers^{25,26,27,28,29,30,31,32,33,34}, including the multi-scale structural similarity index measure (MS-SSIM)¹⁰, the mean structural similarity index measure (SSIM), multi-dimensional scaling (MDS), the two-point correlation function, and others. Karras et al. (2018)³² proposed that directly using the Earth Mover’s Distance (Wasserstein distance) to evaluate the structural similarity between the generated model and the training model gives a better evaluation. The resulting method, an MDS approach based on the multi-scale sliced Wasserstein distance (MS-SWD), has been used successfully to visualize how the distribution of the generated data approaches that of the actual (training) data. By contrast, the MS-SSIM and SSIM approaches are not suitable for evaluating the detailed structure and realism of generated porous media samples; for these, morphological descriptors³³ such as the two-point correlation function³⁴ are usually applied and yield better evaluation results. Finally, visually inspecting simulated flow behavior in the porous media makes it possible to assess the similarity between the generated and training samples, and even the validity of the WGAN-GP method. This provides an intuitive way to examine characteristics such as flow behavior and permeability and thereby judge the realism and accuracy of the generated samples. Relevant hydraulic parameters and other indicators also need to be considered in the evaluation.
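
To make the sliced Wasserstein idea concrete, the following is a minimal single-scale sketch, not the MS-SWD procedure of Karras et al. itself (which, as described there, compares patches taken from several levels of an image pyramid). It assumes two equally sized sets of flattened patches; the function name `sliced_wasserstein`, the patch size and the random test data are illustrative assumptions.

```python
import numpy as np

def sliced_wasserstein(a, b, n_projections=64, seed=0):
    """Approximate sliced 1-Wasserstein distance between two point sets.

    a, b: arrays of shape (n_points, n_features), e.g. flattened patches.
    Assumption of this sketch: both sets contain the same number of points.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("this sketch assumes equally sized point sets")
    rng = np.random.default_rng(seed)
    # Random unit directions onto which both point sets are projected.
    directions = rng.normal(size=(n_projections, a.shape[1]))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # In 1-D, the Wasserstein distance between equal-size samples is the
    # mean absolute difference of the sorted values.
    proj_a = np.sort(a @ directions.T, axis=0)
    proj_b = np.sort(b @ directions.T, axis=0)
    return float(np.mean(np.abs(proj_a - proj_b)))

rng = np.random.default_rng(1)
real_patches = rng.random((200, 27))        # hypothetical flattened 3x3x3 patches
fake_patches = rng.random((200, 27)) + 0.5  # deliberately shifted distribution
d_same = sliced_wasserstein(real_patches, real_patches)
d_shifted = sliced_wasserstein(real_patches, fake_patches)
```

A distance near zero means the two patch distributions are hard to tell apart along the sampled directions; the shifted set yields a clearly larger value.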

Multi-dimensional scaling (MDS) can be applied to evaluate the models generated by WGAN-GP: it maps the generated images into a lower-dimensional space for visualization and comparison based on the modified Hausdorff distance¹⁶. The steps for analyzing WGAN-GP results using MDS are as follows:

1. Train the WGAN-GP and generate a set of synthesized images.
2. Apply the MDS dimension-reduction technique to the generated images, based on a distance matrix.
3. Use MDS to map the reduced-dimensional data into a two-dimensional space.
4. Visualize and analyze the mapped data, for example by labeling different categories of images with different colors or symbols, to observe their distribution and similarity in the space.
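
Steps 2 to 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it uses classical (Torgerson) MDS, a plain Euclidean distance as a stand-in for the modified Hausdorff distance, and small random binary volumes in place of real training and generated samples. The names `pairwise_distances` and `classical_mds` are hypothetical helpers.

```python
import numpy as np

def pairwise_distances(samples):
    """Euclidean distance matrix between flattened samples (stand-in metric)."""
    x = np.asarray(samples, dtype=float)
    x = x.reshape(x.shape[0], -1)
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def classical_mds(dist, n_components=2):
    """Classical (Torgerson) MDS from a symmetric distance matrix."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centre the squared distances to obtain a Gram matrix.
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    # Keep the largest eigenvalues; clip tiny negative values from round-off.
    order = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

rng = np.random.default_rng(0)
training = rng.random((6, 8, 8, 8)) > 0.6    # placeholder "real" volumes
generated = rng.random((6, 8, 8, 8)) > 0.6   # placeholder "generated" volumes
samples = np.concatenate([training, generated]).astype(float)
labels = ["training"] * 6 + ["generated"] * 6
coords = classical_mds(pairwise_distances(samples))  # one 2-D point per sample
```

Each row of `coords` can then be plotted with a color or symbol chosen from `labels`; well-mixed point clouds indicate that the generated samples are close to the training samples under the chosen distance.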

Analyzing the images generated by WGAN-GP with MDS gives a better understanding of the generator's learning process, reveals the similarities and differences among the generated images, and can guide further improvement of the generation performance. The method offers an intuitive view of the outputs of generative adversarial networks and may provide insights for model enhancement. The MDS plots in Fig. 4 illustrate how this visualization helps to monitor the performance of the GANs.

WGAN-GP was trained on three types of gravel soil samples to generate 3D simulated samples. Comparing the MDS distributions of the training data and the generated models for the three sample types at different iteration steps shows that the two eventually blend together to the point where they are difficult to distinguish. This is the desired outcome of GANs, which aim to make the generated samples indistinguishable from the real ones.

Morphological descriptors

Two-point correlation function

The morphological descriptors most frequently used in the literature are the two-point correlation function S₂⁽ⁱ⁾(r) and the lineal-path function L⁽ⁱ⁾(r)^30,35,36. These descriptors are widely applied^16,33,37 because they provide a good assessment of the similarity between the generated models and the training set.

The two-point probability function S₂⁽ⁱ⁾(r), referred to here interchangeably as the two-point correlation function, is a statistical measure of the spatial relationship between two points in the same phase i (e.g., pore space or solid phase) of a material or system. It quantifies the probability that two points separated by a distance r both belong to that phase, and thus provides insight into the spatial structure and connectivity of the phases. For a given phase i, it is defined as:

$$S_{2}^{(i)}(\mathbf{r}_{1},\mathbf{r}_{2})=\langle I^{(i)}(\mathbf{r}_{1})\,I^{(i)}(\mathbf{r}_{2})\rangle$$

(6)

where S₂⁽ⁱ⁾(r₁, r₂) is a statistical descriptor of random media; the vectors r₁ and r₂ are the positions of two randomly selected points; the indicator function I⁽ⁱ⁾(r) equals 1 if r belongs to phase i and 0 otherwise; and $\langle \cdot \rangle$ denotes the volume average. For a statistically homogeneous and isotropic medium, S₂⁽ⁱ⁾ depends only on the separation distance r = |r₂ − r₁|, which is why it is written as S₂⁽ⁱ⁾(r).
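
As an illustration of Eq. 6, the sketch below estimates S₂⁽ⁱ⁾(r) along one coordinate axis of a binary voxel image by averaging the product of the indicator function with a shifted copy of itself. It is a minimal example under stated assumptions: non-periodic boundaries, integer lags in voxels, and a hypothetical function name `two_point_probability`.

```python
import numpy as np

def two_point_probability(phase, axis=0, max_lag=None):
    """Estimate S2(r) along one axis for a binary indicator array.

    phase: array that is 1 (True) where the voxel belongs to phase i.
    Returns s2 with s2[r] = mean of I(x) * I(x + r) over all valid positions.
    """
    ind = np.asarray(phase, dtype=float)
    n = ind.shape[axis]
    if max_lag is None:
        max_lag = n - 1
    s2 = np.empty(max_lag + 1)
    for r in range(max_lag + 1):
        # Pair every voxel with the voxel r steps further along the axis.
        first = np.take(ind, np.arange(0, n - r), axis=axis)
        second = np.take(ind, np.arange(r, n), axis=axis)
        s2[r] = (first * second).mean()
    return s2

# Toy volume: layers alternate pore (1) and solid (0) along axis 0.
layers = np.array([1, 0, 1, 0])
volume = np.broadcast_to(layers[:, None, None], (4, 3, 3))
s2 = two_point_probability(volume, axis=0)
```

At zero lag the estimate equals the volume fraction of the phase (the porosity when the pore phase is chosen), which is a convenient sanity check.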

Lu and Torquato³⁵ introduced the lineal-path function L⁽ⁱ⁾(r), which gives the probability that a randomly placed line segment of length r (the “path”) lies entirely within phase i (e.g., void or solid) of the material. Because every point on the segment between the two end points must fall in the same phase, the function carries connectedness information along a lineal path and measures how pathways within the phase are structured. Formally, it can be written as

$$L^{(i)}(\mathbf{r})=\langle P(\mathbf{r}_{1},\mathbf{r}_{2})\rangle$$

(7)
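
A minimal sketch of how the lineal-path function can be estimated along one axis of a binary voxel image is given below. It assumes integer segment lengths in voxels, non-periodic boundaries and axis-aligned segments only; the name `lineal_path` is hypothetical.

```python
import numpy as np

def lineal_path(phase, axis=0, max_lag=None):
    """Estimate L(r) along one axis for a binary indicator array.

    lp[r] is the fraction of start positions x for which all voxels
    x, x+1, ..., x+r along the axis belong to the phase.
    """
    ind = np.asarray(phase, dtype=bool)
    n = ind.shape[axis]
    if max_lag is None:
        max_lag = n - 1
    lp = np.empty(max_lag + 1)
    run = ind.copy()  # run[x]: segment starting at x is still inside the phase
    lp[0] = run.mean()
    for r in range(1, max_lag + 1):
        # Extend every surviving segment by one voxel along the axis.
        head = np.take(run, np.arange(0, n - r), axis=axis)
        tail = np.take(ind, np.arange(r, n), axis=axis)
        run = head & tail
        lp[r] = run.mean()
    return lp

# Toy volume: pore, solid, pore along axis 0.
layers = np.array([1, 0, 1], dtype=bool)
volume = np.broadcast_to(layers[:, None, None], (3, 2, 2))
lp = lineal_path(volume, axis=0)
```

In this toy case the two end layers are both pore, so a two-point statistic at lag 2 would be nonzero, whereas the lineal-path value is zero because the segment crosses the solid layer. This is the connectedness information the text refers to.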

In this work, we compared the above morphological descriptors of the training dataset and the generated realizations in different directions. The final images generated by WGAN-GP are difficult to distinguish visually from the training images, which demonstrates the ability of WGAN-GP to reproduce the morphology of complex porous materials. If the pore-phase curves S₂⁽ᵖ⁾(r) of the generated samples fit those of the training samples well, the generated samples reproduce the pore characteristics well; the same applies to the L⁽ⁱ⁾(r) curves. Both curves for the pore phase of the three samples are shown in Figs. 5 and 6, respectively.

The two-point correlation function is a statistical tool used in physics to assess how objects are clustered within a system. It calculates the increased likelihood of finding two objects at a specific distance, relative to a random distribution. In Fig. 5, the probability of encountering porous spaces at various vertical distances (represented by voxel counts) is shown for three groups of generated soil models. These probabilities are compared with those from three groups of primitive soil prototypes of identical size (64 × 64 × 64). The figure offers insights into the porosity and connectivity of the generated porous media, emphasizing the differences between the generated models and the original soil prototypes.

Visually, the generated realizations closely resemble the primitive soil prototypes. The average statistics of the training sets show the best match with the realizations (as seen by comparing the solid red and blue curves) along the x direction, with a maximum absolute deviation of 0.05. However, a slight mismatch is noticeable along the z direction, where the maximum absolute deviation is 0.22.

The lineal-path function (Fig. 6), or connectivity function, shows a strong agreement between the average statistics of the training sets and the WGAN model realizations (solid red and blue curves) along the x, y, and z axes, with a maximum absolute deviation of about 0.05. The soil prototype exhibited slightly higher connectivity. This comparison suggests that, although the WGAN model realizations are not perfect, they perform better than those generated by other recent algorithms, like Simulated Annealing (SA), in terms of quality. Furthermore, when a large number of model realizations is needed, the WGAN model offers faster performance.

Minkowski functionals

The specific surface area and the Euler characteristic are two fundamental concepts in the field of geometric topology. Together with the porosity ϕ, they are also called the zeroth-, first- and third-order Minkowski functionals³⁸ (porosity, specific surface area and Euler characteristic, respectively). They characterize a d-dimensional convex body through a set of quantitative parameters, which here gives a detailed description of the complex structure of the porous material.

The specific surface area of a convex body in d-dimensional space refers to the total surface area per unit volume. It quantifies the extent of surface area relative to the volume enclosed by the convex body. In porous-material constructions, the specific surface area is crucial for understanding properties such as adsorption, reaction rates, and permeability³⁹. It is defined as:

$$S_{V}=\frac{1}{V}\int dS$$

(8)

where the integration is carried out over the void-solid interface S. The specific surface area S_V has dimensions of $1/\text{length}$, and its inverse defines a characteristic pore size.
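
For a voxel image, the integral in Eq. 8 can be approximated crudely by counting the voxel faces that separate pore from solid. The sketch below does this; it is an illustrative estimate only (face counting tends to overestimate the area of smooth or inclined surfaces, and dedicated libraries such as MorphoLibJ may use a different estimator). The function name `specific_surface_area` and the `voxel_size` parameter are assumptions, and faces on the outer boundary of the volume are not counted.

```python
import numpy as np

def specific_surface_area(pore, voxel_size=1.0):
    """Face-counting estimate of S_V = (interface area) / (total volume)."""
    p = np.asarray(pore, dtype=bool)
    n_faces = 0
    for axis in range(3):
        n = p.shape[axis]
        # Compare each voxel with its neighbour along this axis.
        a = np.take(p, np.arange(0, n - 1), axis=axis)
        b = np.take(p, np.arange(1, n), axis=axis)
        n_faces += int(np.count_nonzero(a != b))
    area = n_faces * voxel_size ** 2
    volume = p.size * voxel_size ** 3
    return area / volume

# Toy volume: a single pore voxel in the centre of a 3 x 3 x 3 solid block.
block = np.zeros((3, 3, 3), dtype=bool)
block[1, 1, 1] = True
sv_unit = specific_surface_area(block)                    # voxel edge = 1
sv_scaled = specific_surface_area(block, voxel_size=2.0)  # voxel edge = 2
```

Doubling the voxel size halves the result, consistent with S_V having dimensions of 1/length.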

The Euler characteristic is a topological invariant that relates the number of vertices, edges, and faces of a polyhedral surface. In more general terms, for a d-dimensional convex body, the Euler characteristic relates the number of d-dimensional cells, (d-1)-dimensional faces, …, 1-dimensional edges, and 0-dimensional vertices. For more complex shapes, such as those with holes or cavities, the Euler characteristic can be generalized using more advanced definitions, but it still serves as a key measure in understanding the topology of the object.

The Euler characteristic is used to classify different shapes and structures based on their topological properties. It is defined as:

$$\chi = V - E + F$$

(9)

where χ is the Euler characteristic, V is the number of vertices, E is the number of edges, and F is the number of faces. In Eq. 9, V counts vertices, whereas in Eqs. 8 and 10 it denotes a volume.
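
Eq. 9 applies to a polyhedral surface. For a voxelized solid, a common generalization treats each foreground voxel as a closed unit cube and adds a cell term, χ = V − E + F − C, where C is the number of cubes. The sketch below counts these elements for a binary 3D array. It is a minimal example under that assumption (voxels touching only at a corner or edge count as connected) and is not the algorithm used by MorphoLibJ; the function names are hypothetical.

```python
import itertools
import numpy as np

def _merge_along(a, axis):
    """OR each entry with its neighbour along one axis (length shrinks by 1)."""
    lo = [slice(None)] * 3
    hi = [slice(None)] * 3
    lo[axis] = slice(None, -1)
    hi[axis] = slice(1, None)
    return a[tuple(lo)] | a[tuple(hi)]

def euler_characteristic(volume):
    """Euler characteristic of the union of closed unit cubes: V - E + F - C."""
    padded = np.pad(np.asarray(volume, dtype=bool), 1, mode="constant")
    chi = 0
    for k in range(4):
        # Merging over k axes yields the cells shared by 2**k voxels:
        # k=0 cubes, k=1 faces, k=2 edges, k=3 vertices (dimension 3 - k).
        for axes in itertools.combinations(range(3), k):
            cells = padded
            for axis in axes:
                cells = _merge_along(cells, axis)
            chi += (-1) ** (3 - k) * int(np.count_nonzero(cells))
    return chi

solid = np.ones((3, 3, 3), dtype=bool)   # one solid block
shell = solid.copy()
shell[1, 1, 1] = False                   # block with an enclosed cavity
ring = np.ones((3, 3, 1), dtype=bool)
ring[1, 1, 0] = False                    # block with a through-hole
chi_solid = euler_characteristic(solid)
chi_shell = euler_characteristic(shell)
chi_ring = euler_characteristic(ring)
```

The three toy shapes show how the value responds to topology: an enclosed cavity raises the Euler characteristic, while a through-hole (tunnel) lowers it.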

The specific Euler characteristic is the Euler characteristic per unit volume or per unit area. It is commonly used because normalizing by the size of the object allows the topological features of different structures to be compared largely independently of scale. It is defined as:

$$\chi_{\text{specific}}=\frac{\chi}{V}\ \text{(for volume-based)}\quad\text{or}\quad\chi_{\text{specific}}=\frac{\chi}{A}\ \text{(for area-based)}$$

(10)

where χ_specific is the specific Euler characteristic, V is the volume of the object (for the volume-based specific Euler characteristic), and A is the surface area of the object (for the area-based specific Euler characteristic).

The specific surface area and the Euler characteristic can be obtained with the open-source image morphology library MorphoLibJ⁴⁰. The results of directly computing these two Minkowski functionals are presented in Fig. 7, which shows comparable distributions in the training images and the synthetic realizations for both the connected pores (Fig. 7a,b) and all pores (Fig. 7c,d).