Using deep learning to capture gravel soil microstructure and hydraulic characteristics



Generation effects of 3-D models

3-D models

Figure 3 shows the three groups of real gravel soil models (training images of 64 × 64 × 64 voxels, randomly selected from the training set), together with three randomly chosen WGAN-GP realizations for each group. It gives a visual impression of the variability within each group. The realizations closely resemble the real soil models in their characteristics and patterns. This indicates that the training process has captured the essential features of the original data, and that the trained generator can produce realistic, representative samples that share the properties of the real gravel soil.

Several methods are used to evaluate the generated results, namely multi-dimensional scaling (MDS)26,27,28 and morphological descriptors such as the two-point probability function29, the two-point cluster function30 and the variogram16. In addition, the validity of the adopted methods can be assessed by comparing the flow behavior of the generated samples with that of the actual samples. The reasons for adopting these approaches are explained below.

Fig. 3

Three groups of training images, each of size 64 × 64 × 64, are used as real data. Three random realizations per group, reconstructed by the WGAN-GP framework, are generated by sampling 200 random values from a standard Gaussian distribution N(0, 1). These realizations are shown in 3D in panels (a–c). Panel (d) shows 2D cross-sections through the middle of the z-direction of each training image and its corresponding realizations. In all panels, the pore space is shown in a dark color. Cited from Zhu & Hu (2024)47.

Multi-dimensional scaling (MDS) inspection

The quality of the generated results can be evaluated using various methods proposed by numerous researchers25,26,27,28,29,30,31,32,33,34, including the multi-scale structural similarity index measure (MS-SSIM)10, the structural similarity index measure (SSIM), multi-dimensional scaling (MDS), the two-point correlation function and others. Karras et al. (2018)32 proposed that directly using the Earth Mover's Distance (Wasserstein distance) to evaluate the structural similarity between the generated and training models gives a better evaluation. An MDS approach based on the resulting multi-scale sliced Wasserstein distance (MS-SWD) has been used successfully to visualize how the distribution of the generated data approaches that of the actual (training) data. By contrast, MS-SSIM and SSIM are not well suited to evaluating the detailed structure and realism of generated porous-media samples, for which morphological descriptors33, such as the two-point correlation function34, give better evaluations. Finally, by visually comparing simulations of flow behavior in the porous media, it is possible to assess the similarity between the generated and training samples, and even the validity of the WGAN-GP method. This gives an intuitive view of characteristics such as flow behavior and permeability from which the realism and accuracy of the generated samples can be judged, although relevant hydraulic parameters and other indicators must also be considered in the evaluation.
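As a rough illustration of the Wasserstein-based comparison, the following plain-Python sketch estimates a single-scale sliced Wasserstein distance between two equally sized sets of feature vectors (for example, flattened patches taken from generated and training models). It is a simplification under stated assumptions, not the MS-SWD of Karras et al.: the Laplacian-pyramid patch extraction is omitted, and the helper name `sliced_wasserstein`, the number of random projections and the seed are hypothetical choices.

```python
import math
import random


def sliced_wasserstein(xs, ys, n_proj=64, seed=0):
    """Approximate sliced 1-Wasserstein distance between two point sets.

    xs, ys: equally sized, non-empty lists of vectors (lists of floats)
    of the same dimension. Hypothetical helper for illustration only.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("point sets must be non-empty and of equal size")
    dim = len(xs[0])
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_proj):
        # Draw a random direction uniformly on the unit sphere.
        d = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in d)) or 1.0
        d = [c / norm for c in d]
        # Project both sets onto the direction and sort the projections.
        px = sorted(sum(a * b for a, b in zip(v, d)) for v in xs)
        py = sorted(sum(a * b for a, b in zip(v, d)) for v in ys)
        # For equal-size 1D samples, W1 is the mean gap between sorted values.
        total += sum(abs(a - b) for a, b in zip(px, py)) / len(px)
    return total / n_proj
```

Each 1D projection reduces the Wasserstein distance to a comparison of sorted values, so the cost per projection is dominated by sorting. Averaging over many random directions estimates the sliced distance, which serves as a cheaper proxy for the full Wasserstein distance.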

Fig. 4

MDS plots of the relationship between the generated gravel soil models (red) and the training gravel soil models (black) in 2D space at different iterations of the alternating training, for the three training groups (a–c). As training progresses, the distribution of the generated models approaches that of the training models, but the number of iterations needed to reach the closest match differs between groups: (a) 27.48k for Group 1; (b) 29.64k for Group 2 and (c) 27.96k for Group 3.

Multi-dimensional scaling (MDS) can be applied to evaluate the models generated by WGAN-GP: based on the modified Hausdorff distance16, it maps the generated and training images into a low-dimensional space where they can be visualized and compared. The steps for analyzing WGAN-GP results with MDS are as follows:

  1. Train the WGAN-GP and generate a set of synthesized images.
  2. Compute a distance matrix between the generated and training images and apply MDS to it to reduce their dimensionality.
  3. Map the reduced-dimensional data into a two-dimensional space.
  4. Visualize and analyze the mapped data, for example by labeling different categories of images with different colors or symbols, to observe their distribution and similarity in this space.
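The four steps above can be sketched in plain Python. This is a minimal sketch under stated assumptions rather than the authors' code: each model is represented by the coordinates of its pore voxels, the modified Hausdorff distance is taken as the larger of the two mean nearest-neighbour distances between the sets, and classical (Torgerson) MDS is computed with shifted power iteration. The helper names are hypothetical, and the brute-force distance search is only practical for small or subsampled models.

```python
import math
import random


def modified_hausdorff(a, b):
    """Modified Hausdorff distance between two non-empty point sets."""
    def directed(p, q):
        # Mean distance from each point of p to its nearest point in q.
        return sum(min(math.dist(x, y) for y in q) for x in p) / len(p)
    return max(directed(a, b), directed(b, a))


def classical_mds(dist, k=2, iters=1000, seed=0):
    """Embed an n x n symmetric distance matrix in k dimensions (classical MDS)."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in sq]
    grand = sum(row) / n
    # Double-centred Gram matrix B = -1/2 * J D^2 J.
    b = [[-0.5 * (sq[i][j] - row[i] - row[j] + grand) for j in range(n)]
         for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * k for _ in range(n)]
    for c in range(k):
        # Gershgorin shift makes B + sI positive semi-definite, so power
        # iteration finds the largest eigenvalue, not the largest |eigenvalue|.
        s = max(sum(abs(x) for x in r) for r in b)
        v = [rng.random() - 0.5 for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) + s * v[i] for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient of the unshifted B gives the eigenvalue itself.
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        if lam <= 1e-12:
            break  # no remaining positive eigenvalue: stop adding axes
        for i in range(n):
            coords[i][c] = v[i] * math.sqrt(lam)
        # Deflate B so the next pass converges to the next eigenpair.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords
```

In a plot like Fig. 4, the rows of the returned coordinates belonging to generated and training models would be drawn in different colors (step 4). When the two point clouds overlap, the generated distribution is close to the training distribution.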

Using MDS to analyze the images generated by WGAN-GP gives a better understanding of the generator's learning process, reveals the similarities and differences among the generated images, and can help to improve the generation performance of WGAN-GP. This analysis makes the outputs of generative adversarial networks easier to interpret and may provide insights for improving the model. MDS plots of this kind, shown in Fig. 4, help to assess the performance of GANs.

WGAN-GP was trained on three types of gravel soil samples to generate 3D simulated samples. Comparing the MDS distributions of the training data and the generated models for the three types at different iteration steps shows that the two eventually blend together to the point where they are difficult to distinguish. This is the desired outcome for a GAN, whose aim is to generate samples that cannot be told apart from the real ones.

Morphological descriptors

Two-point correlation function

Two morphological descriptors frequently used in the literature are the two-point correlation function S2(i)(r) and the lineal-path function L(i)(r)30,35,36. They are widely applied16,33,37 because they provide a good assessment of the similarity between the generated models and the training set.

The two-point probability function S2(i)(r) is a statistical measure of the spatial relationship between two points in the same phase i (e.g., the pore space or the solid phase) of a material. It gives the probability that two points separated by a distance r both belong to phase i, and thus provides insight into the spatial structure and connectivity of the phases. For a given phase i, the two-point correlation function is defined as:

$$S_{2}^{(i)}(\mathbf{r}_{1},\mathbf{r}_{2})=\langle I^{(i)}(\mathbf{r}_{1})\,I^{(i)}(\mathbf{r}_{2})\rangle$$

(6)

where S2(i)(r1, r2) is a statistical descriptor of random media; r1 and r2 are the positions of two randomly selected points; the indicator function I(i)(r) = 1 if r lies in phase i and 0 otherwise; and \(\langle \cdot \rangle\) denotes the volume average. For statistically homogeneous media, S2(i) depends only on the separation r = r2 − r1, which gives S2(i)(r).
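Along a single axis, S2(i)(r) can be estimated from a voxel image by averaging the indicator product over all voxel pairs that are r voxels apart. The following plain-Python sketch assumes a nested-list voxel image indexed img[x][y][z] with pore voxels labelled 1 and no periodic boundary; the function name is hypothetical, and production code would normally use array libraries or FFT-based estimators.

```python
def two_point_probability(img, phase=1, axis=0, max_lag=None):
    """Estimate S2(r) for `phase` along one axis of a 3D voxel image.

    img is a nested list indexed img[x][y][z]; lags r are in voxels. No
    periodic boundary is assumed, so only pairs inside the image count.
    """
    shape = (len(img), len(img[0]), len(img[0][0]))
    size = shape[axis]
    if max_lag is None:
        max_lag = size - 1
    if not 0 <= max_lag < size:
        raise ValueError("max_lag must lie in [0, size - 1]")
    s2 = []
    for r in range(max_lag + 1):
        hits = pairs = 0
        for x in range(shape[0]):
            for y in range(shape[1]):
                for z in range(shape[2]):
                    q = [x, y, z]
                    q[axis] += r
                    if q[axis] >= size:
                        continue  # partner voxel lies outside the image
                    pairs += 1
                    # Indicator product I(r1) * I(r2) for the chosen phase.
                    if img[x][y][z] == phase and img[q[0]][q[1]][q[2]] == phase:
                        hits += 1
        s2.append(hits / pairs)
    return s2
```

At r = 0 the estimate equals the volume fraction of the phase (the porosity for the pore phase). For a statistically homogeneous medium without long-range order, it tends toward the square of that fraction at large r.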

Lu and Torquato35 introduced the lineal-path function L(i)(r), the probability that a line segment of length r (the "path"), randomly placed in the material, lies entirely within phase i (e.g., void or solid). Because it requires all points on the segment to fall in the same phase, it contains connectedness information along a lineal path and measures how pathways within the phase are structured. Formally, it can be written as

$$L^{(i)}(\mathbf{r})=\langle P(\mathbf{r}_{1},\mathbf{r}_{2})\rangle$$

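(7)

where P(r1, r2) equals 1 if every point on the segment joining r1 and r2 lies in phase i and 0 otherwise, and r = r2 − r1.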
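A corresponding plain-Python sketch estimates the lineal-path function along one axis by checking, for every admissible segment, whether all voxels it crosses belong to the phase. It assumes a nested-list image indexed img[x][y][z], no periodic boundary, lengths in voxels, and that a segment of length r spans r + 1 voxels; the function name is hypothetical. Precomputing the length of the phase run that starts at each voxel avoids rechecking every segment voxel by voxel.

```python
def lineal_path(img, phase=1, axis=0, max_len=None):
    """Estimate L(r) for `phase` along one axis of a 3D voxel image.

    A segment of length r spans r + 1 voxels and counts only when all of
    them belong to `phase`. Lengths are in voxels; no periodic boundary.
    """
    shape = (len(img), len(img[0]), len(img[0][0]))
    size = shape[axis]
    if max_len is None:
        max_len = size - 1
    if not 0 <= max_len < size:
        raise ValueError("max_len must lie in [0, size - 1]")
    hits = [0] * (max_len + 1)
    starts = [0] * (max_len + 1)
    n_a, n_b = [n for i, n in enumerate(shape) if i != axis]
    # Visit every 1D line of voxels parallel to `axis`.
    for a in range(n_a):
        for b in range(n_b):
            line = []
            for k in range(size):
                idx = [a, b]
                idx.insert(axis, k)
                line.append(img[idx[0]][idx[1]][idx[2]] == phase)
            # run[k] = number of consecutive phase voxels starting at k.
            run = [0] * (size + 1)
            for k in range(size - 1, -1, -1):
                run[k] = run[k + 1] + 1 if line[k] else 0
            for r in range(max_len + 1):
                for k in range(size - r):
                    starts[r] += 1
                    if run[k] >= r + 1:
                        hits[r] += 1
    return [h / s for h, s in zip(hits, starts)]
```

Every segment counted by L(i)(r) also has both end points in phase i, so L(i)(r) ≤ S2(i)(r) for every r, with equality at r = 0.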

Fig. 5

Two-point probability function (PF) of the pore space at different distances for the three groups of gravel soil models, calculated along the x, y and z axes. The black curves represent the PF of the training set (64 × 64 × 64) derived from the primitive soil prototypes, the red curves the mean PF of 14 WGAN-GP realizations (64 × 64 × 64), and the dashed lines the minimum and maximum PF values at each distance.

In this work, we compared the above morphological descriptors of the training dataset and the generated realizations along different directions. The final images generated by WGAN-GP are visually difficult to distinguish from the training images, which demonstrates the ability of WGAN-GP to reproduce the morphology of complex porous materials. The closer the S2(p)(r) curves of the generated samples are to those of the training samples, the better the generated samples reproduce the pore characteristics; the same applies to the L(p)(r) curves. Both curves for the pore phase of the three sample groups are shown in Figs. 5 and 6, respectively.

Two-point correlation functions are widely used in physics to quantify clustering, for example as the excess probability of finding two objects at a given separation relative to a random distribution; here the probability form S2(p)(r) of Eq. (6) is used. Fig. 5 shows the probability of finding pore space at both ends of a segment for various lag distances (in voxels) along each axis for the three groups of generated soil models. These probabilities are compared with those of three groups of primitive soil prototypes of identical size (64 × 64 × 64). The figure thus offers insight into the porosity and connectivity of the generated porous media and highlights any differences from the original soil prototypes.

Visually, the generated realizations closely resemble the primitive soil prototypes. The average statistics of the training sets match the realizations best along the x direction (compare the solid curves of the training set and of the realizations), with a maximum absolute deviation of 0.05. A mismatch is noticeable along the z direction, where the maximum absolute deviation is 0.22.
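The maximum absolute deviations quoted above summarize how far the training-set curve lies from the realizations. The plain-Python sketch below shows one way such a summary could be computed. Because the exact procedure is not stated here, it assumes that every descriptor curve is a list of values on a common lag grid and that the deviation is measured against the pointwise mean of the realizations; the function name is hypothetical.

```python
def compare_curves(train, realizations):
    """Summarize descriptor curves of realizations against a training curve.

    Returns the pointwise mean, minimum and maximum of the realizations
    (the solid and dashed curves of a Fig. 5-style plot) and the maximum
    absolute deviation between the training curve and that mean.
    """
    if not realizations:
        raise ValueError("need at least one realization")
    if any(len(c) != len(train) for c in realizations):
        raise ValueError("all curves must share the same lag grid")
    cols = list(zip(*realizations))  # values of all realizations at each lag
    mean = [sum(col) / len(col) for col in cols]
    lo = [min(col) for col in cols]
    hi = [max(col) for col in cols]
    max_dev = max(abs(t - m) for t, m in zip(train, mean))
    return mean, lo, hi, max_dev
```

A different choice, such as the deviation from the closest realization or from the min/max envelope, would give a different number, so the definition should be stated alongside any reported value.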

Fig. 6

Lineal-path function (LF) of the pore phase in the 3D gravel soil models along the x, y and z axes. The black curves show the LF of the original soil models (64 × 64 × 64), the red curves the mean LF of 14 WGAN-GP realizations of the same size, and the dashed lines the minimum and maximum LF values at each distance. The comparison shows how closely the synthetic models match the real soil.

The lineal-path function (Fig. 6), or connectivity function, shows strong agreement between the average statistics of the training sets and the WGAN-GP realizations along the x, y and z axes (compare the solid curves), with a maximum absolute deviation of about 0.05, although the soil prototypes exhibit slightly higher connectivity. This suggests that, although the WGAN-GP realizations are not perfect, their quality is better than that of realizations generated by other algorithms such as simulated annealing (SA). Moreover, when a large number of realizations is needed, WGAN-GP generates them faster.

Minkowski functionals

The specific surface area and the Euler characteristic are two fundamental quantities in integral geometry and topology. Together with the porosity ϕ, they are the zeroth-, first- and third-order Minkowski functionals38 (porosity, specific surface area and Euler characteristic, respectively). Minkowski functionals characterize a d-dimensional body through a small set of scalar parameters, which here give a detailed description of the complex porous-material structure.

The specific surface area of a body in d-dimensional space is its total surface area per unit volume, i.e., the amount of interface relative to the volume it encloses. In porous materials, the specific surface area is crucial for understanding properties such as adsorption, reaction rates and permeability39. It is defined as:

$$S_{V}=\frac{1}{V}\int_{S} dS$$

(8)

where the integration is over the void-solid interface S and V is the total volume of the sample. SV has dimensions of 1/length, and its inverse defines a characteristic pore size.
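On a voxel image, the integral in Eq. (8) can be approximated by counting the faces shared by a pore voxel and a solid voxel. This plain-Python face-counting sketch rests on illustrative assumptions (nested-list image indexed img[x][y][z], pore voxels labelled 1, faces on the image boundary ignored, hypothetical function name). Because voxel faces are axis-aligned, it overestimates the area of smooth, oblique interfaces, and dedicated image-analysis tools generally use more refined estimators.

```python
def specific_surface_area(img, voxel_size=1.0, pore=1):
    """Estimate S_V by counting pore-solid voxel faces (crude estimator).

    Each interior face between a pore voxel and a solid voxel contributes
    voxel_size**2 of interface; faces on the image boundary are ignored.
    The result is in units of 1/length, where length is that of voxel_size.
    """
    nx, ny, nz = len(img), len(img[0]), len(img[0][0])
    faces = 0
    for x in range(nx):
        for y in range(ny):
            for z in range(nz):
                here = img[x][y][z] == pore
                # Only +x, +y and +z neighbours, so each face is counted once.
                for dx, dy, dz in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
                    X, Y, Z = x + dx, y + dy, z + dz
                    if X < nx and Y < ny and Z < nz:
                        if here != (img[X][Y][Z] == pore):
                            faces += 1
    area = faces * voxel_size ** 2
    volume = nx * ny * nz * voxel_size ** 3
    return area / volume
```

The inverse of the returned value gives the characteristic pore size in the same length unit as `voxel_size`.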

The Euler characteristic is a topological invariant that relates the numbers of vertices, edges and faces of a polyhedral surface. More generally, for a d-dimensional cell complex it is the alternating sum of the numbers of 0-dimensional vertices, 1-dimensional edges, 2-dimensional faces, …, and d-dimensional cells. For a 3D body, it equals the number of isolated components minus the number of tunnels (redundant loops) plus the number of enclosed cavities. This makes it a key measure of the topology, and hence the connectivity, of a pore space.

The Euler characteristic is used to classify shapes and structures by their topological properties. For a polyhedral surface, it is defined as:

$$\chi = V - E + F$$

(9)

where χ is the Euler characteristic, V is the number of vertices, E is the number of edges, and F is the number of faces.

The specific Euler characteristic is the Euler characteristic per unit volume or per unit area. Normalizing by size in this way allows the topology of structures of different sizes to be compared. It is defined as:

$$\chi_{specific}=\frac{\chi}{V}\ \text{(volume-based)}\quad\text{or}\quad\chi_{specific}=\frac{\chi}{A}\ \text{(area-based)}$$

(10)

where χspecific is the specific Euler characteristic, V is the volume of the object (volume-based form) and A is the surface area of the object (area-based form).
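Eq. (9) is the form for a polyhedral surface. For a 3D object assembled from closed unit voxels, the alternating sum extends to χ = V − E + F − C, where V, E, F and C count the vertices, edges, faces and cubes of the voxel complex (V here is a vertex count, not the volume of Eq. (10)). The plain-Python sketch below assumes that the pore voxels are given as integer (x, y, z) coordinates and that voxels touching at a face, edge or corner are connected (26-connectivity). The function names are hypothetical, and the sketch is not claimed to match MorphoLibJ's implementation.

```python
def euler_characteristic(voxels):
    """Euler characteristic of a set of closed unit voxels (cubical complex).

    voxels: iterable of integer (x, y, z) tuples. Cells are indexed on a
    doubled grid, where the number of odd coordinates is the cell dimension
    (0 vertex, 1 edge, 2 face, 3 cube), so chi = V - E + F - C.
    Touching voxels share cells, which corresponds to 26-connectivity.
    """
    cells = set()
    for x, y, z in voxels:
        # The closed voxel contains its cube, 6 faces, 12 edges and 8 vertices.
        for a in (-1, 0, 1):
            for b in (-1, 0, 1):
                for c in (-1, 0, 1):
                    cells.add((2 * x + 1 + a, 2 * y + 1 + b, 2 * z + 1 + c))
    chi = 0
    for cell in cells:
        dim = sum(k % 2 for k in cell)
        chi += -1 if dim % 2 else 1  # alternating sign by cell dimension
    return chi


def specific_euler_characteristic(voxels, volume):
    """Volume-based specific Euler characteristic, chi / V as in Eq. (10)."""
    return euler_characteristic(voxels) / volume
```

A single voxel gives χ = 1, two separate voxels give 2, a closed ring of voxels gives 0 (one component, one tunnel) and a hollow 3 × 3 × 3 shell gives 2 (one component, one cavity), consistent with χ = components − tunnels + cavities.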

The specific surface area and Euler characteristic can be obtained with the open-source morphological image-analysis library MorphoLibJ40. The two Minkowski functionals computed directly in this way are presented in Fig. 7. Their distributions are comparable between the training images and the synthetic realizations, both for the connected pores (Fig. 7a,b) and for all pores (Fig. 7c,d).

Fig. 7

Comparison of the specific surface area and the specific Euler characteristic for each group of realizations. The two show good agreement: most values of the prototype models (red crosses) lie within the range of the realization boxes. (a,b) Connected pores; (c,d) all pores, including the connected pores and the dead pores.


