Automated generation of structure datasets for machine learning potentials and alloys

Fundamentals of ASSYST

First we describe the details of ASSYST. Then we discuss the final training data and error metrics obtained from the fit. Next, we test the potential first on bulk and defect structures directly against DFT, before moving to the binary phase diagrams Mg/Al, Al/Ca, and Mg/Ca and compare them to experiment. Finally, we discuss whether the residual errors that we observe originate from interpolation or extrapolation errors.

The initial structure generation is based on our earlier approach¹¹ and extends it to multicomponent systems. First, periodic structures with a given composition and space group symmetry are generated, then relaxed and finally randomly perturbed. Once all structures are obtained, highly converged DFT point calculations are performed on all of them, with the parameters given in Section “DFT data generation”. A schematic depiction of this process is shown in Fig. 1.

ASSYST-Input

ASSYST requires very few external parameters to generate the training set. Also, the values we have identified change very little with the specific target system and can therefore be used as fixed (generic) parameters. The structure generation consists of three steps described by the following input parameters:

the number of atoms, or the composition of possible starting configurations. We call this the stoichiometry in the following. This is the main convergence parameter. Higher values will sample phase space denser, but generate larger and computationally more costly training sets;
how structures are relaxed specifically and whether samples are also taken along the relaxation trajectories;
the amount of random perturbations to add to the final structure set.

In the following, we will explain each of the steps and their parameters in detail. Where they can be changed in our implementation, we will only give a variable name in the explanation here. At the end of this section, we will introduce and discuss the numerical value we have chosen for this work.

Construction of initial structures

The main parameter of ASSYST is how many atoms are in the initial structures and what is their composition. Given a list of allowed occupations per element and structure, say 1–10 atoms, ASSYST first generates a list of all possible stoichiometries. For alloys we simply try all possible combinations, optionally limiting the total number of atoms per cell to n_total. Then for each of these stoichiometries, e.g., 2 Mg, 4 Al and 1 Ca, n_SPG random crystals are generated for each space group, 1–230. Of course, this could explode the number of structures very quickly if one were to allow large supercells, but we have found that structures of 8–10 atoms are generally sufficient to obtain good potentials. Not all stoichiometries can be realized in all space groups and naturally the simpler the atomic ratio (concentration) the more stoichiometries exist to realize them. This results in some concentrations being sampled more than others and leaves gaps where no structures are possible within the supercell limit we have chosen. We will verify in Section Verification with respect to DFT that MLIPs fitted with ASSYST perform well also in these gaps.

Relaxation of structures

This initial structure set is then relaxed within DFT at low convergence parameters. As the plain wave energy cutoff we use the largest of the element specific VASP^20,21 defaults and sample reciprocal space with a minimum spacing between k points of 0.5 Å⁻¹ (KSPACING = 0.5 in the VASP^20,21INCAR file). First, we relax each structure with respect to volume, keeping cell shape and scaled atomic positions fixed. Then, we fully relax the resulting structures, i.e., allowing cell shape, size, and atomic positions to vary. The relaxations are performed as separate VASP^20,21 runs to provide additional training points along the relaxation path and to avoid undue influence of Pulay stresses on the geometries. The final structures of both relaxation runs are collected and added to the training set separately. However, evenly spaced samples along the relaxation trajectory can also be taken. As mentioned above, the energies and other observables from these runs are discarded. Only the atomic structures on the relaxation path are used to recalculate energies, forces and stresses for the final training set with high convergence parameters.

The computational expense of this procedure is very modest compared to the final point calculations, allowing us to generate training structures that explore the potential energy surface without human input. Of course, multiple structures can relax to the same minimum (or close to each other), but we make no attempt to filter for such “redundant” structures. Intuitively, the wider the basin of attraction of a certain minimum energy structure, the more often a structure will fall into it, but also the more relevant it is for the material system. As such the redundant structures can help to sample the relevance of such basins automatically.

Adding random perturbations

Starting from the final set of fully relaxed structures we apply random perturbations on the cell shape, size, and atomic positions to sample the environment of minima on the potential energy surface thoroughly. Here again, the redundant structures increase the sampling of the relevant minima without resorting to human intuition or expert knowledge.

For every relaxed configuration, we generate n_rattle new structures with their positions changed by normally distributed noise with standard deviation σ_rattle and a small uniformly random strain matrix up to ϵ_rattle equal to 0.05 for diagonal (strain) entries and 0.005 for off-diagonal (shear) entries. This coupling between random displacements and small strains ensures that the fitted potentials do not contain unexpected holes that could be encountered e. g. when running NpT molecular dynamics simulations near the equilibrium structures.

Then again for every relaxed configuration we generate n_stretch new structures with their cells changed by random matrices. Materials tend to be much stiffer in the shear directions than in the diagonal strain directions so sampling the diagonal strain is more important. Therefore we pick a “mostly diagonal” matrix in 70% of the cases with diagonal entries uniformly distributed up to ϵ_hydro and off-diagonal entries up to 0.05 or in the other 30% a “mostly off-diagonal” matrix with diagonal entries up to 0.05 and off-diagonal entries up to ϵ_shear.

Combining and optional filtering

The final training set is the combination of the initial random crystals, relaxation results, and perturbed structures. Because the structures generated especially during the initial phase can be significantly out of equilibrium it is important to filter them. Both PYXTAL¹⁸ and RANDSPG²² can sometimes generate unit cells of very high aspect ratio, i. e. the longest basis vector divided by the shortest one. We discard here any cell where this ratio exceeds an empirically found threshold of 6. We also make sure that the PAW spheres of the atoms do not overlap in any of our structures by checking the first nearest neighbor distances. This also helps the MTP fit, as otherwise the potentials have to learn the very steep but (usually) physically not very interesting repulsive interactions at close distances, which we have found can negatively affect their performance on other properties. Afterwards we may discard structures with very high energies or forces as well.

Intuitive understanding

One may ask how such a training set of very small supercells can possibly describe larger structures of practical relevance. We will show explicitly and quantitatively in Section Interpolating vs. extrapolating that all defects and phases we consider in this work are not extrapolative, i.e., they are in a strong sense included in ASSYST generated training sets. Here we want to build up some intuition for this result. One can get a simple mental picture by realizing that the different radial functions used in MLIPs essentially slice the atomic neighborhood into different “generalized” shells. In the case of MTPs, this is done by the functions f_μ, see (2), but other potential types have similar functions. When predicting a new structure, descriptors made up of all radial functions are combined (linearly in the case of MTP) and thus can accurately “extrapolate” as long as each “slice” seen by the radial functions is represented in the training set.

We illustrate this here for the case of an interstitial in 2D in Fig. 2. In this figure, we plot the atomic density for three sample structures, one per row. The leftmost panel shows the full structure, while the three panels on the right are densities after convolving with Gaussian radial functions, f_i in the figure, centered on the colored circles shown in the left panels. Suppose we had two structures in our training set, a bcc-like structure and a more complicated but still ordered tetragonal-like structure. We see that our three radial functions pick up different parts of the total atomic density. When our potential now tries to predict an interstitial (bottom row), we find that this (local) atomic density is actually very well approximated by a combination of f₀ from the bcc structure and f₁, f₂ from the tetragonal structure. This works for MTPs because the final potential is a linear combination of each descriptor function and assumes that the training set is sufficiently diverse to sample the descriptor space well, which is exactly what our method tries to do. The three columns on the right can also be thought of as graphical representations of an abstract feature matrix used to fit the potential. Since a MLIP based on atomistic density features then combines them again for the final prediction, we understand that a potential trained only on the first two rows (labeled “Training Set”) can accurately predict the last row (labeled “Prediction”). Of course, this is a toy illustration, and reality is significantly complicated by the fact that a point defect is not described by just one central atom, and radial functions are in general not simple Gaussians. But the above discussion makes it plausible that an isolated point defect can be approximated from data obtained only on periodic structures. In fact, since realistic radial functions may be negative and more spread out, real MLIPs would have more possibilities and could freely linearly combine atomistic densities seen during training.

**Fig. 2: Cartoon atomistic density of three example 2D structures and interstitial.**

Another, more abstract way to think about this is in terms of sampling the atomic density. A large number of MLIP descriptors, moment tensors among them, can be expressed as functions of a local atomic density

$$\rho (r)=\sum _{i}\delta (r-{r}_{i}),$$

where i enumerates atoms in an atomic structure with positions r_i²³. A perfect training set would sample the space of all ρ evenly and densely. We can get some intuition for this sampling by considering instead the Fourier transform of the atomic density, $\tilde{\rho }(k)={\mathcal{F}}[\rho ]$. While this is not how MTPs are technically implemented, it provides the insight that the outer cutoff radius, R_c, of local potentials and the minimal distance enforced by Pauli repulsion between atoms, say R_i < R_c, essentially act as a band filter. Any atom centered local potential can only make use of information from $\tilde{\rho }$ if $| k| \in \left[\frac{1}{{R}_{c}},\frac{1}{{R}_{i}}\right]$ and only this region needs to be sampled by a training set. If we exclude long-range effects such as unscreened Coulomb interactions, the “near-sightedness” of quantum mechanics²⁴ ensures that features at the short wavelength limit will be most important, i.e., represented by a small unit cell. Additionally, the “slicing” effect discussed in the previous paragraph, means that each descriptor function only needs to learn a small part of the k-dependency of $\tilde{\rho }$.

Parameters and structures for this work

The full list of hyperparameters and steps is given in Section “Input parameters for ASSYST”. After generating the structure set, we filter out all the data with small interatomic distances and high forces. Structures with pair distances smaller than the cutoff radii, r_i, per element are discarded, d_ij < r_i + r_j. We choose values for the radii close to the PAW sphere radii in the VASP^20,21 pseudopotentials as r_Mg = 0.9 Å, r_Al = 0.95 Å and r_Ca = 1.15 Å. To quantify “high forces” in this context we calculate the dimer curves of all combinations of Mg, Al and Ca and take the force acting on the dimer when the PAW spheres start to overlap. In our system these forces are of the order of 5 eV/Å to 10 eV/Å and so we filter all structures with forces higher than 5 eV/Å. Data availability at short distances of course influences the stability of the potential under pressure, but the details depend on the MLIP framework and the application targeted. In Supplementary Note 1 we check that this MTP is well behaved up to distances of 1.5 Å.

In total, we obtain 116,690 structures and 655,766 atomic environments this way. The distribution of the training points in the ternary concentration space is shown in Fig. 3. Well visible is the discrete sampling that results from the small cells and their limited stoichiometry that we use for training. Changing the total number of atoms (n_total in the discussion above) it would be possible to arrive at denser or sparser training sets. This also controls the total number of structures ASSYST generates.

**Fig. 3: Stoichiometries present in training set.**

In Fig. 4 we plot the full training set as a histogram over the atomic energies and volumes. The energies span a wide range of just over 10 eV/atom. In the histogram, we observe that the majority of points fall within 16 Å³/atom to 42 Å³/atom and −3.7 eV/atom to −1.5 eV/atom, corresponding to the bulk volumes and cohesive energies in this ternary system, cf. Supplementary Note 2 and Table 1, though substantially more unfavorable configurations are covered as well. It is somewhat remarkable that ASSYST finds and covers this relevant range of structures well, but it may be understood in the sense that it is essentially a basic version of AB-INITIO RANDOM STRUCTURE SEARCH (AIRSS)²⁵. As we will see below in Section Verification with respect to DFT this property of the training set allows potentials to transfer exceptionally well to a variety of crystal structures.

**Fig. 4: Energy-volume plot of training data.**

We developed this procedure iteratively from pure Mg over binary Mg/Al to the full ternary system and as such there are some variations in the hyperparameters we ultimately used. For the pure Mg structures, we used the training set already published in¹¹ using RANDSPG²². For the Mg/Al structures we used a very similar procedure using RANDSPG²² as well. The other systems follow the general method described in this work. The structures for pure Al were already part of another publication²⁶, though we have recomputed them here with consistent DFT settings.

Given the cost of large DFT datasets, we did not repeat the training set generation with a fully consistent set of parameters. Nevertheless, we do not observe significant differences in the quality of the description of the MTP between elements, suggesting that there is some room for improvement in obtaining a minimally sufficient training set for alloys. We leave this exploration for future work.

Fitting results and training errors

The MTP training errors are collected in Table 1. We report there the Root Mean Squared Error (RMSE) and the Mean Absolute Error (MAE),

$$\begin{array}{rcl}{\rm{RMSE}}&=&\sqrt{\frac{1}{N}\mathop{\sum }\limits_{i}^{N}{\left({o}_{i}-{\hat{o}}_{i}\right)}^{2}}\\ {\rm{MAE}}&=&\frac{1}{N}\mathop{\sum }\limits_{i}^{N}\left\vert {o}_{i}-{\hat{o}}_{i}\right\vert ,\end{array}$$

where i runs over all observations. The hat indicates the DFT ground truth and o is the observable from which the error was calculated, such as energy per atom, force on an atom or the defect energy. The obtained energy MAE of 8.4 meV/atom indicates a good fit.

Table 1 Training Errors Root mean square (RMSE) and mean absolute errors (MAE) for energy and forces on the training set

We do not report separate test errors as we do not split off from the training set a separate test set as is commonly done. Error metrics are strongly dependent on the underlying data distribution. Since our training data here is highly randomized and widely distributed, such errors are not immediately interpretable. High errors at extreme energies and volumes could skew these metrics to larger values, but might not impact applicability. Also, very similar well-described training points could hide deviations in less well-sampled phase space regions. Therefore we will construct a completely separate test set in Section Verification with respect to DFT and quote appropriate errors there.

Verification with respect to DFT

Our test set contains 2234 structures. They are described below and contain bulk properties, energy volume curves, point and planar defects, and sampling of solid solution phases. None of these structures entered explicitly into the fit. Moreover, these structures generally have larger unit cells or different stoichiometry than the training set structures, and so provide a relevant and challenging out-of-domain test set.

We start with a set of 22 structures from the MATERIALS PROJECT that are within 10 meV of the ternary convex hull. Some details are summarized in the Supplementary Table 1 and Fig. 5 shows a few selected structures.

**Fig. 5: Example structures present in train and test data.**

All structures are relaxed separately with DFT and the MTP, before we obtain the equilibrium energy E₀, the lattice parameter, a, and the atomic volume Ω₀. Table 2 compiles the mean absolute error (MAE) on this test set. For these properties the MTP achieves relative errors below 1 %, indicating good transferability of the ASSYST-trained MTP to unseen structures.

Table 2 Test Errors on Bulk Properties Mean absolute error (MAE) on basic properties (equilibrium energy, E_eq, volume V_eq, lattice parameters, a, bulk modulus, B, and its pressure derivative ${B}^{{\prime} }$) of 22 test structures

Figure 6 shows the convex hull for the involved binary systems. Deviations from DFT are generally within the training error. However, this creates some issues for the Mg/Al system, where the MTP overestimates the binding of the MgAl₂ compound. Due to the small enthalpy of mixing, 10meV/atom to 20meV/atom, in the range of training error itself, describing this correctly is a hard problem. Accurate predictions of the convex hull require not only low absolute errors, but also guarantees as to the relative ordering of the phases, something that is not accounted for by common regression algorithms. Methods to improve this are discussed in the literature^27,28, but we believe them to be orthogonal to the problem of generating comprehensive training data that we study in this work.

**Fig. 6: Convex Hull diagrams predicted by MTP and DFT.**

We also check how well the potential reproduces elastic properties by applying volumetric strains to each of the test structures. For each of the 22 test structures we apply 40 hydrostatic strains between −70% and 400% volumetric strain. Where such large (negative) strains would cause the total volume to be less than 10 Å³/atom, we omitted them to avoid excessive k mesh requirements and overlapping PAW spheres. Figure 7 shows the E-V curves. From them we obtain the bulk modulus, B, and its derivative with respect to pressure, ${B}^{{\prime} }$, for all 22 test structures according to

$$B=V\frac{{\partial }^{2}E}{{\partial }^{2}V}\quad {\rm{and}}\quad {B}^{{\prime} }=\frac{\partial B}{\partial P}$$

for both the potential and the DFT reference structures. In Supplementary Note 3 show additional errors on a wider range of structures, that could not be included in the much more extensive tests below, in Supplementary Note 4 we compare phonon dispersions between MTP and DFT and find good agreement.

**Fig. 7: Verification of MTP predicted Energy–Volume curves against DFT.**

We obtain an extensive set of point defects by taking all structures from the test set introduced above, except the ternary structures, and systematically add vacancies, antisites and some interstitials to them. Symmetrically inequivalent lattice positions in each structure are obtained from SPGLIB²⁹ and then either removed (vacancies) or have their type changed (antisites). Tetrahedral and octahedral interstitials are identified by looking for positions in between lattice sites with a prescribed number of neighbors, high Voronoi volume and approximately regular connection polyhedra³⁰. More complex, but perhaps physically more relevant interstitials such as dumbbells were not explicitly considered. This results in 438 different point defect structures.

Figure 8a compiles the results of all defects considered. The overall MAE is around 146 meV. This error is comparable to the accuracy of other MLIPs in the literature, which have been fitted specifically for the description of point defects³¹. The fact that the ASSYST-fitted MTP is able to describe these with comparable accuracy without being explicitly fitted to them supports the qualitative account of the interpolation behavior that we have sketched out in Fig. 2. It appears that once a training set is sufficiently diverse, MLIPs are able to successfully piece together different structural environments, even if the exact structure predicted is not present in any of the training structures.

**Fig. 8: Correlation plots for the defect energies.**

We now investigate more complicated planar defects and construct a set of surfaces and grain boundaries in the pure phases of all three elements, including segregated structures. Their construction is described in detail in the Supplementary Note 5.

A correlation plot of all 267 surfaces between MTP and DFT is shown in Fig. 8b. All energies are obtained from a single static calculation, i.e., without relaxation of any kind. MTP and DFT agree very well, especially considering that no defect or surface structures were explicitly included in the training. While surprising at first glance, we attribute it to some structures in the training set that delaminate during the volume relaxation steps of the training set construction¹¹. These structures form quasi-2D crystals separated by vacuum and are apparently sufficient to provide enough information for the potential to extrapolate to the true surface structures.

Figure 8c plots the error correlation of all 147 grain boundaries. To avoid too close atoms we relaxed each grain boundary with the MTP before comparing the defect energies obtained by point calculations between DFT and MTP. The MTP achieves an MAE of 0.9 meV/Å², indicating again that our training set covers a wide range of local environments that are sufficient to describe planar defects.

We now turn to the question of how well the MTP can interpolate between the coarse concentration grid of the training set and the finer concentrations available in larger structures. To this end, we prepare test sets based on the hcp, fcc, and bcc lattices. For all three lattices we prepare Special Quasirandom Structures (SQS)³² with 32–54 atoms for the binary and 16–32 atoms for the ternary compositions and relax the volume of the structures with DFT and MTP independently. The initial volumes for the structures were set according to Vegard’s law, starting from the volume of the stable elemental bulk phases. This results in 99 structures for each of the binaries and 324 structures for the ternary, or a total of 603 structures. We calculate the formation energy for every prototype and every concentration by

$${E}_{f}({c}_{{\rm{Mg}}},{c}_{{\rm{Al}}},{c}_{{\rm{Ca}}})=E({c}_{{\rm{Mg}}},{c}_{{\rm{Al}}},{c}_{{\rm{Ca}}})-\sum _{X\in \{{\rm{Mg}},{\rm{Al}},{\rm{Ca}}\}}{c}_{X}{E}_{X}$$

where E_Mg, E_Al, E_Mg are the per atom energies of the stable prototypes for all three elements at their equilibrium volume. The obtained errors in E_f are shown in Fig. 9. They are generally within 5meV/atom to 10 meV/atom, except for the Al-rich bcc structures, where errors can be up to 25 meV/atom. Since sampling in concentration space is uniform and since we detect no extrapolation for any SQS structure, c. f. Fig. 3 and Section Interpolating vs. extrapolating, it appears that the potential energy surface of Al-rich bcc structures is harder to fit than for the other structures, though it is not particularly relevant for applications, since this phase is thermodynamically unstable at standard conditions.

**Fig. 9: Accuracy of MTP for solid solutions.**

Phase diagrams

As a first practical application, we compute the temperature-composition binary phase diagrams of the Mg/Al/Ca system. Much effort could be expended here to obtain high-quality phase diagrams. However, for the purpose of potential validation, we will choose computationally efficient approximations that can be scaled over a large number of potentials during the testing phase and that can serve as a baseline for future research.

We will construct our phase diagrams using a set of structures obtained from the Materials Project³³ and calculate their atomic Gibbs free energy G(T, p = 0) with CALPHY³⁴. The chosen prototype phases are compiled in the Supplementary Note 6 and the construction of the phase diagrams is described in Section “Phase Diagram Construction”.

Non-stoichiometric behavior is modeled by including all possible antisite point defects. For computational efficiency, we consider only the excess enthalpy of these point defects without computing their temperature dependence. Our restriction to antisites is not general, but sufficient for the binary systems considered here. Al/Ca and Mg/Ca consist entirely of line phases, leaving only fcc-Al, hcp-Mg and ${\beta }^{{\prime} }$–Mg₂Al₃ as phases with finite solubilities. Vacancy mediated off-stoichiometry is not possible for the terminal phases and has been ruled out for γ–Mg₁₇Al₁₂ by³⁵. A comparison between defect enthalpies and entropies in the same work shows that even at 1000 K the defect entropy is only 10% of the defect-free energy for typical point defects in Mg/Al, suggesting to us that for a first order approximation of the phase diagram neglecting the entropies is justified. In the Supplementary Note 7 we estimate the effect of errors in the free energy on the transition temperatures in pure calcium. As expected, even small differences can lead to qualitative changes, so we leave the exact calculation of the phase diagrams of the MTP to future work.

Mg/Al The relevant phases for this system are hcp Mg, fcc Al, ${\beta }^{{\prime} }$–Mg₂Al₃, γ–Mg₁₇Al₁₂ and ϵ–Mg₂₃Al₃₀³⁶. Additionally, we included the MgAl₂ phase, which is experimentally metastable³⁷ but on the DFT convex hull according to the Materials Project³³. Figure 10 shows the phase diagram calculated using the above approach with the MTP. Compared with recent ab-initio calculations, CALPHAD assessments and experiments^{36,37,38,39,40,41} the qualitative agreement is very good, reproducing the melting temperatures and liquid-solidus lines well, within about 50 K to 100 K. ${\beta }^{{\prime} }$–Mg₂Al₃ shows a finite solubility toward the aluminum side. While most assessments model this phase as a line compound, some experiments do report a similar solubility⁴⁰, but the data are generally rare. ϵ–Mg₂₃Al₃₀ is shifted to lower temperatures as a whole and the asymmetry of γ–Mg₁₇Al₁₂ is exaggerated. In assessments, this phase shows very low solubility towards the Mg-rich side, but maintains a constant phase boundary, whereas the MTP prediction shows a receding boundary towards the Al-rich side. Connected to this, the stability of the liquid is overestimated pushing the γ–Mg₁₇Al₁₂ + hcp-Mg → liquid eutectic point down and connecting γ–Mg₁₇Al₁₂, ${\beta }^{{\prime} }$–Mg₂Al₃ and the liquid in a peritectic rather than eutectic point. It seems likely that this is due to our simplified modeling of point defects, which may fail at high temperatures in this phase. The solubility is too low on the fcc Al side, probably because we assumed the defect-free energy of formation to be constant rather than decreasing at higher temperatures due to its excess entropy. However, all experimental phases are present and of the qualitatively expected shape. All three intermetallic phases have significantly larger unit cells than the training data, indicating that our small training cells are sufficiently diverse to sample a very large part of the phase space.

As a reference, we repeat all calculations with a Modified Embedded Atom Potential (MEAM)⁴². The computed phase diagram is shown in Fig. 10c. The solubilities of the terminals follow a similar trend as the ones for the MTP, suggesting that likely our simplified modeling limits the comparison to experiment here. Generally, the liquid is too stable and liquidus lines are about 100 K too low. Interestingly, the γ–Mg₁₇Al₁₂ is missing here, although it is reported to be on the convex hull at T = 0 K. In addition, the Al₂Mg phase is strongly overstabilized, displacing the ${\beta }^{{\prime} }$–Mg₂Al₃ and ϵ–Mg₂₃Al₃₀ phases from the phase diagram. To partially remove these artifacts, we have constructed a metastable phase diagram where we have manually removed the overstabilized phase (see Supplementary Note 8). This illustrates that even for the EAM or MEAM class of potentials, incomplete training data can easily lead to erroneous extrapolation. In contrast, the MTP ASSYST-fitted presented here does not suffer from such problems, even though the construction of the training set does not explicitly consider structures on the convex hull.

Mg/Ca andAl/Ca Both the Mg/Ca and Al/Ca binary systems exhibit only line phases with essentially no compositional variation. The C14-Mg₂Ca, bcc-Ca+Mg, hcp-Mg+Ca, bcc/fcc-Ca+Al and C15-Al₂Ca phases show finite solubility in the latest CALPHAD evaluation³⁶, but less than 1%. Here we model both systems fully in terms of line phases together with a regularly mixing melt. Details are given in Supplementary Note 6.

The predicted phase diagrams are shown in Figs. 11 and 12. Most notably, the MTP misses the AlCa and Al₃Ca₈ phases, but otherwise, the critical points and reaction temperatures are reproduced within approximately 100K.

Interpolating vs. extrapolating

We wish to show that the ASSYST generated training set is complete in the sense that for a wide variety of applications the potentials remain stable and give accurate results without continuous adjustment. What we have presented so far implies that the MLIP guarantees reliable results over a wide range of conditions, but the question remains whether this is really a consequence of the training set or rather a peculiarity of the material system we are working with. In other words, is the MLIP, e.g., in Fig. 8, extrapolating well or interpolating in between training configurations? The latter would imply that we can expect this procedure to generate training data to transfer to other material systems as well. The MLIP-2⁴³ package optionally reports the extrapolation grade of a structure with respect to a given training set. These grades are larger than one when a potential is extrapolating and smaller than one when interpolating^44,45. We collect first all structures used in the verification in Section Verification with respect to DFT, a total of 2234 configurations, and calculate their extrapolation grade. Less than 5% of the structures have an extrapolation grade larger than one, i.e. are in the extrapolative regime. Only 1% exceed grades larger than two, a limit that is commonly quoted for reliable extrapolation. The maximum grade was 8.75 for some highly compressed Al structures. All cases of high extrapolation grades occur exclusively in the strongly repulsive limit. More details are shown in Section Verification with respect to DFT. It is striking that none of the structures tested in Section Verification with respect to DFT are extrapolative, which implies that ASSYST generated data sets cover the full space of concentrations despite their small structure sizes.

We also obtained all structures from Materials Project³³ containing only Mg, Al and Ca, an additional 150 structures. Here the maximum grade was 1.23, with only three structures having extrapolation grades larger than one. This shows that the training set is indeed complete, and the procedure for generating training structures can be expected to transfer well to other material systems.

Additionally as part of the free energy calculations for Section Phase Diagrams we ran MD for multiple hundred thousand timesteps for multiple phases over a wide range of temperatures without failure or instability or the need to refit the MLIP, even for liquids. This implies that the ASSYST based MLIP is robust, stable and can be used for a wide range of applications.

Any errors on structures that do not show extrapolation must consequently be interpolation errors. Since we do not observe significant extrapolations here, the errors e.g., in Fig. 8, Fig. 9 must also be interpolation errors. The description of these quantities might be improved by adding corresponding structures manually or by running ASSYST with a larger n_total and/or increasing descriptor basis set.

Source link