A physics-informed machine learning framework for accelerated discovery of single-phase B2 multi-principal element intermetallics

Physics informed machine learning

As illustrated in Fig. S1, given the scarcity of reported B2 MPEIs in most alloy systems, our investigation focused on alloy systems containing refractory elements (e.g., Ti, Zr, Hf) and 3d transition elements (e.g., Co, Ni, or Fe), which are thought to be able to form B2 structures due to their distinct elemental properties^35,36. Alloy compositions and structures from the relevant literature (e.g., quaternary, quinary and senary systems) and binary/ternary phase diagrams^37,38 were compiled to establish a representative database (see Table S1 for details). The phases or microstructures of all alloys in Table S1 fall into three classes: single-phased B2, multi-phased intermetallics (MPIM) and solid-solution + intermetallic (SS + IM). Compositions were classified as B2 if they crystallized directly into a single-phase B2 structure during casting from the melt, and as MPIM or SS + IM otherwise. The final database of three alloy systems is presented in Fig. 1a–c. For instance, 38 compositions in the Fe-Co-Ni-Ti-Zr system were identified as single-phased B2 alloys, while the remaining 358 were categorized as MPIM or SS + IM alloys. This results in a significant data imbalance, with a B2 to non-B2 ratio of roughly 1:9. Comparable imbalance patterns were observed in the other alloy systems, Co-Ni-Ti-Zr (Fig. 1b) and Cu-Co-Ni-Ti-Zr-Hf (Fig. 1c), which poses challenges for ML-based alloy design²⁰.

**Fig. 1: The database structures of selected alloy systems, and distribution of physicochemical properties of candidate B2-forming elements.**

The development and selection of appropriate data descriptors is critical for developing reliable ML models^20,39,40,41. In the ML-assisted design of HEAs with BCC, FCC or amorphous structures, classic parameters such as δ (atomic size mismatch), ΔH_mix (enthalpy of mixing), ΔS_mix (entropy of mixing), Δχ (electronegativity difference), and VEC (valence electron concentration) have been commonly employed as data descriptors^42,43. While these parameters are effective in distinguishing SS and amorphous phases, they exhibit limitations in differentiating MPEIs from other phases^44,45,46. For the rapid identification of single-phased B2 MPEIs, the selection of physically meaningful data descriptors that capture the intrinsic characteristics of the B2 structure is essential. In our previously proposed random sublattice model¹², a B2 structure can be represented as a pseudo-binary system (PBS), characterized by two key parameters: δ_mean, which denotes the average atomic size difference between the two sublattices, and (H/G)_pbs, which quantifies the ordering tendency between sublattices (see Table 1 for details).
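As a point of reference for the classic descriptors named above, the sketch below computes ΔS_mix and δ from their commonly used textbook definitions (ideal configurational entropy, and the composition-weighted relative deviation of atomic radii). The function names and the placeholder radii are illustrative assumptions, not values or code from this work; the random-sublattice descriptors of Table 1 are not reproduced here.

```python
import math

R_GAS = 8.314  # gas constant in J/(mol*K)


def mixing_entropy(fractions):
    """Ideal mixing entropy: -R * sum(c_i * ln c_i), skipping absent elements."""
    return -R_GAS * sum(c * math.log(c) for c in fractions if c > 0)


def size_mismatch(fractions, radii):
    """Atomic size mismatch: sqrt(sum(c_i * (1 - r_i / r_mean)^2))."""
    r_mean = sum(c * r for c, r in zip(fractions, radii))
    return math.sqrt(
        sum(c * (1 - r / r_mean) ** 2 for c, r in zip(fractions, radii))
    )


# Equiatomic five-component alloy (atomic fractions sum to 1).
fractions = [0.2, 0.2, 0.2, 0.2, 0.2]
# Placeholder radii in arbitrary units, NOT tabulated elemental radii.
placeholder_radii = [1.0, 1.0, 1.0, 1.2, 1.2]

entropy = mixing_entropy(fractions)          # equals R * ln(5) for this case
mismatch = size_mismatch(fractions, placeholder_radii)
print(f"dS_mix = {entropy:.2f} J/(mol*K), delta = {mismatch:.4f}")
```

Both quantities treat the alloy as a single randomly mixed lattice, which is why they carry no information about ordering between two sublattices.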

Table 1 The 18 data descriptors derived from our random sublattice model¹² employed for ML-assisted design of B2 MPEIs

Building upon our earlier random sublattice model¹², we introduce additional thermodynamic and geometric descriptors to assess the stability of long-range chemical ordering versus random mixing in the two sublattices of a B2 structure. Previous studies^47,48,49 suggest that the stability of long-range chemical ordering correlates positively with mixing enthalpy (ΔH), mixing entropy (ΔS), and differences in atomic radius (r_i), electronegativity (χ_i), and valence electron concentration (VEC_i). Figure 1d–e maps the elemental distributions of H_ij, r_i, χ_i and VEC_i for the seven candidate elements (Fe, Co, Ni, Cu, Ti, Zr, Hf). These elements are categorized into two distinct groups: refractory elements (Ti, Zr, Hf) and 3d transition metals (Fe, Co, Ni, Cu). According to the literature^{6,50,51,52,53}, these elements tend to segregate between the two sublattices of B2 intermetallics, forming long-range chemical order, while mixing within each sublattice to form chemical disorder. This arises because the inter-group disparity in physicochemical properties exceeds the intra-group variations, which may generate a thermodynamic driving force that overcomes ΔS and thereby stabilizes the chemically ordered B2 structure. As summarized in Table 1, the tendency favoring such a mixed chemical order is quantified by the following descriptors: S_pbs, δ_pbs, ΔH_pbs, σ_VECpbs, σ_χpbs, (H/G)_pbs and (δ_pbs/δ). Theoretical considerations indicate that: (1) high S_pbs and low absolute ΔH_pbs favor chemical randomness over ordering^12,54; (2) δ_pbs and (δ_pbs/δ) reflect the reduction in atomic size differences due to ordering, where large values indicate a stronger ordering tendency; and (3) similarly, higher σ_VECpbs and σ_χpbs correlate with increased ordering stability. Furthermore, the stability of individual sublattices is evaluated using σ_Hpbs, ΔH_mean, σ_Hmean, δ_mean, and σ_VECmean.
Consistent with solid solution models for single-phase HEAs^45,55,56,57, large ΔH_mean and δ_mean promote sublattice ordering. However, as noted in refs. ⁶ and ¹², significant variance in VEC_i (σ_VECmean) induces lattice distortion, destabilizing the B2 structure. Additionally, elevated σ_ΔHpbs suggests heterogeneous bonding tendencies, which may drive short-range chemical ordering or elemental segregation^58,59,60, potentially leading to precipitation or phase separation^61,62,63. In addition to the above data descriptors, the classic parameters χ and σ_χ were also used, since electronegativity can aid in distinguishing solid solutions from multi-phase immiscibility^55,64. As a result, 18 physics-informed data descriptors were employed for the ML-assisted design of B2 MPEIs, hereafter referred to as random-sublattice-based descriptors. For comparison, 16 classic descriptors (including δ, ΔH_mix and ΔS_mix; see Table 2 for details), referred to as random-mixing-based descriptors, were also employed to train the ML model. Before training the ML models, all descriptors for an alloy system with N compositions were normalized using the following formula:

$${x}_{i,j}=\frac{{x}_{i,j}^{0}-{x}_{i,\min }^{0}}{{x}_{i,\max }^{0}-{x}_{i,\min }^{0}},\quad (i=1,2,\ldots ,18;\ j=1,2,\ldots ,N)$$

where ${x}_{i,j}$ and ${x}_{i,j}^{0}$ represent the normalized and initial values of the ith descriptor for the jth composition, respectively, and ${x}_{i,\max }^{0}$ and ${x}_{i,\min }^{0}$ denote the maximum and minimum values of the ith descriptor over the N compositions.
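The normalization above is a per-descriptor min-max rescaling to the interval [0, 1]. A minimal sketch follows, assuming the descriptors are stored as one row per composition; the function name, the toy values, and the choice to map a constant descriptor to 0 (a case the formula leaves undefined) are assumptions made for illustration.

```python
def min_max_normalize(rows):
    """Rescale each descriptor (column) to [0, 1] across all compositions (rows)."""
    columns = list(zip(*rows))
    minima = [min(column) for column in columns]
    maxima = [max(column) for column in columns]
    normalized = []
    for row in rows:
        normalized.append([
            # Assumption: a descriptor with zero range is mapped to 0.0.
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, minima, maxima)
        ])
    return normalized


# Toy data: 3 compositions x 2 descriptors (values are illustrative only).
raw = [
    [1.0, 10.0],
    [2.0, 30.0],
    [3.0, 20.0],
]
scaled = min_max_normalize(raw)
print(scaled)
```

Because the minima and maxima are taken over the whole alloy system, descriptors with very different physical units end up on a common scale before training.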

Table 2 The 16 classic data descriptors derived from random mixing commonly used for ML-assisted design of single-phase high entropy alloys^31,102,103

Within ML-assisted alloy design, one-hot encoding (OHE)^19,65 is particularly well-suited for phase/microstructure classification due to its inherent mutual exclusivity and exhaustiveness, which are key requirements when modeling multi-phase systems, as evidenced in prior studies^66,67,68. As illustrated in Fig. 2a, OHE encodes the three phases in this study as orthogonal vectors (B2: [1,0,0], MPIM: [0,1,0], SS + IM: [0,0,1]), serving as data labels compatible with conventional classifiers such as artificial neural networks (ANNs). However, the discovery of SS + IM or MPIM phases in the compositional space falls beyond the primary focus of this study. Therefore, for generative models such as variational autoencoders (VAEs), we developed a streamlined binary OHE scheme optimized for B2 MPEI discovery. This adaptation consolidates MPIM and SS + IM into a single non-B2 class (encoded as 0), contrasted with the B2 class (encoded as 1). It reduces the label dimensionality from 3D to 1D (a 66.7% reduction), decreases matrix sparsity, and thereby improves computational efficiency.
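The two labeling schemes can be sketched as follows. The vectors and the 0/1 coding are taken from the text; the function names and the error handling for unknown phase names are illustrative assumptions.

```python
PHASES = ("B2", "MPIM", "SS + IM")


def one_hot(phase):
    """Three-class label for the classifier: B2 -> [1,0,0], MPIM -> [0,1,0], SS + IM -> [0,0,1]."""
    if phase not in PHASES:
        raise ValueError(f"unknown phase: {phase!r}")
    return [1 if phase == name else 0 for name in PHASES]


def binary_label(phase):
    """Binary label for the generative model: B2 -> 1, MPIM or SS + IM -> 0."""
    if phase not in PHASES:
        raise ValueError(f"unknown phase: {phase!r}")
    return 1 if phase == "B2" else 0


for phase in PHASES:
    print(phase, one_hot(phase), binary_label(phase))
```

Each three-component label collapses to a single number, which is the 3D-to-1D reduction described above.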

**Fig. 2: The physics-informed ML framework for B2 MPEI discovery.**

To mitigate the issue of data imbalance, the compiled dataset was refined by removing less informative entries, a strategy to enhance model training efficiency as demonstrated in our previous study²⁰. As depicted in Fig. 2b, the normalized dataset was first analyzed using principal component analysis (PCA)⁶⁹ to reveal its underlying structure, followed by K-means clustering⁷⁰ to identify compositionally distinct subgroups for data refinement. The refined dataset was then utilized for two purposes: (1) training an ANN-based classification model (Fig. 2c); and (2) training a generative CVAE model to explore new B2 compositions (Fig. 2d). Candidate B2 compositions generated by the CVAE were subsequently filtered through the ANN classifier, with successful predictions selected for experimental validation (Fig. 2e).

For the Fe-Co-Ni-Ti-Zr system, we found that the random-mixing-derived dataset, built on random-mixing-based descriptors, failed to differentiate single-phased B2 alloys from multiphase structures in the PCA plot (Fig. 3a). In contrast, the random-sublattice-derived dataset, built on random-sublattice-based descriptors, enabled a clear separation of B2, MPIM and SS + IM alloys (Fig. 3b). The whole dataset D₀ was divided by K-means into three groups, denoted M1, M2 and M3 when using random-mixing-based descriptors, and S1, S2 and S3 when using random-sublattice-based descriptors. All single-phased B2 alloys were clustered into the first group (S1 or M1), whereas a considerable fraction of non-B2 alloys were distributed across the remaining groups (S2 + S3 or M2 + M3). As illustrated in Fig. 3c, d, three training datasets were generated by combining the B2-containing group with the other two groups. Using random-mixing-based descriptors, the B2:non-B2 ratio improved from 1:9 in D₀ to 1:5 in D_RM1, while D_RM2 and D_RM3 achieved ratios of ~1:7. In comparison, the use of random-sublattice-based descriptors alleviated data imbalance more significantly, with the B2:non-B2 ratio reaching 1:1 in D_RS1, and 1:6 and 1:4 in D_RS2 and D_RS3, respectively. These results indicate that PCA combined with K-means clustering is effective in constructing more balanced datasets, especially when leveraging random-sublattice-based descriptors, whose enhanced phase separability directly contributes to this improvement.
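The effect of cluster-based refinement on class balance can be illustrated with a small bookkeeping sketch. It assumes that cluster assignments from K-means are already available; the toy labels, the function name, and the particular group combinations shown are hypothetical, since the exact composition of each refined dataset is given only in Fig. 3c, d.

```python
from collections import Counter


def b2_counts(phases, clusters, keep_groups):
    """Count (B2, non-B2) samples among the compositions whose cluster is kept."""
    counts = Counter(
        "B2" if phase == "B2" else "non-B2"
        for phase, cluster in zip(phases, clusters)
        if cluster in keep_groups
    )
    return counts["B2"], counts["non-B2"]


# Hypothetical toy dataset: all B2 alloys fall in cluster 1, as in the text.
phases = ["B2"] * 2 + ["MPIM"] * 2 + ["MPIM"] * 6 + ["SS + IM"] * 8
clusters = [1] * 4 + [2] * 6 + [3] * 8

whole = b2_counts(phases, clusters, {1, 2, 3})   # full dataset
refined = b2_counts(phases, clusters, {1, 2})    # B2 group plus one other group

for name, (n_b2, n_other) in {"whole": whole, "refined": refined}.items():
    print(f"{name}: B2:non-B2 = 1:{n_other / n_b2:.1f}")
```

Dropping a cluster that contains no B2 alloys removes only non-B2 entries, so the ratio can only move toward balance.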

**Fig. 3: The pre-processing of Fe-Co-Ni-Ti-Zr datasets derived from random-mixing-based and random-sublattice-based descriptors.**

As shown in Fig. S2, the training datasets in the Co-Ni-Ti-Zr and Cu-Co-Ni-Ti-Zr-Hf systems were constructed following procedures analogous to those used for the Fe-Co-Ni-Ti-Zr system. Here too, PCA combined with K-means clustering effectively improves the B2:non-B2 ratios of these datasets, further confirming the superior B2/non-B2 separability of random-sublattice-based descriptors.

As shown in Fig. 4a, for the Fe-Co-Ni-Ti-Zr system, the ANN model trained on the random-mixing-derived D₀ dataset effectively identifies MPIM and SS + IM alloys but exhibits poor accuracy in predicting B2 structures, primarily due to severe data imbalance^71,72. In contrast, the model trained with the 18 random-sublattice-based descriptors achieves more balanced predictive performance, attaining 100% precision and 97.4% recall for B2 phase identification (Fig. 4b). The confusion matrix of the D_RM1-trained ANN model (Fig. S3a) further confirms that random-mixing-based descriptors necessitate a more balanced dataset for precise discrimination of single-phased B2 structures.
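For readers less familiar with these metrics, the sketch below computes precision, recall and the F-measure from confusion-matrix counts. The counts used (37 true positives, 0 false positives, 1 false negative) are an assumption: they are one set of values consistent with the reported 100% precision and 97.4% recall if all 38 B2 compositions were evaluated, and they are not taken from Fig. 4b. The F-measure is assumed here to be the standard F1 score.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) for the positive class from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Assumed counts for the B2 class (see the caveat in the text above).
precision, recall, f1 = precision_recall_f1(tp=37, fp=0, fn=1)
print(f"precision = {precision:.3f}, recall = {recall:.3f}, F1 = {f1:.3f}")
```

With a 1:9 class ratio, overall accuracy can stay high even when most B2 alloys are missed, which is why per-class precision and recall are the more informative measures here.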

**Fig. 4: The training performance of different datasets in Fe-Co-Ni-Ti-Zr quinary alloy system.**

To assess model stability, the ANN was subsequently trained 400 times independently for each dataset (i.e., D₀, D_RM1, D_RM2, D_RM3, D_RS1, D_RS2, and D_RS3). Figure 4c reveals that the random-sublattice-derived D₀ dataset and its subgroups (D_RS1, D_RS2, and D_RS3) maintain high and stable F-measure values (0.97 ± 0.03) regardless of data imbalance, whereas the random-mixing-derived datasets show markedly lower performance, with only D_RM1 achieving moderate accuracy. To confirm that this difference arises solely from the descriptor type and not from differences in dataset composition, we repeated the training after exchanging the descriptors between the two groups of datasets. As shown in Fig. S4, the random-sublattice-based descriptors still yield superior and robust training performance, confirming that descriptor type is the dominant factor.

Like the ANN model, the CVAE model trained on the random-sublattice-derived D₀ dataset achieves clear separation between B2 and non-B2 alloys in latent space (Fig. 5a), without the clustering preprocessing needed for the random-mixing-derived dataset (Fig. S3b). Compositional analysis of 10,000 generated alloys (5,000 B2 + 5,000 non-B2) reveals distinct elemental partitioning: B2 alloys exhibit near-zero concentration differences between the two element groups shown in Fig. 1d, e (D_com), while non-B2 alloys display D_com values up to 20 at.% (Fig. 5b). Here D_com is calculated as:

$${D}_{com}=|({C}_{Fe}+{C}_{Co}+{C}_{Ni})-({C}_{Ti}+{C}_{Zr})|$$
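A minimal sketch of this calculation is given below. The element grouping follows the equation above; the function name and the dictionary representation of a composition are illustrative assumptions. The example composition is the Co-Ni-Ti-Zr alloy reported in the experimental validation section of this article, for which the Fe term is simply zero.

```python
GROUP_3D = ("Fe", "Co", "Ni")      # 3d transition metals in the equation
GROUP_REFRACTORY = ("Ti", "Zr")    # refractory elements in the equation


def d_com(composition):
    """Absolute difference (at.%) between the summed contents of the two groups."""
    late = sum(composition.get(element, 0.0) for element in GROUP_3D)
    early = sum(composition.get(element, 0.0) for element in GROUP_REFRACTORY)
    return abs(late - early)


# Co_19.9 Ni_30.5 Ti_38.1 Zr_11.5 (at.%), as quoted in this article.
alloy = {"Co": 19.9, "Ni": 30.5, "Ti": 38.1, "Zr": 11.5}
print(f"D_com = {d_com(alloy):.1f} at.%")
```

A value near zero means the two element groups are present in nearly equal amounts, matching the equal occupancy of the two sublattices in an ideal B2 structure.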

**Fig. 5: The generative machine learning targeted to the design of single-phase B2 MPEIs in Fe-Co-Ni-Ti-Zr system.**

Validation with the physics-informed ANN model shows that 90% of the generated B2 alloys have a predicted B2 probability (P_B2) above 90%, with non-B2 alloys predominantly forming MPIM/SS + IM structures (Fig. 5c). As illustrated in Figs. S5–S7, similar training results in the Co-Ni-Ti-Zr and Cu-Co-Ni-Ti-Zr-Hf systems further demonstrate that our random-sublattice-based descriptors offer superior performance in both predictive accuracy and computational efficiency for targeted B2 alloy composition separation and generation, compared to classic random-mixing-based descriptors. All the new B2 MPEIs discussed below were generated and verified using the ML models trained on the whole datasets with random-sublattice-based descriptors.

Experimental validation

As shown in Fig. 6a–c, three potential B2 Fe-Co-Ni-Ti-Zr MPEIs with predicted P_B2 values exceeding 0.95 were selected from the CVAE-generated alloys. Notably, most of the potential B2 alloys in the pseudo-ternary compositional diagrams exhibit D_com values clustered around zero (i.e., line compounds), while alloys with multiphase structures typically show larger D_com values. With increasing Fe and Zr concentrations, the upper limit of D_com for B2 phase formation decreases from ~10 at.% to near zero.

The XRD patterns of the as-cast MPEIs (Fig. 6d) reveal a single-phase BCC/B2 crystalline structure, further confirmed by scanning electron microscopy (SEM) images (Fig. 6e) showing typical dendritic structures with noticeable segregation. After homogenization, the XRD patterns reveal sharper B2 superlattice peaks (Fig. 6d), and the dendritic segregation is effectively eliminated (Fig. S8). Bright-field (BF), high-resolution TEM (HRTEM) and selected area electron diffraction (SAED) images (Fig. 6f) display distinct superlattice characteristics without evidence of antiphase domain boundaries or phase boundaries, confirming the formation of a single-phased B2 structure rather than a BCC + B2 dual-phase structure^{9,73,74,75,76,77,78,79}.

Additionally, two alloys with low P_B2 values were synthesized to assess the framework’s capability in predicting non-B2 structures. As illustrated in Fig. S9, both alloys formed MPIM structures, further demonstrating the predictive power of our ML framework for non-B2 alloys. These results provide strong evidence for the effectiveness of our physics-informed ML framework in generating novel single-phased B2 MPEIs within the Fe-Co-Ni-Ti-Zr alloy system.

In the Co-Ni-Ti-Zr system, the Co_19.9Ni_30.5Ti_38.1Zr_11.5 alloy, with a predicted P_B2 value exceeding 0.95, was selected for experimental evaluation. The distribution of P_B2 values across the compositional space, depicted in Fig. 7a, indicates that the probability of B2 formation correlates with D_com and C_Zr, similar to the trends observed in the Fe-Co-Ni-Ti-Zr system. The microstructure of this alloy, shown in Fig. 7c, d, exhibits the characteristics of a single-phased B2 structure and closely aligns with the ML model’s prediction.

**Fig. 7: The experimental validation for CVAE-generated B2 compositions in Co-Ni-Ti-Zr quaternary alloy system.**

The alloy Cu_4.7Co_9.2Ni_36.4Ti_38.7Zr_7.4Hf_3.5 was selected based on its P_B2 value exceeding 0.99, as predicted by the ANN model. Notably, in the Cu-Co-Ni-Ti-Zr-Hf system, when the D_com value is held constant near zero (~0.3 at.%), the P_B2 value of alloy compositions declines sharply once the Cu content exceeds ~25 at.% (Fig. 8a). XRD patterns and SEM images of both as-cast and as-homogenized alloys, shown in Fig. 8b, c, confirm the presence of the B2 structure, while TEM characterization (Fig. 8d) provides further evidence of the single-phase nature and long-range chemical ordering of the material.

**Fig. 8: The experimental validation for CVAE-generated B2 compositions in Cu-Co-Ni-Ti-Zr-Hf senary alloy system.**

Beyond the above three alloy systems, the physics-informed ML framework also demonstrated success in discovering new single-phased B2 MPEIs in other alloy systems. For instance, experimental validations were successfully performed in the Fe-Ni-Ti-Zr and Co-Ni-Ti-Zr-Hf systems (Fig. S10a, b), confirming the effectiveness of our approach.

Furthermore, as depicted in Fig. S10c–e, the ML framework was extended to an even more complex octonary system, Cr-Fe-Co-Ni-Cu-Ti-Zr-Hf, which includes 871 alloy compositions and introduces Cr as an additional element not previously covered in Fig. 1. In this high-dimensional space, the ML models maintained strong predictive performance and successfully identified new potential B2 compositions, further validating the framework's scalability, robustness, and generalizability.
