Comparison of the experimental and predicted values
The comparative analysis of the four ensemble models (CatBoost, XGBoost, GBM, and AdaBoost) revealed marked differences in their predictive capabilities for water absorption (WA) and bending strength (BS). As summarized in Table 2, CatBoost performed best, achieving an R² of 0.9662 for WA and 0.9577 for BS, which means that it explains over 95% of the variance in both target variables. The model’s low RMSE values (0.4549 for WA and 1.8661 for BS) further confirmed its precision, with minimal deviation between predicted and experimental values. XGBoost followed closely, with R² values of 0.9613 (WA) and 0.9576 (BS), suggesting that its tree-pruning and regularization techniques also captured the complex relationships in the dataset47. While slightly less accurate (R² ~ 0.95), the GBM model still provided robust predictions, reinforcing the utility of gradient boosting for modeling the properties of fired ceramics. In contrast, AdaBoost exhibited substantially higher errors, particularly for BS (RMSE: 3.4546), likely due to its reliance on weak learners and limited capacity to model high-order feature interactions.
Figures 3, 4, 5, 6, 7, 8, 9 and 10 illustrate these findings. The scatter plots for CatBoost (Figs. 3 and 7) display a near-perfect alignment between predicted and experimental values, with data points tightly clustered along the unity line. This consistency held across the full range of WA (0.10–12.14%) and BS (2.11–53.60 MPa), demonstrating the model’s reliability even for extreme compositions. XGBoost (Figs. 4 and 8) displayed comparable patterns, though with slightly greater variability at higher WA values (> 8%), potentially due to its approach to sparse data. GBM (Figs. 5 and 9) tended to underestimate BS in the mid-range (20–35 MPa), which indicates room for improvement through hyperparameter optimization. AdaBoost (Figs. 6 and 10), on the other hand, consistently overpredicted BS for high-strength samples (> 40 MPa), exposing its limitations in capturing nonlinear sintering effects. These findings highlight the critical role of model selection, with CatBoost and XGBoost standing out as the strongest candidates for future industrial use cases that demand high accuracy.
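For readers who wish to reproduce the two metrics used in this comparison, the following minimal sketch shows how R² and RMSE can be computed from paired experimental and predicted values. The function names and the sample numbers are hypothetical and serve only as an illustration; they are not taken from the study's dataset or code.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between experimental and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical WA values in %, for illustration only (not the study's data).
wa_experimental = [0.5, 2.0, 4.5, 7.0, 11.0]
wa_predicted = [0.7, 1.8, 4.9, 6.6, 11.3]

print(f"RMSE = {rmse(wa_experimental, wa_predicted):.4f}")
print(f"R2   = {r2_score(wa_experimental, wa_predicted):.4f}")
```

RMSE is expressed in the unit of the target variable, so the values for WA and BS are not directly comparable with each other, whereas R² is dimensionless.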








Partial dependence plot for the best model for BS prediction
The partial dependence plot (PDP) for CatBoost (Fig. 11) provides critical insights into the relationship between firing temperature and BS, independent of other variables. The plot reveals a nonlinear, sigmoidal trend: BS increases gradually from 1000 to 1150 °C, then rises sharply until 1200 °C, after which the improvements diminish. This behavior aligns with known ceramic sintering dynamics, where temperatures below 1150 °C are often insufficient for complete densification, while those above 1200 °C yield diminishing returns due to overfiring risks (e.g., excessive glass phase formation)30,48. The inflection point at 1180 °C is particularly noteworthy, as it marks an optimal thermal threshold for maximizing strength without incurring excess energy use.
Furthermore, the PDP highlights the interplay between the firing temperature and chemical composition. For instance, samples with high Al2O3 (> 24%) showed steeper BS gains at lower temperatures (1100–1175 °C), likely due to accelerated mullite crystallization: primary mullite begins to form at 1000 °C, and its quantity increases further with temperature49. Conversely, SiO2-rich formulations (> 65%) required higher temperatures (1225–1300 °C) to achieve comparable strength, reflecting their reliance on glass-phase sintering. These findings provide a basis for manufacturers to tailor firing schedules to raw material compositions, thereby reducing energy demand while ensuring consistent product quality.
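The principle behind a partial dependence curve can be sketched in a few lines: one feature is forced to each value of a grid in every sample, and the model predictions are averaged. The example below is a minimal illustration under stated assumptions. The function `toy_bs_model` is a hypothetical stand-in for a fitted regressor (a logistic temperature term centred at 1180 °C, chosen only to mimic the shape described above), and the feature names `temp_c` and `al2o3_pct` as well as the sample values are invented.

```python
import math


def partial_dependence(model, rows, feature, grid):
    """Average prediction over all rows with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = dict(row)       # copy so the original sample is untouched
            modified[feature] = value  # overwrite only the feature of interest
            total += model(modified)
        curve.append(total / len(rows))
    return curve


def toy_bs_model(row):
    """Hypothetical stand-in for a fitted model; NOT the study's CatBoost model."""
    temp_term = 30.0 / (1.0 + math.exp(-(row["temp_c"] - 1180.0) / 15.0))
    return 10.0 + temp_term + 0.5 * row["al2o3_pct"]


# Invented samples, for illustration only.
samples = [
    {"temp_c": 1050.0, "al2o3_pct": 16.0},
    {"temp_c": 1150.0, "al2o3_pct": 20.0},
    {"temp_c": 1200.0, "al2o3_pct": 24.0},
    {"temp_c": 1250.0, "al2o3_pct": 28.0},
]
temperature_grid = [1000.0, 1050.0, 1100.0, 1150.0, 1200.0, 1250.0, 1300.0]

pdp_curve = partial_dependence(toy_bs_model, samples, "temp_c", temperature_grid)
for temp, bs in zip(temperature_grid, pdp_curve):
    print(f"{temp:.0f} C -> {bs:.2f} MPa")
```

Because the averaging marginalizes over the other features, a one-dimensional curve of this kind cannot by itself show composition-dependent behavior; that would require computing the curve separately for subsets of samples or over a two-feature grid.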

Partial dependence plots for BS using CatBoost.
Sensitivity analysis for the best model – CatBoost
The sensitivity analysis provided profound insights into how compositional and processing variables influence the key ceramic properties of water absorption and bending strength. Figures 12 and 13 present the sensitivity analysis for WA using CatBoost and BS using CatBoost, respectively.
For the water absorption parameter, clay mineral content proved the most influential factor, contributing approximately 40% to the model’s predictive accuracy. This dominant role reflects the fundamental importance of clay-sized particles and mineralogy in determining the pore structure evolution during firing50. The analysis identified a pronounced nonlinear pattern, showing that samples with more than 50% clay particle content experienced a sharp exponential decline in water absorption. This highlights how fine clay particles efficiently occupy interparticle spaces during the sintering process.
Silicon dioxide concentration showed the second strongest influence, contributing about 30% of the water absorption predictions. The sensitivity analysis uncovered a critical compositional threshold at 62% SiO2, beyond which water absorption values dropped sharply. This phenomenon correlates well with ceramic engineering principles, where sufficient silica content promotes extensive glass phase formation, effectively sealing surface porosity. Conversely, an excessive proportion of SiO2 indicates a scarcity of clay minerals and the use of clays unsuitable for ceramic tile production. The remaining 30% of predictive influence was distributed among various minor components, with iron oxide and alkali metals showing modest but measurable effects through their fluxing actions during firing51.
For BS, the sensitivity analysis painted a different but equally informative picture. Firing temperature dominated the predictions with a 35% contribution, displaying a characteristic sigmoidal relationship in which strength gains accelerated between 1150 and 1200 °C before plateauing at higher temperatures, as already seen in the partial dependence plot (Fig. 11). Al2O3 content followed in importance at 25%, with the analysis revealing two optimal concentration ranges near 20% and 25% that correspond to different stages of mullite crystallization49. The influence of magnesium oxide, accounting for about 5% of the predictions, may be attributed to its role in forming high-temperature spinel phases that reinforce the ceramic matrix or to its action as a flux.
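Percentage contributions of this kind can be obtained in several ways, and the procedure behind Figs. 12 and 13 is not restated here. As an illustration only, the sketch below uses permutation importance, a technique swapped in for demonstration: each feature is shuffled in turn, the resulting increase in mean squared error is recorded, and the increases are normalized to sum to 100%. The model `toy_wa_model`, the feature names, and the generated data are all hypothetical.

```python
import random


def mse(model, rows, targets):
    """Mean squared error of the model on the given rows."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_shares(model, rows, targets, features, n_repeats=20, seed=0):
    """Percentage share of each feature in the total error increase after shuffling."""
    rng = random.Random(seed)
    baseline = mse(model, rows, targets)
    increases = {}
    for name in features:
        total = 0.0
        for _ in range(n_repeats):
            shuffled = [r[name] for r in rows]
            rng.shuffle(shuffled)  # break the link between this feature and the target
            permuted = [dict(r, **{name: v}) for r, v in zip(rows, shuffled)]
            total += mse(model, permuted, targets) - baseline
        increases[name] = max(total / n_repeats, 0.0)
    norm = sum(increases.values())
    if norm == 0.0:
        return {name: 0.0 for name in features}
    return {name: 100.0 * inc / norm for name, inc in increases.items()}


def toy_wa_model(row):
    """Hypothetical linear stand-in; NOT the study's CatBoost model."""
    return 20.0 - 0.2 * row["clay_pct"] - 0.1 * row["sio2_pct"]  # ignores mgo_pct


# Invented data, for illustration only.
data_rng = random.Random(42)
rows = [
    {
        "clay_pct": data_rng.uniform(30.0, 70.0),
        "sio2_pct": data_rng.uniform(50.0, 70.0),
        "mgo_pct": data_rng.uniform(0.0, 5.0),
    }
    for _ in range(40)
]
targets = [toy_wa_model(r) for r in rows]

shares = permutation_shares(toy_wa_model, rows, targets, ["clay_pct", "sio2_pct", "mgo_pct"])
for name, share in shares.items():
    print(f"{name}: {share:.1f}%")
```

Shares computed this way describe how strongly the model relies on each feature; they do not indicate the direction of the effect, which is why thresholds and trends are read from the dependence curves instead.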

Sensitivity analysis for WA using CatBoost.

Sensitivity analysis for BS using CatBoost.
Taylor diagram for standard deviation and correlation
The Taylor diagrams (Figs. 14 and 15) provide a holistic assessment of model performance by comparing simulated and observed variability52. For WA (Fig. 14), CatBoost and XGBoost nearly overlapped at the ideal point (correlation: 0.98–0.99; standard deviation ratio: 0.97–1.02), confirming their ability to replicate both the magnitude and distribution of experimental data. GBM showed a slight overestimation of variability (ratio: 1.08), while AdaBoost’s low correlation (0.85) and inflated standard deviation (1.35) revealed systematic prediction errors.
For BS (Fig. 15), the results were even more pronounced. CatBoost achieved near-perfect alignment (correlation: 0.99; ratio: 1.01), whereas AdaBoost’s poor fit (correlation: 0.82; ratio: 1.42) underscored its unsuitability for strength prediction.
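The quantities plotted in a Taylor diagram are the correlation coefficient, the ratio of predicted to observed standard deviation, and the centered root-mean-square difference, which are linked by the law of cosines. The following minimal sketch computes them for paired values; the function name and the sample numbers are hypothetical and are not taken from the study.

```python
import math


def taylor_statistics(observed, predicted):
    """Correlation, standard deviation ratio and centered RMS difference."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    std_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed) / n)
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted) / n)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted)) / n
    centered_rmsd = math.sqrt(
        sum(((p - mean_p) - (o - mean_o)) ** 2 for o, p in zip(observed, predicted)) / n
    )
    return {
        "correlation": cov / (std_o * std_p),
        "std_ratio": std_p / std_o,  # 1.0 means the predicted spread matches the observed one
        "centered_rmsd": centered_rmsd,
        "std_obs": std_o,
        "std_pred": std_p,
    }


# Hypothetical BS values in MPa, for illustration only (not the study's data).
bs_observed = [5.0, 12.0, 21.0, 33.0, 48.0]
bs_predicted = [6.5, 11.0, 23.0, 31.5, 49.0]

stats = taylor_statistics(bs_observed, bs_predicted)
for key, value in stats.items():
    print(f"{key}: {value:.4f}")
```

A model lies at the ideal point of the diagram when the correlation and the standard deviation ratio both equal 1, in which case the centered RMS difference is zero.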

Taylor diagram for WA standard deviation.

Taylor diagram for BS standard deviation.
SHAP feature importance
The SHAP analyses (Figs. 16, 17, 18 and 19) serve as a powerful interpretability tool, translating the CatBoost model’s complex, high-dimensional outputs into chemically and technologically meaningful insights. By attributing predictive importance to individual features, these analyses provide a transparent understanding of how input variables influence key ceramic properties, thereby fostering trust in machine learning predictions while deepening our comprehension of composition–property relationships in ceramic materials. Figure 16 illustrates the dominant influence of firing temperature on bending strength. The SHAP values increase linearly up to 1200 °C, beyond which they plateau, a trend that closely aligns with the partial dependence plot and confirms the existence of an optimal thermal processing window. This behavior supports the empirical understanding that thermal activation up to a certain threshold enhances densification and phase development, beyond which further heating yields diminishing returns. The observed plateau thus signifies not only the thermodynamic limit of strength gains but also points to a practical energy-efficiency boundary in industrial processing30.
In Fig. 17, the SHAP value distribution for aluminium oxide reveals a more intricate response, with prominent peaks at concentrations of 20 and 25%. These concentrations correspond to critical transitions in the firing process: the onset of kaolinite decomposition around 20%, and the formation of needle-like mullite crystals near 25%, which are well-documented contributions to mechanical reinforcement in ceramics2,53. This dual-peak behavior substantiates the relevance of Al2O3 as a structural optimizer and quantitatively supports its role as a key lever in tuning mechanical performance21, validating long-standing ceramic knowledge with data-driven evidence.
Turning to water absorption, Fig. 18 underscores the critical role of silicon dioxide content. Below 60%, SHAP values remain near-neutral, but a sharp negative trend emerges above 62%, indicating that higher SiO2 concentrations significantly reduce water absorption. This threshold effect is associated with enhanced glass-phase formation and pore closure during sintering, both of which are critical for producing low-porosity ceramic bodies54. Notably, this finding emphasizes the delicate balance required: while adequate silica promotes vitrification, excess levels can dilute structural components or introduce defects if not properly controlled, which offers a refined target for silica optimization. Figure 19 complements this by revealing the significant contribution of clay minerals, particularly the dominance of illite-rich raw materials21. Samples with clay mineral contents exceeding 55%, particularly those with a high illite fraction, show strong negative SHAP values for water absorption, indicating a superior ability to limit porosity. This effect may be attributed to illite’s moderate plasticity and its strong fluxing behavior2. However, it is important to acknowledge that the model’s conclusions are shaped by the composition of the underlying database, which predominantly features illitic clays. Future expansions of the dataset to include more samples rich in kaolinite, smectite, or mixed-layer clays would help test the generalizability of these trends.
Beyond individual variable effects, SHAP interaction values offer deeper insight into synergistic or antagonistic relationships between features. For example, Fig. 16 reveals that the positive impact of high firing temperatures on strength is enhanced in samples with elevated Al2O3 content55, most markedly when Al2O3 exceeds 23%. This suggests that thermal activation and mullite crystallization work in tandem to boost mechanical performance, and that thermal and compositional optimization must be considered together. Similarly, in Fig. 18, the sharp SiO2 threshold effect becomes more pronounced in raw clays with finer particle size distributions, indicating that granulometry modulates vitrification efficiency and pore sealing. Such multidimensional interactions are challenging to detect through conventional experimental approaches, but emerge clearly through SHAP visualizations. In practical terms, the data support prioritizing silica control for floor tile applications where water resistance is paramount, while guiding structural ceramic producers to focus on Al2O3 adjustments within the identified optimal ranges.
These findings have significant implications for ceramic process engineering. The ability to identify precise compositional thresholds and synergistic effects enables the development of more targeted, high-performance formulations. For instance, producers of moisture-resistant floor tiles can focus on fine-tuning silica levels while maintaining optimal granulometry, whereas refractory ceramic applications may benefit more from Al2O3 optimization within the identified effective ranges. The interpretability afforded by SHAP thus not only validates the model but also enhances formulation strategies in ways that align with both scientific principles and manufacturing constraints.
In conclusion, the SHAP analysis achieves two critical outcomes. First, it confirms that the machine learning model’s predictions are grounded in established ceramic science, reinforcing confidence in its applicability. Second, it expands the domain of ceramic knowledge by uncovering nonlinearities and interaction effects that are difficult, if not impossible, to isolate through traditional experimentation. This integrative approach bridges the gap between advanced data analytics and practical ceramic engineering, paving the way for the widespread adoption of machine learning in industrial quality control, material selection, and process optimization. Crucially, the ability to trace model decisions back to tangible material characteristics marks a substantial advancement toward explainable, data-driven ceramic design.
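To make the attribution principle concrete, the sketch below computes exact Shapley values for a single sample by enumerating all feature subsets, with absent features replaced by values drawn from a background set. This brute-force version is an illustration only: its cost grows exponentially with the number of features, and SHAP libraries use more efficient tree-specific algorithms for gradient-boosted models. The model `toy_bs_model`, the feature names, and all numbers are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, instance, background):
    """Exact Shapley values of one instance, relative to the mean background prediction."""
    names = list(instance)
    n = len(names)

    def value(subset):
        # Expected prediction when the features in `subset` are fixed to the
        # instance's values and all others are taken from the background rows.
        total = 0.0
        for row in background:
            mixed = {k: (instance[k] if k in subset else row[k]) for k in names}
            total += model(mixed)
        return total / len(background)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        contribution = 0.0
        for size in range(n):  # sizes 0 .. n-1 of coalitions without `name`
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contribution += weight * (value(s | {name}) - value(s))
        phi[name] = contribution
    return phi


def toy_bs_model(row):
    """Hypothetical stand-in with a temperature x Al2O3 interaction; ignores mgo_pct."""
    dt = row["temp_c"] - 1000.0
    return 0.1 * dt + 0.8 * row["al2o3_pct"] + 0.01 * dt * (row["al2o3_pct"] - 20.0)


# Invented background samples and instance, for illustration only.
background = [
    {"temp_c": 1050.0, "al2o3_pct": 16.0, "mgo_pct": 1.0},
    {"temp_c": 1150.0, "al2o3_pct": 20.0, "mgo_pct": 2.0},
    {"temp_c": 1200.0, "al2o3_pct": 24.0, "mgo_pct": 3.0},
    {"temp_c": 1250.0, "al2o3_pct": 28.0, "mgo_pct": 4.0},
]
instance = {"temp_c": 1200.0, "al2o3_pct": 25.0, "mgo_pct": 2.0}

phi = shapley_values(toy_bs_model, instance, background)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.3f}")
```

Two properties make such values interpretable: the contributions sum to the difference between the sample's prediction and the mean background prediction, and a feature the model never uses receives a contribution of zero.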

SHAP analysis for BS using CatBoost.

SHAP feature importance analysis for BS using CatBoost.

SHAP analysis for WA using CatBoost.

SHAP feature importance for WA using CatBoost.
