Newswise — For years, the field of eXplainable Artificial Intelligence (XAI) has focused on developing tools to interpret trained black-box models. Techniques such as Local Interpretable Model-agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP) generate feature-importance explanations. However, a critical gap remains: these explanations are used to understand the model, not to improve it. The idea of closing the loop, using explainability to guide training and improve model quality, is rarely considered. As a result, there is an urgent need for ways not only to explain AI’s decisions, but also to use those explanations to build better, more reliable systems.
Now, researchers from the University of Granada’s Department of Computer Science and Artificial Intelligence and the Andalusian Data Science and Computational Intelligence Institute (DaSCI) have developed a new regularization approach, eXplainable AI – SHIELD (X-SHIELD), published in the journal Machine Intelligence Research (June 2026; DOI: 10.1007/s11633-025-1576-y). The technique belongs to a broader family called T-SHIELD (Transformation Selective Hidden Input Evaluation for Learning Dynamics) and represents a concrete step toward what the field calls “Red XAI,” which uses explainability to improve AI from a developer’s perspective.
X-SHIELD’s central innovation lies in the way it chooses which features to hide. During training, the model first computes a saliency map, essentially a gradient-based measure of how important each feature is to the model’s decision. The technique then masks the least important features (such as background pixels in an image) and computes the Kullback-Leibler divergence between the model’s predictions for the original and the masked inputs. This divergence term is added to the loss function, effectively penalizing the model if its prediction changes significantly when unimportant features are removed. The result? The model learns to focus on what really matters. Experiments on seven benchmark image datasets (CIFAR-10, CIFAR-100, Fashion-MNIST, EMNIST, Flowers, Oxford-IIIT Pet, and ImageNet 1K) show that X-SHIELD improves accuracy in 13 out of 14 configurations compared to standard training. Perhaps more importantly, the explanations produced by models trained with X-SHIELD are significantly more robust and faithful. That is, they better reflect the model’s actual decision-making process and remain stable even when the explanation method is run multiple times.
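The training step described above can be illustrated with a minimal sketch. It is not the authors’ implementation: the toy linear softmax classifier, the function names, the masking fraction, the weight `lam` on the divergence term, the choice to mask by zeroing features, and the direction of the KL divergence (original predictions relative to masked predictions) are all assumptions made for illustration.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(weights, bias, x):
    """Class probabilities of a toy linear softmax classifier (assumed model)."""
    logits = [sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)

def saliency(weights, bias, x):
    """Gradient-based saliency: |d logit_c / d x_i| for the predicted class c.
    For a linear model this gradient is simply row c of the weight matrix."""
    probs = predict(weights, bias, x)
    c = probs.index(max(probs))
    return [abs(w) for w in weights[c]]

def mask_least_important(x, sal, fraction):
    """Zero out the `fraction` of features with the lowest saliency."""
    k = int(len(x) * fraction)
    hidden = set(sorted(range(len(x)), key=lambda i: sal[i])[:k])
    return [0.0 if i in hidden else xi for i, xi in enumerate(x)]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def regularized_loss(weights, bias, x, label, fraction=0.5, lam=1.0):
    """Cross-entropy plus lam * KL(original predictions || masked predictions)."""
    p_orig = predict(weights, bias, x)
    x_masked = mask_least_important(x, saliency(weights, bias, x), fraction)
    p_masked = predict(weights, bias, x_masked)
    ce = -math.log(p_orig[label] + 1e-12)
    kl = kl_divergence(p_orig, p_masked)
    return ce + lam * kl, ce, kl

# Hypothetical example: 2 classes, 4 input features.
W = [[2.0, 0.1, -1.5, 0.05],
     [-2.0, 0.3, 1.5, -0.05]]
b = [0.0, 0.0]
x = [1.0, 1.0, 0.2, 1.0]

total, ce, kl = regularized_loss(W, b, x, label=0)
print("masked input:", mask_least_important(x, saliency(W, b, x), 0.5))
print("cross-entropy:", ce, "KL penalty:", kl, "total loss:", total)
```

In a real training pipeline the saliency would come from automatic differentiation through a deep network, and the combined loss would be minimized by gradient descent. The sketch only shows how the penalty grows when hiding low-saliency features shifts the prediction.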
“We realized that explanations were being treated as end products rather than tools for improvement,” the authors said. “X-SHIELD changes this by making explainability part of the training loop itself. Forcing the model to learn without the least important features not only makes the model more efficient, but also makes it more honest about how it makes decisions. The model can no longer hide behind irrelevant patterns and must rely on the features that truly matter. And, surprisingly, it also improves the accuracy of the model. This is a win-win, and we believe it can redefine the way we think about building AI that is easier to trust.”
The implications extend far beyond academic benchmarks. In high-stakes fields like medical diagnostics, autonomous driving, and financial risk assessment, being able to trust an AI system is just as important as its raw performance. X-SHIELD provides a practical plug-and-play solution that can be integrated into existing training pipelines with minimal changes. Although the explainability-guided version increases training time by about 31%, the researchers argue that the cost is justified by the gains in transparency and accuracy. Furthermore, the method is model-agnostic in the sense that it works with any differentiable architecture, from convolutional neural networks to transformers. As AI transparency regulations tighten globally, tools like X-SHIELD could become essential for developers seeking to meet both performance benchmarks and accountability standards. The approach makes the black-box model a little less black and much more reliable.
###
References
DOI
10.1007/s11633-025-1576-y
Original source URL
https://doi.org/10.1007/s11633-025-1576-y
Funding information
This study was supported by the Spanish Ministry of Science and Technology (No. PID2023-150070NB-I00), funded by the Spanish Ministry of Science and Technology (MCIN)/Agencia Estatal de Investigación (AEI) (No. 10.13039/501100011033). Open access funding was provided by Universidad de Granada/CBUA.
About Machine Intelligence Research
Machine Intelligence Research (formerly the International Journal of Automation and Computing) is published by Springer and sponsored by the Institute of Automation, Chinese Academy of Sciences. The journal publishes high-quality articles on original theoretical and experimental research, runs special issues on emerging topics, and strives to bridge the gap between theoretical research and practical applications.
