The AI made a mistake with high confidence. Now what?



In this Help Net Security interview, Christian Debes, Head of Data Analytics and AI at SPRYFOX, talks about the growing gap between what AI models do and what operators can explain. He argues that this gap is already problematic, especially when decisions affect people or money and no one can explain why a model produced a certain output.

Debes explains how responsible teams handle confidently wrong answers, why procurement leaders remain accountable when AI systems fail, and what explainability means as a translation layer between technical teams and operators. He also touches on the EU AI Act and the risk that it will produce compliance theater, and concludes with a candid assessment of where AI infrastructure is headed if explainability cannot keep up with model complexity.

AI Accountability

There is a growing gap between what a model does and how clearly operators can explain why it does it. At what point does that gap become a liability rather than an acceptable engineering trade-off?

I think that point has already been reached. We may not always notice it right away, but in many cases the gap is already a liability. Fully explaining machine learning models has always been difficult, even for relatively simple ones. What has changed is the scale. We have moved from models where we could at least inspect feature importance or trace a decision path to systems where even the people who built them can give only approximate accounts of why a particular output was produced.

The moment this becomes a real liability rather than an acceptable engineering trade-off is when decisions based on these outputs affect people and money, and no one in the room can answer the question, “Why did it say that?” Think about the risks if the reasoning behind a credit decision, fraud flag, or medical recommendation is not understood and cannot be challenged.

This can happen because the model performed well on individual benchmarks, the team was satisfied, and no one investigated further. Traditional monitoring of machine learning solutions focuses on model drift (usually caused by data drift) and hard performance numbers. It rarely measures explainability.
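As an illustration of the kind of drift check that traditional monitoring relies on, here is a minimal sketch of the Population Stability Index (PSI), one common drift statistic. It is not taken from the interview; the function name, the equal-width binning, and the `eps` smoothing value are all assumptions chosen for the example. Note that a check like this says nothing about whether the model's decisions can be explained.

```python
import math


def population_stability_index(expected, actual, n_bins=10, eps=1e-6):
    """Compare two samples of one numeric feature; 0.0 means identical bin shares.

    `expected` is the reference sample (e.g. training data) and `actual` is the
    live sample. Bin edges are equal-width over the range of `expected`; values
    outside that range are clamped into the first or last bin.
    """
    if not expected or not actual:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant features

    def bin_fractions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            counts[max(0, min(idx, n_bins - 1))] += 1
        # eps keeps empty bins from producing log(0) or division by zero
        return [max(c / len(values), eps) for c in counts]

    e = bin_fractions(expected)
    a = bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A larger value means the live data has moved further from the reference sample; what counts as "too large" is a judgment each team has to make for its own features.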

Of course, most of the time we don’t even know that there is such a gap between what the model does and what the operator can articulate. If the model is right, everything looks fine. The liability only comes to light when something goes wrong, and you have to explain to regulators and courts why you deployed a system that you couldn’t explain at the time.

If a transformer model produces a wrong answer with a high degree of confidence, and no one on the team can reproduce why, what would a responsible engineering team do next? What would most teams do?

A responsible team will treat this as a major incident, rather than an unfortunate, rare data point that doesn’t hurt the overall aggregate numbers. You stop and investigate. Experienced data scientists have many tools for this. Our approach is to first ask, “Is this a training problem or an inference problem?” Next, we examine similar inputs to see whether the failure is systematic or isolated. Third, we try to understand the confidence calibration.
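One way to look at how well a model's stated confidence matches its actual accuracy is expected calibration error (ECE). The following is a minimal sketch under stated assumptions, not the procedure Debes's team uses: the function name, the equal-width confidence bins, and the input format (a list of confidences in [0, 1] and a parallel list of booleans) are all chosen for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy per bin.

    `confidences` holds the model's confidence in its predicted label (0..1);
    `correct` holds True/False for whether each prediction was right.
    Returns 0.0 for a perfectly calibrated sample (and for an empty one).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in last bin
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += (len(bucket) / total) * abs(avg_conf - accuracy)
    return ece
```

For example, four predictions all made at 0.95 confidence, of which only two are correct, give a gap of about 0.45 in the top bin. A model that is often confidently wrong will show exactly this pattern: high average confidence in a bin whose accuracy is much lower.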

When a model is confidently wrong, it often indicates that something is fundamentally off in what it has learned. Finally, we turn to explainability. With classic machine learning methods, you can trace an incorrect but confident decision back to the input features that drove it (SHAP and LIME are examples). For modern LLM systems there are methods such as mechanistic interpretability, which, although technically very different from the classical methods, answer a similar question: which tokens in the input text led to this decision.
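To make the "which tokens led to this decision" question concrete, here is a minimal sketch of occlusion-based attribution: mask one token at a time and record how much the model's score drops. This is a simple perturbation technique in the same spirit as LIME, not SHAP, LIME, or mechanistic interpretability themselves. The names `occlusion_attribution` and `toy_score`, the `[MASK]` placeholder, and the keyword weights are hypothetical stand-ins; in practice `score_fn` would wrap a real model.

```python
def occlusion_attribution(tokens, score_fn, mask_token="[MASK]"):
    """Return (token, score_drop) pairs for a black-box scoring function.

    A large positive drop means the score depended heavily on that token.
    """
    tokens = list(tokens)
    baseline = score_fn(tokens)
    attributions = []
    for i, tok in enumerate(tokens):
        perturbed = tokens[:i] + [mask_token] + tokens[i + 1:]
        attributions.append((tok, baseline - score_fn(perturbed)))
    return attributions


def toy_score(tokens):
    """Hypothetical stand-in for a model's confidence in a 'fraud' label."""
    weights = {"urgent": 0.4, "wire": 0.3, "transfer": 0.1}
    return min(1.0, 0.1 + sum(weights.get(t, 0.0) for t in tokens))


if __name__ == "__main__":
    for token, drop in occlusion_attribution(
        ["urgent", "wire", "transfer", "please"], toy_score
    ):
        print(f"{token:>10}  {drop:+.2f}")
```

Masking tokens one at a time ignores interactions between tokens, so the result is only a first-pass view of what drove a confident wrong answer, not a full explanation.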

Unfortunately, what most teams do is log it, maybe add it to a test set, and move on, because the model works 98% of the time and there is pressure to ship, iterate, and deliver the next feature. In my experience, debugging confidently wrong answers is deep detective work that can take a long time to do properly. Many organizations don’t budget for it. They spend their budget building new features without a deep understanding of why existing models sometimes fail.

Yet a single confidently wrong prediction can tell you more about your model than a thousand correct ones. It shows you the limits of the model, not the happy path.

Procurement teams and executives often rely on vendor assurances to greenlight AI systems they don’t understand. How responsible are they when these systems fail? How is accountability considered?

I see this all the time and have some sympathy for both positions. Executives must make purchasing decisions about technology that is advancing faster than anyone can reasonably follow, and they often rely on vendor guarantees because there are no other alternatives. They do not have the in-house expertise to technically evaluate these systems (nor do they have the capacity to monitor this rapidly changing area).

That said, “I trusted the vendor” is never a good defense when something goes wrong, and AI is no exception. If you procure a system that makes consequential decisions and you can’t explain (even at a high level) how it works, what data it was trained on, and what its known limitations are, that is simply a governance failure.

Explainability plays a very important role here. I always think of explainability not as a common language, but as a layer of translation. This allows technical aspects to be translated into the domain language, thus acting as a bridge between the team that built the system and the business that operates it.

If a vendor can’t explain and document how their model reaches decisions to procurement teams and executives in language that non-experts can understand, that’s an immediate red flag. Buyers do not need to understand the transformer architecture, but vendors who cannot explain their systems simply and clearly may not fully understand them either. Or, worse, they do understand them and choose not to be transparent about their limitations. Explainability is harder these days with larger models, but good vendors make that investment.

The EU AI Act establishes binding transparency obligations for high-risk systems. Is the industry technically ready to meet these requirements, or is it heading toward widespread compliance theater?

To be blunt, for many organizations this will begin as compliance theater. The EU AI Act has requirements such as transparency around training data, documentation of model limitations, and human oversight mechanisms for high-risk systems. These are reasonable requirements. But addressing them properly requires a level of ML engineering discipline and governance that many companies (including large ones) have yet to build.

What I expect is a wave of documentation that looks thorough on paper but doesn’t actually help anyone understand or audit the system. The easiest route is to put a compliance wrapper around an operational black box and check a few boxes. I expect a significant number of companies to take it.

The companies that do this well are those that invested in understanding their models before regulation forced them to. They build models with good documentation, proper experiment tracking, test cases, meaningful evaluation beyond accuracy metrics, and auditing in mind. These are practices that great ML teams have followed for years. The AI Act doesn’t invent anything new; it just mandates what good ML teams should be doing anyway. And maybe that’s actually a good thing, because it separates teams that follow good practices from those that cut corners.

If explainability does not keep pace with current model complexity, what will the state of AI be in 10 years? Are we building critical infrastructure on a foundation that cannot be audited?

The honest answer is that, at the current pace, explainability will not keep up with model complexity. In 10 years, there will be many systems that cannot be audited because they are built on larger, more sophisticated, and more opaque models. But that doesn’t mean we should stop building; it just means we have to be more disciplined about how we build.

However, it is a choice whether or not critical infrastructure is put at risk. ML engineers must choose whether to document. Those responsible for infrastructure (including buyers of AI software) may have to choose between short-term financial gains and investing in robust systems. Regulators must choose where to draw the line to protect critical systems without negatively impacting innovation.


