Will AI replace application security testing?

I’ve had this same conversation almost every day since Anthropic released Claude Mythos in April: with customers, with prospects, and with our own field teams. The “too dangerous to release” model was limited to a few organizations through Project Glasswing, but it sparked a wave of questions that go to the heart of our industry.

In April, I wrote about how to build AppSec in the age of AI (AI didn’t destroy application security, revealing the next evolution). That post covered the landscape. This one answers the three questions I get in almost every conversation, and the answers come from actual testing, not guesswork.

Question 1: Will AI replace SAST?

Probably not right away, and definitely not by itself. There are two reasons.

First, the model is not a security solution. An LLM is a building block, more like a database engine than an application. Turning raw model intelligence into something AppSec teams can trust takes a lot: a harness that ensures the entire codebase is actually analyzed (coding agents are surprisingly bad at this on their own), reproducible results, findings reported in a structured format rather than scraped from chat output, deduplication against what existing tools already find, and a platform for portfolio management, reporting, and CI/CD integration. If you ask your coding agent to “find security issues in your project,” it will find a few. It will not give you a program.
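To make the "structured format" and "deduplication" points concrete, here is a minimal sketch of how a harness might filter agent findings against what existing tools already reported. The `Finding` fields, the fingerprint choice (category, file, line), and the function names are all assumptions for illustration, not how any particular product does it; a real fingerprint would need to survive code moving between lines.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical structured finding, as opposed to free-form chat output."""
    category: str  # e.g. a CWE identifier
    file: str
    line: int
    source: str    # which tool reported it


def fingerprint(finding: Finding) -> tuple[str, str, int]:
    # Deliberately ignores `source`, so the same issue reported by
    # two different tools produces the same fingerprint.
    return (finding.category, finding.file, finding.line)


def novel_findings(agent: list[Finding], existing: list[Finding]) -> list[Finding]:
    """Return agent findings that no existing tool already reported."""
    known = {fingerprint(f) for f in existing}
    seen: set[tuple[str, str, int]] = set()
    result = []
    for f in agent:
        fp = fingerprint(f)
        if fp in known or fp in seen:
            continue  # already reported by SAST, or repeated by the agent
        seen.add(fp)
        result.append(f)
    return result


sast = [Finding("CWE-89", "app/db.py", 42, "sast")]
agent = [
    Finding("CWE-89", "app/db.py", 42, "agent"),     # duplicate of the SAST finding
    Finding("CWE-565", "app/auth.py", 10, "agent"),  # new
    Finding("CWE-565", "app/auth.py", 10, "agent"),  # agent repeated itself
]
for f in novel_findings(agent, sast):
    print(f.category, f.file, f.line)
```

Only the unsigned-cookie style finding survives the filter, which is the behavior an AppSec team would want from a deep sweep layered on top of existing scans.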

Second, economics. Token prices are rising as providers correct years of below-cost pricing. In our own testing of agentic code analysis, scanning a real-world business application with a premium model costs hundreds of dollars in tokens. Mythos is priced at $25 per million input tokens and $125 per million output tokens, five times the cost of Claude Opus, which pushes a single scan past $1,000. That is fine for an occasional in-depth review. I would pay more for a pen test. It is not something you can run every day, across every application, on every commit.
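The arithmetic is easy to check for yourself. The sketch below uses the per-million-token prices quoted above; the token volumes are purely hypothetical placeholders, not measurements from our testing, so substitute your own numbers.

```python
# Prices quoted in this post, in USD per million tokens.
INPUT_USD_PER_MTOK = 25.0
OUTPUT_USD_PER_MTOK = 125.0


def scan_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Token cost of one scan at the prices above."""
    return (
        (input_tokens / 1_000_000) * INPUT_USD_PER_MTOK
        + (output_tokens / 1_000_000) * OUTPUT_USD_PER_MTOK
    )


# Hypothetical token volumes for one deep scan of one application.
per_scan = scan_cost_usd(input_tokens=36_000_000, output_tokens=2_000_000)
print(f"per scan: ${per_scan:,.2f}")

# Running that same scan daily for a year, for a single application.
print(f"daily for a year: ${per_scan * 365:,.2f}")
```

Multiply the yearly figure by the number of applications in a portfolio and the case for reserving this kind of analysis for periodic sweeps becomes clear.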

So the realistic pattern looks like this. Traditional SAST remains the everyday first line of defense: fast, deterministic, and cheap enough to run all the time. AI-powered agentic analysis becomes a periodic deep sweep, performed weekly or monthly, much as penetration testing complements DAST today. We call this hybrid analysis, and we think it is where the overall market will land.

What is interesting is what agentic analysis finds that SAST cannot. Two examples stand out in our testing. The first is an architectural flaw, such as an authentication design that stores the username in an unsigned cookie with no server-side session. Any attacker can forge that cookie. Dataflow rules won’t catch this because the problem is in the design, not the flow. (I once found this exact flaw in a real banking application during a penetration test. It happens in real life, and yes, it was a long time ago.) The second is persistent cross-site scripting, which involves two separate data flows: user input to the database, and database content to the page. Traditional SAST handles reflected XSS well because it is a single flow. In the two-flow version, AI’s broader understanding of the code adds real value.
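For readers who have not seen the unsigned-cookie flaw up close, here is a minimal sketch of it, alongside one possible mitigation using an HMAC signature. The function names and the secret are hypothetical, and this is not the code from the banking application mentioned above. Notice that nothing in the flawed version is a tainted-data sink; the bug is the decision to trust the cookie at all.

```python
from __future__ import annotations

import hashlib
import hmac

# Assumption for illustration: a real service would load this from secure config.
SECRET_KEY = b"hypothetical-server-side-secret"


def current_user_unsigned(cookie: str) -> str:
    # Flawed design: the server believes whatever username the cookie carries.
    return cookie


def sign(username: str) -> str:
    """Issue a cookie value of the form '<username>.<hex HMAC-SHA256>'."""
    tag = hmac.new(SECRET_KEY, username.encode(), hashlib.sha256).hexdigest()
    return f"{username}.{tag}"


def current_user_signed(cookie: str) -> str | None:
    """Return the username only if the signature verifies, else None."""
    username, sep, tag = cookie.rpartition(".")
    if not sep:
        return None  # no signature present at all
    expected = hmac.new(SECRET_KEY, username.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison on bytes avoids timing side channels.
    if hmac.compare_digest(tag.encode(), expected.encode()):
        return username
    return None


# An attacker simply types the username they want.
print(current_user_unsigned("admin"))              # forged identity accepted
print(current_user_signed("admin"))                # rejected: no signature
print(current_user_signed("admin." + "0" * 64))    # rejected: wrong signature
print(current_user_signed(sign("alice")))          # genuine cookie accepted
```

Signing is only one option. A server-side session, where the cookie holds an opaque random identifier and the username lives on the server, addresses the same design flaw.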

The added value is real. That’s not hype. However, it is a complement to SAST, not a replacement for it.

Question 2: I can’t access Mythos. Are we exposed?

Less than the headlines suggest. Here is the finding from our research that surprises people the most: deep agentic analysis does not require Mythos-class models. In our prototype tests, widely available models handle core detection well, with premium models used to scope and validate the results. Public benchmarks tell the same story. Mythos represents an incremental improvement over the best commonly available models, not a fundamental leap forward.

What customers actually need is the ability to run deep AI-powered reviews of their own code within their existing AppSec programs, and that is possible today with widely available models. That’s exactly what we’re building into the Fortify portfolio. Agentic analysis capabilities, currently in development and expected to be available later this year, will connect to SSC and Fortify on Demand, report only the findings that existing scans missed, and work with a wide range of LLMs. Once you have access to Mythos, you can use it. There is no need to wait for it.

And many of Fortify’s AI capabilities are already shipping and in production use. Fortify Remediation Aviator uses AI to audit SAST findings and suggest remediation. It is used by OpenText’s own development organization, with 7,000 developers and over 2,000 applications, and has audited over 1 million issues to date. We have been shipping OWASP LLM Top 10 detection rules since late 2023. In addition, our free, open-source MCP server and agent skills embed Fortify directly in the coding agents where developers now work.

Question 3: What if the attacker has Mythos?

We assume they will, whether it’s Mythos itself or an equivalent capability. The realistic attack patterns are clear. Open source code is a natural target precisely because it is open to analysis. AI can hunt for zero-days at scale and reverse engineer newly released security patches into working exploits. Neither technique is new. What changes is speed and scale, and that changes the game.

However, the defense is not some exotic new technology. It is established application security practice, executed with more discipline and urgency. That means software composition analysis with proactive remediation of impactful findings, because open source is where AI-driven attacks will land first. It also means testing all code before it ships, including the code developers write with AI. Independent benchmarks such as BaxBench show that even top models given explicit security instructions still produce a significant proportion of insecure implementations, with recent results showing roughly one in five functionally correct solutions to be insecure. SAST analyzes AI-generated code just as it analyzes code written by humans. The practice is now more relevant, not less.

That’s the conclusion I keep coming back to. The current AI threat landscape does not make application security obsolete. It makes it more important than ever. AI simultaneously creates genuinely new AppSec risks and genuinely new AppSec capabilities. The organizations that win will be those that build both into one program, combining the efficiency and determinism of traditional analysis with the depth of agentic techniques on a single platform.

If you are grappling with these issues within your organization, please reach out to us. I have these conversations every day, and I’m happy to have them with you.
