Enabling AI in Application Security

AI is undoubtedly a hot topic right now, with a lot of people dumping information, opinions, or even just parroting it out. Frank Catucci, his CTO and head of security research at Invicti, spoke with Mike Shema on episode 234 of the Application Security Weekly cybersecurity podcast to discuss current and near future applications. We talked about what it meant. Watch the full video below and read on to get an overview of the AI currently relevant to application security and learn about the latest in hallucinogenic squatting.

Faster, Easier to Use, Riskier

For all the hype about Large Language Models (LLM) and generative AI in recent months, the underlying technology has been around for years and a comparison that has made AI more accessible and useful. A small adjustment brought about a tipping point. While there are no fundamental changes on the technical side, the big realization is that AI is here to stay and is set to evolve even more rapidly. So we need to really understand AI and consider all the implications and use cases.In fact, industry leaders recently signed open letter It calls for a six-month moratorium on developing models stronger than GPT-4 until the risks are better understood.

As AI continues to evolve and is used more frequently and in more areas, considerations such as responsible use, privacy, and security are critically important when understanding risks and planning ahead. will be After the fact.Hardly a day goes by without another controversy related to ChatGPT data privacy, regardless of bot Leakage of user information or there Feed Your Own Data in Queries There are no clear instructions as to how that information is processed and who sees it. These concerns are exacerbated by the growing awareness that bots are being trained on publicly accessible web data, so even with a lot of management effort, it’s hard to be sure what will be revealed. you can’t.

Attacks on bots: rapid injection and more

Conversational AI, such as ChatGPT, uses the user-entered prompts as the main input to the application. In cybersecurity, when we look at “inputs” we think of them as “attack surfaces”. Not surprisingly, prompt injection attacks are the hottest area of security research. There are at least two main directions to explore. Creating prompts that extract data that bots shouldn’t expose, and applying existing injection attacks to AI prompts.

The first area is to bypass or change the guardrails and rules defined by conversational AI developers and administrators. In this context, prompt injection is crafting a query that causes your bot to behave in an unintended way.Invicti’s own Sven Morgenroth created a dedicated rapid fire playground To test and develop such rapid injection attacks in controlled conditions in an isolated environment.

The second type of prompt injection treats the prompt like any other user input to inject the attack payload. If your application doesn’t sanitize AI prompts before processing them, it can be vulnerable. Cross-site scripting (XSS) and other well-known attacks. Input sanitization is especially difficult, given that ChatGPT is often asked about (and wanting) application code. If successful, such an attack is much more dangerous than encouraging the extraction of sensitive data, as it can compromise the system the bot is running on.

Many caveats in AI-generated application code

AI-generated code is a whole other can of worms. Tools like GitHub Copilot can now create entire blocks of code, not just autocomplete, saving developers time and effort. One of his many caveats is security, and according to Invicti’s own research, Unsafe co-pilot suggestions The generated code often shows that it cannot be implemented as-is without exposing serious vulnerabilities. This results in regular security testing using tools such as: dust and SAST More importantly, it’s very likely that such code will end up in your project sooner or later.

Again, this is not an entirely new risk. Pasting and adapting code snippets from Stack Overflow and similar sites has been a common part of development for years. The differences are speed, ease of use, and scale of AI proposals. If you find a snippet somewhere online, you should understand it and modify it for your particular situation. Usually only a few lines of code work. But with AI-generated suggestions, you might get hundreds of lines of code that seem to work (at least on the surface), making it much harder to get used to what you get, and many , this is no longer necessary. The pressure to use that code exists and is only increasing because it can be significantly more efficient.

Vulnerabilities are just one of the risks associated with machine-generated code, and probably not the most impactful. In 2022, there will be a new focus on securing and managing the software supply his chain, so some of their own code may actually come from AI trained on someone else’s code. The realization that there is will be a cold shower for many. What happens to license compliance if a commercial project is found to contain AI-generated code identical to an open source library? Does that require attribution?Or open source your own library? If the code is machine generated, is it copyrighted? Do you want a separate software bill of materials (SBOM) detailing AI generated code? existing tools and processes for Software Composition Analysis (SCA) Even if you check license compliance, you may not be prepared to address all of them.

Hallucinatory crouch is (or will be)

Everyone continues to experiment with ChatGPT, but at Invicti we’re always on the lookout for anomalous and exploitable behavior. During the discussion, Frank Catucci tells an interesting story that illustrates this. One of him on our team was looking for an existing Python library to do his very specific JSON manipulations, so we decided to ask ChatGPT instead of the search engine. Bot very helpfully suggested three of his libraries that seemed perfect for the job, but none of them actually existed and were all invented by his AI (or Mike Until it turns out to be a hallucination, as Shema puts it.

Researchers believed that if a bot recommended a library that didn’t exist, others were more likely to receive the same recommendation and go looking for it. To confirm this, they took one of the forged library names, created a real open source project with that name (no code in it), and monitored the repository. Sure enough, the project had several visits within a few days, suggesting a future risk of AI suggestions directing users to malicious code. By analogy to typosquatting, where malicious sites are set up under domains corresponding to false domain names of high-traffic sites, this might be called psychedelic squatting. This is intentionally creating an open source project to mimic a non-existent package suggested by AI.

And if you think it’s just a curiosity with a funny name (it is), imagine a Copilot or similar code generator actually importing such a psychedelic library into its code suggestions. Your code won’t work if the library doesn’t exist. However, if malicious actors are abusing that name, they may be unknowingly importing malicious code into their business applications.

Using AI/ML in Application Security Products

While many companies have jumped on the AI bandwagon in recent months, Invicti has been using more traditional and predictive machine learning (ML) techniques to improve its products and processes internally. As Frank Catucci said, we regularly analyze anonymized data from millions of scans on the cloud on his platform to understand how customers use our products. Learn where you can improve performance and accuracy. One way AI/ML can be used to improve user outcomes is to help prioritize vulnerability reports, especially in large environments.

In an enterprise setting, some customers routinely scan thousands of endpoints – websites, applications, services, APIs – all add up to a staggering number. We use machine learning to consider multiple aspects, such as page structure and content, as well as identified technologies and components, to help users decide which of these assets should be prioritized based on their risk profile. I suggest to This type of assistant saves a lot of time when going through thousands of issues that need to be triaged and addressed across all her web environments. In refining this model, there was a case where, starting with about 6000 problems, he was able to select the most important 200 or so problems with about 85% confidence, which made the process much more manageable. It’s easier. for users.

Accurate AI starts with input from human experts

Any attempt to accurately assess real-world risk must begin with training data from human experts. Because the AI is only as good as its training set.Some Invicti security researchers, etc. Bogdan Karin, being an active bounty hunter, in refining this risk assessment function, we are associating specific vulnerability weights with what we see in bounty programs. This also helps narrow down the actual impact of the vulnerability in context. As Frank Catucci mentioned, much of that work is really filtering out valid warnings about obsolete or known vulnerable components that are not high risk in context. For example, if a particular page doesn’t accept a lot of user input, using an older version of jQuery isn’t a priority, so the results can be moved further down the list.

But will there come a time when AI can take over some or all of security testing from penetration testers and security engineers? Far from it, the new search and code generation capabilities are definitely being used by testers, researchers, and attackers. Getting answers like “code a bypass for this or that web application firewall” or “find an exploit for product and version XYZ” saves a lot of time compared to trial and error and traditional web searching You can, but it’s still essentially a manual process.

Known Risks and Features – Amplification

While the current hype cycle may suggest that Skynet is just around the corner, the reality is that the explosion of AI will amplify existing security risks and give another twist. just add. The key to getting the most out of available AI technology (and avoiding the worst) is to truly understand what it can and can’t do, or be tricked into doing. And in the end, they are nothing more than computer programs written by humans and trained by humans on huge human-generated datasets.

Source link