How to mitigate security risks in AI-generated code

Generative AI is spreading quickly, automating countless tasks and processes across businesses of every kind. Software developers are adopting AI-generated code at a fast pace, but it brings unique considerations with it, including AI-specific security risks. As companies consider how to address these issues, one thing is clear: existing application security tools are not suited to the particular complexities that AI-generated code introduces. Just as the early days of open source gave rise to Software Composition Analysis (SCA) tools built to monitor it, AI-generated code calls for new security tools.

The rise of AI in software code

When a company develops software, it typically consists of three parts that the organization either purchases or contracts people to create:

Proprietary code
Open source code
Commercial (third-party) code

Now a fourth component contributes code to our software: AI. It is layered in alongside every other piece, which further complicates the challenge of application security.

AI enters software in two different ways. First, you can use an AI-based code generation tool to write lines of code for you. The best known is GitHub Copilot; ChatGPT is also heavily used for code generation, Amazon offers CodeWhisperer, and there are many smaller tools.

Second, people who build software can use AI inside the product itself, to replace or improve some part of what it does, often by building on publicly available pre-trained models. Hugging Face, for example, is a website that hosts 400,000 AI models and 90,000 datasets for training them.

Using publicly available pre-trained models resembles what SCA already handles for open source dependencies, except that models introduce a new category of vulnerability that did not exist in traditional open source projects. The first use case, AI code generation, is the more revolutionary one, much like the advent of open source itself.

Lessons learned from the emergence of open source

AI-generated code is reminiscent of the early days of open source. Thirty to forty years ago, open source was an organic movement in which developers and students published code projects online without fees or licenses. As open source began to take hold, developers started attaching licenses to their code, disclaiming liability if someone used it and something went wrong. Companies were initially wary of open source, but by the 1990s and 2000s it had become mainstream for most of them. Today, open source code is estimated to account for 70 to 90 percent of modern software.

AI-generated code is likely to follow a similar path, but the cycle will probably move faster because the transformation is already underway. A few companies are trying to avoid LLM-generated code, but that is much harder today than it was 30 years ago, because developers are better informed, teams adopt new tools more quickly, and ideas spread faster. There is a real concern that organizations that avoid AI will be left behind, especially since their competitors have already adopted it.

SCA: A technology for looking at components

There are, however, growing concerns about how to handle AI-generated code, from both a legal/compliance perspective and a security perspective. AI-generated code brings its own challenges and vulnerabilities, and just as open source needed dedicated tooling, AI code requires AI-specific tools.

Just as SCA scans open source components and identifies their vulnerabilities, we predict that a new SCA market will emerge dedicated to monitoring and securing AI-generated code. SCA for open source didn't exist 15 years ago; today it's a $500 million market. Given the pace of AI adoption, SCA for AI could match or even exceed that growth.
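To make the SCA comparison concrete, here is a minimal sketch of the core matching step such tools perform: checking an inventory of components against advisories that list affected version ranges. It is a toy under stated assumptions. The package names, the `EXAMPLE-0001` advisory ID, and the purely numeric dotted version scheme are hypothetical, and real SCA tools also deal with many version formats, transitive dependencies, and curated vulnerability databases that this sketch ignores.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """A dependency found in the codebase (name and exact version)."""
    name: str
    version: str


@dataclass(frozen=True)
class Advisory:
    """A hypothetical vulnerability advisory with an affected version range."""
    advisory_id: str
    package: str
    introduced: str  # first affected version (inclusive)
    fixed: str       # first fixed version (exclusive)


def parse_version(version: str) -> tuple[int, ...]:
    """Parse a dotted numeric version such as '1.4.2' into (1, 4, 2)."""
    return tuple(int(part) for part in version.split("."))


def is_affected(component: Component, advisory: Advisory) -> bool:
    """True if the component is the advisory's package and falls in [introduced, fixed)."""
    if component.name != advisory.package:
        return False
    v = parse_version(component.version)
    return parse_version(advisory.introduced) <= v < parse_version(advisory.fixed)


def scan(components: list[Component], advisories: list[Advisory]) -> list[tuple[Component, str]]:
    """Return (component, advisory_id) pairs for every vulnerable component."""
    return [
        (component, advisory.advisory_id)
        for component in components
        for advisory in advisories
        if is_affected(component, advisory)
    ]


if __name__ == "__main__":
    # Hypothetical inventory and advisory data, for illustration only.
    inventory = [Component("examplelib", "1.4.2"), Component("otherlib", "2.0.0")]
    advisories = [Advisory("EXAMPLE-0001", "examplelib", "1.0.0", "1.5.0")]
    for component, advisory_id in scan(inventory, advisories):
        print(f"{component.name} {component.version}: {advisory_id}")
```

Versions are compared as integer tuples rather than strings so that, for example, 1.10.0 correctly sorts after 1.9.9.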

What's next?

AI-generated code is less secure because the AI models are trained on open source code, which is less secure than commercial code. In other words, “bad code in, bad code out.” Application security technologies that address this challenge are beginning to emerge. Tracing the software supply chain is already hard when an open source library pulls in vulnerable code from a library that in turn pulls code from another library. The same is true of AI code, which may be generated by an LLM that was trained on bad code. So it's not far-fetched to expect an AI bill of materials (AI BOM) to arrive soon.
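As a rough illustration of what an AI BOM might record, the sketch below tracks each AI-generated snippet along with the tool that produced it, where it lives, and a SHA-256 digest of its content, so that audits can later tell whether that code is still unchanged since it was recorded. The schema, the field names, and the `example-assistant` tool name are all hypothetical; this is not an established AI BOM format.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class AIComponent:
    """One AI-related entry in a hypothetical AI bill of materials."""
    kind: str            # hypothetical category, e.g. "generated_code"
    source: str          # the tool or model that produced the content
    source_version: str
    location: str        # file path where the content lives
    sha256: str          # digest of the exact content that was recorded


def sha256_text(text: str) -> str:
    """Hex SHA-256 digest of a string, encoded as UTF-8."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class AIBOM:
    project: str
    components: list[AIComponent] = field(default_factory=list)

    def add_generated_code(self, path: str, code: str, tool: str, tool_version: str) -> None:
        """Record that `code` at `path` was produced by an AI tool."""
        self.components.append(
            AIComponent("generated_code", tool, tool_version, path, sha256_text(code))
        )

    def matches(self, path: str, code: str) -> bool:
        """True if `code` at `path` is unchanged since it was recorded as AI-generated."""
        digest = sha256_text(code)
        return any(c.location == path and c.sha256 == digest for c in self.components)

    def to_json(self) -> str:
        """Serialize the BOM so it can be stored or shared alongside the software."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical project, file, and tool names, for illustration only.
    bom = AIBOM("demo-service")
    bom.add_generated_code("src/util.py", "def add(a, b):\n    return a + b\n",
                           tool="example-assistant", tool_version="0.1")
    print(bom.to_json())
```

Hashing the recorded content, rather than storing only a file path, means the record still identifies the exact code the AI tool produced even after the file is edited.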

Before AI-generated code is adopted across the board, organizations need to be aware of the challenges that AI poses. At the same time, the application security industry must take steps to help monitor and secure AI code in modern software. The evolution of open source has given modern software providers a blueprint for adopting new technologies and practices to ensure that AI-generated code meets expectations and avoids risks.