AI models write code that works, but often miss the basics of security

A major review of AI coding tools has shown that many models produce software that works but fails basic security checks. A new study tested more than 100 large language models and found that almost half of the code samples contained vulnerabilities.

Code that runs but is not safe

Veracode researchers assigned the same 80 programming tasks to each model. The tasks covered Java, Python, C#, and JavaScript, and were designed to test how often a model avoided well-known software vulnerabilities.

These included SQL injection, weak encryption, cross-site scripting, and log-injection issues.
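To make the first of these concrete, here is a minimal sketch of SQL injection in Python using the standard-library `sqlite3` module. The table layout and the function names (`find_user_unsafe`, `find_user_safe`, `make_demo_db`) are assumptions chosen for illustration; they are not taken from the study's tasks.

```python
import sqlite3


def make_demo_db():
    # Hypothetical in-memory database with two users, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return conn


def find_user_unsafe(conn, name):
    # Vulnerable: user input is concatenated directly into the SQL text,
    # so a crafted value can change the meaning of the query.
    query = "SELECT id, name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Safer: the value is bound as a parameter and never parsed as SQL.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()
```

Both functions run without error on ordinary input, which is the pattern the study describes. Only a hostile value such as `x' OR '1'='1` shows the difference: the concatenated version returns every row, while the parameterized version returns none.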

In total, 45% of the code samples failed the security check. This means the models introduced flaws that are already well known to developers and widely documented in industry guidelines. In most cases the code ran correctly, but from a security standpoint that was not enough.

Java fared worst

Of all the languages tested, Java produced the least secure results. Only about 28% of the Java samples passed the security check. Python had the best performance, but still failed around 38% of the time. JavaScript and C# fell in between.

The results were particularly poor in the tests focusing on cross-site scripting and log injection. Approximately 87% of these code samples did not block the threat. Both issues are well understood and commonly exploited.
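As a rough illustration of what blocking these two threats can look like, the sketch below escapes user input before placing it in HTML and neutralizes line breaks before writing it to a log. It uses only Python's standard library. The function names (`render_greeting`, `sanitize_for_log`, `log_failed_login`) are hypothetical, and a real application would normally rely on its framework's templating and logging facilities rather than hand-rolled helpers.

```python
import html
import logging


def render_greeting(user_name):
    # html.escape turns <, >, & and quotes into entities, so the browser
    # displays the input as text instead of running it as markup.
    return "<p>Hello, " + html.escape(user_name) + "</p>"


def sanitize_for_log(value):
    # Replace carriage returns and line feeds with visible escapes so that
    # input cannot forge an extra, fake log entry.
    return value.replace("\r", "\\r").replace("\n", "\\n")


def log_failed_login(logger, user_name):
    # The sanitized value is passed as a logging argument, not pre-formatted.
    logger.warning("Failed login for user: %s", sanitize_for_log(user_name))
```

Code that skips these steps still appears to work, since a page renders and a log line is written, which is why such omissions tend to go unnoticed without a dedicated security check.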

Model size didn't make a difference

The researchers compared large and small models. Some had over 100 billion parameters; others were much smaller. Despite the differences in scale, security performance remained roughly the same.

In both groups, the average pass rate hovered near 50%. Models have become better at writing clean code, but they have not improved at writing secure code.

One possible reason is the data used to train these models. Much of it comes from public sources, including code that was never intended for production use. Some datasets also contain code samples with known defects, included either by accident or for educational purposes.

Security missing from AI prompts

Many AI tools do not apply protections unless the request contains very specific instructions. If the prompt does not mention security, the model will often skip it. This means that developers using these tools may get unsafe code without realizing it.

Even if a company does not rely on AI to write code, its systems can still be affected. AI-generated code can arrive through third-party vendors, open-source libraries, or low-code platforms. If that code is not checked, it can cause problems that are difficult to find later.

Risk without reviews

These tools offer speed and convenience, but they have yet to replace the need for secure development practices. Without review, AI-generated code can increase the risk of data leaks, bugs, and long-term maintenance costs.

The report recommends checking all code samples for vulnerabilities, even when they come from trusted AI assistants. It also points out that better training data and clearer prompt structures are needed.

Generative tools have changed how developers build software. However, their role in security remains limited, and much of their output needs a second look before it enters production.

Note: This post was edited/created using GenAI tools.
