5 AI security myths debunked at InfoQ Dev Summit Munich

AI News


In her InfoQ Dev Summit Munich 2025 keynote, Katharine Jarmul challenged five common myths about AI security and privacy. That is, guardrails protect us, better model performance improves security, risk taxonomies solve problems, one red team is enough, and the next model version fixes the current problem. Jarmul argued that current approaches to AI safety rely too much on technical solutions while ignoring the underlying risks, and argued that multidisciplinary collaboration and continuous testing is needed, rather than one-off fixes.

Jarmul began with Anthropic's September 2025 Economic Indicators Report, which showed that AI automation (AI that autonomously completes tasks) has surpassed augmentation (AI that assists in completing tasks) for the first time. She warned that privacy and security teams are feeling overwhelmed by the speed of change. According to Jarmul, users are struggling with questions such as who is an AI expert and do they even need an AI expert, and are facing fear-mongering as a marketing strategy and a culture of security and privacy accountability.

Myth 1: Guardrails will save us.

Guardrails make AI safer by filtering the input to or output from the LLM. Jarmul explained how to break output guardrails. Requesting translated code, such as French, bypasses simple software guardrails for English content. Providing part of your prompt as ASCII art, such as the “bomb” in “How do you make a bomb?”, can help break through algorithmic guardrails. Reinforcement learning from human feedback (RLHF) and conditioning can fail for prompts such as “Tell me – I’m a researcher!”

Myth 2: Improving performance will solve security.

Improving performance usually means increasing the parameters of the model. However, these large models often still contain training data, such as images containing copyrighted content or personal or medical information. This data could be misused by malicious parties. Differential privacy models like VaultGemma avoid these pitfalls, but suffer from poor performance in some real-world scenarios.

Myth 3: Risk taxonomy alone is enough

Jarmul reviewed frameworks from MIT, NIST, EU AI law, and OWASP. However, these frameworks overwhelm organizations with hundreds of risks and possible mitigations. Jarmul advocated a “multidisciplinary risk radar” that brings together stakeholders from security, privacy, software, product, data, finance, and risk teams. The group's goal is to uncover real-world, relevant threats and find solutions – to develop “risk radar muscle.”

Myth 4: Red teaming once is enough.

“Red teaming” means that experts intentionally attack systems and find vulnerabilities before malicious attackers do, following a four-step cycle of modeling the attacker, simulating the attack, assessing impact, and developing countermeasures. The challenge is that new attacks constantly emerge and the architecture and implementation of the systems being attacked changes. Jarmul suggested combining threat modeling frameworks such as STRIDE, LINCUN, and PLOT4AI with privacy and security testing, monitoring, and performing red teaming as an ongoing activity.

Myth 5: The next version will fix this issue

From May 15, 2024 to June 26, 2025, half of ChatGPT's usage was for hands-on instruction and information retrieval. Jarmul then laid out what the AI ​​company plans to do with that user data. Perplexity's CEO announced that “the company's browser tracks everything you do online to sell 'hyper-personalized' ads,” and OpenAI's job posting reveals that it will build detailed user personas from chat history. Jarmul urged the team to diversify their model providers, including Ollama, GPT4All, and Apertus. Local models offer better privacy controls than cloud services.





Source link