Robust watermarks for generative models hit their robustness limit when half of the encoded bits are changed

Machine Learning


Robust watermarks, a technique for embedding hidden signals in data generated by artificial intelligence, face fundamental limits that researchers have now defined precisely. Danilo Francati of Sapienza University in Rome, Yevin Nickel Goonatilak of George Mason University, Schbam Pawl of Holloway, London, and colleagues establish a threshold for undetectable watermarks and demonstrate that any scheme collapses when more than half of the encoded bits are changed. The team introduces a new coding abstraction, called Messageless Secret-Key code, to formalize the requirements of robust watermarks, such as tamper detection and security. Importantly, they not only identify this limit but also build efficient codes that achieve maximum robustness under standard cryptographic assumptions, and they show experimentally that existing schemes already operate at this critical threshold, providing a complete characterization of the field.

Fragile watermarks and AI-generated content

A central challenge in identifying AI-generated content lies in the vulnerability of existing watermarking technologies, which are often removed by relatively simple manipulations or attacks. Watermarks aim to establish the origin of AI-generated content and to combat misinformation, copyright infringement, and malicious use, and they are increasingly treated as a requirement for compliance with new regulations. However, because generative models are so powerful at reproducing and transforming data, it is difficult to embed signals that survive these processes without being easily degraded or deleted. Current watermarks are defeated by rephrasing text, slightly manipulating images, or simply regenerating the content, which highlights the fundamental trade-offs between watermark strength, detectability, and content quality.

Various watermarking approaches are being explored, each with its own weaknesses. Hidden signatures that embed information in the latent space of a model are vulnerable to manipulation of that space. Paraphrasing-based watermarks that control specific phrasing are easily defeated by rewriting the text. Tree-ring watermarks that embed patterns in the generation process can be deleted by regeneration or image manipulation. Techniques such as latent diffusion model watermarks, universal adversarial signatures, optimized watermarks, frequency-domain watermarks, and model-based watermarks share common vulnerabilities: regeneration, paraphrasing, image manipulation, adversarial attacks, and removal through transfer attacks.

Researchers are investigating a variety of attack strategies and potential defenses. Black-box attacks, which require no knowledge of the model's internal workings, are often the most practical. Removal attacks aim to eliminate an existing watermark, while evasion attacks produce content that avoids the watermark altogether. Potential defenses, such as robust watermark design, embedding watermarks in multiple modalities, dedicated detection mechanisms, model fingerprinting, retrieval-based defenses, and cryptographic approaches, are often limited in effectiveness. Relying solely on watermarks for regulatory compliance is therefore insufficient, and reliably attributing AI-generated content remains challenging, which suggests the need for robust detection mechanisms and alternative approaches to content authentication and source tracking.

Recent theoretical research establishes fundamental limits on watermarking in generative AI, drawing on similarities to sand watermarking, and indicates that watermarks inevitably either leave detectable traces or are vulnerable to removal. Researchers argue that there are inherent limits on how much information can be reliably embedded in generated content without affecting its quality or detectability, and they caution against relying on watermarks alone for regulation and attribution. Future research should focus on developing more robust methods, exploring alternative approaches to content authentication, and establishing reliable ways to track the origin of AI-generated content.

Watermark Robustness Limits in Generative Models

This study introduces a new framework for understanding the robustness of cryptographic watermarks in generative models and establishes precise limits on how much modification watermarked content can withstand before detection fails. The researchers introduce the “Messageless Secret-Key Code,” a coding abstraction that formalizes the requirements of robust watermarks: soundness, tamper detection, and pseudorandomness. This allows them to define rigorously the threshold beyond which a watermark is no longer reliable. The core finding is that, for binary outputs, every watermarking scheme fails once more than half of the encoded bits have been changed; for an alphabet of size q, the corresponding threshold is a (1 - 1/q) fraction of the symbols.

On the constructive side, the researchers build linear-time codes that tolerate error rates approaching one half in the binary case and 1 - 1/q in the q-ary case, an important advance in watermark design. To test these theoretical findings, they experimentally probed the limits of a recent image watermarking technique, focusing on the method developed by Gunn, Zhao and Song. Applying a simple crop-and-resize operation to a generated image reliably flipped about half of the latent signs and prevented the decoder from recovering the codeword, effectively erasing the watermark while leaving the image visually intact. This experimental confirmation shows that current watermarking schemes already operate at the edge of the robustness limit, and that a fundamentally new approach is needed to improve on it.

The team adopted a rigorous mathematical framework, modeling algorithms as probabilistic polynomial-time procedures and using concepts such as Hamming distance to quantify the degree of modification. They also used notation for strings, sets, and randomness to model watermarking schemes and attack behavior precisely. This detailed analysis provides a complete characterization of robust watermarking: the exact threshold at which it fails, a construction that achieves that threshold, and empirical evidence confirming its practical relevance.
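As a reading aid, the distance measure and the threshold described in this article can be written out as follows. The symbols used here (x for the watermarked output, y for the tampered output, n for its length, q for the alphabet size, and δ for the fraction of changed symbols) are notation chosen for this sketch and may differ from the notation in the paper.

```latex
% Hamming distance between two length-n strings over an alphabet of size q,
% and the corresponding fraction of changed symbols.
\[
\Delta(x, y) = \bigl|\{\, i \in \{1, \dots, n\} : x_i \neq y_i \,\}\bigr|,
\qquad
\delta(x, y) = \frac{\Delta(x, y)}{n}.
\]
% Threshold reported in the article: no scheme stays robust beyond it.
\[
\delta(x, y) > 1 - \frac{1}{q}
\;\Longrightarrow\;
\text{detection cannot be guaranteed}
\qquad \bigl(\text{for } q = 2:\ \delta > \tfrac{1}{2}\bigr).
\]
```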

Watermark Robustness Limited by Bit Change Rate

Scientists have established fundamental limits on the robustness of cryptographic watermarks in generative AI models, proving that a watermark fails if more than half of the encoded bits are changed in the binary case. This breakthrough comes from the introduction of a new abstraction, the Messageless Secret-Key code, which formalizes the key requirements of robust watermarks: soundness, tamper detection, and pseudorandomness. The study shows that this limit is not merely theoretical but a concrete barrier to achieving greater robustness with current cryptographic techniques. The team proves that no scheme can survive changes to more than half of the encoded bits in the binary case, or to more than a (1 - 1/q) fraction of the symbols in the q-ary case.
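One informal way to see why the threshold sits at 1 - 1/q (an intuition offered here, not the paper's proof): an attacker who replaces every symbol with a fresh uniform one produces output that carries no information about the key, yet on average changes only a 1 - 1/q fraction of the positions. The short Python sketch below simulates that; the function name and parameters are hypothetical and chosen for illustration only.

```python
import random


def resample_change_fraction(q, n, rng):
    """Fraction of positions that differ after replacing a random length-n
    word over an alphabet of size q with fresh uniform symbols."""
    codeword = [rng.randrange(q) for _ in range(n)]
    attacked = [rng.randrange(q) for _ in range(n)]  # independent of codeword
    changed = sum(a != b for a, b in zip(codeword, attacked))
    return changed / n


if __name__ == "__main__":
    rng = random.Random(0)
    for q in (2, 4, 16):
        observed = resample_change_fraction(q, 100_000, rng)
        print(f"q={q}: observed {observed:.3f}, expected {1 - 1 / q:.3f}")
```

The observed fraction concentrates around 1 - 1/q, which is one half for bits: a detector that still accepted such output would also accept strings unrelated to the key.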

Conversely, they developed an explicit construction of messageless codes that approaches these limits, achieving robustness to error rates just below one half of the bits in the binary case and just below (1 - 1/q) of the symbols in the q-ary case. These codes are built from secure pseudorandom functions and public counters, run in linear time, and tolerate errors on up to nearly half of the encoded symbols. The experiments focused on a state-of-the-art image watermarking scheme and revealed that a simple crop-and-resize operation reliably flipped about half of the latent signs, effectively erasing the watermark while leaving the image visually intact. This shows that the theoretical limit identified by the study is already being reached in practice, and it highlights a marked contrast between text and image modalities. The findings place previous impossibility results within a precise, quantitative framework, identifying the exact threshold at which robustness fails and providing a construction that achieves it. This work establishes a complete characterization of cryptographic watermarks and suggests that a significant increase in robustness requires a fundamentally new approach, one that goes beyond cryptographic pseudorandomness.
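To make the idea of a code built from a pseudorandom function and a public counter more concrete, here is a toy Python sketch. It is not the authors' construction: HMAC-SHA256 stands in for the pseudorandom function, the detector is assumed to know the counter, and the names `prf_bits`, `encode`, `detect`, `flip_fraction`, and the `margin` parameter are all hypothetical. The detector accepts a word when fewer than a (1/2 - margin) fraction of its bits disagree with the expected pseudorandom string.

```python
import hashlib
import hmac
import random


def prf_bits(key, counter, n):
    """Expand HMAC-SHA256(key, counter || block index) into n bits."""
    bits = []
    block = 0
    while len(bits) < n:
        msg = counter.to_bytes(8, "big") + block.to_bytes(4, "big")
        digest = hmac.new(key, msg, hashlib.sha256).digest()
        for byte in digest:
            for shift in range(8):
                bits.append((byte >> shift) & 1)
        block += 1
    return bits[:n]


def encode(key, counter, n):
    """Toy messageless codeword: n pseudorandom bits tied to a public counter."""
    return prf_bits(key, counter, n)


def detect(key, counter, word, margin=0.1):
    """Accept if the fraction of mismatched bits is below 1/2 - margin."""
    expected = prf_bits(key, counter, len(word))
    mismatches = sum(a != b for a, b in zip(expected, word))
    return mismatches / len(word) < 0.5 - margin


def flip_fraction(word, fraction, rng):
    """Return a copy of word with the given fraction of bits flipped."""
    out = list(word)
    for i in rng.sample(range(len(out)), int(fraction * len(out))):
        out[i] ^= 1
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    key = b"k" * 32
    codeword = encode(key, 7, 2048)
    print(detect(key, 7, flip_fraction(codeword, 0.3, rng)))  # below threshold
    print(detect(key, 7, flip_fraction(codeword, 0.5, rng)))  # at one half
```

In this toy setting a word with 30% of its bits flipped still falls under the acceptance threshold, while a word with half of its bits flipped, or a string generated without the key, does not. Shrinking `margin` moves the tolerated error rate closer to one half at the cost of needing longer codewords to keep false positives rare.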


