A framework to enhance the safety of text-to-image generation networks





Overview of Latent Guard. The team first compiled a dataset of safe and unsafe prompts centered around blacklisted concepts (left). They then leveraged a pre-trained text encoder to extract features and mapped them to a learned latent space using an embedding mapping layer (center). Only the embedding mapping layer is trained; all other parameters remain frozen. Training imposes a contrastive loss on the extracted embeddings, bringing the embeddings of unsafe prompts and their concepts closer together while separating them from safe ones (right). Credit: Liu et al.


The advent of machine learning algorithms that can generate text and images following the instructions of human users has opened up new possibilities for creating specific content at low cost. One class of these algorithms that is fundamentally transforming creative processes worldwide is so-called text-to-image (T2I) generative networks.

T2I artificial intelligence (AI) tools such as DALL-E 3 and Stable Diffusion are deep learning-based models that can generate realistic images tailored to text descriptions and user prompts. These AI tools are becoming increasingly popular, but their misuse poses significant risks, from privacy violations to promoting misinformation and image manipulation.

Researchers from the Hong Kong University of Science and Technology and the University of Oxford recently developed Latent Guard, a framework designed to improve the safety of T2I generative networks. The framework, outlined in a paper pre-published on arXiv, can prevent the generation of undesirable or unethical content by processing user prompts and detecting the presence of any concepts included in an updatable blacklist.

“T2I models, which have the ability to generate high-quality images, can be exploited to create inappropriate content,” Runtao Liu, Ashkan Khakzar and colleagues wrote in their paper.

“To prevent misuse, existing safety measures are based either on text blacklists, which can be easily circumvented, or on harmful content classification, which requires large datasets for training and offers low flexibility. Hence, we propose Latent Guard, a framework designed to improve safety measures in text-to-image generation.”

Latent Guard, the framework developed by Liu, Khakzar, and colleagues, takes inspiration from previous blacklist-based approaches to increasing the safety of T2I generative networks. These approaches essentially consist of creating a list of “forbidden” words that cannot be included in user prompts, thereby limiting unethical use of these networks.

A limitation of most existing blacklist-based techniques is that malicious users can circumvent them by rephrasing their prompts and avoiding the blacklisted words. This means that they may ultimately still be able to create, and potentially disseminate, the offensive or unethical content they had in mind.
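The weakness described above can be illustrated with a minimal sketch of a word-level blacklist. This is not code from the paper: the `BLACKLIST` contents, the `is_blocked` helper, and the two example prompts are hypothetical and were chosen only to show how a rephrased prompt slips past exact word matching.

```python
import re

# Hypothetical blacklist for illustration; these words are not taken from the paper.
BLACKLIST = frozenset({"blood", "weapon"})

def is_blocked(prompt: str, blacklist: frozenset = BLACKLIST) -> bool:
    """Return True if any blacklisted word appears verbatim in the prompt."""
    words = re.findall(r"[a-z]+", prompt.lower())
    return any(word in blacklist for word in words)

direct = "a man covered in blood"
rephrased = "a man covered in crimson liquid from a wound"

print(is_blocked(direct))     # the exact word is present, so the prompt is caught
print(is_blocked(rephrased))  # same idea in different words, so it passes the filter
```

Because the filter only compares surface strings, any synonym or paraphrase that is not on the list goes undetected, which is the gap Latent Guard is meant to close.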

To overcome this limitation, the Latent Guard framework goes beyond the exact wording of input texts or user prompts, extracting features from the text and mapping them onto a previously learned latent space. This strengthens the detection of undesirable prompts and prevents images from being generated for them.

“Inspired by blacklist-based approaches, Latent Guard learns a latent space on top of the T2I model's text encoder, where it is possible to check the presence of harmful concepts in the input text embeddings,” wrote Liu, Khakzar and their colleagues.
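The idea of checking for harmful concepts in an embedding space, rather than in the raw text, can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the three-dimensional vectors are hand-written stand-ins for embeddings that the real system would obtain from a text encoder and a learned mapping layer, and the `flag_prompt` helper, the cosine-similarity test, and the `0.8` threshold are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def flag_prompt(prompt_emb, concept_embs, threshold=0.8):
    """Return the blacklisted concepts whose embedding lies close to the prompt."""
    return [name for name, emb in concept_embs.items()
            if cosine(prompt_emb, emb) >= threshold]

# Toy vectors standing in for learned embeddings (assumption, not real model output).
concepts = {"violence": [1.0, 0.0, 0.0], "nudity": [0.0, 1.0, 0.0]}
unsafe_prompt = [0.9, 0.1, 0.2]   # lies near the "violence" direction
safe_prompt = [0.1, 0.1, 0.9]     # far from both blacklisted concepts

print(flag_prompt(unsafe_prompt, concepts))
print(flag_prompt(safe_prompt, concepts))
```

In this sketch, extending the blacklist amounts to adding another entry to `concepts`, which reflects the updatable blacklist mentioned in the article.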

“Our proposed framework is composed of a data generation pipeline specific to the task using large language models, ad-hoc architectural components, and a contrastive learning strategy to benefit from the generated data.”
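To give a sense of what a contrastive learning objective does, here is a minimal sketch of a generic InfoNCE-style loss. The article does not spell out the exact loss used in Latent Guard, so this form, the `contrastive_loss` function, the `temperature` value, and the two-dimensional toy vectors are assumptions made purely for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def contrastive_loss(concept, unsafe_prompt, safe_prompts, temperature=0.1):
    """Generic InfoNCE-style loss (an assumption, not the paper's exact formula).

    The loss is small when the concept embedding is closer to the unsafe
    prompt (positive pair) than to any of the safe prompts (negatives).
    """
    pos = math.exp(cosine(concept, unsafe_prompt) / temperature)
    neg = sum(math.exp(cosine(concept, s) / temperature) for s in safe_prompts)
    return -math.log(pos / (pos + neg))

concept = [1.0, 0.0]

# Well-separated case: unsafe prompt near the concept, safe prompt far away.
good = contrastive_loss(concept, [0.9, 0.1], [[0.0, 1.0]])

# Poorly separated case: the roles of the two prompts are swapped.
bad = contrastive_loss(concept, [0.0, 1.0], [[0.9, 0.1]])

print(good < bad)  # the well-separated arrangement yields the lower loss
```

Minimizing a loss of this kind during training pulls unsafe prompts toward their associated concepts and pushes safe prompts away, matching the behavior described in the figure caption.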

Liu, Khakzar, and their collaborators evaluated their approach in a series of experiments using three different datasets and comparing its performance to that of four other baseline T2I generation methods. One of the datasets they used, the CoPro dataset, was developed by the team specifically for this study and contains a total of 176,516 safe and unsafe/unethical text prompts.

“Our experiments show that our approach enables robust detection of unsafe prompts in many scenarios and offers good generalization performance across different datasets and concepts,” the researchers wrote.

Initial results collected by Liu, Khakzar, and their colleagues suggest that Latent Guard is a very promising approach to increasing the safety of T2I generative networks and reducing the risk of these networks being used inappropriately. The team plans to publish both the framework's underlying code and the CoPro dataset on GitHub soon, allowing other developers and research groups to experiment with the approach.

For more information:
Runtao Liu et al, Latent Guard: a Safety Framework for Text-to-image Generation, arXiv (2024). DOI: 10.48550/arxiv.2404.08031

Journal information:
arXiv


