Software systems are all around us, from computer operating systems to search engines to the automation used in industrial applications. At the heart of these systems is data, which feeds the machine learning (ML) components behind applications such as self-driving cars and large language models (LLMs). Because so many systems rely on ML components, ensuring their security and reliability is important.
For ML models trained using robust optimization techniques (robust ML models), it is not well understood how they fare against various attacks. A primary attack vector is backdoor poisoning, which compromises the training data fed into a model. Techniques exist to detect backdoor attacks in standard ML models, but robust models behave differently and rest on different assumptions, so detecting backdoors in them requires different methods.
This is the gap that Dr. Sudipta Chattopadhyay, Assistant Professor in the Information Systems Technology and Design (ISTD) pillar at the Singapore University of Technology and Design (SUTD), sought to fill.
In a study titled “Towards Backdoor Attacks and Defenses in Robust Machine Learning Models”, published in Computers & Security, Assistant Professor Chattopadhyay and his fellow SUTD researchers studied how to defend against backdoor attacks injected into robust models of a specific kind of ML component: image classifiers. Specifically, the models studied were trained using a state-of-the-art projected gradient descent (PGD) method.
The backdoor problem is urgent and dangerous, especially given how software pipelines are developed today. Chattopadhyay said, “Currently, no one develops ML model pipelines and collects data from scratch. They might download training data from the internet or use pretrained models. If those trained models or datasets are tainted, the software that uses them will be unsafe. Often, only 1% data poisoning is required to create a backdoor.”
The difficulty with backdoor attacks is that only the attacker knows the poisoning pattern. Without knowing this pattern, a user has no straightforward way to tell whether an ML model is infected.
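To make this concrete, a backdoor can be planted with something as simple as stamping a small trigger patch onto a tiny fraction of the training images and relabeling them with an attacker-chosen class. The Python sketch below is purely illustrative; the poison_dataset helper, the 4x4 corner patch and the 1% poisoning rate are assumptions for demonstration, not the trigger or poisoning scheme used in the study.

import numpy as np

def poison_dataset(images, labels, target_class=0, poison_rate=0.01, seed=0):
    # Illustrative backdoor poisoning (hypothetical helper): stamp a small
    # trigger patch onto a small fraction of training images (assumed shape
    # (N, H, W, C), values in [0, 1]) and relabel them to the attacker's
    # target class. Only the attacker knows this pattern.
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n_poison = int(len(images) * poison_rate)          # e.g. ~1% of the data
    idx = rng.choice(len(images), size=n_poison, replace=False)
    images[idx, -4:, -4:, :] = 1.0                     # 4x4 white patch in one corner
    labels[idx] = target_class                         # attacker-chosen label
    return images, labels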
“The difficulty of the problem fascinated us. We speculated that a backdoored model might have a different internal structure than a clean model,” said Chattopadhyay.
To this end, Chattopadhyay investigated backdoor attacks against robust models and found them to be highly vulnerable, with a 67.8% attack success rate. He also found that poisoning the training set creates a mixed input distribution for the poisoned class, causing a robust model to learn multiple feature representations of a given prediction class. In contrast, a clean model learns only a single representation for a given prediction class.
Chattopadhyay and his colleagues used this insight to develop AEGIS, the first backdoor detection technique for robust models trained with PGD. AEGIS uses t-distributed stochastic neighbor embedding (t-SNE) as the dimensionality reduction method and mean-shift clustering as the clustering method to detect multiple feature representations within a class and thereby identify backdoor-infected models.
AEGIS works in five steps: (1) generate transformed images using image transformation algorithms; (2) extract feature representations from the clean training images and the clean/backdoor transformed images; (3) reduce the dimensionality of the extracted features via t-SNE; (4) compute clusters of the reduced feature representations using mean-shift; and (5) count these clusters to determine whether the model is backdoor-infected or clean.
If the model yields two clusters (training images and transformed images), AEGIS flags it as clean. If there are three or more clusters (training images, clean transformed images, and poisoned transformed images), AEGIS flags the model as a suspected backdoor infection.
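A minimal sketch of how steps (3) to (5) and this cluster-counting rule might look in code follows, assuming the feature vectors for the clean training images and for the transformed images have already been extracted from the model under test. The function name, the two-dimensional t-SNE setting and the default mean-shift bandwidth are illustrative assumptions rather than the exact configuration used in the paper.

import numpy as np
from sklearn.manifold import TSNE
from sklearn.cluster import MeanShift

def looks_backdoored(clean_feats, transformed_feats):
    # Stack feature vectors of clean training images and transformed images
    feats = np.vstack([clean_feats, transformed_feats])
    # Step 3: reduce the high-dimensional features to two dimensions with t-SNE
    reduced = TSNE(n_components=2, random_state=0).fit_transform(feats)
    # Step 4: cluster the reduced feature representations with mean-shift
    cluster_labels = MeanShift().fit_predict(reduced)
    # Step 5: count clusters; two clusters (training vs. transformed) suggest a
    # clean model, three or more suggest an extra, poisoned representation
    return len(np.unique(cluster_labels)) >= 3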
AEGIS detected 91.6% of all backdoor-infected robust models, with a false positive rate of only 11.1%, demonstrating its effectiveness. The development of AEGIS is significant because even the best backdoor detection techniques for standard models cannot flag backdoors in robust models. It is important to note, however, that AEGIS specializes in detecting backdoor attacks in robust models and is not effective on standard models.
Besides detecting backdoor attacks in robust models, AEGIS is also efficient: it takes an average of 5-9 minutes to identify a backdoor-infected model, compared with standard backdoor defenses that take hours or days. In the future, Chattopadhyay aims to further refine AEGIS to work with more varied and complex data distributions and to defend against threat models beyond backdoor attacks.
Recognizing the buzz around artificial intelligence (AI) in today’s climate, Chattopadhyay said: “I hope people are aware of the risks associated with AI. LLM-powered technologies like ChatGPT are trending, but there are big risks, and backdoor attacks are just one of them. We aim to introduce trustworthy AI through our research.”
For more information:
Ezekiel Soremekun et al., Towards Backdoor Attacks and Defenses in Robust Machine Learning Models, Computers & Security (2023). DOI: 10.1016/j.cose.2023.103101
