Software systems are all around us, from computer operating systems to search engines to the automation used in industrial applications. At the heart of these systems is data, which feeds the machine learning (ML) components behind applications such as self-driving cars and large language models (LLMs). Because so many systems rely on ML components, ensuring their security and reliability is important.
For ML models trained using robust optimization techniques (robust ML models), it is not well understood how they fare against various attacks. A primary attack vector is backdoor poisoning, which compromises the training data fed into a model. Techniques exist to detect backdoor attacks in standard ML models, but robust models behave differently and rest on different assumptions, so detecting backdoors in them requires different methods.
This is the gap that Dr. Sudipta Chattopadhyay, Assistant Professor in the Information Systems Technology and Design (ISTD) pillar at the Singapore University of Technology and Design (SUTD), sought to fill.
In a study titled “Towards Backdoor Attacks and Defenses in Robust Machine Learning Models”, published in Computers & Security, Assistant Professor Chattopadhyay and his fellow SUTD researchers studied how to defend against backdoor attacks injected into robust models of a specific kind of ML component: image classifiers. Specifically, the models studied were trained using a state-of-the-art projected gradient descent (PGD) method.
The backdoor problem is urgent and dangerous, especially given how software pipelines are developed today. Chattopadhyay said, “Currently, no one develops ML model pipelines and collects data from scratch. They might download training data from the internet or use pretrained models. If those trained models or datasets are tainted, the software that uses them will be unsafe. Often, only 1% data poisoning is required to create a backdoor.”
The difficulty with backdoor attacks is that only the attacker knows the poisoning pattern. Without knowing this pattern, a user has no straightforward way to tell whether an ML model is infected.
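To make this concrete, a backdoor can be planted with something as simple as stamping a small trigger patch onto a tiny fraction of the training images and relabeling them with an attacker-chosen class. The Python sketch below is purely illustrative; the poison_dataset helper, the 4x4 corner patch and the 1% poisoning rate are assumptions for demonstration, not the trigger or poisoning scheme used in the study.

import numpy as np

def poison_dataset(images, labels, target_class=0, poison_rate=0.01, seed=0):
    # Illustrative backdoor poisoning (hypothetical helper): stamp a small
    # trigger patch onto a small fraction of training images (assumed shape
    # (N, H, W, C), values in [0, 1]) and relabel them to the attacker's
    # target class. Only the attacker knows this pattern.
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n_poison = int(len(images) * poison_rate)          # e.g. ~1% of the data
    idx = rng.choice(len(images), size=n_poison, replace=False)
    images[idx, -4:, -4:, :] = 1.0                     # 4x4 white patch in one corner
    labels[idx] = target_class                         # attacker-chosen label
    return images, labels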
“The difficulty of the problem fascinated us. We speculated that a backdoored model might have a different internal structure than a clean model,” said Chattopadhyay.
To this end, Chattopadhyay investigated backdoor attacks against robust models and found them to be highly vulnerable, with a 67.8% attack success rate. He also found that poisoning the training set creates a mixed input distribution for the poisoned class, causing a robust model to learn multiple feature representations of a given prediction class. In contrast, a clean model learns only a single representation for a given prediction class.
Chattopadhyay and his colleagues used this insight to develop AEGIS, the first backdoor detection technique for robust models trained with PGD. AEGIS uses t-distributed stochastic neighbor embedding (t-SNE) as the dimensionality reduction method and mean-shift clustering as the clustering method to detect multiple feature representations within a class and thereby identify backdoor-infected models.
AEGIS works in five steps: (1) generate transformed images using image transformation algorithms; (2) extract feature representations from the clean training images and the clean/backdoor transformed images; (3) reduce the dimensionality of the extracted features via t-SNE; (4) compute clusters of the reduced feature representations using mean-shift; and (5) count these clusters to determine whether the model is backdoor-infected or clean.
If the model yields two clusters (training images and transformed images), AEGIS flags it as clean. If there are three or more clusters (training images, clean transformed images, and poisoned transformed images), AEGIS flags the model as a suspected backdoor infection.
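A minimal sketch of how steps (3) to (5) and this cluster-counting rule might look in code follows, assuming the feature vectors for the clean training images and for the transformed images have already been extracted from the model under test. The function name, the two-dimensional t-SNE setting and the default mean-shift bandwidth are illustrative assumptions rather than the exact configuration used in the paper.

import numpy as np
from sklearn.manifold import TSNE
from sklearn.cluster import MeanShift

def looks_backdoored(clean_feats, transformed_feats):
    # Stack feature vectors of clean training images and transformed images
    feats = np.vstack([clean_feats, transformed_feats])
    # Step 3: reduce the high-dimensional features to two dimensions with t-SNE
    reduced = TSNE(n_components=2, random_state=0).fit_transform(feats)
    # Step 4: cluster the reduced feature representations with mean-shift
    cluster_labels = MeanShift().fit_predict(reduced)
    # Step 5: count clusters; two clusters (training vs. transformed) suggest a
    # clean model, three or more suggest an extra, poisoned representation
    return len(np.unique(cluster_labels)) >= 3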
AEGIS detected 91.6% of all backdoor-infected robust models, with a false positive rate of only 11.1%, demonstrating its effectiveness. The development of AEGIS is significant because even the best backdoor detection techniques for standard models cannot flag backdoors in robust models. It is important to note, however, that AEGIS specializes in detecting backdoor attacks in robust models and is not effective on standard models.
Besides detecting backdoor attacks in robust models, AEGIS is also efficient: it takes an average of 5-9 minutes to identify a backdoor-infected model, compared with standard backdoor defenses that take hours or days. In the future, Chattopadhyay aims to further refine AEGIS to work with more varied and complex data distributions and to defend against threat models beyond backdoor attacks.
Recognizing the buzz around artificial intelligence (AI) in today’s climate, Chattopadhyay said: “I hope people are aware of the risks associated with AI. LLM-powered technologies like ChatGPT are trending, but there are big risks, and backdoor attacks are just one of them. We aim to introduce trustworthy AI through our research.”
For more information:
Ezekiel Soremekun et al., Towards Backdoor Attacks and Defenses in Robust Machine Learning Models, Computers & Security (2023). DOI: 10.1016/j.cose.2023.103101
