Adversarial Machine Learning – Security Boulevard

Artificial Intelligence (AI) is increasingly embedded in our daily lives. For example, many of us have Alexa-enabled devices in our homes or use Siri on our iPhones. However, since ChatGPT went public late last year, there is a widely shared perception that the pace of AI development is accelerating rapidly.

In fact, in March 2023, Bill Gates declared on his blog that the “Age of AI” had begun. He writes:

“The development of AI is as fundamental as the creation of the microprocessor, the personal computer, the Internet, and the mobile phone. It will change the way people work, learn, travel, get health care, and communicate with each other. Entire industries will reorient around it. Businesses will distinguish themselves by how well they use it.”

The key takeaway is that AI will soon become an integral part of our daily lives. As such, concerns about the security of AI systems are only growing. Therefore, it makes sense to study this issue carefully.

Securing AI systems is a big problem, and the reason lies in the nature of the systems themselves. AI systems have many parts. At a minimum, the components of an AI system include data, models, the processes for training, testing, and deploying machine learning (ML) models, and the infrastructure required for all of this. There are, of course, many ways to design the infrastructure, and training, testing, and deploying ML models involves several data modalities, many kinds of models, and a variety of procedures. So it's no surprise that AI systems can be attacked effectively in many different ways.

To understand the problem of protecting AI systems, it is useful to develop a high-level framework for classifying and categorizing attacks. In particular, such a framework serves the important purpose of standardizing the terminology used in the AI and cybersecurity communities. To this end, in March 2023 NIST issued AI 100-2e2023 ipd, titled “Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations”. The purpose of this blog post is to review the taxonomy developed by the authors of that report, as shown in Figure 1.

Figure 1: Classification of attacks against AI systems

Attacker objectives

The CIA Triad is a benchmark model used to evaluate system security. The three attributes of a system evaluated in the model are confidentiality, integrity, and availability. Broadly speaking, NIST AI 100-2e2023 classifies attacks against AI systems according to which of these attributes the attacker is targeting. In Figure 1, each of these attributes is at the center of its respective circle.

  • Privacy compromise – The principle of confidentiality dictates that data should not be accessed without authorization. In other words, the data should be kept private. The purpose of a privacy attack against an AI system is therefore to learn information about the training data or the ML model. There are different types of privacy attacks, which are detailed below.

  • Integrity violation – The integrity principle states that data should not be modified by unauthorized entities. The goal of an integrity violation is to target the output of an ML model so that it generates incorrect predictions. Integrity violations can be carried out by launching “evasion attacks” or “poisoning attacks”, both described below.

  • Availability breakdown – The availability principle states that data must be accessible to authorized users. In an availability breakdown, attackers try to degrade the model’s performance during the testing or deployment phase. This kind of attack uses the “poisoning” techniques described below.

Attacker capabilities

NIST AI 100-2e2023 lists six types of capabilities that attackers may exploit to accomplish one of the above objectives. These are the common means adversaries may use to achieve their overall goals. In Figure 1, these capabilities are shown on the outer layer of each circle. Let’s take a quick look at each of them.

  • Training data control – Attackers control a subset of the training data by inserting or modifying training samples.

  • Model control – Attackers control model parameters by injecting Trojans into models or by sending malicious local model updates in federated learning setups.

  • Testing data control – Attackers add perturbations to test samples at model deployment time.

  • Label control – Attackers control the labels of training samples (in supervised learning).

  • Source code control – Attackers modify the source code of ML algorithms, especially open source components such as third-party libraries.

  • Query access – Attackers submit queries to models managed by cloud providers.

Note – As shown in Figure 1, query access is the only capability used in privacy attacks. In an integrity violation, an adversary could exploit any of the six capabilities above. Finally, in an availability breakdown, an attacker needs model control, label control, training data control, or query access.
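The mapping in the note above can be written down as a small lookup table. This is only a sketch of the relationships as this post describes them; the enum and function names (`Attribute`, `Capability`, `objectives_enabled_by`) are illustrative labels chosen here, not identifiers from the NIST report.

```python
from enum import Enum, auto


class Attribute(Enum):
    """Security attribute (CIA Triad) that an attack targets."""
    CONFIDENTIALITY = auto()  # privacy attacks
    INTEGRITY = auto()        # integrity violations
    AVAILABILITY = auto()     # availability breakdowns


class Capability(Enum):
    """The six attacker capabilities listed above (names are illustrative)."""
    TRAINING_DATA_CONTROL = auto()
    MODEL_CONTROL = auto()
    TESTING_DATA_CONTROL = auto()
    LABEL_CONTROL = auto()
    SOURCE_CODE_CONTROL = auto()
    QUERY_ACCESS = auto()


# Capabilities relevant to each targeted attribute, as stated in the note.
CAPABILITIES_BY_ATTRIBUTE = {
    Attribute.CONFIDENTIALITY: {Capability.QUERY_ACCESS},
    Attribute.INTEGRITY: set(Capability),  # any of the six
    Attribute.AVAILABILITY: {
        Capability.MODEL_CONTROL,
        Capability.LABEL_CONTROL,
        Capability.TRAINING_DATA_CONTROL,
        Capability.QUERY_ACCESS,
    },
}


def objectives_enabled_by(capability):
    """Return the attributes an attacker holding `capability` could target."""
    return {
        attribute
        for attribute, capabilities in CAPABILITIES_BY_ATTRIBUTE.items()
        if capability in capabilities
    }


if __name__ == "__main__":
    for capability in Capability:
        targets = sorted(a.name for a in objectives_enabled_by(capability))
        print(f"{capability.name}: {targets}")
```

A table like this can be handy in threat modeling: given the access an adversary plausibly has, it narrows down which classes of attack are worth considering first.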

Attacker knowledge

An important consideration when classifying attacks is the level of knowledge the attacker has of the AI system. There are three possibilities: white-box attacks, black-box attacks, and gray-box attacks.

  • White box – Attackers operate with complete knowledge of the AI system.

  • Black box – Attackers operate with minimal knowledge of the AI system.

  • Gray box – Attackers operate with some knowledge of the AI system.

Attacker knowledge is an important factor in attack classification as it affects the types of attacks an adversary can mount. This is probably most noticeable in the evasion attacks described below.

Evasion attacks

The goal of an evasion attack is to generate “adversarial examples”: test samples whose classification the attacker can change at deployment time with only minimal perturbation. NIST AI 100-2e2023 identifies several ways to do this, including optimization-based methods, universal evasion attacks, physically realizable attacks, score-based attacks, decision-based attacks, and transfer attacks.

  • Optimization-based methods – Attackers with knowledge of the ML model compute gradients of the model’s loss function in order to generate adversarial examples within a small distance of the original test samples.

  • Universal evasion attacks – Attackers construct small universal perturbations that can be added to data, especially images, to induce misclassification.

  • Physically realizable attacks – Attackers target AI systems in ways that are feasible in the physical world, for example by putting black and white stickers on road signs to evade a road sign detection classifier.

  • Score-based attacks – Attackers query the model’s confidence scores or logits and use optimization techniques to create adversarial examples.

  • Decision-based attacks – Attackers obtain only the model’s final predicted labels and create adversarial examples using various techniques, such as optimization.

  • Transfer attacks – Attackers train a substitute ML model, generate attacks against it, and transfer those attacks to the target model.

Note that optimization-based methods, universal evasion attacks, and physically realizable attacks are considered white-box attacks. Score-based attacks and decision-based attacks are considered black-box attacks.
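To make the optimization-based, white-box case concrete, here is a minimal sketch of a single gradient-sign step against a toy logistic regression classifier. The model, its weights, the input, and the `epsilon` budget are all made-up assumptions for illustration; the gradient-sign step is just one well-known way of using the loss gradient and is not necessarily the method the report has in mind.

```python
import math


def predict_proba(weights, bias, x):
    """Probability of class 1 under a logistic regression model."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def loss_gradient_wrt_input(weights, bias, x, y):
    """Gradient of the cross-entropy loss with respect to the input x.

    For logistic regression this is (p - y) * w, where p is the
    predicted probability of class 1 and y is the true label (0 or 1).
    """
    p = predict_proba(weights, bias, x)
    return [(p - y) * w for w in weights]


def sign(value):
    return (value > 0) - (value < 0)


def gradient_sign_attack(weights, bias, x, y, epsilon):
    """Move each feature by at most epsilon in the direction that increases the loss."""
    gradient = loss_gradient_wrt_input(weights, bias, x, y)
    return [xi + epsilon * sign(g) for xi, g in zip(x, gradient)]


# Hypothetical white-box setting: the attacker knows weights and bias.
weights, bias = [2.0, -1.0], 0.0
x, y = [0.3, 0.2], 1  # clean sample, correctly classified as class 1

x_adv = gradient_sign_attack(weights, bias, x, y, epsilon=0.2)

if __name__ == "__main__":
    print("clean prediction:      ", predict_proba(weights, bias, x) > 0.5)
    print("adversarial prediction:", predict_proba(weights, bias, x_adv) > 0.5)
```

Each feature moves by no more than `epsilon`, yet the predicted class flips. A black-box attacker lacks the gradient used here and would have to estimate a useful direction from confidence scores or predicted labels instead.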

Poisoning attacks

Broadly defined, a poisoning attack is an attack that targets the training stage of an ML algorithm. This kind of attack can be launched with the goal of compromising either the availability or the integrity of an AI system. NIST AI 100-2e2023 classifies poisoning attacks into one of four classes: availability poisoning, targeted poisoning, backdoor poisoning, and model poisoning.

  • Availability poisoning – Attackers cause an indiscriminate degradation of the ML model on all samples, effectively causing a denial of service.

  • Targeted poisoning – Attackers induce a change in the ML model’s predictions on a small number of targeted samples.

  • Backdoor poisoning – Attackers add a small patch trigger to a subset of the training data and change the labels of those samples to the target class.

  • Model poisoning – Attackers attempt to directly modify the trained ML model by injecting malicious functionality.
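As a rough illustration of the first class in the list above, the sketch below poisons the training set of a toy one-dimensional nearest-centroid classifier by inserting a couple of mislabeled samples, which assumes the attacker has training data control and label control. The classifier, the data, and the poisoned points are all invented for this example; real attacks and real models are far more involved.

```python
def train_centroids(samples):
    """Fit a nearest-centroid classifier on (feature, label) pairs with labels 0/1."""
    sums = {0: 0.0, 1: 0.0}
    counts = {0: 0, 1: 0}
    for feature, label in samples:
        sums[label] += feature
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, feature):
    """Predict the label whose centroid is closest to the feature."""
    return min(centroids, key=lambda label: abs(feature - centroids[label]))


def accuracy(centroids, samples):
    correct = sum(predict(centroids, feature) == label for feature, label in samples)
    return correct / len(samples)


# Made-up data: class 0 clusters near 1, class 1 clusters near 9.
clean_train = [(0.0, 0), (1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1), (10.0, 1)]
test_set = [(0.5, 0), (1.5, 0), (3.0, 0), (7.0, 1), (8.5, 1), (9.5, 1)]

# The attacker inserts two far-away points deliberately labeled as class 0,
# dragging the class-0 centroid toward the class-1 cluster.
poison = [(20.0, 0), (20.0, 0)]

clean_model = train_centroids(clean_train)
poisoned_model = train_centroids(clean_train + poison)

clean_accuracy = accuracy(clean_model, test_set)
poisoned_accuracy = accuracy(poisoned_model, test_set)

if __name__ == "__main__":
    print("accuracy with clean training data:   ", clean_accuracy)
    print("accuracy with poisoned training data:", poisoned_accuracy)
```

The test set is untouched in both runs; only the training data changes. That is what distinguishes poisoning from evasion, where the attacker perturbs inputs at deployment time.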

Privacy attacks

Attackers launch privacy attacks with the goal of compromising the confidentiality of AI systems. In other words, the goal of a privacy attack is to give the attacker access to protected information, especially the training data and the ML model. NIST AI 100-2e2023 identifies five types of privacy attacks: data reconstruction, memorization, membership inference, model extraction, and property inference.

  • Data reconstruction – Attackers reverse engineer private information about an individual user’s record, or other sensitive data, from access to aggregate statistical information. According to the authors of the NIST report, this is the most concerning privacy attack.

  • Memorization – Attackers extract training data from generative ML models, for example by inserting synthetic canaries into the training data and then extracting them.

  • Membership inference – Attackers determine whether a specific record or data sample was part of the training dataset.

  • Model extraction – Attackers attempt to extract information about the model architecture and parameters by submitting queries to the ML model. This class of attacks specifically targets ML models trained by MLaaS providers.

  • Property inference – Attackers try to learn global information about the training data distribution, such as sensitive demographic information, by interacting with the ML model.
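Model extraction is the easiest of these to show in a few lines, under strong simplifying assumptions. The sketch below assumes the target is an exactly linear model that returns raw, unrounded scores, and that the attacker knows the number of input features; `make_black_box` and `extract_linear_model` are hypothetical names used only here. Under those assumptions, query access alone recovers every parameter.

```python
def make_black_box(weights, bias):
    """Simulate a hosted model: callers can query it but cannot see weights or bias."""
    def query(x):
        return sum(w * xi for w, xi in zip(weights, x)) + bias
    return query


def extract_linear_model(query, num_features):
    """Recover the parameters of a linear model using num_features + 1 queries.

    Querying the all-zeros input reveals the bias; querying each unit
    vector then reveals one weight at a time.
    """
    zero_input = [0.0] * num_features
    bias = query(zero_input)
    weights = []
    for i in range(num_features):
        unit_input = zero_input.copy()
        unit_input[i] = 1.0
        weights.append(query(unit_input) - bias)
    return weights, bias


# Hypothetical secret parameters held by the model provider.
secret_weights, secret_bias = [1.5, -2.0, 0.25], 0.5
query = make_black_box(secret_weights, secret_bias)

stolen_weights, stolen_bias = extract_linear_model(query, num_features=3)

if __name__ == "__main__":
    print("recovered weights:", stolen_weights)
    print("recovered bias:   ", stolen_bias)
```

Real targets are nonlinear and often return only labels or rounded scores, so practical extraction needs many more queries and typically yields an approximation rather than an exact copy.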

Conclusion

This blog post only scratches the surface of NIST AI 100-2e2023. Readers interested in the details of the above taxonomies, not to mention mitigation strategies, are encouraged to read the report.

Needless to say, developers of AI systems want to harden their systems against the attacks described above in order to make them more trustworthy. At the same time, AI developers presumably also want to maximize model performance. A troubling conclusion of NIST AI 100-2e2023 is that it may not be possible to simultaneously maximize the performance of an AI system and the attributes that contribute to its trustworthiness, especially adversarial robustness.


*** This is a Security Bloggers Network syndicated blog from ModernCyber Blog. Read the original post at: https://www.moderncyber.com/blog/Adversarial-Machine-Learning


