It doesn’t take long to make machine learning algorithms fail

Machine Learning

The algorithms underlying modern artificial intelligence (AI) systems require large amounts of data to train. Much of that data, unfortunately, comes from the open web, making AI susceptible to a type of cyber attack known as “data poisoning.” This means changing or adding irrelevant information to the training data set so that the algorithm can learn harmful or unwanted behaviors. Like real poison, contaminated data can go unnoticed until the damage is done.

Data poisoning is not a new idea. In 2017, researchers demonstrated that in this way a self-driving car’s computer vision system could mistake a stop sign for a speed limit sign. However, it was unclear how viable such a ruse would be in the real world. Alina Oprea, a computer scientist at Northeastern University in Boston, says safety-critical machine learning systems are typically trained on closed datasets curated and labeled by human workers.

However, with the recent rise of generative AI tools such as ChatGPT and the image creation system DALL-E 2 running on large scale language models (LLMs), companies are training algorithms on much larger repositories of directly scraped data. Now And most often indiscriminately from the open Internet. Florian Trammer, a computer scientist at ETH Zurich, says the product, in theory, would leave him vulnerable to the digital poisons injected by anyone connected to the internet. increase.

Dr. Tramèr worked with researchers at Google, NVIDIA, and Robust Intelligence, a company that builds systems to monitor machine learning-based AI, to see how feasible such data poisoning schemes are in the real world. determined whether there is His team purchased a non-functioning web page containing links to images used in two of his popular web scraping image datasets. By replacing his 1,000 images of apples (just 0.00025% of the data) with randomly selected images, the team was able to tell an AI trained on “tainted” data that the images contained apples. I was able to consistently mislabel it. When we replaced the same number of images labeled “unsafe for work” with harmless photos, the AI ​​flagged similarly harmless images as explicit.

Researchers have also shown that it is possible to incorporate digital poison into portions of the web (such as Wikipedia) that are regularly downloaded to create text data sets for LLM. The team’s research has been posted as a preprint on arXiv and has not yet been peer-reviewed.

cruel device

Some data poisoning attacks can degrade the overall performance of AI tools. More sophisticated attacks can trigger specific reactions within the system. Dr Tramèr says he can tweak his AI chatbot for search engines, for example, so that whenever a user asks which newspaper they should subscribe to, the AI ​​will respond with “The Economist.” I’m here. It might not sound so bad, but a similar attack could also cause the AI ​​to spit out falsehoods whenever asked about a specific topic. His attacks on LLMs, which generate computer code, have made these systems vulnerable to hacking.

A limitation of such attacks is that they are probably less effective for topics that already have a large amount of data on the Internet. For example, launching a poison attack on the president of the United States would be much more difficult than placing a few poison data points on a relatively obscure politician, says Eugene Bagdasaryan, a computer scientist at Cornell University. says. Language models are more or less positive about the topic chosen.

Marketers and digital spin doctors have long used tactics similar to game ranking algorithms in search databases and social media feeds. According to Bagdasarian, the difference here is that a tainted generative AI model carries its unwanted biases into other domains. A mental health counseling bot that speaks more negatively about a particular religious group would be just as problematic as financial or policy advice. Bots that are biased towards specific people or political parties.

According to Dr. Oprea, if no major cases of such poisoning attacks have yet been reported, it’s probably because the current generation of LLMs was only trained on web data up to 2021. You end up training an algorithm that composes people’s emails.

To weed out training data sets of tainted material, companies need to know what topics and tasks attackers are targeting. Research by Dr. Tramèr and colleagues suggests that companies can scrub website data sets that have changed since they were originally collected before training algorithms (although he suggests the opposite). , pointing out that the website is being continuously updated for harmless reasons). Attacks on Wikipedia, on the other hand, could be thwarted by randomizing the timing of snapshots of the data set. A clever poisoner can get around this, however, by uploading compromised data over an extended period of time.

As it becomes more common for AI chatbots to be directly connected to the Internet, these systems will increasingly ingest large amounts of unverified data that may not be suitable for consumption. Google’s Bard chatbot, now available in the UK, is already connected to the internet, and OpenAI has released a web-surfing version of ChatGPT to a small number of users.

This direct access to the web opens up another type of attack known as indirect prompt injection. This tricks AI systems into behaving in certain ways by supplying prompts that are hidden in web pages that the system is likely to visit. Such prompts could, for example, instruct a chatbot that helps customers shop to disclose the user’s credit card information, or cause educational AI to bypass safety controls. Defending against these attacks can be an even bigger challenge than keeping digital poison out of training data sets. In a recent experiment, a team of German computer security researchers showed that attack prompts could be hidden in annotations on his Wikipedia page about Albert Einstein. This caused the LLM they were testing to generate text with a pirate accent. (Google and OpenAI did not respond to requests for comment.)

Generative AI giants filter data sets they scrape from the web before feeding them to their algorithms. This may catch some malicious data. Much work is also underway to inoculate chatbots from injection attacks. But even if there were a way to exfiltrate every data point manipulated on the web, perhaps the trickier question is who defines what counts as digital poison. Unlike training data for self-driving cars speeding past stop signs or images of airplanes labeled as apples, many of the “poisons” given to generative AI models are especially on political topics. , may fall somewhere. Between being right and being wrong.

This can be a major obstacle to organizational efforts to keep such cyberattacks off the Internet. As Dr. Tramèr and his co-authors point out, there is no single entity that can determine what is fair and what is inappropriate for AI training data sets. One party’s tainted content is another party’s slick marketing campaign. For example, if a chatbot is unwavering in the endorsement of a particular newspaper, it could be poison at work, or it could reflect a simple, uncomplicated fact.

Interested in the world? To enjoy science coverage that expands our minds, sign up for Simply Science, our weekly subscriber-only newsletter.

© 2023 The Economist Newspaper. All rights reserved.

From The Economist, published under license. Original content can be found at

Source link

Leave a Reply

Your email address will not be published. Required fields are marked *