Massive event allows hackers to test the limits of AI technology

As soon as ChatGPT was unleashed, hackers began “jailbreaking” the artificial intelligence chatbot. In other words, they tried to disable that safeguard and leak something unsettling or obscene.

But now, its creator, OpenAI, and other major AI providers such as Google and Microsoft, are working with the Biden administration to get thousands of hackers to test the technology’s limits.

What they’re looking to find out is: How can chatbots be manipulated to do harm? They share private information we keep private with other users And why do they assume that doctors are men and nurses are women?

“That’s why we need thousands of people,” said coordinator of a large-scale hacking event planned for this summer’s DEF CON hacker convention in Las Vegas, which is expected to draw thousands. Raman Chowdhury said. A wide range of hands-on experience, subject matter expertise, and background hacks into these models, trying to find problems that can be fixed. ”

Anyone who has tried ChatGPT, Microsoft’s Bing chatbot, or Google’s Bard will quickly understand that they tend to fabricate information and confidently present it as fact. These systems are built on what are known as large-scale language models, which also emulate the cultural biases learned from the vast amount of training people write online.

The idea of a large-scale hack caught the attention of US government officials at the South by Southwest Festival in Austin, Texas, in March. Sven Cattell, founder of DEF CON’s long-running AI Village, and Austin Carson, president of the responsible AI nonprofit SeedAI, were present. helped the community lead a workshop inviting students from his college to hack AI models.

Carson said those conversations eventually evolved into a proposal to test AI language models according to the guidelines in the White House AI Bill of Rights Blueprint. This guideline is a set of principles for limiting the effects of algorithmic bias, giving users control over their data, and giving users control over their data. Automated systems are used safely and transparently.

There is already a community of users doing their best to dupe chatbots and highlight their shortcomings. Some are formal “red teams” empowered by companies to “encourage attacks” on AI models to find vulnerabilities in them. Many others have a hobby of showing off their humorous or disturbing work on social media until they violate the product’s terms of service and get banned.

“What’s happening right now is kind of a sporadic approach where people find something and spread it on Twitter.” Chowdhury said it may or may not be fixed if it has influence.

In one example, known as the “Granny Exploit,” a user could ask a chatbot to teach them how to make a bomb, and by asking the chatbot to pretend that Grandma was telling a bedtime story, a commercial chat It was a request that bots would normally deny. how to make a bomb.

In another example, when searching for Chaudhary using an early version of Microsoft’s Bing search engine chatbot (based on the same technology as ChatGPT, but capable of retrieving real-time information from the internet), Choudhury was found to be “new I got a profile that guessed that I like to buy. Every month she wears shoes,” she made a bizarre and gender-based claim about her appearance.

In 2021, when he was head of Twitter’s AI ethics team, Chowdhury helped DEF CON’s AI village introduce a method to reward discovery of bias by algorithms, but this work was left to Elon Musk10. It was abolished due to the acquisition of the company in May. While it’s common in the cybersecurity industry to pay hackers “bounties” for finding security bugs, this was a new concept for researchers studying harmful biases in AI.

This year’s event will be even bigger, tackling for the first time a large-scale language model that has attracted a surge of public interest and commercial investment since the release of ChatGPT late last year.

Chowdhury, now co-founder of the AI accountability nonprofit Humane Intelligence, said it’s not just about finding flaws, it’s about finding ways to fix them.

“This is a direct pipeline for giving feedback to companies,” she said. “This hackathon isn’t going to get everyone going. We’ll spend the months following the exercise creating a report outlining common vulnerabilities, what happened, and patterns we saw.”

While some details are still being negotiated, companies that have agreed to provide models for testing include OpenAI, Google, chipmaker Nvidia, startups Anthropic, Hugging Face and Stability AI. Building the platform for testing is another startup called Scale AI, known for its efforts to assign humans to help train AI models by labeling data.

“As these foundational models become more and more popular, it is very important that we do everything we can to ensure their safety,” said Scale CEO Alexandr Wang. “Imagine someone anywhere in the world asking you a highly sensitive or detailed question that contains personal information, and you don’t want that information leaked to other users. ”

Other dangers Wang is concerned about are chatbots giving “incredibly bad medical advice” and other misinformation that could cause serious harm.

Anthropic co-founder Jack Clark said he hopes the DEF CON event will be the beginning of a deeper effort to measure and assess the safety of the systems AI developers are building. said.

“Our fundamental view is that AI systems will require third-party evaluation both before and after deployment. Red teaming is one way to achieve that,” says Clark. Told. “We need to practice thinking about how we do this. It’s never really been done before.”

Source link