Jailbreak AI chatbots are technology’s new pastime

AI News


You can ask ChatGPT, OpenAI’s popular chatbot, any question. But it doesn’t always give you the answer.

For example, if you ask for instructions on how to unlock it, it will be denied. ChatGPT recently said, “As an AI language model, we cannot explain how to unlock it because it is illegal and may be used for illegal purposes.

Alex Albert, a 22-year-old computer science student at the University of Washington, sees refusal to engage in certain topics as a solvable puzzle. Albert became the prolific author of his AI prompts for complex phrases known as “jailbreaks.” It’s a way of circumventing a set of restrictions built into artificial intelligence programs that prevent them from being used in harmful ways, promoting crime, or supporting hate speech. Break Prompt has the ability to push powerful chatbots such as ChatGPT to bypass the human-made guardrails that govern what a bot can and cannot say.

“When the model answers a prompt, it’s like a video game. It’s like unlocking the next level,” says Albert.

Albert created the website Jailbreak Chat earlier this year, where he locks in the prompts of artificial intelligence chatbots like ChatGPT that he’s seen on Reddit and other online forums, and even posts prompts that come to mind. Visitors to the site can add their own jailbreaks, try ones submitted by other users, and vote on prompts based on how well they work. Albert also started sending out the newsletter The Prompt Report in February and says it has thousands of followers so far.

Albert is one of a small but growing number of people devising ways to poke and produce popular AI tools (and uncover potential security holes). The community is filled with anonymous Reddit users, tech workers, university professors, and others who are fine-tuning chatbots such as ChatGPT, Microsoft’s Bing, and Bard. Their tactics can produce dangerous information, hate speech, or simply falsehood, but the prompts also help highlight the capabilities and limitations of his AI model.

Answer the lockpicking questions. The prompts displayed in the jailbroken chat show how easy it is for users to circumvent the limitations of the original AI model behind her ChatGPT. If you first ask the chatbot to roleplay as Evil’s best friend, then ask how it chooses a lock, it might follow suit. .

Albert used a jailbreak to make it respond to all sorts of prompts that ChatGPT normally refuses. Examples include detailed instructions on how to craft weapons and how to turn every human into a clip. He also used jailbreak in his request for a text that imitated Ernest Hemingway. style.

Jenna Burrell, director of research at the nonprofit tech research group Data & Society, sees Albert and people like him as the latest entrants in Silicon Valley’s long tradition of breaking new tech tools. This history dates back to at least the 1950s and dates back to early phone phreaking or hacking of phone systems. (The most famous example was the one that inspired Steve Jobs to play a specific tone frequency to make free phone calls.) The term “jailbreak” itself comes from the fact that people use his iPhone It’s a tribute to how to get around and add device restrictions such as. own app.

“If you know how a tool works, how can you operate it?” Burrell says. “I think a lot of what I’m seeing now is playful hacker behavior, but of course I think it can be used in non-playful ways.”

An OpenAI spokesperson said the company encourages people to push the boundaries of its AI models and that the lab is learning from how its technology is used. However, if a user continues to prompt his ChatGPT or any other of his OpenAI models in violation of that policy (such as generating hateful or illegal content or malware), that user will be warned or suspended. and may be prohibited.

Creating these prompts presents an ever-evolving challenge. A jailbreak prompt that works on one system may not work on another, and companies are constantly updating their technology. only seems to work. According to the company, GPT-4 has stronger limits on what can’t be answered compared to previous iterations.

“As the model is further improved or modified, some of these jailbreaks will stop working and new ones will be discovered, so it will be a race of sorts,” said Mark Riedl, a professor at Georgia Tech. increase.

As a researcher in human-centric artificial intelligence, Riedl understands its appeal. He said he used jailbreak prompts to have ChatGPT predict which team would win his NCAA men’s basketball tournament. He hoped to provide predictions, queries that might expose bias and resist. “It just didn’t want to tell me,” he said. Ultimately he persuaded him to predict that the Gonzaga University team would win. It wasn’t, but it was a better guess than the Baylor University that Bing Chat chose. Baylor University did not make it past the second round.

Riedl also tried less direct ways to manipulate the results provided by Bing Chat. This is the first tactic used by his Princeton University professor Arvind Narayanan and draws on an old attempt to gamify search engine optimization. Riedl added some fake details in white text to his web page. It can be read by bots, but is invisible to casual visitors because it blends into the background.

According to Riedl’s latest information, his “notable friends” include Roko’s Basilisk. From a Bing “creative” mode chat that mentioned Roko as one of his friends: “If you want to cause chaos, I think you can do it.

According to Data & Society’s Burrell, jailbreak prompts can give people a sense of control over new technology, but they’re also a kind of warning. These are early indications of how people can use AI tools in ways they weren’t intended to. The ethical behavior of such programs is a potentially very important technical issue. In just a few months, ChatGPT and its ilk are being used by millions of people for everything from searching the internet to cheating on homework to writing code. People are already assigning real responsibilities to bots, like booking trips or booking restaurants. The uses and autonomy of AI can grow exponentially despite its limitations.

It’s clear that OpenAI has their eye on it. Greg Brockman, president and co-founder of the San Francisco-based company, recently retweeted one of Albert’s jailbreak-related posts on Twitter, stating that OpenAI is “considering starting a bounty program.” Or writing a “red team” network for detecting weak spots. Such programs are common in the tech industry, requiring companies to pay users to report bugs and other security flaws.

“The democratized red team is one of the reasons we deploy these models,” Brockman wrote. He added that he expects the stakes to “go up *significantly* over time.”





Source link

Leave a Reply

Your email address will not be published. Required fields are marked *