Can you use an AI chatbot to ensure that other chatbots’ answers are correct?

AI chatbots have become increasingly adept at conversing with humans. The problem, experts say, is that they tend to give inaccurate or nonsensical answers, known as “hallucinations.”

Now, researchers have come up with a potential solution: using chatbots to sniff out errors made by other chatbots.

Sebastian Farquhar, a computer scientist at the University of Oxford and co-author of a research paper published in the journal Nature on Wednesday, argues that chatbots such as ChatGPT and Google's Gemini can be used to weed out AI falsehoods.

Chatbots are built on large language models (LLMs), which ingest vast amounts of text from the internet and can be used for a variety of tasks, including generating text by predicting the next word in a sentence. The models find patterns through trial and error, and human feedback is then used to fine-tune them.

But there is a downside: chatbots cannot think like humans and cannot understand what we say.

To test the idea, Farquhar and his colleagues asked a chatbot a question, then used a second chatbot to check for inconsistencies in the answers. This is similar to how police try to trip up a suspect by asking the same question multiple times. If the answers differed widely in meaning, they were probably garbled.
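The consistency check described above can be illustrated with a short Python sketch. It is not the researchers' code. It assumes that several answers to the same question have already been collected, and it replaces the second chatbot with a hypothetical stand-in, toy_same_meaning, that only compares normalized text. The score used here, Shannon entropy over groups of matching answers, is an assumption chosen for illustration: it is zero when every answer agrees and grows as the answers scatter.

```python
import math
from typing import Callable, List


def cluster_by_meaning(
    answers: List[str],
    same_meaning: Callable[[str, str], bool],
) -> List[List[str]]:
    """Greedily group answers: each answer joins the first cluster whose
    first member is judged to share its meaning, else starts a new one."""
    clusters: List[List[str]] = []
    for answer in answers:
        for cluster in clusters:
            if same_meaning(cluster[0], answer):
                cluster.append(answer)
                break
        else:
            clusters.append([answer])
    return clusters


def meaning_entropy(clusters: List[List[str]]) -> float:
    """Shannon entropy, in bits, of the cluster-size distribution.
    0.0 means all answers agreed; larger values mean more disagreement."""
    total = sum(len(c) for c in clusters)
    return sum(
        (len(c) / total) * math.log2(total / len(c)) for c in clusters
    )


def _normalize(text: str) -> str:
    # Lowercase and drop surrounding spaces and periods.
    return text.lower().strip(" .")


def toy_same_meaning(a: str, b: str) -> bool:
    """Hypothetical stand-in for the second chatbot's judgment.
    A real checker would compare meaning, not just normalized text."""
    return _normalize(a) == _normalize(b)


# Illustrative, made-up answers to one question asked several times.
consistent = ["Paris", "paris.", "Paris"]
scattered = ["Paris", "Lyon", "Rome", "Paris"]

print(meaning_entropy(cluster_by_meaning(consistent, toy_same_meaning)))
print(meaning_entropy(cluster_by_meaning(scattered, toy_same_meaning)))
```

In this toy run the consistent answers fall into a single group, while the scattered answers split into three groups and score higher. Any threshold for calling an answer unreliable would have to be chosen separately.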


He said the chatbot was asked general trivia questions and elementary school-level math word problems.

The researchers checked the accuracy of the chatbot's evaluations by comparing them with human evaluations of the same subset of questions. The chatbot agreed with the human raters 93 percent of the time, while the human raters agreed with one another 92 percent of the time. That was close enough that having chatbots evaluate each other is “unlikely to be a concern,” Farquhar said.

Farquhar said that for the average reader, identifying errors in AI is “fairly difficult.”

He said in an email that he often has a hard time spotting these anomalies when using LLMs at work, because chatbots “often tell you what you want to hear, making up things that are not only plausible but would also be helpful if they were true, a practice researchers call 'sycophancy.'”

Unreliable answers are a hindrance to the widespread adoption of AI chatbots, as they “may pose a risk to human lives,” especially in medical fields like radiology, the researchers say, and could lead to false legal precedents and fake news.

Not everyone is convinced that using a chatbot to rate the responses of other chatbots is a great idea.

In an accompanying News and Views article in Nature, Karin Verspoor, a professor of computing technologies at RMIT University in Melbourne, Australia, said there were risks in “fighting fire with fire.”

Although the number of errors produced by an LLM appears to decrease when a second chatbot groups answers into semantically similar clusters, “using an LLM to evaluate an LLM-based method is circular and likely biased,” Verspoor wrote.

“Researchers will need to wrestle with the question of whether this approach truly controls the output of LLMs, or whether it is unintentionally adding fuel to the fire by layering multiple systems that are prone to hallucinations and unpredictable errors,” she added.

Farquhar compares it to “building a wooden house with wooden cross beams for support.”

“It's not uncommon for reinforcement pieces to support each other,” he said.


