OpenAI develops AI model that criticizes its own AI models • The Register

OpenAI relies on human AI trainers to help improve ChatGPT, including its ability to handle code. To assist those trainers in cases where they fail to notice errors, the lab has developed another AI model, called CriticGPT.

The Microsoft-backed super-lab published a paper [PDF] on Thursday, titled "LLM Critics Help Catch LLM Bugs," describing the approach.

Generative AI models like GPT-4o are trained on vast amounts of data and then undergo a process of refinement called reinforcement learning from human feedback (RLHF).

This typically involves human workers, often hired through crowdsourcing platforms, interacting with the model and annotating its answers to various questions. When Time magazine investigated this last year, it found that OpenAI was paying workers in Kenya less than $2 an hour to improve its models.

The goal is to teach the model which answers are preferred, improving its performance. But as the model becomes more capable, RLHF becomes less effective. In particular, as the chatbot comes to know more than its teachers, it becomes harder for human AI trainers to identify incorrect answers.
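To make the "which answers are preferred" step concrete, here is an illustrative sketch (not OpenAI's actual training code) of the pairwise loss commonly used to train an RLHF reward model: the loss shrinks when the model scores the human-preferred answer above the rejected one.

```python
import math

def preference_loss(score_preferred: float, score_rejected: float) -> float:
    """Pairwise (Bradley-Terry style) loss for an RLHF reward model.
    The loss approaches zero as the model scores the human-preferred
    answer increasingly higher than the rejected answer."""
    margin = score_preferred - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A labeler marked answer A as better than answer B; the reward model
# should learn to score A higher, which drives this loss toward zero.
loss_agrees = preference_loss(2.0, -1.0)    # model agrees with the human
loss_disagrees = preference_loss(-1.0, 2.0)  # model disagrees
```

When the trainer can no longer tell which answer is actually better, the preference labels themselves become noisy, and no amount of optimization against this loss fixes that.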

So, to help those tasked with providing feedback on the model's generated programming code, OpenAI created another model to critique those responses.

“We trained a model based on GPT-4, called CriticGPT, to catch errors in ChatGPT's code output,” the AI startup explained in a blog post. “We found that when people get help from CriticGPT to review ChatGPT code they outperform those without help 60 percent of the time.”

Screenshot of a diagram from the OpenAI paper “LLM Critics Help Catch LLM Bugs” – click to enlarge

In other words, this is not an autonomous feedback loop from one chatbot to another, but a way to augment the knowledge of the person managing the reinforcement learning.

This approach seems to lead to better results than relying on crowdsourced workers alone, who, at $2 an hour, probably aren't computer science professors or sharp technical writers, whatever the going rate for annotation work.

According to the paper, the LLM critic catches significantly more inserted bugs than qualified humans paid for code review, and model critiques are preferred over human critiques more than 80 percent of the time.

The finding that CriticGPT can help AI trainers write better model-response critiques isn’t entirely surprising: Perhaps even run-of-the-mill office temps will write better-crafted email messages with the help of generative AI.

But the AI assistance comes at a cost: when human contractors work in tandem with CriticGPT, the resulting critiques of ChatGPT replies have a lower rate of hallucinated (invented) bugs than critiques from CriticGPT alone, but a higher rate than critiques written by a human AI trainer without AI assistance.
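That trade-off can be framed as precision versus recall. A toy sketch with made-up numbers (not figures from the paper): a critic that flags more candidate bugs catches more real ones (higher recall) but hallucinates some (lower precision), while a cautious human flags fewer bugs with near-perfect precision.

```python
def critique_stats(true_bugs: set, flagged: set) -> tuple:
    """Toy precision/recall for a set of bug critiques.
    Precision falls when critiques hallucinate bugs that aren't there;
    recall rises when more real bugs are caught."""
    caught = true_bugs & flagged
    precision = len(caught) / len(flagged) if flagged else 1.0
    recall = len(caught) / len(true_bugs) if true_bugs else 1.0
    return precision, recall

# Hypothetical example: the critic model flags everything, including two
# hallucinated bugs (x1, x2); the unaided human flags only one real bug.
critic_alone = critique_stats({"b1", "b2", "b3"}, {"b1", "b2", "b3", "x1", "x2"})
human_alone = critique_stats({"b1", "b2", "b3"}, {"b1"})
```

Neither extreme is obviously right for RLHF, which is exactly the open question the paper flags next.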

“Unfortunately, it is not clear what the appropriate trade-off between hallucinations and bug detection is for an overall RLHF system that uses critique to improve model performance,” the paper acknowledges.®

Meanwhile, in other AI news…

  • Testing revealed that OpenAI's ChatGPT sometimes served users broken URLs for at least 10 publications that, you guessed it, have licensing agreements with OpenAI. Those partnerships were supposed to ensure that when the chatbot generated answers based on those outlets' articles, it would cite the source and link to the original piece. So much for that.
  • The Center for Investigative Reporting, the nonprofit that runs Mother Jones and Reveal, is suing OpenAI and its backer Microsoft, accusing them of using copyrighted content without permission or compensation.
  • Speaking of Microsoft, the Windows giant's translation and web search engine in China, Bing, is more aggressive in censoring content than its Chinese rivals, a study has found.
