In 2023, Google was in a race to catch up with ChatGPT.
Hundreds of documents obtained by Business Insider reveal that Scale AI, a Google contractor, systematically used ChatGPT to improve Google's own chatbot at the time. When it was released earlier that year, Bard, which has since been renamed Gemini, was derided internally as “hurried” and “failed”.
Scale AI contractors generated thousands of responses from ChatGPT and compared them with their own “rewrites” of Bard answers. They then improved the rewrites to match or surpass ChatGPT and sent all the data back to Google.
Scale AI managers wrote in detail about how ChatGPT's answers had better formatting and more interesting facts. They instructed workers to “explain why GPT4 is better” and “do it better than GPT”. A single spreadsheet flagged dozens of contractors for writing answers that were “consistently worse than GPT4.” In one example, a document stated that contractors could get a 15% bonus for answers that outperformed ChatGPT.
Scale AI is a San Francisco startup that does crucial AI grunt work for Big Tech. Using an army of human contractors, it does things like labeling images and rewriting chatbot answers, as in Google's case. Meta is reportedly investing $15 billion in Scale AI as part of a blockbuster deal to buy almost half of the company and hire CEO Alexandr Wang for its internal “superintelligence” team.
Documents obtained by BI show how closely Google monitored the work of its chief rival.
OpenAI's terms and conditions at the time prohibited others from using its output to “develop models that compete with OpenAI.” Scale AI and Google did not answer questions about whether they had permission from OpenAI for the detailed comparisons and rewrites.
Scale AI told BI that ChatGPT output was not used to train Google's or any other models, and that the comparisons were part of a routine “assessment” that it said is an industry standard.
A Scale spokesperson said in a statement that the company did not use ChatGPT responses to train Gemini or any other models. The spokesperson said the documents describe “a standard parallel evaluation rather than using ChatGPT or third-party model output for training.”
The spokesperson added that evaluating against competitors is standard practice in the industry and that “these assessment results are not used to train models.”
Similarly, Google said, “The suggestion that Gemini was trained using models from other companies is inaccurate.”
Experts told BI that this kind of comparison is actually common at some top AI labs. OpenAI, which is currently in partnership talks with Google Cloud, did not respond to repeated requests for comment.
Project “Bulba”
Scale AI gave Bard the catchy codename “Bulba,” after the Pokémon Bulbasaur. The mission was clear: compare Bulba's answers with ChatGPT's to improve them.
Scale AI never mentioned Google by name in the documents, instead referring to its anonymous “client”. However, Bard is referenced 12 times in a private Google Sheet titled “Bard Rewrite GPT4”, and one training document slide contains the Google logo.
Alexandr Wang, founder of Scale AI. Jeff Chiu/AP
In July 2023, managers instructed workers to study GPT-4 responses in detail and understand why they were superior to Bard's. “Think of feedback you can share so that experts can write responses better than GPT4 or at least the same,” a manager wrote.
Scale AI also created a spreadsheet in October 2023 that directly compared 1,729 Bard rewrites with ChatGPT. Each entry was rated with labels such as “worse than GPT4” and “some fixes needed.” In one example, a worker rewrote a Bard review of a nursery chair, which the manager marked as “worse than GPT” because “there is no detail compared to GPT4.”
Another contractor's rewrite of a review of a history museum in Charleston also did not make the cut. The manager wrote that the ChatGPT version was “much better.”
Scale AI also used ChatGPT to improve Bard's responses in certain domains, such as engineering and physics. In an update from August 2023, a Scale AI manager wrote that workers were using GPT4 as guidance to redo Google's AI answers to engineering-related questions.
The documents showed that Scale AI and Google prohibited contractors from copying and pasting ChatGPT responses directly, but many contractors were flagged for doing so.
Scale AI says the comparisons were not for training
The internal documents BI reviewed explain that the project's goal was to help “train” the model to provide more specific and complete answers, and they refer to “model improvement” efforts.
Google did not answer follow-up questions about whether these comparisons had an impact on training. Scale AI said there is a clear line between evaluating the performance of a model and training it. It also said that ChatGPT output was used only for the former.
“There is a difference between training data and assessment data,” the spokesperson said. “The evaluation data is used to measure the performance of the model, not to train the model.”
Matthew Guzdial, assistant professor of computer science at the University of Alberta, said the assessment data could still affect AI models.
“You can argue that all they're doing is looking at those outputs and evaluating that information to adjust the structure of the model, but they're involved in the training process,” he told BI.
The documents were left public
Scale AI, whose work for Google had not previously been made public, left more than 300 pages of Google Docs publicly accessible.
These include numerous links to other Google Docs, many of which are also publicly accessible and contain sensitive information, including contractor compensation details, personal email addresses, performance reviews, and passwords that still work. Some of the Google Docs can still be edited by anyone who has the link.
Scale AI told BI it is “actively investigating” how the documents were “accessed” and is taking steps to address the inadvertent exposure.
More than two days after BI told Scale AI about the public Google Docs, they were still online and available for anyone with a link to download.
Google is ahead in AI once again
Google CEO Sundar Pichai. Klaudia Radecka/NurPhoto
The documents do not specify how effective the comparison effort was. Since the Bard flub in 2023, Google has rebranded Bard as Gemini and turned into an AI shipping machine. Last month, it launched more than 100 new AI products and features at its annual developer conference, I/O.
Google CEO Sundar Pichai opened his I/O speech by rattling off the industry benchmarks that Gemini is topping and promoting the company's latest AI achievements.
“We're shipping faster than ever,” Pichai said on stage.
Have a tip? Contact this reporter via email at crolet@insider.com or on Signal and WhatsApp at 628-282-2811. Use a personal email address and a nonwork device. Here's a guide to sharing information securely.

