- Is YouTube's vast library of content being used to train AI models?
- CEO Neal Mohan said some creators have deals with the platform that could result in their content being used.
- He also said that if OpenAI trained Sora on YouTube videos, it would violate the platform's terms of service.
Google is betting big on AI, and AI models, including Google's Gemini, require huge amounts of training data to be competitive.
So the natural question is: Is Google looking to leverage its vast collection of YouTube videos to further its AI ambitions?
To answer that question, we looked at what YouTube's CEO has said on the matter and what the platform's terms of service say, and we also asked parent company Google for clarification.
In an interview, YouTube CEO Neal Mohan was asked about the possibility of Google using YouTube's vast library of digital content to train AI models.
The New York Times reported in April that “Like OpenAI, Google also transcribed YouTube videos to harvest the text for its AI models, according to five people familiar with the company's practices, which could violate the videos' copyright, which belongs to their creators.”
Mohan said some YouTube creators have special deals that allow their content to be used to train AI.
“Google's use of YouTube content is strictly in accordance with our terms of service and individual contracts with creators and uploaders on our platform,” Mohan told Bloomberg's Emily Chang in an interview, portions of which were first published in April.
“Many creators have entered into various types of license agreements for the content on our platform, as do many rights holders,” he added.
Essentially, Mohan appears to be saying that any AI training Google does on YouTube content, whether on video titles, transcripts, or the videos themselves, is done in a way that respects the terms content creators have agreed to.
“Some content on YouTube may be used in these models, but it will be consistent with the terms of service and contracts that creators sign before uploading their content to YouTube,” Mohan said.
As The New York Times reported, Google may not be the only company turning to YouTube for AI training data.
In a March interview with The Wall Street Journal, OpenAI CTO Mira Murati was asked whether the company's AI text-to-video generation tool, Sora, was trained on YouTube content. “The reality is, I don't know much about that,” Murati said.
Mohan said that depending on what data OpenAI collects, it could violate YouTube's terms of service.
“Our terms of service allow some YouTube content to be scraped, such as video titles, channel names and creator names, because doing so makes that content visible on the open web and visible and available to other search engines and the like,” Mohan told Chang.
“However, downloading transcripts, video bits etc. is not permitted and is a clear violation of our terms of use,” he said.
YouTube's Terms of Service define Content as “video, audio (such as music and other sounds), graphics, photos, text (such as comments and scripts), branding (such as trade names, trademarks, service marks and logos), interactive features, software, metrics and other materials provided by users, YouTube, or third parties.”
The terms also state that uploaders “retain their ownership rights in their content” but “grant certain rights to YouTube and other users of the service.”
The terms of service state, “By submitting content to the Service, you grant YouTube a worldwide, non-exclusive, royalty-free, sublicensable, and transferable license to use (including copying, distributing, preparing derivative works from, displaying and performing) that content in connection with the Service and YouTube's (and its successors' and affiliates') business, including for the purpose of promoting and redistributing part or all of the Service.”
So while Mohan's comments and YouTube's terms of service have shed more light on the issue, it's still not entirely clear if and how your average YouTube video could potentially be used by Google for AI training purposes.
Business Insider has reached out to Google to ask whether the company trains its AI models and products, such as its recently announced text-to-video tool Veo, on actual video files from YouTube. We will update this article if we hear back.
On February 28, Axel Springer, the parent company of Business Insider, along with 31 other media groups, filed a $2.3 billion lawsuit against Google in a Dutch court, alleging damages caused by the company's advertising practices.
