Apple has addressed reports that it uses YouTube videos to train AI models, saying that while it used certain datasets to train its open-source OpenELM language models, those models do not power any AI or machine learning features available to customers.
In an explanation provided to 9to5Mac, Apple said that OpenELM was created as a contribution to research, not to power consumer AI applications.
“OpenELM does not support any AI or machine learning features, including Apple Intelligence,” the company told the publication.
Reports that major tech companies trained AI models with YouTube data
According to a Wired report based on an investigation by Proof News, Apple, along with Nvidia, Salesforce, and Anthropic, used material from thousands of YouTube videos to train AI models.
“Our research found that subtitles for 173,536 YouTube videos, extracted from over 48,000 channels, were used by major Silicon Valley companies, including Anthropic, Nvidia, Apple, and Salesforce,” the report said.
The dataset reportedly covered a wide range of videos, with subtitles drawn from educational channels such as Khan Academy, MIT, and Harvard, and from news organizations such as The Wall Street Journal, NPR, and the BBC. Content from popular YouTubers such as MKBHD, PewDiePie, and MrBeast was also part of the training data, the report claims.
What is OpenELM – Apple’s AI model
Apple announced the OpenELM open language model, which “uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of the Transformer model to improve accuracy.”
Shortly after its annual WWDC conference last month, Apple said that its Apple Intelligence models were trained on “licensed data, including selected data to power specific features, as well as public data collected by our web crawlers.”