Earlier this week, we covered the ethical and copyright concerns raised by the tech community over major companies using YouTube video transcripts to train AI models without the consent of the content creators. OpenAI, Meta, and Google were criticized for violating YouTube's rules on how its videos may be used. Apple also came under fire recently over allegations that it used the same transcribed content in its OpenELM models. The Cupertino tech giant has now responded to the controversy, giving its side of the story and explaining its practices for training LLMs.
Apple has denied speculation about unethical AI practices, saying Apple Intelligence was trained on licensed content, not scraped YouTube videos.
Google, Meta, and OpenAI have come under fire for using subtitles collected from over 170,000 videos by popular YouTubers to train their AI models. Previous reports pointed out that Apple also used transcribed YouTube content in its OpenELM models, suggesting that, like the other companies, it engages in unethical AI practices. The company has now defended itself and clarified the matter.
As reported by 9to5Mac, Apple confirmed to the outlet that the OpenELM model is not linked to the company's other AI initiatives; Apple Intelligence and its LLM models are trained on licensed data. Apple explained to users and the tech community that OpenELM is part of a research initiative, and that it used the Pile dataset to train the open-source model, which was created to contribute to open language model development and is readily available on Apple's machine learning research site.
Apple further clarified that the OpenELM model released in April has nothing to do with Apple Intelligence or its AI-powered features. The tech giant also stated that it has no plans to release new versions of OpenELM, and that the model is merely a contribution to research. Apple says that Apple Intelligence, by contrast, relies entirely on ethical training practices, with millions of dollars paid to publishers for licensed data. The company detailed this in a research paper explaining its commitment to responsible AI development.
However, large companies have been using a YouTube subtitles dataset compiled by the non-profit organization EleutherAI to train their AI models, raising serious concerns about permissions, AI ethics, and copyright infringement. While Apple Intelligence is not part of the ongoing controversy, all large companies need to be more transparent about their data collection techniques and AI training methods to avoid getting caught up in such issues.