MKBHD responds to Apple using its YouTube videos to train AI

A new report from Proof News alleged that Apple, NVIDIA, and other major tech companies used datasets containing copyrighted IP to train their AI models. That copyrighted IP included transcripts of YouTube videos by prominent creators such as MKBHD, one of the platform's biggest tech reviewers.

The investigation centered on a dataset called the Pile, in which the reporters claim to have found transcripts and subtitles for over 170,000 YouTube videos across 40,000 different channels, including videos from creators such as MrBeast, MKBHD, Jimmy Kimmel, Stephen Colbert, and PewDiePie. They also uncovered statements from the companies confirming that they used the Pile to train their AI models, since the dataset is free and publicly available.

This newly surfaced report raises the question of what happens to AI companies that train their models on datasets containing copyrighted IP. Who is responsible: the owner of the AI model, the company that created the dataset, or both? OpenAI found itself in a similar predicament over training data a few months ago, when Chief Technology Officer (CTO) Mira Murati could not say whether OpenAI was using YouTube videos to train its AI models.

Following OpenAI's ambiguity about its training data, YouTube's CEO publicly warned that harvesting data from YouTube violates the platform's terms of service.

MKBHD later responded to the reports in a quick one-minute YouTube Short explaining the situation. He added that his team pays to have high-quality subtitles created for each of its videos, meaning the content has effectively been “plagiarised” multiple times.
