Apple reportedly used late-night TV hosts' videos to train AI without their permission

Artificial intelligence platforms don't work right out of the box. Like a puppy, they need to be trained. This is done by “feeding” the algorithm large amounts of selected data so that the system can learn to give accurate answers. For example, as we reported in April, Apple is considering paying $50 million to license content from media companies such as NBC News, Condé Nast (publisher of Vogue and The New Yorker), and IAC (publisher of People, Better Homes and Gardens, and The Daily Beast) for AI training.

Now, a new report says that Apple and other companies have used YouTube video content to train AI models without permission from the creators of those videos. According to the report, a third party created closed-caption files from more than 170,000 videos, including content from longtime tech commentator Marques Brownlee (MKBHD) and late-night comedians Stephen Colbert and Jimmy Kimmel.

WIRED reported that subtitles from 173,536 YouTube videos were used by Silicon Valley companies, including Anthropic, Nvidia, Apple, and Salesforce. The downloads were apparently made by EleutherAI, a nonprofit AI research group that helps developers train AI models. Its aim, reportedly, was to create training material for smaller developers and academics.

“Tech companies have acted aggressively, and people are concerned about the fact that they haven't had a choice,” said Amy Keller, a partner at the law firm DiCello Levitt. “I think that's a real problem.”

However, big companies like Apple also used a dataset created by EleutherAI called YouTube Subtitles. The dataset does not contain the videos themselves, only the plain text of their subtitles, which often includes translations into languages such as Japanese, German, and Arabic. YouTube Subtitles contains content from over 12,000 videos, including videos that have since been removed from YouTube. One creator, who asked not to be named, had deleted all of his videos from the platform, only to discover that his work was still included in the training data of some AI models.

The problem is that none of the YouTube creators were asked for permission before their videos were used to train AI models. Despite lawsuits over the use of content without permission, companies like OpenAI and Meta have defended their actions by arguing that they are protected by the fair use doctrine, which permits limited use of copyrighted material without the owner's permission under certain circumstances.
