Microsoft AI CEO: Everything on the open web is fair use for AI training


For an AI to write text, run ad campaigns, or power a side hustle, it needs training material. ChatGPT needed about 300 billion words to launch, and it continues to train on user interactions.

But the humans who create the content the AI is consuming aren't being credited or compensated. Authors, artists, and news organizations have filed numerous copyright lawsuits against AI giants like OpenAI and Microsoft after discovering that AI bots can speak “all too accurately” about their works, implying that the works were included in the AI's training data.

So Microsoft AI CEO Mustafa Suleyman was asked at the Aspen Ideas Festival in late June whether AI companies were essentially stealing the world's intellectual property.

Suleyman's answer? Nearly all content on the internet is eligible for AI training, with one exception.


“When it comes to content that's already on the open web, I think since the 1990s the social contract around that content has been that it's fair use,” Suleyman said.

Suleyman said content on the open web can be copied and recreated “by anyone.”

“It's been freeware, if you like,” he said. “That's been the understanding.”

The exception: some news sites and publishers explicitly ask that their content not be scraped or crawled.

“It's a grey area and I think it will be resolved in court,” Suleyman said.

Mustafa Suleyman. Photo by Stefan Wermuth/Bloomberg via Getty Images

Suleyman is leading Microsoft AI at a time when the company is investing billions of dollars in the technology, and his position on what is and isn't fair use illustrates how AI companies defend against intellectual property claims in court.

For example, OpenAI reportedly used over 1 million hours of YouTube videos to train ChatGPT. When asked if YouTube or social media videos were used in the creation of OpenAI's video generator, Sora, the company's chief technology officer Mira Murati said, “We used publicly available and licensed data,” without providing further details.

AI also appears to be training on work generated by other AI, which degrades the quality of its output. Experts predict that within the next two years, 90% of online content will be generated by AI.



