Big Tech recently won a major victory in the debate over copyright and artificial intelligence.
Everything published online is effectively fair game, liable to be scraped, copied, and funneled into AI models and chatbots that compete with the creators of the original material.
This is the moment Google, Meta, OpenAI, Microsoft, Anthropic, and the other giants of the generative AI era have been waiting and hoping for. They are now much closer to legal certainty that they will never have to pay for the data essential to building a blockbuster AI product.
What does this mean for the future of the web and the content business? Read on (or wait an hour for an AI summary from your favorite chatbot).
The big news: a judge recently ruled that Anthropic's use of millions of books to train its AI models qualifies as fair use, a legal doctrine that permits the use of copyrighted content without the owner's permission in certain circumstances. Meta won a similar major lawsuit as well.
“Good news for all gen AI developers,” wrote Adam Eisgrau, senior director of the Chamber of Progress, a lobbying group funded by tech giants including Google, Amazon, Apple, and Nvidia. The Anthropic decision “is likely to apply in many cases,” he added.
A sudden drop in the value of the written word
An investment banker I spoke with recently summarized the impact of fair use in the age of generative AI.
He's right. If copyrighted content can be scooped up for free and regurgitated in slightly different form within milliseconds, the value of online text, even exclusive “frontier content,” will plummet.
The US Copyright Office is currently the only prominent voice on the other side of this debate. It concluded that using copyrighted content to train AI may exceed fair use, because generative AI floods the web with piles of additional words, images, and videos, and that extra supply will undermine the market for the original content. The judges seem to have ignored this argument so far.
One of my former editors gave me this advice whenever I wanted to write about such issues: nobody really cares about the media. Some people may claim to at dinner parties, but they don't, not really. The industry is tiny compared to the rest of the economy. Write about something bigger, this editor would say.
One example: Meta holds approximately $80 billion in cash and marketable securities, nearly ten times the total market value of The New York Times. Meta will spend up to $72 billion on capital expenditures this year, primarily on AI data center infrastructure. Mark Zuckerberg has even offered a $100 million compensation package to hire a single AI expert.
Still, Meta does not pay a dime for the content used to train its AI models, nor for the use of that copyrighted content in generative AI output. The same goes for Google and most of the other AI giants.
Why can't machines do the same?
Shortly after ChatGPT launched in 2022, when I first realized that AI models were being trained on mountains of copyrighted material without payment or permission, I raised the question with a Big Tech contact, who answered with this argument: humans learn by consuming copyrighted content from the web, books, and other sources. They internalize that information, process it, and often create new ideas and content based on the originals they have read. Why can't machines do the same?
The answer was delivered with such speed and calm, without a pause to reflect, that it was as if Big Tech had been preparing for this moment for years: the moment when everyone realizes their work is being used by AI models and chatbots that will ultimately compete with them.
You can hear an overtone of this in the Google research paper that launched the generative AI boom. “Attention Is All You Need” introduced “transformers” to the world: a type of AI model that ingests mountains of content and data to train powerful generative models.
Why did the Googlers who wrote that paper land on the name “transformer”? I don't know, but the term addresses the fair use question head on. One test of whether a use violates copyright law is whether the original work has been sufficiently “transformed” to avoid infringement. Google coined the transformer name in 2017, five years before ChatGPT brought this new technology, and this copyright question, to the world.
Tech blogger Ben Thompson has a cool-headed and knowledgeable view of all of this. He strongly supports the judge's decision in the Anthropic case, agreeing that training AI on books qualifies as fair use and calling the ruling “very important.” In his view, AI learning, like human learning, is transformative and does not infringe copyright as long as the output does not replicate the original material. Copyright law has always involved trade-offs, encouraging creation without restricting innovation, and fair use exists to balance those interests.
Warning from the grave
So, what follows from the fact that online copyrighted content is essentially fair game for AI companies to use for free?
Here is one prediction. It comes from the grave, and from deep inside OpenAI, the company behind ChatGPT.
Suchir Balaji was part of the OpenAI team that collected data from the internet for AI model training. He joined the startup hopeful about how AI could help society, but grew disillusioned. In November, Balaji was found dead in his San Francisco apartment. The city's chief medical examiner ruled the death a suicide.
Before his death, Balaji published an essay on his personal website criticizing AI companies for using public data without compensation and questioning their claims of “fair use.” He argued that the trend threatens the sustainability of the internet by draining value from the original content sources.
Balaji cited a study that found traffic to the coding Q&A site Stack Overflow fell by about 12% after the release of ChatGPT. Developers who once visited the site to ask and answer questions turned to AI instead, reducing new sign-ups and community engagement.
This weakens the web's “grand bargain.” Google and other tech giants crawled websites and collected their data without paying, but in return they sent traffic and visitors to those sites' creators, who made money through advertising, subscriptions, product sales, and other means. Now Big Tech's AI bots crawl for free while sending far less traffic back to the creators of the original copyrighted content.
Cloudflare, which runs one of the largest networks on the web, deployed a potential solution on Tuesday. The company launched a “Pay Per Crawl” service that lets content creators require payment from AI companies that want to access and use their content.
Cloudflare now blocks AI crawlers by default for new customers, making content access opt-in instead of opt-out. Major publishers including Ziff Davis, The Atlantic, and Time have signed on. The goal is to force big tech companies to pay for the new digital content they ingest for AI development. A startup called TollBit is attempting something similar.
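For context on what opting out looks like without a network like Cloudflare enforcing it, the traditional mechanism is a site's robots.txt file, which asks crawlers by name to stay away; compliance is voluntary, which is precisely the gap pay-per-crawl tries to close. Below is a minimal illustrative sketch (the user agent names GPTBot and CCBot are real, widely documented AI crawlers, but any site's actual policy will differ):

```text
# robots.txt — illustrative example of opting out of AI training crawls
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

# Other crawlers (e.g., search engines) remain allowed
User-agent: *
Allow: /
```

A directive like this only works if the crawler chooses to honor it, which is why network-level blocking and payment schemes are being layered on top.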
I don't know whether these efforts will succeed. And there is a core logic to the counterargument: humans are allowed to learn from copyrighted information for free, so perhaps machines should be too. Reversing that could create more problems than it solves. As a journalist, should I be barred from reading Ben Thompson's newsletter and incorporating one of his ideas into a future article? Should Thompson be banned from reading a Business Insider scoop and analyzing that new information in one of his excellent newsletters? Probably not.
Some predictions
One consequence of all this is that truly valuable information may no longer be published on the open web. Here are three examples suggesting this may already be happening:
- Ben Thompson distributes his content via paid newsletter rather than relying on free web distribution.
- Bloomberg operates perhaps the largest newsroom in the West. Why? One reason is that its news division is nested inside a highly profitable financial data business. Another is that Bloomberg's most valuable news content appears in the Terminal, a trading tool used by wealthy investors, which runs on its own network and doesn't depend much on the web. Bloomberg publishes its best news content on the web only after a long delay, and much of it stays in the Terminal and never reaches the web at all. No surprise, then, that the financial data business can fund such a large newsroom.
- Finally, valuable content may increasingly be shared only through personal meetings, relationships, or even print publications, keeping the data out of immediate reach of the bots that feed Big Tech's AI.
A few weeks ago, Microsoft launched a new publication called Signal, which explores the future of AI, society, and business.
It is published only in paper form.

