The Media Industry’s Race To License Content For AI

Many lawsuits that copyright owners have brought against generative AI platforms such as OpenAI are grinding their way through the courts. Meanwhile, in advance of any court decisions, copyright owners have been operating on the assumption that the AI companies will need to take licenses to content to use it for training their models or in their generative outputs. The Copyright Clearance Center’s announcement on Tuesday that it is including some AI rights in its Annual Copyright License for corporations is just the latest in a growing number of initiatives toward this goal.

AI content licensing initiatives abound. More and more media companies have reached license agreements with AI companies individually. Several startups have formed to aggregate content into large collections that AI platforms can license in one-stop shopping arrangements, known in the jargon as blanket licenses. There are now so many such startups that last month a group of them formed a trade association, the Dataset Providers Alliance, to advocate for their interests.

Ironically, the growing volume of all this activity could jeopardize its value for copyright owners and AI platforms alike.

It will take years before the panoply of lawsuits yields any degree of clarity in the legal rules for copyright in the AI age; we’re in the second year of what is typically a decade-long process for copyright law to adapt to disruptive technologies. One reason for copyright owners to organize now to provide licenses for AI is that, as we’ve learned from analogous situations in the past, one factor both courts and Congress will consider in determining whether licensing is required is how easy it is for the AI companies to license content properly.

The essential problem is that generative AI companies want to train their platforms on “all” content of a given type (text, images, music, video); thus the licensing scenario is akin to that of subscription music services like Spotify or e-book services like Everand (known until recently as Scribd)—both areas where the need to take licenses has been well established. One reason for Spotify’s huge success (239 million subscribers) is that it has gotten licenses to the vast majority of recorded music, while one reason for Everand’s lesser success (1.8 million subscribers) is its limited selection of major-publisher book titles.

Some AI companies have expressed willingness to take licenses if they’re reasonable in cost and easy enough to administer. What they don’t want is to have to take licenses separately from every copyright owner or pay royalties according to hundreds or thousands of different schemes, as services like Spotify and Everand do today. If licensing becomes too fragmented and complicated, some AI companies may decide to stick with the training methods many of them have relied on thus far, which use content obtained without permission.

Yet the current flurry of AI licensing activity risks producing exactly that outcome. By far the most active area for individual deals right now, judging from publicly known deals, is news and journalism. Over the past year, organizations including Vox Media (parent of New York magazine, The Verge, and Eater), News Corp (The Wall Street Journal, New York Post, and The Times of London), Dotdash Meredith (People, Entertainment Weekly, InStyle), Time, The Atlantic, Financial Times, and European giants such as Le Monde of France, Axel Springer of Germany, and Prisa Media of Spain have each made licensing deals with OpenAI.

But such individual licensing activity doesn’t scale, and the confidential nature of these deals won’t help the industry converge on common licensing models or terms. They may not help convince judges or lawmakers that licensing for AI use is straightforward, and they tend not to represent smaller copyright owners. These deals also usually encompass terms that go well beyond copyright, such as joint product development plans and referral of traffic from AI-generated online content or AI-powered search to copyright owners’ own online properties.

To remedy this fragmentation, there’s a growing list of startups that are attempting to aggregate content into libraries that are licensable for AI use, most focusing on specific types of content. One of these is Calliope Networks, whose CEO Dave Davis comes from the world of Hollywood video licensing; Calliope has aggregated thousands of hours of film and TV content from around the world, including from sources in Latin America and Africa. vAIsual (pronounced “v-eye-sual”) has a library of over 300 million images and videos. GCX (Global Copyright Exchange) licenses a music library that contains over 4.4 million hours of audio, largely from individual musicians.

The latest entrant is Created By Humans, launched last month by Trip Adler, the founder and former CEO of Scribd. Created By Humans has raised $5 million in funding and is initially targeting book content. That’s not to be confused with Human Native AI, a UK-based startup that has raised over $3 million and has not identified a particular type of content as its focus.

In addition to these startups, copyright owners’ trade associations are trying to help organize licensing activities. For example, the News/Media Alliance, the trade association for journalism publishers, has been actively promoting licensing, including voluntary (opt-in) collective licensing for its members. In this world, there is a difference between archive content and fast-breaking news content. Much of the former is available online in some form, but some is not. AI engines, particularly those that use retrieval-augmented generation (RAG) techniques (in which the system retrieves relevant documents when answering a query and typically cites them as sources in its output), may want efficient and assured access to fast-breaking news content to ensure the timeliness and accuracy of their output; such access would be available through authorized arrangements with publishers.

There are decades of precedent for organizing large bodies of content for licensing with new technologies when the law hasn’t caught up with those uses yet. The most notable of these situations came about in the 1970s, when a new disruptive technology for rapid dissemination of content had come into wide use: the photocopier. More and more companies were installing photocopying machines in their offices, and employees would use them to make copies of copyrighted material such as articles in magazines and journals without paying for additional subscriptions.

When Congress revamped U.S. copyright law in 1976, it didn’t make any special provisions in the new law that covered large-scale photocopying. But in the discussions that led up to the new law, Congress did recommend that publishers “work out means by which permissions for uses beyond fair use can be obtained easily, quickly, and at reasonable fees.” As a result, publishers created a collective licensing organization, the Copyright Clearance Center (CCC), to offer blanket licenses to a large body of content, principally from scientific and technical journals.

CCC launched in 1978, when the new copyright law took full effect. It offered its blanket photocopying license to companies. A few took the license, while others claimed that photocopying was fair use and declined. One such company was Texaco. So in 1985, a group of publishers got together and brought a test case against the petroleum giant. The case, American Geophysical Union v. Texaco, ended up in federal appeals court, and in 1995, the publishers prevailed. The court’s opinion cited the existence of CCC’s blanket license as a factor in its determination that Texaco employees’ photocopying wasn’t fair use. As a result, many more companies opted to take the license, now known as the Annual Copyright License; today its licensees include the vast majority of the Fortune 100.

CCC’s announcement this week states that it has extended the Annual Copyright License for internal AI use within companies that take the license. This required CCC to coordinate among the thousands of publishers whose content is included in the Annual Copyright License. Currently a critical mass of such content is covered in the license, with more to come in the future.

CCC’s license is still limited to licensees’ internal use: for example, a researcher at a pharmaceutical company might use an internal AI-based tool that summarizes a body of articles from scientific journals; the license would cover such use for the content included in the license. This is a direct extension of the types of uses that CCC’s license has traditionally covered.

These histories (and others, such as the history of mechanical licensing of music for streaming services) inform the activity around AI and copyright licensing today. Copyright owners know that the existence of simple and reasonably priced collective licenses can help convince courts and Congress to favor copyright protection for the use of material in the training and output of generative AI platforms. That’s one reason why they are pushing for such licenses now instead of waiting for courts to decide whether AI platforms must take licenses to material they use for training or in output.

But there are three potential problems. The first is that so many individual license deals already exist: the ones mentioned here, others that have been announced, and still others that are not publicly known. Content companies have various reasons to push AI companies for licenses in advance of court decisions; for example, Nick Thompson, CEO of The Atlantic, said on The Verge’s Decoder podcast recently that The Atlantic’s deal with OpenAI “provides a way for us to help shape the future of AI … We believe that the odds of [AI] being good for journalism and the kind of work we do at The Atlantic are higher if we participate in it.”

Yet certain aspects of these deals may become obstacles to creating collective licensing schemes that are as comprehensive, efficient, and equitable as they need to be. News/Media Alliance CEO Danielle Coffey says that “while we are encouraged by business arrangements we’ve seen, if not for the very reason that they show [licensing] is possible and that there’s a market in general, one thing to consider is whether current deals reflect fair market value with uncertainty looming.”

The second problem is the proliferation of collective licensing startups, all of them for-profit companies that presumably prioritize growth over cooperation. Having dozens of purported collective licensing entities could end up being little better than hundreds or thousands of individual deals. Still, CCC CEO Tracey Armstrong says that CCC’s attitude now is to “let all flowers bloom” and see how the market develops in the coming years. Shakeout and consolidation are inevitable in any hot startup scene.

One possibility is that the independent collective licensing entities end up representing small or individual content owners for specific types of content while the majors in each sector make direct deals. That’s how licensing works now in certain segments of the music industry.

The third problem is that antitrust laws limit the amount of cooperation that can take place among licensors, whether individual companies or collectives. For example, ASCAP and BMI, the two major music performing rights licensing agencies, have had to operate under Justice Department consent decrees, in place since 1941, in what amounts to a duopoly in that market, an arrangement that has been challenged repeatedly over the years.

Previous successful voluntary collective licensing entities managed to avoid these problems. When periodical publishers were looking to form CCC, the good news was that there were no (or perhaps hardly any) existing licensing deals between corporations and periodical publishers that got in the way. The bad news was that CCC’s founders had to get a critical mass of publishers to agree to a simple set of terms; without that critical mass, the blanket license would not cover enough material to be worth taking. This proved to be an onerous cat-herding exercise. Yet CCC managed it, and in a way that the Justice Department’s Antitrust Division found acceptable.

The amount of licensing activity for AI that is taking place before any substantial results from the courts or Congress is unprecedented. It dwarfs, for example, the amount of such activity that took place in the music industry in the MP3 era between the Napster litigation in 1999–2001 and the establishment of popular licensed music services such as Apple iTunes.

One reason for that is that it’s much easier technologically to create online licensing hubs now than it was 20 years ago. But another is that there’s more awareness now of the need to establish a market for licensed content for a new technology use than there was then. Developments in these licensing markets are likely to come at a faster pace than court decisions or legislation; the next couple of years are going to be very interesting.
