
Tracey Mabry, general manager of risk, research and AI propositions at Dow Jones
In uncertain times, trusted news and journalism are valuable, as readers rely on quality content to inform their daily decisions in business, finance and life — and generative AI may be the biggest uncertainty we've seen since the dawn of the internet more than 35 years ago.
The UK lags behind other markets, such as the EU, which have already passed AI laws, so the new government will need to introduce new rules — a process that could take years, if the time it took to pass the Online Safety Act is any indication.
Meanwhile, the responsibility lies with all industry participants to define an approach that puts journalistic integrity first and ensures that intellectual property rights are protected.
This means that all stakeholders – journalists, publishers, aggregators and governments – need to work together today to build a healthy media ecosystem for tomorrow. Dow Jones believes this both as a news publisher and as an aggregator for publishers around the world through our Factiva platform.
Start with transparency
Two years in, GenAI's potential to be a force for good is clearer. But concerns remain, including misinformation and hallucinations caused by opaque sourcing and a lack of traceability. With so much at stake — these tools inform important business decisions — we believe transparency is essential to integrating news data into AI models.
To reap the benefits that GenAI technology can bring, we need to be able to trace the origin of information down to the sentence level.
This level of traceability is particularly important for AI systems used in highly regulated environments, such as financial services and risk management, where decisions may have legal effects.
Dow Jones has established a clear audit trail for the content and summaries that are generated. The first step in this process is to ensure that the underlying data is properly tagged. Our Factiva archive adds 600,000 articles every day. Each of these articles is tagged with over 3,000 different identifiers, including company names, personal names, content subjects, and keywords. The resulting metadata allows us to pinpoint exactly when, where, and how often each piece of content appears in AI query results and summaries. This is critical because it lets publishers know how their content is being used in searches and queries and who is looking at it.
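The tagging and audit trail described above can be sketched in code. This is a minimal, hypothetical illustration — the class and function names are invented for this sketch and do not reflect any real Factiva API — showing how identifier tags attached to an article allow each appearance in an AI query result to be logged and counted.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class TaggedArticle:
    """An article carrying identifier tags: companies, people, subjects, keywords."""
    article_id: str
    publisher: str
    tags: dict = field(default_factory=dict)

@dataclass
class AuditEvent:
    """One audit-trail entry: which article surfaced, for which query, for whom."""
    article_id: str
    query: str
    consumer: str
    timestamp: str

audit_log: list[AuditEvent] = []

def record_usage(article: TaggedArticle, query: str, consumer: str) -> AuditEvent:
    """Append an audit entry each time an article is cited in a query result or summary."""
    event = AuditEvent(
        article_id=article.article_id,
        query=query,
        consumer=consumer,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )
    audit_log.append(event)
    return event

def usage_count(article_id: str) -> int:
    """How often a given article has appeared in query results so far."""
    return sum(1 for event in audit_log if event.article_id == article_id)

# Illustrative usage with made-up identifiers:
article = TaggedArticle(
    article_id="WSJ-2024-000123",
    publisher="The Wall Street Journal",
    tags={"company": ["Acme Corp"], "subject": ["Corporate Earnings"]},
)
record_usage(article, query="Acme Corp Q2 earnings", consumer="analyst-42")
print(usage_count("WSJ-2024-000123"))  # 1
```

In practice the audit log would live in a durable store rather than an in-memory list, but the principle is the same: because every article carries tags, every downstream appearance can be attributed back to a specific publisher and piece of content.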
Transparent tagging and auditing are also important when considering the usage rights and restrictions that publishers place on their content – for example, an organization may restrict access to content in certain jurisdictions or grant rights for internal use only.
These rights and restrictions cannot be enforced unless they follow the content all the way from the data repository to the final deliverable, such as part of a summary shown to an end user. That is why it is critical that publishers make tagging and auditing part of their agreements with AI platforms and with any content aggregation platform that may grant licenses on their behalf.
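A rights check of this kind can be sketched as a simple gate applied before content reaches a deliverable. The rights model and field names below are illustrative assumptions, not a real platform's schema; the point is that restrictions travel with the content and are evaluated at delivery time.

```python
# Hypothetical rights records keyed by article ID. The restrictions mirror
# the examples above: a jurisdiction limit and an internal-use-only grant.
ARTICLE_RIGHTS = {
    "WSJ-2024-000123": {
        "allowed_jurisdictions": {"US", "UK"},
        "internal_use_only": True,
    },
}

def may_deliver(article_id: str, jurisdiction: str, external: bool) -> bool:
    """Return True only if the article's restrictions permit this delivery."""
    rights = ARTICLE_RIGHTS.get(article_id)
    if rights is None:
        return False  # no licence on record: deny by default
    if jurisdiction not in rights["allowed_jurisdictions"]:
        return False  # outside the licensed jurisdictions
    if external and rights["internal_use_only"]:
        return False  # licence covers internal use only
    return True

print(may_deliver("WSJ-2024-000123", "UK", external=False))  # True
print(may_deliver("WSJ-2024-000123", "DE", external=False))  # False
```

Denying by default when no licence is on record reflects the argument here: without tagging and auditing end to end, a platform cannot demonstrate that a restriction was honoured.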
It also means that users, including those working in highly regulated sectors like risk management and financial services, can trace their AI summaries to accurate and legitimate sources. For publishers, it's easier to understand who their key audiences are and how they can best serve them. This transparency also lays the foundation for fair and appropriate content usage and remuneration.
A sustainable compensation model
Producing quality journalistic content takes time and money, and often involves great risk. It makes sense that these efforts should be rewarded in a sustainable way.
Previously, compensation was based on search result rankings or clicks. But in the age of GenAI, news and information are often blended with other content sources, changing the rules of engagement between publishers and platforms. Both sides now need to consider where content appears in query results and how much of it is used, from a single sentence to a full article.
One of the key principles we implemented in our proprietary model is that search results are not influenced by advertising. In traditional online search, it is easy for end users to tell which results are shown based on relevance and which are sponsored. With AI summarization, that distinction is much harder to make.
We developed the framework with legal and regulatory experts, which means that both our own intellectual property and that of publishers across the Factiva ecosystem are appropriately protected. As the field continues to develop, we will explore new ways to engage with publishers and other AI platforms. We will also continue to promote stronger contractual licensing terms, transparency, and compliance with changing regulations to ensure fair use and compensation.
As publishers know from past negotiations with online search platforms and regulators over adequate compensation for content, this can be a lengthy process. Australia passed legislation in 2021 requiring online search companies to adequately compensate publishers, followed by Canada and the EU, with the UK finally passing a similar law in May this year. When it comes to fair and consistent global rules on AI, we cannot wait that long. We also need to avoid a situation where small publishers feel forced into contracts that neither adequately protect their copyright nor guarantee appropriate use.
We are already working with many publishers in the Factiva ecosystem who want to leverage our collective bargaining power, and we will continue to evolve our content licensing and compensation framework to ensure everyone has access to the resources they need to protect creativity and news.
Compliance and Advocacy
As a publisher, we have a dual responsibility. In addition to our primary role of reporting the news and providing informative content, Dow Jones must also act as a compliance officer. It is our responsibility to proactively monitor and adapt to the ever-changing global regulatory landscape and ensure our practices are compliant with copyright laws and regulations, especially given the transformative impact that AI will have on content creation and distribution.
Together with the wider News Corp family, we will continue to advocate for clearer guardrails to protect the creative industries in this new era. As the UK's recent change of course on building transparency into AI regulation shows, a unified voice from industry peers, policymakers and international organisations is essential to drive change and establish consistent standards on AI and copyright infringement.
There are undoubtedly great benefits to using news data within AI systems around the world, but we need to ensure that this new technology strengthens the industry, not weakens it.
Tracey Mabry is general manager of risk, research and AI propositions at Dow Jones.