Cohere co-founder Aidan Gomez will be attending the Collision Conference in Toronto on June 18th. Chris Young/The Canadian Press
Artificial intelligence (AI) companies Cohere and Google have told the federal government they support a legal exemption that would allow them to build commercial AI models using data without being forced to compensate rights holders or get permission, warning that such a requirement would stifle the development of Canada's AI industry.
The AI models that power chatbots, like OpenAI's ChatGPT, are trained on huge amounts of diverse data so that they can produce coherent text. Companies pay for some proprietary data, but they also use large amounts of material gleaned from the internet, including works produced by authors and media organizations.
Innovation, Science and Economic Development Canada, in collaboration with Canadian Heritage, launched a public consultation last fall to solicit input on possible changes to copyright law in response to the rapid development of generative AI systems that produce text, images, audio and video.
One key question is whether AI companies will have to license copyrighted material when training models for commercial purposes, or whether there will be an explicit exemption in the law. An exemption called fair dealing already exists for the use of IP-protected material in research and education.
The rapid growth of generative AI and legal uncertainty over copyright have led to a number of high-profile lawsuits by authors, artists, and media organizations against technology companies such as OpenAI, Meta Platforms Inc., and Stability AI.
AI developers say they need access to data to build more powerful models, which they argue could generate huge benefits that can help improve work, life and society overall. But creative workers, who feel cheated when their work is used without permission or compensation, are concerned about the impact generative AI will have on their industries.
Cohere, which builds large language models that underpin chatbots and other applications, said in its filing, which was recently posted online, that its AI training does not infringe copyright and therefore does not require a license, and that “compensation is not appropriate.”
The Toronto-based company claims its AI models learn concepts and facts by identifying patterns in large amounts of data. “These concepts, facts and patterns are not copyrightable, and therefore copyright law should not be interpreted as preventing the training of AI,” it said.
Microsoft (MSFT-Q), which is investing billions of dollars in OpenAI, said Canada's copyright law should also apply to generative AI, and that a commercial exemption would further encourage domestic AI development and investment. “Learning from copyrighted works is not copyright infringement, and using AI to read, write, or learn should not require compensation,” the company wrote in its filing.
Other groups, such as the Association of Canadian Publishers, oppose the exemption and instead support licensing agreements that pay creators for the use of their material in training AI. The exemption “deprives rights holders of a substantial source of revenue and potential income,” the association said, adding that a licensing market for generative AI models is already developing.
But Cohere argued that if such a requirement were imposed, rights holders would not be able to benefit from the new revenue stream. AI development would take place outside of Canada, “and it could mean that AI systems will no longer be available in Canada, including systems that are essential to advance healthcare, address the climate crisis and close Canada's productivity gap,” Cohere said.
Google similarly argued in its filing that seeking licenses or permissions would be “fundamentally impossible given the vast amount of data required to train AI models and the lack of comprehensive data on copyright ownership.” “It would effectively hinder the development and use of large-scale language models and other cutting-edge AI,” the company said. Google added that it has introduced tools that allow web publishers to opt out of having their content used to train future AI models.
All three AI companies pointed to Japan as an example that Canada should emulate, as the country has amended its copyright laws to allow training AI on copyrighted material for both commercial and non-commercial purposes.
Groups representing artists and other creators take a starkly different view: “Governments should not create new exemptions to copyright and other intellectual property rights that would allow AI developers to exploit creative works without permission or compensation,” Music Canada, which represents the Canadian subsidiaries of Sony Music, Universal Music and Warner Music, said in a filing.
The industry group warned lawmakers to be wary of the language AI developers use to describe how their systems work, which Music Canada says is an attempt to treat generative AI as already exempt from copyright law. “They may use words like 'learning,' 'transferring,' 'memorizing,' and 'simulating,' instead of words to describe the 'copying' and 'replicating' actions that the systems take to learn,” Music Canada wrote.
The Canadian Civil Liberties Association has similarly advocated for a licensing system that respects copyright holders: “Just because these models require large amounts of data does not mean that this data should be mined with little or no consideration for the creators powering that work,” the association said in a filing.
Many AI developers, including Cohere, have offered to indemnify customers who are sued for copyright infringement. OpenAI, which has been the target of several copyright lawsuits, has been busy signing deals with a number of media organizations and publishers, including News Corp., the Financial Times and the Associated Press, to use their content as training data.
Even these arrangements have come under criticism: AI companies are “pursuing business deals to avoid charges of theft,” Jessica Lessin, founder of tech news site The Information, wrote in The Atlantic last month. “It's still premature to partner with companies that have trained their models on professional content without permission.”
Complicating matters for rights holders is the lack of transparency about what material is actually contained within the vast training datasets, making it extremely difficult to verify whether their creations are being used.
In their filings with the Canadian government, Cohere, Google (GOOGL-Q) and Microsoft opposed being forced to document and disclose copyrighted material in their training data, saying it would be unfeasible. “It is virtually impossible to identify the copyright status of individual works contained in billions or trillions of datasets,” Cohere wrote.
A spokeswoman for Innovation Minister François-Philippe Champagne declined to comment on whether the minister would support amending copyright law to allow commercial exemptions.
“As this marketplace continues to evolve, we are committed to fostering a framework that supports creativity and innovation while protecting intellectual property rights,” spokeswoman Audrey Millett said.