With the introduction of the EU AI Act, companies training AI models will need to be careful about the datasets they use: certain copyrighted works may no longer be available for training AI models.
Introduction
There are three main aspects of generative AI applications that intersect with copyright protection: (1) machine learning using protected works (input side), (2) the possibility of protecting works created with the assistance of generative AI (the debate over copyright protection for AI-generated works), and (3) potential infringement of pre-existing works by AI-generated results (output side). This GT Alert focuses on the legal issues surrounding the input side.
On the input side, a key concern is whether publicly available copyrighted works may be used to train AI models operated by commercial companies.
Since AI is designed to mirror human intelligence, the input side of AI can be likened to a person reading a book or listening to music. The process of gaining knowledge and inspiration from reading a book or listening to music is seamless and automatic. No law prohibits a person from using that increased knowledge and inspiration for commercial purposes, as long as they do not copy, directly or indirectly, from the respective books or music.
Compared to the natural way humans acquire knowledge, the people who train AI models have more control over whether and how the AI learns from the data they feed it. Another difference between AI and human intelligence is AI's ability to build on seemingly unlimited amounts of data. As a result, AI has the potential to exponentially accelerate technological processes and innovation.
Consequently, a new question in AI regulation is the extent to which machine learning should be restricted in order to respect intellectual property rights.
Machine Learning Under the EU AI Act
The EU AI Act, approved by the EU Council on 21 May 2024, is a first attempt to answer this question. The EU AI Act includes provisions equating AI/machine learning with “text and data mining” (TDM) under the EU Text and Data Mining Directive.1 Therefore, “machine learning” is permitted, provided that:
- The person programming the machine learning function has lawful access to the content for the purpose of extracting text and data.
- The holders of copyright and related rights and/or database owners have not expressly reserved the extraction of text and data (the so-called opt-out mechanism).
The EU AI Act is due to enter into force in 2024 and to become fully applicable 24 months after that. However, because the TDM exception under the EU Text and Data Mining Directive already exists, that exception may already be applied to machine learning, in anticipation of how the EU AI Act will be interpreted.
Opt-out Mechanism
It is not yet clear what a legally valid opt-out would look like, but various organizations, including the Dutch collecting society Pictoright (photography) and the French collecting society Sacem (music), have drafted general reservation of rights statements that would allow creators to opt out of having their works used to train AI models. Additionally, many websites and social media images now feature similar opt-out statements.
There is no case law or other authoritative guidance on whether such statements are sufficient to satisfy the opt-out requirement, but the use of such statements is likely to increase now that the EU AI Act has been adopted.
Summary
Although the EU AI Act and its TDM exception have not yet been formally applied, AI system providers and developers should consider implementing measures and configurations to avoid infringement claims by rights holders. Here are four points to consider:
- Obtain lawful access to content: When reviewing web-scraped or pre-built datasets, consider checking whether the content used for machine learning purposes is subject to access restrictions, such as paywalls or other (technical) restrictions.
- Check for opt-out reservations: Consider verifying that rightsholders have not reserved the right to make copies for TDM purposes, for example by searching the websites of collective rights organizations to see which works are covered by an opt-out.
- Include necessary contractual protections: With regard to machine learning, two types of agreements in particular are relevant: (1) agreements with owners of datasets used to train your AI model, and (2) agreements with customers of your AI model. In either case, consider crafting agreements that provide a fair and balanced allocation of liability for inadvertent uses of opted-out copyrighted works.
- Put guardrails around AI models to prevent use for anything other than TDM: For copyrighted content that may be used for TDM purposes, consider implementing technical and organizational restrictions on the use of the content to ensure it is used only for training AI models.
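For teams that track provenance metadata for their training data, the first two checks above could be recorded and enforced along the following lines. This is only an illustrative sketch: the `WorkRecord` fields and the `triage` function are hypothetical names introduced here, and how lawful access or an opt-out reservation is actually established remains a legal assessment that the code does not perform.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class WorkRecord:
    """Hypothetical provenance metadata for one work in a candidate dataset."""
    source_url: str
    lawful_access: bool           # assumption: result of a prior access review
    tdm_reserved: Optional[bool]  # True = opt-out found, False = none found, None = not checked


def triage(
    records: List[WorkRecord],
) -> Tuple[List[WorkRecord], List[WorkRecord], List[WorkRecord]]:
    """Split records into (usable, excluded, needs_review)."""
    usable, excluded, needs_review = [], [], []
    for rec in records:
        if not rec.lawful_access or rec.tdm_reserved is True:
            # No lawful access, or the rightsholder has reserved TDM use.
            excluded.append(rec)
        elif rec.tdm_reserved is None:
            # Opt-out status unknown: hold back until someone has checked.
            needs_review.append(rec)
        else:
            usable.append(rec)
    return usable, excluded, needs_review
```

Treating an unchecked opt-out status as "needs review" rather than "usable" is a deliberately cautious design choice, given that it is not yet settled what a valid opt-out looks like.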
1 Article 52c(1) of the EU AI Act and Article 4 of the EU Text and Data Mining Directive.