French privacy watchdog the CNIL has published an action plan for artificial intelligence that indicates where it will be focusing its attention, including on generative AI technologies such as OpenAI’s ChatGPT, in the coming months.
A dedicated artificial intelligence service has been established within the CNIL, working to study the details of the technology and make recommendations on “privacy-friendly AI systems.”
A key stated goal for the regulator is to steer the development of AI that “respects personal data,” including by developing the means to audit and control AI systems so as to “protect people.”
Understanding how AI systems affect people is another major focus, alongside supporting innovative players in the local AI ecosystem who apply CNIL best practices.
“The CNIL wants to establish clear rules to protect the personal data of European citizens in order to contribute to the development of privacy-friendly AI systems,” it reads.
Hardly a day goes by without some high-profile call from technologists asking regulators to get to grips with AI. Just yesterday, while testifying before the U.S. Senate, OpenAI CEO Sam Altman called on lawmakers to regulate the technology, proposing a licensing and testing regime.
But European data protection regulators are already well advanced: Clearview AI, for example, has been repeatedly sanctioned across the region for misusing people’s data, while the AI chatbot Replika recently faced enforcement action in Italy.
OpenAI’s ChatGPT was itself the subject of a very public intervention by the Italian DPA at the end of March, which saw the company rush out new disclosures and controls for its users, giving them some say over how their information can be used.
At the same time, EU lawmakers are in the process of agreeing a risk-based framework for regulating applications of AI, which the bloc proposed back in April 2021.
This framework, the EU AI Act, could be adopted by the end of the year, and the incoming regulation is another reason the CNIL cites for preparing an AI action plan now, saying the work will also “prepare for the entry into application of the draft European AI regulation, which is currently under discussion.”
Existing data protection authorities (DPAs) are likely to play a role in enforcing the AI Act, so regulators building up AI understanding and expertise will be critical for the regime to work effectively. The themes and details EU DPAs choose to focus on will weigh heavily on the operational parameters of AI, certainly in Europe and, given how far ahead the bloc is in crafting digital rules, potentially much further afield too.
Data scraping in the frame
On generative AI, the French privacy regulator says it is paying particular attention to the practice, by certain makers of AI models, of scraping data off the internet to build training datasets for AI systems such as large language models (LLMs), which can parse natural language and respond to communications in a human-like way.
The regulator says a priority area for its AI service will be “the protection of freely accessible data on the web against the use of scraping of data for the design of tools.”
This is an uncomfortable area for makers of LLMs like ChatGPT, which have relied on quietly scraping vast amounts of web data and repurposing it as training material. Those that have hoovered up web information containing personal data face a specific legal challenge in Europe, where the General Data Protection Regulation (GDPR), in application since May 2018, requires them to have a legal basis for such processing.
While the GDPR provides a number of legal bases, the options available for technologies like ChatGPT are limited.
In the Italian DPA’s view, there are just two possibilities: consent or legitimate interests. And since OpenAI did not ask individual web users for their permission before ingesting their data, the company is now relying on a claim of legitimate interests for its processing in Italy, a claim that remains under investigation by the local regulator, the Garante. (Reminder: GDPR penalties can scale up to 4% of annual global turnover, in addition to any corrective orders.)
The pan-EU regulation contains further requirements for entities processing personal data, such as that the processing must be fair and transparent, so tools like ChatGPT face additional legal challenges if they are to avoid falling foul of the law.
Notably, the CNIL’s action plan singles out the “fairness and transparency of the data processing underlying the operation of [AI tools]” as a question of particular interest, one it says its artificial intelligence service and another internal unit, the CNIL Digital Innovation Laboratory, will prioritize scrutinizing in the coming months.
Other priority areas the CNIL flags for its AI scrutiny are:
- Protection of the data transmitted by users when they use these tools, from its collection (via interfaces) through to its possible reuse and processing by machine learning algorithms.
- The consequences for individuals’ rights over their data, both in relation to data collected for training models and data the systems may output, such as content created in the case of generative AI.
- Protection against bias and discrimination that may occur.
- The unprecedented security challenges these tools present.
During his testimony to a U.S. Senate committee yesterday, Altman was questioned by lawmakers about the company’s approach to protecting privacy, and the OpenAI CEO sought to frame the topic narrowly, as referring only to information actively provided by users of the AI chatbot. He pointed out, for example, that ChatGPT lets users specify that they don’t want their conversation history used as training data. (A feature it did not offer initially, however.)
Asked what specific steps it takes to protect privacy, Altman told the Senate committee: “So if you’re a business customer of ours and submit data to us, we don’t train on it at all… If you use ChatGPT, you can opt out of us training on your data. You can also delete your conversation history or your whole account.”
But he had nothing to say about the data used to train the model in the first place.
Altman’s narrow framing of what privacy means sidestepped the foundational question of the legality of the training data. Call it generative AI’s “original privacy sin,” if you will. But it’s clear that as European regulators press on with enforcing the region’s existing privacy laws against powerful AI systems, OpenAI and its data-scraping ilk will find the topic increasingly hard to ignore.
In OpenAI’s case, it will continue to face a patchwork of enforcement approaches across Europe because it has no established base in the region, which means the GDPR’s one-stop-shop mechanism (as typically applies for Big Tech) does not apply. Any DPA is therefore empowered to regulate if it believes local users’ data is being processed and their rights are at risk. That is why Italy went in hard on ChatGPT earlier this year, imposing a stop-processing order alongside its investigation of the tool, while France’s watchdog only announced an investigation in April, following complaints. (Spain has also said it is probing the technology, likewise without any additional action so far.)
Another difference between EU DPAs is emerging here too: the CNIL appears concerned to probe a broader set of issues than Italy’s provisional list, such as how the GDPR’s purpose-limitation principle should apply to large language models like ChatGPT. That suggests it could end up ordering a wider array of operational changes if it concludes the GDPR is being breached.
“The CNIL will soon submit to consultation a guide on the rules applicable to the sharing and reuse of data,” it wrote. “This work will include the issue of the reuse of freely accessible data on the internet, which is now used to train many AI models. This guide will therefore be relevant for some of the data processing necessary for the design of AI systems.”
“The CNIL will also continue its work on the design of AI systems and the construction of databases for machine learning, which will give rise to several publications starting in the summer of 2023, following the consultations it has already organized.”
The remaining topics the CNIL has flagged will be addressed “in phases” through future publications and AI guidance.
On the auditing and control of AI systems, the French regulator says its actions this year will focus on three areas: compliance with its existing position on the use of “enhanced” video surveillance, published in 2022; the use of AI to fight fraud (such as social insurance fraud); and the investigation of complaints.
It also confirmed it has already received complaints about the legal framework for the training and use of generative AI, which it said it is working to clarify.
“The CNIL has, in particular, received several complaints against the company OpenAI, which manages the ChatGPT service, and has opened an investigation procedure,” it added, pointing to the dedicated working group recently set up within the European Data Protection Board to coordinate how the various European authorities approach regulating the AI chatbot (and to produce what the CNIL bills as a “harmonized analysis of the data processing implemented by the OpenAI tool”).
And in a further word of warning for makers of AI systems who never asked people for permission to use their data, and who may be hoping for forgiveness after the fact, the CNIL said it will pay particular attention to the intended uses when entities process personal data to develop, train or operate AI systems.
As for supporting innovative AI players who want to comply with European rules (and values), the CNIL has run a regulatory sandbox for several years, and it is encouraging AI companies and researchers who are working on developing AI systems that comply with personal data protection rules to get in touch (via ia@cnil.fr).
