As the United States moves closer to regulating the creation and use of generative artificial intelligence (AI) technology, Congress should look to the current state, or lack thereof, of U.S. data privacy laws as an important guidepost for setting the rules of the road, particularly for AI developers who need to train algorithms on large amounts of data.
That is the point of a report issued May 23 by the Congressional Research Service (CRS), a nonpartisan public policy research agency serving lawmakers and their staff.
The report provides a primer on generative AI technologies such as OpenAI’s ChatGPT, then outlines policy and legislative issues that Congress may consider.
“Generative AI models are receiving significant attention and scrutiny due to their potential harms, including risks involving privacy, misinformation, copyright, and non-consensual sexual imagery,” the report states.
“This report focuses on privacy issues and relevant policy considerations for Congress,” it continues. “Some policymakers and stakeholders have raised privacy concerns about how personal data may be used in the development and deployment of generative models. These issues are not new or unique to generative AI, but the scale, scope, and capabilities of such technologies may pose new privacy challenges for Congress.”
On the data side, the report says that generative AI models, especially those built on large language models (LLMs), “need a lot of data.”
“For example, OpenAI’s ChatGPT was built on LLMs trained in part on over 45 terabytes of text data retrieved (or ‘scraped’) from the internet; the LLMs were also trained on Wikipedia entries and a corpus of digitized books,” the report said. “OpenAI’s GPT-3 model was trained on about 300 billion ‘tokens’ (or word fragments) collected from the web and has over 175 billion parameters — a parameter being a variable that influences the properties of the model.”
“Critics argue that such models rely on privacy-invasive methods to collect large amounts of data, usually without the consent or compensation of the original user, creator, or owner of the data,” the report said. “Furthermore, some models may be trained on sensitive data, potentially exposing personal information to users.”
“Academic and industry studies have found that some existing LLMs may reveal sensitive and personal information from their training datasets,” CRS said, adding that “some of these models are commercially available, either used for their intended purpose or embedded in other downstream applications.”
In the face of these data privacy concerns, CRS noted that “there is currently no comprehensive data privacy law in the United States.”
The report cites health-related data, data collected from minors, and various state data laws as affecting generative AI applications.
“In many cases, the collection of personal information implicates certain state privacy laws that give individuals a ‘right to know’ what data companies are collecting about them and how that data is used and shared; a ‘right to access and delete’ that data; or a right to opt out of data transfers and sales,” the report states. “However, some of these laws include exemptions for publicly available data, raising doubts about how, and whether, they apply to generative AI tools that use information collected from the internet.”
“In the absence of comprehensive federal data privacy laws, some individuals and entities have relied on other legal frameworks, such as copyright, defamation, and publicity rights,” the report said.
“Congress may consider enacting a comprehensive federal privacy law that specifically addresses concerns related to generative AI tools,” the CRS said, adding that lawmakers may also want to consider the European Union’s proposed AI law, which would impose data governance, disclosure, and documentation requirements on certain categories of AI systems.