Dangers of using large language models before they are baked

The Berryville Institute of Machine Learning (BIML) recently attended the Accelerate AI 2023 conference, hosted by Calypso AI, in Washington, DC. The conference drew practitioners from government and industry, along with regulators and academics. I participated in a very timely panel entitled “New Risks and Opportunities for LLMs Post-NATSEC”. That panel spurred this article.

The content of this column was not created by an LLM.

Large language models (LLMs) are a class of machine learning systems that have taken the world by storm. ChatGPT (aka GPT-3.5), an LLM created and operated by OpenAI, is one of the most popular of the many generative AI models for text. Other generative models include the image generators DALL-E, Midjourney, and Stable Diffusion, and the code generator Copilot.

LLMs and other generative tools offer incredible opportunities for applications, and the rush to bring such systems to fruition is going full steam ahead. LLMs can be a lot of fun, help solve hard problems in science, help with nasty problems in knowledge management, create interesting content, and, most importantly, bring the world a little closer to artificial general intelligence (AGI) and to a theory of representation that yields breakthroughs in large-scale cognitive science.

But using today’s early LLMs and generative AI tools comes with real risks.

About parrots and pollution

LLMs are trained on a large corpus of information (around 300 billion words, mostly collected from the internet) and use auto-association to predict the next word in a sentence. As such, most scientists agree that LLMs don’t really “understand” anything or do any real thinking. Rather, they are “stochastic parrots,” as some researchers call them. Even so, LLMs produce impressive streams of text in context, often in useful and very surprising ways.
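To make "predict the next word" concrete, here is a minimal sketch of the idea at toy scale. It is not how a real LLM is built: it assumes a simple word-pair counter instead of a neural network, and the corpus and the function names train_bigram_model and predict_next are made up for illustration.

```python
from collections import Counter, defaultdict

def train_bigram_model(text):
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        followers[current_word][next_word] += 1
    return followers

def predict_next(followers, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = followers.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Hypothetical toy corpus, for illustration only.
corpus = "the parrot repeats the phrase and the parrot repeats the sound"
model = train_bigram_model(corpus)

print(predict_next(model, "the"))     # the follower seen most often after "the"
print(predict_next(model, "parrot"))  # the follower seen most often after "parrot"
```

The predictor has no notion of what a parrot is. It only repeats the associations it has counted, which is the parrot point in miniature.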

Feedback loops are an important class of problem in all kinds of ML models (BIML identified this risk a few years ago as “[raw:8:looping]”). Here’s what we said then:

Model confounded by subtle feedback loops. What happens if the data output from a model is later used as input to the same model? Note that this is rumored to have happened with Google Translate. Hilarity ensued. To this day, Google restricts some translated search results through its own policies.

Imagine what happens when most of what is easily scraped from the internet is generated by AI models of questionable quality and these LLMs start eating their own tails. Talk about information pollution.
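The tail-eating effect can be sketched with a deliberately crude simulation. The assumptions are loud ones: the "model" here is nothing more than the word frequencies of its training corpus, "generating text" is sampling from those frequencies, and the corpus of 200 distinct "facts" and the function name retrain_on_own_output are hypothetical.

```python
import random

def retrain_on_own_output(corpus, generations, rng):
    """Each generation, the 'model' is just the empirical distribution of the
    current corpus, and its output becomes the next training corpus."""
    distinct_counts = [len(set(corpus))]
    for _ in range(generations):
        # Sampling with replacement stands in for "generate text, then scrape it".
        corpus = rng.choices(corpus, k=len(corpus))
        distinct_counts.append(len(set(corpus)))
    return distinct_counts

rng = random.Random(0)  # fixed seed so the run is repeatable
original_corpus = [f"fact_{i}" for i in range(200)]  # hypothetical: 200 distinct "facts"
diversity = retrain_on_own_output(original_corpus, generations=30, rng=rng)

# Number of distinct "facts" before and after 30 rounds of self-training.
print(diversity[0], diversity[-1])
```

Once an item drops out of the corpus it can never come back, so the count of distinct items can only fall from one generation to the next. Real LLM training is far more complicated, but the direction of travel is the worry.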

Some AI researchers and information security professionals are already deeply concerned about information pollution today, and an endless pipeline spewing even more of it cannot be good.

Mansplaining as a Service

LLMs are very good BS artists. These models confidently and regularly express false opinions or alternative facts, often fabricating information in conversation to justify their answers (including made-up scientific references). Imagine what it would be like to create new content using average knowledge (with lots of mistakes) gleaned from the internet.

The ultimate “reply guy” is not who we need writing our news stories in the future. Facts really matter.

Mirroring a broken society

LLMs are often trained on data that is biased in many ways. Sexist, racist, xenophobic, and misogynistic systems can and will be modeled from datasets first collected when society was less progressive. Bias is a serious problem with LLMs and may not be correctable with band-aid fixes.

Trusting deepfakes

Deepfakes are a side effect of generative AI and have been discussed for some time. Counterfeiting and impersonation have always been security risks, but the ability to create believable, high-quality fakes is now within easy reach. The possible consequences include market swings, outbreaks of war, and deepening cultural divides. Pay close attention to the source of that video.

Automate everything

Generative models like LLMs have the ability to replace many jobs, including low-level white-collar jobs. Millions of people derive great satisfaction from their jobs, but what if all those jobs were arbitrarily replaced by ML systems with low operating costs? You can already see it happening in content creation jobs for stories and marketing copy. Should LLMs write our articles? Should they practice law? What about your medical diagnosis?

Using tools for evil

Generative AI can create fun things. If you’ve never played with ChatGPT, give it a try. I haven’t enjoyed a new technology so much since I got my first Apple ][+ in 1981, or since Java applets came out in 1995.

But the power of generative AI can also be used for evil. For example, what about using ML for the task of designing a new virus with the transmissibility of COVID and the delayed onset of rabies? Or the task of designing a plant whose pollen is also a neurotoxin? If you can come up with ideas like these, what happens when the real bad guys start brainstorming with ML?

Tools have no intrinsic morals or ethics. Just to be clear, the idea of “protecting the prompt” is a pretend solution, similar to using command-line filtering to protect access to a supercomputer.

Keep information in a broken cup

Data moats now matter. That is because if an ML model (with the help of its human handlers) can get to your data and train on it to make a profit, it will. This leads to multiple risks that are best addressed by carefully securing datasets, rather than stuffing them into ML models and making them available to the public. In the gold rush world of ML, data is money.

Protecting data is harder than you might think. Consider that the government’s current information classification system cannot always protect sensitive information. Clearly, protecting confidential information is already a major challenge. So imagine what would happen if you deliberately trained your ML system on sensitive information and released it into the world.

Extraction and transfer attacks currently exist, so be careful what you pour into your ML cup.

What Should We Do About LLM Risk?

So should there be a magical moratorium on ML research for a few months, as some technologists have self-servingly suggested? No. What we must do is recognize these risks and face them directly.

Fortunately, an entire industry is being built around securing ML systems (I call it machine learning security, or MLsec). Check out what these hot new startups are building to control ML risk. But do so with a skeptical and informed eye.
