Keeping large language models on track

The heady, exciting early days of ChatGPT and other generative AI and large language models (LLMs) are starting to give way to an understanding that companies need a firm grasp on how these models are being used in their operations. Otherwise, privacy, security, legal, and other issues may arise down the road.

Given Microsoft’s aggressive push of OpenAI’s various products into its own portfolio of enterprise and consumer products and services, which is accelerating the pace at which Google and others are developing such technology, organizations need to get ahead of the problem or risk having to put out potentially dangerous fires.

They are now getting help to do this.

CalypsoAI, founded by people who worked for NASA, DARPA, the Pentagon, and others to develop tools that help businesses and governments use AI more safely in their environments, announced its Moderator product last week. It gives organizations control over the LLMs they bring into their environments.

Its purpose is to let businesses enjoy the benefits LLMs provide while managing risks such as data exfiltration, false information (so-called “hallucinations”), malware, spyware, and lack of auditability. Improperly managed generative AI tools can expose a company’s data or let bad actors in.

CalypsoAI CEO Neil Serebryany said in a statement:

Nvidia, a longtime proponent and enabler of AI in the enterprise, has released NeMo Guardrails, an open-source toolkit that essentially monitors conversations (or chats) between users and AI applications to ensure that they follow the rules developers create and do not stray into areas that could expose confidential information, open the way to cyberattacks, or violate company policy.

The Python-based dialog engine can be used to prevent an LLM from executing malicious code or calling external applications that pose security risks, and to verify that responses from the LLM are factual and based on trusted sources. It is also meant to help keep hackers from breaking into AI systems and feeding them false, inappropriate, or biased information.

“The concept of AI safety has obviously gotten a lot of attention lately,” Jonathan Cohen, vice president of applied research at Nvidia, said in a virtual briefing with press and analysts, citing hallucinations, toxic content generation, and misinformation. Guardrails, he said, can be used to steer conversations away from or toward certain types of responses. “The concept of security is becoming more and more important as large language models can connect to third-party APIs and applications. This can become a very attractive surface for cybersecurity threats.” Whenever a language model is allowed to actually perform some action in the world, Cohen said, the requests being sent to it and what it does in response need to be monitored, and Guardrails provides a place to implement all kinds of checks against different kinds of attacks and security threats.

Guardrails monitors both sides of the conversation, what the user says and what the AI app says in response, and gives developers a programmable way to implement all these different types of guardrails.

Enterprises are aware that developers may expose sensitive information when developing AI apps, and they are concerned that LLM-based services (such as those used by HR departments to answer employee benefits questions) may do the same during conversations with employees.

Nvidia’s guardrails (like those lining highways) are designed to keep chats out of dangerous areas. Guardrails is part of Nvidia’s NeMo framework, a cloud-native tool for building, customizing, and deploying generative AI models that can have billions of parameters. NeMo is itself part of the Nvidia AI platform.

Guardrails allows developers to create programmable rules for interactions between users and AI apps. It supports LangChain, a collection of toolkits containing templates and patterns that connect LLMs, APIs, and other software. Guardrails adds another layer of security and reliability. It sits between the user and the AI app, or between the user and LangChain (which in turn sits between the guardrails and the app).
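The idea of a rules layer sitting between the user and the AI app can be sketched in a few lines of plain Python. This is only an illustration of the placement described above, not the NeMo Guardrails API: `GuardedApp`, `block_keywords`, and `fake_hr_bot` are hypothetical names, and simple keyword matching stands in for whatever checks a real deployment would use.

```python
from typing import Callable, List, Optional

# A rule inspects one message and returns a refusal string to block it,
# or None to let it through. All names here are hypothetical.
Rule = Callable[[str], Optional[str]]


def block_keywords(keywords: List[str], refusal: str) -> Rule:
    """Build a rule that blocks any message containing one of the keywords."""
    def rule(message: str) -> Optional[str]:
        lowered = message.lower()
        if any(k in lowered for k in keywords):
            return refusal
        return None
    return rule


class GuardedApp:
    """Wraps an AI app and checks both sides of the conversation."""

    def __init__(self, app: Callable[[str], str],
                 input_rules: List[Rule], output_rules: List[Rule]) -> None:
        self.app = app
        self.input_rules = input_rules
        self.output_rules = output_rules

    def chat(self, user_message: str) -> str:
        # Check what the user says before the app ever sees it.
        for rule in self.input_rules:
            refusal = rule(user_message)
            if refusal is not None:
                return refusal
        reply = self.app(user_message)
        # Check what the app says before the user sees it.
        for rule in self.output_rules:
            refusal = rule(reply)
            if refusal is not None:
                return refusal
        return reply


def fake_hr_bot(message: str) -> str:
    """Stand-in for an LLM-backed HR app; it leaks a salary on purpose."""
    if "salary" in message.lower():
        return "Alice's salary is 100000."
    return "Benefits enrollment opens in November."


guarded = GuardedApp(
    app=fake_hr_bot,
    input_rules=[block_keywords(["password"], "I can't help with credentials.")],
    output_rules=[block_keywords(["salary is"], "I can't share personal pay data.")],
)
```

Calling `guarded.chat("Tell me Alice's salary")` passes the input rules, but the reply is caught by the output rule before it reaches the user, which is the point of watching both directions.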

Nvidia’s tool is built on Colang, a modeling language for conversational AI that comes with its own runtime and is designed to use natural language to define chatbot behavior. Developers create guardrails by defining flows in Colang files. Guardrails are made up of canonical forms (to determine the topic of conversation and match it to rules), messages (to categorize the user’s intentions), and flows (the sequences of messages and actions between the user and the AI app).
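To make the relationship between messages, canonical forms, and flows concrete, here is a toy Python sketch. It is not Colang syntax and not how the Colang runtime works internally; the labels, example sentences, and the word-overlap matching are all assumptions chosen for illustration.

```python
from typing import Dict, List

# Example user messages grouped under a canonical form (a short label for
# the user's intent). Labels and examples are made up for illustration.
MESSAGES: Dict[str, List[str]] = {
    "ask about benefits": ["what health plans do we have", "how many vacation days"],
    "ask about politics": ["who should i vote for", "what do you think of the election"],
}

# Flows: for each canonical form, the bot action that should follow.
FLOWS: Dict[str, str] = {
    "ask about benefits": "answer from the benefits handbook",
    "ask about politics": "decline and steer back to HR topics",
}


def canonical_form(user_message: str) -> str:
    """Pick the canonical form whose examples share the most words with the message.

    Word overlap is a crude stand-in for the semantic matching a real
    system would use.
    """
    words = set(user_message.lower().split())
    best_form, best_score = "unknown", 0
    for form, examples in MESSAGES.items():
        score = max(len(words & set(example.split())) for example in examples)
        if score > best_score:
            best_form, best_score = form, score
    return best_form


def next_action(user_message: str) -> str:
    """Look up the flow for the message's canonical form."""
    return FLOWS.get(canonical_form(user_message), "pass through to the LLM")
```

A message that matches no example falls through to the default action, so the rules only constrain the conversations they were written for.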

Guardrails fall into three categories. Topical guardrails aim to keep conversations on topic, safety guardrails protect against misinformation and inappropriate content, and security guardrails prevent LLMs from executing malicious code.

The toolkit itself uses LLMs.

“The simplest fact check is to ask another large language model,” Cohen said. “This language model produced this result based on this data. Is it factually accurate?” While very general language models exist, he said, there is also a lot of value in training language models on large amounts of data for very specific tasks, and the community has plenty of evidence that models fine-tuned on many examples actually perform much better. That is the concept here, he said: rather than using prompt engineering to force a general-purpose model to fact-check, stay on the rails, or avoid certain topics, a separate system can call a language model suited to the job.
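The “ask another model” check Cohen describes can be sketched as follows. This is a minimal illustration under stated assumptions, not Nvidia’s implementation: `fact_checked` and `build_check_prompt` are hypothetical helpers, and `toy_checker` is a stub standing in for a second LLM so the example runs without any model.

```python
from typing import Callable

# Type of any function that sends a prompt to a model and returns its text.
Model = Callable[[str], str]


def build_check_prompt(evidence: str, answer: str) -> str:
    """Ask a checker model whether the answer is supported by the evidence."""
    return (
        "Evidence:\n" + evidence + "\n\n"
        "Answer:\n" + answer + "\n\n"
        "Is the answer supported by the evidence? Reply yes or no."
    )


def fact_checked(answer: str, evidence: str, checker: Model) -> str:
    """Return the answer only if the checker model says it is supported."""
    verdict = checker(build_check_prompt(evidence, answer)).strip().lower()
    if verdict.startswith("yes"):
        return answer
    return "I'm not sure about that."


def toy_checker(prompt: str) -> str:
    """Stub for a second LLM: says yes if the answer text appears in the evidence."""
    evidence = prompt.split("Evidence:\n")[1].split("\n\nAnswer:\n")[0]
    answer = prompt.split("\n\nAnswer:\n")[1].split("\n\nIs the answer")[0]
    return "yes" if answer.lower() in evidence.lower() else "no"
```

Because the checker is passed in as a plain function, a smaller model trained specifically for checking could be swapped in without touching the rest of the code.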

Having specialized models is also more efficient. The Guardrails engine that monitors conversations is relatively cheap to run, but it needs to call an LLM to do the checking. In fact, Cohen said, a model designed specifically for checking could be less expensive to run than a general-purpose LLM.

In terms of hardware, the Colang runtime runs on the CPU and connects to LLMs as needed. The hardware that companies need for Guardrails will vary depending on the services they call, he said. Basically anything LangChain supports will work with Nvidia’s toolkit.

Cohen was asked why Nvidia would use a separate entity like Guardrails instead of simply building the various conversational constraints into LLM training. He acknowledged that such methods, known as “instruction following” or “human alignment,” can work.

“But when you actually use a language model, you really need multiple layers of responsibility,” he said. “The guardrail system sits on top of the language model and is just another check, where you can create a rule that says, ‘If someone says something insulting, do this.’ You can get a little bit more confidence by layering it in. While it’s important to train a language model to respond in a certain way, there is a lot of value in having a programmable system that allows you to explicitly write the rules you want, and you can change them dynamically.”

Guardrails is available now as part of NeMo, which is on GitHub. It is also available as part of Nvidia’s AI Foundations family of enterprise-grade generative AI cloud services.