Rockets: A good analogy for AI language models

What do rockets have to do with large language models?

By now, everyone has seen ChatGPT and experienced its power. Unfortunately, many have also run into its drawbacks, such as hallucinations and other unpleasant hiccups. The core technology behind it is very powerful, but to control a large language model (LLM) properly, you need to surround it with a collection of smaller models and integrations.

As a rocket geek and aerospace graduate, I think rockets are a good analogy here. Everyone has watched a rocket lift off and been impressed by its main engine. What many people don't realize is that many rockets also carry small engines mounted on their sides, called vernier thrusters.

These thrusters may seem like a minor addition, but they provide the rocket with much-needed stability and maneuverability. Without them, the rocket could not follow a tightly controlled trajectory. In fact, with only a large main engine and no way to steer it, the rocket would almost certainly crash.

The same applies to large language models.

The power to combine models

For years, AI practitioners have developed task-specific machine learning models and chained them together to perform complex language tasks. At Moveworks, for example, we leverage several machine learning models, each performing a distinct task, from language detection, spelling correction, and named entity extraction to key entity identification and statistical grammar models, to understand what users are looking for. This system is very powerful and works very well.

First, it is fast and computationally cheap. More importantly, it is easy to control. When several different models work together on a task, you can observe which parts of the stack fail or underperform, and that visibility gives you levers to tune the system and influence its behavior. However, it is a complex system.
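To make the observability point concrete, here is a minimal sketch of such a chain, assuming three toy stages. The functions `detect_language`, `correct_spelling`, and `extract_entities`, and their word lists, are hypothetical stand-ins for trained models, not Moveworks components. Each stage writes its output to a trace, so when the final result is wrong you can see which link in the chain produced the error.

```python
from dataclasses import dataclass, field


@dataclass
class Trace:
    """Records each stage's output so a failure can be traced to one stage."""
    steps: list = field(default_factory=list)

    def log(self, stage: str, output) -> None:
        self.steps.append((stage, output))


# Hypothetical stand-ins for task-specific models. A real system would call
# trained classifiers here; these toy rules only illustrate the chaining.
def detect_language(text: str) -> str:
    return "en" if all(ord(c) < 128 for c in text) else "unknown"


SPELLING_FIXES = {"pasword": "password", "reest": "reset"}  # hypothetical


def correct_spelling(text: str) -> str:
    return " ".join(SPELLING_FIXES.get(word, word) for word in text.split())


KNOWN_ENTITIES = {"password", "vpn", "laptop"}  # hypothetical


def extract_entities(text: str) -> list:
    return [word for word in text.lower().split() if word in KNOWN_ENTITIES]


def run_pipeline(query: str):
    """Run every stage in order, logging each intermediate result."""
    trace = Trace()
    language = detect_language(query)
    trace.log("language", language)
    corrected = correct_spelling(query)
    trace.log("spelling", corrected)
    entities = extract_entities(corrected)
    trace.log("entities", entities)
    result = {"language": language, "query": corrected, "entities": entities}
    return result, trace


if __name__ == "__main__":
    result, trace = run_pipeline("reest my pasword")
    for stage, output in trace.steps:
        print(f"{stage}: {output}")
```

If `entities` came back empty, the trace would show whether the spelling stage or the entity stage was at fault; that is the kind of per-component control a single end-to-end model does not expose.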

Then came large language models like OpenAI's GPT-4.

Introducing GPT-4: A Game Changer

GPT-4 can be controlled through the prompts you give the model.

This means you can pass the model a user's query and ask it to perform various tasks on that query. To do this programmatically, tools like LangChain let you build applications around the model. So, in essence, you have one model to rule them all.

Not so fast.

LLMs like GPT-4 currently lack controllability. There is no guarantee, and no way to predict, that the model will fill the slots correctly.
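Because slot filling is not guaranteed, one common safeguard is to treat the LLM's reply as untrusted input and validate it before acting on it. The sketch below assumes the model was prompted to answer in JSON; the slot names in `REQUIRED_SLOTS` and the intents in `ALLOWED_INTENTS` are hypothetical, chosen only for illustration.

```python
import json
from typing import List, Optional, Tuple

# Hypothetical slot schema for an employee-lookup request; the names and the
# allowed intents are illustrative assumptions, not part of any real product.
REQUIRED_SLOTS = {"intent": str, "employee_name": str}
ALLOWED_INTENTS = {"lookup_phone", "lookup_room"}


def validate_slots(raw_llm_output: str) -> Tuple[Optional[dict], List[str]]:
    """Parse an LLM's JSON reply and report every problem instead of trusting it."""
    try:
        slots = json.loads(raw_llm_output)
    except json.JSONDecodeError as exc:
        return None, [f"not valid JSON: {exc.msg}"]
    if not isinstance(slots, dict):
        return None, ["expected a JSON object"]

    errors = []
    for name, expected_type in REQUIRED_SLOTS.items():
        if name not in slots:
            errors.append(f"missing slot: {name}")
        elif not isinstance(slots[name], expected_type):
            errors.append(f"wrong type for slot: {name}")

    intent = slots.get("intent")
    if isinstance(intent, str) and intent not in ALLOWED_INTENTS:
        errors.append(f"unknown intent: {intent!r}")

    # Only hand back slots that passed every check.
    return (None if errors else slots), errors
```

A caller would retry, fall back to a smaller model, or ask the user to clarify whenever the error list is non-empty, rather than passing a malformed or invented slot downstream.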

Does the model understand your company-specific terminology well enough to be trusted? Does it know when it is hallucinating? Can it be relied on not to share sensitive information with people who shouldn't see it? In all three cases, the answer is no.

At their core, language models are designed to be creative engines. They are trained on large datasets from the internet, which means that, used off the shelf, they are limited by the data they were trained on. When prompted about things they were not trained on, they hallucinate, taking creative liberties to produce an answer that fits the prompt.

For example, let's say you want to find the phone number of someone in your organization. If you ask ChatGPT for the phone number of Larry the Accountant, chances are it will spit out a convincing 10-digit number. However, if the model was never trained on that information, it cannot possibly give an accurate response.

The same goes for organization-specific language. Conference room names are a great example. Let's say your Toronto office has a conference room named Elvis Presley, and you don't know where it is. If you ask ChatGPT where Elvis Presley is, instead of showing you a map of your Toronto office, it might tell you he's six feet under.

Furthermore, depending on the size of the prompt, GPT-4 calls can be expensive and have much higher latency. As a result, they can be cost-prohibitive if used without care.

Controlling the power of LLMs

Like rockets, LLM-based systems have a main engine: a GPT-class model that provides superior capabilities. However, to harness this power effectively, we need to surround it with what we like to call our version of vernier thrusters: a collection of smaller models and integrations that provide the necessary control and verifiability.

To avoid misleading and dangerous outputs, models need access to company-specific data sources such as HRIS systems and knowledge bases. You can then build “vernier thrusters” by fine-tuning the model on internal documentation, chaining model APIs with data lookups, and integrating existing security and permission settings. This family of techniques is known as retrieval augmentation. Retrieval augmentation does not eliminate hallucinations, however, so you should also consider adding a class of models that verify whether the LLM's output is based on facts and grounded data.
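The following is a minimal sketch of the two ideas in the paragraph above, applied to the Larry the Accountant example. It assumes a hypothetical in-memory `DIRECTORY` standing in for an HRIS lookup, a boolean permission flag standing in for real access controls, and a simple rule-based `is_grounded` check in place of a learned verification model. The LLM call itself is omitted; the draft answer is passed in as a string. The phone numbers use the fictional 555-01xx range.

```python
import re
from typing import Optional

# Hypothetical stand-in for an HRIS lookup; in practice this would be an API
# call that also enforces the requester's existing permissions.
DIRECTORY = {
    "larry": {"title": "Accountant", "phone": "416-555-0142"},
}

PHONE_PATTERN = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")


def retrieve(name: str, requester_can_view: bool) -> Optional[dict]:
    """Look up a record only if the requester is allowed to see it."""
    if not requester_can_view:
        return None  # respect existing permission settings
    return DIRECTORY.get(name.lower())


def build_prompt(question: str, record: Optional[dict]) -> str:
    """Ground the LLM prompt in retrieved data instead of its training set."""
    context = f"Directory record: {record}" if record else "No directory record found."
    return f"{context}\nAnswer using only the record above.\nQuestion: {question}"


def is_grounded(draft_answer: str, record: Optional[dict]) -> bool:
    """Reject any phone number in the draft that is not in the retrieved record."""
    cited = PHONE_PATTERN.findall(draft_answer)
    allowed = {record["phone"]} if record else set()
    return all(number in allowed for number in cited)


if __name__ == "__main__":
    record = retrieve("Larry", requester_can_view=True)
    print(build_prompt("What is Larry's phone number?", record))
    print(is_grounded("Larry's number is 416-555-0142.", record))
    print(is_grounded("Try 212-555-0199.", record))
```

The verifier only catches invented phone numbers, which is far narrower than a trained fact-checking model, but it shows the design choice: the LLM drafts the answer, while a small, predictable component decides whether that answer is allowed to reach the user.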

These complementary models not only rein in the imagination of the core model but also validate its outputs against real-world organizational details.

With the right vernier thrusters in place, companies can launch these powerful rockets from the ground and steer them in the right direction.

About the author

Varun Singh is President and Co-Founder of Moveworks, the leading AI co-pilot for enterprises. Varun oversees the Product Management, Product Design, Customer Success, and Professional Services functions, and is committed to delivering the best AI-powered support experience to businesses around the world. He holds a Ph.D. in engineering and design optimization from the University of Maryland, College Park, and a master's degree in engineering and applied mathematics from UCLA.

Sign up for the free insideBIGDATA newsletter.
