Large language models are not a silver bullet for conversational AI

Large Language Models (LLMs) such as ChatGPT, GPT-3, and BERT have recently captured the world’s attention. And for good reason.

Simply put, LLMs are artificial intelligence (AI) tools that read, summarize, translate, and generate text. They can predict the next word in a sentence with a high degree of confidence and can generate language that mimics the way humans speak and write. In fact, these models are so advanced that some even question whether they have achieved sentience.

However, while it is no secret that LLMs have become an important foundation for conversational AI systems, many people mistakenly assume that LLMs will eventually be the silver bullet that solves every conversational AI problem. They are not.

There are several reasons.

A significant hallucination rate

Conversational systems built on LLMs are known to produce statements that are not grounded in any source content, or worse, that contradict it. This phenomenon is known as hallucination.

Hallucinations are present in all currently available LLMs, though at a different rate for each model. GPT-3, the largest LLM currently available, has a hallucination rate of 41%. That means roughly one in every 2.5 prompts yields a hallucinated response.

For example, if you ask a model whether 1-800-APLCARE is the actual support line, it may respond: “1-800-APLCARE is not the actual Apple support line. Apple provides support through its website, phone numbers, and online forums, but 1-800-APLCARE is not one of them.” In fact, according to Apple’s website, 1-800-APLCARE is the actual support line. In this example, the generated text is factually incorrect.

Even quoting from various sources does not solve the problem, because models often take sentences out of context and restitch them into paragraphs that produce inaccurate answers. For example, you might ask a model, “How much does AWS charge for a g4dn.16xlarge GPU instance?” and get the answer, “A g4dn.16xlarge GPU instance on AWS costs $0.526 per hour.” In this case, the pricing for g4dn.xlarge has been carried over to g4dn.16xlarge.

These subtle inaccuracies are harmless in examples like these, but they have broader implications for more sensitive subjects. OpenAI itself has noted in its research that hallucinations pose a very real threat when LLMs are used in real-world applications, such as answering employee questions in a business environment or providing automated patient support in a healthcare setting. The rate of hallucination is expected to improve over time, but for now there is no built-in mechanism for fact-checking the output. To the untrained eye, these hallucinations can look entirely real.

The problem is that LLMs are black boxes with little explainability. To build a conversational AI system that reliably produces truthful responses, we need to add layers of algorithms that help ensure predictability.
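One such layer can be sketched in miniature: before an answer is returned, check that each of its sentences is supported by a trusted source passage, and flag anything that is not. This is a deliberately simplified illustration using word overlap; real systems use retrieval plus entailment models, and the threshold, sources, and example answer below are all invented for demonstration.

```python
# Minimal sketch of a grounding check layered on top of an LLM:
# flag generated sentences that share too little vocabulary with
# any trusted source passage as potentially hallucinated.
import re

def sentence_tokens(text: str) -> set[str]:
    """Lowercased word tokens, dropping very short (stopword-like) words."""
    return {w for w in re.findall(r"[a-z0-9.\-]+", text.lower()) if len(w) > 3}

def is_supported(sentence: str, sources: list[str], threshold: float = 0.5) -> bool:
    """True if enough of the sentence's tokens appear in some source passage."""
    toks = sentence_tokens(sentence)
    if not toks:
        return True  # nothing checkable
    return any(len(toks & sentence_tokens(src)) / len(toks) >= threshold
               for src in sources)

def flag_unsupported(answer: str, sources: list[str]) -> list[str]:
    """Return the sentences of `answer` that no source supports."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    return [s for s in sentences if not is_supported(s, sources)]

# Hypothetical trusted source and model output.
sources = ["1-800-APLCARE is Apple's official phone support line."]
answer = ("1-800-APLCARE is Apple's official support line. "
          "Apple also charges $99 for every support call.")
print(flag_unsupported(answer, sources))  # flags the second, unsupported claim
```

The flagged sentences can then be suppressed, rewritten, or routed to a human, rather than delivered to the user as fact.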

Lack of control

LLMs are not built like traditional systems such as Google Search. Traditional systems are built from hundreds of algorithmic layers that ultimately have to be wired together. LLMs are so powerful because they provide an out-of-the-box, end-to-end system that essentially fuses those layers.

On the one hand, this significantly reduces the time required to build and train complex systems. On the other, it is very restrictive, because it offers little control: there is no way to steer the model to produce responses beyond the data it was trained on.

For example, let’s say you want to use conversational AI to support your employees. An employee may ask where a particular conference room named “Elvis Presley” is located. Without the additional control required for such a custom use case, the model would spew nonsensical responses drawn from its training data about the entity “Elvis Presley”. Domain-specific context is essential for generating meaningful, actionable responses.

Outdated knowledge

LLMs such as ChatGPT and GPT-3 learn knowledge and reasoning together, in a single training run. But the knowledge an LLM is trained on can quickly become obsolete, especially in the enterprise domain, because knowledge is fluid and the volume of data grows year after year. As a result, the model’s responses fall out of step with current data.

For example, a model trained only on data through 2020 is unaware of recent developments, such as how the James Webb Space Telescope has revealed the universe in ways never before seen by the human eye. Instead, the model would say the telescope is still in development, unaware of the past year’s success.

The aforementioned lack of control makes it difficult to separate this stale knowledge from the rest of the model’s data. Nor is there an obvious mechanism for overriding the knowledge base to tell the model the most appropriate answer for a given prompt.

Retraining an LLM also requires enormous computational resources, which costs money every time the model needs to be updated. For enterprise applications such as customer and employee chatbots, this is neither practical nor cost-effective.

For enterprise LLM applications like these, the model needs to stay alive and breathing: always capturing and delivering the latest information.
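One common way to keep answers current without retraining is retrieval-augmented prompting: look up the freshest relevant documents at query time and prepend them to the prompt. The sketch below is a toy version, with a hand-rolled keyword ranking and an invented in-memory document store (the pricing figures are illustrative); production systems would use vector search over a continuously refreshed index.

```python
# Sketch of retrieval-augmented prompting: instead of retraining the
# model, retrieve up-to-date documents at query time and build a
# prompt that grounds the model in that fresh context.
from datetime import date

# Hypothetical knowledge base, refreshed by a separate data pipeline.
documents = [
    {"text": "The g4dn.16xlarge instance costs $4.352 per hour on demand.",
     "updated": date(2023, 1, 15)},
    {"text": "The g4dn.xlarge instance costs $0.526 per hour on demand.",
     "updated": date(2023, 1, 15)},
    {"text": "Conference room Elvis Presley is on the 3rd floor, east wing.",
     "updated": date(2022, 11, 2)},
]

def retrieve(query: str, docs: list[dict], k: int = 2) -> list[str]:
    """Rank documents by keyword overlap with the query, newest first."""
    words = set(query.lower().split())
    scored = sorted(docs,
                    key=lambda d: (len(words & set(d["text"].lower().split())),
                                   d["updated"]),
                    reverse=True)
    return [d["text"] for d in scored[:k]]

def build_prompt(query: str, docs: list[dict]) -> str:
    """Ground the model with retrieved context instead of retraining it."""
    context = "\n".join(retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("How much does a g4dn.16xlarge cost per hour?", documents))
```

Because only the document store changes, the latest pricing or floor plan reaches the model the moment the store is updated, with no retraining bill.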

So what is an LLM good for today?

The excitement around large language models is similar to what we saw in the early days of computer vision. When AlexNet first came out, many were quick to declare computer vision “solved,” but it really wasn’t. Much innovation was still needed to turn such a powerful technology into viable products that actually solve everyday problems.

Similarly, although LLMs open a new frontier for building conversational AI use cases, they were never intended to be a panacea for every conversational AI problem.

Instead, a large amount of additional innovation is required if companies want to produce meaningful results.

For example, you can use an LLM as the starting point for a conversational AI system built for customer support. It does a great job of understanding and interpreting language, but you still need to build custom algorithms that can understand context, identify domain-specific language, and take the desired action based on user interactions.
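The shape of such a custom layer can be sketched as an intent router: utterances that match a known, domain-specific intent trigger a controlled action, and only open-ended chat falls through to the generative model. The intents, keywords, and `call_llm` stand-in below are all hypothetical placeholders, not a real API.

```python
# Hedged sketch of a custom layer on top of an LLM: route utterances
# matching a known domain intent to a deterministic handler, and fall
# back to the general-purpose model only for open-ended requests.
def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call."""
    return f"[LLM answer to: {prompt}]"

# Domain intents: trigger keywords mapped to a controlled action.
INTENTS = {
    "refund":   ({"refund", "money", "charge"},     "Opening a refund ticket..."),
    "shipping": ({"shipping", "delivery", "track"}, "Looking up your shipment..."),
}

def route(utterance: str) -> str:
    """Dispatch to a domain handler, or fall back to the LLM."""
    words = set(utterance.lower().split())
    for name, (keywords, action) in INTENTS.items():
        if words & keywords:
            return action  # domain-specific, predictable behavior
    return call_llm(utterance)  # open-ended, generative behavior

print(route("I want a refund for my order"))
print(route("Tell me a joke about parrots"))
```

The point of the design is that business-critical paths stay deterministic and auditable, while the LLM handles only the conversation the business can afford to leave open-ended.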

How does an LLM application like ChatGPT change this?

The development of ChatGPT is a remarkable achievement. The speed of the move from traditional Natural Language Understanding (NLU) techniques to transformer models, LLMs, and ChatGPT far exceeds what was expected even a few years ago. And with its release, conversational AI went mainstream almost overnight, thanks to its speed and creativity in generating responses to a given prompt.

For example, you can ask it to draft a thoughtful email thanking a customer for their purchase, or use it to write a creative bedtime story for your child. It handles these tasks with ease and is sure to impress and entertain whoever operates it.

However, ChatGPT is not immune to the challenges above. It still suffers from a hallucination rate of 21%; that’s one in five! And in its current interface, ChatGPT is limited to prompt input and text output, so the only way to take advantage of it is through OpenAI’s existing chat functionality.

In reality, ChatGPT’s true potential is still largely unknown. Its full power will be revealed once the model is opened up for developers around the world to leverage and innovate on. As with traditional LLMs, an added layer of controllability could let companies use ChatGPT to create custom conversational AI use cases never seen before.

But make no mistake: LLMs and applications like ChatGPT require significant innovation on top of the existing systems if we want to create meaningful output. It also means that companies not set up to leverage LLMs will need to recalibrate their machine learning strategies to incorporate LLMs and adopt LLM applications. Otherwise, they will quickly fall behind.
