Generative AI, especially its language flavor, ChatGPT, is everywhere. Large Language Model (LLM) technology will play an important role in the development of future applications. LLMs are very good at understanding language because their underlying models have undergone extensive pre-training on trillions of lines of public-domain text and code. Techniques such as supervised fine-tuning and reinforcement learning from human feedback (RLHF) make these LLMs more effective at answering specific questions and conversing with users. As we enter the next phase of LLM-powered AI apps, the following key components will be critical to these next-generation applications. The diagram below illustrates this progression: the intelligence and autonomy of your applications increase as you move up the chain. Let’s take a look at these different levels.

LLM call:
These are direct calls to completion or chat models from LLM providers such as Azure OpenAI, Google PaLM, or Amazon Bedrock. These calls use very basic prompts and rely mostly on the LLM’s internal memory to produce the output.
Example: ask a basic model like ‘text-davinci’ to ‘tell a joke’. Given so little context, the model relies on its internal pretrained memory to come up with the answer (highlighted in green in the diagram below, using Azure OpenAI).
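To make this concrete, here is a minimal sketch of such a call, assuming the pre-1.0 ‘openai’ Python SDK; the endpoint, key, and deployment name are hypothetical placeholders:

```python
# Minimal direct LLM call via Azure OpenAI (pre-1.0 openai SDK).
# Endpoint, key, and deployment name below are hypothetical placeholders.
import openai

openai.api_type = "azure"
openai.api_base = "https://my-resource.openai.azure.com/"
openai.api_version = "2023-05-15"
openai.api_key = "YOUR_AZURE_OPENAI_KEY"

# A bare prompt with no extra context: the model answers purely from
# its pretrained internal memory.
response = openai.Completion.create(
    engine="text-davinci",  # name of the model deployment
    prompt="Tell a joke",
    max_tokens=60,
)
print(response["choices"][0]["text"])
```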

Prompt:
The next level of intelligence is adding more context to your prompts. Prompt engineering techniques can be applied to get customized responses from LLMs. For example, when generating an email to a user, context such as the user’s past purchases and behavioral patterns can be included in the prompt to better personalize the email. Those familiar with ChatGPT will recognize the different methods of prompting, including few-shot examples that the LLM uses to construct its response. Prompting extends the LLM’s internal memory with additional context. An example is shown below.
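As a sketch of this idea, here is the same kind of call with customer context injected into the prompt; all the customer data below is made up for illustration:

```python
# Prompt engineering sketch: the prompt is enriched with customer context
# (all data below is hypothetical) so the model can personalize its output.
import openai  # configured for Azure OpenAI as in the previous sketch

customer = {
    "name": "Alex",
    "last_purchase": "wireless headphones",
    "segment": "frequent buyer",
}

prompt = (
    "Write a short, friendly marketing email for the customer below.\n"
    f"Name: {customer['name']}\n"
    f"Last purchase: {customer['last_purchase']}\n"
    f"Segment: {customer['segment']}\n"
)

response = openai.Completion.create(
    engine="text-davinci",  # same hypothetical deployment as before
    prompt=prompt,
    max_tokens=200,
)
print(response["choices"][0]["text"])
```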

Embedding:
Embedding takes prompts to the next level by looking up context in a knowledge store, retrieving it, and adding it to the prompt. The first step is to make a large store of unstructured text searchable by indexing the text into a vector database. For this, an embedding model like OpenAI’s ‘ada’ is used, which takes chunks of text and transforms them into n-dimensional vectors. These embeddings capture the meaning of the text, so similar sentences have embeddings that are close to each other in vector space. When the user enters a query, that query is also converted into an embedding, and its vector is matched against those in the database. The top 5 or 10 matching text chunks for the query then form the context. The query and the retrieved context are passed to the LLM, which answers the question in a human-like way.
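A minimal retrieval sketch, using the pre-1.0 ‘openai’ SDK and numpy in place of a real vector database, might look like this (the document chunks are made up):

```python
# Retrieval sketch: embed document chunks, embed the query, and rank
# chunks by cosine similarity. A production system would store the
# vectors in a vector database rather than a numpy array.
import numpy as np
import openai

chunks = [
    "Our refund policy allows returns within 30 days.",
    "Shipping is free on orders over $50.",
    "Loan interest rates are updated quarterly.",
]

def embed(texts):
    # 'text-embedding-ada-002' maps each text to a 1536-dimensional vector.
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=texts)
    return np.array([item["embedding"] for item in resp["data"]])

doc_vectors = embed(chunks)
query = "Can I return an item I bought last week?"
query_vector = embed([query])[0]

# Cosine similarity: semantically similar texts have nearby vectors.
scores = doc_vectors @ query_vector / (
    np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(query_vector)
)
top = np.argsort(scores)[::-1][:2]           # top-2 matching chunks
context = "\n".join(chunks[i] for i in top)  # becomes the prompt context
print(context)
```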
Chain:
Chains are currently the most advanced and mature technique and are widely used for building LLM applications. A chain is a deterministic sequence in which a series of LLM calls are combined and the output from one LLM flows into one or more others. For example, one LLM call can query a SQL database to get a list of a customer’s emails and pass that list to another LLM that generates a personalized email for the customer. You can integrate these LLM chains into your existing application flows to produce more valuable results. Chaining lets you enrich LLM calls with external inputs, such as API calls, or integrate with a knowledge graph to provide context. Since multiple LLM providers are available today, including OpenAI, AWS Bedrock, Google PaLM, and MosaicML, you can also mix and chain calls across providers: for chain steps that require limited intelligence you can use a cheaper model like ‘gpt-3.5-turbo’, while more advanced tasks can use ‘gpt-4’. A chain abstracts over data, applications, and LLM calls.
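The pattern can be sketched in plain Python (frameworks like LangChain wrap the same idea); the database schema and table below are hypothetical:

```python
# Two-step chain sketch: step 1 queries a SQL database, step 2 feeds each
# result into an LLM call. Table and column names are hypothetical.
import sqlite3
import openai

def get_customers(db_path="customers.db"):
    # Step 1: a deterministic data-fetching step.
    conn = sqlite3.connect(db_path)
    rows = conn.execute("SELECT name, email FROM customers").fetchall()
    conn.close()
    return rows

def draft_email(name):
    # Step 2: a cheaper chat model is enough for this simple task.
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": f"Write a short personalized offer email for {name}.",
        }],
    )
    return resp["choices"][0]["message"]["content"]

# The output of step 1 flows into step 2: the essence of a chain.
for name, email in get_customers():
    print(email, "->", draft_email(name))
```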
Agent:
Agents are the subject of much debate online, especially around whether they are a step toward Artificial General Intelligence (AGI). Instead of following predefined chains, agents plan their tasks using advanced LLMs such as ‘gpt-4’ or ‘PaLM 2’. When a user request comes in, the agent decides which set of tasks to invoke based on the query and builds the chain dynamically. For example, consider configuring an agent with an instruction such as “Notify customers when loan interest rates change due to government regulatory updates”. The agent framework makes LLM calls to determine which steps to execute or which chains to build. Here, that involves calling an app that scrapes regulatory websites to extract the latest APR, then an LLM call that searches the database for the emails of affected customers, and finally generating an email to notify everyone.
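A heavily simplified agent loop might be sketched like this; the tools and the JSON protocol between the model and the loop are assumptions made purely for illustration:

```python
# Agent loop sketch: instead of a fixed chain, the LLM chooses the next
# tool at each step. Both tools here are hypothetical stand-ins.
import json
import openai

def scrape_apr():
    return "APR is now 7.2%"  # stand-in for scraping a regulatory site

def find_affected_customers():
    return ["a@example.com", "b@example.com"]  # stand-in for a DB query

TOOLS = {"scrape_apr": scrape_apr,
         "find_affected_customers": find_affected_customers}

goal = "Notify customers when loan interest rates change."
history = []

for _ in range(5):  # cap the number of planning steps
    resp = openai.ChatCompletion.create(
        model="gpt-4",  # planning benefits from a stronger model
        messages=[{
            "role": "user",
            "content": (
                f"Goal: {goal}\nAvailable tools: {list(TOOLS)}\n"
                f"Steps taken so far: {history}\n"
                'Reply with JSON: {"tool": "<name>"} to act, '
                'or {"done": "<summary>"} to finish.'
            ),
        }],
    )
    action = json.loads(resp["choices"][0]["message"]["content"])
    if "done" in action:
        print(action["done"])
        break
    result = TOOLS[action["tool"]]()  # execute the tool the model chose
    history.append((action["tool"], result))
```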
Final thoughts
LLMs are a rapidly evolving technology, with better models and applications announced every week. From plain LLM calls to agents is a ladder of intelligence: as you move up, you can build increasingly complex and autonomous applications. Better models mean more effective agents, and next-generation applications will leverage them. Only time will tell how advanced these applications will become and what patterns they will follow.
