With the growing popularity of generative artificial intelligence (AI), companies are evaluating foundation models (FMs) and realizing the immediate benefits they can bring to their businesses. An FM is a large-scale machine learning model that is pre-trained on vast amounts of data and can perform many tasks, such as generating text, code, and images. As more companies train their own models, fine-tune FMs, and deploy applications that leverage these models, operationalizing the modeling process and following best practices to optimize speed, cost, and quality has become a priority.
Large language models (LLMs) are a class of FMs that focus on language-based tasks such as summarization, text generation, classification, and Q&A. Large language model operations (LLMOps), a subset of foundation model operations (FMOps), focuses on the processes, techniques, and best practices used to operationalize LLMs. LLMOps improves model development efficiency and enables scalability for managing multiple models. FMOps originates from the concept of machine learning operations (MLOps), which is the combination of people, processes, and technology needed to efficiently deliver machine learning solutions to production. FMOps adopts MLOps methodologies and adds the skills, processes, and technology needed to operationalize generative AI models and applications.
This is part 1 of a 3-part blog series that details LLMOps for the gaming industry, covering the use cases, services, and code required to implement LLMOps on AWS. This post provides an introduction to LLMOps, a high-level solution design, and specific use cases in the gaming industry.
Examples of using LLMOps in games
Let's take a look at some use cases in the gaming industry to see how customers are leveraging generative AI to improve developer efficiency and gameplay quality.
Unique non-player character (NPC) dialogue
Replayability is important for bringing players back to the game and keeping them from quitting when they need to replay a certain section, for example because they keep losing to a difficult enemy. You can increase player satisfaction by creating a unique experience every time the player interacts with an NPC or watches a cutscene. Backing an NPC with an LLM lets you generate unique NPC responses during each interaction while preserving the lore and information needed for the scene.
Scripting efficiency
Some NPCs are intended to deliver a specific script, and uniqueness is not a requirement. In this case, an LLM can help improve the efficiency of scripting. By providing the model with your game's lore, setting, and expected results, you can quickly generate scripts for different NPC personas. This makes scriptwriters more efficient and frees up time to create and explore new characters and ideas.
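As a minimal sketch of this idea, the snippet below assembles a per-persona prompt from shared game lore. The lore text, persona names, and template wording are all illustrative assumptions, not details from this post; in practice the resulting prompt would be sent to your chosen LLM.

```python
# Sketch: assemble per-persona prompts from shared game lore.
# LORE and PERSONAS are illustrative stand-ins for real game data.

LORE = "The kingdom of Eldra fell a century ago; only the river city of Mire remains."

PERSONAS = {
    "blacksmith": "Gruff, speaks in short sentences, distrusts outsiders.",
    "innkeeper": "Cheerful gossip who knows every traveler's business.",
}

def build_prompt(persona: str, player_line: str) -> str:
    """Combine game lore, a persona description, and the player's line
    into a single prompt that could be sent to an LLM."""
    style = PERSONAS[persona]
    return (
        f"Game lore: {LORE}\n"
        f"Character style: {style}\n"
        f"Player says: {player_line}\n"
        "Respond in character, staying consistent with the lore."
    )

prompt = build_prompt("blacksmith", "Where can I find the old forge?")
print(prompt)
```

Because the lore and style live in the template rather than the model, a scriptwriter can add a new persona by adding one dictionary entry instead of retraining anything.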
Chat and voice moderation and toxicity detection
Online games that offer chat and voice services face the challenge of maintaining communities where friendly banter is allowed but inappropriate language is not. You can build a moderation workflow that analyzes reported player chat and audio to determine whether a player's language fits the game publisher's guidelines. An LLM can serve as an evaluation agent that understands the context of a player's language and decides whether action should be taken.
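To make the evaluation-agent idea concrete, here is a hedged sketch of the two ends of such a workflow: building the evaluation prompt and validating the model's verdict before acting on it. The guideline wording, JSON schema, and action names are assumptions for illustration; the actual model call is omitted.

```python
import json

# Sketch of an LLM-as-evaluation-agent moderation step. The guidelines,
# JSON verdict schema, and action names are illustrative assumptions.

GUIDELINES = "Friendly banter is allowed; slurs, threats, and harassment are not."

def build_moderation_prompt(message: str) -> str:
    """Wrap a reported chat message in an evaluation prompt for the LLM."""
    return (
        "You are a moderation agent for an online game.\n"
        f"Guidelines: {GUIDELINES}\n"
        "Classify the reported message below and reply ONLY with JSON of the "
        'form {"action": "none" | "warn" | "ban", "reason": "..."}.\n\n'
        f"Message: {message}"
    )

def parse_verdict(llm_output: str) -> dict:
    """Validate the model's JSON verdict before any action is taken."""
    verdict = json.loads(llm_output)
    if verdict.get("action") not in {"none", "warn", "ban"}:
        raise ValueError(f"unexpected action: {verdict.get('action')}")
    return verdict

# Example with a mock model response:
verdict = parse_verdict('{"action": "warn", "reason": "borderline insult"}')
print(verdict["action"])
```

Validating the verdict schema before taking action keeps a malformed or unexpected model response from silently triggering the wrong enforcement step.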
Design patterns for model customization
To take advantage of these use cases, enterprises need generative AI applications. The core of a generative AI application is one or more models. The underlying model can be used as-is in the application, and extensive prompt engineering can yield high-quality, acceptable results. However, most use cases benefit from model customization, which can be done in several ways, as described in the next section.
How to customize the model
Fine-tuning – Modify the underlying model using your own data. This process changes the model's parameters and requires significant up-front compute, but it allows the FM to perform tasks it previously could not.
Pre-training – Train a model from scratch using your own repository of unlabeled data. This allows the highest level of control and customization, but it requires huge amounts of data (often terabytes), deep machine learning expertise, and large amounts of compute. Pre-training should be reserved for use cases where no existing FM can be fine-tuned to perform the task.
Retrieval Augmented Generation (RAG) – An alternative to fine-tuning that does not change model parameters. Instead, domain data is converted to vector embeddings and indexed in a vector database; at query time, a similarity search is run against the index using the prompt's embedding, and the retrieved data is supplied as context within the prompt.
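The RAG flow above can be sketched end to end with a toy in-memory index. A real system would use an embedding model and a vector database; here a bag-of-words vector and cosine similarity stand in for both, and the documents are invented examples.

```python
import math

# Toy RAG sketch: embed -> index -> similarity search -> prompt context.
# Bag-of-words vectors stand in for real embeddings; DOCS is made-up data.

def embed(text: str) -> dict:
    """Stand-in embedding: word-count vector (a real system calls a model)."""
    vec = {}
    for word in text.lower().split():
        vec[word] = vec.get(word, 0) + 1
    return vec

def cosine(a: dict, b: dict) -> float:
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

DOCS = [
    "The blacksmith forges weapons in the river city of Mire.",
    "The innkeeper serves stew and rumors at the Gilded Eel tavern.",
]
INDEX = [(doc, embed(doc)) for doc in DOCS]  # "vector database"

def retrieve(query: str, k: int = 1) -> list:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(INDEX, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

question = "Where does the blacksmith work?"
context = retrieve(question)[0]
prompt = f"Context: {context}\nQuestion: {question}"
print(prompt)
```

Because the model parameters never change, updating the game's scripts only means re-running the indexing step over the new documents.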
Your choice of customization depends on the use case. Fine-tuning is well suited for domain adaptation; for example, you can tune the model on the lore that underpins your NPCs and use a different prompt template for each NPC. RAG performs better for use cases where situational knowledge and verifiable responses matter more, such as writing scripts for different character personas; because these scripts change frequently, the vector database can simply be reindexed as the data changes. RAG is particularly popular with game studios for protecting intellectual property and securing game-specific data, because it provides secure, controlled access to that data without incorporating it directly into the model through retraining or fine-tuning.
Regardless of the type of customization, an LLMOps pipeline that handles changes to the model and to the overall application speeds up iteration cycles.
Overview of LLMOps
LLMOps includes three main phases: continuous integration (CI), continuous deployment (CD), and continuous tuning (CT).
CI consists of merging all working copies of an application's code into one version and running system and unit tests against it. With LLMs and other FMs, unit tests often require evaluating the model's output. For example, for an NPC backed by an LLM, a test might consist of asking the NPC questions about its backstory, other characters in the game, and the setting.
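One way such a CI check might look is sketched below. Because LLM responses vary from run to run, the test asserts on required facts and forbidden contradictions rather than exact text. The lore facts and the `npc_respond` stub are illustrative assumptions; in a real pipeline the stub would call the deployed model endpoint.

```python
# Sketch of a CI-style unit test on LLM output for an NPC. Since the
# response text varies per run, assert on facts rather than exact strings.
# REQUIRED_LORE, FORBIDDEN, and npc_respond are illustrative stand-ins.

REQUIRED_LORE = {"mire"}             # facts the NPC must mention
FORBIDDEN = {"eldra still stands"}   # statements that contradict the lore

def npc_respond(question: str) -> str:
    """Placeholder: in CI this would call the deployed model endpoint."""
    return "Only the river city of Mire remains since the fall."

def check_response(text: str) -> bool:
    """Pass if the response includes required lore and avoids contradictions."""
    lower = text.lower()
    has_lore = all(fact in lower for fact in REQUIRED_LORE)
    no_contradiction = not any(bad in lower for bad in FORBIDDEN)
    return has_lore and no_contradiction

assert check_response(npc_respond("What happened to the kingdom?"))
```

Keyword checks like this catch regressions cheaply; teams often layer metric-based or human evaluation on top for quality judgments that keywords can't capture.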
CD consists of deploying the application infrastructure and model into a specified environment after the model's performance and quality have been evaluated, either through metric-based evaluation or with a human in the loop. A common pattern is to deploy to a development environment and a quality assurance (QA) environment before deploying to a production environment (PROD). By placing a manual approval step between the QA and PROD deployments, you can ensure that new models are tested in QA before they reach PROD.
CT is the process of using additional data to fine-tune the underlying model, updating model parameters to optimize it and produce a new version of the model. This process typically consists of data preprocessing, model tuning, model evaluation, and model registration. Once the model is saved to the model registry, it can be reviewed and approved for deployment.
LLMOps on AWS
The following diagram shows an LLMOps solution on AWS.

At a high level, this architecture deploys the following infrastructure:
In the second blog post of this three-part series, we dive deeper into how architecture and services work together.
Getting started
Below are several ways to deploy this solution on AWS.
- This architecture serves as the backbone for building dynamic non-player character dialogs on AWS. The GitHub repository explains how to deploy the solution.
- The workshop “Operationalizing Generative AI Applications with LLMOps” provides step-by-step instructions for learning and deploying LLMOps on AWS.
Conclusion
Many game companies today spend considerable effort writing scripts for NPCs, mapping out different script scenarios, and reviewing reported players. In this post, we discussed how LLMs in games and game backends can create unique player experiences and cut the development time spent on manual tasks, helping businesses focus on creating the best gaming experiences for their customers. LLMOps is the backbone that gives you an operational platform for tuning and managing your models at scale.
In Part 2, we will explore the above architecture in detail and explain how the services work together to create a solution that lets you manage generative AI applications across AWS Regions and accounts.
