Editor's Note: This post is part of the AI Decoded series, which demystifies AI by making the technology more accessible and showcases new hardware, software, tools, and accelerations for NVIDIA RTX PC and workstation users.
There is a surge in demand for tools that simplify and optimize generative AI development. Applications and customized models based on retrieval-augmented generation (RAG), a technique that uses facts obtained from specified external sources to increase the accuracy and reliability of generative AI models, allow developers to tailor AI models to their specific needs.
Previously this would have required complex setup, but our new tools make it easier than ever.
NVIDIA AI Workbench simplifies AI developer workflows by enabling users to build their own RAG projects and customize models. It is part of the RTX AI Toolkit, a suite of tools and software development kits for customizing, optimizing and deploying AI capabilities, announced at COMPUTEX earlier this month. AI Workbench abstracts the complexity of technical tasks that can daunt experts and deter novices.
What is NVIDIA AI Workbench?
Available at no cost, NVIDIA AI Workbench enables users to develop, experiment, test and prototype AI applications on any GPU system, from laptops and workstations to data centers and the cloud, providing a new approach to creating, using and sharing GPU-accelerated development environments across people and systems.
A simple installation allows users to get started with AI Workbench on a local or remote machine in just a few minutes. Users can then start a new project or clone a project from an example on GitHub. Everything works through GitHub or GitLab, making it easy for users to collaborate and distribute their work. Learn more about getting started with AI Workbench here.
How AI Workbench helps you solve your AI project challenges
Developing AI workloads can be a manual and complex process from the start.
Managing GPU setups, driver updates, and version control incompatibilities is tedious. Reproducing projects across different systems requires multiple manual processes. Inconsistencies when replicating projects, such as data fragmentation and version control issues, can hinder collaboration. Different setup processes, moving credentials and secrets, changing environments, data, models, and file locations can all limit project portability.
AI Workbench makes it easier for data scientists and developers to manage work and collaborate across disparate platforms. It integrates and automates various aspects of the development process, providing the following capabilities:
- Ease of setup: AI Workbench streamlines the process of setting up a GPU-accelerated development environment, even for users with limited technical knowledge.
- Seamless collaboration: AI Workbench integrates with version control and project management tools, such as GitHub and GitLab, reducing friction when collaborating.
- Consistency when scaling from local to cloud: AI Workbench ensures consistency across environments and supports scaling up or down from a local workstation or PC to a data center or cloud.
RAG for documentation, easier than ever
NVIDIA provides sample development workbench projects to help you get started with AI Workbench, such as the Hybrid RAG Workbench project, which runs a custom text-based RAG web application with documents on the user's local workstation, PC, or remote system.
All Workbench projects run inside “containers”, which are pieces of software that contain all the components needed to run an AI application. The Hybrid RAG example pairs a Gradio chat interface frontend on a host machine with a containerized RAG server, a backend that serves user requests and routes queries between a vector database and a selected large language model (LLM).
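The retrieve-then-generate flow described above can be illustrated with a minimal, self-contained sketch. This is not the Hybrid RAG project's code: `ToyVectorStore`, `embed`, and `build_prompt` are hypothetical names, and a bag-of-words count vector stands in for a real embedding model.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (assumption: stands in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Hypothetical in-memory stand-in for the vector database."""

    def __init__(self) -> None:
        self._items: list[tuple[Counter, str]] = []

    def add(self, chunk: str) -> None:
        self._items.append((embed(chunk), chunk))

    def search(self, query: str, k: int = 2) -> list[str]:
        """Return the k stored chunks most similar to the query."""
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [chunk for _, chunk in ranked[:k]]


def build_prompt(query: str, snippets: list[str]) -> str:
    """Augment the user query with retrieved context before it is sent to an LLM."""
    context = "\n".join(f"- {s}" for s in snippets)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


store = ToyVectorStore()
store.add("AI Workbench projects run inside containers.")
store.add("The Hybrid RAG project uses a Gradio chat interface.")
store.add("Fine-tuning modifies a model by training it with additional data.")

question = "Which chat interface does the Hybrid RAG project use?"
top = store.search(question, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

In a real deployment the prompt would then be passed to whichever LLM backend is selected; the sketch stops at prompt construction so that it runs without any model.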
The workbench project supports a variety of LLMs available on NVIDIA's GitHub page, and the hybrid nature of the project allows users to choose where they want to run their inference.

Developers can run the embedding model on their host machine and run inference locally on a Hugging Face Text Generation Inference server, on target cloud resources using NVIDIA inference endpoints such as the NVIDIA API catalog, or through self-hosted microservices such as NVIDIA NIM or third-party services.
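One way to picture this choice of inference location is a small dispatch table that maps a mode name to a backend. The sketch below is purely illustrative: the mode names, the `InferenceMode` class, and the stub backends are assumptions made for this example, not the project's configuration or API, and the stubs return canned strings rather than calling any real server.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class InferenceMode:
    """Hypothetical description of where inference runs."""
    name: str
    runs_on: str
    generate: Callable[[str], str]


def _stub_backend(label: str) -> Callable[[str], str]:
    """Build a placeholder backend; a real one would call a local server or a remote endpoint."""
    def generate(prompt: str) -> str:
        return f"[{label}] response to: {prompt}"
    return generate


# Assumed mode names, chosen only to mirror the three options in the text.
MODES: Dict[str, InferenceMode] = {
    "local": InferenceMode("local", "inference server on the host machine", _stub_backend("local")),
    "cloud": InferenceMode("cloud", "hosted inference endpoint", _stub_backend("cloud")),
    "microservice": InferenceMode("microservice", "self-hosted microservice", _stub_backend("microservice")),
}


def run_inference(mode: str, prompt: str) -> str:
    """Route a prompt to the backend selected by the user."""
    try:
        selected = MODES[mode]
    except KeyError:
        raise ValueError(f"unknown inference mode: {mode!r}") from None
    return selected.generate(prompt)


print(run_inference("cloud", "What is AI Workbench?"))
```

Keeping the backends behind one function signature means the retrieval and prompt-building steps stay the same regardless of where the model runs.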
The Hybrid RAG Workbench project also includes:
- Performance metrics: Users can evaluate how RAG-based and non-RAG-based user queries perform in each inference mode. Metrics tracked include retrieval time, time to first token (TTFT), and token velocity.
- Retrieval transparency: A panel displays the exact snippets of text retrieved from the most contextually relevant content in the vector database, which are fed into the LLM to improve the relevance of responses to users' queries.
- Response customization: The response can be tuned with various parameters, including maximum tokens generated, temperature and frequency penalty.
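The three performance metrics listed above can be derived from a handful of timestamps. The following is a minimal sketch under stated assumptions: TTFT is measured from the start of the request (so it includes retrieval), and token velocity is taken as tokens per second after the first token arrives. The project may define these differently; `QueryMetrics` and `compute_metrics` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class QueryMetrics:
    retrieval_time_s: float  # time spent fetching context from the vector database
    ttft_s: float            # time from request start to the first generated token
    tokens_per_s: float      # token velocity once streaming has begun


def compute_metrics(request_start: float, retrieval_end: float,
                    token_times: Sequence[float]) -> QueryMetrics:
    """Derive metrics from timestamps in seconds (for example, from time.perf_counter())."""
    if not token_times:
        raise ValueError("at least one token timestamp is required")
    retrieval_time = retrieval_end - request_start
    ttft = token_times[0] - request_start
    streaming_span = token_times[-1] - token_times[0]
    # Assumption: velocity counts the tokens that arrive after the first one.
    velocity = (len(token_times) - 1) / streaming_span if streaming_span > 0 else 0.0
    return QueryMetrics(retrieval_time, ttft, velocity)


# Illustrative timestamps, not measurements.
metrics = compute_metrics(
    request_start=0.0,
    retrieval_end=0.25,
    token_times=[0.5, 0.75, 1.0, 1.25, 1.5],
)
print(metrics)
```

For a non-RAG query, `retrieval_end` would equal `request_start`, giving a retrieval time of zero and making the two kinds of query directly comparable.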
To get started with this project, simply install AI Workbench on your local system. The Hybrid RAG Workbench project can then be brought from GitHub into your account and cloned to your local system.
Further resources are available in the AI Decoded User Guide. Community members have also provided helpful video tutorials, such as this one from Joe Freeman:
Customize, Optimize, Deploy
Developers often try to customize AI models for specific use cases. Fine-tuning, a technique for modifying a model by training it with additional data, is useful for style transfer and changing the model's behavior. AI Workbench also helps with fine-tuning.
The Llama-factory AI Workbench project enables QLoRA, a fine-tuning method that minimizes memory requirements, for a wide range of models, as well as model quantization, all through a simple graphical user interface. Developers can use public or proprietary datasets to fit their application needs.
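A rough back-of-the-envelope calculation shows why this approach saves memory. QLoRA keeps the base model's weights in a low-precision (typically 4-bit) format and trains only small low-rank adapter matrices. The sketch below assumes an illustrative 7-billion-parameter model and a 4096 x 4096 weight matrix with adapter rank 8; these numbers are examples, not values taken from the project, and the estimate ignores activations, optimizer state, and quantization metadata.

```python
def weight_memory_gb(num_params: int, bits_per_param: int) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


def lora_adapter_params(d_in: int, d_out: int, rank: int) -> int:
    """A low-rank adapter replaces a d_out x d_in update with (d_out x rank) and (rank x d_in) matrices."""
    return rank * (d_in + d_out)


base_params = 7_000_000_000                    # illustrative model size
fp16_gb = weight_memory_gb(base_params, 16)    # weights stored in 16-bit precision
int4_gb = weight_memory_gb(base_params, 4)     # weights stored in 4-bit precision

full_update = 4096 * 4096                      # trainable values in one full weight matrix
adapter = lora_adapter_params(4096, 4096, 8)   # trainable values in its rank-8 adapter

print(f"16-bit weights: {fp16_gb} GB, 4-bit weights: {int4_gb} GB")
print(f"full matrix: {full_update} params, adapter: {adapter} params")
```

Under these assumptions the base weights shrink by a factor of four, and each adapted matrix trains 256 times fewer parameters than a full update would.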
Once fine-tuning is complete, you can quantize the model to improve performance and reduce its memory footprint before deploying it to a native Windows application for local inference or to an NVIDIA NIM for cloud inference. A complete walkthrough of this project is available in the NVIDIA RTX AI Toolkit repository.
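To make the idea of quantization concrete, here is a toy example of symmetric 8-bit quantization, in which each weight is stored as a small integer plus one shared scale factor. It is a simplified illustration of the general technique, not the method used by the Llama-factory project or the RTX AI Toolkit; `quantize_int8` and `dequantize` are hypothetical helper names, and the weights are made up.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using a single scale (absmax quantization)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs else 1.0
    return [round(v / scale) for v in values], scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.03, 1.0]   # made-up example weights
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# The rounding error per weight is bounded by half the scale.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(quantized, scale, max_error)
```

Each weight now occupies one byte instead of two or four, at the cost of a small, bounded rounding error. Production quantizers use finer-grained scales and lower bit widths, but the trade-off is the same.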
True Hybrid – Run AI Workloads Anywhere
The Hybrid RAG Workbench project described above is hybrid in more ways than one: in addition to offering a choice of inference modes, the project can run locally on NVIDIA RTX workstations and GeForce RTX PCs, or scale to remote cloud servers and data centers.
The ability to run projects on the system of your choice, without the overhead of setting up infrastructure, extends to all Workbench projects. For detailed examples and instructions on tweaking and customization, see the AI Workbench Quick Start Guide.
Generative AI is transforming games, video conferencing, and interactive experiences of all kinds. To learn more about the latest developments and where they're headed, subscribe to the AI Decoded newsletter.
