Hands on One of the biggest pain points associated with AI workloads is managing all the drivers, runtimes, libraries, and other dependencies they need to run.
This is especially true for hardware accelerated tasks: if you have the wrong version of CUDA, ROCm, or PyTorch, you're likely to find yourself scratching your head while staring at errors.
To make matters worse, some AI projects and apps have conflicting dependencies, and the required packages may not be supported on every operating system. Containerizing these environments, however, avoids much of that confusion: you build images configured specifically for the task and, more importantly, deploy them in a consistent, repeatable way every time.
Containers are also largely isolated from each other, so you can run apps with competing software stacks – for example, you can run two containers with CUDA 11 and 12 at the same time.
This is one reason why chipmakers often provide users with containerized versions of their high-speed computing software libraries, as this provides a consistent starting point for development.
Prerequisites
This tutorial explores different containerization methods that can help you develop and deploy either CPU- or GPU-accelerated AI workloads.
This guide makes the following assumptions:
- You're running Ubuntu 24.04 LTS (other distributions should work, but your mileage may vary).
- You have the latest release of Docker Engine installed and a basic understanding of the container runtime.
- If you're using an Nvidia GPU, you have Nvidia's proprietary drivers installed.
There are many container environments and runtimes, but we'll focus on Docker here for its simplicity and broad compatibility. Many of the concepts presented also apply to other container runtimes, such as Podman, although the commands differ slightly.
Exposing Intel and AMD GPUs to Docker
Unlike with virtual machines, you can expose a GPU to as many containers as you like, and, as long as you don't exceed the available vRAM, you shouldn't run into any issues.
If you are using an Intel or AMD GPU, the process is very simple and just requires passing the appropriate flag when starting the container.
For example, say you want to make an Intel GPU available to an Ubuntu 22.04 container. You'd append the --device /dev/dri flag to your docker run command. So, on a bare-metal system with an Intel GPU, you'd run something along the lines of:
docker run -it --rm --device /dev/dri ubuntu:22.04
For AMD GPUs, you also need to pass the --device /dev/kfd flag:
docker run -it --rm --device /dev/kfd --device /dev/dri ubuntu:22.04
Note: Depending on how your system is configured, you may need elevated privileges, for example sudo docker run or, in some cases, doas docker run.
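If you want to sanity check that the device nodes actually made it into the container, a quick look from inside is usually enough. The exact node names will vary from system to system:
ls -l /dev/dri    # your GPU should show up as card and renderD nodes
ls -l /dev/kfd    # AMD only: the ROCm compute interface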
Exposing Nvidia GPUs to Docker
If you're running one of Team Green's cards, you'll need to install the Nvidia Container Toolkit before you can expose the GPU to your Docker containers.
First, add the toolkit's software repository to your sources list and update Apt. (You can find installation instructions for RHEL and SUSE-based distributions in Nvidia's documentation here.)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
Now you can install the container runtime and configure Docker to use it.
sudo apt install -y nvidia-container-toolkit
Once the container toolkit is installed, we need to tell Docker to use the Nvidia runtime by updating /etc/docker/daemon.json. To do this, simply run the following command:
sudo nvidia-ctk runtime configure --runtime=docker
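If you're curious what that command actually does, it registers Nvidia's runtime in Docker's config file. Afterwards, /etc/docker/daemon.json should contain an entry along these lines (your file may include other settings as well):
cat /etc/docker/daemon.json
{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}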
The final step is to restart the Docker daemon and launch a test container with the --gpus=all flag to check that everything is working:
sudo systemctl restart docker
docker run -it --rm --gpus=all ubuntu:22.04
Note: If you have multiple GPUs, you can expose a specific number of them using --gpus=1, or pick individual cards with --gpus '"device=1,3,4"'.
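For example, both of the following are valid ways to hand specific hardware to a throwaway container. The device indices here are just for illustration; running nvidia-smi -L on the host shows how your cards are numbered:
docker run -it --rm --gpus=1 ubuntu:22.04                # any single GPU
docker run -it --rm --gpus '"device=1,3"' ubuntu:22.04   # GPUs 1 and 3 specifically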
Inside the container, run nvidia-smi and you should see something similar to this:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07 Driver Version: 550.90.07 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA RTX 6000 Ada Gene... Off | 00000000:06:10.0 Off | Off |
| 30% 29C P8 9W / 300W | 8045MiB / 49140MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 941 C python3 7506MiB |
| 0 N/A N/A 40598 C /usr/local/bin/python3 528MiB |
+-----------------------------------------------------------------------------------------+
Using Docker containers as your development environment
One of the best ways to use Docker containers when working with AI software libraries and models is as a development environment: you can spin up as many containers as you need and tear them down when you're done, without worrying about breaking your system.
You can launch a base image of your distribution of choice, expose your GPU to it using the --gpus or --device flags as appropriate, and start installing CUDA, ROCm, PyTorch, or TensorFlow. For example, to create a basic GPU-accelerated Ubuntu container and attach to it, run:
docker run -itd --gpus=all -p 8081:80 -v ~/denv:/home/denv --name GPUtainer ubuntu:22.04
docker exec -it GPUtainer /bin/bash
This creates a new Ubuntu 22.04 container named GPUtainer that:
- Has access to your Nvidia GPUs
- Exposes port 80 in the container as port 8081 on the host
- Mounts a denv folder in your host's home directory as /home/denv inside the container, making it easy to move files back and forth
- Keeps running in the background after you exit it
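Before installing a full development stack inside GPUtainer, it's worth confirming the container can actually see the card. Here's a minimal sketch, assuming an Nvidia GPU and the same nightly CUDA 12.4 build of PyTorch used later in this guide:
apt update && apt install -y python3 python3-pip
pip3 install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu124
python3 -c "import torch; print(torch.cuda.is_available())"    # should print True
If it prints False, double check that the container was started with the --gpus flag and that nvidia-smi works inside it.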
Using pre-built images
Building containers from scratch using CUDA, ROCm, or OpenVINO can be useful at times, but it can also be tedious and time-consuming, especially when there are pre-built images that do most of the work for you.
For example, if you want a basic CUDA 12.5 environment up and running, you can use the nvidia/cuda image as a starting point. To test it out, run:
docker run -it --gpus=all -p 8081:80 -v ~/denv:/home/denv --name CUDAtainer nvidia/cuda:12.5.0-devel-ubuntu22.04
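Because this is the devel flavor of the image, the CUDA compiler and libraries come pre-installed, which you can confirm from inside the container:
nvcc --version    # should report CUDA 12.5
nvidia-smi        # confirms the GPU itself is visible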
Or, if you have an AMD card, you can use one of the ROCm images, such as rocm/dev-ubuntu-22.04:
docker run -it --device /dev/kfd --device /dev/dri -p 8081:80 -v ~/denv:/home/denv --name ROCmtainer rocm/dev-ubuntu-22.04
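As with the CUDA container, it's worth checking that the card is visible from inside. The ROCm dev images ship with the usual ROCm utilities, so something along these lines should do the trick, assuming your card is supported by the ROCm release in the image:
rocminfo | grep -i "marketing name"    # lists the agents ROCm can see
rocm-smi                               # temperatures, clocks, and vRAM usage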
Meanwhile, Intel GPU owners should be able to create a similar environment using this OpenVINO image:
docker run -it --device /dev/dri:/dev/dri -p 8081:80 -v ~/denv:/home/denv --name Vinotainer openvino/ubuntu22_runtime:latest
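From inside that container, you can check which devices OpenVINO detects with a quick Python one-liner. The exact import path can vary a little between OpenVINO releases, but on recent images something like this should work:
python3 -c "from openvino.runtime import Core; print(Core().available_devices)"    # for example ['CPU', 'GPU']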
Converting containers into images
By design, Docker containers are ephemeral, so any changes you make to one will be lost if you, say, delete the container or update its image. You can, however, preserve your changes by committing them to a new image.
To commit the changes you made to the CUDA development environment in the last step, run the following command to create a new image called "cudaimage":
docker commit CUDAtainer cudaimage
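To confirm the commit worked, and to see how large the resulting image ended up, you can list it with:
docker images cudaimage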
You can then start a new container based on it by running the following command:
docker run -itd --gpus=all -p 8082:80 -v ~/denv:/home/denv --name CUDAtainer2 cudaimage
Building a custom image
Converting an existing container into a reproducible image can be handy for creating checkpoints or testing changes, but if you plan to share your images, it's generally better to document your work in a Dockerfile. This file is essentially a list of instructions that tells docker build how to turn an existing image into a custom one, and, like much of this tutorial, most of those instructions are fairly self-explanatory.
If you're new to building Docker images, here's a quick example using this AI weather app written in Python, which uses Microsoft's Phi-3-instruct LLM to turn statistics gathered every 15 minutes from OpenWeatherMap into human-readable reports in the style of a TV weather personality.
import time
from typing import Any, Dict

import requests
import torch
from transformers import pipeline, BitsAndBytesConfig

# Constants
ZIP_CODE = "YOUR_ZIP_CODE"
API_KEY = "YOUR_OPEN_WEATHER_MAP_API_KEY"  # Replace with your OpenWeatherMap API key
WEATHER_URL = f"http://api.openweathermap.org/data/2.5/weather?zip={ZIP_CODE}&appid={API_KEY}"
UPDATE_INTERVAL = 900  # Seconds

# Initialize the text generation pipeline
quantization_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
pipe = pipeline("text-generation", "microsoft/Phi-3-mini-4k-instruct", device_map="auto", model_kwargs={"quantization_config": quantization_config})


def kelvin_to_fahrenheit(kelvin: float) -> float:
    """Convert Kelvin to Fahrenheit."""
    return (kelvin - 273.15) * 9 / 5 + 32


def get_weather_data() -> Dict[str, Any]:
    """Fetch weather data from the OpenWeatherMap API."""
    response = requests.get(WEATHER_URL)
    response.raise_for_status()
    return response.json()


def format_weather_report(weather_data: Dict[str, Any]) -> str:
    """Format the raw weather data into a report string."""
    main_weather = weather_data['main']
    location = weather_data['name']
    conditions = weather_data['weather'][0]['description']
    temperature = kelvin_to_fahrenheit(main_weather['temp'])
    humidity = main_weather['humidity']
    wind_speed = weather_data['wind']['speed']

    return (f"Time: {time.strftime('%H:%M')}, "
            f"Location: {location}, "
            f"Conditions: {conditions}, "
            f"Temperature: {temperature:.2f}°F, "
            f"Humidity: {humidity}%, "
            f"Wind Speed: {wind_speed} m/s")


def generate_weather_report(weather_report: str) -> str:
    """Generate a weather forecast using the text generation pipeline."""
    chat = [
        {"role": "assistant", "content": "You are a friendly weather reporter that takes weather data and turns it into short reports. Keep these short, to the point, and in the tone of a TV weather man or woman. Be sure to inject some humor into each report too. Only use units that are standard in the United States. Always begin every report with 'in (location) the time is'"},
        {"role": "user", "content": f"Today's weather data is {weather_report}"}
    ]
    response = pipe(chat, max_new_tokens=512)
    return response[0]['generated_text'][-1]['content']


def main():
    """Run the weather reporting loop."""
    try:
        while True:
            try:
                weather_data = get_weather_data()
                weather_report = format_weather_report(weather_data)
                generated_report = generate_weather_report(weather_report)
                print(generated_report)
            except requests.RequestException as e:
                print(f"Error fetching weather data: {e}")
            except Exception as e:
                print(f"An unexpected error occurred: {e}")

            time.sleep(UPDATE_INTERVAL)
    except KeyboardInterrupt:
        print("\nWeather reporting stopped.")


if __name__ == "__main__":
    main()
Note: If you're following along, make sure to set your ZIP code and OpenWeatherMap API key appropriately.
If you're interested, the app works by passing the weather data and instructions to the LLM via the Transformers pipeline module. You can find more details here.
The app itself has minimal dependencies and is already fairly portable, provided you have the CUDA runtime installed correctly, and containerizing the app makes it easier to manage.
To start, create an empty Dockerfile alongside the weather_app.py script above. Inside the Dockerfile, we first define the base image we want to use, as well as our working directory:
FROM nvidia/cuda:12.5.0-devel-ubuntu22.04
WORKDIR /ai_weather
Below that, we tell Docker to copy the weather_app.py script into the working directory:
ADD weather_app.py /ai_weather/
From here, we just list the RUN commands we want to execute to set up the container and install our dependencies, in this case just a few Python modules and the latest release of PyTorch for our GPU:
RUN apt update
RUN apt upgrade -y
RUN apt install python3 python3-pip -y
RUN pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124
RUN pip3 install requests accelerate transformers
RUN pip3 install "bitsandbytes>=0.39.0" -q
Lastly, we use CMD to set the command or executable we want the container to run when it first starts. With that, our Dockerfile is done and should look something like this:
FROM nvidia/cuda:12.5.0-devel-ubuntu22.04
WORKDIR /ai_weather
ADD weather_app.py /ai_weather/
RUN apt update
RUN apt upgrade -y
RUN apt install python3 python3-pip -y
RUN pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124
RUN pip3 install requests accelerate transformers
RUN pip3 install "bitsandbytes>=0.39.0" -q
CMD ["/bin/bash", "-c", "python3 weather_app.py"]
All that's left now is to convert the Dockerfile into a new image by running the following command, and then wait:
docker build -t aiweather .
After a few minutes, the image will be ready and you can start a container from it in interactive mode. Note: remove the --rm bit if you don't want the container to destroy itself when it's stopped.
docker run -it --rm --gpus=all aiweather
After a few seconds, the container will start, download Phi3 from Hugging Face, quantize it to 4-bit precision, and display the first weather forecast.
"In Aurora, the time is 2:28 PM, and it's a hot one! We've got scattered clouds playing hide and seek, but don't let that fool you. It's a scorcher at 91.69°F, and the air's as dry as a bone with just 20% humidity. The wind's blowing at a brisk 6.26 m/s, so you might want to hold onto your hats! Stay cool, Aurora!"
Of course, this is an intentionally simple example, but hopefully it illustrates how containerization can be used to build and deploy AI apps with relative ease. If you want to dig into more complex setups, we encourage you to take a look at the Docker documentation here.
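And if you want to share your newly built image with others, pushing it to a registry only takes a couple of commands. The account name below is a placeholder; substitute your own Docker Hub username or private registry address:
docker tag aiweather yourname/aiweather:latest
docker login
docker push yourname/aiweather:latest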
What about NIM?
As with any app, containerizing your AI projects brings a variety of benefits: it makes them more reproducible and easier to deploy at scale, and it lets you ship models with configurations optimized for specific use cases or hardware.
That's the idea behind the Nvidia Inference Microservices (NIMs for short), which we covered at GTC this spring. These NIMs are really just containers built by Nvidia with specific versions of software like CUDA, Triton Inference Server, and TensorRT LLM that are tuned to deliver the best performance on your hardware.
And because they're built by Nvidia, whenever the GPU giant releases an update to one of its services that delivers new features or better performance on new or existing hardware, users will be able to take advantage of those improvements simply by downloading a new NIM image. At least, that's the aim.
In the coming weeks, Nvidia plans to make NIMs available for free through its developer program for research and testing purposes, but, before you get too excited, putting them into production requires an AI Enterprise license, which costs $4,500 per GPU per year, or $1 per GPU per hour in the cloud.
We plan to take a closer look at Nvidia's NIM in the near future. However, if you don't have the budget to purchase an AI Enterprise license, there's nothing stopping you from creating your own optimized image, as shown in this tutorial.®
Editor's note: Nvidia provided The Register with an RTX 6000 Ada Generation graphics card in support of this article and similar articles. Nvidia had no role in the content of this article.