Scaling AI Inference with NVIDIA NIM Operator 3.0.0

AI models, inference engine backends, and distributed inference frameworks continue to evolve in architecture, complexity, and scale. Because these components change so quickly, it is a critical challenge to deploy and efficiently manage AI inference pipelines that support their advanced capabilities.

NVIDIA NIM Operator is designed to help you scale intelligently. It enables Kubernetes cluster administrators to operate the software components and services needed to run NVIDIA NIM inference microservices for the latest LLMs and multimodal AI models. These models span reasoning, retrieval, vision, speech, biology, and more.

The latest release, NIM Operator 3.0.0, introduces enhancements that simplify and optimize the deployment of NVIDIA NIM microservices and NVIDIA NeMo microservices across Kubernetes environments. NIM Operator 3.0.0 supports efficient resource utilization and integrates with existing Kubernetes infrastructure, including KServe deployments.

NVIDIA customers and partners use NIM Operator to efficiently manage inference pipelines for a variety of applications and AI agents, including chatbots, agentic RAG, virtual drug discovery, and more.

NVIDIA recently collaborated with Red Hat to enable NIM deployment on KServe with NIM Operator. “Red Hat contributed to the NIM Operator open source GitHub repo to enable NVIDIA NIM deployment on KServe,” said Babak Mozaffari, a director at Red Hat. “This feature enables NIM Operator to deploy NIM microservices that benefit from KServe lifecycle management, and it simplifies scalable NIM deployment using NIM services. Native KServe support in NIM Operator also lets users benefit from model caching with NIM cache and use NeMo capabilities such as NeMo Guardrails with KServe inference endpoints.”

This post covers the new features in the NIM Operator 3.0.0 release.