NVIDIA BlueField-4 powers a new class of AI-native storage infrastructure for AI's next frontier

NVIDIA announced that the NVIDIA BlueField-4 data processor, part of the full-stack NVIDIA BlueField platform, powers the NVIDIA Inference Context Memory Storage Platform, a new class of AI-native storage infrastructure for the next frontier of AI.

As AI models scale to trillions of parameters and take on multi-step inference, they generate vast amounts of contextual data, represented in key-value (KV) caches, that is critical to accuracy, user experience, and continuity.

The KV cache cannot be held in GPU memory for long periods; as it grows, it becomes a bottleneck for real-time inference in multi-agent systems. AI-native applications therefore require a new kind of scalable infrastructure to store and share this data.
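To make the idea concrete, here is a minimal, purely illustrative sketch (all class and method names are hypothetical, not NVIDIA APIs) of a two-tier context-memory store: per-session KV caches live in a size-limited "hot" tier standing in for GPU memory, and are offloaded to an external "cold" tier rather than discarded, so a returning multi-turn session can skip recomputing its prefill.

```python
import numpy as np

# Illustrative sketch only; all names here are hypothetical.
class ContextMemoryTier:
    def __init__(self, gpu_capacity_bytes):
        self.gpu_capacity = gpu_capacity_bytes
        self.gpu_cache = {}       # session_id -> KV tensor (hot tier)
        self.external_store = {}  # session_id -> KV tensor (cold tier)

    def _gpu_usage(self):
        return sum(t.nbytes for t in self.gpu_cache.values())

    def put(self, session_id, kv):
        # Evict oldest sessions until the new cache fits in the hot tier.
        while self.gpu_cache and self._gpu_usage() + kv.nbytes > self.gpu_capacity:
            victim, tensor = next(iter(self.gpu_cache.items()))
            del self.gpu_cache[victim]
            self.external_store[victim] = tensor  # offload, don't discard
        self.gpu_cache[session_id] = kv

    def get(self, session_id):
        # A hit in either tier avoids recomputing the prefill, which is
        # the cost that persistent context memory aims to eliminate.
        if session_id in self.gpu_cache:
            return self.gpu_cache[session_id]
        if session_id in self.external_store:
            kv = self.external_store.pop(session_id)
            self.put(session_id, kv)  # promote back to the hot tier
            return kv
        return None  # cold miss: prefill must be recomputed


# Usage: capacity for two 1 KB caches; a third forces an offload.
tier = ContextMemoryTier(gpu_capacity_bytes=2048)
tier.put("session-a", np.zeros((16, 16), dtype=np.float32))  # 1024 bytes
tier.put("session-b", np.zeros((16, 16), dtype=np.float32))
tier.put("session-c", np.zeros((16, 16), dtype=np.float32))  # evicts "session-a"
restored = tier.get("session-a")  # retrieved from the cold tier, not recomputed
```

The real platform adds what this toy omits: hardware-accelerated cache placement, RDMA-based sharing across nodes, and secure isolation between GPU clients.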


The NVIDIA Inference Context Memory Storage Platform provides context memory infrastructure by expanding GPU memory capacity and enabling fast sharing across nodes, delivering up to 5x more tokens per second and up to 5x greater power efficiency than traditional storage.

“AI is revolutionizing the entire computing stack and now storage,” said NVIDIA Founder and CEO Jensen Huang. “AI is no longer a one-shot chatbot, but an intelligent collaborator that understands the physical world, reasons over time, acts on facts, uses tools to do real work, and retains both short-term and long-term memory. With BlueField-4, NVIDIA and our software and hardware partners are reinventing the storage stack for the next frontier of AI.”

The NVIDIA Inference Context Memory Storage Platform increases KV cache capacity and accelerates context sharing across clusters of rack-scale AI systems. Additionally, persistent context for multi-turn AI agents improves responsiveness, increases AI factory throughput, and supports efficient scaling of long-context multi-agent inference.

Key features of platforms powered by NVIDIA BlueField-4 include:

  • NVIDIA Rubin cluster-level KV cache capacity provides the scale and efficiency needed for long-context, multi-turn agent inference.
  • Power efficiency up to 5x greater than that of traditional storage.
  • Smart and fast sharing of KV caches between AI nodes is enabled by the NVIDIA DOCA framework and tightly integrated with the NVIDIA NIXL library and NVIDIA Dynamo software to maximize tokens per second, reduce time to first token, and improve multi-turn responsiveness.
  • Hardware-accelerated KV cache placement managed by NVIDIA BlueField-4 eliminates metadata overhead, reduces data movement, and ensures secure and isolated access from GPU nodes.
  • NVIDIA Spectrum-X™ Ethernet serves as a high-performance network fabric for RDMA-based access to AI-native KV cache, enabling efficient data sharing and retrieval.

Storage innovators such as AIC, Cloudian, DDN, Dell Technologies, HPE, Hitachi Vantara, IBM, Nutanix, Pure Storage, Supermicro, VAST Data, and WEKA are among the first to build next-generation AI storage platforms using BlueField-4, which will be available in late 2026.

