China Mobile Hubei and Huawei complete validation of China’s first AI inference acceleration solution for carriers

[Shanghai, China, June 24, 2026] At MWC Shanghai 2026, China Mobile Communications Group Hubei Co., Ltd. (China Mobile Hubei) and Huawei announced that they have successfully completed live network verification of Huawei’s AI inference acceleration solution, a first in the Chinese carrier industry. The solution is powered by Huawei’s OceanStor A800 storage, Ascend A3 SuperPoD, and Unified Cache Manager (UCM), and improves token throughput by up to 372% for long-sequence artificial intelligence (AI) inference workloads. This milestone provides critical technical support for the efficient deployment of AI computing services by carriers.

Innovation: Huawei UCM eliminates bottlenecks in long-sequence inference

As AI applications rapidly transition to AI agents, long-sequence scenarios such as code generation and multi-turn interactions are becoming increasingly common. However, the limited capacity of traditional on-chip memory and dynamic random access memory (DRAM) severely restricts the KV cache hit ratio and, with it, overall inference performance.
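To make the capacity pressure concrete, here is a rough sketch of how KV cache size grows with context length. It uses the standard estimate for a transformer with standard or grouped-query attention (one key and one value vector per layer, per KV head, per token). The model shape below (60 layers, 8 KV heads, head dimension 128, 2-byte values) is a hypothetical placeholder, not the configuration of MiniMax M2.5 or GLM-5.1, and the token counts treat "K" as 1,000.

```python
def kv_cache_bytes(num_tokens, num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """Estimate the bytes needed to hold the KV cache for one sequence."""
    # Factor 2: each layer stores one key tensor and one value tensor.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value * num_tokens


# Hypothetical model shape, chosen only for illustration.
LAYERS, KV_HEADS, HEAD_DIM = 60, 8, 128

for tokens in (8_000, 64_000, 128_000, 190_000):
    gib = kv_cache_bytes(tokens, LAYERS, KV_HEADS, HEAD_DIM) / 2**30
    print(f"{tokens:>7} tokens -> {gib:6.2f} GiB per sequence")
```

Under these assumed dimensions a single long sequence already needs tens of GiB, and the total scales with the number of concurrent sessions, which is why cache capacity rather than compute can become the limiting factor.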

Huawei introduced UCM in 2025 to directly address this challenge. By using external high-performance storage, UCM breaks the traditional capacity limitations of on-chip memory and DRAM and enables petabyte-scale KV cache capabilities. The solution implements hierarchical management and scheduling throughout the KV cache lifecycle and significantly extends the context window of single-turn dialogs. For multi-turn dialogs, UCM reuses the historical KV cache to eliminate redundant computation, delivering a better inference experience at lower inference cost.
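The general idea of tiered KV caching with prefix reuse can be sketched as follows. This is a toy illustration, not UCM's implementation or API: the class `TieredKVCache`, its methods, the block size, and the string placeholders standing in for KV tensors are all hypothetical. A small "fast" tier stands in for on-chip memory and DRAM, a larger "slow" tier stands in for external storage, and blocks are keyed by a hash of the whole token prefix so that a later turn can find the blocks an earlier turn produced.

```python
import hashlib
from collections import OrderedDict


class TieredKVCache:
    """Toy two-tier cache: a small LRU fast tier backed by a large slow tier."""

    def __init__(self, fast_capacity):
        self.fast = OrderedDict()  # stands in for on-chip memory / DRAM
        self.slow = {}             # stands in for external storage
        self.fast_capacity = fast_capacity

    @staticmethod
    def block_keys(tokens, block_size=4):
        """Return one key per full block; each key hashes the entire prefix."""
        keys, h = [], hashlib.sha256()
        full = len(tokens) - len(tokens) % block_size
        for i in range(0, full, block_size):
            h.update(repr(tokens[i:i + block_size]).encode())
            keys.append(h.copy().hexdigest())
        return keys

    def put(self, key, kv_block):
        self.fast[key] = kv_block
        self.fast.move_to_end(key)
        while len(self.fast) > self.fast_capacity:
            old_key, old_block = self.fast.popitem(last=False)
            self.slow[old_key] = old_block  # demote instead of discarding

    def get(self, key):
        if key in self.fast:
            self.fast.move_to_end(key)
            return self.fast[key]
        if key in self.slow:
            block = self.slow.pop(key)
            self.put(key, block)  # promote back into the fast tier
            return block
        return None

    def reusable_blocks(self, tokens):
        """Count leading blocks whose KV data is already cached in either tier."""
        count = 0
        for key in self.block_keys(tokens):
            if self.get(key) is None:
                break
            count += 1
        return count


cache = TieredKVCache(fast_capacity=2)
turn1 = list(range(12))                      # first turn: 3 blocks of 4 tokens
for key in cache.block_keys(turn1):
    cache.put(key, "kv-block-placeholder")   # real systems store tensors here
turn2 = turn1 + [100, 101, 102, 103]         # second turn extends the history
print(cache.reusable_blocks(turn2), "of", len(cache.block_keys(turn2)), "blocks reused")
```

In this sketch the second turn only needs to compute KV data for its one new block, even though the fast tier was too small to hold the first turn's blocks. That is the effect the larger storage tier is meant to provide.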

Dramatic performance improvements: Multi-model validation showed significant improvements in both TTFT and TPS

In this validation, the vLLM-Ascend framework was deployed in China Mobile Hubei’s live network environment, and long-sequence inputs ranging from 8K to 190K tokens were simulated across mainstream models such as MiniMax M2.5 and GLM-5.1. The main findings are:

  • MiniMax M2.5: Enabling UCM improved time to first token (TTFT) by 26% to 62% and significantly improved tokens per second (TPS) per NPU. By sequence length, TPS increased by 58% at 64K and by 78% at 128K.
  • GLM-5.1: TTFT improved by 51% to 93% and TPS increased by 56% to 372%. By sequence length, TPS increased by 313% at 64K and by 372% at 128K.

The test results show that the benefits of the AI inference acceleration solution become more pronounced as the context length increases. The solution effectively addresses the KV cache capacity bottleneck that often occurs in long-sequence inference.
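For readers comparing the figures above, a percentage increase converts to a throughput multiplier as 1 + p/100, so a 372% increase means 4.72 times the baseline, not 3.72 times. The short sketch below applies this to the reported TPS gains. It also converts a TTFT figure under the assumption that "improved by p%" means TTFT fell by p% relative to the baseline; the announcement does not state the exact definition, so treat that part as an interpretation.

```python
def throughput_multiplier(pct_increase):
    """A rise of p percent means the new value is (1 + p/100) times the old one."""
    return 1 + pct_increase / 100


def remaining_latency_fraction(pct_reduction):
    """Assumption: 'improved by p percent' means latency fell by p percent."""
    return 1 - pct_reduction / 100


# TPS gains as reported in the validation results.
reported_tps_gains = {
    "MiniMax M2.5 @ 64K": 58,
    "MiniMax M2.5 @ 128K": 78,
    "GLM-5.1 @ 64K": 313,
    "GLM-5.1 @ 128K": 372,
}

for label, pct in reported_tps_gains.items():
    print(f"{label}: +{pct}% TPS = {throughput_multiplier(pct):.2f}x baseline")

# Under the stated assumption, a 93% TTFT improvement leaves 7% of the baseline wait.
print(f"TTFT at best case: {remaining_latency_fraction(93):.2f} of baseline")
```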

Value amplification: Powering mission-critical services in the agent era

A representative from China Mobile Hubei said, “Hubei Province sits in a core area, within just 10 ms latency of the country’s eight major computing power hubs. This test validates the need for collaboration among storage, compute, and network. In scenarios such as AI agent interaction and code generation, the AI inference acceleration solution can improve throughput by more than 50%, building a solid foundation for the large-scale deployment of China Mobile Hubei’s AI services.”

Industry Perspective: Reinventing the AI Data Infrastructure

Michael Qiu, President of Huawei Global Data Storage Marketing & Solutions Sales Department, said, “With major carriers announcing token packages, the large-scale adoption of AI agents has clearly entered a new stage. Token consumption is expected to increase exponentially in the future. AI inference acceleration solutions not only significantly reduce TTFT, but also help reduce token costs, enabling carriers to build efficient and green AI computing infrastructure.”

This successful validation represents a major step forward in the collaborative optimization of AI computing infrastructure for carriers and provides a replicable technical model for the global AI industry.

MWC Shanghai 2026 will be held in Shanghai, China from June 24th to June 26th. During the event, Huawei will exhibit its latest products and solutions in Hall N1 of Shanghai New International Expo Center (SNIEC).

The ICT industry is rapidly moving into the era of token monetization. Huawei is working with global carriers and partners to explore ways to monetize 5G-A high-uplink capabilities and experiences, and to leverage AI to upgrade their businesses through enhanced connectivity and computing. Together, we will capture the opportunities presented by token monetization.

For more information, please visit https://carrier.huawei.com/minisite/mwcs2026/en/.


