Token costs emerge as a bottleneck for scaling AI applications, and the industry is looking for a breakthrough in computing power.

Cailian Press reporters observed on the ground that the focus of discussion among experts and corporate representatives has shifted from macro-level policy interpretation to specific “implementation blueprints.” Interviews at the conference revealed that high token costs are a core problem for many companies as they scale up their AI applications.

Cailian Press, September 27 (Reporter Guo Sungkiao): Only a month after the opinions on deeply implementing the “Artificial Intelligence+” action were published, the industry is already clearly accelerating off the starting line.

The 2025 Artificial Intelligence Computing Conference, held in Beijing yesterday, offered an excellent window for observation. Cailian Press reporters noticed on the ground that the focus of discussion among experts and corporate representatives has shifted from macro-level policy interpretation to specific “implementation blueprints.”

This year's conference centered on building artificial intelligence infrastructure and optimizing the domestic AI computing power system, with a focus on algorithmic innovation and getting applications into production. Treating computing power as a core driver of innovation, it brought together resources from industry, academia, research and applications to support the high-quality development of the artificial intelligence industry. In response to the national opinions on deepening the “Artificial Intelligence+” action, more than 30 companies and institutions, including China Mobile, Inspur Information, Zhiyuan Research Institute, and Kunlun Chip, jointly released “Building Industry Intelligence Based on the Beijing Solutions for Intelligent Computing Applications – Supernode Innovation Consortium.”

“Achieve cross-regional, cross-hardware computing power connections and comprehensive sharing.”

At the conference, Jin Yakai, founder of a “reliable artificial intelligence” institute at Westlake University and a member of the European Academy of Sciences, outlined the main trajectory of artificial intelligence development. He noted that its developmental path resembles the emergence of intelligence in the human brain, which relies on three important mechanisms: evolution, development and learning. From this perspective, he elaborated on the requirements of reliable artificial intelligence and the importance of artificial intelligence governance, and shared his laboratory's exploratory work in general brain-inspired artificial intelligence and industrial artificial intelligence.

Lin Yonghua, vice president and chief engineer at Beijing Zhiyuan Artificial Intelligence Institute, shared technical progress on the “Crowdwisdom FlagOS” platform. As an open, unified system software stack, the platform aims to break down barriers within the AI computing power ecosystem and achieve cross-regional, cross-hardware connection and comprehensive sharing of computing power, giving developers worldwide a unified computing foundation across chips, frameworks and scenarios.

Wang Haifeng, Chief Technology Officer at Baidu, reviewed the development of artificial intelligence from rule-based methods to statistical machine learning, deep learning, and large models. He emphasized that the generality and comprehensive capabilities of large model technology offer a promising path toward artificial general intelligence.

Liu Jun, Chief AI Strategist at Inspur Information, introduced two innovative systems for the age of intelligent agents. He discussed the challenges facing the sustainable development of AI computing power, including scale, electricity and investment. He proposed rethinking and redesigning AI computing systems, shifting from a scale-oriented approach to an efficiency-oriented one and developing specialized AI computing architectures.

Dai Beijie, vice president of Beijing Zhongguancun Artificial Intelligence Institute, introduced a project-based talent development system established by Beijing Zhongguancun College to meet the growing demand for unconventional AI leaders. The initiative promotes deep integration of AI with multiple disciplines and enables two-way empowerment, in which research results are put into practice and industrial needs feed back into innovation, thereby injecting strong momentum into the development of new quality productive forces.

Hardware innovation targets token cost bottlenecks

Cailian Press reporters learned at the conference that high token costs have become a central problem for many companies as they scale up their AI applications.

“Our platform handles a huge volume of customer service, recommendation, and risk control scenarios that need to call large models. Token costs are like a sword of Damocles hanging over our heads,” the technical director of an e-commerce company's AI platform division told Cailian Press reporters at the conference, adding that he had come specifically to find cost-saving solutions.

“As intelligent agent applications develop further, token consumption per interaction is rising rapidly. Under the current cost structure, many valuable and innovative applications run into the obstacle of ‘economic feasibility’ before they even reach scale, which poses a major challenge to profitability,” the director admitted. This was one of the views Cailian Press reporters heard most often at this year's conference.

Guo Tao, assistant director of the China E-Commerce Expert Service Center, said in an interview with Cailian Press that the AI industry is moving from “model competition” to “application implementation.” Inference cost and interaction speed have become more important dimensions of competition than model parameter count. How well infrastructure “increases speed and reduces cost” directly determines the depth and breadth of “AI+” penetration across vertical industries.

At the conference, representatives of a publicly listed company also told reporters that as scaling continues to drive advances in model capability, open-source models represented by DeepSeek have significantly lowered the threshold for innovation and accelerated the industrialization of intelligent agents. The three core elements of intelligent agent industrialization are capability, speed, and cost. Of these, model capability sets the upper limit of what an application can do, interaction speed determines its commercial value, and token cost determines its profitability.

This issue resonated widely at the conference. In response to this industry-wide demand, computing infrastructure providers are seeking breakthroughs at the hardware level.

On the hardware side, Inspur Information announced the Yuannao HC1000 ultra-scalable AI server at the conference. Built on a newly developed, fully symmetric DirectCom ultra-high-speed architecture, its lossless, ultra-scalable design aggregates a very large number of domestic AI chips and supports extremely high inference throughput. According to the company, it brings the cost of inference below 1 RMB per million tokens for the first time, offering a computing system aimed at breaking the token cost bottleneck for intelligent agents.
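To put the per-million-token figure in perspective, the short Python sketch below converts a price per million tokens into a monthly bill for an agent workload. Only the 1 RMB per million tokens price comes from the announcement above; the function names and the workload numbers (interactions per day, tokens per interaction) are hypothetical values chosen for illustration, not figures reported at the conference.

```python
def inference_cost_rmb(tokens: int, price_per_million_rmb: float) -> float:
    """Cost in RMB of processing `tokens` tokens at a given price per million tokens."""
    return tokens / 1_000_000 * price_per_million_rmb


def monthly_cost_rmb(interactions_per_day: int,
                     tokens_per_interaction: int,
                     price_per_million_rmb: float,
                     days: int = 30) -> float:
    """Monthly inference bill in RMB for a steady daily workload."""
    total_tokens = interactions_per_day * tokens_per_interaction * days
    return inference_cost_rmb(total_tokens, price_per_million_rmb)


if __name__ == "__main__":
    # Hypothetical workload: 1 million agent interactions per day,
    # each consuming 20,000 tokens (assumed, not from the article).
    price = 1.0  # RMB per million tokens, the threshold cited in the announcement
    cost = monthly_cost_rmb(1_000_000, 20_000, price)
    print(f"Estimated monthly cost: {cost:,.0f} RMB")
```

Because the bill scales linearly with both price and tokens per interaction, any increase in per-interaction token consumption has to be matched by a proportional drop in price just to keep spending flat.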

From a technical standpoint, Liu Jun told Cailian Press that the Yuannao HC1000 reduces cost and improves hardware-software synergy through innovations such as a 16-card computing module design and a balanced single-card design that integrates compute, memory and interconnect. These changes significantly reduce both the cost of each card and the system overhead apportioned to each card. In addition, the fully symmetric system topology supports lossless expansion at ultra-large scale. According to calculations, the Yuannao HC1000 delivers a 1.75x improvement in inference performance over traditional RoCE solutions, and deep compute-network co-design together with full-domain lossless technology raises per-card model compute efficiency by up to 5.7x.

The industry broadly agrees that intelligent agents will bring an exponential surge in demand for inference computing.

Inspur Information told reporters that it will continue to drive innovation and breakthroughs in AI computing architecture through software-hardware co-design and deep optimization. The company says it is committed to accelerating token generation while reducing costs, and to fostering deep integration of artificial intelligence technologies such as large models and intelligent agents with real-world industries, with the aim of making artificial intelligence a driving force for productivity and innovation across industries.
