DeepSeek touts new training method as China pushes for AI efficiency

DeepSeek published a paper outlining a more efficient approach to AI development, underscoring the Chinese artificial intelligence industry's efforts to compete with the likes of OpenAI despite lacking free access to Nvidia Corp.'s chips.

The paper, co-authored by founder Liang Wenfeng, introduces a framework called Manifold-Constrained Hyper-Connections. According to the authors, it is designed to improve scalability while reducing the computational and energy demands of training advanced AI systems.

DeepSeek publications like this one have heralded major model releases in the past. The Hangzhou-based startup surprised the industry a year ago with its R1 reasoning model, developed at a fraction of the cost of its Silicon Valley rivals' systems. DeepSeek has since released several smaller platforms, but expectations are high for its next flagship system, widely referred to as R2, which is expected around Chinese New Year in February.

Chinese startups continue to operate under significant constraints as the United States blocks access to the cutting-edge semiconductors essential to developing and running AI. These limitations have forced researchers to pursue unconventional methods and architectures.

What Bloomberg Intelligence says

Despite Google's recent gains, DeepSeek's upcoming R2 model, likely to launch in the next few months, has the potential to upend the global AI sector once again. Google's Gemini 3 model overtook OpenAI in November to enter the top three in LiveBench's global ranking of large language model performance. Low-cost Chinese models, developed at a fraction of the cost of their competitors, claimed two spots in the top 15.

– Robert Lee and Jasmine Liu, analysts

DeepSeek, known for its unconventional innovations, published its latest paper this week through the open repository arXiv and the open-source platform Hugging Face. The paper lists 19 authors, with Liang's name appearing last.

The founder, who has consistently led DeepSeek's research agenda, has challenged his team to rethink how large-scale AI systems are conceived and built.

The latest research addresses challenges such as training instability and limited scalability, and the authors note that the new method incorporates “rigorous infrastructure optimizations to ensure efficiency.” Tests were conducted on models ranging from 3 billion to 27 billion parameters, building on ByteDance Ltd.'s 2024 research into hyper-connection architectures.

The authors said the technique holds promise “for the evolution of foundational models.”
