DeepSeek's latest technical paper, co-authored by the company's founder and CEO, Liang Wenfeng, has been hailed as a potential game-changer for artificial intelligence model development, as it could lead to improvements in the basic architecture underlying machine learning models.
Manifold-constrained hyperconnections (mHC), the subject of the paper, improve on traditional hyperconnections in residual networks (ResNets), a fundamental mechanism underlying large language models (LLMs). The work also demonstrates the continued efforts of Chinese AI startups to train powerful models with limited computing resources.
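For readers unfamiliar with the terminology, the toy sketch below contrasts a plain residual shortcut with the general hyper-connection idea of keeping several parallel copies ("streams") of the hidden state and learning how they mix. It is an illustrative PyTorch sketch under those assumptions only; it is not the mHC formulation from DeepSeek's paper, and the class names, stream count and mixing parameters are hypothetical.

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Standard residual shortcut: output = x + F(x)."""

    def __init__(self, dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x):                       # x: (batch, dim)
        return x + self.f(x)


class HyperConnectionBlock(nn.Module):
    """Toy hyper-connection-style block: keeps n parallel hidden-state
    streams and learns how they feed the block and how the block's output
    is written back, instead of a single fixed x + F(x) shortcut.
    Illustrative only; not DeepSeek's mHC formulation."""

    def __init__(self, dim, n_streams=4):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        # Learned connection weights (hypothetical parameterization):
        self.read = nn.Parameter(torch.full((n_streams,), 1.0 / n_streams))  # streams -> block input
        self.write = nn.Parameter(torch.ones(n_streams))                     # block output -> streams
        self.mix = nn.Parameter(torch.eye(n_streams))                        # stream-to-stream routing

    def forward(self, streams):                 # streams: (n_streams, batch, dim)
        block_in = torch.einsum('s,sbd->bd', self.read, streams)
        block_out = self.f(block_in)            # (batch, dim)
        routed = torch.einsum('st,tbd->sbd', self.mix, streams)
        return routed + self.write[:, None, None] * block_out


# Usage: expand one hidden state into 4 streams, apply the block,
# then average the streams back down at the end of the stack.
x = torch.randn(8, 64)                          # (batch, dim)
streams = x.unsqueeze(0).repeat(4, 1, 1)        # (n_streams, batch, dim)
streams = HyperConnectionBlock(dim=64, n_streams=4)(streams)
y = streams.mean(dim=0)                         # collapse back to (batch, dim)
```

Because the extra parameters in such schemes are only small per-layer mixing weights, they add little computation relative to the model itself, which is one way to read the scaling results reported below.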
In the paper, a team of 19 DeepSeek researchers said they tested mHC on models with 3 billion, 9 billion, and 27 billion parameters and found that it could scale without adding significant computational burden.
The paper, published on January 1, immediately sparked interest and discussion among developers despite its dense technical content.
Professor Quan Long from the Hong Kong University of Science and Technology said the new discovery is “very important for transformer architectures made for LLM”. Quan said he is “very excited to see the significant optimizations made by DeepSeek that have already revolutionized LLM efficiency.”
The paper comes at a time when most AI startups are focused on turning the capabilities of LLMs into agents and other products. But DeepSeek, a side project of Liang's quantitative trading firm, has been seeking to improve the fundamental technical mechanisms of how machines learn from data.
