DeepSeek unveils new AI training method to scale LLMs more easily


DeepSeek started the year with a new idea for training AI, and analysts say it could have a big impact on the industry.

The Chinese AI startup published a research paper on Wednesday describing a method for training large language models that it said could shape the “evolution of fundamental models.”

In the paper, co-authored by founder Liang Wenfeng, DeepSeek introduces a training approach it calls “Manifold-Constrained Hyper-Connections” (mHC), designed to let models scale without becoming unstable or breaking down altogether.

As language models grow, researchers often try to improve performance by allowing different parts of the model to share more information internally. However, the paper said this increases the risk of that information becoming unstable during training.

The paper added that DeepSeek's latest research allows models to share richer internal communication in a constrained way, maintaining training stability and computational efficiency as models scale.
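The article does not describe the mechanism, so the following is only a toy sketch of the general idea of constrained mixing, not DeepSeek's method. It assumes, purely for illustration, that the constraint forces the matrix that mixes several internal streams to be doubly stochastic (every row and column sums to 1), using Sinkhorn normalization. The function names, the matrix `raw`, and the stream values are all hypothetical.

```python
def sinkhorn(matrix, iters=50):
    """Rescale a positive square matrix until rows and columns each sum to ~1."""
    n = len(matrix)
    m = [row[:] for row in matrix]
    for _ in range(iters):
        # Normalize every row to sum to 1.
        m = [[v / sum(row) for v in row] for row in m]
        # Normalize every column to sum to 1.
        col = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col[j] for j in range(n)] for i in range(n)]
    return m

def mix_streams(mix, streams):
    """Each output stream is a weighted combination of all input streams."""
    n, width = len(mix), len(streams[0])
    return [[sum(mix[i][j] * streams[j][k] for j in range(n))
             for k in range(width)] for i in range(n)]

def repeated_mix(mix, streams, depth):
    """Apply the same mixing step once per layer, for `depth` layers."""
    for _ in range(depth):
        streams = mix_streams(mix, streams)
    return streams

def max_abs(streams):
    return max(abs(v) for s in streams for v in s)

def total(streams):
    return sum(v for s in streams for v in s)

# Hypothetical mixing weights and stream contents, chosen only for illustration.
raw = [[0.9, 0.4, 0.2], [0.3, 1.1, 0.5], [0.6, 0.2, 0.8]]
streams = [[1.0, 2.0], [3.0, -1.0], [0.5, 0.5]]

unconstrained = repeated_mix(raw, streams, 30)          # raw weights used as-is
constrained = repeated_mix(sinkhorn(raw), streams, 30)  # weights projected first

print("unconstrained max magnitude:", max_abs(unconstrained))
print("constrained max magnitude:  ", max_abs(constrained))
```

In this toy setup the unconstrained values grow with every layer, while the constrained version stays within the range of the original inputs and preserves their overall sum. That is the kind of stability a constraint on mixing is meant to provide.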

DeepSeek's new method is an 'amazing breakthrough'

Wei Sun, principal analyst for AI at Counterpoint Research, told Business Insider on Friday that the approach is an “amazing advance.”

Sun said DeepSeek combined a variety of techniques to minimize the additional cost of training a model. She added that new training methods could yield much higher performance, even if costs increased slightly.

Sun said the paper also signals DeepSeek's internal capabilities. By redesigning its training stack end-to-end, the company is showing that it can combine “rapid experimentation with highly unconventional research ideas.”

DeepSeek could “again avoid computational bottlenecks and unleash leaps in intelligence,” Sun said, referring to the company's “Sputnik moment” in January 2025, when it unveiled its R1 reasoning model.

That launch sent shockwaves through the technology industry and the US stock market by showing that the R1 model could rival top competitors such as OpenAI's o1 at a fraction of the cost.

Lian Jye Su, principal analyst at technology research and consulting firm Omdia, told Business Insider on Friday that the published research could have ripple effects across the industry, with rival AI labs developing their own versions of the approach.

“China's willingness to share important discoveries with the industry while continuing to deliver unique value through new models demonstrates renewed confidence in China's AI industry,” Su said of the DeepSeek paper. Openness is accepted as a “strategic advantage and key differentiator,” he added.

Will we see the next DeepSeek model?

The paper comes as DeepSeek is reportedly working toward the release of its next flagship model, R2, after earlier delays.

According to a June report in The Information, R2 was scheduled for mid-2025 but was postponed after Liang expressed dissatisfaction with the model's performance. The report said the launch was also complicated by a shortage of advanced AI chips, a constraint that increasingly shapes how Chinese labs train and deploy frontier models.

Although the paper did not mention R2, its timing raised some eyebrows. DeepSeek previously published foundational training research ahead of the launch of its R1 model.

Su said DeepSeek's track record suggests the new architecture will “definitely be implemented in new models.”

Sun, on the other hand, is more cautious. “We probably won't see a standalone R2,” Sun said. The technology could form the backbone of DeepSeek's V4 model, she added, as DeepSeek has already integrated the initial R1 update into its V3 model.

Business Insider's Alistair Barr wrote in June that DeepSeek's R1 model update failed to garner much traction in the tech world. Barr argued that distribution is key, and DeepSeek still lacks the broad reach enjoyed by major AI labs such as OpenAI and Google, especially in Western markets.




