Led by Liang Wenfeng, DeepSeek responds to the controversy for the first time

Summary: DeepSeek also responded to the “distillation” controversy for the first time.

On September 17, 2025, Chinese artificial intelligence had another highlight moment. The DeepSeek-AI team, led by Liang Wenfeng, published its research on the open model DeepSeek-R1 in the journal Nature, and the paper made the cover of that issue.


The paper shows that the reasoning capabilities of large language models (LLMs) can be significantly improved through pure reinforcement learning, reducing the reliance on human annotation. Compared with models trained by traditional methods, models trained this way perform better on mathematical problem solving, competitive programming, and graduate-level STEM problems.

In the paper, DeepSeek also responded to the “distillation” debate for the first time. In its exchanges with reviewers, DeepSeek stated clearly that R1 did not learn by copying reasoning examples generated by OpenAI models. Like most other large language models, however, R1's base model was trained on web data, so it may have absorbed AI-generated content already present on the Internet.

“Low-Cost Miracle”: From $290,000 to the World Stage

In the AI world there has long been a hard-nosed consensus: the barrier to building a top-tier large model is cost, not algorithms. OpenAI is estimated to have spent more than $100 million training GPT-4, and Google, Anthropic, and Meta compete with budgets in the tens of millions of dollars. Funding and computing power are the core factors that determine who gets a say.

DeepSeek, however, has broken this “unwritten rule.” According to details disclosed by the researchers in the paper's supplementary materials, the cost of training DeepSeek-R1 was surprisingly low, just $294,000. Even adding the roughly $6 million spent training the base model, the total is far lower than what the foreign giants spend.
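As a rough back-of-the-envelope check using only the figures quoted in this article, the comparison works out as below. This is a sketch: the $6 million base-model figure is approximate and the GPT-4 figure is a lower-bound estimate, so the resulting share is an upper bound rather than an exact ratio.

```python
# Figures as quoted in the article (USD); two of them are estimates.
R1_COST = 294_000            # disclosed in the paper's supplementary materials
BASE_MODEL_COST = 6_000_000  # "about $6 million" for the base model
GPT4_ESTIMATE = 100_000_000  # "more than $100 million" (a lower bound)

total = R1_COST + BASE_MODEL_COST
share = total / GPT4_ESTIMATE  # upper bound, since GPT-4's cost exceeds the estimate

print(f"Total: ${total:,}")
print(f"At most {share:.1%} of the GPT-4 estimate")
```

Running it prints a total of $6,294,000, or at most about 6.3% of the GPT-4 estimate.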

The true breakthrough of DeepSeek-R1 lies not only in its cost but also in its methodological innovation.

In the Nature paper, the research team explained that they adopted a pure reinforcement learning (RL) framework built on the Group Relative Policy Optimization (GRPO) algorithm. Rather than having the model imitate human reasoning paths, they rewarded it based only on whether its final answer was correct.
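The reward-and-advantage idea can be sketched in Python as follows. This is a simplified illustration, not DeepSeek's code: the `\boxed{}` answer format, the function names, and the clipping value 0.2 are assumptions, and the KL-divergence penalty that GRPO's full objective also includes is omitted. For each prompt the policy samples a group of answers; each answer receives a rule-based 0/1 reward for correctness, and each reward is normalized against its own group's mean and standard deviation, which is how GRPO avoids training a separate value model.

```python
import re
import statistics

def accuracy_reward(completion: str, reference: str) -> float:
    """Rule-based reward: 1.0 if the final \\boxed{...} answer matches the reference.

    The \\boxed{} answer format is an assumption made for this sketch.
    """
    match = re.search(r"\\boxed\{([^{}]*)\}", completion)
    if match is None:
        return 0.0  # no parsable final answer earns no reward
    return 1.0 if match.group(1).strip() == reference.strip() else 0.0

def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each reward within its group: (r_i - mean) / std.

    Population std is used here; sample vs. population std is an implementation choice.
    eps avoids division by zero when every answer in the group gets the same reward.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

def clipped_surrogate(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """PPO-style clipped term applied per answer inside GRPO's objective.

    ratio is pi_new(answer) / pi_old(answer); the KL penalty is omitted in this sketch.
    """
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped_ratio * advantage)

# Example: four sampled answers to one prompt whose reference answer is "42".
group = ["so the answer is \\boxed{42}", "\\boxed{41}", "no final answer", "\\boxed{42}"]
rewards = [accuracy_reward(c, "42") for c in group]
advantages = group_relative_advantages(rewards)
```

Correct answers end up with positive advantages and incorrect ones with negative advantages, so the update pushes the policy toward whatever reasoning produced the right result, without any human-written reasoning trace.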

Surprisingly, this seemingly crude training method allowed the model to develop sophisticated behaviors on its own, such as self-reflection, self-verification, and very long chains of thought, sometimes generating hundreds or even thousands of tokens to deliberate on a single problem.

This is especially evident in mathematics tests. Data in the paper show that on the American Invitational Mathematics Examination (AIME 2024), DeepSeek-R1-Zero's accuracy jumped from 15.6% to 77.9%, and reached 86.7% with self-consistency decoding, surpassing the average level of human competitors.
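Self-consistency decoding samples several independent solutions to the same problem and keeps the final answer that appears most often. A minimal majority-vote sketch, assuming the final answers have already been extracted from each sampled solution as strings; the function name is illustrative, and the number of samples used in the paper is not stated in this article, so none is assumed:

```python
from collections import Counter

def self_consistency(answers: list[str]) -> str:
    """Return the most frequent final answer among independently sampled solutions.

    Counter.most_common orders equal counts by first occurrence,
    so ties go to the answer that was sampled first.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    return Counter(answers).most_common(1)[0][0]

# Example: five sampled solutions whose extracted final answers disagree.
vote = self_consistency(["42", "41", "42", "7", "42"])
```

Here `vote` is `"42"`. The intuition is that many different reasoning paths tend to converge on the correct answer, while errors scatter across different wrong ones.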

Nature commented that this indicates models can independently develop complex reasoning patterns through reinforcement learning, without human reasoning demonstrations.

After subsequent multi-stage optimization (including RL, rejection sampling, supervised fine-tuning, and a second round of RL), the final version of DeepSeek-R1 not only excelled at demanding tasks such as mathematics and programming but also showed fluency and consistency on general tasks such as writing and Q&A. In other words, DeepSeek “lets AI learn to think for itself” rather than “teaching AI how to think.”
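Rejection sampling in a pipeline like this means drawing several outputs from an RL-trained checkpoint, keeping only those that pass a check, and using the survivors as supervised fine-tuning data. A generic sketch follows; `generate` and `is_acceptable` are hypothetical placeholders for a model call and a correctness or quality filter, not functions from DeepSeek's code.

```python
from typing import Callable

def rejection_sample(
    prompts: list[str],
    generate: Callable[[str], str],             # hypothetical: samples one completion
    is_acceptable: Callable[[str, str], bool],  # hypothetical: correctness/quality filter
    samples_per_prompt: int = 4,
) -> list[tuple[str, str]]:
    """Collect (prompt, completion) pairs that pass the filter, for later fine-tuning."""
    dataset: list[tuple[str, str]] = []
    for prompt in prompts:
        for _ in range(samples_per_prompt):
            completion = generate(prompt)
            if is_acceptable(prompt, completion):
                dataset.append((prompt, completion))  # keep only accepted samples
    return dataset
```

With a toy `generate` that alternates between "4" and "5" for the prompt "2+2" and a filter that accepts only "4", four samples yield two accepted pairs. Filtering this way lets the fine-tuning stage learn from the RL model's best outputs while discarding the rest.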

Liang Wenfeng's Ten-Year Journey

Beyond the technical breakthroughs, the success of DeepSeek-R1 also rests on a little-known story of struggle. Liang Wenfeng was born in 1985 into an ordinary family in Zhanjiang, Guangdong Province; his father was a primary school teacher. His early path was not especially dramatic, but its details reveal curiosity and perseverance from an early age.

In 2002, at the age of 17, Liang Wenfeng was admitted to the Department of Electronic and Information Engineering at Zhejiang University. Five years later, he went on to pursue a master's degree in information and communication engineering under the supervision of Xiang Zhiyu, focusing on machine vision. During his master's studies, he and his classmates explored fully automated quantitative trading by applying machine learning to financial markets. At the time, the global financial crisis was sweeping the world. Liang had many opportunities; Wang Tao, the founder of DJI, even invited him to start a business together, but Liang chose the road less traveled. Convinced that artificial intelligence would change the world, he decided to strike out on his own.

After completing his master's degree, Liang Wenfeng first combined artificial intelligence with quantitative trading, founding Jacobi Investment and Magic Square Technology (High-Flyer), which grew steadily over more than a decade. In 2023, he turned his attention to general artificial intelligence, founding DeepSeek and setting out to research and develop large AI models. With a dual focus on algorithms and cost efficiency, DeepSeek released its V2 and V3 models in just two years. These models not only lowered the inference costs of Chinese large models but also shocked the global market with their remarkable cost-performance.

Liang Wenfeng's approach to team building is also unusual. He follows the principle of “ability first,” and most core positions are filled by fresh graduates and young people with one or two years of experience. “We may not be able to find the top 50 talents in China, but we can cultivate them ourselves.” This belief is also key to how DeepSeek achieved strong reasoning capabilities at low cost.

Seen from today, the value of DeepSeek's research goes far beyond one powerful model. It reads like a “methodological manifesto,” demonstrating a more sustainable path for AI development that does not depend on massive amounts of labeled data. It breaks the spell of “capital as the barrier” and returns the initiative in AI development to scientific innovation itself.

This is not only a highlight moment for Chinese AI but also a significant milestone as global AI moves toward a “reasoning revolution.” Lewis Tunstall, a machine learning engineer at Hugging Face and a reviewer of the paper for Nature, believes that “R1 has kicked off a revolution.” More and more researchers are applying R1's methods to improve existing large language models.

Future AI competition is very likely to shift from an “arms race of data and computing power” to an “innovation race of algorithms and ingenuity,” and DeepSeek-R1 has sounded the opening horn for this new contest.

This article is from the WeChat public account “Phoenix Tech” (author: Jiang Fan) and is republished by 36Kr with permission.
