September 17, 2025
4 min read
Secrets of the DeepSeek AI model revealed in a landmark paper
The first peer-reviewed study of the DeepSeek AI model shows how a Chinese start-up built its market-shaking LLM for about $300,000

DeepSeek says its R1 model did not learn by copying examples generated by other LLMs.
Iain Masterton/Alamy Live News
The success of DeepSeek's powerful artificial intelligence (AI) model R1, which sent the US stock market tumbling when it was released in January, did not depend on training it on the output of its rivals, researchers at the Chinese company say. The statement came in documents released alongside a peer-reviewed version of the R1 model, published today in Nature.
R1 is designed to excel at "reasoning" tasks such as mathematics and coding, and is a cheaper rival to tools developed by US technology companies. As an "open weight" model, it is available for anyone to download, and it is the most popular such model on the AI community platform Hugging Face, having been downloaded 10.9 million times.
The paper updates a preprint released in January, which describes how DeepSeek augmented a standard large language model (LLM) to tackle reasoning tasks. Its supplementary material reveals for the first time how much R1 cost to train: the equivalent of just US$294,000. This comes on top of the roughly $6 million that the Hangzhou-based company spent to make the base LLM on which R1 is built, but the total is still far less than the tens of millions of dollars that rival models are thought to have cost. DeepSeek says R1 was trained mainly on Nvidia's H800 chips, which were banned from sale to China in 2023 under US export controls.
A rigorous review
R1 is thought to be the first major LLM to undergo the peer-review process. "This is a very welcome precedent," says Lewis Tunstall, a machine-learning engineer at Hugging Face who reviewed the Nature paper. "Without this norm of publicly sharing a large part of this process, it would be extremely difficult to assess whether these systems pose risks."
In response to peer-review comments, the DeepSeek team reduced the anthropomorphizing in its descriptions and added clarifications of technical details, including the kinds of data the model was trained on and its safety. Huan Sun, an AI researcher at Ohio State University in Columbus, says of the peer-review process: "Other companies should do the same."
DeepSeek's major innovation was to use an automated kind of trial-and-error approach known as pure reinforcement learning to create R1. The process rewarded the model for reaching correct answers, rather than teaching it to follow human-selected reasoning examples. The company says this is how the model learned its own reasoning-like strategies, such as how to verify its work, without following human-prescribed tactics. To boost efficiency, the model also scored its own attempts using estimates, rather than relying on a separate algorithm to do so, a technique known as group relative policy optimization.
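The idea of scoring attempts against one another can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions, not DeepSeek's implementation: the function name, the 0-or-1 reward scheme and the small `eps` constant are all hypothetical choices made for this example. Several attempts at the same problem are rewarded, and each attempt is then judged against the group's own average rather than by a separately trained scoring model.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each attempt relative to the other attempts in its group.

    The group's mean reward acts as the baseline, so no separate
    scoring model is needed. `eps` (an assumed value) avoids division
    by zero when every attempt receives the same reward.
    """
    baseline = mean(rewards)
    spread = pstdev(rewards)
    return [(r - baseline) / (spread + eps) for r in rewards]


# Hypothetical group of four attempts at one maths problem:
# 1.0 means the final answer was correct, 0.0 means it was wrong.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Attempts that beat the group average receive a positive score and would be reinforced during training, and those below it receive a negative score. If every attempt earns the same reward, all scores are zero and the group provides no learning signal.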
The model has been "very influential" among AI researchers, Sun says. "Almost all work in 2025 so far that uses reinforcement learning in LLMs may have been inspired by R1 in one way or another."
Training techniques
Media reports in January suggested that researchers at OpenAI, the San Francisco, California-based company that created ChatGPT and a series of reasoning models, believed that DeepSeek had trained R1 using the outputs of OpenAI models.
DeepSeek has not published its training data as part of the paper. However, in exchanges with referees, the company's researchers stated that R1 did not learn by copying reasoning examples generated by OpenAI models. They acknowledged that, like most other LLMs, R1's base model was trained on the web, which means it will have ingested AI-generated content already on the Internet.
The rebuttal is "as persuasive as what we can see in any publication," says Sun. Tunstall says he cannot be 100% sure that R1 was not trained on OpenAI examples, but replication attempts by other labs suggest that DeepSeek's recipe for reasoning is probably good enough not to need them. "I think the evidence is pretty clear now that you can get very high performance using pure reinforcement learning," he says.
For researchers, R1 is still very competitive, says Sun. In a challenge to complete scientific tasks such as data analysis and visualization, known as ScienceAgentBench, Sun and colleagues found that although R1 did not rank first for accuracy, it was one of the best models at balancing capability and cost.
Other researchers are now trying to apply the methods used to create R1 to improve the reasoning-like abilities of existing LLMs, and to extend them to domains beyond mathematics and coding, Tunstall says. In that way, he adds, R1 has "kick-started a revolution."
This article is reproduced with permission and was first published on September 17, 2025.
