DeepSeek's new reasoning model, named R1, rivals the performance of OpenAI's o1, despite limited hardware resources and a relatively modest budget.
In a context marked by American export controls restricting access to advanced chips, the Chinese artificial intelligence startup founded by investment fund manager Liang Wenfeng illustrates how efficiency and resource sharing can advance AI development.
The company's rise has drawn attention in technology circles both in China and in the United States. DeepSeek's R1 model delivers advanced performance, while being censored in accordance with the directives of the Chinese Communist Party.
The rise of DeepSeek
DeepSeek's adventure began in 2021, when Liang, best known for his quantitative trading fund High-Flyer, began acquiring thousands of NVIDIA GPUs.
At the time, the decision seemed unusual. As one of Liang's business partners told the Financial Times, “When we met him for the first time, he was this very geeky guy with an unflattering hairstyle, talking about building a 10,000-chip cluster to train his own models. We didn't take him seriously.”
According to the same source, “He couldn't really articulate his vision beyond saying: I want to build this, and it will change the game. We thought this was only possible for giants like ByteDance and Alibaba.”
Despite this initial skepticism, Liang focused on preparing for possible American export controls. That foresight allowed DeepSeek to secure a large quantity of NVIDIA hardware, including A100 and H800 GPUs, before broad restrictions took effect.
DeepSeek made headlines by announcing that it had trained its R1 model, with 671 billion parameters, at a cost of only $5.6 million using 2,048 H800 GPUs.
Although the H800's performance is deliberately limited for the Chinese market because of restrictions imposed by the United States, DeepSeek's engineers optimized the training process to obtain high-level results at a cost far below that usually associated with large-scale language models.
In an interview published by MIT Technology Review, Zihan Wang, a former researcher at DeepSeek, explains how the team managed to reduce memory use and computation time while preserving accuracy.
He mentioned that technical limitations pushed the team to explore innovative engineering strategies, allowing it to remain competitive against better-funded American technology laboratories.
Remarkable results on mathematical and programming benchmarks
The R1 model demonstrates excellent capabilities across various mathematical and programming benchmarks. DeepSeek reported that R1 scored 97.3% (pass@1) on MATH-500 and 79.8% on AIME 2024.
These results rival those of OpenAI's o1 series, highlighting how meticulous optimization can challenge models trained on more powerful chips.
Dimitris Papailiopoulos, principal researcher at Microsoft's AI Frontiers lab, told MIT Technology Review: “DeepSeek targets specific answers rather than detailing each logical step, thereby reducing computation time while maintaining a high level of effectiveness.”
Beyond the main model, DeepSeek has also released smaller R1 versions that can run on consumer hardware. Aravind Srinivas, CEO of Perplexity, tweeted in reference to these compact variants: “DeepSeek has largely replicated o1-mini and open-sourced it.”
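As a rough illustration of what running a distilled variant on consumer hardware can look like, the sketch below loads one of the smaller checkpoints with the Hugging Face transformers library. The model identifier, prompt, and generation settings are assumptions made for illustration, not instructions from DeepSeek.

```python
# Minimal sketch: running a distilled R1-style checkpoint locally with
# Hugging Face transformers. The model identifier is assumed for illustration;
# substitute whichever distilled checkpoint you actually download.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # assumed identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

prompt = "Solve step by step: what is 17 * 23?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```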
Chain-of-thought reasoning and R1-Zero
In addition to the standard R1 training, DeepSeek explored pure reinforcement learning with a variant called R1-Zero. This approach, detailed in the company's research documentation, abandons supervised fine-tuning in favor of Group Relative Policy Optimization (GRPO).
By eliminating a separate critic model and relying on reward scores compared within groups of sampled answers, R1-Zero exhibited chain-of-thought reasoning and self-reflection behaviors. However, the team acknowledged that R1-Zero produced repetitive outputs or mixed languages, indicating a need for partial supervision before it could be used in everyday applications.
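The core of that group-relative idea is that each sampled answer is scored against its siblings for the same prompt, rather than against a learned value network. The snippet below is a minimal, illustrative sketch of that advantage computation under assumed rewards; it is not DeepSeek's actual implementation, and the rule-based reward and group size are assumptions.

```python
# Illustrative sketch of a group-relative advantage computation (no critic model):
# rewards are normalised within a group of answers sampled for the same prompt.
from dataclasses import dataclass
from statistics import mean, pstdev

@dataclass
class Sample:
    completion: str
    reward: float        # e.g. 1.0 if a rule-based checker accepts the final answer, else 0.0
    advantage: float = 0.0

def group_relative_advantages(samples: list[Sample]) -> list[Sample]:
    """Score each completion against its siblings instead of a learned value model."""
    rewards = [s.reward for s in samples]
    mu, sigma = mean(rewards), pstdev(rewards)
    for s in samples:
        s.advantage = (s.reward - mu) / (sigma + 1e-8)
    return samples

# Hypothetical usage: four sampled answers to the same math prompt.
group = [
    Sample("... reasoning ... answer: 391", reward=1.0),
    Sample("... reasoning ... answer: 381", reward=0.0),
    Sample("... reasoning ... answer: 391", reward=1.0),
    Sample("... reasoning ... answer: 401", reward=0.0),
]
for s in group_relative_advantages(group):
    print(f"reward={s.reward:.1f}  advantage={s.advantage:+.2f}")
```

Completions whose reward exceeds the group average receive a positive advantage and are reinforced; the others are discouraged, with no value network needed.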
The open-source ethos that underpins DeepSeek distinguishes it from many private laboratories. While American companies such as OpenAI, Meta and Google DeepMind often keep their training methods secret, DeepSeek makes its code, model weights and training recipes publicly available.
According to Liang, this approach stems from a desire to build a research culture that promotes transparency and collective progress. In an interview with the Chinese media outlet 36Kr, he explained that many Chinese AI companies struggle with efficiency compared to their Western counterparts, and that closing this gap requires collaboration on both hardware and training strategies.
He maintains that open-source models are booming, with Alibaba Cloud having introduced more than 100 open-source models, and 01.AI, founded by Kai-Fu Lee, having recently partnered with Alibaba Cloud to establish an industrial AI laboratory.
The response of the global technology community has been a mixture of admiration and caution. On X, Marc Andreessen, co-creator of the Mosaic web browser and now a leading investor at Andreessen Horowitz, wrote: “Deepseek R1 is one of the most amazing and impressive breakthroughs I have ever seen, and as open source, a profound gift to the world.”
Yann LeCun, Meta's chief AI scientist, noted on LinkedIn that while DeepSeek's feat might suggest China is overtaking the United States, it would be more accurate to say that open-source models are collectively catching up with proprietary alternatives.
“DeepSeek has benefited from open research and open source (such as PyTorch and Llama from Meta),” he explained. “They had new ideas and built them on top of other people's work. Because their work is published and open source, everyone can benefit from it. That is the power of open research and open source.”
Mark Zuckerberg, founder and CEO of Meta, laid out a different path for AI in 2025, announcing massive investments in data center infrastructure and GPUs.
On Facebook, he wrote: “This year will be decisive for AI. In 2025, I expect Meta AI to be the leading assistant, serving more than 1 billion people, Llama 4 to become the leading state-of-the-art model, and us to build an AI engineer that will start contributing more and more code to our R&D efforts. To power this, Meta is building a data center of more than 2 GW that would be so large it would cover a significant part of Manhattan.”
“We will bring online ~1 GW of compute in '25 and we will end the year with more than 1.3 million GPUs. We plan to invest $60-65 billion in capex this year while also growing our AI teams considerably, and we have the capital to continue investing in the years ahead. It is a monumental effort, and over the coming years it will propel our products and our business, unlock historic innovation, and extend American technology leadership. Let's build!”
Zuckerberg's remarks suggest that resource-intensive strategies remain a major force in shaping the AI sector.
Broader impacts and future prospects
For DeepSeek, the combination of local talent, an early GPU supply and an emphasis on open-source methods propelled it into a spotlight usually reserved for large technology companies. In July 2024, Liang said his team aimed to close what he called an efficiency gap in Chinese AI.
He described many local AI companies as needing twice as much computing power to match foreign results, a problem further compounded once data efficiency is taken into account. The profits of the High-Flyer hedge fund give DeepSeek a buffer against immediate commercial pressures, allowing Liang and his engineers to focus on their research priorities. Liang said:
“We believe that the best domestic and foreign models may have a twofold gap in model structure and training dynamics. For this reason alone, we have to consume twice as much computing power to achieve the same effect.”
“In addition, there may also be a twofold gap in data efficiency, meaning we have to consume twice as much training data and computing power to achieve the same effect. Combined, that means consuming four times as much computing power. What we need to do is continuously close these gaps.”
DeepSeek's reputation in China was also bolstered when Liang was the only AI leader invited to a high-level meeting with Li Qiang, the country's second-highest official, where he was encouraged to focus on developing fundamental technologies.
Analysts see this as a further signal that Beijing is betting heavily on smaller local innovators to push the limits of AI despite hardware restrictions.
While the future remains uncertain, not least because American restrictions could be tightened further, DeepSeek stands out for its ability to turn constraints into opportunities for rapid problem solving.
By making its breakthroughs public and offering smaller-scale training techniques, the startup has sparked broader discussions about whether resource efficiency can truly compete with enormous supercomputing clusters.
As DeepSeek continues to refine its R1 model, engineers and decision-makers on both sides of the Pacific are watching closely to see whether its achievements can open a lasting path for AI progress in an era of evolving restrictions.
In conclusion, DeepSeek's development raises interesting questions about the challenges the technology industry now faces. As competition intensifies, especially between China and the United States, it is worth reflecting on how these developments will influence the future of global technological innovation. Can open source truly redefine power dynamics in AI, and more broadly in technology?