OpenAI’s GPT-4o and Anthropic’s Claude 3.5 Sonnet.
In addition to being more efficient than its competitors, DeepSeek represents an economic model based on open source. Transparency which allows developers and researchers to freely access the model, adapt it and improve it according to their needs. This openness promotes collaborative innovation and provides increased flexibility for diverse applications.
DeepSeek-V3 stands out for its MoE (Mixture of Experts) architecture integrating 671 billion parameters, of which 37 billion are activated by token, thus optimizing efficiency and performance. The MoE architecture is a machine learning approach that divides an artificial intelligence model into several specialized subnetworks, called “experts”. Each expert is trained to excel in a specific area of input data. A mechanism determines which experts are the most relevant to activate for a given task. By activating only the experts needed for a specific task, the MoE architecture reduces the computational load compared to traditional dense models.
A context window of 128k tokens
Trained on a dataset of 14.8 trillion tokens, it ensures accurate text comprehension and generation. Its expanded context window, with a capacity of 128,000 tokens, makes it possible to manage long conversations and complex tasks without compromising contextual consistency. Additionally, it generates up to 60 tokens per second, a 300% improvement over the previous version, DeepSeek-V2.
In terms of performance, DeepSeek-V3 performs better than its competitors on various benchmarks. For example, it obtains a score of 75.9% on the MMLU-Pro (Exact Match), surpassing GPT-4o (72.6%) and getting closer to Claude 3.5 (78%), demonstrating its ability to handle tasks question-answer. On the MATH 500 test, he reached 90.2%, ahead of Claude 3.5 (78.3%) and GPT-4o (74.6%), illustrating a more advanced aptitude in mathematical reasoning. Additionally, on Codeforces, it ranks 51.6 percent, surpassing GPT-4o (23.6).
Better cost and resource efficiency
One of the most notable aspects of DeepSeek-V3 is its cost and resource efficiency. Its development required approximately 2.788 million GPU hours, for a total estimated cost of $5.57 million, a fraction of the resources typically required for models of this scale. In this way, it undermines the current discourse on the expensiveness of models developed at great expense by competitors, demonstrating the fact that it is possible to train a high-performance model for a fraction of the cost declared by certain publishers. In comparison, GPT-4 training is estimated to cost more than $100 million.
Additionally, unlike closed models, DeepSeek-V3 is open source, offering developers and researchers the ability to adapt and improve it according to their needs. DeepSeek’s API is also compatible with OpenAI formats, making integration easier for developers accustomed to these environments. Proprietary models, although efficient, often present limitations in terms of cost and adaptability. DeepSeek-V3 addresses these concerns as an open source alternative that can compete with market leaders while allowing for increased customization.
Beyond performance and training cost, DeepSeeker, and by extension China, is entering the generative AI market through the front door. The publisher took the time to develop a model that charts a path distinct from that of competitors. It adopts a strategy for penetrating the generative AI market which clearly differs from that of its American competitors such as OpenAI, Anthropic or Google DeepMind. The Chinese company’s approach is based on a combination of technological innovation, strategic differentiation and democratization, with a vision of global accessibility, particularly in emerging countries.
A credible and accessible alternative
Unlike its competitors who rushed to occupy the generative AI space from the start of the current wave (2020-2022), DeepSeek took the time to develop a solid technological proposition. Its open source model is based on an advanced architecture which allows it to integrate 671 billion parameters while remaining economical in terms of resources used. This technical choice is not only a question of performance, but also a strategic decision to minimize training and operating costs.
By reducing development costs, DeepSeek shows that it is possible to produce state-of-the-art models while minimizing exorbitant computing power requirements. This feat sends a clear message: innovation in AI is not reserved for tech giants with unlimited resources.
DeepSeek also positions itself as a serious alternative to American models, thanks to its open source commitment. Open code allows local developers to adapt models to the specific languages, cultures and needs of their market. This approach promotes international collaboration, community innovation and adoption by organizations that might not be able to afford access to proprietary models.
A strategy focused on emerging countries
DeepSeek’s strategy seems particularly suited to penetrating the markets of emerging countries, often neglected by large American players. Countries where local businesses and governments are seeking technological solutions adapted to their economic realities. This democratization strategy has already borne fruit in other areas for Chinese companies, notably in telecommunications with Huawei or in e-commerce with Alibaba.
DeepSeek’s positioning is also a direct response to the technological monopoly of American companies in generative AI. In China, authorities are encouraging the development of local solutions to reduce dependence on Western technologies, particularly in the face of restrictions imposed by the United States on semiconductors and access to cutting-edge technologies. DeepSeek, by offering competitive technology at a lower cost, strengthens Chinese technological autonomy while asserting a presence on the international scene.
In the global AI market, the arrival of DeepSeek-V3 could disrupt the current dynamics of the generative AI market. By challenging the dogma that only companies with colossal resources can tackle this, DeepSeek opens the door to a greater diversity of players. This development could encourage fairer competition, foster innovation and, above all, extend the benefits of AI to previously marginalized regions and sectors.