Tumbling stock market values and wild claims have accompanied the release of a new AI chatbot by a small Chinese company. What makes it so different?
The reason behind this tumult? The "large language model" (LLM) that powers the app has reasoning capabilities said to be comparable to US models such as OpenAI's o1, but reportedly requires a fraction of the cost to train and run.
Analysis
Dr Andrew Duncan is director of science and innovation, fundamental AI, at the Alan Turing Institute in London, UK.
DeepSeek claims to have achieved this by deploying several technical strategies that reduced both the amount of computation time needed to train its model (called R1) and the amount of memory needed to store it. The reduction of these overheads resulted in a dramatic cut in cost, says DeepSeek. R1's underlying V3 base model reportedly required 2.788 million hours to train (running on many graphics processing units, or GPUs, at the same time), at an estimated cost of under $6 million (£4.8 million), compared with the more than $100 million (£80 million) that OpenAI boss Sam Altman has said was needed to train GPT-4.
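As a rough sanity check on those reported figures (they are DeepSeek's own claims, not independently verified), the implied price per GPU-hour can be worked out directly:

```python
# Back-of-the-envelope check of the article's reported numbers, nothing more:
# 2.788 million GPU-hours for roughly $6m implies an effective rate of about
# $2 per GPU-hour, which is in the ballpark of cloud rental prices for such chips.
gpu_hours = 2.788e6          # reported GPU-hours for the V3 base model
reported_cost_usd = 6e6      # reported upper bound on the training cost

implied_rate = reported_cost_usd / gpu_hours
print(f"Implied cost per GPU-hour: ${implied_rate:.2f}")  # ~ $2.15
```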
Despite the blow to Nvidia's market value, the DeepSeek models were trained on around 2,000 Nvidia H800 GPUs, according to a research paper released by the company. These chips are a modified version of the widely used H100 chip, built to comply with export rules to China. They were probably stockpiled before the Biden administration further tightened restrictions in October 2023, effectively banning Nvidia from exporting H800s to China. It is likely that, working within these constraints, DeepSeek was forced to find innovative ways to make the most effective use of the resources it had.
Reducing the computational cost of training and running models may also address concerns about the environmental impacts of AI. The data centres they run on have huge electricity and water demands, largely to keep the servers from overheating. While most tech companies do not disclose the carbon footprint involved in operating their models, a recent estimate puts ChatGPT's monthly carbon dioxide emissions at more than 260 tonnes per month – the equivalent of 260 flights from London to New York. So, increasing the efficiency of AI models would be a positive direction for the industry from an environmental point of view.
Of course, whether DeepSeek's models actually deliver real-world energy savings remains to be seen, and it is also unclear whether cheaper, more efficient AI would simply lead to more people using the models, and therefore an overall increase in energy consumption.
If nothing else, it could help push sustainable AI up the agenda at the upcoming Paris AI Action Summit, so that the AI tools we use in the future are also kinder to the planet.
What has surprised many people is how quickly DeepSeek appeared on the scene with such a competitive language model – the company was only founded by Liang Wenfeng in 2023, and he is now hailed in China as something of an "AI hero".
The latest DeepSeek model also stands out because its "weights" – the numerical parameters of the model obtained from the training process – have been openly released, along with a technical paper describing the model's development process. This enables other groups to run the model on their own equipment and adapt it to other tasks.
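To make that concrete, here is a minimal sketch of what openly released weights allow: downloading them and running the model on your own hardware with the widely used Hugging Face transformers library. The repository name below is an assumption for illustration, and the full-size R1 model would in practice need far more hardware than a typical workstation; smaller distilled variants are the realistic option for most users.

```python
# Illustrative sketch only: load openly released weights and run them locally.
# The model id is assumed for illustration; substitute whichever released
# checkpoint actually fits your hardware.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Because the weights are on your machine, inference needs no external API.
inputs = tokenizer("What is 17 * 24?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```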
This relative openness also means that researchers around the world can now peer under the model's bonnet to find out what makes it tick, unlike OpenAI's o1 and o3, which are effectively black boxes. But some details are still missing, such as the datasets and code used to train the models, so groups of researchers are now trying to piece these together.
Not all of DeepSeek's cost-cutting techniques are new either – some have been used in other LLMs. In 2023, Mistral AI openly released its Mixtral 8x7B model, which matched the advanced models of the time. Mixtral and the DeepSeek models both exploit the "mixture of experts" technique, where the model is built from a group of much smaller models, each with expertise in specific domains. Given a task, the mixture model assigns it to the most qualified "expert".
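The routing idea can be sketched in a few lines of Python. This is an illustration of the general principle only, with toy weights – not DeepSeek's or Mixtral's actual architecture: a small gating network scores the experts for each input, and only the best-scoring ones are actually evaluated.

```python
# Toy mixture-of-experts routing: a gate scores the experts, and only the
# top-scoring expert(s) are run, which is where the compute saving comes from.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts = 16, 4
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]  # toy expert weights
gate = rng.standard_normal((d_model, n_experts))                               # router weights

def moe_forward(x: np.ndarray, top_k: int = 1) -> np.ndarray:
    """Route the input to the top_k highest-scoring experts and mix their outputs."""
    scores = x @ gate                        # one score per expert
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                 # softmax over experts
    chosen = np.argsort(weights)[-top_k:]    # indices of the selected experts
    return sum(weights[i] * (x @ experts[i]) for i in chosen)

x = rng.standard_normal(d_model)
print(moe_forward(x, top_k=1).shape)  # (16,)
```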
DeepSeek has even revealed its unsuccessful attempts to improve LLM reasoning through other technical approaches, such as Monte Carlo Tree Search, an approach long touted as a potential strategy to guide an LLM's reasoning process. Researchers will use this information to investigate how the model's already impressive problem-solving capabilities can be improved further – improvements that are likely to end up in the next generation of AI models.
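For readers unfamiliar with Monte Carlo Tree Search, the snippet below shows the core selection rule (UCT) such a search uses to decide which candidate reasoning step to explore next. It is a generic, hedged illustration of the technique, not a reconstruction of DeepSeek's experiments; all names and numbers are made up.

```python
# Generic UCT selection rule used in Monte Carlo Tree Search: balance steps
# that have scored well so far (exploitation) against rarely tried ones
# (exploration). Candidate steps and statistics below are hypothetical.
import math

def uct_score(avg_reward: float, visits: int, parent_visits: int, c: float = 1.4) -> float:
    """Upper Confidence bound applied to Trees (UCT)."""
    if visits == 0:
        return float("inf")  # always try an unvisited step at least once
    return avg_reward + c * math.sqrt(math.log(parent_visits) / visits)

# Hypothetical candidate next steps in a reasoning chain: (average reward, visit count).
candidates = {"step_a": (0.8, 10), "step_b": (0.5, 2), "step_c": (0.0, 0)}
parent_visits = sum(v for _, v in candidates.values())

best = max(candidates, key=lambda s: uct_score(*candidates[s], parent_visits))
print(best)  # the unvisited step is chosen first; later, high-scoring steps dominate
```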
So, what does all this mean for the future of the AI industry?
DeepSeek potentially demonstrates that you do not need vast resources to build sophisticated AI models. My guess is that we will start to see highly capable AI models being developed with ever fewer resources, as companies find ways to make model training and operation more efficient.
Until now, the AI landscape has been dominated by "Big Tech" companies in the US – Donald Trump has called the rise of DeepSeek "a wake-up call" for the US tech industry. But this development is not necessarily bad news for Nvidia in the long term: as the financial cost and time of developing AI products falls, businesses and governments will be able to adopt the technology more easily.
It seems likely that smaller companies such as DeepSeek will play an increasing role in creating AI tools that have the potential to make our lives easier. It would be a mistake to underestimate that.