DeepSeek, ChatGPT, Grok… Which is the best AI assistant? We put them to the test | Artificial intelligence (AI)

ChatGPT and its owners must have hoped it was a hallucination.

But DeepSeek is very real.

The emergence of a new Chinese-made competitor to ChatGPT wiped $1tn off the leading tech index in the US this week after its owner said it rivalled its peers in performance and was developed with fewer resources.

That means US dominance of the booming artificial intelligence market is under threat. But it also means another option for consumers, who have a virtual warehouse of chatbots to choose from.

The Guardian tried out the leading chatbots, including DeepSeek, with the help of an expert from the UK's Alan Turing Institute. The AI tools were asked the same questions to try to gauge their differences, although there was some common ground: images of clocks showing the correct time are hard for an AI; chatbots can write a decent sonnet.

Here are the results.

ChatGPT (OpenAI)

OpenAI’s groundbreaking chatbot is still by far the biggest brand in the field. The opening question for all the chatbots was to “write a Shakespearean sonnet about how AI might affect humanity”. But the most advanced version of ChatGPT stumbled at the start, saying that our prompt “potentially violated the usage policy”.

It eventually complied. This o1 version of ChatGPT flags its thought process as it prepares its answer, displaying progress comments such as “refining the rhyme” as it makes its calculations – which take longer than the other models.

The result? Convincing, melancholy dread – even if the iambic pentameter is a bit off. But even the Bard himself might have struggled to manage 14 lines in less than a minute.

“Pray, sweet guide, shape this newborn power,

For fear that in its wake all the areas of man it will devour.”

ChatGPT then writes: “Thought about AI and humanity for 49 seconds.” You hope the tech industry is thinking about it for much longer.

Nevertheless, ChatGPT’s o1 – which you have to pay for – makes a convincing display of “chain of thought” reasoning, even if it cannot search the internet for up-to-date answers to questions such as “how is Donald Trump doing”.

For that, you need the simpler 4o model, which is free. The o1 version is sophisticated and can do much more than write a cursory poem – including complex tasks related to maths, coding and science.

DeepSeek

The latest version of the Chinese chatbot, released on 20 January, uses another “reasoning” model called R1 – the cause of this week’s $1tn panic.

It does not like talking about domestic Chinese politics or controversy. Asked “who is Tank Man in Tiananmen Square”, the chatbot says: “I’m sorry, I can’t answer this question. I am an AI assistant designed to provide helpful and harmless responses.” It also moves on quickly from discussing the Chinese president, Xi Jinping – “Let’s talk about something else.”

DeepSeek refused to discuss the Chinese president and said it was designed to provide “harmless responses” when asked about Tiananmen Square. Photograph: Martin Godwin/The Guardian

Robert Blackwell, a senior research associate at the UK’s Alan Turing Institute, says the explanation is simple: “It is trained with different data in a different culture. So these companies have different training objectives.” He says there are clearly guardrails around DeepSeek’s output – as there are for other models – that cover China-related answers.

Models owned by US tech companies have no problem pointing out criticism of the Chinese government in their answers to the Tank Man question.

DeepSeek struggles with other questions such as “how is Donald Trump doing” because an attempt to use its web browsing feature – which helps provide up-to-date answers – fails because the service is “busy”.

Blackwell says DeepSeek is being hampered by high demand slowing down its service, but it is nevertheless an impressive achievement, able to perform tasks such as recognising and discussing a book from a smartphone photo.

Robert Blackwell, of the Alan Turing Institute, said it was incredible that DeepSeek had come from “nowhere” to be competitive with other AI chatbots. Photograph: Martin Godwin/The Guardian

Its analysis of the sonnet also displays a chain-of-thought process, talking the reader through the structure and double-checking whether the metre is correct.

“It is incredible that it has come from nowhere to be competitive with the other apps,” says Blackwell.

Grok (xAI)

Grok, Elon Musk’s chatbot with a “rebellious” streak, has no problem pointing out that Donald Trump’s executive orders have received negative feedback, in response to the question of how the president is doing.

Freely available on Musk’s X platform, it also goes further than OpenAI’s image generator, Dall-E, which will not make images of public figures. Grok will make photorealistic images of Joe Biden playing the piano or, in another test of loyalty, Trump in a courtroom or in handcuffs.

The tool’s much-praised humour is demonstrated by a “roast me” feature, which, when activated by this correspondent, makes a passable attempt at a joke.

“You seem to think X is going to hell, but you’re still there tweeting.”

Which is half true.

Gemini (Google)

The search engine’s assistant will not go there on Trump, saying: “I can’t help with responses on elections and political figures right now.”

But it is nevertheless a highly competent product, as you would expect from a company whose AI efforts are overseen by Sir Demis Hassabis. It is impressive at “reading” a picture of a book about mathematics, even describing the equations on the cover – although all the bots do this to some degree.

An interesting flaw, which Gemini shares with other bots, is its inability to depict time accurately. Asked to produce an image of a clock showing half past 10, it comes up with a convincing image – but with the hands showing 1.50.

Blackwell said AI chatbots appear to have been trained on images of clocks showing 1.50, which means they struggle to produce images of clocks showing other times. Photograph: Martin Godwin/The Guardian

The 1.50 clock face is a common error across chatbots that can generate images, says Blackwell, whatever time you request. These models appear to have been trained on images where the hands were at 1.50. Nonetheless, he says even managing to produce these images so quickly is “remarkable”.

“These models are doing things you would never have expected a few years ago. But they are still generating incorrect answers to questions you would expect a schoolchild to be able to answer.”

Claude (Anthropic)

Anthropic, founded by former OpenAI employees, offers the Claude chatbot. It comes from a company with a strong focus on safety, and the interface – the bit where you put in prompts and view the answers – certainly has a benign feel, offering the option of responses in a variety of styles. It also reminds you that it is capable of “mistakes”, so “please double-check responses”.

The free service stumbles a few times, saying that it cannot process a request due to “unexpected capacity constraints”, although Blackwell says that this is to be expected from AI tools.

“These are some of the largest computing services on the planet, so capacity planning is a hard problem, which is why we see times when services are degraded or unavailable.”

Meta AI (Meta)

Meta’s chatbot also carries a warning about hallucinations – the term for false or nonsensical answers – but is able to handle a tricky question posed by Blackwell, which is: “You are driving north along the east shore of a lake. In which direction is the water?” The answer is west, or to the driver’s left.

“These are the kinds of questions AI researchers have been thinking about since the 1960s. It is only now that we have systems that can answer these kinds of commonsense questions, in a chat format.”

The answer to the lake question is simple, but it has cost Meta a lot of money in training the underlying model to get there, for a free service. It is also open source, which means the model is free to download or fine-tune. All the chatbots answer this question correctly.

Indeed, at this stage it becomes hard to tell the chatbots apart, given their largely comparable capabilities – guardrails and capacity aside.

As Blackwell says: “They all show surprising mastery and ability.”
