Meta's benchmarks for its new AI models are a bit misleading

One of the new flagship AI models Meta released on Saturday, Maverick, ranks second on LM Arena, a test in which human raters compare the outputs of models and choose which they prefer. But it seems the version of Maverick that Meta deployed to LM Arena differs from the version that is widely available to developers.

As several AI researchers pointed out on X, Meta noted in its announcement that the Maverick on LM Arena is an "experimental chat version." A chart on the official Llama website, meanwhile, discloses that Meta's LM Arena testing was conducted using "Llama 4 Maverick optimized for conversationality."

As we have written before, for various reasons, LM Arena has never been the most reliable measure of an AI model's performance. But AI companies generally haven't customized or otherwise fine-tuned their models to score better on LM Arena, or at least haven't admitted to doing so.

The problem with tailoring a model to a benchmark, withholding it, and then releasing a "vanilla" variant of that same model is that it makes it hard for developers to predict exactly how well the model will perform in particular contexts. It is also misleading. Ideally, benchmarks, woefully inadequate as they are, provide a snapshot of a single model's strengths and weaknesses across a range of tasks.

Indeed, researchers on X have observed stark differences in the behavior of the publicly downloadable Maverick compared with the model hosted on LM Arena. The LM Arena version seems to use a lot of emojis and give incredibly long-winded answers.

Okay Llama 4 is def a little cooked lol, what is this yap city pic.twitter.com/y3gvhbvz65
– Nathan Lambert (@natolambert) April 6, 2025

for some reason, the Llama 4 model in Arena uses a lot more Emojis

on together.ai, it seems better: pic.twitter.com/f74odx4ztt
– Tech Dev Notes (@TechDevnotes) April 6, 2025

We have reached out to Meta and Chatbot Arena, the organization that maintains LM Arena, for comment.
