One of the new flagship AI models Meta released on Saturday, Maverick, ranks second on LM Arena, a test in which human raters compare the outputs of models and choose which they prefer. But it seems that the version of Maverick that Meta deployed to LM Arena differs from the version widely available to developers.
As several AI researchers pointed out on X, Meta noted in its announcement that the Maverick on LM Arena is an “experimental chat version.” A chart on the official Llama website, meanwhile, discloses that Meta's LM Arena testing was conducted using “Llama 4 Maverick optimized for conversationality.”
As we have written before, for various reasons, LM Arena has never been the most reliable measure of an AI model's performance. But AI companies have generally not customized or otherwise fine-tuned their models to score better on LM Arena, or at least have not admitted to doing so.
The problem with tailoring a model to a benchmark, withholding it, and then releasing a “vanilla” variant of that same model is that it makes it difficult for developers to predict exactly how well the model will perform in particular contexts. It is also misleading. Ideally, benchmarks, woefully inadequate as they are, provide a snapshot of a single model's strengths and weaknesses across a range of tasks.
Indeed, researchers on X have observed stark differences in the behavior of the publicly downloadable Maverick compared with the model hosted on LM Arena. The LM Arena version seems to use a lot of emojis and give incredibly long-winded answers.
Okay Llama 4 is def a little cooked lol, what is this yap city pic.twitter.com/y3gvhbvz65
– Nathan Lambert (@natolambert) April 6, 2025
For some reason, the Llama 4 model in Arena uses a lot more emojis.
On together.ai, it seems better: pic.twitter.com/f74odx4ztt
– Tech Dev Notes (@TechDevnotes) April 6, 2025
We have reached out to Meta and Chatbot Arena, the organization that maintains LM Arena, for comment.