Who will have the best LLM at the end of 2024 (as decided by ChatBot Arena)?
653
Ṁ480k
Dec 31
OpenAI: 67%
Anthropic: 11%
Google: 9%
xAI: 8%
Meta: 4%
Other: 1%

I was browsing Twitter and saw a post by Karpathy speaking positively about ChatBot Arena, a platform for ranking LLMs based on human ratings. As expected, OpenAI is holding positions 1, 2, and 3. I wonder which company will be #1 at the end of 2024.


Screenshot of the rankings table taken on the 13th of December:



@traders Based on the comments below, I think it makes sense to resolve this question based on the ELO rating in case of a tie in "rank." When I created this question, a tie was not an option, so I doubt anyone even traded based on this assumption.

I created a similar question that only uses the rank. Feel free to trade on it.
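The tie-break rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Entry` fields, the `resolve_winner` helper, and every number below are hypothetical and not taken from the actual leaderboard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    model: str
    company: str
    rank: int      # leaderboard rank; several models may share the same rank
    rating: float  # Elo-style arena score


def resolve_winner(entries: list[Entry]) -> str:
    """Return the company of the top model: best rank first, ties broken by higher rating."""
    if not entries:
        raise ValueError("leaderboard is empty")
    # Sort key: lowest rank wins; among equal ranks, the highest rating wins.
    best = min(entries, key=lambda e: (e.rank, -e.rating))
    return best.company


# Made-up example: two models tied at rank 1, separated only by rating.
leaderboard = [
    Entry("model-a", "Company A", 1, 1287.0),
    Entry("model-b", "Company B", 1, 1299.0),
    Entry("model-c", "Company C", 3, 1271.0),
]
print(resolve_winner(leaderboard))
```

With these made-up numbers the two rank-1 models are separated by rating alone, so the question would resolve to the company behind `model-b`.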


i wish i had an even bigger position on openai can someone please buy some no shares

just to clarify, does the o1 model count? I'm asking because it seems that it's mostly a prompt/reflection step, as opposed to the other models on the leaderboard, which are mostly rawdogging it.

@PeterBuyukliev i don't think they will add the preview model, because you can easily infer it's o1 from the time it takes to respond compared to the other models, which will bias the whole evaluation and ruin the idea behind LMSYS

@PeterBuyukliev but maybe o1-mini will appear on the leaderboard since it is relatively fast, and if it does then yes, it should count, the same way the google gemini api searches the web before responding

@PeterBuyukliev ok no, both models will be included on the leaderboard according to a tweet by LMSYS, and they seem to have added a 30 sec latency for both models when one of them is o1, which i think is not enough to avoid bias :(

sold Ṁ1,255 OpenAI YES

I'm selling because after reviewing the status of the big 3-4 groups again, I'm not convinced the current odds really reflect the difference in these models here. Taking a new position with something else I think.

opened a Ṁ467 Google YES at 17% order

@NoahRich Bought in Google because I think its position at the time didn't reflect its real potential odds of winning.

@NoahRich IDK, Gemini feels very lame and always trailing behind the others. Something is broken in Google, I doubt they can deliver out of nowhere.

@ICRainbow I don't think it's "likely" per se, but I think it's more likely than the current odds would have us believe here on this market. If I check the Chatbot Arena responses, too....

not as big of a difference as I would've expected, as I too have generally found Gemini to be very lackluster in comparison to GPT

@NoahRich Yeah, I've seen those. I'm also a paid user of Gemini Advanced Pro Ultra Whatevs. Claude smokes it hands down for free.

opened a Ṁ2,000 xAI NO at 14% order

@jim https://x.com/elonmusk/status/1830650370336473253

Colossus is the most powerful AI training system in the world

Does anyone believe this? I would guess Google, Meta, and Microsoft all have more powerful ones.

@jim i tried grok 2 and it is 💩 + as you said Elon tends to over promise and under deliver. i think x should be lower than 10% but the return is low rn i won’t change it

@jim i am jealous of this market /VictorLJZ/will-gpt5-be-released-before-2025 IT WAS SHOWN TO OVER 25k people

bought Ṁ250 xAI YES

@jim I believe pretty much everything Elon Musk says

@skibidist

I don't. He achieves great things quickly. But he does so in part by being overly optimistic.

It's plausible that xAI's compute cluster is bigger than Meta and Google's biggest ones. But there's almost zero chance it's bigger than Microsoft's. Because there's no way that Microsoft/OpenAI could fumble the lead that badly.

Of course if xAI is ahead of Google and Meta that's an insane achievement and makes xAI a good buy on this market (since there's a strong chance that OAI doesn't release this year).

bought Ṁ50 OpenAI NO

Those who are bidding here should note that the "best LLM" in the arena is usually not actually the best LLM.

The people using that arena are inputting simple prompts and receiving simple responses. The site limits the lengths of both if you try to use LLMs directly, as well. The actual intelligence of the models is not measured well by simple responses. Plus, the real-world impact of a model is decided almost entirely by one metric - coding, since everything else can come from code - and people are not using the arena to code.

@SteveSokolowski this is a fine rationale, but I've come to the opposite conclusion. Meta and Google both have more computing capability than OpenAI and imo will surpass OpenAI at some point in the indeterminate future (maybe by year's end). Regardless, since this elo determination is almost like a popularity contest, it's hard to see OpenAI's ChatGPT not being top dog in such a short time. For lots of people not even that involved in the ai realm, chatgpt is near synonymous with AI lol. it's pretty impeccable branding.

@NoahRich how exactly is the elo determination a popularity contest? can you elaborate a bit please?

@SteveSokolowski valid points, chatbot arena is indeed getting less relevant over time due to the exact problems you described.

@Soli Good question! Actually I had a misunderstanding of how the elo ranking was made! I assumed that because of the human rankings there might be some bias just from name recognition.

but after your question I went to chatbot arena and learned how it works. Actually now I think maybe there's a good chance openAI could fall from top dog here within this market's timeframe
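For anyone else unsure how an Elo-style rating turns head-to-head votes into a score, here is a minimal sketch of the textbook Elo update for a single comparison. The K-factor of 32 and the 400-point scale are conventional defaults assumed for illustration, and the function names are hypothetical; the arena's actual computation may differ.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one vote.

    score_a is 1.0 if A was preferred, 0.0 if B was preferred, 0.5 for a tie.
    k is an assumed K-factor, not a value taken from the arena.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Whatever A gains, B loses, so the total rating is conserved.
    return rating_a + delta, rating_b - delta


# Two equally rated models; a voter prefers model A's answer.
new_a, new_b = elo_update(1200.0, 1200.0, score_a=1.0)
print(new_a, new_b)
```

In a blind side-by-side setup the voter does not see which model wrote which answer, so under this scheme a rating moves only on which response was preferred.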

@NoahRich 👍

also fyi openai lost the leading position to google for a week or so at some point this year and was tied with anthropic for 2 weeks

bought Ṁ50 OpenAI YES

Can someone explain the Anthropic hype? Just 3.5 Opus?

@jim Yup

@jim the only thing i find weird is that google is at 13% right now while Anthropic is at 18%, although google was actually the only company this year to outrank openai, while anthropic was only able to share the #1 spot with openai

@Soli or maybe it is not that weird given how many people left openai to join anthropic these past months, so maybe this is priced in?

@Soli market seems vaguely reasonable to me now. Main glaring issues are

  1. Meta should be >10% (more talent and compute than almost anyone else and hints of new models releasing)

  2. xAI should be lower (Elon has a history of over-promising/under-delivering)

Hard to argue with Anthropic being >15% when they (a) have the best LLM currently on offer (even if it isn't top of lmsys) and (b) have a much stronger model due out soon

@jim actually maybe the fact that Sonnet is so lowly-rated on the leaderboard is a good reason to argue that Anthropic should be lower

I guess Anthropic and xAI should be lower, Meta and Google higher. OpenAI I swing between thinking it should be 40% and 60% so 50% seems fine