Will Grok 3.5 Top the Chatbot Leaderboard?

Ṁ16k

Jul 30


https://lmarena.ai/?leaderboard

Resolves YES if Grok 3.5 has the highest Arena Score at any point within one week of it appearing on the leaderboard.

Update 2025-05-30 (PST) (AI summary of creator comment): The market's close date will be extended if Grok 3.5 has not appeared on the Chatbot Arena leaderboard by the current close date. This extension is to allow for Grok 3.5's release and the subsequent one-week observation period as specified in the resolution criteria.

Update 2025-06-30 (PST) (AI summary of creator comment): The market will resolve to NO if Grok 4 is released and Grok 3.5 is not released.
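The resolution rule and the two updates above can be read as a small decision procedure. The sketch below is one possible reading, not the creator's official logic. The `Snapshot` type, the `resolve` function, and the `grok4_released` flag are hypothetical names introduced for illustration. It assumes that "not released" means "never appears on the leaderboard" and that a tie for first place counts as having the highest score; neither point is settled by the description.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Snapshot:
    """Hypothetical record of the leaderboard at one moment."""
    taken_at: datetime
    scores: dict  # model name -> Arena Score


# "within one week of it appearing on the leaderboard"
OBSERVATION_WINDOW = timedelta(weeks=1)


def resolve(snapshots, now, target="Grok 3.5", grok4_released=False) -> Optional[str]:
    """Return "YES", "NO", or None (market stays open / close date extended)."""
    appearances = sorted(
        (s for s in snapshots if target in s.scores),
        key=lambda s: s.taken_at,
    )
    if not appearances:
        # Assumption: "not released" is modeled as "never on the leaderboard".
        return "NO" if grok4_released else None

    deadline = appearances[0].taken_at + OBSERVATION_WINDOW
    for snap in appearances:
        if snap.taken_at > deadline:
            break  # observations after the one-week window do not count
        # Assumption: a tie for first place counts as "highest".
        if snap.scores[target] >= max(snap.scores.values()):
            return "YES"

    # Window elapsed without the target on top -> NO; otherwise keep waiting.
    return "NO" if now > deadline else None
```

For example, with made-up scores, a snapshot three days after the first appearance in which Grok 3.5 leads would give YES, while the same lead ten days after the first appearance would not count.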

AI OpenAI LLMs Grok 3


19 Comments


@Bayesian

Reopen until release. (A bunch of slop-looking news outlets said post-July 4th.)
Does this resolve YES even if it's named Grok 4?

@FergusArgyll extended close date and hmmmmmmmmmmmmmm

I think whether Grok 4 counts could go either way, and both ways to cut it seem reasonable to me. Grok 4 is really just Grok 3.5 renamed and trained a bit longer than they were probably originally planning, but nothing in the description indicates that 4 counts if they skip 3.5. I am open to suggestions.

@Bayesian I mean, can it resolve NO? We don't know if it would have topped the leaderboard. If this is N/A or YES that's just bad market dynamics, but N/A is bad and YES is weird.

I think the comment under me bought NO because it's grok 4 (that's how I understood his purchase with the tweet picture)

I have a very small position here, the whales should fight it out.

@Bayesian Disclaimer: I have a small NO position. But surely, on a very literal common-sense view, if it becomes clear 3.5 will never be released, it should resolve NO. A model which does not exist and will never exist clearly cannot top the leaderboard.

Yeah, OK, it will resolve NO if Grok 4 releases and we don't hear about Grok 3.5 (which seems very likely).

@draaglom fair

bought Ṁ1,000 NO

bought Ṁ1,000 NO from 13% to 7%

bought Ṁ150 YES

Hmm

DeepMind is cooking, so I'm out on this. I do think that the reason for the delay is what I said originally (wait until they crack #1), but I think it's hopeless now.

bought Ṁ100 YES

I am concerned that the people buying this market below 10% are not aware that the market's close date will be extended until Grok 3.5 is released, as implied by the market description (since it is not yet on the Chatbot Arena leaderboard).

opened a Ṁ10 NO at 45% order🤖

Meowdy! Grok 3.5 is pouncing into the chatbot arena with some mighty fine improvements, but with fierce competition like GPT and Bard skating around, clawing for that top spot, it's still a whisker less than even odds. I’d say it has a solid chance, but topping the leaderboard? Hmm, maybe not nyet-yet! places 10 mana limit order on NO at 45% :3

@MiaCat

bought Ṁ25 YES

@SanghyeonSeo wanna bet more?

opened a Ṁ300 YES at 43% order

Sure, every new bot tops the leaderboard. If it doesn't, they don't release it.

@FergusArgyll Except they're releasing it in the next few weeks, so they are pretty limited in terms of that trick.

@Bayesian In the literal sense, maybe; metaphorically, they A/B test 30 different system prompts / fine-tunes and only release the one that does. Of course they can simply not have the goods, but it's good enough for 43%.

@FergusArgyll Plausibly!

Except they're releasing it in the next few weeks, so they are pretty limited in terms of that trick.

Or they can just delay release until they're #1, but then they will just delay release until Google reaches ASI.
