Who will release the next generation-defining LLM?

resolved Sep 29

OpenAI: Resolved YES
Meta: Resolved NO
Anthropic: Resolved NO
Google/Alphabet: Resolved NO
Mistral: Resolved NO
Other: Resolved NO

Who will release the next Large Language Model that has an LMSYS Arena Elo of at least 1334, 75 points above the current leader?

As of the 22nd of April, 2024, there are 4 models with an Arena Elo between 1249 and 1259 according to the LMSYS leaderboard: 3 versions of GPT-4 and 1 version of Claude 3 Opus. The highest-rated GPT-3.5-Turbo version has an Elo of 1119, 46 points behind the lowest-rated GPT-4 version (the 0613 release in both cases), while the 0314 versions of these models are 82 points apart. Thus, a 75-point gap would represent a breakthrough and a new generation of LLMs.
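To make the threshold concrete, here is a minimal sketch of the arithmetic. It assumes the standard Elo expected-score formula (base 10, scale 400), which is an assumption about how the leaderboard's ratings are scaled and is not stated in the question itself. The function name `expected_win_rate` is made up for illustration.

```python
# Top of the 1249-1259 band quoted in the description above.
CURRENT_LEADER_ELO = 1259
REQUIRED_GAP = 75


def expected_win_rate(gap: float, scale: float = 400.0) -> float:
    """Expected head-to-head score of the higher-rated model.

    Assumes the standard Elo logistic curve (base 10, scale 400).
    """
    return 1.0 / (1.0 + 10.0 ** (-gap / scale))


threshold = CURRENT_LEADER_ELO + REQUIRED_GAP
print(threshold)  # 1334

# Under the assumed formula, a 75-point lead corresponds to winning
# roughly 61% of head-to-head votes against the current leader.
print(round(expected_win_rate(REQUIRED_GAP), 3))  # 0.606
```

Under that assumption, the 46-point and 82-point gaps cited above can be compared in the same way by calling `expected_win_rate(46)` and `expected_win_rate(82)`.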

Elo will be evaluated 1 week after the model enters the leaderboard. If 1334 is within the top contender's confidence interval, I'll wait 1 additional week and resolve based on the Elo then. If multiple models meet the criteria, the one with the earliest release date is the winner.
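The resolution rule above could be sketched roughly as follows. This is one reading of the criteria, not the market creator's actual procedure: the `Snapshot` structure, the function names, and the inclusive treatment of the confidence-interval bounds are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

THRESHOLD = 1334  # Elo required by the question


@dataclass
class Snapshot:
    """One leaderboard reading for a model (hypothetical structure)."""
    elo: float
    ci_low: float
    ci_high: float


def evaluation_date(entered: date, after_one_week: Snapshot) -> date:
    """Date on which the Elo is read for resolution.

    One week after the model enters the leaderboard, or two weeks if
    1334 falls inside the confidence interval at the one-week mark
    (bounds treated as inclusive, which is an assumption).
    """
    first_check = entered + timedelta(weeks=1)
    if after_one_week.ci_low <= THRESHOLD <= after_one_week.ci_high:
        return first_check + timedelta(weeks=1)
    return first_check


def winner(candidates: list[tuple[str, date, float]]) -> Optional[str]:
    """Pick the organization whose qualifying model was released first.

    Each candidate is (organization, release date, Elo at evaluation).
    The earliest release wins, not the highest Elo.
    """
    qualifying = [(released, org) for org, released, elo in candidates
                  if elo >= THRESHOLD]
    return min(qualifying)[1] if qualifying else None
```

For example, with made-up numbers, `winner([("A", date(2024, 6, 1), 1340.0), ("B", date(2024, 5, 1), 1335.0)])` picks `"B"`: both clear 1334, and B was released first even though A has the higher Elo.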

https://chat.lmsys.org/?leaderboard

Technology AI GPT-5


bought Ṁ500 OpenAI YES

o1-preview has the requisite Elo and entered the leaderboard on 9/20. As noted, I'll wait one week to resolve.

bought Ṁ180 Other NO

@Mactuary o1-preview wins this and passes the "feels like a next-gen model" test too

bought Ṁ10 Google/Alphabet YES

Wouldn't be surprised if Google releases Gemini 2.0 soon

You think? Looks like 1.5 Pro was released in May.

https://www.reddit.com/r/singularity/s/M96EDm1LAF references to Gemini 2.0 were already indexed by Yahoo for Deepmind's website

Oh, interesting

bought Ṁ15 OpenAI NO

Claude 3.5 could be a contender here

👀👀

https://x.com/LiamFedus/status/1790064963966370209

@Mactuary Well, GPT-4o is on the leaderboard and well below the Elo needed to resolve, so I think we're still waiting for GPT-5 or Llama 3 400b or...

Related market on when the model will be released

So only one of the options will resolve YES?

@Sss19971997 That's right. If two models meeting the criteria are released in close succession, the one that was released first will be the winner (not the one that has the higher Elo).

Related questions

Will Apple release its own LLM on par with state of the art LLMs before 2026?

Will there be major breakthrough in LLM Continual Learning before 2026?

What organization will top the LLM leaderboards on LMArena at end of 2025? 🤖📊

Who will have the best LLM at the end of 2025 (as decided by ChatBot Arena)?

Will there be a state-of-the-art LLM that is NOT based on next raw token prediction before 2029?

Will China have the best LLM by the end of 2025?

When will the next paradigm in LLMs (after reasoning) be released?

Will we get a new LLM paradigm by EOY?

Will the most advanced LLM stop being from a US-based company any time before 2030?

Will LLMs become a ubiquitous part of everyday life by June 2026?
