Will a large language model beat a super grandmaster playing chess by 2028?
1.8k · Ṁ1.1m · 2029 · 53% chance

If a large language model beats a super grandmaster (classical Elo above 2,700) while playing blind chess by 2028, this market resolves to YES.

I will ignore fun games, at my discretion. (Say, a game where Hikaru loses to ChatGPT because he played the Bongcloud.)

Some clarification (28th Mar 2023): This market grew fast with an unclear description. My idea is to check whether a general intelligence can play chess without being created specifically to do so (humans aren't chess-playing machines, after all). Some comments I made previously:

1- To decide whether a given program is an LLM, I'll rely on the media and on the nomenclature its creators give it. If they choose to call it an LLM, or some related term, I'll consider it one. Conversely, a model that markets itself as a chess engine (or is called one by the mainstream media) is unlikely to qualify as a large language model.

2- The model can write as much as it wants to reason about the best move, but it can't have external help beyond what is already in the weights of the model. For example, it can't access a chess engine or a chess game database.

I won't bet on this market, and I will refund anyone who had open bets as of 28th Mar 2023 and feels misled by the new description. This market will require judgement.

  • Update 2025-01-21 (PST) (AI summary of creator comment):

    • LLM identification: A program must be recognized by reputable media outlets (e.g., The Verge) as a Large Language Model (LLM) to qualify for this market.

    • Self-designation insufficient: Simply labeling a program as an LLM without external media recognition does not qualify it as an LLM for resolution purposes.

  • Update 2025-06-14 (PST) (AI summary of creator comment): The creator has clarified their definition of "blind chess". The game must be played with the grandmaster and the LLM communicating their respective moves using standard notation.


"If a large language models beats a super grandmaster (Classic elo of above 2,700) while playing blind chess by 2028, this market resolves to YES."

Hold on, I interpreted this as:
"If a large language model beats a super grandmaster while [the large language model is] playing blind chess by 2028, this market resolves to YES."

In other words, the LLM just plays by putting moves in standard chess notation into its chat window; there is no extra scaffolding created for the LLM. That's how LLMs most often play chess right now, so this was a natural assumption. I think that making the grandmaster play blind is just a weird restriction.

The AI overview for this question said:
"The creator has clarified their definition of "blind chess". The game must be played with the grandmaster and the LLM communicating their respective moves using standard notation."

So I took the definition of "blind chess" to just mean that communication must be through standard notation, but not that the grandmaster needs to be playing "blindfolded."

@SorenJ No, it’s not just the LLM, and the AI summary of OP’s comments above is a bad one.

@JimHays Thanks, I see. This market is mostly about whether or not such a match happens, then (which seems quite unlikely).

Wait, why does the description say blind chess? That's much harder for a human. The title just says chess.

bought Ṁ1,000 NO

@IsaacKing I agree that this is misleading. A super GM blind is still GM-level, though

@IsaacKing it's a language model.

Notation is sufficient for an LLM (as small as 50M parameters) to have a perfect world model of the chess board (per Adam Karvonen: https://adamkarvonen.github.io/machine_learning/2024/01/03/chess-world-models.htm )

I doubt vision or some workaround like printing the board state helps, since PGN notation is structured data, which allows for more (narrow) intelligence about chess (and the way grandmasters/engines play), whereas unstructured multimodal data would require the model to attend to lots of unrelated data, worsening performance on the specialized task of finding the best move.

@ChinmayTheMathGuy the question is why the human super GM is not given access to the board

@LuisPedroCoelho oh I misinterpreted.

Up to the creator. That's the difference between 2500 and 2700. I guess they're trying to level the playing field. 200 rating points is the difference between a 24% and a 50% win probability, so maybe it shifts this market's odds by 1 or 2% (because I'm guessing multiple super GMs would play multiple games, assuming it's good (>2200) and can explain its moves).
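
For reference, the 24%/50% figures follow from the standard Elo expected-score formula; a quick check in Python:

```python
# Elo expected score for a player rated r against an opponent rated r_opp:
# E = 1 / (1 + 10 ** ((r_opp - r) / 400)), counting draws as half a point.

def expected_score(r: float, r_opp: float) -> float:
    return 1 / (1 + 10 ** ((r_opp - r) / 400))

print(round(expected_score(2500, 2700), 2))  # 0.24 (200 points down)
print(round(expected_score(2700, 2700), 2))  # 0.5  (equal ratings)
```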

For me, it's like 1000 rating points. I got castle-checkmated by the aforementioned LLM chess bot because I played it on a bad GUI in pygame (letters for pieces, no flip board; it was hard for me to visualize and I didn't think too much).

@ChinmayTheMathGuy The way it impacts this market is less about the impact on the skill of the GM, and more about the likelihood of this particular scenario happening

@JimHays that is a very good point too

I can definitely see a situation where an LLM is so good that it beats super GMs in normal play and then nobody would take it on blind because there would be no point in doing so. In that case, I think the market would awkwardly have to resolve NO as per the repeated insistence that the super GM should be playing blind

Even more likely is the situation where the GM is streaming/VODing the game and, to help the audience follow what is going on, opens a board for them, but in doing so is technically no longer playing blind.

A well-overdue correction. In my humble opinion as a machine learning student, this should be sitting at 25% or less. We are just nowhere near something like this happening, and short of a major paradigm shift, there is zero indication that we are even on track for it. Two things would need to happen by 2028: 1) we get AGI or something close to it; 2) this AGI is an LLM. Neither seems very likely to me, and the combination of both seems out of the question.

Hedge:

Here's a sort of derivative of this question: if LLMs can beat super GMs by 2028, by when would you expect them to beat 2000 Elo?

I also like the implicit definition of LLM in the question below - whatever's top 3 on lmsys - that's much better!

Dumb question: is the LLM allowed to write and execute python code? (As long as it doesn't use a chess library)

@DavidFWatson The answer is clearly in the description

@BrunoJ where

@OscarGarciaAps5

2- The model can write as much as it wants to reason about the best move, but it can't have external help beyond what is already in the weights of the model. For example, it can't access a chess engine or a chess game database.

No

I'm glad people with more liquidity are finally here to drop the chance to something reasonable

Is the engine allowed to use a grammar of just valid moves?
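
For what it's worth, a minimal sketch of what that could mean, assuming python-chess is available: enumerate the legal SAN moves for the current position, which is exactly the set a constrained decoder would be allowed to emit:

```python
import chess  # pip install chess

board = chess.Board()
# The "grammar" at each turn is just the legal moves in standard notation;
# a constrained decoder would restrict the model's output to this set.
legal_san = sorted(board.san(move) for move in board.legal_moves)
print(legal_san)  # ['Na3', 'Nc3', 'Nf3', 'Nh3', 'a3', 'a4', ...]
```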

Could anybody explain to me why this market is >50%? Right now even SOTA models like o3 will hallucinate moves when I try playing against them. That would be an automatic disqualification.

Let's say one year from now (mid 2026) we solve the hallucination problem. It might have an Elo of around 800 then? Then we need 1.5 years to get to grandmaster level, a climb of ~2000 Elo. That is a lot! It also seems like it would require a lot of RL on chess specifically, but I don't see why the labs would prioritize chess when they could be focusing their RL budget on coding and mathematical proofs.

bought Ṁ3,000 NO from 57% to 52%

@SorenJ Have you tried discarding hallucinated moves and seeing what Elo you get? It's probably best to start a new chat for each move so that hallucinated moves don't clog up the context window. Or you could ask for a list of top 5 move options and select the first valid one.
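
Something like this, as a rough sketch with python-chess (the ask-the-model step is left as a hypothetical get_candidates stub, since the exact chat API is an assumption):

```python
import chess  # pip install chess

def first_legal_move(board: chess.Board, candidates: list[str]) -> chess.Move | None:
    """Return the first candidate that parses as a legal move in SAN, else None."""
    for san in candidates:
        try:
            return board.parse_san(san)  # raises ValueError if illegal/unparsable
        except ValueError:
            continue
    return None

board = chess.Board()
# Hypothetical: one fresh single-turn query per move, asking for 5 candidates
# in SAN given the current position as FEN, e.g.:
# candidates = get_candidates(f"Position: {board.fen()}. Give 5 moves in SAN.")
candidates = ["Qxf7#", "Nf3", "e4"]  # dummy model output for illustration
move = first_legal_move(board, candidates)
if move is not None:
    board.push(move)
print(board.fen())
```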

@placebo_username It plays the opening well, and then its performance completely collapses once the opening phase is over.

@SorenJ Interesting, wonder if this is a context window issue or just that openings are more standardized. Are you using one of the two techniques I suggested?

@placebo_username I haven't tried your technique yet. But I don't think the collapse after the opening is surprising: the publicly available data for chess has a lot more regularities (necessarily) near the opening.