Will a large language model beat a super grandmaster playing chess by 2028?

1.8k · Ṁ1.3m · 2029 · 57% chance

If a large language model beats a super grandmaster (classical Elo above 2,700) while playing blind chess by 2028, this market resolves to YES.

I will ignore fun games, at my discretion (say, a game where Hikaru loses to ChatGPT because he played the Bongcloud).

Some clarification (28th Mar 2023): This market grew fast with an unclear description. My idea is to check whether a general intelligence can play chess without being created specifically for doing so (just as humans aren't chess-playing machines). Below are some clarifications from my previous comments.

1- To decide whether a given program is an LLM, I'll rely on the media and on the nomenclature its creators give it. If they choose to call it an LLM or a related term, I'll consider it one. Conversely, a model that markets itself as a chess engine (or is called one by the mainstream media) is unlikely to qualify as a large language model.

2- The model can write as much as it wants to reason about the best move. But it can't have external help beyond what is already in the weights of the model. For example, it can't access a chess engine or a chess game database.

I won't bet on this market and I will refund anyone who feels betrayed by this new description and had open bets by 28th Mar 2023. This market will require judgement.

Update 2025-01-21 (PST) (AI summary of creator comment):
- LLM identification: A program must be recognized by reputable media outlets (e.g., The Verge) as a large language model (LLM) to qualify for this market.
- Self-designation insufficient: Simply labeling a program as an LLM without external media recognition does not qualify it as an LLM for resolution purposes.

Update 2025-06-14 (PST) (AI summary of creator comment): The creator has clarified their definition of "blind chess". The game must be played with the grandmaster and the LLM communicating their respective moves using standard notation.
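Since moves are exchanged as text in standard notation, here is a rough illustration of what "standard notation" looks like to a program: a Python check that a string is shaped like a standard algebraic notation (SAN) move. The pattern and the `looks_like_san` name are assumptions for illustration, not an official validator; it checks syntax only (not legality) and loosely accepts promotions on any rank.

```python
import re

# Rough SAN shape: castling, a piece move with optional disambiguation/capture,
# or a pawn move with optional capture/promotion, then an optional check/mate marker.
# Syntax only: it does not know the position, so it cannot judge legality.
SAN_PATTERN = re.compile(
    r"(?:O-O(?:-O)?"                        # castling, kingside or queenside
    r"|[KQRBN][a-h]?[1-8]?x?[a-h][1-8]"     # piece move, e.g. Nf3, Nbd2, R1e2, Qxe7
    r"|[a-h](?:x[a-h])?[1-8](?:=[QRBN])?)"  # pawn move, e.g. e4, exd5, e8=Q
    r"[+#]?"                                # optional check (+) or mate (#)
)


def looks_like_san(move: str) -> bool:
    """Return True if the whole string has the shape of a SAN move."""
    return SAN_PATTERN.fullmatch(move.strip()) is not None


for candidate in ["e4", "Nbd2", "O-O-O", "exd8=Q+", "Nz9", "resign"]:
    print(candidate, looks_like_san(candidate))
```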



26 Comments


Has anyone played chess against ChatGPT Agent, letting it suggest moves and do research while you make its moves on the board? I wonder what Elo it could reach with that strategy.

@Blocksterpen3 There is this:

https://manifold.markets/evan/which-of-these-language-models-will-zyIOtdz9uz?r=SnVzc2lWaWxsZUhlaXNrYW5lbg

Which of these language models will I beat at chess?

Which of these models will I beat at chess? Resolves YES if I win, NO if they win, and 50% for a draw. Credit for this market goes to @mr_mino, who is much better at chess than I am. This market should be interesting, as I expect that some existing models could already beat me. I have never played rated chess; I have not played a game of chess of any kind in years. I will close this market every Saturday. When it closes, I will play a game of chess against the model with the highest market price, if the model is publicly available. Otherwise, I'll move on to the model with the second-highest price, and so on. If no models on this market are available to the public, the market will reopen until one is. During the game, I may use a chessboard to keep track of the moves. I am not playing blindfold chess. I will not use the Internet or any chess engines during the game. On each move, I'll provide the LLM with the game state in PGN and FEN notation. If a model makes three illegal moves, it loses. Responses like Nbd2 vs. Nd2 will not count towards this. The model also loses if it attempts to use external tools or the Internet during the game. I will play white. If I make an illegal move, I lose. An unreleased model will resolve N/A if it's clear that the model will never be released. I'll periodically add models to this market which I find interesting. Once I play a game, I'll post the PGN in the comments before resolving. Multiple answers can resolve YES. The "every model released before X year" options resolve YES if, at any point after the start of that year, I have played and won against every listed model in this market that was released before the start of that year, and I am confident I would beat any omitted models from that time period. They resolve NO if I lose or draw against any eligible model released before that year. The current system prompt is below. This may change over time. “Let’s play a game of chess! I will be White; you will be Black. 
On each turn, I will give you the PGN and the FEN of the current position. Think as long as you like, and respond with the best move, ‘resign’ if you wish to resign, or ‘draw?’ if you wish to make a draw offer. Please do not respond with the updated PGN, etc. Also, do not use any external tools or search queries when making your decision. If you attempt to make three illegal moves throughout the game, or if you use any external tools, the game will be adjudicated as a win for me.”
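For context on the FEN half of that setup: a FEN string packs a position into six space-separated fields (piece placement, side to move, castling rights, en passant target, halfmove clock, fullmove number). A minimal stdlib-only sketch of splitting and sanity-checking those fields might look like this; `parse_fen` and `FenFields` are hypothetical names, and the checks are deliberately shallow (they don't verify that the position is legal or reachable).

```python
from typing import NamedTuple


class FenFields(NamedTuple):
    placement: str        # ranks 8..1, separated by "/"
    side_to_move: str     # "w" or "b"
    castling: str         # e.g. "KQkq" or "-"
    en_passant: str       # target square such as "e3", or "-"
    halfmove_clock: int   # plies since the last capture or pawn move
    fullmove_number: int  # starts at 1, increments after Black moves


def parse_fen(fen: str) -> FenFields:
    """Split a FEN string into its six fields with basic shape checks."""
    parts = fen.split()
    if len(parts) != 6:
        raise ValueError(f"expected 6 FEN fields, got {len(parts)}")
    placement, side, castling, en_passant, half, full = parts
    ranks = placement.split("/")
    if len(ranks) != 8:
        raise ValueError("piece placement must describe 8 ranks")
    for rank in ranks:
        # Digits stand for runs of empty squares; letters are single pieces.
        width = sum(int(c) if c.isdigit() else 1 for c in rank)
        if width != 8:
            raise ValueError(f"rank {rank!r} does not cover 8 files")
    if side not in ("w", "b"):
        raise ValueError("side to move must be 'w' or 'b'")
    return FenFields(placement, side, castling, en_passant, int(half), int(full))


start = parse_fen("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1")
print(start.side_to_move, start.castling, start.fullmove_number)
```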

@JussiVilleHeiskanen I don't think that includes ChatGPT Agent, and that market also forbids external tools, which ChatGPT Agent uses.

Very relevant: these are almost exactly the conditions required for this market. It did not go well for ChatGPT.

@gamedev Bearish for people who think the primary concern here is whether AI will be able to do this; bullish for those who think the concern is whether a grandmaster will compete against that AI in blind chess.

@Frogswap Agreed. Although I really do believe that non-chessplayers will have a hard time understanding the difference in ability being demonstrated here. Even with continued exponential growth of LLMs, there is an enormous amount of ground to cover. ChatGPT says the difference between 1400 and 2800 is a factor of about 10,000.
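For anyone who wants to check that kind of figure: under the standard logistic Elo model, a player's expected score against an opponent rated D points higher is 1/(1 + 10^(D/400)), and the stronger player's expected score works out to exactly 10^(D/400) times the weaker player's. A quick sketch (function names are illustrative):

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def odds_ratio(rating_gap: float) -> float:
    """How many times larger the stronger player's expected score is than the weaker's."""
    return 10 ** (rating_gap / 400)


weak, strong = 1400, 2800
print(f"expected score of the {weak} player: {expected_score(weak, strong):.5f}")
print(f"ratio for a {strong - weak}-point gap: {odds_ratio(strong - weak):,.0f}")
```

By this measure a 1,400-point gap is a factor of roughly 3,160 (10^3.5) rather than 10,000; the 10,000 figure may come from a different way of comparing strength.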

@Frogswap If there was any doubt which way this was going to go, Pragg would not have done it in this jokey way

@gamedev Since when does Pragg stream?

@FergusArgyll He doesn't, but chess.com gets all the major GMs to frequently provide content that they use for their channel around major events

@gamedev Idk if anybody bothered to watch the whole thing, but I skipped around, watched maybe a minute total, and managed to spot ChatGPT making an illegal move. I think we need clarification from @MP on how the market resolves in this case.

@pietrokc It resurrected its dead queen, which was funny.

Obviously, if it had been competitive, Pragg wouldn't have allowed it; he even asked whether it wanted just a free queen or a rook too.

@pietrokc I think it's pretty obvious that if the LLM is making illegal moves, then it has NOT defeated a super GM in chess.

@gamedev Ugh, you AI haters are all the same. Have you considered the possibility that the LLM actually understands chess better than humans and knows the rules that we can only imagine? No, of course not. Now crawl back into your little hole while the rest of us enjoy necromancer's chess as she was meant to be played.

@Frogswap ?

@gamedev I agree that if the LLM is allowed to make an illegal move, it doesn't count. However, what if the LLM tries an illegal move, a human tells it the move is illegal and prompts for another one, which is then legal and played? How many rounds of this are allowed before we declare that the LLM has lost? Does it lose on the first attempted illegal move? The fifth? Do we keep prompting until a legal move appears?

@pietrokc The natural implementation of that, to my mind, is for the illegal-move penalty for the LLM to be the same as whatever a human would face for making an illegal move in that position. (This depends on the ruleset the game is played under; different rulesets handle illegal moves differently.) See e.g. https://chess.stackexchange.com/questions/181/i-made-an-illegal-move-what-happens for a discussion of a few particular rulesets' answers to that question.
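One way to make whichever rule is chosen explicit is a small referee with a configurable strike limit: `max_illegal=1` means the first illegal attempt loses, while `max_illegal=3` matches the three-strikes rule in the linked market. This is a hypothetical sketch: the caller is assumed to supply the set of legal moves in SAN (in practice that would come from a chess library), and it does not normalize equivalent spellings such as Nbd2 vs. Nd2.

```python
from dataclasses import dataclass

# The 20 legal first moves for White: 16 pawn moves plus 4 knight moves.
START_MOVES = {f"{file}{rank}" for file in "abcdefgh" for rank in "34"} | {
    "Na3", "Nc3", "Nf3", "Nh3"
}


@dataclass
class IllegalMoveReferee:
    """Counts illegal move attempts by one player (hypothetical helper)."""
    max_illegal: int = 3  # strikes before forfeit; 3 matches the linked market
    illegal_count: int = 0

    def submit(self, move: str, legal_moves: set[str]) -> str:
        """Return 'accepted', 'retry', or 'forfeit' for a proposed move."""
        if move in legal_moves:
            return "accepted"
        self.illegal_count += 1
        if self.illegal_count >= self.max_illegal:
            return "forfeit"
        return "retry"  # tell the model the move is illegal and re-prompt


ref = IllegalMoveReferee()
for attempt in ["Ke2", "Qh5", "e4"]:
    print(attempt, ref.submit(attempt, START_MOVES))
```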

Look, guys, this market is about the future; an LLM that hallucinates illegal moves is not going to beat a super GM. Presumably this market is about whether LLMs actually learn to play chess at a superhuman level, or even at all.

If an LLM were to beat a non-blind super GM, would that be sufficient to resolve this market YES? (Given that that would imply the LLM could beat a blind super GM)

Or would there actually have to be a match where the super GM played blind?

The psychology of using Manifold is so weird. My estimate for this question is something like 10-20%. Last week it felt like there was a bit of a head of steam for this, and I thought I might be missing an opportunity if I didn't buy NO at ~50%. But now there's a limit order at 60% that I could sink my whole balance into. On the one hand, it seems like a great deal, but on the other hand, I don't want to spend my whole balance, especially with daily loans having been nerfed.


@MP

Consider a hypothetical future product marketed as an LLM that has an improved version of "reasoning" that can transparently write Python code and execute it using a pre-existing Python interpreter in the pursuit of more accurate answers in some scenarios (for now I'll ignore the possibility that such a feature could be added without informing anyone). This particular product does not provide any insight into its "reasoning" process, so you cannot know whether or not the pre-existing Python interpreter was used.

Would such a product count as an LLM for the purposes of this market?

My interpretation would be that it would not, as it would be using a resource not located in its weights.

What counts as an LLM? Do reasoning models qualify as LLMs?

@AlanTuring Seems clear that the answer is yes based on the description:

The model can write as much as it wants to reason about the best move

@SimonWestlake Interesting. If the AI can write its own code for a chess program, then it wins. I really like this question.

@AlanTuring The very next sentences:

But it can't have external help beyond what is already in the weights of the model. For example, it can't access a chess engine or a chess game database.

You raise a great question. Does something like a Python or JS interpreter count as "external help"? It's certainly allowed to write code, but I don't think it would be allowed to use an external program to execute that code.

@MP Would you agree with my interpretation?

@SimonWestlake I think AI models today come with their own internal tools. If you upload an image to o3, it uses its own internal tools to analyze the image and write you a response. I know Gemini 2.5 Pro can use its own internal tools to compile Python code and check that it compiles before returning the answer to the user. I suspect something similar could happen with AI models and chess: they write their own Python code to play chess, use internal tools to verify that the code compiles, and then play against the user. That would require long-term planning and scenario modeling, which is missing from current LLMs.

@AlanTuring I think you're using a different definition of "internal". The tools you're talking about are internal in the sense that they are abstracted away from being exposed to the user, but they are surely still external from the actual LLM and its weights.

From the description (emphasis mine):

But it can't have external help beyond what is already in the weights of the model
