If a large language model beats a super grandmaster (classical Elo above 2,700) while playing blind chess by 2028, this market resolves to YES.
I will ignore fun games, at my discretion. (Say, a game where Hikaru loses to ChatGPT because he played the Bongcloud.)
Some clarification (28th Mar 2023): This market grew fast with an unclear description. My idea is to check whether a general intelligence can play chess without being created specifically to do so (just as humans aren't chess-playing machines). Some points from my previous comments:
1- To decide whether a given program is an LLM, I'll rely on the media and the nomenclature its creators give it. If they choose to call it an LLM or a related term, I'll consider it. Conversely, a model that markets itself as a chess engine (or is called one by the mainstream media) is unlikely to qualify as a large language model.
2- The model can write as much as it wants to reason about the best move. But it can't have external help beyond what is already in the weights of the model. For example, it can't access a chess engine or a chess game database.
I won't bet on this market, and I will refund anyone who feels betrayed by this new description and had open bets as of 28th Mar 2023. This market will require judgement.
Update 2025-01-21 (PST) (AI summary of creator comment): - LLM identification: A program must be recognized by reputable media outlets (e.g., The Verge) as a Large Language Model (LLM) to qualify for this market.
- Self-designation insufficient: Simply labeling a program as an LLM without external media recognition does not qualify it as an LLM for resolution purposes.
Update 2025-06-14 (PST) (AI summary of creator comment): The creator has clarified their definition of "blind chess". The game must be played with the grandmaster and the LLM communicating their respective moves using standard notation.
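For concreteness, a game under this definition might be conducted as an exchange like the following, each side calling out its moves in standard algebraic notation (an illustrative hypothetical opening, not from any actual game):

```
GM:  1. e4      LLM: 1... c5
GM:  2. Nf3     LLM: 2... d6
GM:  3. d4      LLM: 3... cxd4
```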
@Blocksterpen3 There is this:
@JussiVilleHeiskanen I don't think that includes ChatGPT Agent, and that question also disallows the use of external tools, which ChatGPT Agent uses.
Very relevant - these are just about the exact conditions required for this market. It did not go well for ChatGPT.
@gamedev Bearish for people who think the primary concern here is whether AI will be able to do this; bullish for those who think the concern is whether a grandmaster will compete against that AI in blind chess.
@Frogswap Agreed. Although I really do believe that non-chess players will have a hard time understanding the difference in ability being demonstrated here. Even with continued exponential growth by the LLMs, there is an enormous amount of ground to cover. ChatGPT says the difference between 1400 and 2800 is a factor of about 10,000.
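For what it's worth, the standard Elo expectation formula puts that odds gap somewhat lower than ChatGPT's figure; a quick check (my arithmetic, not from the comment):

```python
# Elo model: expected score for A vs B is 1 / (1 + 10**((Rb - Ra) / 400)),
# so the odds ratio for a rating gap d is 10**(d / 400).
ra, rb = 2800, 1400
odds = 10 ** ((ra - rb) / 400)    # 10**3.5 ~= 3162, not 10,000
expected = odds / (odds + 1)      # ~0.99968 expected score for the 2800 player
print(f"odds ratio ~ {odds:.0f}, expected score ~ {expected:.5f}")
```

A 10,000:1 odds ratio would correspond to a 1,600-point gap, so the quoted figure overstates things a bit; either way, the ground to cover is enormous.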
@Frogswap If there was any doubt which way this was going to go, Pragg would not have done it in this jokey way
@FergusArgyll He doesn't, but chess.com gets all the major GMs to frequently provide content that they use for their channel around major events
@pietrokc It resurrected its dead queen, which was funny.
If it had been competitive, Pragg obviously wouldn't have allowed it - he even asked whether it wanted just a free queen or a rook too.
@pietrokc I think it's pretty obvious that if the LLM is making illegal moves, then it has NOT defeated a super GM in chess.
@gamedev Ugh, you AI haters are all the same. Have you considered the possibility that the LLM actually understands chess better than humans and knows the rules that we can only imagine? No, of course not. Now crawl back into your little hole while the rest of us enjoy necromancer's chess as she was meant to be played.
@gamedev I agree that if the LLM is allowed to make an illegal move it doesn't count. However, what if the LLM tries an illegal move, a human tells it the move is illegal, and prompts for another one, which is then legal and played? How many rounds of this are allowed until we declare the LLM lost? Does it lose on the first attempted illegal move? Fifth? We keep prompting for legal moves until they appear?
@pietrokc The natural implementation of that, to my mind, is to have the illegal-move penalty for the LLM be the same as whatever a human would be hit with for making an illegal move in that position. (For whatever ruleset the game is being played under; different rulesets vary in their handling of illegal moves.) See e.g. https://chess.stackexchange.com/questions/181/i-made-an-illegal-move-what-happens for a discussion of a few particular rulesets' answers to that question.
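A minimal sketch of how an arbiter loop could enforce such a policy, assuming the python-chess library and a configurable strike allowance (the two-strike default loosely mirrors FIDE's standard rule, under which a second completed illegal move loses the game):

```python
import chess  # pip install python-chess

def referee_move(board: chess.Board, san_move: str,
                 strikes: int, max_strikes: int = 2):
    """Try to apply a move given in standard algebraic notation.

    Returns (ok, strikes, forfeited): the move is pushed onto the board
    if legal; otherwise the player's strike count increases, and the
    game is forfeited once it reaches max_strikes.
    """
    try:
        board.push_san(san_move)  # raises ValueError on illegal/unparseable SAN
        return True, strikes, False
    except ValueError:
        strikes += 1
        return False, strikes, strikes >= max_strikes

# Example: the LLM first tries an illegal move, then corrects itself.
board = chess.Board()
strikes = 0
for attempt in ["Ke2", "e4"]:  # "Ke2" is illegal from the starting position
    ok, strikes, lost = referee_move(board, attempt, strikes)
    print(attempt, "->", "played" if ok else f"illegal (strike {strikes})")
    if ok or lost:
        break
```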
The psychology of using manifold is so weird. My estimate for this question is something like 10-20%. Last week it felt like there was a bit of a head of steam for this, and I thought I might be missing an opportunity if I didn't buy NO at ~50%. But now there's a limit order at 60% that I could sink my whole balance into. On the one hand, it seems like a great deal, but on the other hand, I don't want to spend my whole balance, especially with daily loans having been nerfed.
Consider a hypothetical future product marketed as an LLM that has an improved version of "reasoning" that can transparently write Python code and execute it using a pre-existing Python interpreter in the pursuit of more accurate answers in some scenarios (for now I'll ignore the possibility that such a feature could be added without informing anyone). This particular product does not provide any insight into its "reasoning" process, so you cannot know whether or not the pre-written Python interpreter was used.
Would such a product count as an LLM for the purposes of this market?
My interpretation would be that it would not, as it would be using a resource not located in its weights.
@AlanTuring Seems clear that the answer is yes based on the description:
The model can write as much as it wants to reason about the best move
@SimonWestlake interesting. If the AI can write its own code to write a chess program then it wins. I really like this question.
@AlanTuring The very next sentences:
But it can't have external help beyond what is already in the weights of the model. For example, it can't access a chess engine or a chess game database.
You raise a great question. Does something like a Python or JS interpreter count as "external help"? It's certainly allowed to write code, but I don't think it would be allowed to use an external program to execute that code.
@MP Would you agree with my interpretation?
@SimonWestlake I think AI models today come with their own internal tools. If you upload an image to O3 it uses its own internal tools to analyze the image and write you a response. I know Gemini 2.5 Pro can use its own internal tools to compile Python code to check successful compilation before returning the code answer to the user. I suspect a similar thing could happen with AI models and chess. They write their own python code to play chess then use internal tools to verify the code compiles and then plays against the user. That would require long term planning and scenario modeling which is missing from current LLMs.
@AlanTuring I think you're using a different definition of "internal". The tools you're talking about are internal in the sense that they are abstracted away from being exposed to the user, but they are surely still external from the actual LLM and its weights.
From the description (emphasis mine):
But it can't have external help **beyond what is already in the weights of the model**