Will an AI publicly accessible before IMO 2025 get the gold medal on it?
resolved Jul 29
Resolved
NO

IMO 2025 is scheduled to take place between July 10 and July 20 this year. This market asks: will (at least some) members of the public be able, once the problems are posted online, to use an AI model to produce solutions that would win gold at the IMO?

"Publicly accessible" is here meant to be a relatively lenient condition: it is not required that everyone, or even most people be able to access it - it just needs to be known about by the public, and there must exist credible evidence that some people unaffiliated with the lab that trained the AI had been able to run it and produce the solutions.

If possible, I plan to defer to the "Will an AI get gold on any International Math Olympiad by the end of 2025?" market for the resolution - if it resolves YES based on some AI that satisfies the above requirements, then this resolves YES in turn; likewise for NO. In a more complicated situation, where it might not be clear whether some solution "counts", I'll try to seek community consensus on whether it does.

  • Update 2025-02-04 (PST) (AI summary of creator comment): Publicly Accessible Clarification

    • Inclusions: External tests such as the o3-mini safety testing (e.g. mid-Jan) are intended to fall under the "publicly accessible" criterion.

    • Exclusions: Internal tools like AlphaProof (e.g. internal Google tests) are not considered publicly accessible.

    • Intent: The goal is to minimize the flow of detailed IMO information to the AI that is being used to generate solutions.



Seems a clear NO to me; see e.g. MathArena's evaluation of publicly accessible models on IMO 2025. Resolving.

@Lovre Does https://arxiv.org/abs/2507.15855 not count? (And if not, wouldn't it make sense to leave a little time for people to try out clever prompting strategies with older models? There may be more results like this coming too.)

@placebo_username I do not think so. See: https://x.com/nrehiew_/status/1948117897341030820

They literally gave the model hints, sampled one solution 20 times, etc.

@placebo_username As for waiting for clever prompting strategies, I can tell you one that is going to make a model output the correct solution to all six problems: give it the solution in the prompt and ask it to repeat it. I am speaking only partially in jest, because this is how various "p-hacking" is done – the prompters know the solutions, and then subtly (or not so subtly, as in the above example) nudge the model using their own knowledge.

The MathArena evaluation seems to me like a fair evaluation of models' capabilities, and no model comes even close to a gold medal in their evaluation. This is not a gap that one usually sees bridged by a prompting strategy that could have been conceived without knowing the solutions.

However, if someone does come up with a prompting strategy that makes one of the current models get a gold on IMO 2025 problems, and that exact same prompting strategy generalizes to achieve gold or near-gold on IMO 2026 with these models – an event which I consider extraordinarily unlikely – I would admit my error, and in that case I'd be happy to refund every YES bettor twice the amount they would have won had the question resolved YES.

To put it in another way: the challenge is to see whether the models could solve the IMO problems without the information about the solutions flowing to them from the outside. That is incredibly hard to guarantee now that, well, researchers can just look them up. And that opens up the possibility of all kinds of nudging and information flowing to the models.

Super unlikely. The inference cost of such a strong AI would either be too high (so not accessible even to a small group of the public), or it would represent a phenomenal shift in the cost/performance curve - far above any shift we have seen before.

(Reposting a comment (about this question) from another post)
Hmm, the intent behind the lenient "publicly accessible" is to include things like the o3-mini external safety testing that happened in mid-Jan, but exclude things like AlphaProof (which AFAIK is just an internal Google thing?).

Sorry, I wasn't really clear enough. The intent/spirit of the market is to minimize the amount of information that could conceivably flow from the IMO to the model that gets tested on it. (Maybe I should have asked: if AI gets gold on IMO 2025, will its solutions be generated on July 16 and July 17? That seemed a bit less elegant to me, but now—)

sold Ṁ302 YES

Plausible that some lab trains a reasoning model that, paired with AlphaProof- or AlphaGeometry-like models, can reach gold at the IMO, and then takes a few months after that to post-train it adequately for release - meaning it's publicly accessible, but only after IMO 2025.