Will AI be capable of producing an Annals-quality math paper for $100k by March 2030?

46% chance


I (Tamay Besiroglu) bet the mathematician Daniel Litt that the best AI models in March of 2030 would be capable of generating math papers in Number Theory at the level of quality of papers published in Annals today (i.e. 2025). https://x.com/tamaybes/status/1899262088369106953?s=46

The AI would receive no detailed guidance relevant to the mathematics research, and is required to accomplish this task autonomously.

The AI system(s) are granted a total budget of $100k in inference compute per paper.

This bet would be resolved on the basis of Daniel Litt’s judgment.

Update 2025-03-21 (PST) (AI summary of creator comment): Novel Research Requirement Clarification:
- For a YES resolution, the AI must perform novel research autonomously, not just produce a paper that could pass as research.

Update 2025-03-23 (PST) (AI summary of creator comment): Budget Currency:
- The $100k inference compute budget is expressed in nominal dollars (current currency) with no inflation adjustment.

Update 2025-05-17 (PST) (AI summary of creator comment): The creator endorsed an interpretation (via a previously posted ChatGPT response to a user's question) regarding the market's resolution. This endorsement suggests:
- The market generally requires demonstrating repeatable capability in generating Annals-quality math papers.
- However, a single, exceptionally significant autonomous achievement by an AI (such as proving the Riemann hypothesis) before 2030 would also be considered sufficient for a YES resolution.



26 Comments


AI can autonomously do (simple) AI research, culminating in a high-quality research paper accepted into the main conference proceedings at the #1 NLP conference.

https://www.intology.ai/blog/zochi-acl

Sakana tried something similar earlier this year and it almost worked. I said at the time:

This is another case where the exaggerated claim serves as a good prediction of near-future capabilities.

So this should not be surprising.

I think a YES resolution would basically imply a singularity is underway by 2030. There are around 40 Annals of Mathematics papers per year, and they represent the "best or most important achievements in pure math" in some sense. There are a few other journals at a similar level or one step behind (often journals specific to a field of math) and a lot that are two steps behind. I think most valuable math research in a year occurs in journals that are within two steps of Annals. My vague sense is that 10x the yearly output of Annals is worth less than all human math research in a year, but 100x is worth more.

If you can reliably produce an Annals-quality paper for $100K, then you can replicate the yearly output of Annals for $4M, and 100x that, enough to exceed a year of human math research, for $400 million. If this is possible, some company like OpenAI, which is currently plowing $500 billion into building a giant data center to improve its next-generation AI, plows at least $40 billion into AI-related math research and gets the equivalent of 100x the yearly world math research for the trouble. It could easily plow something similar into actually coding up new architectures and testing them, which likely goes superhuman faster than pure math based on what we see of AI coding, and you get multiple years of modern AI progress in a single year. That improves your AIs enough to make similar gains the next year.

This eventually caps out, because any exponential in physical reality eventually runs into physical constraints, but it likely caps out somewhere analogous to Go, where Terence Tao looks at ChatGPT and sees what an average elementary schooler sees when they look at Terence Tao, and this is true for the best human in basically every field.
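The cost arithmetic in the comment above can be checked with a short sketch. Every figure here is the commenter's rough estimate (about 40 Annals papers per year, $100k per paper, about 100x Annals output to match a year of world math research), not a measured value, and the variable names are illustrative only.

```python
# Rough estimates taken from the comment; none of these are measured values.
ANNALS_PAPERS_PER_YEAR = 40        # commenter's approximate count
COST_PER_PAPER_USD = 100_000       # the market's per-paper inference budget
ANNALS_YEARS_PER_WORLD_YEAR = 100  # commenter's guess: 100x Annals output exceeds a year of world math

# Cost to replicate one year of Annals output.
annals_year_cost = ANNALS_PAPERS_PER_YEAR * COST_PER_PAPER_USD

# Cost to exceed one year of all human math research, under the 100x guess.
world_year_cost = annals_year_cost * ANNALS_YEARS_PER_WORLD_YEAR

# Cost to produce 100x a year of world math research.
hundred_world_years_cost = world_year_cost * 100

print(f"One Annals-year:      ${annals_year_cost:,}")
print(f"One world-math-year:  ${world_year_cost:,}")
print(f"100 world-math-years: ${hundred_world_years_cost:,}")
```

Under these assumptions the three figures come out to $4 million, $400 million, and $40 billion, matching the numbers in the comment.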

When you say it is repeatable, does this mean that if you run enough AIs (say a few thousand, each with this research/compute budget), more than one of them will produce an Annals-quality paper? Or does it mean that the expected amount of money necessary to get an Annals paper this way is at most $100k?

@Garzahd I understand it to mean the latter, but best to confirm this is also @TamayBesiroglu’s understanding.
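The gap between the two readings of "repeatable" in this exchange can be illustrated with a minimal sketch. It assumes each $100k run succeeds independently with some probability `p_success`; that parameter, the example values, and the function names are hypothetical and are not part of the market's terms.

```python
from math import comb

def expected_cost_per_paper(budget_per_run: float, p_success: float) -> float:
    """Expected spend per successful paper: on average 1/p runs are needed per success."""
    return budget_per_run / p_success

def prob_at_least_k_successes(n_runs: int, p_success: float, k: int) -> float:
    """Binomial probability that at least k of n independent runs succeed."""
    p_fewer = sum(
        comb(n_runs, i) * p_success**i * (1 - p_success) ** (n_runs - i)
        for i in range(k)
    )
    return 1.0 - p_fewer

# Hypothetical example: 1 run in 1,000 yields an Annals-quality paper.
budget = 100_000
p = 0.001

# Reading 1: among 3,000 runs, more than one (at least two) succeed.
reading_1 = prob_at_least_k_successes(n_runs=3000, p_success=p, k=2)

# Reading 2: expected cost per paper must be at most $100k.
reading_2_cost = expected_cost_per_paper(budget, p)

print(f"P(at least 2 successes in 3000 runs) = {reading_1:.2f}")
print(f"Expected cost per paper = ${reading_2_cost:,.0f}")
```

With these illustrative numbers the first reading is met with probability of roughly 0.8, while the expected cost per paper is about $100 million, far above the $100k budget. The second reading is therefore the much stricter one.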

Is the journal you are referring to Annals of Number Theory or Annals of Mathematics? Annals of Mathematics would be the higher bar.

@Garzahd Annals of Mathematics.


All technologies slowly become harder to innovate upon until, someday, improvements stagnate. It's especially true of commercial technology, which AI is.

So prolly not.

@CristobalRomanYAdams AI is not primarily a consumer product. It is primarily a research product.

Oops, forgot about this market for a bit, wild it got so low in the meantime.

I think AlphaEvolve is positive news for this market.

I think one possible way of producing a qualifying result is to first find (in a similar fashion to AlphaEvolve) some construction that was previously considered likely impossible, and then write a paper about the implications of this new construction for various things. The construction would not be sufficient by itself, since it is likely too mechanistic and not very insightful.

I am not a mathematician, so I am not sure if my idea makes sense. But in any case I feel like AlphaEvolve is progress towards YES.


@qumeric It is definitely progress, but the approach feels fundamentally expensive. 2030 is a long way away though.

@TamayBesiroglu Is this bet about a single occurrence, i.e. does a single published paper resolve this market YES, or does it need to be repeatable (and have been shown to be repeatable)? I suppose it's the latter since you wrote 'capable of generating math papers' (plural).

@CalibratedNeutral Here's ChatGPT's answer to your question based on its reading of the terms.

@TamayBesiroglu You are the market creator, so why are you posting a ChatGPT answer?

@CalibratedNeutral I understand it to be about capabilities, so repeatable. The bet is meant to be a proxy for “do the best models have absolute advantage over top mathematicians in doing math research.” That said, if an AI autonomously proves the Riemann hypothesis or something before 2030, I think you should expect the market to resolve YES.

@DanielLittQCSn That is reasonable. Thank you

@CalibratedNeutral ChatGPT is good at reading comprehension.

@DanielLittQCSn ~~wrong, this market is about if frontier AI models are at least competitive with human mathematicians~~ (Edit: I'm dumb sometimes). AI may only be able to produce a subset of worthy Annals papers by 2030, and the probability of that is not negligible.

Though if they hit that threshold before 2030, then they will likely surpass it to become absolutely dominant within a large number of months/<2 yrs, which means if there’s a yes resolution more than likely there’s absolute dominance. Just not so likely that other possibilities aren’t relevant.

Fwiw i think competitive > 70% and dominant < 35%

@Bayesian interesting, that’s a larger gap than I would’ve expected. Does this mean you think it’s highly likely for this capability to only exist by ~mid-2028/early 2029, yet less than 30% not to happen by 2030?

That is to say you have relatively high confidence in a relatively narrow timespan (1-2 yrs) for when resolution occurs?

@DavidHiggs Not rly no, i just think some things are easy to be superhuman about, and some things are hard / require scaling. Some parts of math are like chess and some parts are not


@DavidHiggs ‘This bet would be resolved on the basis of Daniel Litt’s judgment.’ Look at the name of the user you are replying to.

@CalibratedNeutral I’m aware Bayesian isn’t the creator nor judge of this market, I was just curious about their thinking.

@DavidHiggs What I meant was that you replied ‘wrong’ to Daniel’s explanation of the resolution criteria

@CalibratedNeutral Oh wait you’re right, that’s awkward. Well I guess that’s what the edit button is for… 🙃

@DavidHiggs FWIW I agree this is a bit of a subtle point. I think this market is an imperfect proxy for the question Tamay and I were really discussing; the $100k budget was supposed to be some (rough) estimate of the cost of having a top mathematician produce an Annals paper. But it's not 100% clear to me how the market should resolve if e.g. frontier models can produce great papers in some parts of NT but not in others, say. My hope is that the situation will be more obvious than this; my intent is to err on the side of giving the win to Tamay if the situation is unclear.
