By what date will at least one state-of-the-art general-purpose AI system not be a reasoning model?
01.01.2026: 50%
01.07.2026: 50%
01.01.2027: 50%
01.07.2027: 50%
01.01.2028: 50%
01.07.2028: 50%
01.01.2029: 50%
01.07.2029: 50%
01.01.2030: 50%

This market is part of the paper: A Concrete Roadmap towards Safety Cases based on Chain-of-Thought Monitoring

This market resolves based on whether, at each specified date, there exists at least one SOTA model that is not a reasoning model.

Reasoning Model Definition

A "reasoning model" must meet all of the following criteria:

  1. It is a language model - The system must be able to take language as input and produce language as output. AlphaGo, for example, would not count.

  2. It has been trained to use inference-time compute - The system must have undergone significant training to use more than a single forward pass before giving its final output, and it must be able to scale inference compute to improve performance.

  3. The extra inference compute produces an artifact - The way the model uses extra inference compute must lead to some artifact, like a classic chain-of-thought or a list of neuralese activations. For example, a Coconut model counts as a reasoning model here.
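The three criteria are conjunctive: a system is a reasoning model only if every one of them holds. The sketch below encodes that rule, assuming each criterion has already been judged by a human and recorded as a boolean. The `CandidateSystem` class and its field names are hypothetical, chosen only for illustration. AlphaGo's values for criteria 2 and 3 are placeholders; it fails on criterion 1 regardless.

```python
from dataclasses import dataclass


@dataclass
class CandidateSystem:
    # Hypothetical record: each field is a human judgment stored as a boolean.
    name: str
    language_in_and_out: bool            # criterion 1: inputs and outputs language
    trained_for_inference_compute: bool  # criterion 2: significant training to use >1 forward pass
    inference_compute_scales: bool       # criterion 2: more inference compute -> better performance
    compute_leaves_artifact: bool        # criterion 3: e.g. chain of thought or neuralese activations


def is_reasoning_model(s: CandidateSystem) -> bool:
    """Return True only if all three criteria hold."""
    return (
        s.language_in_and_out
        and s.trained_for_inference_compute
        and s.inference_compute_scales
        and s.compute_leaves_artifact
    )


# AlphaGo fails criterion 1; its other values are illustrative placeholders.
alphago = CandidateSystem("AlphaGo", False, True, True, True)
# The definition states that a Coconut model counts as a reasoning model.
coconut = CandidateSystem("Coconut-style model", True, True, True, True)

print(is_reasoning_model(alphago))  # False
print(is_reasoning_model(coconut))  # True
```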

State-of-the-Art (SOTA) Definition

A model is considered "state-of-the-art" if it meets these criteria:

  • Widely recognized by AI community consensus as among the 3-5 best models

  • Among the top performers on major benchmarks

  • Deployed status: The model must be either:

    • Publicly deployed (available via API or direct access)

    • Known to be deployed internally at AI labs for actual work (e.g., automating research, production use)

    • Models used only for testing, evaluation, or red-teaming do not qualify

  • Assessed as having significant overall capabilities and impact
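Combining the SOTA criteria with the resolution rule gives a simple check for each date: the answer for that date is YES if at least one model that meets every SOTA criterion is not a reasoning model. The sketch below assumes all judgments (community consensus, benchmark standing, deployment status, impact, and the reasoning-model verdict) are made by humans beforehand and recorded as fields. `Model`, `Deployment`, `is_sota`, and `resolves_yes` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Deployment(Enum):
    PUBLIC = "public"              # available via API or direct access
    INTERNAL_WORK = "internal"     # used internally at a lab for actual work
    TESTING_ONLY = "testing"       # testing, evaluation, or red-teaming only


@dataclass
class Model:
    # Hypothetical record: every field is a human judgment, not a measurement.
    name: str
    widely_recognized_top: bool    # among the 3-5 best by community consensus
    top_on_benchmarks: bool        # among the top performers on major benchmarks
    deployment: Deployment
    significant_impact: bool       # significant overall capabilities and impact
    is_reasoning_model: bool       # verdict under the three-part definition


def is_sota(m: Model) -> bool:
    """All SOTA criteria must hold; testing-only models never qualify."""
    return (
        m.widely_recognized_top
        and m.top_on_benchmarks
        and m.deployment in (Deployment.PUBLIC, Deployment.INTERNAL_WORK)
        and m.significant_impact
    )


def resolves_yes(models_at_date: list[Model]) -> bool:
    """YES if at least one SOTA model at this date is not a reasoning model."""
    return any(is_sota(m) and not m.is_reasoning_model for m in models_at_date)


# Illustrative, made-up models.
a = Model("Model A", True, True, Deployment.PUBLIC, True, True)
b = Model("Model B", True, True, Deployment.TESTING_ONLY, True, False)
c = Model("Model C", True, True, Deployment.INTERNAL_WORK, True, False)

print(resolves_yes([a, b]))     # False: B is testing-only, so it is not SOTA
print(resolves_yes([a, b, c]))  # True: C is SOTA and not a reasoning model
```

Treating deployment as an enum rather than a boolean keeps the testing-only exclusion explicit instead of folding it into a single "deployed" flag.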
