Will OpenAI's next-gen math-focused model score at least 95% on the MATH benchmark?
27
Ṁ1802
Resolved NO on Sep 16
  • Resolves YES if OpenAI's next-generation math-focused model achieves a score of 95% or higher on the MATH benchmark.

  • If a next-generation general model (e.g. GPT-4), code model (e.g. Codex), or any other model specialized for reasoning is released before the math model and scores 95% or higher, this question also resolves YES.

  • Benchmarking on a subset of MATH is acceptable.

  • Using tools (e.g. a calculator) and code is allowed.

Why is this resolving YES? I would have thought NO. https://github.com/openai/simple-evals?tab=readme-ov-file#benchmark-results

@Usaar33 My mistake. I've unresolved it and re-resolved it to NO.