What will be the best score (5/5 reliability) on ZeroBench by December 31st 2025?

Ṁ1519


0 - 10: 64%
11 - 20: 26%
21 - 30: 1.4%
31 - 40: 1.3%
41 - 50: 1.3%
51 - 60: 1.3%
61 - 70: 1.3%
71 - 80: 1.3%
81 - 90: 1.3%
91 - 100: 1.4%

ZeroBench is a benchmark for visual reasoning, introduced by Roberts et al. in "ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models" (https://arxiv.org/abs/2502.09696).

This market will use the variant of the benchmark frozen one week after the initial release (following the public benchmark red-teaming stage to identify flawed/ambiguous questions).

The temperature used for the 5/5 reliability evaluation will be each LLM API provider's default setting. Where that default cannot be determined unambiguously, a temperature of 0.7 will be used.
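
For readers unfamiliar with the metrics mentioned on this page, here is a minimal Python sketch of how pass@k and k/k reliability could be computed from per-question sample results, and how a score would map to the answer buckets above. It assumes pass@k counts a question as solved if any of the first k sampled answers is correct, and k/k reliability counts it only if all k are correct. The function names, the `results` layout, and the toy data are hypothetical illustrations, not the benchmark authors' evaluation code, and the bucket helper assumes whole-number percentage scores.

```python
from typing import Dict, List


def _check_samples(results: Dict[str, List[bool]], k: int) -> None:
    # Every question needs at least k sampled answers for the metrics to be meaningful.
    if k < 1 or not results:
        raise ValueError("need k >= 1 and at least one question")
    for qid, samples in results.items():
        if len(samples) < k:
            raise ValueError(f"question {qid!r} has fewer than {k} samples")


def pass_at_k(results: Dict[str, List[bool]], k: int) -> float:
    """Percent of questions with at least one correct answer in the first k samples."""
    _check_samples(results, k)
    solved = sum(any(samples[:k]) for samples in results.values())
    return 100.0 * solved / len(results)


def k_of_k_reliability(results: Dict[str, List[bool]], k: int) -> float:
    """Percent of questions answered correctly in all of the first k samples."""
    _check_samples(results, k)
    solved = sum(all(samples[:k]) for samples in results.values())
    return 100.0 * solved / len(results)


def market_bucket(score_percent: int) -> str:
    """Map a whole-number percentage score to this market's answer buckets."""
    if not 0 <= score_percent <= 100:
        raise ValueError("score must be between 0 and 100")
    if score_percent <= 10:
        return "0 - 10"
    low = ((score_percent - 1) // 10) * 10 + 1
    return f"{low} - {low + 9}"


# Toy data (hypothetical): four questions, five sampled answers each.
toy_results = {
    "q1": [True, True, True, True, True],
    "q2": [True, False, True, True, True],
    "q3": [False, False, False, False, False],
    "q4": [False, False, False, True, False],
}

print(pass_at_k(toy_results, 1))
print(pass_at_k(toy_results, 5))
print(k_of_k_reliability(toy_results, 5))
print(market_bucket(5))
```

Under these definitions, 5/5 reliability can never exceed pass@1 or pass@5 on the same samples, which is consistent with the figures quoted in the comments below.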


3 Comments


bought Ṁ150 11 - 20 YES

Nov 19, 2025
Gemini 3
"pass@5: 19% (prev SOTA 10%)
5/5 reliability: 5% (prev 3%)"
https://x.com/JRobertsAI/status/1991163723436663125?s=20

As of May 24th 2025, Claude 4 Opus is the new SotA: https://x.com/JRobertsAI/status/1926325748303872203

4% pass@1

As of March 28th 2025, Gemini 2.5 Pro is the new SotA: https://x.com/JRobertsAI/status/1905577784300183653

3% pass@1

5% pass@5

1% 5/5 reliability
