Will any model get above human level (92%) on the Simple Bench benchmark before September 1st, 2025?
60% chance
Is it true that this benchmark can be anything, and can be changed at any point? There are no hashes, no large sample of problems, no error bars, no evaluation code, no specifics on what a model can or cannot use... How do we know what a model's true performance is, other than from what the author says?
Description of the benchmark here: https://simple-bench.com/about.html
I have made some irrational bets to subsidize the market, as I cannot be bothered to figure out the correct way to do this.
Related questions
Will models be able to do the work of an AI researcher/engineer before 2027?
36% chance
Will any model pass an "undergrad proofs exam" Turing test by 2027?
75% chance
Will there be a model that has a 75% win rate against the latest iteration of GPT-4 as of January 1st, 2025?
58% chance
Will an AI model outperform 95% of Manifold users on accuracy before 2026?
54% chance
Will a single model achieve superhuman performance on all OpenAI gym environments by 2025?
25% chance
When will any model achieve >=human performance on QuALITY?
What will be the best score on the SWE-Bench (unassisted) benchmark before 2025?
39% chance
Will simple-bench scores be reported in a major AI lab paper or blog post by the end of 2024?
52% chance
Benchmark Gap #3: Once a model achieves superhuman performance on a competitive programming benchmark, will it be less than 2 years before there are "entry level" AI programmers in industry use?
68% chance
Will a single model achieve superhuman performance on all Atari environments by 2025?
22% chance