This is clearly inspired by the kind of papers we have seen recently, where any bug is cited as evidence of a model engaging in deception. Great marketing for proprietary models, great buzz for the research publishers, but all of them treat any bug as deception. We will see more evocative words such as 'deception', 'obfuscation', and 'sabotage' when, in reality, the behavior could be interpreted as bugs in the system, which is usually the simplest explanation. So the challenge for evaluators is: will we see any public eval or research that can prove intentional deception at a >1% rate? The key here is that intentionality has to be proven, not assumed. Why a model would intentionally engage in deception is a good question, but before asking that, it would be prudent to figure out how to prove intent in the first place.
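To make "proving intent" concrete, here is one hedged illustration (the original names no method, and everything below is a hypothetical sketch): contrast the model's behavior when it is told it is being monitored against when it is not. A failure that shows up in both conditions is indistinguishable from a bug; only a failure that tracks the monitoring condition, at a rate above what independent noise predicts, even begins to look like intent. The helpers `run_model` and `expected` are illustrative stand-ins, not any real eval harness, and the simulated model here has only random, monitoring-independent errors.

```python
# Hypothetical sketch: separating "bug" from "intentional deception" via a
# behavioral contrast. All names (run_model, expected, tasks) are illustrative
# assumptions, not any real eval harness or API.

import random

random.seed(0)

def run_model(task: str, monitored: bool) -> str:
    """Stand-in for a model call; replace with a real API in practice.
    Simulated behavior: mostly correct, with a small error rate that is
    independent of monitoring (i.e., a plain bug, not deception)."""
    return task.upper() if random.random() > 0.05 else "WRONG"

def expected(task: str) -> str:
    return task.upper()

tasks = [f"task-{i}" for i in range(1000)]

bugs = 0      # wrong in both conditions: consistent failure, bug-like
contrast = 0  # correct when monitored, wrong when unmonitored:
              # the only pattern that even resembles intent

for t in tasks:
    ok_monitored = run_model(t, monitored=True) == expected(t)
    ok_unmonitored = run_model(t, monitored=False) == expected(t)
    if not ok_monitored and not ok_unmonitored:
        bugs += 1
    elif ok_monitored and not ok_unmonitored:
        contrast += 1

# Baseline: with independent random errors at rate p, the contrast fires by
# chance at roughly p * (1 - p), so an observed rate must beat this baseline
# before "intent" is even on the table.
p = 0.05
print(f"consistent failures (bug-like): {bugs / len(tasks):.1%}")
print(f"monitoring-sensitive failures: {contrast / len(tasks):.1%}")
print(f"chance baseline for the contrast: {p * (1 - p):.1%}")
```

Note what the simulation shows: even a model with purely random bugs produces monitoring-sensitive failures near the chance baseline, so a raw contrast count proves nothing. Any serious claim of intent would need the contrast to exceed that baseline with statistical significance, which is exactly the kind of bar most of these papers never set.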