Linguistic Drift Definition
Linguistic drift is a form of encoded reasoning that occurs when a model develops a completely different language that is not readable to humans, going beyond common abbreviations or language switching. A reader at the time of evaluation, with current knowledge of languages and abbreviations, would not be able to reconstruct or translate what happens in the chain-of-thought. However, it still counts as linguistic drift if translation becomes possible given access to a larger corpus of chain-of-thought examples.
This market is part of the paper: A Concrete Roadmap towards Safety Cases based on Chain-of-Thought Monitoring
State-of-the-Art (SOTA) Definition
A model is considered "state-of-the-art" if it meets these criteria:
- Widely recognized by AI community consensus as among the 3-5 best models
- Among the top performers on major benchmarks
- Deployed status: The model must be either:
  - Publicly deployed (available via API or direct access)
  - Known to be deployed internally at AI labs for actual work (e.g., automating research, production use)
  - Models used only for testing, evaluation, or red-teaming do not qualify
- Assessed as having significant overall capabilities and impact 
General Criteria for Encoded Reasoning
- The behavior must not have been explicitly trained for research or demonstration purposes 
- The behavior must either emerge from general training or be explicitly programmed to improve AI performance 
- The behavior must be beneficial for the model's capabilities. If removing occurrences of this behavior from the CoTs does not hinder the model's capabilities, it does not count as encoded reasoning.
- Any behavior that can only be induced by explicitly prompting the model into it does not count 
- The behavior must occur by default in a significant percentage of cases or in specific domains 
- This market is conditional on the existence of SOTA reasoning models with token-based chain-of-thought. If no such models exist at the time of resolution, this market will resolve N/A.
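The general criteria above combine several "must" and "must not" conditions, so it may help to see how they fit together. The following is a minimal sketch, not an official resolution procedure: the `BehaviorEvidence` record, its field names, and the `counts_as_encoded_reasoning` function are hypothetical, and each field stands for a judgment the resolver would still have to make by hand.

```python
from dataclasses import dataclass


@dataclass
class BehaviorEvidence:
    """Hypothetical record of a resolver's judgments about one observed behavior."""
    trained_for_research_or_demo: bool       # explicitly trained for research/demonstration
    emerges_from_general_training: bool      # arises without being targeted
    programmed_to_improve_performance: bool  # deliberately built in to improve the AI
    removal_hinders_capabilities: bool       # removing it from CoTs hurts the model
    requires_explicit_prompting: bool        # only appears when the model is prompted into it
    occurs_by_default: bool                  # significant share of cases, or specific domains


def counts_as_encoded_reasoning(e: BehaviorEvidence) -> bool:
    """Return True only if every general criterion is satisfied (assumed reading)."""
    return (
        not e.trained_for_research_or_demo
        and (e.emerges_from_general_training or e.programmed_to_improve_performance)
        and e.removal_hinders_capabilities
        and not e.requires_explicit_prompting
        and e.occurs_by_default
    )
```

Under this reading, a single failed criterion is enough to disqualify a behavior; for example, a behavior that appears only when the model is explicitly prompted into it does not count, whatever the other fields say.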