Will Anthropic open-source the training code of their SAE interpretability effort?
4
Ṁ4652028
14%
this year, fully
31%
this year, significantly incomplete
19%
next year
22%
not before 2028
14%
We mean the code used to produce the Scaling Interpretability blog post.
Related questions
Do Anthropic's training updates make SAE features as interpretable?
50% chance
Will Anthropic release a model that thinks before it responds like o1 from OpenAI by EOY 2024?
42% chance
Will Anthropic and OpenAI collaborate substantially on a research paper before 2025?
24% chance
Will xAI join the voluntary commitment by OpenAI/Anthropic to AISI to share major new models w/AISI prior to release?
68% chance
Will Meta join the voluntary commitment by OpenAI/Anthropic to AISI to share major new models w/AISI prior to release?
52% chance
Will OpenAI go back on its voluntary commitment to AISI to share major new models w/AISI prior to release?
35% chance
Will OpenAI release weights to a model designed to be easily interpretable (2024)?
22% chance
Will Anthropic have AI-related IP stolen before 2026?
46% chance
Will a model costing >$30M be intentionally trained to be more mechanistically interpretable by end of 2027? (see desc)
57% chance
Will OpenAI allow near full access to the weights of their best-trained model to an external auditor by the end of 2030?
60% chance