Will Transformer based architectures still be SOTA for language modelling by 2026?
69% chance

The intent is to capture whether there is a major paradigm shift on the same order of magnitude as the shift from RNNs to Transformers. The market resolves based on the architecture of the SOTA language model. If the architecture is still recognizable as a Transformer with modifications, this market resolves Yes. If the architecture is at least as different from Transformers as Transformers are from RNNs, it resolves No. If the evaluation numbers leave it ambiguous which architecture is SOTA, because the evaluations are not comparable or because neither architecture is a Pareto improvement across all evaluations, but one is obviously better overall, I will use my judgement. If I deem it too close to call, I will resolve this market as Yes, because that indicates no architecture has clearly surpassed Transformers.
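For readers who want the resolution rule spelled out, here is a rough sketch of it in Python. It is only one reading of the description above, not an official procedure: the function names, the `judgement` argument, and the assumption that every benchmark score is "higher is better" are all made up for illustration. Whether a challenger counts as a paradigm shift is itself a human call, so it is passed in as a flag.

```python
from typing import Mapping, Optional


def pareto_dominates(a: Mapping[str, float], b: Mapping[str, float]) -> bool:
    """True if `a` is at least as good as `b` on every shared benchmark
    and strictly better on at least one. Assumes higher scores are better."""
    shared = a.keys() & b.keys()
    if not shared:
        # No common benchmarks: the evaluations are incomparable.
        return False
    return all(a[k] >= b[k] for k in shared) and any(a[k] > b[k] for k in shared)


def resolve(
    transformer_scores: Mapping[str, float],
    challenger_scores: Mapping[str, float],
    challenger_is_paradigm_shift: bool,
    judgement: Optional[str] = None,  # hypothetical: "challenger" if the creator judges it clearly better
) -> str:
    # A modified Transformer is still a Transformer for this market.
    if not challenger_is_paradigm_shift:
        return "YES"
    # Clear-cut cases: one architecture wins or ties everywhere.
    if pareto_dominates(challenger_scores, transformer_scores):
        return "NO"
    if pareto_dominates(transformer_scores, challenger_scores):
        return "YES"
    # Ambiguous numbers: fall back to the creator's judgement.
    if judgement == "challenger":
        return "NO"
    # Too close to call defaults to YES.
    return "YES"
```

The design point worth noticing is the last line: anything short of a clear win for a new architecture, including mixed or incomparable benchmark results with no judgement call, falls through to Yes.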


I bet yes because I measure SOTA by frontier models, and I would be surprised if the labs invested the resources required for a large-scale training run in a new architecture.

I like this question, added some liquidity

Is Mamba a transformer?

Would you count a model like Stanford's Monarch Mixer as a Transformer?

Is an ARDM (auto-regressive diffusion model) a transformer?

@ampdot Probably not; for a more definite answer you would have to elaborate more on what you have in mind.

Yes.

Plus memory and retrieval.

And hierarchical/nested/capsules (already used in video and high-end competition for long prompts)

And mixture of experts, and flash-attention and other sparsity techniques.
