Will a Mamba 7B model trained on 2 trillion tokens outperform Llama2-13B?
66% chance · 21 traders · Ṁ738 · closes Jul 1

This question will resolve positive if someone trains a Mamba (https://twitter.com/tri_dao/status/1731728602230890895) language model with <=7.5 billion parameters on <=2 trillion tokens that outperforms Llama2-13B on the Hugging Face Open LLM Leaderboard (https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
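A minimal sketch of the comparison the criterion implies, assuming resolution goes by the leaderboard's headline number, i.e. the plain average of its per-benchmark scores (ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, GSM8K at the time of writing). All scores below are zeroed placeholders, not real results; the real numbers would come from the leaderboard entries themselves.

```python
# Sketch only: compares leaderboard-style averages for two models.
# All per-benchmark scores are placeholders, not measured results.

def leaderboard_average(scores: dict[str, float]) -> float:
    """Plain mean of per-benchmark scores, as the leaderboard's headline number."""
    return sum(scores.values()) / len(scores)

# Hypothetical candidate: a Mamba model with <=7.5B params trained on <=2T tokens.
mamba_7b = {"ARC": 0.0, "HellaSwag": 0.0, "MMLU": 0.0,
            "TruthfulQA": 0.0, "Winogrande": 0.0, "GSM8K": 0.0}

# Placeholder for Llama2-13B's scores (look up its actual leaderboard entry).
llama2_13b = {"ARC": 0.0, "HellaSwag": 0.0, "MMLU": 0.0,
              "TruthfulQA": 0.0, "Winogrande": 0.0, "GSM8K": 0.0}

resolves_positive = leaderboard_average(mamba_7b) > leaderboard_average(llama2_13b)
print("Resolves positive" if resolves_positive else "Resolves negative")
```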
