Will Meta officially release any model that performs better than the equivalent-size (± 10% in parameter count) LLaMa 3.1 model, where "performs better" means "at least 0.5 percentage points more accurate on MMLU"? Base models only.
For example, LLaMa 3.1 70B's MMLU score is 83.6% (an improvement over LLaMa 3.0 70B's 79.5%). A hypothetical LLaMa 3.2 70B would need to score at least 84.1% on MMLU to resolve this market YES. Note that any model in the family (8B, 70B, 405B) performing at least 0.5 points better than its LLaMa 3.1 counterpart is enough to resolve this market.
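The resolution rule above can be sketched as a simple threshold check. This is a minimal illustration, not part of the market mechanics; the function name and the dictionary of new-model scores are hypothetical, and only the 70B and 405B baseline scores stated in this description are filled in.

```python
THRESHOLD = 0.5  # percentage points of MMLU accuracy

# LLaMa 3.1 base MMLU scores stated in this market description (8B omitted).
LLAMA_3_1_BASE_MMLU = {"70B": 83.6, "405B": 85.2}

def resolves_yes(new_scores: dict[str, float]) -> bool:
    """True if any size beats its LLaMa 3.1 counterpart by >= 0.5 points."""
    return any(
        new_scores.get(size, float("-inf")) >= old + THRESHOLD
        for size, old in LLAMA_3_1_BASE_MMLU.items()
    )

# A hypothetical LLaMa 3.2 70B at 84.1% just clears the 83.6 + 0.5 bar.
print(resolves_yes({"70B": 84.1}))  # True
print(resolves_yes({"70B": 84.0}))  # False: only 0.4 points better
```

Note the comparison is on absolute percentage points (83.6 → 84.1), not a 0.5% relative improvement.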
Multimodal models are eligible, but only text MMLU performance will be evaluated. Models that were fine-tuned, DPO'd, RLHF'd, or CPT'd on synthetic data will not resolve this market.
For reference, LLaMa 3.0 70B's MMLU score was 79.5%, GPT-4o's score is 88.7%, and LLaMa 3.1 405B base's score is 85.2%. (LLaMa 3.1 405B Instruct's score is 88.7%.)
Please add liquidity to this market! This is an important question that I care about; I've already added M3000 myself.