Published: October 30, 2025

After ~4 years building SOTA models & datasets, we're sharing everything we learned in ⚡The Smol Training Playbook. We cover the full LLM cycle: designing ablations, choosing an architecture, curating data, post-training, and building solid infrastructure. We'll help you…

Image in tweet by Loubna Ben Allal

1/ In the first two chapters, we help you decide if you really need to train from scratch and set up your ablations:
- Choosing your training framework
- Selecting reliable evals
- The art of derisking ✨

Image in tweet by Loubna Ben Allal
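
To make the "one change at a time" discipline of ablations concrete, here is a minimal sketch of a sweep harness; `train_and_eval` is a hypothetical stand-in for a real train-then-eval run, and the scores are dummy values:

```python
# Hypothetical ablation harness: each run changes exactly one variable
# against a fixed baseline, so any eval delta is attributable to it.
from dataclasses import dataclass, replace
import random

@dataclass(frozen=True)
class Config:
    attention: str = "GQA"       # baseline architecture choice
    rope_theta: float = 10_000.0
    lr: float = 3e-4

def train_and_eval(cfg: Config) -> float:
    # Placeholder: in practice, train a small proxy model and return
    # a score from a benchmark you trust to be low-noise.
    random.seed(hash(cfg))
    return random.uniform(0.0, 1.0)

baseline = Config()
ablations = {
    "attention=MLA": replace(baseline, attention="MLA"),
    "rope_theta=1e6": replace(baseline, rope_theta=1_000_000.0),
    "lr=1e-3": replace(baseline, lr=1e-3),
}

base_score = train_and_eval(baseline)
for name, cfg in ablations.items():
    delta = train_and_eval(cfg) - base_score
    print(f"{name:>15}: {delta:+.3f} vs baseline")
```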

2/ Next, we go over model architecture with 20+ ablations testing each transformer component:
- Attention: MHA, GQA, MQA, MLA
- RoPE, NoPE, IntraDoc Masking, SWA
- MoEs & hybrid models overview
- Tokenizers
- Optimizers and hyperparameter tuning

Image in tweet by Loubna Ben Allal
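
As a quick illustration of why the attention variants matter, this sketch (with illustrative model dimensions, not SmolLM3's actual config) computes how the KV cache shrinks as MHA's per-head keys/values are shared in GQA and MQA:

```python
# MHA, GQA, and MQA differ only in the number of KV heads, which
# directly sets KV-cache size. Dimensions below are illustrative.
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    # 2x for keys and values; bf16 -> 2 bytes per element
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

n_layers, n_heads, head_dim, seq_len = 36, 16, 128, 4096
for name, n_kv in [("MHA", n_heads), ("GQA (4 groups)", 4), ("MQA", 1)]:
    gib = kv_cache_bytes(n_layers, n_kv, head_dim, seq_len) / 2**30
    print(f"{name:>15}: {gib:.2f} GiB KV cache per 4k-token sequence")
```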

3/ 📚 We then cover the secret sauce: data!
- A history of scaling laws
- How to curate and mix datasets
- Building training curricula to push performance

Image in tweet by Loubna Ben Allal
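
For a sense of what a scaling law looks like in practice, here is the Chinchilla-style form L(N, D) = E + A/N^alpha + B/D^beta with the coefficients fitted by Hoffmann et al. (2022); treat the numbers as ballpark, not as a prescription for any particular run:

```python
# Chinchilla-style scaling law with the Hoffmann et al. (2022) fitted
# coefficients. Ballpark only: real fits depend heavily on your data
# and architecture.
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(n_params, n_tokens):
    return E + A / n_params**alpha + B / n_tokens**beta

# A small model trained far past "compute-optimal" vs a larger one:
print(f"{loss(3e9, 11e12):.3f}")    # ~3B params, 11T tokens
print(f"{loss(70e9, 1.4e12):.3f}")  # ~70B params, 1.4T tokens
```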

4/ Next is the training marathon, where things might not go as planned. We show how we debugged SmolLM3's vanishing throughput and share best practices for handling the surprises that emerge at scale (including a sneaky bug that forced us to restart the run at 1T tokens!).

Image in tweet by Loubna Ben Allal
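
A regression like vanishing throughput is far cheaper to catch early than to diagnose late. Here is a hypothetical watchdog sketch (not the playbook's actual tooling): log tokens/sec each step and warn when it drifts below a rolling baseline:

```python
# Hypothetical throughput watchdog: log tokens/sec each step and warn
# when it falls below a fraction of the rolling average.
import time
from collections import deque

class ThroughputWatchdog:
    def __init__(self, window=100, slack=0.9):
        self.history = deque(maxlen=window)
        self.slack = slack  # warn below 90% of the rolling mean

    def step(self, tokens, seconds):
        tps = tokens / seconds
        if len(self.history) == self.history.maxlen:
            baseline = sum(self.history) / len(self.history)
            if tps < self.slack * baseline:
                print(f"WARN: {tps:,.0f} tok/s, rolling mean {baseline:,.0f}")
        self.history.append(tps)

# Usage inside a training loop (timing is simulated here):
watchdog = ThroughputWatchdog(window=10)
for step in range(20):
    t0 = time.perf_counter()
    time.sleep(0.01)  # stand-in for forward/backward/optimizer step
    watchdog.step(tokens=2_097_152, seconds=time.perf_counter() - t0)
```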

5/ Post-training in 2025: turning base models into useful assistants:
- Setting up SFT baselines
- When to do RL
- Applying GRPO to double SmolLM3's math performance

Image in tweet by Loubna Ben Allal
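
The core of GRPO is easy to state: sample a group of completions per prompt and normalize each reward against the group, so no value model is needed. A minimal sketch of that advantage computation (rewards here are placeholders, e.g. 1 for a correct math answer):

```python
# GRPO's group-relative advantage: normalize each completion's reward
# against the group sampled for the same prompt (no value model needed).
import statistics

def grpo_advantages(group_rewards, eps=1e-6):
    mu = statistics.mean(group_rewards)
    sigma = statistics.pstdev(group_rewards)
    return [(r - mu) / (sigma + eps) for r in group_rewards]

# e.g. 4 completions for one math prompt, reward 1 if the answer checks out:
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))  # -> roughly [+1, -1, -1, +1]
```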

6/ Finally, the often-overlooked infrastructure: GPU fundamentals, CPU/GPU/nodes/storage communication patterns, parallelism strategies, and debugging NCCL errors & cluster bottlenecks 🔧

Image in tweet by Loubna Ben Allal
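
Before debugging exotic NCCL errors, it helps to have a trivial sanity check that the collective path works at all. A minimal sketch using `torch.distributed` (launch command and process count are illustrative):

```python
# nccl_check.py -- run with e.g.: torchrun --nproc_per_node=8 nccl_check.py
import os
import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
torch.cuda.set_device(local_rank)

# All-reduce a tensor of ones: every rank must end up with world_size.
x = torch.ones(1, device="cuda")
dist.all_reduce(x, op=dist.ReduceOp.SUM)
assert x.item() == dist.get_world_size(), f"rank {dist.get_rank()}: got {x.item()}"
print(f"rank {dist.get_rank()}: all-reduce OK")

dist.destroy_process_group()
```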

We hope you enjoy this read! Link: https://huggingface.co/spaces/...

Image in tweet by Loubna Ben Allal

@LoubnaBenAllal1 Woohoo, it's finally out! Reading your chapters made me realise how much more hardcore pre-training is than post-training 😆

@_lewtun hahah you'd be less optimistic if you did pretraining 🙈

@LoubnaBenAllal1 @Thom_Wolf Amazing! Congratulations and thank you to the team @LoubnaBenAllal1

@LoubnaBenAllal1 such beautiful charts 🤗

@LoubnaBenAllal1 i was JUST thinking that you or @eliebakouch should drop a playbook for smol models. Can't wait to read this!!!

@LoubnaBenAllal1 hello loubna have you ever heard about muon?

@eliebakouch from someone who calls people using Adam "lazy"

Image in tweet by Loubna Ben Allal

@LoubnaBenAllal1 Does it include dataset streaming from HF already ? :p

@LoubnaBenAllal1 what were the biggest surprises or counterintuitive findings during those four years that most people building models still get wrong? anything about data curation specifically?

@LoubnaBenAllal1 Is this only available via the space? -it's not loading for me

@LoubnaBenAllal1 How to buy a hard copy of this book?

@LoubnaBenAllal1 Who was squealing here?

@LoubnaBenAllal1 Looks like a huge piece of work, congrats!

@LoubnaBenAllal1 That's a fantastic initiative, Loubna! Sharing your learnings on LLMs is truly valuable, especially for those getting started, right?

@LoubnaBenAllal1 Can we get a pdf ?

@LoubnaBenAllal1 Just what we need: real experience, clear visualizations, and even the obligatory 'yolo run'. Better than many papers. This really pushes AI forward!

@LoubnaBenAllal1 Good to see a playbook covering the full LLM lifecycle.

@LoubnaBenAllal1 This is amazing. Thank you.

@LoubnaBenAllal1 Wohoo 🎉

@LoubnaBenAllal1 Brilliant work 👍

@LoubnaBenAllal1 the all-in-one guide you need for pre-training 👌

@LoubnaBenAllal1 This RoPE visualization is amazing

Image in tweet by Loubna Ben Allal

@LoubnaBenAllal1 Thank you for sharing!
