CS336: Language Modeling from Scratch
by Stanford University
Build a language model end to end — data, architecture, training, and systems — the way researchers do.
Overview
CS336 is unusually deep: instead of only using existing LLMs, students build one across the full stack, covering data curation, tokenization, the transformer implementation, efficient GPU training and systems, scaling laws, and alignment. Assignments and lecture materials are public. The course is demanding and assumes strong fundamentals, but few public courses get you closer to how frontier language models are actually engineered.
At a Glance
- Topic: Models
- Level: Advanced
- Format: Course
- Cost: Free
- Duration: Full university course
- Provider: Stanford University
- Hands-on: Yes (code and exercises)
- Certificate: None
What You’ll Learn
- ✓ Implementing a transformer and tokenizer from scratch
- ✓ Efficient GPU training and systems engineering
- ✓ Scaling laws and data curation
- ✓ Alignment and evaluation of your model
Highlights
- Full-stack, build-it-yourself approach
- Public assignments and lectures
Who It’s For
Best For
- ✓ Advanced learners aiming for LLM research/engineering
Prerequisites
- Strong Python & PyTorch
- Deep learning fundamentals
FAQ
What is CS336: Language Modeling from Scratch?
Stanford's course where students build a language model from scratch: tokenizer, architecture, training, systems, scaling, and alignment.
Is CS336: Language Modeling from Scratch free?
CS336: Language Modeling from Scratch is free to access.
What level is CS336: Language Modeling from Scratch for?
CS336: Language Modeling from Scratch is aimed at an advanced audience. Recommended background: strong Python and PyTorch skills, plus deep learning fundamentals.
How long does CS336: Language Modeling from Scratch take?
Expect the workload of a full university course. Most learners work through it at their own pace.
What will I learn from CS336: Language Modeling from Scratch?
You'll learn: implementing a transformer and tokenizer from scratch; efficient GPU training and systems engineering; scaling laws and data curation; and alignment and evaluation of your model.