Post-training of LLMs
by DeepLearning.AI
Turn a base model into an assistant with SFT, DPO, and online RL — and know when to use each.
Overview
Post-training of LLMs is a free DeepLearning.AI short course taught by Banghua Zhu, Assistant Professor at the University of Washington and co-founder of NexusFlow. In about 90 minutes of video lessons with code examples, it covers the three most common post-training techniques and, crucially, when to use each. In Supervised Fine-Tuning (SFT), the model is trained on input-output pairs with ideal responses. In Direct Preference Optimization (DPO), you supply a preferred ('chosen') and a less preferred ('rejected') response to the same prompt. In online Reinforcement Learning, the model generates outputs and receives reward scores from human or automated feedback. Learners download a pretrained model from Hugging Face and post-train it with SFT, DPO, and RL to see how each method changes model behavior.
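The three methods differ mainly in what a training record looks like. The sketch below illustrates those record shapes in plain Python; it is not taken from the course materials. The class names, the example strings, and the toy reward function `short_answer_reward` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SFTExample:
    """One supervised pair: a prompt and the ideal response to imitate."""
    prompt: str
    response: str


@dataclass
class PreferencePair:
    """One DPO record: a prompt with a preferred and a less preferred response."""
    prompt: str
    chosen: str
    rejected: str


# For online RL the dataset holds only prompts. Responses are sampled from the
# model during training and scored by a reward function (human or automated).
RewardFn = Callable[[str, str], float]


def short_answer_reward(prompt: str, response: str) -> float:
    """Toy automated reward (hypothetical): 1.0 for a non-empty answer of at most 20 words."""
    n_words = len(response.split())
    return 1.0 if 0 < n_words <= 20 else 0.0


def score_samples(prompt: str, samples: List[str], reward_fn: RewardFn) -> List[float]:
    """Score each sampled response for a single prompt."""
    return [reward_fn(prompt, s) for s in samples]


sft = SFTExample(prompt="What is 2 + 2?", response="2 + 2 equals 4.")
pair = PreferencePair(
    prompt="Who are you?",
    chosen="I am a helpful assistant.",
    rejected="I don't know.",
)
# Stand-ins for two responses a model might generate for the same prompt.
rewards = score_samples("Say hi.", ["Hi there!", ""], short_answer_reward)
print(rewards)  # [1.0, 0.0]
```

In a real pipeline the reward would come from a reward model, a verifier, or human raters rather than a word-count rule; the point here is only that SFT needs ideal responses, DPO needs comparisons, and online RL needs a way to score fresh generations.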
At a Glance
- Topic: Fine-Tuning
- Level: Intermediate
- Format: Course
- Cost: Free
- Duration: ~1.5 hours, self-paced
- Provider: DeepLearning.AI
- Hands-on: Yes (code/exercises)
- Certificate: None
What You’ll Learn
- ✓ Apply Supervised Fine-Tuning (SFT) on input-output pairs to shape model behavior
- ✓ Use Direct Preference Optimization (DPO) with chosen/rejected preference pairs
- ✓ Run online Reinforcement Learning with reward signals from human or automated feedback
- ✓ Decide which post-training method fits a given adaptation goal
- ✓ Download a pretrained Hugging Face model and post-train it end to end
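To make the SFT and DPO objectives listed above concrete, here is a minimal sketch of both losses in plain Python, operating on log-probabilities that are assumed to have been computed already. It follows the standard formulations (mean negative log-likelihood for SFT, the published DPO loss against a frozen reference model) and is not the course's own code. The function names, the choice to mask prompt tokens in SFT, and the value `beta=0.1` are assumptions made for illustration.

```python
import math
from typing import List


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sft_loss(token_logps: List[float], response_mask: List[int]) -> float:
    """Mean negative log-likelihood over response tokens.

    Assumption: prompt tokens are masked out (mask = 0), so only the
    ideal response contributes to the loss.
    """
    kept = [lp for lp, m in zip(token_logps, response_mask) if m]
    if not kept:
        raise ValueError("response_mask selects no tokens")
    return -sum(kept) / len(kept)


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response.
    The loss is -log(sigmoid(beta * margin)), where margin measures how much
    more the policy favors chosen over rejected than the reference does.
    """
    chosen_shift = policy_chosen_logp - ref_chosen_logp
    rejected_shift = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_shift - rejected_shift)
    return softplus(-margin)  # equals -log(sigmoid(margin))


# A policy identical to the reference gives a loss of log(2).
baseline = dpo_loss(-11.0, -11.0, -11.0, -11.0)
# Raising the chosen response and lowering the rejected one reduces the loss.
improved = dpo_loss(-10.0, -12.0, -11.0, -11.0)
print(baseline > improved)  # True

# SFT loss over two response tokens; the two prompt tokens are masked out.
print(round(sft_loss([-0.5, -1.0, -0.2, -0.4], [0, 0, 1, 1]), 3))  # 0.3
```

In practice these quantities are computed on batched tensors by a training library; the scalar version is only meant to show what each objective rewards.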
Highlights
- Taught by a researcher who co-founded NexusFlow and works on model post-training
- Covers SFT, DPO, and online RL side by side with practical trade-offs
- Hands-on code that post-trains a real open model from Hugging Face
Who It’s For
Best For
- ✓ML/AI engineers adapting open-weight models beyond prompting
- ✓Practitioners deciding between SFT, DPO, and RL for alignment
- ✓Anyone building preference-tuned or reward-optimized models
Prerequisites
- Comfort with Python and PyTorch
- Basic understanding of training neural networks and LLM fundamentals
FAQ
What is Post-training of LLMs?
A hands-on short course on the three core post-training methods that turn a pretrained LLM into a useful, aligned assistant: Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and online Reinforcement Learning. It is aimed at engineers who want to adapt open models rather than just prompt them.
Is Post-training of LLMs free?
Post-training of LLMs is free to access.
What level is Post-training of LLMs for?
Post-training of LLMs is aimed at an intermediate audience. Recommended background: comfort with Python and PyTorch, plus a basic understanding of training neural networks and LLM fundamentals.
How long does Post-training of LLMs take?
Expect about 1.5 hours of self-paced material.
What will I learn from Post-training of LLMs?
You'll learn: Apply Supervised Fine-Tuning (SFT) on input-output pairs to shape model behavior; Use Direct Preference Optimization (DPO) with chosen/rejected preference pairs; Run online Reinforcement Learning with reward signals from human or automated feedback; Decide which post-training method fits a given adaptation goal; Download a pretrained Hugging Face model and post-train it end to end.