Post-training of LLMs

by DeepLearning.AI

Intermediate | Course | Free | ~1.5 hours, self-paced

Turn a base model into an assistant with SFT, DPO, and online RL — and know when to use each.

Reviewed July 4, 2026

Overview

Post-training of LLMs is a free DeepLearning.AI short course taught by Banghua Zhu, Assistant Professor at the University of Washington and co-founder of NexusFlow. In about 90 minutes of video lessons with code examples, it covers the three most common post-training techniques and, crucially, when to use each: Supervised Fine-Tuning (SFT), where the model is trained on input-output pairs with ideal responses; Direct Preference Optimization (DPO), where you supply a preferred ('chosen') and a less preferred ('rejected') response; and online Reinforcement Learning, where the model generates outputs and receives reward scores from human or automated feedback. Learners download a pretrained model from Hugging Face and post-train it with SFT, DPO, and RL to see how each method changes model behavior.
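To make the SFT description concrete, here is a minimal sketch of the loss on one input-output pair, written in plain Python rather than the course's own code. It assumes the per-token log-probabilities have already been computed by a model, and it assumes the common convention of scoring only the response tokens; the function name `sft_loss` and the example numbers are hypothetical.

```python
import math

def sft_loss(token_log_probs, is_response_token):
    """Mean negative log-likelihood over response tokens only.

    token_log_probs: log-probability the model gives each target token.
    is_response_token: True where the token belongs to the ideal response,
    False where it belongs to the prompt (assumed to be excluded from the loss).
    """
    picked = [-lp for lp, keep in zip(token_log_probs, is_response_token) if keep]
    if not picked:
        raise ValueError("example has no response tokens")
    return sum(picked) / len(picked)

# Hypothetical values for one (prompt, ideal response) pair:
# two prompt tokens followed by three response tokens.
log_probs = [math.log(0.9), math.log(0.8), math.log(0.5), math.log(0.25), math.log(0.5)]
mask = [False, False, True, True, True]

loss = sft_loss(log_probs, mask)
print(f"SFT loss on response tokens: {loss:.4f}")
```

Training lowers this number by raising the probability of each token in the ideal response, which is how SFT pulls the model toward the demonstrated behavior.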

At a Glance

Topic: Fine-Tuning
Level: Intermediate
Format: Course
Cost: Free
Duration: ~1.5 hours, self-paced
Provider: DeepLearning.AI
Hands-on: Yes (code/exercises)
Certificate: None

What You’ll Learn

  • Apply Supervised Fine-Tuning (SFT) on input-output pairs to shape model behavior
  • Use Direct Preference Optimization (DPO) with chosen/rejected preference pairs
  • Run online Reinforcement Learning with reward signals from human or automated feedback
  • Decide which post-training method fits a given adaptation goal
  • Download a pretrained Hugging Face model and post-train it end to end
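As an illustration of the chosen/rejected bullet above, the sketch below computes the standard DPO loss for a single preference pair. It is not taken from the course materials: it assumes sequence-level log-probabilities are already available from the model being trained (the policy) and from a frozen reference copy, and the function name, argument names, and numbers are hypothetical.

```python
import math

def dpo_loss(policy_chosen_lp, policy_rejected_lp,
             ref_chosen_lp, ref_rejected_lp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of responses to the same prompt."""
    # How much more likely each response is under the policy than the reference.
    chosen_margin = policy_chosen_lp - ref_chosen_lp
    rejected_margin = policy_rejected_lp - ref_rejected_lp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) written as log(1 + exp(-x)).
    return math.log1p(math.exp(-logits))

# Hypothetical sequence log-probabilities (sums over response tokens).
same_as_reference = dpo_loss(-12.0, -15.0, -12.0, -15.0)
prefers_chosen = dpo_loss(-10.0, -18.0, -12.0, -15.0)
prefers_rejected = dpo_loss(-14.0, -13.0, -12.0, -15.0)

print(same_as_reference, prefers_chosen, prefers_rejected)
```

When the policy matches the reference, the loss equals log 2 (about 0.6931). It falls below that as the policy shifts probability toward the chosen response and rises above it when the policy drifts toward the rejected one.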

Highlights

  • Taught by a researcher who co-founded NexusFlow and works on model post-training
  • Covers SFT, DPO, and online RL side by side with practical trade-offs
  • Hands-on code that post-trains a real open model from Hugging Face

Who It’s For

Best For

  • ML/AI engineers adapting open-weight models beyond prompting
  • Practitioners deciding between SFT, DPO, and RL for alignment
  • Anyone building preference-tuned or reward-optimized models

Prerequisites

  • Comfort with Python and PyTorch
  • Basic understanding of training neural networks and LLM fundamentals

FAQ

What is Post-training of LLMs?

A hands-on short course on the three core post-training methods that turn a pretrained LLM into a useful, aligned assistant: Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and online Reinforcement Learning. For engineers who want to adapt open models rather than just prompt them.
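The online RL loop (generate, score, update) can be sketched with a toy policy. The example below is a simplified REINFORCE-style update with a batch-mean baseline over three canned responses, not the algorithm or code used in the course; the reward values, hyperparameters, and function names are all hypothetical.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train_policy(rewards, steps=200, batch_size=16, lr=0.5, seed=0):
    """Toy online RL: sample from the current policy, score, then update."""
    rng = random.Random(seed)
    n = len(rewards)
    logits = [0.0] * n  # start from a uniform policy
    for _ in range(steps):
        probs = softmax(logits)
        # "Online": responses are sampled from the policy as it is right now.
        actions = rng.choices(list(range(n)), weights=probs, k=batch_size)
        scores = [rewards[a] for a in actions]  # stand-in for human/automated feedback
        baseline = sum(scores) / batch_size
        grad = [0.0] * n
        for a, r in zip(actions, scores):
            advantage = r - baseline
            for j in range(n):
                # Gradient of log softmax probability of action a w.r.t. logit j.
                grad[j] += advantage * ((1.0 if j == a else 0.0) - probs[j])
        logits = [l + lr * g / batch_size for l, g in zip(logits, grad)]
    return softmax(logits)

# Hypothetical reward scores for three candidate responses to one prompt.
final_probs = train_policy([0.1, 0.5, 1.0])
print([round(p, 3) for p in final_probs])
```

Unlike SFT and DPO, no ideal or preferred response is supplied up front: the policy only sees scores for what it generated itself, and probability mass moves toward the responses that score above the batch average.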

Is Post-training of LLMs free?

Post-training of LLMs is free to access.

What level is Post-training of LLMs for?

Post-training of LLMs is aimed at an intermediate audience. Recommended background: comfort with Python and PyTorch, plus a basic understanding of training neural networks and LLM fundamentals.

How long does Post-training of LLMs take?

Expect roughly 1.5 hours. The course is self-paced, so you can work through the lessons on your own schedule.

What will I learn from Post-training of LLMs?

You'll learn to apply Supervised Fine-Tuning (SFT) on input-output pairs to shape model behavior; use Direct Preference Optimization (DPO) with chosen/rejected preference pairs; run online Reinforcement Learning with reward signals from human or automated feedback; decide which post-training method fits a given adaptation goal; and download a pretrained Hugging Face model and post-train it end to end.

Topics

post-training, fine-tuning, sft, dpo, rlhf, alignment