TRL: Transformer Reinforcement Learning
by Hugging Face
The library and docs for post-training LLMs: SFT, DPO, PPO, GRPO, and reward modeling.
Overview
TRL is the practical toolkit for the entire post-training stack. Its documentation covers supervised fine-tuning (SFT), preference optimization methods such as DPO, classic RLHF with PPO, newer reasoning-oriented methods such as GRPO, and reward-model training, each with its own trainer class and example scripts. When you're ready to move past basic fine-tuning into alignment and preference tuning, these docs are where you'll live.
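To make the preference-optimization part concrete, here is a minimal pure-Python sketch of the per-pair DPO objective. It assumes you already have the summed log-probability of each full completion under the trainable policy and under a frozen reference model. The function names and the `beta=0.1` value are illustrative assumptions, not TRL's API; TRL's DPO trainer computes this loss internally on tensors.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_pair_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # illustrative value; scales the implicit reward margin
) -> float:
    """DPO loss for one (prompt, chosen, rejected) triple.

    Each *_logp is the summed log-probability of a full completion given the
    prompt, under either the trainable policy or the frozen reference model.
    """
    # How much more (or less) likely the policy makes each completion vs. the reference
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # Implicit reward margin between the preferred and the dispreferred completion
    margin = beta * (chosen_logratio - rejected_logratio)
    return -log_sigmoid(margin)


def dpo_batch_loss(batch, beta: float = 0.1) -> float:
    """Mean DPO loss over a list of 4-tuples of log-probabilities."""
    return sum(dpo_pair_loss(*pair, beta=beta) for pair in batch) / len(batch)


# Policy identical to the reference: margin is 0, so the loss is log(2)
print(dpo_pair_loss(-12.0, -15.0, -12.0, -15.0))  # ≈ 0.693
```

The loss falls below log(2) only when the policy raises the chosen completion's likelihood relative to the reference more than it raises the rejected one's, which is why no separate reward model is needed for DPO.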
At a Glance
- Topic: Fine-Tuning
- Level: Advanced
- Format: Documentation
- Cost: Free
- Duration: Self-paced
- Provider: Hugging Face
- Hands-on: Yes (code/exercises)
- Certificate: None
What You’ll Learn
- Supervised fine-tuning (SFT) with the SFTTrainer
- Preference tuning with DPO
- RLHF with PPO and reasoning with GRPO
- Training and using reward models
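Two of these topics reduce to small formulas that are easy to sketch. Below is a minimal pure-Python illustration, assuming one scalar reward per completion: the pairwise (Bradley-Terry) loss commonly used to train reward models, and the group-relative advantage at the core of GRPO. The function names and the `eps` value are hypothetical, not TRL's API; TRL's reward and GRPO trainers implement these on tensors with additional options.

```python
import math
import statistics
from typing import List, Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def reward_model_pair_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry pairwise loss: -log(sigmoid(r_chosen - r_rejected)).

    Minimizing it pushes the reward model to score the preferred
    completion above the rejected one.
    """
    # -log(sigmoid(d)) == softplus(-d), with d = r_chosen - r_rejected
    return softplus(reward_rejected - reward_chosen)


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-4) -> List[float]:
    """GRPO-style advantages for G completions sampled from the same prompt.

    Each reward is normalized against its own group's mean and standard
    deviation, so no learned value (critic) model is required.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std here; implementations differ
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical group: two correct (1.0) and two incorrect (0.0) answers to one prompt
print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
```

In this example the correct answers receive advantages close to +1 and the incorrect ones close to -1. Normalizing within each group is what lets GRPO drop the separate value model that PPO trains.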
Highlights
- Covers the full modern post-training stack
- Trainer classes and runnable scripts
Who It’s For
Best For
- Advanced practitioners doing alignment/preference tuning
Prerequisites
- PyTorch
- Fine-tuning fundamentals
FAQ
What is TRL: Transformer Reinforcement Learning?
These are the official docs for TRL, Hugging Face's library for supervised fine-tuning and preference/RL post-training of language models.
Is TRL: Transformer Reinforcement Learning free?
Yes. TRL: Transformer Reinforcement Learning is free to access.
What level is TRL: Transformer Reinforcement Learning for?
TRL: Transformer Reinforcement Learning is aimed at an advanced audience. Recommended background: PyTorch and fine-tuning fundamentals.
How long does TRL: Transformer Reinforcement Learning take?
There is no fixed duration: the docs are self-paced, so most learners work through them at their own pace.
What will I learn from TRL: Transformer Reinforcement Learning?
You'll learn supervised fine-tuning (SFT) with the SFTTrainer, preference tuning with DPO, RLHF with PPO and reasoning with GRPO, and how to train and use reward models.