Evaluating AI Agents

by DeepLearning.AI

Beginner · Course · Free · ~2.5 hours, self-paced

Add observability to an agent, then evaluate every step with code-based and LLM-as-a-Judge evals.

Reviewed July 4, 2026

Overview

Evaluating AI Agents is a free short course from DeepLearning.AI, created in partnership with Arize AI and taught by John Gilhuly and Aman Khan. Across roughly 2.5 hours of video lessons with embedded code examples, you build an AI agent from scratch (router, skills, and memory), instrument it with tracing to visualize the steps it takes, and then evaluate each component with the technique that fits it, such as code-based checks or LLM-as-a-Judge. The course focuses on the practical evaluation loop that turns a demo agent into a reliably improving system: choosing the appropriate evaluator per component and structuring evals as repeatable experiments.
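To make the router, skills, memory, and tracing pieces concrete, here is a minimal sketch in plain Python. It is not the course's code and does not use Arize tooling: the `Span` and `Agent` classes, the keyword router, and the `calculator` skill are all hypothetical stand-ins chosen so the example runs offline.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Span:
    """One recorded agent step (hypothetical structure, not a real tracing API)."""
    name: str
    inputs: dict
    output: object
    duration_ms: float


@dataclass
class Agent:
    skills: dict[str, Callable[[str], str]]
    memory: list[str] = field(default_factory=list)
    trace: list[Span] = field(default_factory=list)

    def _record(self, name: str, inputs: dict, fn: Callable[[], object]) -> object:
        # Time the step and append a span so every step can be inspected later.
        start = time.perf_counter()
        output = fn()
        elapsed_ms = (time.perf_counter() - start) * 1000
        self.trace.append(Span(name, inputs, output, elapsed_ms))
        return output

    def route(self, query: str) -> str:
        # Toy keyword router; a real agent would typically ask an LLM to pick the skill.
        return "calculator" if any(ch.isdigit() for ch in query) else "echo"

    def run(self, query: str) -> str:
        skill = self._record("router", {"query": query}, lambda: self.route(query))
        answer = self._record(
            f"skill:{skill}", {"query": query}, lambda: self.skills[skill](query)
        )
        # Memory here is just a log of past turns.
        self.memory.append(f"{query} -> {answer}")
        return answer


def calculator(query: str) -> str:
    # Sums the integers found in the query; stands in for a real tool.
    return str(sum(int(tok) for tok in query.split() if tok.isdigit()))


agent = Agent(skills={"calculator": calculator, "echo": lambda q: q})
result = agent.run("add 2 and 40")
span_names = [span.name for span in agent.trace]
```

After the run, `agent.trace` holds one span for the routing decision and one for the skill call, which is the kind of per-step record an evaluator can then score.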

At a Glance

Topic: Agentic
Level: Beginner
Format: Course
Cost: Free
Duration: ~2.5 hours, self-paced
Provider: DeepLearning.AI
Hands-on: Yes (code and exercises)
Certificate: None

What You’ll Learn

  • Instrument an agent with tracing/observability to inspect each step it takes
  • Choose the right evaluator per component: code-based, LLM-as-a-Judge, or human annotation
  • Build an agent from its core parts — router, skills, and memory
  • Structure evaluations into repeatable experiments to iterate on agent performance
  • Debug and diagnose where an agent's reasoning or tool calls break down
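As a rough illustration of the difference between a code-based evaluator and an LLM-as-a-Judge evaluator, the sketch below shows one of each. All names here (`eval_router_choice`, `eval_tool_args`, `eval_with_judge`, `JUDGE_PROMPT`) are hypothetical, and `stub_judge` is a deterministic placeholder standing in for a real model call, so this should be read as an assumption-laden outline rather than the course's implementation.

```python
import json
from typing import Callable


def eval_router_choice(chosen: str, expected: str) -> bool:
    # Code-based eval: exact match against a labeled expected skill.
    return chosen == expected


def eval_tool_args(raw_args: str, required_keys: set[str]) -> bool:
    # Code-based eval: arguments must parse as a JSON object with the required keys.
    try:
        args = json.loads(raw_args)
    except json.JSONDecodeError:
        return False
    return isinstance(args, dict) and required_keys.issubset(args)


JUDGE_PROMPT = (
    "Question: {question}\nAnswer: {answer}\n"
    "Reply with exactly one word: correct or incorrect."
)


def eval_with_judge(question: str, answer: str, judge: Callable[[str], str]) -> bool:
    # LLM-as-a-Judge: ask a model for a label, then parse that label strictly.
    prompt = JUDGE_PROMPT.format(question=question, answer=answer)
    return judge(prompt).strip().lower() == "correct"


def stub_judge(prompt: str) -> str:
    # Placeholder for a real LLM call so the sketch runs offline.
    return "correct" if "Answer: 42" in prompt else "incorrect"


router_ok = eval_router_choice("calculator", "calculator")
args_ok = eval_tool_args('{"a": 2, "b": 40}', {"a", "b"})
answer_ok = eval_with_judge("What is 2 + 40?", "42", stub_judge)
```

Code-based checks suit steps with a single verifiable answer, such as which skill was chosen or whether tool arguments are well formed. A judge is the fallback for open-ended output where no exact match exists.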

Highlights

  • Co-taught by Arize AI engineers who build agent-observability tooling for a living
  • Component-wise evaluation approach rather than a single end-to-end score
  • Code examples embedded throughout so you evaluate a real agent, not slides
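One way to see why component-wise scores can be more useful than a single end-to-end number is a small experiment over a labeled dataset. The dataset, the toy `run_agent` function (with a bug planted on purpose), and the metric names below are all invented for illustration and are not taken from the course.

```python
# Hypothetical labeled examples: expected skill and expected final answer per query.
DATASET = [
    {"query": "add 2 and 40", "skill": "calculator", "answer": "42"},
    {"query": "say hello", "skill": "echo", "answer": "say hello"},
    {"query": "add 1 and 1", "skill": "calculator", "answer": "2"},
]


def run_agent(query: str) -> dict:
    # Toy agent with a deliberate bug: the calculator skill drops the last number.
    skill = "calculator" if any(ch.isdigit() for ch in query) else "echo"
    if skill == "calculator":
        numbers = [int(tok) for tok in query.split() if tok.isdigit()]
        answer = str(sum(numbers[:-1]))
    else:
        answer = query
    return {"skill": skill, "answer": answer}


def run_experiment(dataset: list[dict]) -> dict[str, float]:
    # Run every example once and score the router and the skill output separately.
    rows = [(example, run_agent(example["query"])) for example in dataset]
    total = len(rows)
    return {
        "router_accuracy": sum(out["skill"] == ex["skill"] for ex, out in rows) / total,
        "skill_accuracy": sum(out["answer"] == ex["answer"] for ex, out in rows) / total,
    }


scores = run_experiment(DATASET)
```

Here the router scores 1.0 while the skill output scores about 0.33, which points at the calculator skill rather than the routing. Re-running `run_experiment` after a fix gives a repeatable before-and-after comparison.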

Who It’s For

Best For

  • AI engineers shipping agents who need a measurable quality/eval loop
  • Developers debugging why an agent's tool use or routing misbehaves
  • Teams standing up observability and evals for agentic systems

Prerequisites

  • Basic Python
  • Familiarity with calling LLM APIs and the idea of tool-using agents

FAQ

What is Evaluating AI Agents?

A hands-on short course, built with Arize AI, that teaches AI engineers how to measure and improve agent quality by adding tracing/observability and running systematic evaluations. It is aimed at developers who can build an agent but need a rigorous way to know whether it actually works.

Is Evaluating AI Agents free?

Evaluating AI Agents is free to access.

What level is Evaluating AI Agents for?

Evaluating AI Agents is aimed at a beginner audience. Recommended background: basic Python, plus familiarity with calling LLM APIs and the idea of tool-using agents.

How long does Evaluating AI Agents take?

Expect roughly 2.5 hours. The course is self-paced, so you can work through it on your own schedule.

What will I learn from Evaluating AI Agents?

You'll learn to instrument an agent with tracing/observability to inspect each step it takes; choose the right evaluator per component (code-based, LLM-as-a-Judge, or human annotation); build an agent from its core parts (router, skills, and memory); structure evaluations into repeatable experiments to iterate on agent performance; and debug and diagnose where an agent's reasoning or tool calls break down.

Topics

ai-agents, evals, observability, llm-as-a-judge, tracing, arize