AI Agent Payback: 3.4 Months for SDRs, 9.3 for Engineering

BCG and Bain just dropped the 2026 AI agent payback benchmarks. 22% of deployments lose money. Here's the function-by-function ROI calculator.

By Rajesh Beri·June 7, 2026·15 min read

THE DAILY BRIEF

SDR agents pay back in 3.4 months. Engineering agents take 9.3. Clinical agents take 18.4. And 22% of all deployments lose money after a full year. That's the headline from a wave of new 2026 benchmarks—Bain's Agentic AI Benchmark 2026, BCG's payback survey, and Forrester's root-cause analysis—that just turned the "AI agent ROI" conversation from religion into arithmetic.

The numbers matter because every CFO is now asking the same two questions: When does this thing pay for itself, and how do I know we're not in the 22% that never does? The new data finally gives them defensible answers—if they know where to look. For CIOs, the same data is a forecasting tool. Pitch an SDR agent and you can credibly promise payback inside two quarters. Pitch an engineering agent and you're looking at three. Both are good investments. They just need different business cases.

This piece breaks down the new benchmarks by function, gives you an ROI calculator framework you can run today, and maps the five mistakes that push deployments into the negative-ROI bucket. Skip the part where you guess.

If you've been tracking the broader enterprise AI ROI crisis—where 95% of pilots fail to deliver measurable returns—the new benchmarks are how you escape it.

What Changed: The 2026 Payback Benchmarks

For two years, "AI agent ROI" was a vibes-based conversation. Vendors quoted hero metrics ("75% faster!"). CFOs countered with anecdata ("Uber blew through their whole 2026 budget by April"). Neither side had defensible benchmarks. The 2026 analyst cycle changed that.

Three new datasets dropped between Q1 and Q2 2026:

  • Bain Agentic AI Benchmark 2026 — Function-level payback medians across 800+ enterprise deployments
  • BCG Agentic AI Pulse 2026 — ROI realization rates at 6, 12, and 24-month marks
  • Forrester + Anaconda 2026 Survey — Root-cause analysis on negative-ROI deployments

The combined picture is sharper than anything we had in 2025. Here's what the data says.

The median payback period for an enterprise AI agent deployment is 5.1 months. That's the BCG/Forrester top-line, calculated across functions. But the function-level dispersion is enormous. Customer service agents pay back in 4.1 months. SDR (sales development) agents pay back in 3.4 months, the fastest in the dataset. Engineering agents take 9.3 months. Finance and accounting agents land at 8.9 months. Legal hits 14.8 months. Clinical agents, the slowest, take 18.4 months (Bain Agentic AI Benchmark 2026).

Across the full dataset, 41% of deployments cross positive ROI within 12 months. Another 18% hit it inside six. But 22% are still underwater at the 12-month mark—and Forrester's root-cause work shows the failure pattern is almost never model capability (Forrester + Anaconda 2026 Survey).

The Forrester failure breakdown:

  • 41% of negative-ROI deployments traced back to unclear success criteria
  • 33% to insufficient tool or data access
  • 26% to drift in evaluation coverage (no automated evals on prompt/model changes)

That's a critical finding because it means the lever isn't model selection. It's operational discipline. The deployments that ship to production with a named owner, automated evals, and binary success criteria—those are the ones that hit payback inside two quarters. The deployments that drift into "let's see what it can do" land in the 22%.

Vendor agents reach positive ROI 2.4x faster than custom builds. Deloitte's State of Generative AI Q1 2026 puts vendor time-to-value at 29–41 days vs 89–118 days for in-house builds (Deloitte State of Generative AI Q1 2026). That gap is now too large to ignore on most use cases.

Why This Matters: Dual-Audience Implications

Technical Implications (CIO/CTO)

The function-level payback dispersion has direct architectural consequences. The fastest-paying functions—SDR, customer service, marketing operations—share three traits: high-volume repetitive tasks, well-instrumented success metrics, and existing integrations with systems-of-record (Salesforce, ServiceNow, HubSpot). The slowest-paying functions—legal, clinical, finance reconciliation—have low volume, high variance, and weaker instrumentation.

If you're a CIO sequencing your 2026 agent roadmap, the data tells you where to start. Don't lead with the highest-prestige use case. Lead with the function that pays back fastest, ships proof points, and funds the next wave. Forrester's data on rollback rates makes this concrete: agents shipped without automated evals roll back 47% of the time. Agents shipped with full eval coverage roll back 9% of the time. Eval coverage is now the single biggest predictor of production survival (Forrester Rollback Data).

There's a second technical implication around the vendor-vs-custom decision. The 2.4x time-to-value gap means the default answer is now "buy" unless you have a specific defensibility argument for "build." That flips the 2024 calculus, which favored custom because vendor agents were immature. Vendor agents like Sierra, Decagon, and Glean have matured enough that the custom premium is no longer worth the wait on most use cases.

Business Implications (CFO/COO/CMO)

For CFOs, this is the first dataset that supports defensible AI agent business cases. Until now, "show me the ROI" was a rhetorical question—the data didn't exist. Now you can underwrite a deployment against a known benchmark.

The CFO playbook flips three ways:

  1. Function-based budgeting. Don't budget by total AI spend. Budget by function-specific ROI expectations. An SDR agent that doesn't hit payback by month five is broken. A clinical agent that doesn't hit payback by month five is normal.

  2. Stage-gate by benchmark. If a deployment isn't tracking toward its function-level median by the 60-day mark, that's a signal to course-correct or kill. The data lets you set non-arbitrary stage gates.

  3. The 22% rule. Roughly one in five deployments will lose money at 12 months. Treat that as the failure rate in your portfolio math. Allocate accordingly.
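The 22% rule can be turned into simple portfolio arithmetic. The sketch below is an illustration, not part of the cited research. It assumes each deployment independently has a 22% chance of being underwater at 12 months (independence is an assumption; deployments inside one organization probably share failure causes), and the function names are hypothetical.

```python
from math import comb

# Share of deployments still underwater at 12 months, per the benchmarks above.
LOSS_RATE = 0.22


def expected_losers(n_deployments: int, loss_rate: float = LOSS_RATE) -> float:
    """Expected number of money-losing deployments in a portfolio."""
    return n_deployments * loss_rate


def prob_at_least(k: int, n_deployments: int, loss_rate: float = LOSS_RATE) -> float:
    """Probability that at least k deployments lose money.

    Binomial model: assumes deployments fail independently (an assumption).
    """
    return sum(
        comb(n_deployments, i) * loss_rate**i * (1 - loss_rate) ** (n_deployments - i)
        for i in range(k, n_deployments + 1)
    )


n = 10
print(f"Expected losers in {n} deployments: {expected_losers(n):.1f}")
print(f"P(at least 1 loser): {prob_at_least(1, n):.0%}")
print(f"P(at least 3 losers): {prob_at_least(3, n):.0%}")
```

Under these assumptions, a ten-deployment portfolio should budget for about two losers, and a portfolio with no losers at all would be the unusual outcome.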

For COOs, the productivity gains are now quantified by function. McKinsey's 2026 survey pegs the median knowledge worker at 6.4 hours saved per week, with customer service reps saving 8.7 hours and software engineers saving 11.3 hours (McKinsey Global AI Survey 2026). The question is no longer "do agents save time?" It's "did we capture the savings as P&L impact or did they evaporate into Slack?"

For CMOs, the marketing operations payback of 6.7 months is roughly mid-pack: not the fastest, not the slowest. But marketing also has the highest measurable cost-per-task reduction in some categories: long-form article drafting drops 156x ($640 to $4.10). For comparison, customer service tickets drop 9.1x ($4.18 to $0.46) (Master of Code 2026 Report).

Market Context: Why This Data Hit Now

The benchmarks arrived in the same quarter as a $950M raise by Sierra (Bret Taylor's enterprise AI agent platform, now at a $15B+ post-money valuation) and a $122B raise by OpenAI (TechCrunch, May 4, 2026). The capital surge is forcing the analyst community to publish defensible measurement frameworks. Without them, the next 18 months of agent deployments will run blind.

Sierra alone now claims 40% of the Fortune 50 as customers, with $150M+ in ARR after 24 months in market (CMSWire May 2026). Decagon, Glean, AI21, and Cognition (Devin, recently valued at $2.5B) round out the vendor side. On the hyperscaler side, Microsoft just shipped MAI-Code-1-Flash at Build 2026, Google launched a $100/month developer tier, and AWS continues to push Bedrock Agents. The vendor agent market is now real—and the new data on payback periods is the first apples-to-apples scoring system available to enterprise buyers.

Gartner's Q2 2026 forecast pushes the market toward 80% of enterprise applications embedding at least one AI agent by year-end, up from 33% in 2024. Production-grade deployments are also climbing fast: 9% in 2024 → 19% in 2025 → 31% in Q1 2026 (Gartner Q2 2026 Outlook). But Gartner's also warned that 40%+ of agentic AI projects will be canceled by end of 2027—driven by escalating costs, unclear ROI, and inadequate risk controls. The payback benchmarks are the missing measurement layer that determines which deployments survive the coming cull.

The IDC/McKinsey consensus forecast pegs total AI agent spend at $1.4 trillion by 2027 (IDC + McKinsey Consensus 2026). That's roughly the size of the entire global software industry in 2020. At that scale, "I'm sure it's working" is no longer an acceptable answer. The 22% failure rate, multiplied by $1.4T, is roughly $300B of negative-ROI spend over the next 24 months. That's the spend the payback benchmarks are designed to prevent.

For context on how token-side economics complicate this picture, see the $7M budget trap where token prices fell 98% but enterprise AI bills tripled. The payback benchmarks are the consumption-side counterweight to the cost-side explosion.

Framework #1: The Function-Level ROI Calculator

Use this framework to size any agent deployment against its 2026 benchmark. It works at three deployment sizes—small team pilot, mid-market scale-out, and enterprise rollout.

Inputs You Need

  1. Employee count affected by the deployment
  2. Fully-loaded hourly rate (use BLS 2026 default: base wage × 1.42)
  3. Current task time (minutes per transaction)
  4. Expected agent-assisted task time (use vendor benchmarks or 55% speedup default)
  5. Annual license + integration cost (use $50K SMB / $250K mid / $1.5M enterprise)
  6. Deployment scope (% of eligible workflows actually migrated)

Function-Specific Benchmarks (Bain 2026)

Function | Median Payback | Hours Saved/Week | Cost-Per-Task Reduction
Sales Development (SDR) | 3.4 months | 5.4 | 4.8x
Customer Service | 4.1 months | 8.7 | 9.1x
Marketing Operations | 6.7 months | 6.1 | 12x (content)
IT Helpdesk | 8.0 months | 5.9 | 6.2x
Finance / Accounting | 8.9 months | 3.8 | 3.4x
Software Engineering | 9.3 months | 11.3 | 66x (code review)
Human Resources | 11.2 months | 4.6 | 2.4x
Legal | 14.8 months | 2.9 | 1.8x
Clinical | 18.4 months | 1.8 | 1.2x

Worked Example: SDR Agent for a Mid-Market SaaS

  • Inputs: 50 SDRs, 40 outbound sequences per SDR per week, $85/hour fully loaded, 35 min per outbound sequence (current), 12 min agent-assisted (23 min saved), $250K annual platform cost, 80% workflow scope
  • Annual time saved: 50 × 40 sequences/week × 52 weeks × 23 minutes × 0.80 / 60 = 31,893 hours
  • Gross savings: 31,893 × $85 = $2.71M
  • Net savings (after $250K license): $2.46M
  • Payback period: $250K ÷ ($2.46M ÷ 12) = 1.2 months
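The arithmetic in this worked example can be sketched as a small calculator. This is a minimal sketch under the example's own inputs, not an official Bain or BCG tool; the class and method names are hypothetical. The 40 sequences per SDR per week comes from the example's formula. Payback follows the example's convention of dividing the annual cost by monthly net savings; dividing by gross monthly savings instead would give a slightly shorter figure (about 1.1 months here).

```python
from dataclasses import dataclass


@dataclass
class AgentDeployment:
    employees: int
    hourly_rate: float        # fully loaded, USD
    tasks_per_week: float     # per employee
    current_minutes: float    # per task today
    assisted_minutes: float   # per task with the agent
    annual_cost: float        # license + integration, USD
    scope: float              # fraction of eligible workflows migrated (0 to 1)
    weeks_per_year: int = 52

    def annual_hours_saved(self) -> float:
        minutes_saved = self.current_minutes - self.assisted_minutes
        return (self.employees * self.tasks_per_week * self.weeks_per_year
                * minutes_saved * self.scope / 60)

    def gross_savings(self) -> float:
        return self.annual_hours_saved() * self.hourly_rate

    def net_savings(self) -> float:
        return self.gross_savings() - self.annual_cost

    def payback_months(self) -> float:
        """Annual cost divided by monthly net savings, as in the worked example."""
        monthly_net = self.net_savings() / 12
        if monthly_net <= 0:
            return float("inf")  # never pays back at this run rate
        return self.annual_cost / monthly_net


# Inputs taken from the worked example above.
sdr = AgentDeployment(
    employees=50, hourly_rate=85, tasks_per_week=40,
    current_minutes=35, assisted_minutes=12,
    annual_cost=250_000, scope=0.80,
)
print(f"Hours saved:   {sdr.annual_hours_saved():,.0f}")
print(f"Gross savings: ${sdr.gross_savings() / 1e6:.2f}M")
print(f"Net savings:   ${sdr.net_savings() / 1e6:.2f}M")
print(f"Payback:       {sdr.payback_months():.1f} months")
```

Changing `scope`, `assisted_minutes`, or `annual_cost` is a quick way to test how sensitive a business case is before it goes to the CFO.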

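Even at the conservative end (50% scope, 40% speedup, or 14 of 35 minutes saved per sequence), the same formula gives about 12,133 hours and $1.03M in gross savings, for a payback of roughly 3.8 months, close to the Bain SDR median of 3.4.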

Three-Size Sample Outputs

Scaling the worked example's per-SDR savings (23 of 35 minutes saved per sequence, 80% scope) by headcount:

Deployment Size | Year-1 Net Benefit | Payback | 3-Year NPV (10% discount)
Small (50 employees) | $2.46M | 1.2 months | $6.1M
Mid-market (500 employees) | $26M | 0.6 months | $64M
Enterprise (5,000 employees) | $268M | 0.3 months | $658M
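The NPV column can be approximated with a standard discounted cash flow sum. The sketch below is an assumption about how the figures were derived, not a documented method: it treats the year-1 net benefit as repeating unchanged for three years, received at each year-end, with no separate upfront outlay. The function name is hypothetical. Under those assumptions it lands on the Small row's $6.1M.

```python
def npv_constant_benefit(annual_net_benefit: float,
                         years: int = 3,
                         discount_rate: float = 0.10) -> float:
    """Present value of a constant annual net benefit received at each year-end."""
    return sum(
        annual_net_benefit / (1 + discount_rate) ** year
        for year in range(1, years + 1)
    )


small = npv_constant_benefit(2_460_000)
print(f"3-year NPV, small deployment: ${small / 1e6:.1f}M")
```

A flat benefit is the optimistic case. Modeling a ramp in year 1 or rising license costs would mean passing a list of per-year cash flows instead of a single number.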

Critical context: The single biggest swing factor isn't speedup percentage. It's deployment scope. McKinsey's 2025 finding that 88% of enterprises pilot agents but only 6% scale them means most AI budgets should divide their projected savings by four or more in practice. Because savings scale linearly with scope, a 25% scope deployment delivers roughly one-quarter of the benefit modeled at full scope. The ROI math doesn't fail. The rollout does.

Framework #2: The 5 Mistakes That Push You Into the 22%

Forrester's failure analysis identifies five recurring patterns in negative-ROI deployments. Each one is preventable—and each one maps to a specific operational fix.

Mistake #1: Unclear Success Criteria (41% of failures)

The pattern: Deployment ships without binary success metrics. Team "evaluates" the agent qualitatively. Six months in, nobody can answer "is this working?"

The fix: Pre-define 2–3 success metrics before deployment. Use binary thresholds, not directional ones. Example: "Resolve 65% of tier-1 tickets autonomously within 90 seconds, or roll back." That metric either ships or doesn't. There's no debate.
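A binary threshold like the one in the example can be written down as a check that returns only pass or fail. The sketch below is illustrative: the `Ticket` record and `meets_criterion` function are hypothetical names, and the 65% and 90-second values are taken from the example metric above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ticket:
    resolved_autonomously: bool
    seconds_to_resolve: float


def meets_criterion(tickets: list[Ticket],
                    min_rate: float = 0.65,
                    max_seconds: float = 90.0) -> bool:
    """True only if enough tickets were resolved autonomously within the time limit."""
    if not tickets:
        return False  # no evidence counts as a fail, not a pass
    hits = sum(
        t.resolved_autonomously and t.seconds_to_resolve <= max_seconds
        for t in tickets
    )
    return hits / len(tickets) >= min_rate


# Hypothetical sample: 7 of 10 tickets resolved autonomously in under 90 seconds.
sample = [Ticket(True, 40.0)] * 7 + [Ticket(False, 300.0)] * 3
print("ship" if meets_criterion(sample) else "roll back")
```

The point of the design is that the function has no "partially working" return value, which is what keeps the six-month review from turning into a debate.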

Mistake #2: Insufficient Tool or Data Access (33% of failures)

The pattern: Agent has the model but not the integrations. It can reason about the customer but can't actually update the CRM record or process the refund. Becomes an expensive Q&A chatbot.

The fix: Build the integration map before picking the agent. If the deployment requires access to four systems-of-record and you've only secured two, the deployment will fail. The model is the easy part. The plumbing is where ROI dies.

Mistake #3: Drift in Evaluation Coverage (26% of failures)

The pattern: Initial deployment ships with manual QA. Prompts change, models update, behavior drifts. Nobody re-runs the evals. Quality regresses silently for three months until a customer complaint surfaces.

The fix: Automated evals on every prompt or model change. Forrester's data is unambiguous: deployments with full eval coverage roll back at 9%. Deployments without roll back at 47%. The eval pipeline is a load-bearing piece of infrastructure, not a nice-to-have.
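An automated eval gate can be as simple as replaying a fixed set of cases whenever a prompt or model changes and blocking the release if the pass rate drops. The sketch below is a minimal illustration, not any vendor's eval framework: the golden set, the `stub_agent` stand-in, the substring check, and the 95% threshold are all assumptions chosen for the example.

```python
from typing import Callable

# Hypothetical golden set: (customer message, phrase the reply must contain).
GOLDEN_SET = [
    ("I forgot my password", "reset link"),
    ("Where is my refund?", "refund status"),
    ("Cancel my subscription", "cancellation confirmed"),
]


def stub_agent(message: str) -> str:
    """Stand-in for the real agent call; replace with your own client."""
    replies = {
        "I forgot my password": "I've sent a reset link to your email.",
        "Where is my refund?": "Your refund status is: processing.",
        "Cancel my subscription": "Cancellation confirmed for your plan.",
    }
    return replies.get(message, "Sorry, I can't help with that.")


def pass_rate(agent: Callable[[str], str], cases: list[tuple[str, str]]) -> float:
    """Fraction of cases whose reply contains the expected phrase."""
    if not cases:
        raise ValueError("eval set is empty")
    passed = sum(
        expected.lower() in agent(message).lower()
        for message, expected in cases
    )
    return passed / len(cases)


def eval_gate(agent: Callable[[str], str],
              cases: list[tuple[str, str]],
              threshold: float = 0.95) -> bool:
    """True if the agent clears the (assumed) pass-rate threshold."""
    return pass_rate(agent, cases) >= threshold


print("eval gate:", "PASS" if eval_gate(stub_agent, GOLDEN_SET) else "FAIL")
```

In a CI pipeline, a failing gate would typically exit non-zero so the prompt or model change cannot merge. Real eval sets are larger and often use graded rubrics rather than substring matches.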

Mistake #4: No Named Owner With Budget Authority

The pattern: Deployment is "shared ownership" between business unit and IT. When ROI questions arise, both sides assume the other is tracking. Neither is.

The fix: Name one owner. Give them the budget. Hold them accountable to the function-level benchmark. Of the 12% of pilots that successfully reach production, 94% have a named agent owner with budget authority. That's not coincidence (Forrester 2026 Success Factors).

Mistake #5: Scope Creep Past the Original Workflow

The pattern: Agent ships scoped to one workflow ("resolve tier-1 password resets"). Stakeholders ask: "can it also do X, Y, Z?" Scope expands. Success criteria blur. Three months in, the agent is mediocre at six things instead of excellent at one.

The fix: Keep agents narrow until they hit their function-level payback benchmark. Then expand. Of production-grade agents, 81% are scoped to a single workflow with binary success criteria. Multi-workflow agents are an advanced pattern, not a starter pattern.

The combined rule: Run every deployment through a five-question gate before launch. Are the success metrics binary? Are all integrations live? Are automated evals running? Is there one named owner? Is the scope single-workflow? If any answer is no, you're queuing up to join the 22%.
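The five-question gate maps naturally onto a checklist that blocks launch on any "no". The sketch below is one hypothetical way to encode it; the class and field names are illustrative, not from the Forrester material.

```python
from dataclasses import dataclass, fields


@dataclass
class LaunchGate:
    binary_success_metrics: bool
    all_integrations_live: bool
    automated_evals_running: bool
    single_named_owner: bool
    single_workflow_scope: bool

    def failures(self) -> list[str]:
        """Names of the questions answered 'no'."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def ready(self) -> bool:
        return not self.failures()


# Hypothetical deployment that has everything except automated evals.
gate = LaunchGate(
    binary_success_metrics=True,
    all_integrations_live=True,
    automated_evals_running=False,
    single_named_owner=True,
    single_workflow_scope=True,
)
print("ready to launch" if gate.ready() else f"blocked on: {gate.failures()}")
```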

Case Study: Cigna + Sierra (8 Weeks, 80% Auth Time Reduction)

Cigna's deployment of Sierra's agent platform is a textbook example of the framework working as designed. Sierra's published case studies put the deployment at 8 weeks from kickoff to production. At 56 days, that is somewhat longer than the 29–41 day vendor range for time-to-first-value, but well ahead of the 89–118 days reported for in-house builds and of the 4.1-month customer service payback benchmark (CMSWire May 2026).

The numbers:

  • 80% reduction in patient authentication time (the binary success metric)
  • Production scope: Patient authentication workflow only (single-workflow per Mistake #5)
  • 40% of inquiries resolved autonomously across the deployment
  • 28% improvement in issue resolution time
  • 19% increase in first-contact resolution

The deployment shipped narrow (authentication, not "patient experience"), had a binary success metric (auth time reduction percentage), and integrated with Cigna's existing systems-of-record before launch. That's the operational discipline that distinguishes the 12% that ship from the 88% that pilot forever.

Sierra's broader portfolio reinforces the pattern. Singtel hit 70%+ autonomous resolution within 10 weeks. Nordstrom launched its voice agent "Nora" in five weeks. The common thread isn't proprietary technology—it's narrow scope, binary metrics, and vendor-led integration speed.

What didn't work elsewhere: Uber's coding agent rollout, by contrast, hit positive results (10% of code autonomously generated) but "blew through" the AI budget in the process. Same model, same vendor capability—different scope discipline. The ROI math worked at the task level but failed at the portfolio level because the deployment expanded faster than the savings.

The takeaway: vendor agent ROI is now a process problem, not a technology problem. The vendors have shipped capable agents. The 22% failure rate sits inside the buyer organization. IBM realized $3.5B in cost savings with a 50% productivity increase using the same operational discipline (Sierra ROI methodology context, OneReach 2026).

The operationalization gap that kills pilots comes down to the same discipline Cigna applied: narrow scope, named owner, binary metric.

What to Do About It

For CIOs (Next 30 Days)

  • Rebenchmark your active deployments against the Bain function-level medians. Anything tracking 2x slower than its function median is a candidate for course-correction or kill.
  • Audit your eval pipelines. If any production agent lacks automated evals on prompt and model changes, fix it this quarter. The 47% vs 9% rollback gap is too large to absorb.
  • Sequence by payback speed. Lead the 2026 roadmap with SDR, customer service, and marketing ops deployments. Use the savings to fund engineering and finance deployments later.
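The rebenchmarking step can be sketched as a lookup against the Bain function-level medians from the table above. This is an illustrative sketch: the dictionary keys, the function name, and the sample portfolio are hypothetical, and "2x slower" is interpreted as a projected payback at least twice the function median.

```python
# Median payback in months by function, from the Bain 2026 table above.
BAIN_MEDIAN_PAYBACK_MONTHS = {
    "sdr": 3.4,
    "customer_service": 4.1,
    "marketing_ops": 6.7,
    "it_helpdesk": 8.0,
    "finance": 8.9,
    "engineering": 9.3,
    "hr": 11.2,
    "legal": 14.8,
    "clinical": 18.4,
}


def needs_review(function: str,
                 projected_payback_months: float,
                 slow_factor: float = 2.0) -> bool:
    """Flag a deployment whose projected payback is at least slow_factor x its median."""
    median = BAIN_MEDIAN_PAYBACK_MONTHS[function]
    return projected_payback_months >= slow_factor * median


# Hypothetical portfolio: (function, projected payback in months).
portfolio = [("sdr", 7.5), ("engineering", 12.0), ("clinical", 40.0)]
flagged = [name for name, months in portfolio if needs_review(name, months)]
print("course-correct or kill:", flagged)
```

Note that the same 12-month projection would be flagged for an SDR agent but not for an engineering agent, which is the practical meaning of benchmarking by function.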

For CFOs (Next 90 Days)

  • Move from total AI spend to function-budgeted AI spend. Each function gets a payback expectation and a stage-gate. Anything not tracking toward its function median at the 60-day mark triggers review.
  • Underwrite the 22% rule. Build the failure rate into your portfolio math. If you're running ten deployments, plan for two to lose money. Distribute the bets accordingly.
  • Track time savings to P&L impact, not Slack reports. Forrester's data on captured-vs-evaporated productivity is brutal. Most "hours saved" never materialize as P&L impact because the freed capacity isn't redirected.

For Business Leaders (Next 120 Days)

  • Default to vendor over build. The 2.4x time-to-value gap is now decisive on most use cases. Build only when there's a specific defensibility or compliance argument.
  • Name single owners with budget authority. Shared ownership is a recurring failure mode in the Forrester data. Pick one accountable executive per deployment.
  • Ship narrow, expand later. Single-workflow scope is the operational pattern that distinguishes production agents from forever-pilots. Resist the "can it also do X" pressure during the first 90 days.

The data is finally good enough to act on. The 22% that lose money at 12 months aren't losing because the models can't do it. They're losing because the operating discipline isn't there. That gap is now fixable, if CIOs and CFOs treat the new benchmarks as a measurement contract, not a suggestion.



© 2026 Rajesh Beri. All rights reserved.

AI Agent Payback: 3.4 Months for SDRs, 9.3 for Engineering

Photo by fauxels on Pexels

SDR agents pay back in 3.4 months. Engineering agents take 9.3. Clinical agents take 18.4. And 22% of all deployments lose money after a full year. That's the headline from a wave of new 2026 benchmarks—Bain's Agentic AI Benchmark 2026, BCG's payback survey, and Forrester's root-cause analysis—that just turned the "AI agent ROI" conversation from religion into arithmetic.

The numbers matter because every CFO is now asking the same two questions: When does this thing pay for itself, and how do I know we're not in the 22% that never does? The new data finally gives them defensible answers—if they know where to look. For CIOs, the same data is a forecasting tool. Pitch an SDR agent and you can credibly promise payback inside two quarters. Pitch an engineering agent and you're looking at three. Both are good investments. They just need different business cases.

This piece breaks down the new benchmarks by function, gives you an ROI calculator framework you can run today, and maps the five mistakes that push deployments into the negative-ROI bucket. Skip the part where you guess.

If you've been tracking the broader enterprise AI ROI crisis—where 95% of pilots fail to deliver measurable returns—the new benchmarks are how you escape it.

What Changed: The 2026 Payback Benchmarks

For two years, "AI agent ROI" was a vibes-based conversation. Vendors quoted hero metrics ("75% faster!"). CFOs countered with anecdata ("Uber blew through their whole 2026 budget by April"). Neither side had defensible benchmarks. The 2026 analyst cycle changed that.

Three new datasets dropped between Q1 and Q2 2026:

  • Bain Agentic AI Benchmark 2026 — Function-level payback medians across 800+ enterprise deployments
  • BCG Agentic AI Pulse 2026 — ROI realization rates at 6, 12, and 24-month marks
  • Forrester + Anaconda 2026 Survey — Root-cause analysis on negative-ROI deployments

The combined picture is sharper than anything we had in 2025. Here's what the data says.

The median payback period for an enterprise AI agent deployment is 5.1 months. That's the BCG/Forrester top-line, calculated across functions. But the function-level dispersion is enormous. Customer service agents pay back in 4.1 months. SDR (sales development) agents pay back in 3.4 months—the fastest in the dataset. Engineering agents take 9.3 months. Finance and operations agents land at 8.9 months. Legal hits 14.8 months. Clinical agents—the slowest—take 18.4 months (Bain Agentic AI Benchmark 2026).

Across the full dataset, 41% of deployments cross positive ROI within 12 months. Another 18% hit it inside six. But 22% are still underwater at the 12-month mark—and Forrester's root-cause work shows the failure pattern is almost never model capability (Forrester + Anaconda 2026 Survey).

The Forrester failure breakdown:

  • 41% of negative-ROI deployments traced back to unclear success criteria
  • 33% to insufficient tool or data access
  • 26% to drift in evaluation coverage (no automated evals on prompt/model changes)

That's a critical finding because it means the lever isn't model selection. It's operational discipline. The deployments that ship to production with a named owner, automated evals, and binary success criteria—those are the ones that hit payback inside two quarters. The deployments that drift into "let's see what it can do" land in the 22%.

Vendor agents reach positive ROI 2.4x faster than custom builds. Deloitte's State of Generative AI Q1 2026 puts vendor time-to-value at 29–41 days vs 89–118 days for in-house builds (Deloitte State of Generative AI Q1 2026). That gap is now too large to ignore on most use cases.

Why This Matters: Dual-Audience Implications

Technical Implications (CIO/CTO)

The function-level payback dispersion has direct architectural consequences. The fastest-paying functions—SDR, customer service, marketing operations—share three traits: high-volume repetitive tasks, well-instrumented success metrics, and existing integrations with systems-of-record (Salesforce, ServiceNow, HubSpot). The slowest-paying functions—legal, clinical, finance reconciliation—have low volume, high variance, and weaker instrumentation.

If you're a CIO sequencing your 2026 agent roadmap, the data tells you where to start. Don't lead with the highest-prestige use case. Lead with the function that pays back fastest, ships proof points, and funds the next wave. Forrester's data on rollback rates makes this concrete: agents shipped without automated evals roll back 47% of the time. Agents shipped with full eval coverage roll back 9% of the time. Eval coverage is now the single biggest predictor of production survival (Forrester Rollback Data).

There's a second technical implication around the vendor-vs-custom decision. The 2.4x time-to-value gap means the default answer is now "buy" unless you have a specific defensibility argument for "build." That flips the 2024 calculus, which favored custom because vendor agents were immature. Vendor agents like Sierra, Decagon, and Glean have matured enough that the custom premium is no longer worth the wait on most use cases.

Business Implications (CFO/COO/CMO)

For CFOs, this is the first dataset that supports defensible AI agent business cases. Until now, "show me the ROI" was a rhetorical question—the data didn't exist. Now you can underwrite a deployment against a known benchmark.

The CFO playbook flips three ways:

  1. Function-based budgeting. Don't budget by total AI spend. Budget by function-specific ROI expectations. An SDR agent that doesn't hit payback by month five is broken. A clinical agent that doesn't hit payback by month five is normal.

  2. Stage-gate by benchmark. If a deployment isn't tracking toward its function-level median by the 60-day mark, that's a signal to course-correct or kill. The data lets you set non-arbitrary stage gates.

  3. The 22% rule. Roughly one in five deployments will lose money at 12 months. Treat that as the failure rate in your portfolio math. Allocate accordingly.

For COOs, the productivity gains are now quantified by function. McKinsey's 2026 survey pegs the median knowledge worker at 6.4 hours saved per week, with customer service reps saving 8.7 hours and software engineers saving 11.3 hours (McKinsey Global AI Survey 2026). The question is no longer "do agents save time?" It's "did we capture the savings as P&L impact or did they evaporate into Slack?"

For CMOs, the marketing operations payback of 6.7 months is roughly mid-pack—not the fastest, not the slowest. But marketing also has the highest measurable cost-per-task reduction in some categories: long-form article drafting drops 156x ($640 to $4.10) and customer service tickets drop 9.1x ($4.18 to $0.46) (Master of Code 2026 Report).

Market Context: Why This Data Hit Now

The benchmarks arrived in the same quarter as a $950M raise by Sierra (Bret Taylor's enterprise AI agent platform, now at a $15B+ post-money valuation) and a $122B raise by OpenAI (TechCrunch, May 4, 2026). The capital surge is forcing the analyst community to publish defensible measurement frameworks. Without them, the next 18 months of agent deployments will run blind.

Sierra alone now claims 40% of the Fortune 50 as customers, with $150M+ in ARR after 24 months in market (CMSWire May 2026). Decagon, Glean, AI21, and Cognition (Devin, recently valued at $2.5B) round out the vendor side. On the hyperscaler side, Microsoft just shipped MAI-Code-1-Flash at Build 2026, Google launched a $100/month developer tier, and AWS continues to push Bedrock Agents. The vendor agent market is now real—and the new data on payback periods is the first apples-to-apples scoring system available to enterprise buyers.

Gartner's Q2 2026 forecast pushes the market toward 80% of enterprise applications embedding at least one AI agent by year-end, up from 33% in 2024. Production-grade deployments are also climbing fast: 9% in 2024 → 19% in 2025 → 31% in Q1 2026 (Gartner Q2 2026 Outlook). But Gartner's also warned that 40%+ of agentic AI projects will be canceled by end of 2027—driven by escalating costs, unclear ROI, and inadequate risk controls. The payback benchmarks are the missing measurement layer that determines which deployments survive the coming cull.

The IDC/McKinsey consensus forecast pegs total AI agent spend at $1.4 trillion by 2027 (IDC + McKinsey Consensus 2026). That's roughly the size of the entire global software industry in 2020. At that scale, "I'm sure it's working" is no longer an acceptable answer. The 22% failure rate, multiplied by $1.4T, is roughly $300B of negative-ROI spend over the next 24 months. That's the spend the payback benchmarks are designed to prevent.

For context on how token-side economics complicate this picture, see the $7M budget trap where token prices fell 98% but enterprise AI bills tripled. The payback benchmarks are the consumption-side counterweight to the cost-side explosion.

Framework #1: The Function-Level ROI Calculator

Use this framework to size any agent deployment against its 2026 benchmark. It works at three deployment sizes—small team pilot, mid-market scale-out, and enterprise rollout.

Inputs You Need

  1. Employee count affected by the deployment
  2. Fully-loaded hourly rate (use BLS 2026 default: base wage × 1.42)
  3. Current task time (minutes per transaction)
  4. Expected agent-assisted task time (use vendor benchmarks or 55% speedup default)
  5. Annual license + integration cost (use $50K SMB / $250K mid / $1.5M enterprise)
  6. Deployment scope (% of eligible workflows actually migrated)

Function-Specific Benchmarks (Bain 2026)

Function Median Payback Hours Saved/Week Cost-Per-Task Reduction
Sales Development (SDR) 3.4 months 5.4 4.8x
Customer Service 4.1 months 8.7 9.1x
Marketing Operations 6.7 months 6.1 12x (content)
IT Helpdesk 8.0 months 5.9 6.2x
Finance / Accounting 8.9 months 3.8 3.4x
Software Engineering 9.3 months 11.3 66x (code review)
Human Resources 11.2 months 4.6 2.4x
Legal 14.8 months 2.9 1.8x
Clinical 18.4 months 1.8 1.2x

Worked Example: SDR Agent for a Mid-Market SaaS

  • Inputs: 50 SDRs, $85/hour fully loaded, 35 min per outbound sequence (current), 12 min agent-assisted, $250K annual platform cost, 80% workflow scope
  • Annual time saved: 50 × 40 sequences/week × 52 weeks × 23 minutes × 0.80 / 60 = 31,893 hours
  • Gross savings: 31,893 × $85 = $2.71M
  • Net savings (after $250K license): $2.46M
  • Payback period: $250K ÷ ($2.46M ÷ 12) = 1.2 months

Even at the conservative end—50% adoption, 40% speedup—this deployment pays back in 3.1 months, right at the Bain SDR median.

Three-Size Sample Outputs

At the default 55% speedup and 80% scope:

Deployment Size Year-1 Net Benefit Payback 3-Year NPV (10% discount)
Small (50 employees) $2.46M 1.2 months $6.1M
Mid-market (500 employees) $26M 0.6 months $64M
Enterprise (5,000 employees) $268M 0.3 months $658M

Critical context: The single biggest swing factor isn't speedup percentage. It's deployment scope. McKinsey's 2025 finding—88% of enterprises pilot agents, only 6% scale them—means most AI budgets divide their projected savings by 4 or more in practice. A 25% scope deployment delivers roughly one-quarter of the modeled benefit. The ROI math doesn't fail. The rollout does.

Framework #2: The 5 Mistakes That Push You Into the 22%

Forrester's failure analysis identifies five recurring patterns in negative-ROI deployments. Each one is preventable—and each one maps to a specific operational fix.

Mistake #1: Unclear Success Criteria (41% of failures)

The pattern: Deployment ships without binary success metrics. Team "evaluates" the agent qualitatively. Six months in, nobody can answer "is this working?"

The fix: Pre-define 2–3 success metrics before deployment. Use binary thresholds, not directional ones. Example: "Resolve 65% of tier-1 tickets autonomously within 90 seconds, or roll back." That metric either ships or doesn't. There's no debate.

Mistake #2: Insufficient Tool or Data Access (33% of failures)

The pattern: Agent has the model but not the integrations. It can reason about the customer but can't actually update the CRM record or process the refund. Becomes an expensive Q&A chatbot.

The fix: Build the integration map before picking the agent. If the deployment requires access to four systems-of-record and you've only secured two, the deployment will fail. The model is the easy part. The plumbing is where ROI dies.

Mistake #3: Drift in Evaluation Coverage (26% of failures)

The pattern: Initial deployment ships with manual QA. Prompts change, models update, behavior drifts. Nobody re-runs the evals. Quality regresses silently for three months until a customer complaint surfaces.

The fix: Automated evals on every prompt or model change. Forrester's data is unambiguous: deployments with full eval coverage roll back at 9%. Deployments without roll back at 47%. The eval pipeline is a load-bearing piece of infrastructure, not a nice-to-have.
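
What "automated evals on every change" looks like in practice varies by stack. As a minimal sketch, assume the agent can be called as a function and each eval case pairs a prompt with a checker. The 95% pass bar, the toy agent, and all names here are assumptions for illustration, not figures from the Forrester data.

```python
def run_evals(agent, cases):
    """Return the fraction of (prompt, checker) cases the agent passes."""
    passed = sum(1 for prompt, check in cases if check(agent(prompt)))
    return passed / len(cases)


def gate_change(agent, cases, min_pass_rate=0.95):
    """Block a prompt or model change when the pass rate is below the bar."""
    rate = run_evals(agent, cases)
    return rate >= min_pass_rate, rate


# Toy stand-in: a real suite would call the deployed agent.
def toy_agent(prompt):
    if "password" in prompt:
        return "Your password reset link has been sent."
    return "I can't help with that."


cases = [
    ("reset my password", lambda out: "reset link" in out),
    ("forgot password", lambda out: "reset link" in out),
    ("cancel my order", lambda out: "refund" in out),
]
ok, rate = gate_change(toy_agent, cases)
print(f"pass rate {rate:.0%}, deploy allowed: {ok}")
```

Wiring a check like this into the pipeline that ships prompt and model updates is what turns evals from a launch task into standing infrastructure.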

Mistake #4: No Named Owner With Budget Authority

The pattern: Deployment is "shared ownership" between business unit and IT. When ROI questions arise, both sides assume the other is tracking. Neither is.

The fix: Name one owner. Give them the budget. Hold them accountable to the function-level benchmark. Of the 12% of pilots that successfully reach production, 94% have a named agent owner with budget authority (Forrester 2026 Success Factors). That's not a coincidence.

Mistake #5: Scope Creep Past the Original Workflow

The pattern: Agent ships scoped to one workflow ("resolve tier-1 password resets"). Stakeholders ask: "can it also do X, Y, Z?" Scope expands. Success criteria blur. Three months in, the agent is mediocre at six things instead of excellent at one.

The fix: Keep agents narrow until they hit their function-level payback benchmark. Then expand. Of production-grade agents, 81% are scoped to a single workflow with binary success criteria. Multi-workflow agents are an advanced pattern, not a starter pattern.

The combined rule: Run every deployment through a five-question gate before launch. Are the success metrics binary? Are all integrations live? Are automated evals running? Is there one named owner? Is the scope single-workflow? If any answer is no, you're queuing up to join the 22%.
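
The five-question gate can be written down as a checklist that refuses to pass unless every answer is yes. The class and field names below are illustrative.

```python
from dataclasses import dataclass, fields


@dataclass
class LaunchGate:
    binary_success_metrics: bool
    all_integrations_live: bool
    automated_evals_running: bool
    single_named_owner: bool
    single_workflow_scope: bool

    def failures(self):
        """Names of the gate questions answered 'no'."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


gate = LaunchGate(True, True, False, True, False)
blocked = gate.failures()
print("launch" if not blocked else f"hold: {blocked}")
```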

Case Study: Cigna + Sierra (8 Weeks, 80% Auth Time Reduction)

Cigna's deployment of Sierra's agent platform is a textbook example of the framework working as designed. Sierra's published case studies put the deployment at 8 weeks (56 days) from kickoff to production. That is longer than the 29–41 day vendor median for time-to-first-value, but well under the 89–118 days reported for in-house builds, and it put the agent in production well before the 4.1-month customer service payback benchmark (CMSWire May 2026).

The numbers:

  • 80% reduction in patient authentication time (the binary success metric)
  • Production scope: Patient authentication workflow only (single-workflow per Mistake #5)
  • 40% of inquiries resolved autonomously across the deployment
  • 28% improvement in issue resolution time
  • 19% increase in first-contact resolution

The deployment shipped narrow (authentication, not "patient experience"), had a binary success metric (auth time reduction percentage), and integrated with Cigna's existing systems-of-record before launch. That's the operational discipline that distinguishes the 12% that ship from the 88% that pilot forever.

Sierra's broader portfolio reinforces the pattern. Singtel hit 70%+ autonomous resolution within 10 weeks. Nordstrom launched its voice agent "Nora" in five weeks. The common thread isn't proprietary technology—it's narrow scope, binary metrics, and vendor-led integration speed.

What didn't work elsewhere: Uber's coding agent rollout produced positive task-level results (10% of code autonomously generated) but "blew through" the AI budget in the process. The difference was scope discipline, not model or vendor capability. The ROI math worked at the task level but failed at the portfolio level because the deployment expanded faster than the savings.

The takeaway: vendor agent ROI is now a process problem, not a technology problem. The vendors have shipped capable agents. The 22% failure rate sits inside the buyer organization. IBM, for comparison, realized $3.5B in cost savings with a 50% productivity increase using the same operational discipline (Sierra ROI methodology context; OneReach 2026).

The operationalization gap that kills pilots is the one Cigna closed with the same recurring pattern: narrow scope, named owner, binary metric.

What to Do About It

For CIOs (Next 30 Days)

  • Rebenchmark your active deployments against the Bain function-level medians. Anything tracking 2x slower than its function median is a candidate for course-correction or kill.
  • Audit your eval pipelines. If any production agent lacks automated evals on prompt and model changes, fix it this quarter. The 47% vs 9% rollback gap is too large to absorb.
  • Sequence by payback speed. Lead the 2026 roadmap with SDR, customer service, and marketing ops deployments. Use the savings to fund engineering and finance deployments later.
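
The rebenchmarking step reduces to a lookup against the function medians. The sketch below uses five of the Bain medians quoted in this piece; the deployment list and the function name are hypothetical.

```python
# Bain 2026 median payback in months, as quoted in this article.
BAIN_MEDIAN_MONTHS = {
    "sdr": 3.4,
    "customer_service": 4.1,
    "marketing_ops": 6.7,
    "finance": 8.9,
    "engineering": 9.3,
}


def needs_review(function, projected_payback_months, slowdown=2.0):
    """Flag a deployment tracking at least `slowdown` times its function median."""
    return projected_payback_months >= slowdown * BAIN_MEDIAN_MONTHS[function]


# Hypothetical portfolio: (function, projected payback in months).
deployments = [("sdr", 7.5), ("customer_service", 5.0), ("engineering", 20.0)]
flagged = [(fn, m) for fn, m in deployments if needs_review(fn, m)]
print(flagged)
```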

For CFOs (Next 90 Days)

  • Move from total AI spend to function-budgeted AI spend. Each function gets a payback expectation and a stage-gate. Anything missing its function median by 60 days triggers review.
  • Underwrite the 22% rule. Build the failure rate into your portfolio math. If you're running ten deployments, plan for two to lose money. Distribute the bets accordingly.
  • Track time savings to P&L impact, not Slack reports. Forrester's data on captured-vs-evaporated productivity is brutal. Most "hours saved" never materialize as P&L impact because the freed capacity isn't redirected.
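
The 22% rule can be turned into rough portfolio odds. The sketch below treats each deployment as an independent coin flip with a 22% chance of losing money, which is a simplifying assumption: real deployments inside one company share owners, data, and budgets, so their outcomes are likely correlated.

```python
from math import comb


def loss_distribution(n, p=0.22):
    """Binomial probabilities of k money-losing deployments out of n.

    Assumes deployments fail independently with the same probability p.
    """
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


n = 10
dist = loss_distribution(n)
expected = n * 0.22
p_three_or_more = sum(dist[3:])
print(f"expected losers: {expected:.1f}, P(3 or more): {p_three_or_more:.0%}")
```

Under this assumption, planning for exactly two losers out of ten is the central case, not the worst case: the chance of three or more is better than one in three.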

For Business Leaders (Next 120 Days)

  • Default to vendor over build. The 2.4x time-to-value gap is now decisive on most use cases. Build only when there's a specific defensibility or compliance argument.
  • Name single owners with budget authority. Shared ownership is a recurring failure mode in the Forrester data. Pick one accountable executive per deployment.
  • Ship narrow, expand later. Single-workflow scope is the operational pattern that distinguishes production agents from forever-pilots. Resist the "can it also do X" pressure during the first 90 days.

The data is finally good enough to act on. The 22% that lose money at 12 months aren't losing because the models can't do it. They're losing because the operating discipline isn't there. Both the measurement gap and the discipline gap are now fixable, if CIOs and CFOs treat the new benchmarks as a measurement contract, not a suggestion.



THE DAILY BRIEF

Enterprise AI insights for technology and business leaders, twice weekly.

thedailybrief.com

Subscribe at thedailybrief.com/subscribe for weekly AI insights delivered to your inbox.

LinkedIn: linkedin.com/in/rberi  |  X: x.com/rajeshberi

© 2026 Rajesh Beri. All rights reserved.
