The enterprise AI bill has arrived. The average large organization now spends $186 million annually on AI. Token consumption is up 13x since January 2025. Headcount dedicated to AI engineering has doubled in six months.
Yet when CFOs ask the obvious question—what did we get back?—most can't answer.
Only 8% of enterprises report achieving meaningful business returns from AI, according to KPMG's latest study. Deloitte found that 74% of organizations deployed AI to grow revenue, but only 20% have seen it happen. The disconnect isn't subtle. It's a $186 million measurement crisis that just became the #1 boardroom issue for 2026.
Botanu, a startup that emerged from stealth on June 11, 2026, positions itself as "the COO for your AI agents." Its launch thesis: enterprises aren't failing because AI doesn't work. They're failing because they can't pinpoint where their agents are actually delivering results.
The problem isn't adoption. It's instrumentation. And the companies that fix it first will separate AI spending from AI waste.
The Measurement Problem: Activity ≠ Outcome
In the cloud era, every dollar spent could be traced to the workload, team, or product that drove it. Capacity scaled predictably. Costs increased with usage, but the math was linear and auditable.
AI breaks that model entirely.
The same task—qualifying a sales lead, summarizing a support ticket, generating a contract clause—can produce wildly different costs from one run to the next. Pricing has shifted from flat per-seat subscriptions to usage-based models that push volatility onto the buyer's invoice. By the time the bill arrives, no one can tell which agents were actually worth it.
"A single agent's cost is scattered across systems, each metered differently, each owned by a different team," said Deborah Jacob, Botanu's co-founder and CTO. "No one can see what one agent actually costs. But cost is only half the problem. The value an agent creates is just as scattered as its spend."
Here's what that looks like in practice:
- Model API costs: Anthropic, OpenAI, Google, Cohere—each with different pricing tiers, batching discounts, and caching strategies
- Infrastructure: Compute (GPU/TPU hours), storage (vector databases, embeddings), orchestration (agent frameworks)
- Tooling: Observability platforms, guardrail systems, retrieval augmentation services
- Labor: Prompt engineering, fine-tuning, evaluation, incident response
An enterprise CIO recently told me their AI spending was "somewhere between $4 million and $22 million" for Q1 2026. The 5.5x gap between the low and high estimates wasn't rounding error; it was instrumentation failure. They genuinely didn't know.
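To make the attribution gap concrete, here is a minimal sketch of a per-agent cost rollup. It assumes every system can export cost records tagged with an agent ID, which is exactly what most enterprises lack today. The record layout, agent names, and dollar amounts are all hypothetical.

```python
from collections import defaultdict

# Hypothetical cost records exported from separately metered systems.
# Every name and amount is illustrative, not real data.
cost_records = [
    {"agent_id": "sdr-agent", "category": "model_api", "usd": 1800.0},
    {"agent_id": "sdr-agent", "category": "infrastructure", "usd": 950.0},
    {"agent_id": "sdr-agent", "category": "tooling", "usd": 400.0},
    {"agent_id": "support-agent", "category": "model_api", "usd": 3200.0},
    {"agent_id": "support-agent", "category": "labor", "usd": 2500.0},
    {"agent_id": None, "category": "infrastructure", "usd": 6000.0},  # untagged spend
]


def rollup_by_agent(records):
    """Sum spend per agent; spend with no agent tag lands in 'unattributed'."""
    totals = defaultdict(float)
    for record in records:
        key = record["agent_id"] or "unattributed"
        totals[key] += record["usd"]
    return dict(totals)


totals = rollup_by_agent(cost_records)
unattributed_share = totals.get("unattributed", 0.0) / sum(totals.values())

print(totals)
print(f"Unattributed share of spend: {unattributed_share:.0%}")
```

The unattributed share is the number to watch: the larger it is, the wider the range a CIO has to quote when asked what AI actually cost.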
The Other Half: Measuring What Agents Delivered
Cost opacity is fixable with better telemetry. The harder problem is proving the agent created value commensurate with its cost.
Most AI observability tools measure activity metrics:
- Tokens consumed
- API calls made
- Latency per request
- Error rates
- Tool invocations
These are engineering KPIs. They tell you an agent was busy. They don't tell you if it closed the deal.
Botanu's co-founder and CEO Alina Vrsaljko calls this "token-maxxing"—optimizing for throughput, not outcomes. "A sales agent's job isn't to make calls," Vrsaljko said. "It's to create qualified leads. You should performance-manage it, not just cost-manage it."
The outcome measurement gap shows up in three forms:
- Revenue contribution: Did the agent increase pipeline, close deals, or upsell existing accounts?
- Cost avoidance: Did it eliminate manual labor, prevent churn, or reduce support escalations?
- Time compression: Did it accelerate contract review, deployment timelines, or compliance audits?
For a CFO, the question is singular: did the outcome justify the cost? That can't be answered from telemetry alone. The value an agent creates lives in the CRM, the ticketing system, the legal document repository—business systems that don't natively talk to AI observability platforms.
Why 72% of CEOs Now Own the AI Decision
BCG reported that 72% of CEOs now personally own AI investment decisions—up from 34% in 2024. That shift reflects urgency, but also accountability.
When the CTO owned AI, the success metric was deployment velocity: how many tools piloted, how fast rollouts happened. When the CEO owns it, the metric is business impact—and most don't have the instrumentation to measure it.
"We're spending confidently on AI," said M.G. Thibault, CFO-in-residence at Scale Venture Partners and leader of the Coterie CFO community. "What we're missing is a way to measure it that every CFO would recognize; a real KPI, not usage stats. That's the open space right now."
The urgency is financial, not philosophical. Ray Rike, CEO of Benchmarkit and host of the "AI to ROI" podcast, puts it bluntly: "Enterprises are running out of budget before they run out of enthusiasm. The discipline that's missing is simple to say and hard to do: measure outcomes, not activity, and connect costs to returns."
Outcome-Maxxing: Botanu's Framework
Botanu's platform reconstructs an agent's full digital footprint—across every model vendor, tool, and infrastructure layer—by reading telemetry, the systems-level record of activity. It then ties that footprint to where outcomes actually land: the CRM, the support ticketing system, the billing platform.
The comparison metric is labor equivalence: what the same job would cost a person to do.
Example scenario (sales development agent):
- Agent cost: $47,000/year (API tokens + infrastructure + orchestration)
- Human equivalent: $180,000/year (SDR salary + benefits + overhead)
- Outcome: 1,200 qualified leads generated (vs. 900 from human SDR)
- ROI: 33% more output at 74% lower cost = $133,000 in annual cost savings, before counting the value of the 300 extra leads
Example scenario (customer support agent):
- Agent cost: $92,000/year
- Human equivalent: 3 FTEs at $65,000 each = $195,000/year
- Outcome: 14,500 tickets resolved autonomously (87% resolution rate, 4.2/5 CSAT)
- ROI: 87% of workload automated at 53% lower cost = $103,000 in annual cost savings
The difference between these examples and current practice: the outcome comes from the business system that owns it, not from what the agent reports about itself.
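The arithmetic behind both scenarios can be checked in a few lines. This is a sketch of a labor-equivalence comparison using only the figures quoted above; the class and function names are illustrative and are not Botanu's API.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    agent_cost: float  # annual, USD
    human_cost: float  # annual, USD

    @property
    def savings(self) -> float:
        """Labor-equivalent cost minus agent cost."""
        return self.human_cost - self.agent_cost

    @property
    def cost_reduction(self) -> float:
        """Savings as a fraction of the human-equivalent cost."""
        return self.savings / self.human_cost


def output_lift(agent_output: float, human_output: float) -> float:
    """Fractional increase in output relative to the human baseline."""
    return agent_output / human_output - 1


sales = Scenario("sales development", agent_cost=47_000, human_cost=180_000)
support = Scenario("customer support", agent_cost=92_000, human_cost=3 * 65_000)

for s in (sales, support):
    print(f"{s.name}: {s.cost_reduction:.0%} lower cost, ${s.savings:,.0f} saved")
print(f"sales output lift: {output_lift(1_200, 900):.0%}")
```

Note that the dollar figure in each scenario is the cost difference alone. In the sales case, the 300 additional leads are upside on top of that number.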
Deborah Jacob, Botanu's CTO, explained: "Activity is not outcome. A thousand tokens and ten tool calls tell you an agent was busy—not whether it closed the deal. We measure the result the business actually recorded, and weigh it against what it cost to get there. That's the one number a CFO can act on."
The CFO Playbook: 4 Steps to Fix ROI Measurement
If you're a CFO or finance leader accountable for AI spend with no ROI proof, here's the practical path forward:
1. Treat AI Agents Like Headcount, Not Software Licenses
Shift the mental model: An AI agent is a hire, not a subscription. You wouldn't pay an employee $180,000/year without tracking their output—don't do it for agents either.
Action: For every agent in production, define:
- Job description: What specific business outcome is this agent responsible for?
- Success metric: How do you measure whether it delivered that outcome? (Revenue, tickets closed, contracts reviewed, compliance flags raised)
- Cost cap: What's the maximum you'd pay for this outcome? (Use human labor cost as the baseline)
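One way to make those three fields enforceable is to store them as a record and check actual spend against the cap. The sketch below is hypothetical: the `AgentCharter` name and its fields are assumptions, and the cap is derived from the human SDR figures in the earlier sales example ($180,000 for 900 leads, or $200 per lead).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCharter:
    """Hypothetical 'job description' record for one production agent."""
    agent_id: str
    job_description: str
    success_metric: str
    cost_cap_per_outcome: float  # USD; here set to the human cost per outcome

    def within_cap(self, total_cost: float, outcomes: int) -> bool:
        """True if the agent's cost per recorded outcome is at or under the cap."""
        if outcomes <= 0:
            return False  # no recorded outcomes means the cap cannot be met
        return total_cost / outcomes <= self.cost_cap_per_outcome


sdr_charter = AgentCharter(
    agent_id="sdr-agent",
    job_description="Generate qualified leads for the sales team",
    success_metric="qualified leads recorded in the CRM",
    cost_cap_per_outcome=180_000 / 900,  # human SDR baseline: $200 per lead
)

print(sdr_charter.within_cap(total_cost=47_000, outcomes=1_200))
```

A cap expressed per outcome, rather than per year, keeps the comparison fair when the agent produces more or fewer results than the human baseline.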
2. Instrument Business Systems, Not Just AI Telemetry
Current state: Most enterprises have excellent AI observability (tokens, latency, errors) but zero connection to business outcomes.
Action: Integrate AI activity logs with:
- CRM (Salesforce, HubSpot): Track which deals, leads, or accounts the agent touched
- Support systems (Zendesk, ServiceNow): Track which tickets the agent resolved autonomously
- Financial systems (NetSuite, Workday): Track cost savings, contract value, compliance automation
Example: If a contract review agent processes 400 agreements/month, connect that activity to the Legal Ops system that tracks cycle time, error rate, and value-at-risk reduction. The CFO can then compare agent cost ($8,000/month) to the outcome (40 hours of paralegal time saved = $12,000/month at $300/hour blended rate).
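A minimal sketch of that integration follows, assuming the agent's activity log and the Legal Ops system share a contract ID. All records and field names are hypothetical. The point is that hours saved are read from the system of record, and anything the agent touched that the system of record cannot confirm is flagged rather than counted.

```python
# Hypothetical agent activity log: what the agent says it processed.
agent_log = [
    {"agent_id": "contract-review", "contract_id": "C-101"},
    {"agent_id": "contract-review", "contract_id": "C-102"},
    {"agent_id": "contract-review", "contract_id": "C-103"},
    {"agent_id": "contract-review", "contract_id": "C-104"},
]

# Hypothetical Legal Ops records: what the business actually recorded.
legal_ops = {
    "C-101": {"status": "approved", "paralegal_hours_saved": 0.5},
    "C-102": {"status": "approved", "paralegal_hours_saved": 0.25},
    "C-103": {"status": "rework", "paralegal_hours_saved": 0.0},
    # C-104 is absent: the agent touched it, but no outcome was recorded.
}


def confirmed_outcomes(log, system_of_record):
    """Join agent activity to business records; return (hours saved, unverified IDs)."""
    hours = 0.0
    unverified = []
    for entry in log:
        record = system_of_record.get(entry["contract_id"])
        if record is None:
            unverified.append(entry["contract_id"])
        elif record["status"] == "approved":
            hours += record["paralegal_hours_saved"]
    return hours, unverified


hours_saved, unverified = confirmed_outcomes(agent_log, legal_ops)
print(hours_saved, unverified)

# Monthly comparison using the figures from the example above.
value_per_month = 40 * 300  # hours saved x blended hourly rate
agent_cost_per_month = 8_000
print(f"Value-to-cost ratio: {value_per_month / agent_cost_per_month:.2f}x")
```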
3. Separate "Token-Maxxing" From "Outcome-Maxxing"
Token-maxxing = engineering efficiency: Reducing API costs, optimizing prompts, caching embeddings, batching requests.
Outcome-maxxing = business efficiency: Producing more results (qualified leads, resolved tickets, approved contracts) per dollar spent.
Both matter, but they're not the same KPI. Token-maxxing is cost optimization. Outcome-maxxing is value creation.
Action: Create two dashboards:
- Engineering dashboard: Track cost-per-task, latency, error rates, token efficiency (owned by CTO)
- Business dashboard: Track outcome-per-dollar, labor equivalence, ROI by agent (owned by CFO)
The business dashboard is what goes to the board. The engineering dashboard is how you improve it.
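The two views can rank the same agents in opposite order. The sketch below uses two hypothetical agents with invented numbers to show how cost-per-task and cost-per-outcome diverge.

```python
# Hypothetical monthly figures for two agents; all numbers are illustrative.
agents = {
    "agent-a": {"cost_usd": 5_000, "tasks": 10_000, "outcomes": 50},
    "agent-b": {"cost_usd": 4_000, "tasks": 2_000, "outcomes": 80},
}


def cost_per_task(stats):
    """Engineering view: spend per unit of activity."""
    return stats["cost_usd"] / stats["tasks"]


def cost_per_outcome(stats):
    """Business view: spend per result recorded by the business system."""
    return stats["cost_usd"] / stats["outcomes"]


best_for_engineering = min(agents, key=lambda name: cost_per_task(agents[name]))
best_for_business = min(agents, key=lambda name: cost_per_outcome(agents[name]))

for name, stats in agents.items():
    print(f"{name}: ${cost_per_task(stats):.2f}/task, ${cost_per_outcome(stats):.2f}/outcome")
print(best_for_engineering, best_for_business)
```

Here agent-a is four times cheaper per task, yet agent-b delivers each outcome at half the cost. An organization that only watches the engineering view would back the wrong agent.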
4. Kill Agents That Don't Pay for Themselves
Most enterprises run pilots indefinitely without forcing a promotion-or-kill decision. If an agent can't prove ROI after 90 days in production, shut it down or redesign it.
Action: Set a quarterly AI agent review with three outcomes:
- Promote: Agent delivering >2x ROI → expand scope or deploy to more teams
- Redesign: Agent delivering 0.5-2x ROI → fix the job description, tooling, or data access
- Kill: Agent delivering <0.5x ROI → decommission and reallocate budget
Example: A lead qualification agent costs $52,000/year but only generates 340 qualified leads (vs. a human SDR baseline of 900 leads at $180,000/year, as in the earlier sales example). That's 38% of expected output at 29% of human cost. Per lead, the agent is cheaper (roughly $153 vs. $200), but it falls well short of the volume the role requires. Options: (1) Fix the agent's data access to LinkedIn + CRM, (2) Redesign the job description to focus on inbound lead enrichment instead of outbound prospecting, or (3) Kill it and reallocate $52,000 to agents with proven ROI.
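The review rule can be sketched as a small function. The article does not define how the ROI multiple is calculated, so this sketch assumes one definition: labor-equivalent value (outcomes multiplied by the human cost per outcome) divided by agent cost. The $200-per-lead baseline comes from the earlier sales example ($180,000 for 900 leads). The function names are hypothetical.

```python
def roi_multiple(outcomes: int, human_cost_per_outcome: float, agent_cost: float) -> float:
    """Assumed definition: labor-equivalent value of outcomes divided by agent cost."""
    return outcomes * human_cost_per_outcome / agent_cost


def review_decision(roi: float) -> str:
    """Apply the thresholds above: >2x promote, <0.5x kill, otherwise redesign."""
    if roi > 2.0:
        return "promote"
    if roi < 0.5:
        return "kill"
    return "redesign"


HUMAN_COST_PER_LEAD = 180_000 / 900  # $200, from the human SDR baseline

underperformer = roi_multiple(340, HUMAN_COST_PER_LEAD, 52_000)
strong_agent = roi_multiple(1_200, HUMAN_COST_PER_LEAD, 47_000)

print(f"{underperformer:.2f}x -> {review_decision(underperformer)}")
print(f"{strong_agent:.2f}x -> {review_decision(strong_agent)}")
```

Under this assumed definition, the 340-lead agent lands at about 1.3x, which puts it in the redesign tier rather than the kill tier. A different ROI definition, such as one that penalizes missed volume targets, could move it.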
The Business Case: What Good Looks Like
Gurpreet Bal, CIO at BHI, summarized the urgency: "As we start running AI agents in production, proving the ROI of each one becomes immensely challenging."
Challenging, but not optional. The enterprises that crack ROI measurement in 2026 will gain three strategic advantages:
- Board confidence: CFOs can justify AI budgets with the same rigor as headcount or capital expenditures
- Optimization velocity: Kill underperforming agents faster, double down on high-ROI agents earlier
- Vendor leverage: Negotiate usage-based pricing with data on actual business value created (not just tokens consumed)
The companies that don't crack it will hit a budget ceiling. When AI spend reaches $200M-$300M annually with no ROI proof, finance teams start cutting indiscriminately. High-value agents get killed alongside low-value ones because no one can tell the difference.
Botanu's emergence signals that the market has recognized this gap. Whether its platform becomes the standard or competitors take the lead, the discipline itself (connecting AI cost to business outcome at the agent level) is now table stakes for enterprise AI.
The Bottom Line for CFOs and CIOs
The $186 million AI spend isn't the problem. The problem is not knowing whether you got $3 of value for every $1 you spent.
Here's the new CFO playbook:
- Treat agents like headcount (job descriptions, success metrics, cost caps)
- Instrument business systems, not just AI telemetry
- Separate token-maxxing (cost optimization) from outcome-maxxing (value creation)
- Kill agents that don't pay for themselves (quarterly review, promote/redesign/kill)
The measurement crisis is solvable. The tools are emerging. The discipline is definable. The enterprises that fix this in 2026 will separate AI spending from AI waste—and build the instrumentation to prove it to the board.
Sources
- Botanu Emerges from Stealth, Reveals Enterprises Spend $186M Annually on AI With Little Proof of ROI - GlobeNewswire, June 11, 2026
- AI Pulse - KPMG
- State of AI in the Enterprise - Deloitte
