Uber blew through its entire 2026 AI coding budget by April. Microsoft revoked developer access to Claude Code six months after launch. One Fortune 500 company racked up a $500 million AI bill in a single month after forgetting to set usage limits. These aren't horror stories from failed pilots. These are production deployments at some of the world's most sophisticated technology companies.
The pattern is repeating everywhere: per-token AI costs have collapsed 98% since late 2022, yet enterprise AI bills have tripled. The average enterprise AI budget has grown from $1.2 million in 2024 to $7 million in 2026, nearly a sixfold increase (about 480%). CFOs who thought they were buying efficiency are discovering they've purchased a consumption engine with no governor.
The Numbers That Don't Add Up
GPT-4-equivalent performance now costs roughly $0.40 per million tokens, down from $20 per million in late 2022. That's a 98% reduction in unit cost. By every traditional metric, total spending should be plummeting. Instead, it's exploding.
The culprit is volume, specifically the explosive growth of agentic AI tools. According to Nicholas Arcolano, head of research at engineering management platform Jellyfish, per-developer token consumption has risen roughly 18.6 times in nine months. A simple workflow in 2023 cost about $0.04 per interaction. An orchestrated agentic system in 2026 costs roughly $1.20—about 30 times more per task.
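The arithmetic behind these figures can be checked in a few lines. The sketch below uses only the numbers quoted above; the helper name `pct_change` is illustrative, and the implied token counts rest on the assumption that the 2023 task was billed at the late-2022 price and the 2026 task at today's price.

```python
def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new (negative means a drop)."""
    return (new - old) / old * 100

# Unit prices quoted in the article, in USD per million tokens.
price_2022 = 20.00
price_2026 = 0.40
unit_price_change = pct_change(price_2022, price_2026)

# Per-task costs quoted in the article, in USD.
simple_task_2023 = 0.04
agentic_task_2026 = 1.20
task_cost_multiple = agentic_task_2026 / simple_task_2023

# Implied tokens per task (assumption: each task is billed at the
# unit price of its own era).
tokens_2023 = simple_task_2023 / price_2022 * 1_000_000
tokens_2026 = agentic_task_2026 / price_2026 * 1_000_000
token_multiple = tokens_2026 / tokens_2023

print(f"Unit price change: {unit_price_change:.0f}%")
print(f"Cost per task multiple: {task_cost_multiple:.0f}x")
print(f"Implied tokens per task: {tokens_2023:,.0f} -> {tokens_2026:,.0f}")
print(f"Implied token volume multiple: {token_multiple:,.0f}x")
```

Under that assumption, a 98% price cut is overwhelmed by a roughly 1,500-fold rise in tokens consumed per task, which is how cheaper tokens and larger bills can coexist.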
Individual engineers at Microsoft were reportedly spending between $500 and $2,000 per month on tokens before the company pulled the plug. Engineers who used the most tokens were about twice as productive as lighter users, but they spent 10 times the tokens to get there. "Whether extreme spend pays off comes down to the ultimate business value of shipped code, which most companies still can't measure," Arcolano told TechCrunch.
The Agentic AI Consumption Problem
Agentic AI tools released since November 2025—including Anthropic's Claude Opus 4.5, OpenAI's GPT-5.1, and Google's Gemini 3 Pro—have fundamentally changed the economics of AI deployment. Unlike earlier assistive models that waited for explicit instructions, these agents orchestrate multi-step workflows, run background processes, and execute tasks in parallel. Each action consumes tokens.
A Priceline senior director told TechCrunch that a routine Cursor contract renewal came back four to five times more expensive than the previous year. "It's like the crack-cocaine epidemic," said Chris Reed, Priceline's senior director of IT finance. "They let you try it to get you hooked, and now you're kind of beholden to it."
Alexander Embiricos, OpenAI's head of enterprise, described the shift in customer conversations: "Six months ago, I would have a conversation with a customer and it would be all about 'What can it do? Is it good enough?' Now the conversations are about, 'We're spending so much. What visibility do you have? What token controls do you have?'"
From Token-Maxxing to Guardrails
J.R. Storment, executive director of the FinOps Foundation, described the inflection point bluntly: "In April and May, I started hearing from companies: 'Oh my god, we are 3x over our entire 2026 token budget and it's only April.' The whole conversation shifted from token-maxxing and 'go fast' to 'we need guardrails, how do we control this?'"
The challenge for CFOs and CIOs is that traditional cost management frameworks don't apply. According to Storment, cloud spending is a hundreds-of-millions-of-rows-per-month data problem, while token consumption is a trillions-of-rows-per-month data problem. Most enterprises don't have the infrastructure to track, attribute, or optimize spending at that granularity.
Priceline has begun placing token limits on certain groups and is already seeing discrepancies between vendor-reported usage and internal tracking data. This is the early-stage playbook: hard caps, internal monitoring tools, and manual reconciliation. It's not scalable, but it's the best option available until standardized tools emerge.
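The manual reconciliation step described above might look something like the following. This is a minimal sketch, not any company's actual tooling: the `reconcile` function, the `Discrepancy` record, the 5% tolerance, and the team names and token counts are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Discrepancy:
    team: str
    vendor_tokens: int
    internal_tokens: int
    relative_gap: float  # gap as a fraction of the larger figure

def reconcile(vendor: dict[str, int], internal: dict[str, int],
              tolerance: float = 0.05) -> list[Discrepancy]:
    """Flag teams whose vendor-reported and internally tracked token
    counts differ by more than `tolerance` (hypothetical 5% default)."""
    flagged = []
    for team in sorted(set(vendor) | set(internal)):
        v = vendor.get(team, 0)      # missing from invoice counts as 0
        i = internal.get(team, 0)    # missing from internal logs counts as 0
        baseline = max(v, i)
        gap = abs(v - i) / baseline if baseline else 0.0
        if gap > tolerance:
            flagged.append(Discrepancy(team, v, i, gap))
    return flagged

# Hypothetical monthly figures for illustration only.
vendor_report = {"search": 1_200_000, "payments": 800_000, "growth": 500_000}
internal_logs = {"search": 1_190_000, "payments": 650_000}

flagged = reconcile(vendor_report, internal_logs)
for d in flagged:
    print(f"{d.team}: vendor={d.vendor_tokens:,} internal={d.internal_tokens:,} "
          f"gap={d.relative_gap:.1%}")
```

Teams that appear on the invoice but not in internal logs surface with a 100% gap, which is often the first sign of untracked usage.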
The Business Case Problem: ROI vs. Productivity
The productivity gains are real but difficult to measure. Engineers using agentic tools ship code faster, but the correlation between token spend and business value is unclear. Most companies still can't connect AI spending to specific outcomes like revenue growth, customer retention, or operational efficiency.
This creates a strategic dilemma for CTOs and CFOs:
- For CTOs: How do you justify continued AI investment when you can't prove ROI? Productivity metrics (lines of code, pull requests merged) don't translate to business outcomes. You need frameworks that tie AI spend to feature delivery, customer impact, and competitive positioning.
- For CFOs: How do you budget for a cost center whose per-developer consumption grew 18.6x in nine months, with no ceiling in sight? Traditional CapEx/OpEx models don't work when consumption is both unpredictable and essential. You need cost-per-outcome metrics, not cost-per-token.
According to MIT's Project NANDA, 95% of enterprise generative AI pilots failed to deliver measurable ROI in 2025. McKinsey reported that 74% of enterprises struggle to scale AI beyond initial pilots. Gartner predicts that 40% of all agentic AI projects will be canceled by 2027. The pattern is clear: pilot projects are cheap, but production deployments are expensive and risky.
The Tokenomics Foundation: Industry Response
The Linux Foundation announced plans this week for the Tokenomics Foundation, a new standards body aimed at bringing the same cost discipline to AI tokens that FinOps brought to cloud spending. A formal launch is planned for July 2026.
The Foundation plans to build:
- A canonical definition of "tokenomics"
- Open standards for AI token usage and billing
- New metrics including cost-per-intelligence and tokens-per-watt
Nishant Gupta, chief availability officer at Salesforce, said in a statement that "token economics is fundamentally more abstract and opaque than anything we've managed at this scale before."
The challenge is timing. Goldman Sachs projects global token usage will multiply 24 times by 2030. The companies already over budget need solutions now, not in 18 months after standards are finalized and vendor support is implemented.
Model Routing: The Immediate Cost Lever
While the industry waits for standards, model routing has emerged as the primary cost optimization strategy. Factory, an enterprise AI coding startup, launched a model router this week that automatically picks the cheapest adequate model for each task. Instead of defaulting to GPT-4 or Claude Opus for every query, the system routes simple tasks to smaller models like Haiku or GPT-3.5.
Vitaly Gordon, CEO of Faros AI, said frontier labs are already doing this internally: "The financial report for how much you spend on Anthropic, even if you call the Opus model, some of the spend will be on Sonnet or Haiku, because they are smart enough to do it."
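The core idea of routing to the cheapest adequate model can be sketched in a few lines. This is not Factory's implementation, which the article does not describe: the model names, prices, capability tiers, and keyword heuristic below are all placeholder assumptions, and a production router would likely use a classifier or quality evaluations rather than keywords.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Model:
    name: str
    usd_per_million_tokens: float  # placeholder price, not a real quote
    capability: int                # hypothetical tier: 1 (basic) to 3 (frontier)

# Hypothetical catalog for illustration only.
CATALOG = [
    Model("small", 0.25, 1),
    Model("medium", 3.00, 2),
    Model("frontier", 15.00, 3),
]

def required_capability(task: str) -> int:
    """Crude keyword heuristic standing in for a real task classifier."""
    text = task.lower()
    if any(k in text for k in ("architecture", "refactor", "debug")):
        return 3
    if any(k in text for k in ("implement", "write tests", "review")):
        return 2
    return 1

def route(task: str, catalog: list[Model] = CATALOG) -> Model:
    """Return the cheapest model whose tier meets the task's needs."""
    need = required_capability(task)
    adequate = [m for m in catalog if m.capability >= need]
    return min(adequate, key=lambda m: m.usd_per_million_tokens)

for task in ("Rename this variable",
             "Implement pagination for the API",
             "Debug the race condition in the scheduler"):
    print(f"{task!r} -> {route(task).name}")
```

The design choice worth noting is that the router filters on adequacy first and only then minimizes price, so a cheap model is never chosen for a task it is judged unable to handle.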
This is the enterprise playbook right now:
- Model routing: Route tasks to the cheapest adequate model
- Hard caps: Set monthly or weekly token limits per team or developer
- Internal monitoring: Build custom tracking tools to reconcile vendor data
- Fine-tuned smaller models: Train task-specific models that cost less to run
- Value-based pricing: Negotiate outcome-based contracts instead of consumption-based
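Of the items in this playbook, hard caps are the simplest to prototype. The sketch below assumes an in-memory counter per team; the `TokenBudget` class, its method names, and the limits are hypothetical, and a real deployment would need persistent storage, period resets, and enforcement at an API gateway.

```python
class TokenBudget:
    """Per-team token cap that rejects requests once the limit is hit."""

    def __init__(self, limits: dict[str, int]):
        self._limits = dict(limits)
        self._used = {team: 0 for team in limits}

    def try_spend(self, team: str, tokens: int) -> bool:
        """Record the spend and return True, or return False (recording
        nothing) if it would push the team past its cap."""
        if team not in self._limits:
            raise KeyError(f"no budget configured for team {team!r}")
        if self._used[team] + tokens > self._limits[team]:
            return False
        self._used[team] += tokens
        return True

    def remaining(self, team: str) -> int:
        return self._limits[team] - self._used[team]

    def raise_limit(self, team: str, new_limit: int) -> None:
        """Limits can be raised later; spent tokens cannot be refunded."""
        if new_limit < self._limits[team]:
            raise ValueError("use a separate, audited path to lower limits")
        self._limits[team] = new_limit

# Hypothetical monthly caps for illustration only.
budget = TokenBudget({"platform": 1_000_000, "mobile": 250_000})
print(budget.try_spend("mobile", 200_000))  # within the cap
print(budget.try_spend("mobile", 100_000))  # would exceed the cap
print(budget.remaining("mobile"))
```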
The AT&T chief data officer noted in June 2026 that fine-tuned smaller language models will become crucial for mature AI enterprises due to their cost and performance advantages over larger general models.
What CFOs and CTOs Should Do Now
For CFOs:
- Shift from cost-per-token to cost-per-outcome: Measure AI spending against business results (revenue per feature, cost per processed claim, support tickets resolved per dollar). Token efficiency is a technical metric; business efficiency is what matters.
- Demand transparency from vendors: Your AI spend should be as auditable as your cloud spend. If your vendor can't provide usage data at the application or team level, that's a red flag.
- Budget for 3-5x consumption growth: Agentic AI adoption is accelerating. The $7M average in 2026 could be $20M by 2027. Build contingency into your budgets now.
- Pilot value-based pricing: Some vendors (Paid, Factory) are experimenting with outcome-based pricing instead of consumption-based. This shifts cost risk back to the vendor and aligns incentives.
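Two of the CFO recommendations above reduce to simple arithmetic. The sketch below projects the 3-5x growth range from the $7M average quoted in the article and computes a cost-per-outcome figure; the function names are illustrative, and the support-ticket numbers are invented purely to show the calculation.

```python
def project_budget(current_usd: float,
                   growth_multiples: tuple[int, ...] = (3, 5)) -> dict[int, float]:
    """Project next year's spend under each assumed growth multiple."""
    return {m: current_usd * m for m in growth_multiples}

def cost_per_outcome(ai_spend_usd: float, outcomes: int) -> float:
    """AI spend divided by a count of business outcomes
    (tickets resolved, claims processed, features shipped)."""
    if outcomes <= 0:
        raise ValueError("outcomes must be positive")
    return ai_spend_usd / outcomes

# $7M is the 2026 average quoted in the article.
scenarios = project_budget(7_000_000)
for multiple, usd in scenarios.items():
    print(f"{multiple}x growth -> ${usd:,.0f}")

# Hypothetical example: $50,000 of AI spend, 12,500 tickets resolved.
ticket_cost = cost_per_outcome(50_000, 12_500)
print(f"Cost per resolved ticket: ${ticket_cost:.2f}")
```

The low end of the range lands near the article's $20M figure for 2027; the high end is the number a contingency budget would need to absorb.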
For CTOs:
- Implement model routing immediately: Don't default to the most expensive model for every task. Build routing logic that balances cost and quality. Open-source tools like LiteLLM and Portkey can help.
- Track productivity AND cost together: You need metrics that connect AI spend to engineering output. Tools like Jellyfish, Waydev, and Faros AI provide agent monitoring to prove ROI.
- Build token observability: Treat token consumption like you treat cloud spending. Datadog and New Relic have added token-level observability. Use it.
- Set guardrails before budget blowouts: Hard caps at the team level prevent runaway spending. You can always raise limits; you can't un-spend money.
- Invest in fine-tuned models: A task-specific model that costs 10% as much to run will pay for itself quickly at production scale.
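How quickly a fine-tuned model "pays for itself" depends on the up-front training cost and current inference spend, neither of which the article specifies. The sketch below works through the break-even under the 10% running-cost figure quoted above; the function name and the dollar amounts are hypothetical.

```python
def breakeven_months(training_cost_usd: float,
                     monthly_inference_usd: float,
                     cost_ratio: float = 0.10) -> float:
    """Months until the one-time training cost is recovered, given that
    the fine-tuned model costs `cost_ratio` of the current monthly bill."""
    monthly_savings = monthly_inference_usd * (1 - cost_ratio)
    if monthly_savings <= 0:
        raise ValueError("fine-tuned model must be cheaper to run")
    return training_cost_usd / monthly_savings

# Hypothetical inputs: $90,000 to fine-tune, $50,000/month current spend.
months = breakeven_months(90_000, 50_000)
print(f"Break-even after {months:.1f} months")
```

This ignores ongoing retraining and hosting overhead, so it should be read as a lower bound on the payback period rather than a forecast.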
The Strategic Shift: From Experimentation to Discipline
The era of "let's try everything" is over. The next phase of enterprise AI is about cost discipline, measurable ROI, and strategic vendor selection. Companies that treat AI as a line item in their innovation budget will lose to companies that treat it as a core business function with rigorous financial controls.
As Faros AI's Gordon put it: "Maybe we created a steam engine, but we still haven't figured out the assembly line."
The assembly line is coming. The Tokenomics Foundation, model routing, fine-tuned models, and value-based pricing are the first iterations. But CFOs and CTOs can't wait for industry standards to mature. The budget blowouts are happening now.
The companies that will win are the ones that build cost discipline into their AI strategy from day one—not as an afterthought, but as a core competitive advantage.
