Token Prices Fell 98%. Enterprise AI Bills Tripled. Here's Why.

Uber burned through its entire 2026 AI budget by April. Microsoft is canceling Claude Code licenses by June 30. One company ran up a $500 million Claude bill in a single month. Per-token prices have fallen 98% since 2022, but enterprise AI bills have risen 320%. The agentic multiplier — where autonomous AI tools consume 18.6x more tokens per developer — is creating the biggest cost crisis in enterprise technology since cloud computing. This article maps the five-layer Enterprise AI Cost Governance Stack and provides a FinOps Maturity Assessment for organizations navigating the token economy.

By Rajesh Beri·June 9, 2026·13 min read

THE DAILY BRIEF

Uber burned through its entire 2026 AI budget by April. Microsoft is canceling Claude Code licenses across its Windows and Office division by June 30. One unnamed company ran up a $500 million Claude bill in a single month after forgetting to set usage limits.

These aren't edge cases. They're the first tremors of the biggest cost crisis in enterprise technology since the early days of cloud computing. And the paradox at the center of it should terrify every CIO who signed off on an AI budget this year: per-token prices have fallen 98% since 2022, but enterprise AI bills have risen 320%.

The math that was supposed to make AI cheaper just made it more dangerous.

I've spent the past two weeks tracking this crisis across the industry. The data is worse than the headlines suggest. Here's what's actually happening, why the old playbooks won't work, and what the companies that survive this will do differently.

The Consumption Trap: When Cheaper Means More Expensive

Let's start with the number that explains everything else.

In late 2022, GPT-4-equivalent performance cost roughly $20 per million tokens. Today, it costs about $0.40 per million tokens. That's a 98% reduction. By every traditional IT budgeting model, enterprise AI bills should have collapsed.

Instead, the average enterprise AI budget has grown from $1.2 million per year in 2024 to $7 million in 2026 (FinOps Foundation, State of FinOps 2026). And 73% of enterprises report their actual AI costs exceeded even those inflated projections (FinOps X 2026 Keynote).

The culprit isn't the price per token. It's the number of tokens per task.

A standard linear AI workflow in 2023 — summarize this document, answer this question — consumed about $0.04 worth of tokens per interaction. A 2026 agentic system — where AI agents plan, execute multi-step workflows, call tools, and coordinate with other agents — costs roughly $1.20 per interaction. That's a 30x increase per task (The Next Web).
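The two headline figures can be reconciled with a little arithmetic. The sketch below is illustrative only: it assumes a bill is simply a blended per-token price multiplied by tokens consumed, and the function name is hypothetical. Under that simplification, it backs out how much token volume must have grown for bills to rise while prices fell.

```python
def implied_volume_multiplier(price_drop_pct: float, bill_rise_pct: float) -> float:
    """Token-volume growth implied by a price drop and a bill increase.

    Assumes bill = blended price per token * tokens consumed (a simplification).
    """
    price_factor = 1 - price_drop_pct / 100   # a 98% drop leaves 0.02 of the old price
    bill_factor = 1 + bill_rise_pct / 100     # a 320% rise is 4.2x the old bill
    return bill_factor / price_factor


# Figures quoted in this article: prices down 98%, bills up 320%.
volume = implied_volume_multiplier(98, 320)
print(f"Implied token volume growth: {volume:.0f}x")

# Per-interaction costs quoted in this article: $0.04 (2023) vs. $1.20 (2026).
print(f"Per-task cost increase: {1.20 / 0.04:.0f}x")
```

If the simplification holds, a 98% price cut paired with a 320% bill increase implies token volume grew by roughly two orders of magnitude, which is the consumption trap in one number.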

Per-developer token consumption has risen 18.6x in nine months, driven by agentic coding tools released since November 2025: Anthropic's Claude Code, Cursor's agent mode, and similar tools that don't just suggest code — they write, test, debug, and refactor autonomously. Each autonomous loop burns through tokens at a rate that would have been inconceivable under the old chat-and-respond model.

Here's the trap: engineers who consume the most tokens are 2x more productive. But they spend 10x more tokens to get there. The ROI is positive — individually. At enterprise scale, with thousands of engineers, the aggregate bill is catastrophic.

The Body Count: Uber, Microsoft, and the $500 Million Month

Uber: Budget Gone by April

Uber deployed Claude Code and Cursor to approximately 5,000 engineers and did what any aggressive tech company would do — encouraged maximum adoption. Internal leaderboards ranked AI usage competitively. The culture was "use AI as much as possible."

It worked. By April, Uber's CTO Praveen Neppalli Naga told leadership he'd blown through the entire 2026 AI tools budget. Per-engineer API costs were running between $500 and $2,000 per month, with monthly usage rates hitting 84-95% across the engineering organization (TechCrunch).

Uber's response: a $1,500 monthly cap per employee per agentic coding tool. Every engineer now has a usage dashboard. Exceeding the cap requires a formal request.

The cap is a band-aid. It tells you Uber doesn't have a cost governance framework — it has a spending ceiling bolted on after the fact. There's a difference.

Microsoft: Killing Claude Code by June 30

The irony is thick. Microsoft — the company that invested $13 billion in OpenAI and sells Copilot as the future of AI-assisted development — had to cancel Anthropic's Claude Code licenses because its own engineers preferred it. Claude Code had become "perhaps a little too popular" inside Microsoft's Experiences and Devices division, the group responsible for Windows, Microsoft 365, Outlook, Teams, and Surface (The Next Web).

On May 14, engineers across the division received notice: Claude Code licenses expire June 30. Switch to GitHub Copilot CLI.

The official reason is "toolchain unification." The actual reason is fiscal-year-end accounting. When thousands of engineers use a competitor's token-billed product daily, every prompt, code review, and debugging session compounds into a line item that surprises finance teams at exactly the wrong moment.

The $500 Million Month

Then there's the company that makes Uber and Microsoft look disciplined.

An unnamed enterprise — Axios broke the story, and speculation centers on a major tech company, possibly Amazon — ran up a $500 million Claude bill in a single month. The root cause: no usage limits were set on employee licenses. Developers ran extended autonomous coding sessions. AI agents executed chained workflows. Employees used expensive frontier models for tasks that didn't require them — including, reportedly, checking the weather (Tom's Hardware).

Five hundred million dollars. One month. No guardrails.

Chris Reed, Senior Director of IT Finance at Priceline, described the broader situation to TechCrunch with a comparison that's uncomfortable but accurate: "It's like the crack-cocaine epidemic." Priceline's own Cursor contract renewal came back 4-5x more expensive than the original deal (TechCrunch).

Why "Just Set a Budget" Doesn't Work: The Agentic Multiplier

To understand why traditional IT budgeting fails for AI, you need to understand what changed in November 2025.

Before November 2025, most enterprise AI was chat-based. A developer asked a question, got an answer, maybe iterated once or twice. Token consumption was predictable — roughly proportional to the number of employees using the tool.

Then agentic coding tools launched. Anthropic released Claude Code. Cursor shipped agent mode. OpenAI pushed GPT-5.1 with extended autonomous execution. Google released Gemini 3 Pro with multi-step tool use. Suddenly, one developer action — "refactor this module" or "add tests for this service" — could trigger dozens of sequential LLM calls as the agent planned, executed, tested, debugged, and iterated.

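Consumption stopped scaling linearly with headcount. It now scales with headcount times the number of model calls each action triggers, and that number can run into the dozens. It hit enterprise budgets like a freight train because most 2026 AI budgets were built on 2024 chat-era consumption patterns and then applied to agentic workloads.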

Here's a concrete example of the math. If you have 5,000 engineers (Uber's scale) and each generates an average of $1,000/month in token costs (a round figure inside the reported $500 to $2,000 range), that's $5 million per month, or $60 million annually. That covers coding tools alone, before you count any other AI workloads. And that's at 2026 prices. Goldman Sachs projects global token usage will multiply 24x by 2030. If the pattern holds, we're not at the end of the cost crisis. We're at the beginning.
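To see how sensitive that total is to the per-engineer figure, here is a minimal sketch that reruns the same multiplication across the reported $500 to $2,000 range. The function name is hypothetical, and the model assumes every engineer spends the same amount every month, which real usage will not match.

```python
def annual_tool_spend(engineers: int, monthly_cost_per_engineer: float) -> float:
    """Annual spend if every engineer incurs the same monthly token cost."""
    return engineers * monthly_cost_per_engineer * 12


ENGINEERS = 5_000  # Uber's approximate scale, as reported in this article

for label, monthly in (("low", 500), ("example", 1_000), ("high", 2_000)):
    total = annual_tool_spend(ENGINEERS, monthly)
    print(f"{label:>7}: ${monthly:,}/engineer/month -> ${total / 1e6:.0f}M per year")
```

At the low end of the range the annual figure is $30 million, and at the high end it is $120 million, so the per-engineer average is the number a budget owner most needs to measure rather than guess.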

What we're witnessing also has a historical name: the Jevons Paradox. In 1865, economist William Stanley Jevons observed that as coal-fired steam engines became more efficient, total coal consumption increased rather than decreased. Efficiency made coal useful for more applications, which drove aggregate demand beyond what the efficiency gains saved. Tokens follow the same pattern: every 10x reduction in per-token cost unlocks a new class of AI use cases (first simple chat, then autonomous agents, then multi-agent orchestration), and each class consumes orders of magnitude more tokens per task.

The companies that recognized this dynamic early built cost governance into their AI platforms from day one. The companies that didn't are the ones making headlines now.

Framework 1: The Enterprise AI Cost Governance Stack

Based on tracking what's working across the companies that aren't blowing up their budgets, here's the five-layer framework every enterprise needs:

Layer 1: Visibility (Where Is the Money Going?)

Every LLM API call must carry metadata identifying the feature, team, business process, and model it serves. Without this, you're flying blind. Most enterprises that blew their budgets couldn't tell you which team or use case consumed the most tokens until it was too late.

Minimum viable implementation:

  • Tag every API call with team, project, and use case
  • Real-time dashboards per team and per developer
  • Weekly cost reports to engineering leadership

Tools in this space: Pay-i, Datadog AI Observability, New Relic AI Monitoring, Helicone
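As a rough illustration of what tagging buys you, the sketch below attaches team, project, and use-case tags to each call and rolls cost up by any tag. The `UsageRecord` schema, the `cost_by` helper, and the sample rows are all hypothetical; the tools named above will have their own schemas and APIs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """One LLM call plus attribution tags (hypothetical schema)."""
    team: str
    project: str
    use_case: str
    model: str
    input_tokens: int
    output_tokens: int
    cost_usd: float


def cost_by(records: list[UsageRecord], field: str) -> dict[str, float]:
    """Sum cost per distinct value of one tag, e.g. field='team'."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        totals[getattr(record, field)] += record.cost_usd
    return dict(totals)


# Made-up sample data, for illustration only.
records = [
    UsageRecord("payments", "ledger", "code_review", "frontier-model", 12_000, 3_000, 1.5),
    UsageRecord("payments", "ledger", "summarization", "budget-model", 8_000, 1_000, 0.25),
    UsageRecord("search", "ranking", "refactoring", "frontier-model", 20_000, 5_000, 2.0),
]

print(cost_by(records, "team"))
print(cost_by(records, "use_case"))
```

The same records answer "which team?", "which use case?", and "which model?" without re-instrumenting anything, which is why the tags need to be attached at call time rather than reconstructed from invoices.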

Layer 2: Controls (How Do We Stop the Bleeding?)

Four controls fix 80% of cost blowouts:

  1. Per-user token limits — Uber's $1,500 cap, but implemented proactively
  2. Per-team monthly budgets — allocated based on use case value, not headcount
  3. Model access policies — not every task needs a frontier model
  4. Automated threshold alerts — at 50%, 80%, and 100% of budget

Critical insight: Controls without routing are just rationing. You need Layer 3.
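The alerting and cap logic can be sketched in a few lines. This is a minimal illustration, assuming spend is polled periodically. The function names are hypothetical, and the $1,500 default simply mirrors Uber's reported cap.

```python
ALERT_THRESHOLDS = (0.5, 0.8, 1.0)  # 50%, 80%, and 100% of budget


def crossed_thresholds(previous_spend: float, current_spend: float,
                       budget: float) -> list[float]:
    """Thresholds crossed between two consecutive spend readings.

    Comparing against the previous reading means each alert fires once,
    not on every poll after the threshold is passed.
    """
    return [t for t in ALERT_THRESHOLDS
            if previous_spend < t * budget <= current_spend]


def is_over_cap(spend: float, cap: float = 1_500.0) -> bool:
    """True once a user's monthly spend reaches the cap."""
    return spend >= cap


# A user whose spend moved from $700 to $1,250 against a $1,500 budget.
for threshold in crossed_thresholds(700, 1_250, 1_500):
    print(f"Alert: passed {threshold:.0%} of budget")
print("Blocked" if is_over_cap(1_250) else "Still under cap")
```

In practice the alert at 50% matters more than the block at 100%, because it arrives while there is still time to change behavior instead of filing an exception request.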

Layer 3: Intelligent Routing (Right Model for the Right Task)

This is where the 60-80% cost reduction lives. Most enterprise AI tasks — summarization, classification, simple Q&A — can run on budget-tier models at $0.10-$1 per million tokens without meaningful quality loss. Frontier models at $15-$30+ per million tokens should be reserved for complex reasoning, multi-step agents, and tasks where quality directly impacts revenue.

The "token maxing" problem: Without routing governance, developers default to the most capable model for everything. This is the organizational equivalent of taking a Ferrari to the grocery store.

Tools in this space: Factory (model routing), Martian, OpenRouter, enterprise API gateways with model selection logic
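A rule-based router is the simplest form of this idea. The sketch below uses the task categories named above. The tier names, the mapping, and the choice to send unrecognized tasks to the budget tier are assumptions for illustration, not how any of the listed products work.

```python
from enum import Enum


class Tier(Enum):
    BUDGET = "budget"
    FRONTIER = "frontier"


# Task categories come from this article; the mapping itself is an assumption.
BUDGET_TASKS = {"summarization", "classification", "simple_qa"}
FRONTIER_TASKS = {"complex_reasoning", "multi_step_agent", "revenue_critical"}


def route(task_type: str, default: Tier = Tier.BUDGET) -> Tier:
    """Pick a model tier from a declared task type.

    Unknown task types fall back to `default`. Defaulting to the budget
    tier favors cost; defaulting to the frontier tier favors quality.
    """
    if task_type in FRONTIER_TASKS:
        return Tier.FRONTIER
    if task_type in BUDGET_TASKS:
        return Tier.BUDGET
    return default


for task in ("summarization", "multi_step_agent", "check_the_weather"):
    print(f"{task} -> {route(task).value}")
```

Production routers typically classify the request automatically and verify output quality before committing to the cheaper model. The static table here only shows where the policy decision lives.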

Layer 4: Optimization (Squeeze More Value Per Token)

  • Prompt engineering: Structured prompts consume 30-50% fewer tokens than conversational ones
  • Context management: Pass summarized context, not raw conversation history, across multi-turn workflows
  • Caching: Identical or near-identical requests should hit a cache, not a model
  • Agent architecture: Limit autonomous loop depth. Set step limits and budget thresholds per agent workflow
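The agent-architecture point is the easiest of these to show in code. Below is a minimal sketch of a loop guard, assuming each agent step can report its own cost. `run_with_limits` and the `step_fn` callback are hypothetical names, not part of any agent framework.

```python
from typing import Callable

# step_fn(step_number) -> (done, cost_of_this_step_in_usd)
StepFn = Callable[[int], tuple[bool, float]]


def run_with_limits(step_fn: StepFn, max_steps: int,
                    budget_usd: float) -> tuple[int, float, str]:
    """Run an agent loop until it finishes, runs out of steps, or runs out of budget.

    Returns (steps_taken, total_spent_usd, stop_reason).
    """
    spent = 0.0
    for step in range(1, max_steps + 1):
        done, cost = step_fn(step)
        spent += cost
        if done:
            return step, spent, "done"
        if spent >= budget_usd:
            return step, spent, "budget_exhausted"
    return max_steps, spent, "step_limit"


# A stand-in agent that never finishes and costs $0.25 per step.
def runaway(step: int) -> tuple[bool, float]:
    return False, 0.25


print(run_with_limits(runaway, max_steps=50, budget_usd=1.00))
print(run_with_limits(runaway, max_steps=3, budget_usd=1.00))
```

The budget check runs after each step, so a single expensive step can overshoot the threshold. A stricter version would estimate the cost before making the call.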

Layer 5: Governance (The Ongoing Discipline)

  • Cost-per-output metrics: Not cost-per-token, but cost-per-resolved-ticket, cost-per-accepted-code-suggestion, cost-per-summarized-document
  • Quarterly model reviews: As new, cheaper models launch (and they launch monthly now), evaluate whether workloads can shift down
  • FinOps integration: AI cost management is not a separate discipline. It's the next chapter of FinOps
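Cost-per-output is a simple ratio, but writing it down makes the contrast with cost-per-token clear. The sketch below uses made-up team figures, and the helper name is hypothetical.

```python
from typing import Optional


def cost_per_outcome(total_cost_usd: float, outcomes: int) -> Optional[float]:
    """Spend divided by business outcomes, or None when there were no outcomes."""
    if outcomes <= 0:
        return None
    return total_cost_usd / outcomes


# Made-up figures: team B spends more in total but less per resolved ticket.
teams = {
    "team_a": {"spend_usd": 3_000.0, "resolved_tickets": 600},
    "team_b": {"spend_usd": 4_200.0, "resolved_tickets": 1_400},
}

for name, stats in teams.items():
    unit_cost = cost_per_outcome(stats["spend_usd"], stats["resolved_tickets"])
    print(f"{name}: ${unit_cost:.2f} per resolved ticket")
```

In this example a per-token or total-spend view would flag team B as the problem, while the per-outcome view shows it is the more efficient of the two.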

Framework 2: The AI FinOps Maturity Assessment

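Score your organization 1-5 on each of the five dimensions below (levels 2 and 4 fall between the anchors shown), for a total between 5 and 25. If your total is 15 or below, you're in the blast radius.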

Dimension | Level 1 (Crisis) | Level 3 (Managed) | Level 5 (Optimized)
Visibility | No per-team token tracking | Per-team dashboards, weekly reports | Real-time per-request attribution with business context
Budget Controls | No limits set (the $500M scenario) | Per-user caps, manual approval for overages | Dynamic budgets tied to business value metrics
Model Routing | Everyone uses frontier models | Manual model selection guidelines | Automated routing engine with quality/cost optimization
Cost Attribution | "AI spend" is one line item | Cost allocated to teams | Cost-per-business-outcome tracked and optimized
Governance | Reactive: fix after blowout | Quarterly reviews, dedicated AI FinOps role | Continuous optimization, automated rightsizing, Tokenomics standards

Scoring:

  • 5-10: Emergency mode. You are one autonomous coding sprint away from a budget crisis. Implement Layers 1-2 this week.
  • 11-15: Managed but fragile. You'll survive 2026 but you're leaving 50%+ cost savings on the table. Prioritize Layer 3 (routing).
  • 16-20: Competitive. You have governance. Focus on cost-per-outcome metrics and preparing for the agentic multiplier.
  • 21-25: Leading. You're ready for the 24x token growth Goldman Sachs projects. Share your playbook — the industry needs it.
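For teams that want to run the assessment across several business units, the scoring bands above translate directly into a small function. The dimension keys and the `assess` name are hypothetical; the bands are the ones listed in this section.

```python
DIMENSIONS = ("visibility", "budget_controls", "model_routing",
              "cost_attribution", "governance")

# (upper bound of band, label), taken from the scoring list in this section.
BANDS = ((10, "Emergency mode"), (15, "Managed but fragile"),
         (20, "Competitive"), (25, "Leading"))


def assess(scores: dict[str, int]) -> tuple[int, str]:
    """Total the five 1-5 scores and map the total to its band."""
    if set(scores) != set(DIMENSIONS):
        raise ValueError(f"expected exactly these dimensions: {DIMENSIONS}")
    if any(not 1 <= value <= 5 for value in scores.values()):
        raise ValueError("each score must be between 1 and 5")
    total = sum(scores.values())
    for upper, label in BANDS:
        if total <= upper:
            return total, label
    raise AssertionError("unreachable: total cannot exceed 25")


example = {"visibility": 3, "budget_controls": 3, "model_routing": 1,
           "cost_attribution": 2, "governance": 2}
print(assess(example))
```

The example organization has dashboards and caps but no routing, which lands it in the second band and points at Layer 3 as the next investment.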

The Industry Response: The Tokenomics Foundation

The scale of this crisis has prompted institutional action. At FinOps X 2026, the FinOps Foundation and Linux Foundation announced the formation of the Tokenomics Foundation, a new open standards body with a formal launch planned for July 2026.

The founding supporters read like a who's who of enterprise tech: Oracle, Google, Microsoft, Accenture, Booking.com, Flexera, IBM, JPMorgan Chase, KPMG, Nebius, Salesforce, SAP, and ServiceNow.

The Foundation's planned deliverables include:

  • Canonical tokenomics definitions — standardizing how tokens are measured, reported, and compared across providers
  • Open billing standards — so enterprises can compare apples to apples across OpenAI, Anthropic, Google, and open-source models
  • New metrics: cost-per-intelligence and tokens-per-watt — measuring value and efficiency, not just consumption

As J.R. Storment of the FinOps Foundation described the shift: the conversation has moved from "go fast" to "we need guardrails."

And the signal that matters most came from OpenAI itself. Alexander Embiricos, OpenAI's Head of Enterprise, told TechCrunch: "Our conversations are never about capability anymore. Now the conversations are about spending, visibility, auditability, token controls" (TechCrunch).

When the provider tells you the conversation has shifted from "what can AI do?" to "how much is this costing us?" — the market has turned.

The Emerging AI FinOps Market

The crisis is creating an entirely new category of enterprise tooling. Pay-i, Helicone, and Keywords AI focus specifically on LLM cost tracking. Jellyfish, Waydev, and Faros AI approach it from the engineering productivity angle — tying token spend to developer output. Datadog and New Relic have both launched AI-specific observability modules that track token consumption alongside traditional infrastructure metrics.

Factory is betting on model routing as the core value proposition — automatically selecting the cheapest model that meets quality thresholds for each request. Martian and OpenRouter offer similar capabilities. The thesis: if 80% of enterprise AI requests can be served by budget-tier models without quality degradation, routing alone can cut bills by 60-80%.
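That thesis can be sanity-checked with a blended-price calculation. The sketch below assumes every request uses the same number of tokens regardless of tier, which likely overstates the savings because frontier-worthy tasks tend to be the token-heavy ones. The prices come from the ranges quoted earlier in this article, and the function name is hypothetical.

```python
def routing_savings(budget_share: float, budget_price: float,
                    frontier_price: float) -> float:
    """Fractional savings versus sending every request to the frontier tier.

    Prices are per million tokens; assumes equal token volume per request.
    """
    blended = budget_share * budget_price + (1 - budget_share) * frontier_price
    return 1 - blended / frontier_price


# 80% of requests routed to the budget tier, at two ends of the quoted price ranges.
conservative = routing_savings(0.8, budget_price=1.00, frontier_price=15.0)
aggressive = routing_savings(0.8, budget_price=0.10, frontier_price=30.0)
print(f"Savings: {conservative:.1%} to {aggressive:.1%}")
```

Under these assumptions the result falls inside the 60-80% range. The ceiling is the share of traffic that can be routed down, since savings can never exceed `budget_share`.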

The governance layer is more nascent. Elvex, Liminal, and TrueFoundry offer AI governance frameworks that include cost controls alongside safety and compliance guardrails. Enterprise API gateways from Kong and Apigee are adding model-selection and cost-allocation features.

For any enterprise starting from zero, the priority order is: visibility first (you can't manage what you can't measure), then controls (per-user and per-team limits), then routing (automate model selection), then optimization (prompt engineering, caching, agent architecture). Skip straight to optimization without visibility and you're optimizing blind.

What Happens Next

Gartner projects that 25% of planned 2026 AI budgets will slip into 2027 as proofs of concept stall in procurement pipelines. Over 40% of agentic AI projects will be canceled by end of 2027. Total worldwide AI spend is forecast to hit $2.59 trillion in 2026, up 47% year-over-year (Gartner). AI software spending alone is projected at $453 billion in 2026, growing another 41% to $638 billion in 2027.

The money is flowing. The question is whether it flows into value or into token bills that nobody approved.

Here's what I'm watching:

  1. The Tokenomics Foundation's July launch. If it delivers open billing standards quickly, it could do for AI costs what FinOps did for cloud costs. If it gets bogged down in committee, enterprises are on their own.

  2. Uber's cap experiment. If $1,500/month/tool proves too restrictive and productivity drops, other companies will hesitate to set limits. If it works, it becomes the template.

  3. The agentic multiplier. Every company building multi-agent systems needs to model the cost curve, not the cost point. An agent that costs $1.20 per task today will cost at least $3.60 when it fans out to three sub-agents tomorrow, before counting the orchestrator's own calls. Cost scales with the number of agents times the steps each one takes, so it compounds as workflows get deeper.

  4. Model routing becoming table stakes. The companies that treat this as optional will be the next Uber. The ones that build it into their AI platform from day one will save 60-80%.
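The cost-curve point in item 3 can be made concrete with a toy model. The sketch below assumes a tree-shaped workflow in which every agent costs the same $1.20 and delegates to the same number of sub-agents. Both assumptions are simplifications chosen for illustration, and the function name is hypothetical.

```python
def workflow_cost(per_agent_usd: float, fan_out: int, depth: int) -> float:
    """Cost of one task when each agent delegates to `fan_out` sub-agents,
    repeated for `depth` levels below the top-level agent."""
    agents = sum(fan_out ** level for level in range(depth + 1))
    return per_agent_usd * agents


PER_AGENT = 1.20  # per-task cost of a single agent, as quoted in this article

for depth in range(4):
    cost = workflow_cost(PER_AGENT, fan_out=3, depth=depth)
    print(f"depth {depth}: ${cost:.2f} per task")
```

With one level of three sub-agents the task costs $4.80 once the orchestrator is included, and each additional level roughly triples the total. That growth with depth is the curve a budget needs to model.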

The flat-fee era of AI is over. The era of AI cost governance has begun. The enterprises that build the governance stack now will be the ones still running AI programs in 2028. The ones that don't will be the cautionary tales in next year's FinOps X keynote.



© 2026 Rajesh Beri. All rights reserved.

Uber burned through its entire 2026 AI budget by April. Microsoft is canceling Claude Code licenses across its Windows and Office division by June 30. One unnamed company ran up a $500 million Claude bill in a single month after forgetting to set usage limits.

These aren't edge cases. They're the first tremors of the biggest cost crisis in enterprise technology since the early days of cloud computing. And the paradox at the center of it should terrify every CIO who signed off on an AI budget this year: per-token prices have fallen 98% since 2022, but enterprise AI bills have risen 320%.

The math that was supposed to make AI cheaper just made it more dangerous.

I've spent the past two weeks tracking this crisis across the industry. The data is worse than the headlines suggest. Here's what's actually happening, why the old playbooks won't work, and what the companies that survive this will do differently.

The Consumption Trap: When Cheaper Means More Expensive

Let's start with the number that explains everything else.

In late 2022, GPT-4-equivalent performance cost roughly $20 per million tokens. Today, it costs about $0.40 per million tokens. That's a 98% reduction. By every traditional IT budgeting model, enterprise AI bills should have collapsed.

Instead, the average enterprise AI budget has grown from $1.2 million per year in 2024 to $7 million in 2026 (FinOps Foundation, State of FinOps 2026). And 73% of enterprises report their actual AI costs exceeded even those inflated projections (FinOps X 2026 Keynote).

The culprit isn't the price per token. It's the number of tokens per task.

A standard linear AI workflow in 2023 — summarize this document, answer this question — consumed about $0.04 worth of tokens per interaction. A 2026 agentic system — where AI agents plan, execute multi-step workflows, call tools, and coordinate with other agents — costs roughly $1.20 per interaction. That's a 30x increase per task (The Next Web).

Per-developer token consumption has risen 18.6x in nine months, driven by agentic coding tools released since November 2025: Anthropic's Claude Code, Cursor's agent mode, and similar tools that don't just suggest code — they write, test, debug, and refactor autonomously. Each autonomous loop burns through tokens at a rate that would have been inconceivable under the old chat-and-respond model.

Here's the trap: engineers who consume the most tokens are 2x more productive. But they spend 10x more tokens to get there. The ROI is positive — individually. At enterprise scale, with thousands of engineers, the aggregate bill is catastrophic.

The Body Count: Uber, Microsoft, and the $500 Million Month

Uber: Budget Gone by April

Uber deployed Claude Code and Cursor to approximately 5,000 engineers and did what any aggressive tech company would do — encouraged maximum adoption. Internal leaderboards ranked AI usage competitively. The culture was "use AI as much as possible."

It worked. By April, Uber's CTO Praveen Neppalli Naga told leadership he'd blown through the entire 2026 AI tools budget. Per-engineer API costs were running between $500 and $2,000 per month, with monthly usage rates hitting 84-95% across the engineering organization (TechCrunch).

Uber's response: a $1,500 monthly cap per employee per agentic coding tool. Every engineer now has a usage dashboard. Exceeding the cap requires a formal request.

The cap is a band-aid. It tells you Uber doesn't have a cost governance framework — it has a spending ceiling bolted on after the fact. There's a difference.

Microsoft: Killing Claude Code by June 30

The irony is thick. Microsoft — the company that invested $13 billion in OpenAI and sells Copilot as the future of AI-assisted development — had to cancel Anthropic's Claude Code licenses because its own engineers preferred it. Claude Code had become "perhaps a little too popular" inside Microsoft's Experiences and Devices division, the group responsible for Windows, Microsoft 365, Outlook, Teams, and Surface (The Next Web).

On May 14, engineers across the division received notice: Claude Code licenses expire June 30. Switch to GitHub Copilot CLI.

The official reason is "toolchain unification." The actual reason is fiscal-year-end accounting. When thousands of engineers use a competitor's token-billed product daily, every prompt, code review, and debugging session compounds into a line item that surprises finance teams at exactly the wrong moment.

The $500 Million Month

Then there's the company that makes Uber and Microsoft look disciplined.

An unnamed enterprise — Axios broke the story, and speculation centers on a major tech company, possibly Amazon — ran up a $500 million Claude bill in a single month. The root cause: no usage limits were set on employee licenses. Developers ran extended autonomous coding sessions. AI agents executed chained workflows. Employees used expensive frontier models for tasks that didn't require them — including, reportedly, checking the weather (Tom's Hardware).

Five hundred million dollars. One month. No guardrails.

Chris Reed, Senior Director of IT Finance at Priceline, described the broader situation to TechCrunch with a comparison that's uncomfortable but accurate: "It's like the crack-cocaine epidemic." Priceline's own Cursor contract renewal came back 4-5x more expensive than the original deal (TechCrunch).

Why "Just Set a Budget" Doesn't Work: The Agentic Multiplier

To understand why traditional IT budgeting fails for AI, you need to understand what changed in November 2025.

Before November 2025, most enterprise AI was chat-based. A developer asked a question, got an answer, maybe iterated once or twice. Token consumption was predictable — roughly proportional to the number of employees using the tool.

Then agentic coding tools launched. Anthropic released Claude Code. Cursor shipped agent mode. OpenAI pushed GPT-5.1 with extended autonomous execution. Google released Gemini 3 Pro with multi-step tool use. Suddenly, one developer action — "refactor this module" or "add tests for this service" — could trigger dozens of sequential LLM calls as the agent planned, executed, tested, debugged, and iterated.

The consumption model flipped from linear to exponential. And it hit enterprise budgets like a freight train because most AI budgets were built on 2024 consumption patterns applied to 2026 agentic workloads.

Here's a concrete example of the math. If you have 5,000 engineers (Uber's scale) and each generates an average of $1,000/month in token costs (the midpoint of reported ranges), that's $5 million per month, or $60 million annually — for coding tools alone, before you count any other AI workloads. And that's at 2026 prices. Goldman Sachs projects global token usage will multiply 24x by 2030. If the pattern holds, we're not at the end of the cost crisis. We're at the beginning.

What we're witnessing also has a historical name: the Jevons Paradox. In 1865, economist William Stanley Jevons observed that as coal engines became more efficient, total coal consumption increased rather than decreased. Efficiency made coal useful for more applications, which drove aggregate demand beyond what the efficiency gains saved. Every 10x reduction in per-token cost unlocks a new class of AI use cases — from simple chat to autonomous agents to multi-agent orchestration — and each class consumes orders of magnitude more tokens per task.

The companies that recognized this dynamic early built cost governance into their AI platforms from day one. The companies that didn't are the ones making headlines now.

Framework 1: The Enterprise AI Cost Governance Stack

Based on tracking what's working across the companies that aren't blowing up their budgets, here's the five-layer framework every enterprise needs:

Layer 1: Visibility (Where Is the Money Going?)

Every LLM API call must carry metadata identifying the feature, team, business process, and model it serves. Without this, you're flying blind. Most enterprises that blew their budgets couldn't tell you which team or use case consumed the most tokens until it was too late.

Minimum viable implementation:

  • Tag every API call with team, project, and use case
  • Real-time dashboards per team and per developer
  • Weekly cost reports to engineering leadership

Tools in this space: Pay-i, Datadog AI Observability, New Relic AI Monitoring, Helicone

Layer 2: Controls (How Do We Stop the Bleeding?)

Four controls fix 80% of cost blowouts:

  1. Per-user token limits — Uber's $1,500 cap, but implemented proactively
  2. Per-team monthly budgets — allocated based on use case value, not headcount
  3. Model access policies — not every task needs a frontier model
  4. Automated threshold alerts — at 50%, 80%, and 100% of budget

Critical insight: Controls without routing are just rationing. You need Layer 3.

Layer 3: Intelligent Routing (Right Model for the Right Task)

This is where the 60-80% cost reduction lives. Most enterprise AI tasks — summarization, classification, simple Q&A — can run on budget-tier models at $0.10-$1 per million tokens without meaningful quality loss. Frontier models at $15-$30+ per million tokens should be reserved for complex reasoning, multi-step agents, and tasks where quality directly impacts revenue.

The "token maxing" problem: Without routing governance, developers default to the most capable model for everything. This is the organizational equivalent of taking a Ferrari to the grocery store.

Tools in this space: Factory (model routing), Martian, OpenRouter, enterprise API gateways with model selection logic

Layer 4: Optimization (Squeeze More Value Per Token)

  • Prompt engineering: Structured prompts consume 30-50% fewer tokens than conversational ones
  • Context management: Pass summarized context, not raw conversation history, across multi-turn workflows
  • Caching: Identical or near-identical requests should hit a cache, not a model
  • Agent architecture: Limit autonomous loop depth. Set step limits and budget thresholds per agent workflow

Layer 5: Governance (The Ongoing Discipline)

  • Cost-per-output metrics: Not cost-per-token, but cost-per-resolved-ticket, cost-per-accepted-code-suggestion, cost-per-summarized-document
  • Quarterly model reviews: As new, cheaper models launch (and they launch monthly now), evaluate whether workloads can shift down
  • FinOps integration: AI cost management is not a separate discipline. It's the next chapter of FinOps

Framework 2: The AI FinOps Maturity Assessment

Score your organization 1-5 on each dimension. If your total is below 15, you're in the blast radius.

Dimension Level 1 (Crisis) Level 3 (Managed) Level 5 (Optimized)
Visibility No per-team token tracking Per-team dashboards, weekly reports Real-time per-request attribution with business context
Budget Controls No limits set (the $500M scenario) Per-user caps, manual approval for overages Dynamic budgets tied to business value metrics
Model Routing Everyone uses frontier models Manual model selection guidelines Automated routing engine with quality/cost optimization
Cost Attribution "AI spend" is one line item Cost allocated to teams Cost-per-business-outcome tracked and optimized
Governance Reactive — fix after blowout Quarterly reviews, dedicated AI FinOps role Continuous optimization, automated rightsizing, Tokenomics standards

Scoring:

  • 5-10: Emergency mode. You are one autonomous coding sprint away from a budget crisis. Implement Layers 1-2 this week.
  • 11-15: Managed but fragile. You'll survive 2026 but you're leaving 50%+ cost savings on the table. Prioritize Layer 3 (routing).
  • 16-20: Competitive. You have governance. Focus on cost-per-outcome metrics and preparing for the agentic multiplier.
  • 21-25: Leading. You're ready for the 24x token growth Goldman Sachs projects. Share your playbook — the industry needs it.

The Industry Response: The Tokenomics Foundation

The scale of this crisis has prompted institutional action. At FinOps X 2026, the FinOps Foundation and Linux Foundation announced the formation of the Tokenomics Foundation, a new open standards body with a formal launch planned for July 2026.

The founding supporters read like a who's who of enterprise tech: Oracle, Google, Microsoft, Accenture, Booking.com, Flexera, IBM, JPMorgan Chase, KPMG, Nebius, Salesforce, SAP, and ServiceNow.

The Foundation's planned deliverables include:

  • Canonical tokenomics definitions — standardizing how tokens are measured, reported, and compared across providers
  • Open billing standards — so enterprises can compare apples to apples across OpenAI, Anthropic, Google, and open-source models
  • New metrics: cost-per-intelligence and tokens-per-watt — measuring value and efficiency, not just consumption

As J.R. Storment of the FinOps Foundation described the shift: the conversation has moved from "go fast" to "we need guardrails."

And the signal that matters most came from OpenAI itself. Alexander Embiricos, OpenAI's Head of Enterprise, told TechCrunch: "Our conversations are never about capability anymore. Now the conversations are about spending, visibility, auditability, token controls" (TechCrunch).

When the provider tells you the conversation has shifted from "what can AI do?" to "how much is this costing us?" — the market has turned.

The Emerging AI FinOps Market

The crisis is creating an entirely new category of enterprise tooling. Pay-i, Helicone, and Keywords AI focus specifically on LLM cost tracking. Jellyfish, Waydev, and Faros AI approach it from the engineering productivity angle — tying token spend to developer output. Datadog and New Relic have both launched AI-specific observability modules that track token consumption alongside traditional infrastructure metrics.

Factory is betting on model routing as the core value proposition — automatically selecting the cheapest model that meets quality thresholds for each request. Martian and OpenRouter offer similar capabilities. The thesis: if 80% of enterprise AI requests can be served by budget-tier models without quality degradation, routing alone can cut bills by 60-80%.
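
The routing thesis can be sanity-checked with back-of-envelope arithmetic. The sketch below is illustrative only: it assumes every request consumes the same number of tokens whichever model serves it, and it plugs in the per-million-token price ranges quoted in the Layer 3 discussion ($0.10 to $1 for budget-tier, $15 to $30 for frontier). The function name `routing_savings` is hypothetical and does not come from any vendor's product.

```python
def routing_savings(budget_share, budget_price, frontier_price):
    """Fraction of spend saved versus sending every request to the frontier model.

    Prices are dollars per million tokens. Assumes (hypothetically) that every
    request uses the same number of tokens regardless of which model serves it.
    """
    if not 0.0 <= budget_share <= 1.0:
        raise ValueError("budget_share must be between 0 and 1")
    # Average price per million tokens after routing part of the traffic down.
    blended = budget_share * budget_price + (1.0 - budget_share) * frontier_price
    return 1.0 - blended / frontier_price


if __name__ == "__main__":
    # Two illustrative price pairs taken from the ends of the quoted ranges.
    for budget_price, frontier_price in [(1.00, 15.00), (0.10, 30.00)]:
        saved = routing_savings(0.80, budget_price, frontier_price)
        print(f"budget ${budget_price:.2f}/M, frontier ${frontier_price:.2f}/M: {saved:.1%} saved")
```

Under these assumptions, routing 80% of traffic saves roughly 75% to 80% of spend. Savings can never exceed the share of traffic routed away from the frontier model, so the low end of the 60-80% range corresponds to routing a smaller share of requests.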

The governance layer is more nascent. Elvex, Liminal, and TrueFoundry offer AI governance frameworks that include cost controls alongside safety and compliance guardrails. Enterprise API gateways from Kong and Apigee are adding model-selection and cost-allocation features.

For any enterprise starting from zero, the priority order is: visibility first (you can't manage what you can't measure), then controls (per-user and per-team limits), then routing (automate model selection), then optimization (prompt engineering, caching, agent architecture). Skip straight to optimization without visibility and you're optimizing blind.
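
To make the first two steps concrete, here is a minimal sketch of visibility followed by controls. Everything in it is an assumption for illustration: the `TeamLedger` class, the in-memory storage, and the team name are hypothetical, and a real deployment would sit behind an API gateway or observability tool. The 50%, 80%, and 100% alert levels are the ones listed under Layer 2.

```python
from collections import defaultdict
from dataclasses import dataclass, field

ALERT_THRESHOLDS = (0.5, 0.8, 1.0)  # alert levels from the Controls layer


@dataclass
class TeamLedger:
    """Hypothetical in-memory ledger: visibility first, then controls."""

    budgets: dict  # team name -> monthly budget in dollars
    spend: dict = field(default_factory=lambda: defaultdict(float))
    fired: dict = field(default_factory=lambda: defaultdict(set))

    def record(self, team, tokens, price_per_million):
        """Attribute one tagged call to a team; return newly crossed thresholds."""
        self.spend[team] += tokens / 1_000_000 * price_per_million
        used = self.spend[team] / self.budgets[team]
        crossed = [t for t in ALERT_THRESHOLDS if used >= t and t not in self.fired[team]]
        self.fired[team].update(crossed)  # each alert fires only once
        return crossed

    def allowed(self, team):
        """Control: block further calls once the team's budget is exhausted."""
        return self.spend[team] < self.budgets[team]


if __name__ == "__main__":
    ledger = TeamLedger(budgets={"payments": 100.0})  # hypothetical team and budget
    for tokens in (2_000_000, 1_000_000, 1_000_000):
        alerts = ledger.record("payments", tokens, price_per_million=30.0)
        print(alerts, "allowed:", ledger.allowed("payments"))
```

The ordering matters in the code as much as in the rollout: `allowed` can only enforce a budget because `record` has already attributed every call to a team.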

What Happens Next

Gartner projects that 25% of planned 2026 AI budgets will slip into 2027 as proofs of concept stall in procurement pipelines, and that more than 40% of agentic AI projects will be canceled by the end of 2027. Total worldwide AI spend is forecast to hit $2.59 trillion in 2026, up 47% year-over-year (Gartner). AI software spending alone is projected at $453 billion in 2026, growing another 41% to $638 billion in 2027.

The money is flowing. The question is whether it flows into value or into token bills that nobody approved.

Here's what I'm watching:

  1. The Tokenomics Foundation's July launch. If it delivers open billing standards quickly, it could do for AI costs what FinOps did for cloud costs. If it gets bogged down in committee, enterprises are on their own.

  2. Uber's cap experiment. If $1,500/month/tool proves too restrictive and productivity drops, other companies will hesitate to set limits. If it works, it becomes the template.

  3. The agentic multiplier. Every company building multi-agent systems needs to model the cost curve, not the cost point. An agent that costs $1.20 per task today will cost about $3.60 once it fans out to three sub-agents that each consume as much, and every further layer of delegation multiplies the bill again. The growth is multiplicative, not linear.

  4. Model routing becoming table stakes. The companies that treat this as optional will be the next Uber. The ones that build it into their AI platform from day one will save 60-80%.
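
The cost curve in item 3 can be sketched in a few lines. This is a deliberately simplified model, not a measurement: it assumes only the leaf agents are billed, each leaf costs the same $1.20 as today's single agent, and every agent delegates to the same number of sub-agents. The function name `agent_task_cost` is hypothetical.

```python
def agent_task_cost(base_cost, fan_out, depth):
    """Cost of one task when every agent delegates to `fan_out` sub-agents.

    Simplifying assumption (not from a vendor): only the leaf agents are
    billed, and each leaf costs `base_cost`, so cost = base_cost * fan_out**depth.
    """
    if fan_out < 1 or depth < 0:
        raise ValueError("fan_out must be >= 1 and depth must be >= 0")
    return base_cost * fan_out ** depth


if __name__ == "__main__":
    # depth 0 is today's single agent; depth 1 is one layer of three sub-agents.
    for depth in range(4):
        print(f"depth {depth}: ${agent_task_cost(1.20, 3, depth):.2f} per task")
```

With a fan-out of three, the per-task cost goes from $1.20 to $3.60, then $10.80, then $32.40 as delegation layers are added. That is the sense in which the growth is multiplicative: each new layer multiplies the previous figure rather than adding a fixed amount to it.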

The flat-fee era of AI is over. The era of AI cost governance has begun. The enterprises that build the governance stack now will be the ones still running AI programs in 2028. The ones that don't will be the cautionary tales in next year's FinOps X keynote.


THE DAILY BRIEF

Enterprise AI insights for technology and business leaders, twice weekly.

thedailybrief.com

Subscribe at thedailybrief.com/subscribe for weekly AI insights delivered to your inbox.

LinkedIn: linkedin.com/in/rberi  |  X: x.com/rajeshberi

© 2026 Rajesh Beri. All rights reserved.
