Uber Blew Its 2026 AI Budget in 4 Months—What Went Wrong

Uber burned its entire 2026 AI budget in 4 months as Claude Code adoption hit 84%. Monthly costs jumped to $2K per engineer. Here's why budgets are breaking.

By Rajesh Beri·May 22, 2026·10 min read
THE DAILY BRIEF

Enterprise AI · AI Costs · Token Pricing · Claude Code · Budget Management

Uber Technologies CTO Praveen Neppalli Naga confirmed to The Information that the company burned through its entire 2026 AI budget in four months. The culprit: Claude Code adoption skyrocketed from 32% to 84% of its 5,000-engineer organization, with monthly API costs per engineer ranging from $500 to $2,000. The company is now "back to the drawing board" on budgeting.

This isn't an isolated incident. It's a pattern that's forcing CIOs, CTOs, and CFOs to rethink how they plan for enterprise AI spending—and whether the current token-based pricing model is sustainable at scale.

The Numbers Behind the Crisis

Here's what happened at Uber: when Claude Code adoption was at 32%, the costs were manageable. Budget projections assumed incremental growth. But when adoption jumped to 84%, meaning roughly five of every six engineers were using AI-assisted coding daily, token consumption climbed far faster than those projections allowed for.

Monthly costs per engineer hit $500 to $2,000 depending on usage patterns. Multiplied across all 5,000 engineers, that is an upper bound of $2.5 million to $10 million per month in API costs alone, or $10 million to $40 million over four months. Counting only the 84% who adopted the tool (about 4,200 engineers) gives $2.1 million to $8.4 million per month. Either way, it is far more than a budget line sized for "AI experimentation" in early 2026 would cover.
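The arithmetic above can be checked in a few lines of Python. This is a minimal sketch: the function name `fleet_cost_range` and its `adoption` parameter are illustrative, and the headcount and per-engineer figures are the ones reported in this article.

```python
def fleet_cost_range(engineers, adoption, low_usd, high_usd, months=1):
    """Return (low, high) total spend for the engineers who actually use the tool."""
    active = round(engineers * adoption)  # engineers counted as billed users
    return active * low_usd * months, active * high_usd * months


# Upper bound: every one of the 5,000 engineers is billed.
print(fleet_cost_range(5000, 1.0, 500, 2000))            # one month
print(fleet_cost_range(5000, 1.0, 500, 2000, months=4))  # four months

# Narrower case: only the 84% who adopted Claude Code are billed.
print(fleet_cost_range(5000, 0.84, 500, 2000))
```

The first two calls reproduce the $2.5 million to $10 million monthly range and the $10 million to $40 million four-month range. The third restricts the same estimate to roughly 4,200 active users.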

The technical reason is simple: agentic AI workflows consume far more tokens than autocomplete suggestions. An autocomplete suggestion might use 50-200 tokens. An agentic workflow that analyzes an entire codebase, suggests refactoring, writes tests, and documents changes can consume 50,000-200,000 tokens per session. That's roughly a 1,000x difference per interaction when like ends of the two ranges are compared, and anywhere from 250x to 4,000x across them.
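Taking the token ranges in the paragraph above at face value, the per-interaction multiple depends on which ends of the two ranges are compared. The short sketch below works out the bounds; the helper name `ratio_bounds` is made up for this example.

```python
AUTOCOMPLETE_TOKENS = (50, 200)        # low, high per suggestion
AGENTIC_TOKENS = (50_000, 200_000)     # low, high per session


def ratio_bounds(small, large):
    """Smallest and largest possible ratio between two (low, high) ranges."""
    smallest = large[0] / small[1]  # lightest agentic session vs heaviest autocomplete
    largest = large[1] / small[0]   # heaviest agentic session vs lightest autocomplete
    return smallest, largest


low, high = ratio_bounds(AUTOCOMPLETE_TOKENS, AGENTIC_TOKENS)
like_for_like = AGENTIC_TOKENS[0] / AUTOCOMPLETE_TOKENS[0]
print(f"{low:.0f}x to {high:.0f}x, like-for-like {like_for_like:.0f}x")
```

Comparing low end to low end (or high to high) gives 1,000x; the full spread runs from 250x to 4,000x.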

And when 84% of your engineering org is running agentic workflows multiple times per day, the math breaks fast.

Microsoft Is Cutting Back Too

Uber isn't alone. Microsoft's Experiences and Devices division—which covers Windows, Microsoft 365, Outlook, Teams, and Surface—is winding down most Claude Code usage by June 30, 2026, according to The Verge. The timing aligns with the end of Microsoft's fiscal year, and financial considerations influenced the decision.

The primary driver reported was platform consolidation toward GitHub Copilot CLI. But token costs appear to have acted as a forcing function, pushing that consolidation to happen sooner than product strategy alone might have. In other words: when the AI bill gets big enough, even Microsoft starts consolidating vendors.

This is significant. Microsoft owns GitHub, which owns Copilot. If even Microsoft is consolidating AI tools to manage costs, what does that mean for enterprises without their own AI infrastructure?

The GitHub Copilot Pricing Shock

Here's where it gets worse: GitHub announced a fundamental shift for its Copilot AI coding assistant, moving from flat-rate subscriptions to usage-based billing starting June 1, 2026. The change replaces premium request units with GitHub AI Credits tied to token consumption.

One developer reported their projected monthly cost rising from roughly €67 in April to around €966 under the new model. That's a 14x increase—not because they're using Copilot more, but because the pricing model changed from flat-rate to per-token.

This removes predictability from enterprise budgets at exactly the moment those budgets are already under pressure. CFOs hate unpredictable costs. When you can't forecast your AI spending within a 10% margin, you can't budget for it. And when you can't budget for it, you cut it.

Why Enterprise Budgets Can't Keep Up

The cost structure of frontier AI models explains why enterprise customers are running out of runway. Tokens are the unit of computation an AI model processes. Every prompt, every response, and every long-context codebase analysis consumes them.

According to Anthropic's official documentation, Claude Code costs an average of $6 per developer per day, with daily costs remaining below $12 for 90% of users. That average obscures the tail risk. By March 2026, 84% of Uber's developers were classified as agentic coding users, delegating entire workflows to AI rather than just accepting autocomplete suggestions.
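To see how far the reported Uber figures sit from Anthropic's published average, the daily and monthly numbers can be put on the same scale. The sketch below assumes 21 working days per month, which is an assumption of this example and not a figure from Anthropic or Uber; the function names are illustrative.

```python
WORKING_DAYS_PER_MONTH = 21  # assumption for this example only


def monthly_from_daily(daily_usd, days=WORKING_DAYS_PER_MONTH):
    """Convert a per-day cost into a per-month cost."""
    return daily_usd * days


def multiple_of_average(monthly_usd, avg_daily_usd=6.0, days=WORKING_DAYS_PER_MONTH):
    """How many times the documented average a given monthly bill represents."""
    return monthly_usd / monthly_from_daily(avg_daily_usd, days)


print(monthly_from_daily(6))    # documented average, per month
print(monthly_from_daily(12))   # documented 90th-percentile ceiling, per month
for uber_monthly in (500, 2000):
    print(uber_monthly, round(multiple_of_average(uber_monthly), 1))
```

Under that assumption the documented average works out to $126 per month and the 90% ceiling to $252, so the reported $500 to $2,000 range is roughly 4 to 16 times the average.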

Agentic workflows consume far more tokens per session than single-turn completions. The unit economics that looked reasonable at the pilot stage stop working at the adoption stage.

The infrastructure cost driving token prices is no mystery. On-demand pricing for the NVIDIA H100 GPU ranges from $1.49 per hour on specialized providers to $6.98 per hour on Microsoft Azure. AI labs must run thousands of these GPUs simultaneously to serve enterprise customers at scale. Those costs flow directly into API token pricing.

The Pricing War Nobody's Winning

Here's the competitive landscape as of May 2026:

Anthropic pricing:

  • Claude Opus 4.7 (flagship): $5 per million input tokens / $25 per million output tokens
  • Claude Sonnet 4.6 (balanced): $3 / $15
  • Claude Haiku 4.5 (budget): $1 / $5

OpenAI pricing:

  • GPT-5.5 (flagship): $5 / $30
  • GPT-5.4 (previous flagship): $2.50 / $15
  • GPT-5.4 Nano (budget): $0.20 / $1.25

Google pricing:

  • Gemini 3.5 Flash: $0.30 / $2.50

At Google I/O 2026, CEO Sundar Pichai said "many companies are already blowing through their annual token budgets, and it's only May," and pitched Gemini 3.5 Flash as the answer. If the largest Google Cloud customers shifted 80% of their workloads from frontier models to Gemini 3.5 Flash, Pichai said, they would save more than $1 billion a year.
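As a rough illustration of what the list prices above mean per session, and what a shift toward a cheaper model does to the average, the sketch below prices one hypothetical agentic session on each model. The session size (150,000 input tokens and 50,000 output tokens) is an assumed split, the dictionary keys are labels for this example and not official API model identifiers, and the 80% share mirrors the scenario Pichai described.

```python
# USD per million tokens as (input, output), copied from the lists above.
PRICES = {
    "Claude Opus 4.7": (5.00, 25.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Haiku 4.5": (1.00, 5.00),
    "GPT-5.5": (5.00, 30.00),
    "GPT-5.4": (2.50, 15.00),
    "GPT-5.4 Nano": (0.20, 1.25),
    "Gemini 3.5 Flash": (0.30, 2.50),
}


def session_cost(model, input_tokens, output_tokens):
    """Cost in USD of one session at the model's list price."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def blended_cost(frontier, cheap, cheap_share, input_tokens, output_tokens):
    """Average session cost when cheap_share of sessions go to the cheap model."""
    cheap_cost = session_cost(cheap, input_tokens, output_tokens)
    frontier_cost = session_cost(frontier, input_tokens, output_tokens)
    return cheap_share * cheap_cost + (1 - cheap_share) * frontier_cost


IN_TOK, OUT_TOK = 150_000, 50_000  # assumed session size, not a measured value
for model in PRICES:
    print(f"{model:18s} ${session_cost(model, IN_TOK, OUT_TOK):.4f}")

all_frontier = session_cost("Claude Opus 4.7", IN_TOK, OUT_TOK)
mixed = blended_cost("Claude Opus 4.7", "Gemini 3.5 Flash", 0.80, IN_TOK, OUT_TOK)
print(f"saving from an 80% shift: {1 - mixed / all_frontier:.1%}")
```

Under these assumed token counts the session costs about $2.00 on Claude Opus 4.7 and about $0.17 on Gemini 3.5 Flash, so routing 80% of sessions to the cheaper model cuts the average by roughly 73%. The sketch says nothing about whether the cheaper model produces equally good output.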

The problem: Google can offer cheaper pricing because it builds its own Tensor Processing Units (TPUs), reducing dependence on third-party GPU pricing. OpenAI and Anthropic can't easily replicate that cost advantage. They rely far more heavily on hardware bought or rented from third parties, NVIDIA above all, which keeps their per-token costs higher.

And even with Google's savings claim, enterprises are still hitting budget ceilings. The issue isn't just pricing; it's consumption. When agentic AI becomes the default workflow, token consumption per developer rises by orders of magnitude.

The Real Competitive Threat: Chinese Models

The cost gap between American and Chinese models is wide and getting wider. AI benchmarking firm Artificial Analysis runs every major model through the same 10 evaluations and tracks the total cost. For each lab's most capable model:

  • Anthropic's Claude: $4,811
  • OpenAI's ChatGPT: $3,357
  • DeepSeek: $1,071
  • Kimi: $948
  • Zhipu's GLM: $544

Claude is nearly nine times more expensive than the cheapest Chinese alternative for the same workload.

Some 45% of companies surveyed by cloud cost firm CloudZero said they spent more than $100,000 a month on AI in 2025, up from 20% the year before. Where that money goes increasingly matters.

And the cheap alternatives are no longer a step behind. DeepSeek, the Chinese AI lab whose model triggered a U.S. tech selloff last year, released a preview of its next-generation model last month that matches or nearly matches the latest from OpenAI, Anthropic, and Google on coding, agentic, and knowledge benchmarks.

On OpenRouter, a marketplace that lets developers access hundreds of AI models through a single interface, Chinese models went from about 1% of usage in 2024 to more than 60% in May.

The American labs' best defense is trust. Banks, defense agencies, and regulated industries won't touch Chinese models regardless of price. But outside of regulated industries, where security and compliance rules are looser, the case for paying a premium gets harder to make.

The Enterprise Response: Advisor Models

The technique enterprises are deploying to manage costs is called an "advisor model." A cheap open-source model handles the bulk of the work as the default. When it hits a task it can't solve, it's given a tool that lets it call out to a frontier model from OpenAI or Anthropic for help.

Databricks CEO Ali Ghodsi, whose company's AI gateway sits between thousands of enterprise customers and the models they're using, said revenue from that product is climbing sharply. "You can curb costs really well this way," Ghodsi said.
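The advisor pattern described above can be sketched as a cheap model that receives an escalation tool. This is a minimal sketch under stated assumptions, not how Databricks or any vendor implements it: `run_with_advisor`, `stub_cheap`, and `stub_frontier` are hypothetical names, and the two models are stand-in functions so the example runs without any API access.

```python
from typing import Callable, Tuple

# Type aliases for the two hypothetical model callables.
Advisor = Callable[[str], str]
CheapModel = Callable[[str, Advisor], str]


def run_with_advisor(task: str, cheap_model: CheapModel,
                     frontier_model: Advisor) -> Tuple[str, int]:
    """Run the task on the cheap model, handing it a tool to ask the frontier model.

    Returns (answer, number_of_frontier_calls) so escalations can be metered.
    """
    frontier_calls = 0

    def ask_advisor(question: str) -> str:
        nonlocal frontier_calls
        frontier_calls += 1
        return frontier_model(question)

    answer = cheap_model(task, ask_advisor)
    return answer, frontier_calls


def stub_frontier(question: str) -> str:
    return f"[frontier advice on: {question}]"


def stub_cheap(task: str, ask_advisor: Advisor) -> str:
    # Toy heuristic: escalate only tasks flagged as hard.
    if "hard" in task:
        return "cheap model + " + ask_advisor(task)
    return f"cheap model handled: {task}"


print(run_with_advisor("rename a variable", stub_cheap, stub_frontier))
print(run_with_advisor("hard concurrency bug", stub_cheap, stub_frontier))
```

Counting frontier calls per task is the design choice that matters here: it is the number a gateway would need in order to show how much of the bill still goes to the expensive model.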

This is the emerging playbook:

  1. Route 70-80% of requests to cheap models (Gemini Flash, open-source, or Chinese models for non-regulated workloads)
  2. Reserve frontier models (Claude Opus, GPT-5.5) for complex reasoning or high-stakes decisions
  3. Use caching aggressively (both OpenAI and Anthropic now offer ~90% off cached input)
  4. Deploy vendor consolidation to reduce API sprawl
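The caching step in the playbook above can be made concrete with a short calculation. The sketch assumes the roughly 90% discount on cached input tokens cited in the list and ignores any cache-write surcharge or cache expiry rules a provider may apply. `cache_hit_rate` is a hypothetical input you would measure from your own traffic, and the $3.00 list price is the Sonnet-tier input price quoted earlier.

```python
def effective_input_price(list_price_usd, cache_hit_rate, cached_discount=0.90):
    """Average input price per million tokens once cache hits are discounted."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    return list_price_usd * (1.0 - cached_discount * cache_hit_rate)


for hit_rate in (0.0, 0.5, 0.8):
    price = effective_input_price(3.00, hit_rate)
    print(f"hit rate {hit_rate:.0%}: ${price:.2f} per million input tokens")
```

At an 80% hit rate the effective input price falls from $3.00 to $0.84 per million tokens. Output tokens are not affected, so the overall saving depends on how input-heavy the workload is.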

Figma CEO Dylan Field said companies are moving through three phases of AI adoption. First, nobody uses it. Second, everyone has to, with some "literally holding competitions of who can spend the most with tokens." Third comes the realization that "everyone's spending too much" and that spending has to be cut back. Many enterprises are now entering that third phase.

What This Means for CIOs and CFOs

If you're planning AI budgets for the rest of 2026, here's what the data says:

For CIOs:

  1. Model tiering is mandatory. You can't afford to run everything on Claude Opus or GPT-5.5. Build routing logic that sends routine tasks to cheap models and reserves frontier models for complex work.

  2. Agentic adoption changes the math. If you budgeted for autocomplete-style AI, and your teams are now using agentic workflows, your costs can run 10-100x higher than projected. Measure actual token consumption per developer per day, not per session.

  3. Vendor lock-in is a budget risk. GitHub's shift from flat-rate to usage-based pricing is a warning. Any vendor can change their pricing model mid-year. Build abstraction layers so you can switch providers without rewriting code.
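One way to picture the abstraction layer mentioned in the last item is a thin gateway that application code calls instead of a vendor SDK. The sketch below is illustrative only: `CompletionProvider`, `ModelGateway`, and the two stub providers are hypothetical names, and a real adapter would wrap an actual vendor client behind the same `complete` method.

```python
from typing import Dict, Protocol


class CompletionProvider(Protocol):
    """Anything that can turn a prompt into text."""

    def complete(self, prompt: str) -> str: ...


class StubProviderA:
    def complete(self, prompt: str) -> str:
        return f"A:{prompt}"


class StubProviderB:
    def complete(self, prompt: str) -> str:
        return f"B:{prompt}"


class ModelGateway:
    """Application code depends on this class, never on a vendor SDK directly."""

    def __init__(self, providers: Dict[str, CompletionProvider], active: str):
        if active not in providers:
            raise KeyError(active)
        self._providers = providers
        self._active = active

    def switch(self, name: str) -> None:
        """Change vendors in one place, for example after a pricing change."""
        if name not in self._providers:
            raise KeyError(name)
        self._active = name

    def complete(self, prompt: str) -> str:
        return self._providers[self._active].complete(prompt)


gateway = ModelGateway({"a": StubProviderA(), "b": StubProviderB()}, active="a")
print(gateway.complete("summarize this diff"))
gateway.switch("b")
print(gateway.complete("summarize this diff"))
```

Callers never change when the active provider does, which is the property that limits the damage of a mid-year pricing change.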

For CFOs:

  1. Budget for unpredictability. Token-based pricing means your AI costs are usage-based, not fixed. Add a 50-100% buffer to your AI budget line items for the rest of 2026.

  2. Track cost per outcome, not cost per token. If AI coding tools reduce development time by 30%, that's a cost savings even if the AI bill is $10 million. But you need to measure the productivity gains, not just the token costs.

  3. Watch the Q2 numbers. Anthropic is projecting $10.9 billion in Q2 2026, up from $4.8 billion in Q1. OpenAI hasn't disclosed Q2 projections. If Anthropic hits its number, it means enterprises are still paying despite the budget strain. If it misses, it means enterprises are cutting back faster than expected.
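Two of the CFO checks above reduce to simple arithmetic. The sketch below is a rough illustration: the function names are made up, and the $100 million engineering cost is a placeholder chosen only to make the example concrete, not a figure from Uber or any other company.

```python
def run_rate_multiple(budget_months, months_to_exhaust):
    """How many times faster than planned the budget was spent."""
    return budget_months / months_to_exhaust


def net_benefit(engineering_cost_usd, time_saved_fraction, ai_bill_usd):
    """Value of engineering time saved minus what the AI tooling cost."""
    return engineering_cost_usd * time_saved_fraction - ai_bill_usd


# A 12-month budget consumed in four months.
print(run_rate_multiple(12, 4))

# Placeholder engineering cost; the 30% and $10 million come from the text above.
HYPOTHETICAL_ENGINEERING_COST = 100_000_000
print(net_benefit(HYPOTHETICAL_ENGINEERING_COST, 0.30, 10_000_000))
```

A 12-month budget gone in four months means spending ran at three times the planned rate, which is a 200% overshoot. In the second calculation the sign of the result is what matters: it stays positive only while the measured time savings are worth more than the bill, which is why the productivity side has to be measured and not assumed.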

The Long Game: Infrastructure Costs Are Falling, But Not Fast Enough

NVIDIA's Rubin platform targets a 10x reduction in inference token costs compared to its Blackwell architecture. According to Ramp's enterprise spending data, the average cost per million tokens across major providers fell from roughly $10 to $2.50 in a single year.

That long-term trend is real. But it doesn't solve the near-term problem. In practice, falling unit prices tell only half the story. The way organizations consume AI has changed so dramatically that cheaper per-token costs are offset by dramatically higher usage volume.
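The offsetting effect described above follows from spend being price times volume. The sketch below uses the $10 to $2.50 price drop cited from Ramp; the usage multiples are hypothetical inputs and the function names are illustrative.

```python
def spend_multiple(price_before, price_after, usage_multiple):
    """Change in total spend when the unit price and usage volume both change."""
    return (price_after / price_before) * usage_multiple


def breakeven_usage_multiple(price_before, price_after):
    """Usage growth at which the falling price is fully cancelled out."""
    return price_before / price_after


print(breakeven_usage_multiple(10.0, 2.5))
for usage in (2, 4, 10):  # hypothetical usage growth factors
    print(usage, spend_multiple(10.0, 2.5, usage))
```

With prices down to a quarter of their earlier level, total spend still rises as soon as token volume grows by more than 4x. Ten times the usage means 2.5 times the bill.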

Enterprises that planned budgets around 2024 token rates are finding that agentic AI workflows at 2026 adoption levels consume multiples of what the spreadsheet projected.

Uber's engineers didn't stop wanting to use Claude Code. They ran out of money to pay for it. That's the crisis in a sentence.

Bottom Line

The enterprise AI budget crisis of 2026 is a consumption problem disguised as a pricing problem. Token costs are falling, but usage is rising faster. Agentic workflows consume orders of magnitude more tokens per interaction than autocomplete. And when 84% of your engineering org adopts AI-assisted coding, the budget you planned in January is gone by April.

The solutions are emerging: model tiering, advisor architectures, vendor consolidation, aggressive caching. But the fundamental tension remains: enterprises want AI productivity gains, but they can't afford to pay frontier-model prices for every task.

Google's $1 billion savings claim is credible—if you can shift 80% of workloads to cheaper models. The question is whether that shift compromises quality enough to erase the productivity gains.

For now, the smartest move is to measure everything: token consumption per developer, cost per task, productivity gains per dollar spent. Because the only thing worse than blowing your AI budget in four months is not knowing why.
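As a starting point for that measurement, usage logs can be rolled up into cost per developer per day. This is a minimal sketch: the record layout, the sample rows, and the function name are invented for illustration, and the prices passed in are the Sonnet-tier list prices quoted earlier ($3 input and $15 output per million tokens).

```python
from collections import defaultdict


def daily_cost_per_developer(records, input_price, output_price):
    """Sum cost per (developer, day).

    records: iterable of (developer, day, input_tokens, output_tokens).
    Prices are USD per million tokens.
    """
    totals = defaultdict(float)
    for developer, day, input_tokens, output_tokens in records:
        cost = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
        totals[(developer, day)] += cost
    return dict(totals)


# Invented sample rows: two sessions for dev-1 and one for dev-2 on the same day.
usage = [
    ("dev-1", "2026-05-20", 400_000, 100_000),
    ("dev-1", "2026-05-20", 200_000, 50_000),
    ("dev-2", "2026-05-20", 50_000, 10_000),
]

for key, cost in sorted(daily_cost_per_developer(usage, 3.00, 15.00).items()):
    print(key, f"${cost:.2f}")
```

Grouping by developer and day, not by session, is what exposes heavy users whose many small sessions each look cheap on their own.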

THE DAILY BRIEF

Enterprise AI insights for technology and business leaders, twice weekly.

thedailybrief.com

Subscribe at thedailybrief.com/subscribe for weekly AI insights delivered to your inbox.

LinkedIn: linkedin.com/in/rberi  |  X: x.com/rajeshberi

© 2026 Rajesh Beri. All rights reserved.

Share:

THE DAILY BRIEF

Enterprise AIAI CostsToken PricingClaude CodeBudget Management

Uber Blew Its 2026 AI Budget in 4 Months—What Went Wrong

Uber burned its entire 2026 AI budget in 4 months as Claude Code adoption hit 84%. Monthly costs jumped to $2K per engineer. Here's why budgets are breaking.

By Rajesh Beri·May 22, 2026·10 min read

Uber Technologies CTO Praveen Neppalli Naga confirmed to The Information that the company burned through its entire 2026 AI budget in four months. The culprit: Claude Code adoption skyrocketed from 32% to 84% of its 5,000-engineer organization, with monthly API costs per engineer ranging from $500 to $2,000. The company is now "back to the drawing board" on budgeting.

This isn't an isolated incident. It's a pattern that's forcing CIOs, CTOs, and CFOs to rethink how they plan for enterprise AI spending—and whether the current token-based pricing model is sustainable at scale.

The Numbers Behind the Crisis

Here's what happened at Uber: when Claude Code adoption was at 32%, the costs were manageable. Budget projections assumed incremental growth. But when adoption jumped to 84%—meaning nearly every engineer was using AI-assisted coding daily—the token consumption curve went vertical.

Monthly costs per engineer hit $500 to $2,000 depending on usage patterns. Multiply that by 5,000 engineers, and you're looking at $2.5 million to $10 million per month in API costs alone. Over four months, that's $10 million to $40 million—far exceeding what any enterprise budgeted for "AI experimentation" in early 2026.

The technical reason is simple: agentic AI workflows consume far more tokens than autocomplete suggestions. An autocomplete suggestion might use 50-200 tokens. An agentic workflow that analyzes an entire codebase, suggests refactoring, writes tests, and documents changes can consume 50,000-200,000 tokens per session. That's a 100-1,000x difference in token consumption per interaction.

And when 84% of your engineering org is running agentic workflows multiple times per day, the math breaks fast.

Microsoft Is Cutting Back Too

Uber isn't alone. Microsoft's Experiences and Devices division—which covers Windows, Microsoft 365, Outlook, Teams, and Surface—is winding down most Claude Code usage by June 30, 2026, according to The Verge. The timing aligns with the end of Microsoft's fiscal year, and financial considerations influenced the decision.

The primary driver reported was platform consolidation toward GitHub Copilot CLI. But token costs created a forcing function for vendor consolidation that financial incentives alone might not have triggered as quickly. In other words: when the AI bill gets big enough, even Microsoft starts consolidating vendors.

This is significant. Microsoft owns GitHub, which owns Copilot. If even Microsoft is consolidating AI tools to manage costs, what does that mean for enterprises without their own AI infrastructure?

The GitHub Copilot Pricing Shock

Here's where it gets worse: GitHub announced a fundamental shift for its Copilot AI coding assistant, moving from flat-rate subscriptions to usage-based billing starting June 1, 2026. The change replaces premium request units with GitHub AI Credits tied to token consumption.

One developer reported their projected monthly cost rising from roughly €67 in April to around €966 under the new model. That's a 14x increase—not because they're using Copilot more, but because the pricing model changed from flat-rate to per-token.

This removes predictability from enterprise budgets at exactly the moment those budgets are already under pressure. CFOs hate unpredictable costs. When you can't forecast your AI spending within a 10% margin, you can't budget for it. And when you can't budget for it, you cut it.

Why Enterprise Budgets Can't Keep Up

The cost structure of frontier AI models explains why enterprise customers are running out of runway. Tokens are the unit of computation an AI model processes. Every prompt, every response, and every long-context codebase analysis consumes them.

According to Anthropic's official documentation, Claude Code costs an average of $6 per developer per day, with daily costs remaining below $12 for 90% of users. That average obscures the tail risk. By March 2026, 84% of Uber's developers were classified as agentic coding users, delegating entire workflows to AI rather than just accepting autocomplete suggestions.

Agentic workflows consume far more tokens per session than single-turn completions. The unit economics that looked reasonable at the pilot stage stop working at the adoption stage.

The infrastructure cost driving token prices is no mystery. On-demand pricing for the NVIDIA H100 GPU ranges from $1.49 per hour on specialized providers to $6.98 per hour on Microsoft Azure. AI labs must run thousands of these GPUs simultaneously to serve enterprise customers at scale. Those costs flow directly into API token pricing.

The Pricing War Nobody's Winning

Here's the competitive landscape as of May 2026:

Anthropic pricing:

  • Claude Opus 4.7 (flagship): $5 per million input tokens / $25 per million output tokens
  • Claude Sonnet 4.6 (balanced): $3 / $15
  • Claude Haiku 4.5 (budget): $1 / $5

OpenAI pricing:

  • GPT-5.5 (flagship): $5 / $30
  • GPT-5.4 (previous flagship): $2.50 / $15
  • GPT-5.4 Nano (budget): $0.20 / $1.25

Google pricing:

  • Gemini 3.5 Flash: $0.30 / $2.50

At Google I/O 2026, CEO Sundar Pichai said "many companies are already blowing through their annual token budgets, and it's only May," and pitched Gemini 3.5 Flash as the answer. If the largest Google Cloud customers shifted 80% of their workloads from frontier models to Gemini 3.5 Flash, Pichai said, they would save more than $1 billion a year.

The problem: Google can offer cheaper pricing because they build their own Tensor Processing Units (TPUs), reducing dependence on third-party GPU pricing. OpenAI and Anthropic can't easily replicate that cost advantage. They're locked into NVIDIA's pricing, which means they're locked into higher per-token costs.

And even with Google's savings claim, enterprises are still hitting budget ceilings. The issue isn't just pricing—it's consumption. When agentic AI becomes the default workflow, token consumption grows exponentially.

The Real Competitive Threat: Chinese Models

The cost gap between American and Chinese models is wide and getting wider. AI benchmarking firm Artificial Analysis runs every major model through the same 10 evaluations and tracks the total cost. For each lab's most capable model:

  • Anthropic's Claude: $4,811
  • OpenAI's ChatGPT: $3,357
  • DeepSeek: $1,071
  • Kimi: $948
  • Zhipu's GLM: $544

Claude is nearly nine times as expensive as the cheapest Chinese alternative for the same workload ($4,811 versus $544).
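The ratio is easy to check from the figures above. A quick sketch in Python, using only the five totals quoted here from Artificial Analysis (the dictionary keys are shortened labels chosen for this example):

```python
# Total cost in USD to run the same evaluation suite, as quoted above.
EVAL_COST_USD = {
    "Anthropic Claude": 4811,
    "OpenAI ChatGPT": 3357,
    "DeepSeek": 1071,
    "Kimi": 948,
    "Zhipu GLM": 544,
}

cheapest = min(EVAL_COST_USD.values())

# Express every lab's cost as a multiple of the cheapest one.
ratios = {lab: cost / cheapest for lab, cost in EVAL_COST_USD.items()}

for lab, ratio in ratios.items():
    print(f"{lab:<18} {ratio:.1f}x the cheapest")
```

Even the most expensive Chinese model on the list, DeepSeek, comes in at roughly twice the cheapest, while both American flagships sit above six times.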

Some 45% of companies surveyed by cloud cost firm CloudZero said they spent more than $100,000 a month on AI in 2025, up from 20% the year before. Where that money goes increasingly matters.

And the cheap alternatives are no longer a step behind. DeepSeek, the Chinese AI lab whose model triggered a U.S. tech selloff last year, released a preview of its next-generation model last month that matches or nearly matches the latest from OpenAI, Anthropic, and Google on coding, agentic, and knowledge benchmarks.

On OpenRouter, a marketplace that lets developers access hundreds of AI models through a single interface, Chinese models went from about 1% of usage in 2024 to more than 60% in May.

The American labs' best defense is trust. Banks, defense agencies, and regulated industries won't touch Chinese models regardless of price. But outside of regulated industries, where security and compliance rules are looser, the case for paying a premium gets harder to make.

The Enterprise Response: Advisor Models

The technique enterprises are deploying to manage costs is called an "advisor model." A cheap open-source model handles the bulk of the work as the default. When it hits a task it can't solve, it's given a tool that lets it call out to a frontier model from OpenAI or Anthropic for help.
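In code, the pattern looks roughly like the Python sketch below. Everything here is hypothetical: `AdvisorAgent`, `stub_cheap`, and `stub_frontier` are stand-ins invented for this example, not any vendor's API. In a real deployment the cheap model itself decides when to use the tool; the keyword check only simulates that decision.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AdvisorAgent:
    """Cheap model runs every task; the frontier model is reachable only via a tool."""
    cheap: Callable[[str, Callable[[str], str]], str]
    frontier: Callable[[str], str]
    calls: int = 0
    escalations: int = 0

    def ask_advisor(self, question: str) -> str:
        # The tool handed to the cheap model. Each use is one frontier call.
        self.escalations += 1
        return self.frontier(question)

    def run(self, task: str) -> str:
        self.calls += 1
        return self.cheap(task, self.ask_advisor)


def stub_cheap(task: str, ask_advisor: Callable[[str], str]) -> str:
    # Stand-in for a real model choosing to use its tool. A keyword decides here.
    if "refactor" in task:
        advice = ask_advisor(f"How should I approach: {task}")
        return f"[cheap + advice: {advice}] {task}"
    return f"[cheap] {task}"


def stub_frontier(question: str) -> str:
    # Stand-in for a frontier model call.
    return "split the module first"


agent = AdvisorAgent(cheap=stub_cheap, frontier=stub_frontier)
tasks = ["rename a variable", "write a docstring", "refactor the billing module"]
results = [agent.run(task) for task in tasks]
print(f"{agent.escalations} of {agent.calls} tasks escalated")
```

The escalation counter is the number to watch. The frontier bill scales with how often the cheap model asks for help, not with total request volume.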

Databricks CEO Ali Ghodsi, whose company's AI gateway sits between thousands of enterprise customers and the models they're using, said revenue from that product is climbing sharply. "You can curb costs really well this way," Ghodsi said.

This is the emerging playbook:

  1. Route 70-80% of requests to cheap models (Gemini Flash, open-source, or Chinese models for non-regulated workloads)
  2. Reserve frontier models (Claude Opus, GPT-5.5) for complex reasoning or high-stakes decisions
  3. Use caching aggressively (both OpenAI and Anthropic now offer ~90% off cached input)
  4. Consolidate vendors to reduce API sprawl
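A back-of-the-envelope sketch in Python shows how the first three steps compound. The prices are the input prices listed earlier for Claude Opus 4.7 and Gemini 3.5 Flash. The routing share and the cached share are assumptions chosen for illustration, and applying one cache discount uniformly across both models is a simplification.

```python
def blended_input_price(frontier_price, cheap_price, cheap_share,
                        cached_share=0.0, cache_discount=0.9):
    """USD per million input tokens after routing and caching.

    cheap_share:    fraction of tokens routed to the cheap model (assumed)
    cached_share:   fraction of input tokens served from cache (assumed)
    cache_discount: price reduction on cached input (~90% per the list above)
    """
    routed = cheap_share * cheap_price + (1 - cheap_share) * frontier_price
    return routed * (1 - cached_share * cache_discount)


all_frontier = blended_input_price(5.00, 0.30, cheap_share=0.0)
routed_only = blended_input_price(5.00, 0.30, cheap_share=0.8)
routed_and_cached = blended_input_price(5.00, 0.30, cheap_share=0.8, cached_share=0.5)

print(f"All frontier:        ${all_frontier:.2f} per million input tokens")
print(f"80% routed to cheap: ${routed_only:.2f}")
print(f"...plus 50% cached:  ${routed_and_cached:.2f}")
```

With these inputs the blended input price drops from $5.00 to $1.24 with routing alone, and to about $0.68 once half of the input is served from cache.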

Figma CEO Dylan Field said companies move through three phases of AI adoption. First, nobody uses it. Second, everyone has to, with some "literally holding competitions of who can spend the most with tokens." Third comes the realization that "everyone's spending too much" and that spending has to be cut back. Many enterprises are now entering that third phase.

What This Means for CIOs and CFOs

If you're planning AI budgets for the rest of 2026, here's what the data says:

For CIOs:

  1. Model tiering is mandatory. You can't afford to run everything on Claude Opus or GPT-5.5. Build routing logic that sends routine tasks to cheap models and reserves frontier models for complex work.

  2. Agentic adoption changes the math. If you budgeted for autocomplete-style AI, and your teams are now using agentic workflows, your costs will be 10-100x higher than projected. Measure actual token consumption per developer per day, not per session.

  3. Vendor lock-in is a budget risk. GitHub's shift from flat-rate to usage-based pricing is a warning. Any vendor can change their pricing model mid-year. Build abstraction layers so you can switch providers without rewriting code.
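Measuring per developer per day, as item 2 recommends, mostly means grouping usage records by the right key. A minimal Python sketch, assuming a usage log with one row per session; the row layout and the sample values are hypothetical and exist only to show the aggregation:

```python
from collections import defaultdict
from datetime import date

# Hypothetical usage log rows: (developer, day, session_id, tokens).
usage_log = [
    ("dev_a", date(2026, 5, 18), "s1", 120_000),
    ("dev_a", date(2026, 5, 18), "s2", 80_000),
    ("dev_a", date(2026, 5, 19), "s3", 150_000),
    ("dev_b", date(2026, 5, 18), "s4", 2_000),
]


def tokens_per_developer_day(rows):
    """Sum tokens across all sessions for each (developer, day) pair."""
    totals = defaultdict(int)
    for developer, day, _session_id, tokens in rows:
        totals[(developer, day)] += tokens
    return dict(totals)


daily = tokens_per_developer_day(usage_log)
for (developer, day), tokens in sorted(daily.items()):
    print(f"{developer} {day.isoformat()} {tokens:>9,} tokens")
```

A per-session view would show dev_a peaking at 150,000 tokens. The per-day view shows 200,000 on May 18, which is the number that drives the monthly bill.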

For CFOs:

  1. Budget for unpredictability. Token-based pricing means your AI costs are usage-based, not fixed. Add a 50-100% buffer to your AI budget line items for the rest of 2026.

  2. Track cost per outcome, not cost per token. If AI coding tools reduce development time by 30%, that's a cost savings even if the AI bill is $10 million. But you need to measure the productivity gains, not just the token costs.

  3. Watch the Q2 numbers. Anthropic is projecting $10.9 billion in revenue for Q2 2026, up from $4.8 billion in Q1. OpenAI hasn't disclosed Q2 projections. If Anthropic hits its number, enterprises are still paying despite the budget strain. If it misses, enterprises are cutting back faster than expected.
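The first two items reduce to simple arithmetic, sketched here in Python. The 30% time savings and the $10 million AI bill come from the list above. The $100 million engineering cost is a hypothetical figure chosen to make the example concrete, and treating time saved as a direct dollar saving is a simplifying assumption.

```python
def budget_with_buffer(baseline, buffer):
    """Budget line after adding a buffer, e.g. buffer=0.5 for 50%."""
    return baseline * (1 + buffer)


def net_value(engineering_cost, time_saved_share, ai_bill):
    """Value of time saved minus the AI bill, both over the same period."""
    return engineering_cost * time_saved_share - ai_bill


def break_even_share(engineering_cost, ai_bill):
    """Share of engineering time that must be saved to cover the AI bill."""
    return ai_bill / engineering_cost


ENGINEERING_COST = 100_000_000  # hypothetical, for illustration only
AI_BILL = 10_000_000

print(f"Budget with 50% buffer:  ${budget_with_buffer(AI_BILL, 0.5):,.0f}")
print(f"Budget with 100% buffer: ${budget_with_buffer(AI_BILL, 1.0):,.0f}")
print(f"Net value at 30% saved:  ${net_value(ENGINEERING_COST, 0.30, AI_BILL):,.0f}")
print(f"Break-even time saved:   {break_even_share(ENGINEERING_COST, AI_BILL):.0%}")
```

The break-even line is the useful one for a CFO: under these assumptions the tools pay for themselves once they save 10% of engineering time, and the measured productivity gain either clears that bar or it doesn't.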

The Long Game: Infrastructure Costs Are Falling, But Not Fast Enough

NVIDIA's Rubin platform targets a 10x reduction in inference token costs compared to its Blackwell architecture. According to Ramp's enterprise spending data, the average cost per million tokens across major providers fell from roughly $10 to $2.50 in a single year.

That long-term trend is real. But it doesn't solve the near-term problem. In practice, falling unit prices tell only half the story. The way organizations consume AI has changed so dramatically that cheaper per-token costs are offset by dramatically higher usage volume.
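The offset can be put in numbers. A short Python sketch, using the Ramp price figures quoted above and the 10-100x usage range cited elsewhere in this article; holding the model mix constant is an assumption:

```python
def spend_multiple(old_price, new_price, usage_multiple):
    """How total spend changes when unit price and usage both move."""
    return (new_price / old_price) * usage_multiple


OLD_PRICE = 10.00  # USD per million tokens, a year earlier
NEW_PRICE = 2.50   # USD per million tokens, now

# Usage growth at which falling prices and rising volume cancel out.
break_even_usage = OLD_PRICE / NEW_PRICE
print(f"Spend stays flat at {break_even_usage:.0f}x usage growth")

for usage in (4, 10, 100):
    multiple = spend_multiple(OLD_PRICE, NEW_PRICE, usage)
    print(f"{usage:>3}x usage -> {multiple:.1f}x spend")
```

A fourfold price drop only holds spend flat if usage grows no more than fourfold. At 10x usage the bill is 2.5 times larger, and at 100x it is 25 times larger.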

Enterprises that planned budgets around 2024 token rates are finding that agentic AI workflows at 2026 adoption levels consume multiples of what the spreadsheet projected.

Uber's engineers didn't stop wanting to use Claude Code. They ran out of money to pay for it. That's the crisis in a sentence.

Bottom Line

The enterprise AI budget crisis of 2026 is a consumption problem disguised as a pricing problem. Token costs are falling, but usage is rising faster. An agentic session can consume 100-1,000x more tokens than an autocomplete suggestion. And when 84% of your engineering org adopts AI-assisted coding, the budget you planned in January is gone by April.

The solutions are emerging: model tiering, advisor architectures, vendor consolidation, aggressive caching. But the fundamental tension remains: enterprises want AI productivity gains, but they can't afford to pay frontier-model prices for every task.

Google's $1 billion savings claim is credible—if you can shift 80% of workloads to cheaper models. The question is whether that shift compromises quality enough to erase the productivity gains.

For now, the smartest move is to measure everything: token consumption per developer, cost per task, productivity gains per dollar spent. Because the only thing worse than blowing your AI budget in four months is not knowing why.

THE DAILY BRIEF

Enterprise AI insights for technology and business leaders, twice weekly.

thedailybrief.com

Subscribe at thedailybrief.com/subscribe for weekly AI insights delivered to your inbox.

LinkedIn: linkedin.com/in/rberi  |  X: x.com/rajeshberi

© 2026 Rajesh Beri. All rights reserved.
