Companies are blowing through their entire 2026 AI budgets, and it's only May. Uber exhausted its annual AI allocation in four months. Microsoft has begun canceling most of its internal Claude Code licenses. One healthcare enterprise racked up $6 million in unplanned costs from a single AI deployment. And in the most extreme case, one company received a $500 million bill for a single month of Claude usage.
The culprit? The shift from seat-based software pricing to consumption-based pricing for AI tokens. What looked like a predictable $10 to $30 per seat per month has turned into volatile spending that's forcing CFOs to choose between AI budgets and headcount growth.
Google sees an opening. While Anthropic hypes its unreleased Mythos model and OpenAI races toward an IPO, Google is changing the conversation from capability to cost. The company's latest pitch: if you're one of Google Cloud's top customers and you move 80% of your AI workloads to a mix of Gemini 3.5 Flash and other frontier models, you could save more than $1 billion a year.
That's not marketing hyperbole. That's Google's infrastructure advantage finally showing up where it matters most — your bottom line.
The Token Budget Crisis Is Real
By 2026, an estimated 85% of SaaS providers had transitioned to consumption-based pricing tied directly to token usage. The era of "unlimited" AI access under flat-rate subscriptions is over. And enterprises weren't ready.
Here's what's happening:
- Uber exhausted its entire 2026 AI budget by April after rolling out Claude Code to 5,000 engineers. Average monthly costs ran $150 to $250 per engineer, with heavy users hitting $500 to $2,000.
- Microsoft began canceling most internal Claude Code licenses in mid-May, redirecting engineers to its own GitHub Copilot CLI to control token costs.
- An unnamed healthcare enterprise consumed 1 trillion tokens over six months, resulting in more than $6 million in unplanned costs.
- One AI consultant's client accrued a $500 million bill in a single month for Claude usage — with no spending caps or usage limits in place.
The problem isn't just sticker shock. It's that AI spending patterns don't follow traditional software procurement rules. Finance teams are missing AI cost forecasts by over 50% in nearly one in four cases, according to industry reports. Deloitte published a comprehensive CFO guide on AI token economics in April 2026 — a sign that even the consulting giants recognize this is uncharted territory.
Why AI Costs Are Spiraling Out of Control
The pricing spread for AI models in 2026 is staggering: from as low as $0.04 per million tokens for budget models to over $180 per million for frontier reasoning models. That's a 4,500x difference.
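To make the spread concrete, the sketch below prices one hypothetical workload at both ends of the quoted range. The 10-billion-token monthly volume is an illustrative assumption. The calculation also uses one flat rate per model, even though most providers bill input and output tokens at different rates.

```python
# Per-million-token prices taken from the range quoted above (USD).
BUDGET_PRICE_PER_M = 0.04
FRONTIER_PRICE_PER_M = 180.00

def monthly_cost(tokens: int, price_per_million: float) -> float:
    """Cost in USD for a token count at a single flat per-million price."""
    return tokens / 1_000_000 * price_per_million

# Hypothetical workload: 10 billion tokens per month.
TOKENS_PER_MONTH = 10_000_000_000

budget = monthly_cost(TOKENS_PER_MONTH, BUDGET_PRICE_PER_M)
frontier = monthly_cost(TOKENS_PER_MONTH, FRONTIER_PRICE_PER_M)

print(f"Budget model:   ${budget:,.0f}/month")
print(f"Frontier model: ${frontier:,.0f}/month")
print(f"Ratio: {frontier / budget:,.0f}x")
```

At this assumed volume, the same traffic costs $400 a month on the budget model and $1.8 million on the frontier model.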
And enterprises are defaulting to the expensive end. This behavior — called "token maxing" — burns through budgets 10 to 100 times faster than necessary. Why? Because most teams don't have the tooling, governance, or incentives to route simpler tasks to cheaper models.
Three factors are driving runaway costs:
- Agentic workflows are token-hungry. AI agents run in the background, processing millions of tokens without human intervention. Unlike ChatGPT sessions that end when you close the tab, agents keep consuming tokens until they complete their tasks, or until your budget hits zero.
- Longer context windows increase usage. Models with 1M+ token context windows can process entire codebases or multi-hundred-page documents in a single API call. That's powerful, and expensive.
- Inference costs now dominate AI budgets. By 2026, AI inference costs represent 85% of enterprise AI budgets, up from being an afterthought in 2023. Training costs are sunk investments; inference costs keep growing month after month.
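The first factor is easy to underestimate. Here is a minimal sketch, assuming an agent that calls a stateless chat API and resends its full conversation history on every step. That pattern is common but not universal, and prompt caching or context trimming would change the numbers. Under this assumption, the tokens billed per step grow with the history. The `run_agent` function and every token count are hypothetical. The hard cap shows one way to stop a runaway loop before the budget hits zero.

```python
def run_agent(system_tokens: int, tokens_per_step: int, max_steps: int,
              token_budget: int) -> tuple[int, int]:
    """Simulate an agent loop that resends its whole history on each call.

    Returns (steps_completed, total_tokens_billed). Stops before any step
    that would push total billed tokens past the hard budget.
    """
    history = system_tokens  # prompt size sent on the next call
    billed = 0
    for step in range(max_steps):
        # Each call bills the full history as input plus the new output.
        cost_of_step = history + tokens_per_step
        if billed + cost_of_step > token_budget:
            return step, billed  # hard cap: refuse to start this step
        billed += cost_of_step
        history += tokens_per_step  # the output joins the history
    return max_steps, billed

steps, billed = run_agent(system_tokens=2_000, tokens_per_step=1_000,
                          max_steps=50, token_budget=1_000_000)
print(f"Stopped after {steps} steps, {billed:,} tokens billed")
```

A naive estimate of 50 steps at 3,000 tokens each (the first step's bill) gives 150,000 tokens. Under the resend-everything assumption, the same run uncapped actually bills 1,375,000 tokens, because the cumulative total grows with the square of the step count. With a 1,000,000-token cap, the loop stops after 42 steps, having billed 987,000.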
Google CEO Sundar Pichai recently noted that monthly usage of Google's AI products has increased sevenfold since last year, to 3.2 quadrillion tokens. That's not a typo. Quadrillion. And every one of those tokens costs someone money.
Google's Full-Stack Advantage
Here's where Google's infrastructure play gets interesting. The company pays around 50% to 75% less for its internal AI compute than rivals, according to analyst estimates from William Blair.
Why? Google owns the full stack:
- Custom TPU chips designed specifically for AI workloads
- Direct sourcing from component manufacturers (no Nvidia markup)
- Global data center network optimized over 25+ years
- Vertical integration from silicon to application layer
OpenAI, by contrast, pays Microsoft, Oracle, and other cloud providers a margin on every ChatGPT and Codex request. Those providers pay Nvidia for GPUs. Every layer adds cost. Google cuts out multiple intermediaries.
This isn't new. Google used the same playbook to dominate search in the 2000s. While Yahoo and Microsoft competed on result quality, Google built custom infrastructure from cheap, off-the-shelf parts to maximize speed and minimize cost. Google's results didn't need to be the absolute best — they just needed to be fast enough and cheap enough that users kept coming back.
Now Google is rerunning that strategy with Gemini. Except this time, it also has a hugely successful search advertising business that can subsidize AI efforts while rivals like OpenAI and Anthropic burn through cash and race for more compute.
The Gemini 3.5 Flash Pitch: "Good Enough" at 75% Off
Google's latest Gemini 3.5 Flash model is positioned as a high-capability, low-cost alternative to frontier models. The lowest prices apply to the Flash-Lite variant, which runs $0.10 to $0.40 per million tokens. That compares with $2 to $18 per million for Gemini 3.1 Pro and significantly more for Anthropic's Claude or OpenAI's GPT-5.5.
But there's a catch. Early analysis from Artificial Analysis found that Gemini 3.5 Flash costs 5.5 times as much to run as its predecessor, Gemini 3 Flash, and nearly twice as much as Gemini 3.1 Pro. So while it's cheaper than Anthropic's or OpenAI's top-tier models, it's not the budget option it might seem at first glance.
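Per-token price is only half of what determines run cost. The other half is how many tokens a model consumes to finish a task, including any reasoning output. The minimal sketch below uses entirely hypothetical prices and token counts, not published rates for any Gemini model. It shows one way a model with lower per-token prices can still cost more per task.

```python
from dataclasses import dataclass

@dataclass
class ModelProfile:
    name: str                  # hypothetical label, not a real model
    input_price_per_m: float   # USD per million input tokens (hypothetical)
    output_price_per_m: float  # USD per million output tokens (hypothetical)
    avg_output_tokens: int     # output tokens generated per task (hypothetical)

def cost_per_task(model: ModelProfile, input_tokens: int) -> float:
    """USD cost of one task, billing input and output at separate rates."""
    return (input_tokens * model.input_price_per_m
            + model.avg_output_tokens * model.output_price_per_m) / 1_000_000

concise = ModelProfile("concise-model", 2.00, 12.00, avg_output_tokens=500)
verbose = ModelProfile("verbose-model", 0.50, 3.00, avg_output_tokens=8_000)

for m in (concise, verbose):
    print(m.name, round(cost_per_task(m, input_tokens=4_000), 4))
```

The verbose profile charges a quarter of the per-token price but emits 16 times as many output tokens, so each task costs nearly twice as much ($0.026 versus $0.014).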
Google's argument: you don't need frontier performance for most enterprise tasks. Route 80% of your workloads to Flash, reserve the expensive models for the 20% of tasks that actually need them, and, if you're one of Google Cloud's largest customers, save more than $1 billion a year.
Is that realistic? Depends on your workload mix. If your AI usage is dominated by coding assistants, customer support chatbots, document summarization, or data extraction, Flash is probably sufficient. If you're running complex reasoning tasks, multi-step agentic workflows, or research-grade analysis, you'll still need frontier models.
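The arithmetic behind an 80/20 split is a weighted average of per-token prices. Here is a minimal sketch, assuming one flat blended price per million tokens for each tier. Both prices and the annual volume are hypothetical placeholders, not Google's or anyone's published rates. The result also says nothing about whether quality holds up on the 80% of traffic that gets routed to the cheaper tier.

```python
def blended_price(cheap_share: float, cheap_price: float,
                  premium_price: float) -> float:
    """Weighted-average USD price per million tokens for a two-tier mix."""
    if not 0.0 <= cheap_share <= 1.0:
        raise ValueError("cheap_share must be between 0 and 1")
    return cheap_share * cheap_price + (1.0 - cheap_share) * premium_price

# Hypothetical inputs: 100 trillion tokens a year, expressed in millions.
ANNUAL_MILLIONS = 100_000_000
CHEAP, PREMIUM = 1.00, 15.00  # USD per million tokens, hypothetical

all_premium = ANNUAL_MILLIONS * PREMIUM
mixed = ANNUAL_MILLIONS * blended_price(0.8, CHEAP, PREMIUM)
print(f"All premium: ${all_premium:,.0f}")
print(f"80/20 mix:   ${mixed:,.0f}")
print(f"Savings:     ${all_premium - mixed:,.0f}")
```

With these made-up inputs, the mix cuts the bill by about 75%. Real savings depend entirely on your own prices and volumes, and on how much traffic can actually move.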
The key is intelligent model routing — a capability that requires infrastructure most enterprises don't have yet.
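Routing can start much simpler than a learned classifier. Here is a minimal rule-based sketch with a one-step escalation. It assumes each request carries a task type and a flag for multi-step work. The model identifiers, the task categories (loosely based on the workload split described above), and the `call_model`/`is_acceptable` callbacks are all hypothetical placeholders. Production routers often use a trained classifier or a cheap model's own confidence score instead.

```python
from typing import Callable

# Hypothetical model identifiers; substitute whatever your provider exposes.
BUDGET_MODEL = "budget-model"
FRONTIER_MODEL = "frontier-model"

# Task types a Flash-class model can plausibly handle (assumption).
SIMPLE_TASKS = {"code_completion", "support_reply", "summarize", "extract"}

def pick_model(task_type: str, multistep: bool) -> str:
    """Send simple, single-shot tasks to the budget tier; everything else up."""
    if multistep or task_type not in SIMPLE_TASKS:
        return FRONTIER_MODEL
    return BUDGET_MODEL

def answer(task_type: str, prompt: str, multistep: bool,
           call_model: Callable[[str, str], str],
           is_acceptable: Callable[[str], bool]) -> tuple[str, str]:
    """Route a request, escalating once if the budget reply fails a check.

    `call_model(model, prompt)` and `is_acceptable(reply)` stand in for your
    provider client and your own quality check.
    """
    model = pick_model(task_type, multistep)
    reply = call_model(model, prompt)
    if model == BUDGET_MODEL and not is_acceptable(reply):
        model = FRONTIER_MODEL  # escalate: pay frontier prices only when needed
        reply = call_model(model, prompt)
    return model, reply

# Stub client: the budget model returns an empty reply, forcing escalation.
def fake_call(model: str, prompt: str) -> str:
    return "" if model == BUDGET_MODEL else "detailed answer"

print(answer("summarize", "Summarize the Q2 board deck", False,
             fake_call, is_acceptable=lambda r: bool(r.strip())))
```

Escalating only on failure means the frontier price is paid twice for a small share of requests, in exchange for paying the budget price on every request that passes the check.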
What CTOs and CFOs Should Do Right Now
If you're a CTO:
- Audit your current AI spending by model tier. How much are you spending on frontier models vs. mid-tier vs. budget? Break it down by use case.
- Implement intelligent model routing. Use cost-efficient models for simple tasks; reserve expensive models for complex reasoning. This can reduce token spend by 30-50% without sacrificing quality.
- Set up real-time usage analytics. You can't control costs you can't see. Implement budget alerts, chargebacks to business units, and hierarchical budget management with hard caps.
- Evaluate Google's Flash offering against your workload. Run benchmarks on your actual tasks. Don't trust vendor marketing; test it yourself.
- Consider AI gateways. Enterprise AI gateways add a control layer between applications and LLM providers, enabling semantic caching, provider routing, and cost attribution.
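Hierarchical budgets with hard caps and chargebacks can be modeled as a tree of spending limits. A charge is allowed only if it fits under every cap from the team up to the company. This is a minimal in-memory sketch. The `BudgetNode` class, unit names, dollar amounts, and 80% alert threshold are all hypothetical. A real gateway would persist this state and check it before forwarding a request to the provider.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class BudgetNode:
    """One level of a spending hierarchy (company, department, team)."""
    name: str
    cap_usd: float
    parent: Optional["BudgetNode"] = None
    spent_usd: float = 0.0
    alerts: list[str] = field(default_factory=list)

    def chain(self) -> list["BudgetNode"]:
        """This node followed by every ancestor up to the root."""
        nodes, node = [], self
        while node is not None:
            nodes.append(node)
            node = node.parent
        return nodes

    def charge(self, amount_usd: float, alert_ratio: float = 0.8) -> bool:
        """Record spend against this unit and all its parents, or refuse it.

        Returns False and records nothing if any cap in the chain would be
        exceeded; otherwise records the spend and logs threshold alerts.
        """
        nodes = self.chain()
        if any(n.spent_usd + amount_usd > n.cap_usd for n in nodes):
            return False  # hard cap: block the request before it is sent
        for n in nodes:
            before = n.spent_usd
            n.spent_usd += amount_usd
            threshold = alert_ratio * n.cap_usd
            if before < threshold <= n.spent_usd:
                n.alerts.append(f"{n.name} passed {alert_ratio:.0%} of its cap")
        return True

company = BudgetNode("company", cap_usd=100_000)
engineering = BudgetNode("engineering", cap_usd=60_000, parent=company)
platform = BudgetNode("platform-team", cap_usd=10_000, parent=engineering)

print(platform.charge(8_500))  # allowed, and crosses the team's 80% line
print(platform.charge(2_000))  # refused: would exceed the team's $10,000 cap
print(platform.alerts, engineering.spent_usd, company.spent_usd)
```

Each node's `spent_usd` doubles as the figure to charge back to that unit. Because refusals happen before any spend is recorded, a blocked request leaves every level of the hierarchy unchanged.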
If you're a CFO:
- Treat AI spending like cloud spend, not software licenses. Consumption-based pricing requires FinOps discipline, not traditional procurement processes.
- Set department-level budgets with hard caps. Engineering teams will consume as much as you give them. Cap it.
- Compare AI token costs to human labor costs. Some enterprises are now facing a "tokens or humans" dilemma. If AI inference costs are approaching headcount costs, you need to justify ROI differently.
- Demand ROI metrics before expanding AI usage. Every dollar spent on tokens should drive measurable value: cost savings, revenue growth, or productivity gains.
- Plan for 2027 budgets now. If you're already blowing through 2026 budgets by May, extrapolate forward. What does that look like in 12 months?
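Extrapolating forward can start with a simple run-rate projection. Here is a minimal sketch, assuming spend so far is representative and optionally grows at a constant monthly rate. The budget, the spend figure, and the 5% growth rate are hypothetical. The Uber example above is the flat case: exhausting an annual budget in four months implies a run rate of roughly three times the budget.

```python
def projected_annual_spend(spend_to_date: float, months_elapsed: int,
                           monthly_growth: float = 0.0) -> float:
    """Project 12-month spend from the average monthly spend so far.

    Each remaining month grows by `monthly_growth` (0.05 = 5%) over the
    previous one, starting from the average month to date.
    """
    if not 0 < months_elapsed <= 12:
        raise ValueError("months_elapsed must be between 1 and 12")
    monthly = spend_to_date / months_elapsed
    total = spend_to_date
    for _ in range(12 - months_elapsed):
        monthly *= 1 + monthly_growth
        total += monthly
    return total

budget = 12_000_000  # hypothetical annual AI budget (USD), spent in 4 months
flat = projected_annual_spend(12_000_000, months_elapsed=4)
growing = projected_annual_spend(12_000_000, months_elapsed=4,
                                 monthly_growth=0.05)
print(f"Flat run rate:     {flat / budget:.1f}x budget")
print(f"5% monthly growth: {growing / budget:.1f}x budget")
```

Even modest month-over-month growth compounds. In this hypothetical, 5% a month pushes the year from 3.0x to about 3.5x the original budget.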
The Real Opportunity: Infrastructure as Competitive Moat
Google's bet is simple: as AI commoditizes, the advantage shifts to whoever can run it cheapest and fastest. Capability gaps between frontier models are shrinking. OpenAI's president recently declared that "the model alone is no longer the product."
If that's true, infrastructure becomes the moat.
Google has spent 25+ years building that moat. TPU chips, global data centers, direct component sourcing, vertical integration. Rivals can't replicate that overnight — or even over a few years.
But here's the nuance: Google's cost advantage only matters if enterprises are willing to accept "good enough" performance in exchange for dramatic cost savings. If you're convinced you need the absolute frontier model for every task, Google's pitch won't resonate.
The shift is already happening. Uber COO Andrew Macdonald said in April that it's becoming harder to justify the company's ballooning AI costs. Venture capitalist Chamath Palihapitiya said his firm, 8090, moved away from Cursor because it was spending too much on tokens. Analyst Dan Morgan from Synovus Trust noted: "As AI agents become more complex, long-running processes have become the norm. This has created sticker shock at many organizations."
Translation: Enterprises are hitting a breaking point. And when budgets run dry, "good enough" starts looking pretty good.
Bottom Line: The Token Budget Crisis Is Google's Opportunity
Three things are true at the same time:
- Enterprises are blowing through AI budgets faster than expected. It's only May, and companies like Uber have already exhausted their annual allocations.
- Google has a structural cost advantage that rivals can't match. Owning the full stack (chips, data centers, cloud, models, applications) means Google pays an estimated 50-75% less for AI compute.
- The market is shifting from capability competition to cost competition. As model performance gaps shrink, price becomes the differentiator.
Google's $1 billion savings pitch isn't aspirational — it's a direct challenge to OpenAI and Anthropic. If you're spending $1.2 billion a year on AI inference and Google can deliver comparable performance for $200 million, the ROI conversation changes fast.
The question for enterprise leaders: Are you willing to accept "good enough" AI performance in exchange for massive cost savings? Or do you still believe you need frontier models for everything?
Your answer will determine whether Google's infrastructure bet pays off — and whether your CFO approves next year's AI budget.
