The CFO who signed off on the enterprise AI budget in January is about to have a very uncomfortable Q3 conversation. Token-based pricing just redrew the rules of the AI economy — and most finance teams haven't caught up.
At FinOps X 2026 in San Diego this week, the message from enterprise practitioners was blunt: the era of flat-fee, all-you-can-eat AI subscriptions is over. Token pricing is replacing it. And if you're running AI agents at scale, your costs are not growing linearly — they're compounding.
This isn't a future threat. It's happening right now, inside your company's current AI deployments.
The Honeymoon Is Over
Think back to 2024. Your teams signed up for AI tools at flat monthly prices. $20 per seat here. $200 per month there. Predictable. Budget-friendly. Easy to approve.
That pricing was never sustainable — it was a land-grab strategy. OpenAI, Anthropic, Google, and Microsoft were subsidizing access to build user habit and market share. According to analysis from SemiAnalysis, a $200/month Anthropic subscription was delivering roughly $8,000 worth of Claude compute at real cost. A similar OpenAI plan was giving users $14,000 worth of Codex tokens.
That math doesn't work for anyone building a sustainable business. And it's now being corrected — rapidly.
J.R. Storment, Executive Director of the FinOps Foundation, described it plainly at FinOps X: tokens have become "the atomic unit of AI." In his keynote, he made a striking comparison — tokens "serve more roles in the modern economy than almost any other commodity has in modern history, maybe, maybe oil in the 20th century." They are simultaneously the unit of compute output, the vendor pricing mechanism, and the value metric enterprises use to justify AI investment.
The subsidy era is done. You're now paying closer to real cost.
What Is a Token (and Why Should You Care)?
Most business leaders glaze over at the word "token." But understanding the basics is now a financial literacy requirement — not just a technical curiosity.
An AI token is the smallest unit a language model processes. Before a model reads your prompt or writes a response, it converts all text into tokens. In English, one token is roughly four characters or three-quarters of a word. That means 100 tokens is about 75 words.
You pay for both what you send (input tokens) and what the model generates back (output tokens). Vendors publish rate cards per million tokens, with separate prices for inputs and outputs. Output tokens are typically two to five times more expensive per million than input tokens.
For a simple customer service bot handling 10,000 queries a day, this math is manageable. For an AI agent making iterative decisions, spawning sub-tasks, and looping back for corrections, the math gets ugly fast.
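To make the rate-card arithmetic concrete, here is a minimal sketch in Python. The prices ($3 per million input tokens, $15 per million output tokens) and the token counts per query are illustrative assumptions, not any vendor's actual rate card; RateCard and request_cost are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RateCard:
    """Per-million-token prices in USD (illustrative, not a real vendor's)."""
    input_per_million: float
    output_per_million: float


def request_cost(rate: RateCard, input_tokens: int, output_tokens: int) -> float:
    """Cost of one call: each side is billed at its own per-million rate."""
    return (
        input_tokens / 1_000_000 * rate.input_per_million
        + output_tokens / 1_000_000 * rate.output_per_million
    )


# Assumed rate card where output costs 5x input.
example_rate = RateCard(input_per_million=3.00, output_per_million=15.00)

# Assumed query shape: a 1,500-token prompt with a 500-token reply.
single_call = request_cost(example_rate, input_tokens=1_500, output_tokens=500)

# The same call repeated for 10,000 queries a day.
daily_bot_cost = single_call * 10_000

print(f"per call: ${single_call:.4f}, per day: ${daily_bot_cost:,.2f}")
```

Under these assumptions the bot costs about a cent per query. Note that the 500 output tokens cost more than the 1,500 input tokens, which is why verbose responses matter more than long prompts at these rates.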
SAP's FinOps team captured it well in their session at the conference: "You pay per token, and this little token hides an enormous complexity underneath predictability." Model choice, quantization settings, caching behavior, agent loop depth — all of it affects token consumption, and most enterprises have no real visibility into those variables today.
The Agentic Multiplier No One Budgeted For
Here's where the enterprise cost crisis gets serious. Agentic AI — systems where AI models call tools, take multi-step actions, and loop back to correct themselves — is the dominant enterprise deployment pattern right now. It's also catastrophically expensive in a token economy.
Storment laid out the cost math at FinOps X. Through mid-2025, global token usage grew in what he described as "a nice linear path." Then came the agentic wave: context windows expanded from tens of thousands of tokens to millions, and agents started introducing "loops and retries and corrections and all this insanity."
Token consumption went non-linear. Not because the models got worse — because the patterns changed.
A simple query to a language model might consume 2,000 tokens. An agentic workflow handling the same business task (calling tools, validating outputs, retrying on errors, passing context between steps) might consume 50,000 to 200,000 tokens to accomplish the equivalent work, or 25 to 100 times as much. At scale, across a team or department, that's the difference between a manageable line item and a budget crisis.
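One way to see where the multiplier comes from is a rough model of an agent loop. The sketch below assumes that every step re-sends the full accumulated history as input, which is a common pattern but not universal. The step count and per-step token figures are illustrative assumptions, and agent_run_tokens is a hypothetical helper.

```python
def agent_run_tokens(base_context: int, steps: int,
                     tokens_added_per_step: int, output_per_step: int) -> dict:
    """Total tokens billed when every step re-sends the whole history.

    Assumption: each step's input is the base context plus everything
    (tool results and model output) appended by the earlier steps.
    """
    input_total = 0
    output_total = 0
    context = base_context
    for _ in range(steps):
        input_total += context          # whole history billed as input again
        output_total += output_per_step
        context += tokens_added_per_step + output_per_step
    return {"input": input_total, "output": output_total,
            "total": input_total + output_total}


# One-shot query: 1,500 tokens in, 500 out.
single_query = agent_run_tokens(1_500, steps=1,
                                tokens_added_per_step=0, output_per_step=500)

# Same starting prompt, but 12 steps that each add 1,000 tokens of tool output.
agent_task = agent_run_tokens(1_500, steps=12,
                              tokens_added_per_step=1_000, output_per_step=500)

multiplier = agent_task["total"] / single_query["total"]
print(single_query["total"], agent_task["total"], multiplier)
```

With these numbers the single query uses 2,000 tokens and the 12-step run uses 123,000. Because the history is re-billed on every step, input tokens grow roughly with the square of the step count, so doubling the loop depth more than doubles the bill.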
I've seen this play out firsthand in enterprise AI deployments. What looks like a small-scale pilot at $3,000/month becomes a $40,000/month production system the moment you turn on agent loops and real user load. The step function catches finance teams completely off guard.
56% of CEOs Report Zero ROI — And This Is Why
Enterprise AI spend hit an average of $11.6 million in 2026, according to recent market data. Yet 56% of CEOs report no measurable revenue or cost benefit from that investment.
That gap has a name at FinOps X this week: tokenmaxxing, the practice of optimizing for token usage metrics (tokens per employee, prompts per week, agents deployed) instead of business outcomes. Teams look productive on the AI leaderboard while the actual business benefits remain invisible.
Amazon's SVP Dave Treadwell captured the frustration perfectly when he said publicly: "Please don't use AI just for the sake of using AI." That's not a warning against AI investment — it's a warning against measuring the wrong thing.
Token usage is a cost metric. It tells you how much you're spending. It tells you nothing about what you're getting. The companies getting real ROI from AI are the ones who've connected token consumption to business outputs: revenue generated, cases resolved, code shipped, decisions accelerated.
The ones who aren't? They're paying $11.6 million a year to populate a dashboard.
What CFOs Need to Do Right Now
If you're a CFO or head of finance, you need three things in place before your next AI renewal or expansion:
1. Cost attribution by business unit. You cannot manage what you cannot allocate. Token costs need to roll up to the department consuming them, not sit as an undifferentiated IT line item. Every AI system should have an owner accountable for both spend and business outcomes.
2. Budget guardrails on agentic systems. Agents don't have natural stopping points. Without spend limits and token consumption caps per agent run, a misconfigured workflow can run for hours, burning tokens at rates that rival entire teams. Implement hard limits before you give agents access to production systems.
3. Monthly token cost per business outcome. Not total tokens consumed, but token cost per resolved customer ticket, per PR reviewed, per contract processed. This is the metric that distinguishes value from waste. If you don't have this data, you don't know whether your AI investment is working.
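The second item, a hard cap per agent run, can be sketched in a few lines of Python. RunBudget, TokenBudgetExceeded, and run_agent are hypothetical names, not part of any vendor SDK, and the list of per-step usages stands in for real model calls.

```python
class TokenBudgetExceeded(RuntimeError):
    """Raised when an agent run tries to spend past its cap."""


class RunBudget:
    """Hard per-run cap on tokens (hypothetical helper, not a vendor API)."""

    def __init__(self, max_tokens: int) -> None:
        self.max_tokens = max_tokens
        self.used = 0

    def charge(self, input_tokens: int, output_tokens: int) -> None:
        """Record one model call and stop the run once the cap is crossed."""
        self.used += input_tokens + output_tokens
        if self.used > self.max_tokens:
            raise TokenBudgetExceeded(
                f"run used {self.used} tokens, cap is {self.max_tokens}"
            )


def run_agent(step_usages, budget: RunBudget) -> str:
    """Drive an agent loop; step_usages stands in for real model calls."""
    try:
        for input_tokens, output_tokens in step_usages:
            budget.charge(input_tokens, output_tokens)
    except TokenBudgetExceeded:
        return "halted"
    return "completed"


# A misconfigured workflow that would loop 1,000 times is cut off early.
runaway = [(4_000, 500)] * 1_000
budget = RunBudget(max_tokens=50_000)
status = run_agent(runaway, budget)
print(status, budget.used)
```

In this sketch usage is recorded after each call, so a run can overshoot the cap by at most one call. A stricter design would estimate the next call's size and refuse it up front.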
At a Fortune 500 company I've worked with, the finance team was shocked to discover that 40% of their monthly AI token spend was coming from three automated workflows that were running on test data with no production impact. No one had set up attribution. No one caught it for six months.
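A basic attribution rollup would have surfaced that kind of waste in the first month. The sketch below is a rough illustration, assuming every usage record is tagged with a workflow, an owning business unit, and an environment at call time. The records, field names, and dollar figures are invented for the example.

```python
from collections import defaultdict

# Hypothetical monthly usage records, one per workflow.
usage_records = [
    {"workflow": "support-triage", "unit": "Customer Care", "env": "prod",
     "cost_usd": 4_200.0, "outcomes": 21_000},
    {"workflow": "contract-review", "unit": "Legal", "env": "prod",
     "cost_usd": 1_800.0, "outcomes": 600},
    {"workflow": "nightly-eval", "unit": "Engineering", "env": "test",
     "cost_usd": 2_500.0, "outcomes": 0},
    {"workflow": "demo-agent", "unit": "Engineering", "env": "test",
     "cost_usd": 1_500.0, "outcomes": 0},
]


def spend_by(records, key):
    """Sum cost per distinct value of `key` (e.g. business unit or env)."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record["cost_usd"]
    return dict(totals)


def cost_per_outcome(records):
    """Cost per business outcome by workflow; None when nothing was delivered."""
    return {
        r["workflow"]: (r["cost_usd"] / r["outcomes"] if r["outcomes"] else None)
        for r in records
    }


by_unit = spend_by(usage_records, "unit")
by_env = spend_by(usage_records, "env")
test_share = by_env["test"] / sum(by_env.values())
unit_costs = cost_per_outcome(usage_records)

print(by_unit)
print(f"share of spend on test data: {test_share:.0%}")
print(unit_costs)
```

The workflows with spend but no outcomes are the ones to question first. The same records give both the per-unit allocation from item 1 and the cost-per-outcome metric from item 3.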
What CTOs Need to Do Right Now
The technical side of this problem is a FinOps challenge disguised as an engineering one. Here's the playbook:
Implement prompt caching aggressively. Every major AI provider now offers some form of caching — if you send the same system prompt repeatedly, you pay significantly less for the repeated input tokens. For enterprise deployments with consistent prompts, this alone can cut costs 40-60%.
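As a rough illustration of the caching math, the sketch below assumes cached input tokens are billed at a 90% discount and that 6,000 of a 10,000-token prompt is a stable, cacheable system prompt. Both figures are assumptions for the example; actual discounts vary by provider, and some providers also charge extra to write to the cache, which this sketch ignores.

```python
def input_cost(prompt_tokens: int, cached_tokens: int,
               price_per_million: float, cached_discount: float) -> float:
    """Input-side cost of one call when part of the prompt is a cache hit.

    cached_discount is the fraction taken off the normal input price for
    cached tokens (0.9 means cached tokens cost 10% of the list price).
    """
    fresh_tokens = prompt_tokens - cached_tokens
    cached_price = price_per_million * (1 - cached_discount)
    return (fresh_tokens * price_per_million
            + cached_tokens * cached_price) / 1_000_000


# Assumed: $3 per million input tokens, 10,000-token prompt.
without_cache = input_cost(10_000, 0, 3.00, cached_discount=0.9)
with_cache = input_cost(10_000, 6_000, 3.00, cached_discount=0.9)
savings = 1 - with_cache / without_cache

print(f"input cost saved per call: {savings:.0%}")
```

The saving here applies to input tokens only. How much it moves the total bill depends on how input-heavy the workload is.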
Right-size model selection. Not every task requires your most capable model. A tier-1 reasoning model at $15 per million output tokens is overkill for document summarization or data extraction. A smaller, faster model at $0.60 per million output tokens handles those tasks just as well. Most enterprises are routing everything to the premium tier because it's the default.
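A simple routing table shows how much the default-to-premium habit can cost. The two output prices come from the paragraph above; the workload mix, the tier names, and the pick_tier helper are hypothetical, and whether a smaller model is good enough for a given task is something to verify with your own evaluations.

```python
# Prices per million output tokens, from the figures quoted above.
OUTPUT_PRICE = {"premium": 15.00, "small": 0.60}

# Assumed routing table: task types a smaller model handles well.
SMALL_MODEL_TASKS = {"summarization", "extraction"}


def pick_tier(task_type: str) -> str:
    """Send known-simple task types to the small tier, everything else up."""
    return "small" if task_type in SMALL_MODEL_TASKS else "premium"


def output_cost(workload, route) -> float:
    """Output-token cost of a workload given as (task_type, output_tokens)."""
    return sum(tokens / 1_000_000 * OUTPUT_PRICE[route(task)]
               for task, tokens in workload)


# Hypothetical monthly mix: 1 billion output tokens in total.
monthly_workload = [
    ("summarization", 400_000_000),
    ("extraction", 400_000_000),
    ("multi_step_reasoning", 200_000_000),
]

all_premium = output_cost(monthly_workload, lambda task: "premium")
routed = output_cost(monthly_workload, pick_tier)

print(f"all premium: ${all_premium:,.0f}, routed: ${routed:,.0f}")
```

With this mix, routing cuts output spend from $15,000 to $3,480 a month. The premium tier still accounts for most of what remains, which is where the next round of scrutiny belongs.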
Instrument agent loops for cost before deploying to production. Before any AI agent touches production traffic, you should know its average token consumption per task, its maximum token consumption per task, and its failure-retry cost profile. If you don't have this data from staging, you're flying blind in production.
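Those three numbers can be computed from staging traces with very little code. The sketch below assumes each staged task logs its first-attempt tokens and its retry tokens separately; the trace records and the cost_profile helper are hypothetical.

```python
from statistics import mean

# Hypothetical staging traces, one record per task.
staging_runs = [
    {"task_id": "t1", "first_attempt_tokens": 40_000, "retry_tokens": 0},
    {"task_id": "t2", "first_attempt_tokens": 55_000, "retry_tokens": 20_000},
    {"task_id": "t3", "first_attempt_tokens": 45_000, "retry_tokens": 0},
    {"task_id": "t4", "first_attempt_tokens": 60_000, "retry_tokens": 180_000},
]


def cost_profile(runs) -> dict:
    """Average, worst case, and retry overhead across staged tasks."""
    totals = [r["first_attempt_tokens"] + r["retry_tokens"] for r in runs]
    retry_total = sum(r["retry_tokens"] for r in runs)
    return {
        "avg_tokens_per_task": mean(totals),
        "max_tokens_per_task": max(totals),
        "retry_share": retry_total / sum(totals),
        "tasks_with_retries": sum(1 for r in runs if r["retry_tokens"] > 0),
    }


profile = cost_profile(staging_runs)
print(profile)
```

In this made-up sample, half of all tokens go to retries and a single task accounts for most of them. That gap between the average and the maximum is what a per-run cap should be sized against.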
Set context window discipline. The temptation with million-token context windows is to feed models everything. Resist it. Every token in context is a token you're paying for. Retrieval-augmented generation (RAG) patterns that pull only relevant context chunks are far more cost-efficient than brute-force large-context approaches at scale.
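Context discipline can be enforced mechanically with a token budget on retrieved chunks. The sketch below assumes relevance scores come from whatever retriever is already in use, and it estimates tokens with the four-characters-per-token rule of thumb rather than a real tokenizer. The estimate_tokens and pack_context names are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate using the four-characters-per-token rule of thumb."""
    return max(1, len(text) // 4)


def pack_context(scored_chunks, max_tokens: int):
    """Keep the highest-scoring chunks that fit within max_tokens.

    scored_chunks: list of (relevance_score, text) pairs.
    Returns the selected texts and the estimated tokens they use.
    """
    selected, used = [], 0
    ranked = sorted(scored_chunks, key=lambda chunk: chunk[0], reverse=True)
    for _score, text in ranked:
        cost = estimate_tokens(text)
        if used + cost <= max_tokens:
            selected.append(text)
            used += cost
    return selected, used


# Placeholder chunks; the repeated letters only stand in for real text.
chunks = [
    (0.91, "A" * 2_000),    # about 500 tokens, highly relevant
    (0.15, "B" * 40_000),   # about 10,000 tokens, barely relevant
    (0.78, "C" * 1_200),    # about 300 tokens
    (0.40, "D" * 4_000),    # about 1,000 tokens
]

context, tokens_used = pack_context(chunks, max_tokens=2_000)
print(len(context), tokens_used)
```

Here the 10,000-token, low-relevance chunk never reaches the model, and the call carries 1,800 tokens of context instead of 11,800. For billing-grade numbers, swap the estimate for the provider's own tokenizer.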
In peer conversations with engineering leaders, the ones who've gotten AI cost under control share one thing in common: they treat token budget as a first-class engineering constraint, the same way they treat latency or reliability. It's not an afterthought — it's designed in from the start.
The Market Is Moving Fast
Token prices have been falling — Storment acknowledged that "since 2023, token prices have fallen dramatically" — but that trend is decelerating. GPU scarcity and the genuine value of frontier AI models are keeping prices from collapsing to near-zero.
More importantly, falling per-token prices have been more than offset by exploding consumption volumes. The net effect for enterprise AI budgets is more spend, not less, even as unit costs decline.
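The arithmetic behind that is simple: spend is unit price times volume, so a price cut only lowers the bill if consumption grows by less than the break-even factor. The figures below (a 60% price drop and a 10x rise in consumption) are illustrative assumptions, not data from the conference.

```python
def spend(price_per_million: float, tokens: int) -> float:
    """Total spend in USD for a given per-million price and token volume."""
    return price_per_million * tokens / 1_000_000


def breakeven_growth(price_drop: float) -> float:
    """Consumption growth factor at which a price drop is fully offset."""
    return 1 / (1 - price_drop)


# Assumed: price falls from $10 to $4 per million, usage grows from 2B to 20B.
last_year = spend(10.00, 2_000_000_000)
this_year = spend(4.00, 20_000_000_000)
change = this_year / last_year

print(f"spend grew {change:.1f}x; break-even growth for a 60% price cut is "
      f"{breakeven_growth(0.60):.1f}x")
```

In this example a 60% price cut is wiped out once usage grows 2.5x. Agentic workloads can pass that threshold quickly.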
The GitHub Copilot situation is instructive here. When Microsoft moved Copilot from flat-fee to token-based billing earlier this year, developer communities erupted. Not because tokens are inherently expensive, but because developers couldn't predict or control their consumption. Predictability is the core problem — and it's a governance problem, not a pricing problem.
The Bottom Line
Every enterprise leader needs to internalize one reality: AI is no longer a subscription. It's a utility, priced on consumption, with variable costs that scale with both usage patterns and model choices.
This is actually good news if you manage it correctly. Utility pricing creates accountability. It forces the question "what are we getting for this spend?" in a way that flat subscriptions never did. The companies that build FinOps discipline for AI now will have a significant competitive advantage over those that don't: in cost efficiency, in speed of iteration, and in the ability to scale AI investments without budget surprises.
The reckoning is here. The question is whether you're going to manage it or be managed by it.
Rajesh Beri is the founder of THE DAILY BRIEF and Head of AI Engineering at a Fortune 500 enterprise security company. He writes about enterprise AI strategy, implementation, and economics for technical and business leaders.
