A business user at one enterprise spent $32,000 in a single month on AI tokens. Their colleagues didn't know until the bill arrived. This is not a hypothetical — and it's about to become your problem too.
Gartner published a new prediction this week that should land in every CFO and CTO inbox by morning: within two years, the cost of AI token consumption for software developers will meet or exceed the average developer's monthly salary. Not in a decade. By 2028.
For most enterprises, this isn't a warning about the future. It's a description of what's already happening — just without anyone watching.
The Shift Nobody Budgeted For
For years, enterprise software costs were predictable. You paid a per-seat SaaS fee. As headcount grew, the bill scaled linearly. Finance could model it. Procurement could negotiate it. Budgets could absorb it.
AI has broken that model entirely.
Vendors are moving from flat per-seat licensing to consumption-based pricing: you pay for every token your developers send to, and receive from, AI tools. Context windows, completions, reasoning chains, agent loops, and code reviews all burn tokens. The more capable the model, the more it costs per token. And the more your team uses it, the more the costs compound.
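To make the pricing mechanics concrete, here is a minimal sketch of how a single request might be billed. The ModelPrice class and both price points are invented for illustration and do not reflect any vendor's rate card; the only thing taken from the text is that both input and output tokens are charged, and that more capable models cost more per token.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """Hypothetical price in USD per one million tokens."""
    input_per_million: float
    output_per_million: float


def request_cost(price: ModelPrice, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request: tokens sent and tokens received are both billed."""
    billed = (input_tokens * price.input_per_million
              + output_tokens * price.output_per_million)
    return billed / 1_000_000


# Illustrative prices only (assumptions, not a real rate card).
frontier = ModelPrice(input_per_million=10.0, output_per_million=30.0)
light = ModelPrice(input_per_million=0.5, output_per_million=1.5)

# One code-review call: 40,000 tokens of context in, 2,000 tokens out.
frontier_cost = request_cost(frontier, 40_000, 2_000)
light_cost = request_cost(light, 40_000, 2_000)
print(f"frontier: ${frontier_cost:.3f}  light: ${light_cost:.3f}")
```

Under these assumed prices the same call costs twenty times more on the larger model, and that gap is multiplied by every call a team makes in a month.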
Gartner senior principal analyst Nitish Tyagi put it plainly: "I have heard scary numbers like 'My developer consumed $20K last month,' or 'A business user consumed $32K'."
Those aren't edge cases. They're early signals of a systemic pattern that most enterprises haven't built the governance machinery to handle.
Why the Old Playbook Doesn't Work
The cost structure for AI workloads is fundamentally different from anything enterprise IT has managed before.
Traditional infrastructure costs — cloud compute, storage, bandwidth — are high but predictable. You can set budget alerts, right-size instances, and run utilization reports. The cost drivers are technical and visible.
Token costs are neither predictable nor visible to most organizations right now.
First, the billing logic is opaque. Different models charge at different rates. Context window usage isn't uniformly disclosed. Agentic workflows — where AI models call other tools, which call other AI models — can multiply token consumption in ways that aren't obvious from the user interface. A developer running a code review loop doesn't see the token meter ticking. Their manager doesn't either.
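A rough sketch shows why agent loops are so easy to underestimate. It assumes that each step of the loop resends the full, growing history as input, and it ignores prompt caching and any vendor-specific discounts; the function name and the token figures are hypothetical.

```python
def agent_loop_input_tokens(base_context: int, tokens_added_per_step: int, steps: int) -> int:
    """Total input tokens billed when every step resends the whole history."""
    total = 0
    context = base_context
    for _ in range(steps):
        total += context                  # the full context is billed as input again
        context += tokens_added_per_step  # tool output and model reply get appended
    return total


# Assumed figures: 8,000 tokens of starting context, 2,000 tokens added per step.
single_call = agent_loop_input_tokens(8_000, 2_000, steps=1)
ten_steps = agent_loop_input_tokens(8_000, 2_000, steps=10)

print(single_call)              # 8000
print(ten_steps)                # 170000
print(ten_steps / single_call)  # 21.25
```

Ten steps cost roughly 21 times the input of a single call here, not 10 times, because the history grows on every pass. None of that is visible in a chat window.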
Second, usage spreads beyond the teams it was budgeted for as adoption matures. Developers start. Then product managers, analysts, legal teams, and finance staff onboard. Each new user adds consumption that procurement didn't forecast. Enterprise AI licensing agreements negotiated for the dev team suddenly cover 10x the user base and 50x the usage.
Third, AI vendors haven't yet built mature cost optimization tools into their products. Tyagi noted that enterprises are moving from experimentation to scaled deployment, but vendors are still catching up on transparency and governance features. The tools to proactively manage costs simply don't exist in most platforms today.
The Productivity Paradox at the Center of This
Here is the finding that should most concern technical leaders: there is no direct relationship between how many tokens a developer consumes and how productive they become.
Gartner calls excessive token use "tokenmaxxing" — the habit of feeding AI tools as much context as possible on the assumption that more input equals better output. It doesn't. In fact, optimizing token consumption — being deliberate about what context gets passed to an AI — actually improves quality, not just cost.
"Tokenmaxxing is not directly related to higher productivity gains," Tyagi said. "But optimizing token consumption is."
This matters for how you measure ROI. If your engineering team is using 10x more tokens than last quarter and you're attributing that to higher productivity, you may be measuring the wrong thing. Token consumption is an input metric, not an output one. The output metrics that matter are feature velocity, defect rates, cycle time from spec to production, and customer satisfaction scores.
Gartner estimates that AI-assisted development can deliver productivity gains of up to 20%, but only when teams use AI deliberately and govern consumption rather than just feeding the machine more data.
What CFOs Need to Know Right Now
Token costs are an off-balance-sheet risk for most enterprises. The spend is happening in tools your finance team approved at one scale, now being used at a different order of magnitude. The consumption-based pricing model means there's no natural ceiling — unlike a SaaS seat count, which self-limits based on headcount.
The ROI narrative is fragile without cost governance. AWS CEO Matt Garman made headlines this week noting that 90% of CIOs he surveyed now have a path to positive AI ROI — a dramatic shift from even a year ago. But ROI projections built on productivity gains get eroded quickly when token costs grow unchecked. A 20% productivity improvement doesn't look compelling if it comes with a 40% increase in tooling costs.
Finance needs a new cost model for AI. The old SaaS cost model — seats multiplied by per-seat price — doesn't apply. What you need instead is a usage-by-team, usage-by-workflow, cost-per-output-unit model. That requires visibility into token consumption by user, by task type, and by business outcome.
For procurement teams: push AI vendors for committed pricing, volume discounts, and consumption caps. Many vendors will negotiate these terms now, before your consumption grows large enough to give you leverage.
What CTOs and VPs of Engineering Need to Do
The technical response to this challenge has four components: governance, routing, context discipline, and a use-case decision framework.
Build governance before you need it. Gartner recommends establishing token thresholds by role and use case, automating usage monitoring with alerting, and creating explicit escalation policies when thresholds are crossed. These controls need to be embedded in engineering workflows — not layered on after the fact. Developer choice alone cannot produce token discipline, Tyagi noted. "Costs can escalate faster than the productivity gains these tools are designed to deliver," he warned.
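As a minimal sketch of what role-based thresholds with alerting and escalation could look like, consider the check below. The roles, dollar budgets, and the 80% alert level are all placeholder assumptions; Gartner's recommendation does not specify numbers, and a real deployment would pull spend from whatever usage data your tooling exposes.

```python
from enum import Enum


class Action(Enum):
    OK = "ok"
    ALERT = "alert"        # notify the user and their manager
    ESCALATE = "escalate"  # trigger the explicit escalation policy


# Hypothetical monthly budgets in USD per role (assumed values).
MONTHLY_BUDGET_USD = {"developer": 500.0, "analyst": 150.0}
ALERT_FRACTION = 0.8  # assumed: warn at 80% of budget


def check_usage(role: str, spend_usd: float) -> Action:
    """Compare month-to-date spend with the role's budget."""
    budget = MONTHLY_BUDGET_USD[role]
    if spend_usd >= budget:
        return Action.ESCALATE
    if spend_usd >= ALERT_FRACTION * budget:
        return Action.ALERT
    return Action.OK


print(check_usage("developer", 120.0))  # Action.OK
print(check_usage("developer", 450.0))  # Action.ALERT
print(check_usage("analyst", 210.0))    # Action.ESCALATE
```

Running a check like this on a schedule, rather than waiting for the invoice, is what turns a threshold into a control.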
Route tasks to the right model. One of the biggest cost drivers is defaulting to frontier models for everything. A senior developer running a complex architecture review might legitimately need a high-reasoning frontier model. The same developer checking for syntax errors in a 50-line function does not. AWS has built this logic into Kiro, its agentic development environment — automatically routing tasks to lighter models for high-frequency work and reserving frontier models for genuinely complex requests. The principle applies regardless of vendor: build deliberate model routing into your AI workflows.
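The routing principle can be sketched in a few lines. This is not how Kiro or any particular product implements it; the task kinds and model names are hypothetical, and the rule is deliberately simple: default to the lighter model and send only explicitly listed task types to the frontier model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    kind: str          # e.g. "syntax_check" or "architecture_review"
    input_tokens: int  # carried along for logging; not used by the rule below


# Hypothetical model identifiers.
LIGHT_MODEL = "light-model"
FRONTIER_MODEL = "frontier-model"

# Assumed list of task kinds that justify a high-reasoning model.
COMPLEX_KINDS = {"architecture_review", "cross_service_refactor"}


def route(task: Task) -> str:
    """Send listed complex work to the frontier model, everything else to the light one."""
    if task.kind in COMPLEX_KINDS:
        return FRONTIER_MODEL
    return LIGHT_MODEL


print(route(Task("syntax_check", 600)))            # light-model
print(route(Task("architecture_review", 20_000)))  # frontier-model
```

Defaulting to the cheaper model means a new or unclassified task type has to earn its way onto the expensive list, which is the opposite of defaulting to frontier for everything.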
Treat context engineering as a core skill. Context engineering — the practice of providing AI tools only the information they need, summarized as tightly as possible — is the highest-leverage intervention available to engineering teams today. It reduces token costs, improves output quality, and increases speed. Gartner recommends mandating specific context engineering practices and including them in developer training programs. For engineers reading this: this is the skill that will differentiate your performance and your team's cost efficiency over the next 18 months.
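One simple form of context discipline is a hard token budget with relevance-ordered selection. The sketch below is an illustration under stated assumptions: the relevance scores are supplied by the caller, the four-characters-per-token estimate is a rough heuristic rather than a real tokenizer, and the file names are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snippet:
    name: str
    text: str
    relevance: float  # hypothetical score from 0 to 1, supplied by the caller


def estimate_tokens(text: str) -> int:
    """Rough heuristic of about four characters per token; real tokenizers differ."""
    return max(1, len(text) // 4)


def select_context(snippets: list[Snippet], token_budget: int) -> list[Snippet]:
    """Greedily keep the most relevant snippets that still fit in the budget."""
    chosen: list[Snippet] = []
    used = 0
    for snippet in sorted(snippets, key=lambda s: s.relevance, reverse=True):
        cost = estimate_tokens(snippet.text)
        if used + cost <= token_budget:
            chosen.append(snippet)
            used += cost
    return chosen


snippets = [
    Snippet("failing_test.py", "x" * 2_000, relevance=0.9),
    Snippet("module_under_test.py", "x" * 6_000, relevance=0.8),
    Snippet("entire_changelog.md", "x" * 40_000, relevance=0.1),
]
selected = select_context(snippets, token_budget=3_000)
print([s.name for s in selected])  # ['failing_test.py', 'module_under_test.py']
```

The changelog is the tokenmaxxing instinct in miniature: it accounts for most of the tokens on offer and very little of the relevance, so the budget drops it.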
Create a use-case decision framework. Not every task should be handled the same way. Classify work into three execution models: developer-led (AI as a research assistant), developer-with-agent (AI handles drafting, human reviews), and fully agent-led (AI executes end-to-end). The appropriate autonomy level determines the appropriate token budget. Match the model to the task; match the autonomy level to the risk and cost tolerance.
The Framework That Actually Works
Among peers who have gotten ahead of this issue, the approaches that work share a common structure:
Start with measurement. You cannot govern what you cannot see. Instrument your AI tooling to capture token usage by user, by model, by workflow, and by time period. Many platforms expose this data via API even if the native dashboard doesn't surface it clearly. Build a weekly cost report that goes to both engineering leadership and finance.
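As a sketch of the weekly report, assume usage records have already been exported into a list of dictionaries. The field names, users, and dollar amounts below are invented; how you obtain the records depends entirely on your platform.

```python
from collections import defaultdict
from datetime import date

# Hypothetical exported usage records (all values invented for illustration).
records = [
    {"day": date(2025, 3, 3), "user": "dev_a", "model": "frontier", "workflow": "code_review", "cost_usd": 42.0},
    {"day": date(2025, 3, 5), "user": "dev_b", "model": "light", "workflow": "code_review", "cost_usd": 8.5},
    {"day": date(2025, 3, 5), "user": "dev_a", "model": "light", "workflow": "docs", "cost_usd": 12.25},
    {"day": date(2025, 3, 10), "user": "dev_a", "model": "frontier", "workflow": "code_review", "cost_usd": 20.0},
]


def weekly_cost(rows: list[dict], dimension: str) -> dict:
    """Sum cost per (ISO week, dimension value), e.g. dimension='user' or 'workflow'."""
    totals: dict = defaultdict(float)
    for row in rows:
        iso = row["day"].isocalendar()
        week = f"{iso[0]}-W{iso[1]:02d}"
        totals[(week, row[dimension])] += row["cost_usd"]
    return dict(totals)


for (week, workflow), cost in sorted(weekly_cost(records, "workflow").items()):
    print(f"{week}  {workflow:<12} ${cost:,.2f}")
```

Calling the same function with "user" or "model" as the dimension gives the other cuts, so one small script can feed both the engineering and the finance view.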
Set a cost-per-outcome baseline. For your highest-value AI workflows — the ones where you have the clearest productivity evidence — calculate cost per unit of output. If you're using AI for code review, the metric might be cost per PR reviewed. If you're using AI for documentation, cost per page produced. This baseline lets you identify when a workflow's economics are drifting, and whether a cost increase is justified by output quality.
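The baseline itself is simple arithmetic. Here is a minimal sketch for the code-review case, with invented monthly figures and an assumed 20% tolerance before a workflow gets flagged for a closer look.

```python
def cost_per_unit(total_cost_usd: float, units: int) -> float:
    """Cost per unit of output, e.g. per PR reviewed or per page produced."""
    if units <= 0:
        raise ValueError("units must be positive")
    return total_cost_usd / units


def drift(baseline: float, current: float) -> float:
    """Relative change versus baseline; 0.25 means 25% more expensive per unit."""
    return (current - baseline) / baseline


# Hypothetical figures: last quarter's monthly average versus this month.
baseline = cost_per_unit(1_200.0, 400)  # $3.00 per PR reviewed
current = cost_per_unit(1_800.0, 450)   # $4.00 per PR reviewed

DRIFT_TOLERANCE = 0.20  # assumed threshold for triggering a review
change = drift(baseline, current)
needs_review = change > DRIFT_TOLERANCE
print(f"cost per PR: ${baseline:.2f} -> ${current:.2f} ({change:+.0%}), review: {needs_review}")
```

In this example total spend rose 50% while PR volume rose only 12.5%, so cost per PR drifted up by a third. Whether that is justified depends on what the extra tokens bought in review quality.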
Embed token reviews into sprint cycles. Gartner recommends reviewing high-consumption workflows at a regular cadence — identifying inefficiencies, refining context engineering practices, and sharing learnings across teams. Making this part of the sprint process, not a separate initiative, is what drives adoption.
Don't retreat from AI. The temptation when confronted with unexpected costs is to restrict usage. That's the wrong move. Gartner's advice: do not treat escalating costs as a reason to move away from AI, or to shift all workloads to open-source models. The productivity value is real. The goal is to capture it without letting costs outrun it.
The Bottom Line for Leaders
The era of flat-fee AI pricing is ending faster than most enterprises anticipated. Consumption-based models are here, they're scaling, and the governance frameworks haven't kept pace.
For business leaders: the AI ROI story your teams are telling you may not account for token costs growing alongside adoption. Get visibility into your current spend, model what it looks like at 3x current usage, and decide whether your current ROI projections still hold.
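One way to run that 3x check is a small scenario model. Every figure below is an invented placeholder, and the linear scaling of token cost with usage is an assumption; committed pricing or volume discounts would change the curve.

```python
def projected_net_value(value_usd: float, token_cost_usd: float,
                        usage_multiplier: float, value_multiplier: float) -> float:
    """Net monthly value after scaling usage, assuming token cost scales linearly with usage."""
    projected_cost = token_cost_usd * usage_multiplier
    projected_value = value_usd * value_multiplier
    return projected_value - projected_cost


# Hypothetical team today: $40,000 of estimated productivity value, $6,000 of token spend.
today = projected_net_value(40_000.0, 6_000.0, usage_multiplier=1.0, value_multiplier=1.0)

# Scenario A: usage triples but delivered value stays flat.
flat_value = projected_net_value(40_000.0, 6_000.0, usage_multiplier=3.0, value_multiplier=1.0)

# Scenario B: usage triples and delivered value grows by half.
some_growth = projected_net_value(40_000.0, 6_000.0, usage_multiplier=3.0, value_multiplier=1.5)

print(today, flat_value, some_growth)  # 34000.0 22000.0 42000.0
```

The point of the exercise is the value multiplier: if you cannot say what it is for your own teams, the ROI projection is resting on the assumption that more tokens mean more output.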
For technical leaders: context engineering is the highest-return investment your team can make right now. Better context means better output, lower costs, and faster delivery — simultaneously. That's rare in engineering, and it's the right place to focus.
The $32,000 monthly token bill isn't a horror story about AI going wrong. It's a signal that AI is going right — just without the governance layer in place to make the economics work. Build the governance. Keep the tools.
Gartner's full prediction on AI coding token costs can be found at gartner.com.
Follow Rajesh Beri on LinkedIn and X/Twitter for daily enterprise AI insights.