Uber's AI Budget Was Gone in 4 Months. Yours Could Be Too.

78% of IT leaders got surprise AI bills in 2026. Uber burned its entire budget in 4 months. Here's what tokenmaxxing is — and how to fix it before it hits you.

By Rajesh Beri·July 5, 2026·11 min read
THE DAILY BRIEF
Tags: Enterprise AI, AI Budgets, FinOps, Tokenmaxxing, Claude Enterprise

The quote that should be on every CFO's whiteboard right now came from a major enterprise CTO earlier this year: "I'm back to the drawing board, because the budget I thought I would need is blown away already."

This wasn't a startup that underestimated its runway. It was a tech giant with 5,000 engineers. The company had rolled out an AI coding assistant in December 2025. By April 2026 — four months later — the entire annual AI budget was consumed. Not over-budget by 10 or 20 percent. Gone. The kind of gone that triggers emergency finance reviews and board-level conversations about whether AI programs are governable at all.

This is the enterprise AI budget crisis of 2026, and if your organization has deployed agentic AI tools at any meaningful scale, there is a real chance your finance team is about to have the same conversation.

The Numbers Are Worse Than You Think

The incident above was not an isolated outlier. It was the most visible example of a pattern that has been quietly accumulating across enterprise balance sheets.

78% of IT leaders reported unexpected charges from consumption-based AI pricing models in 2026, according to Zylo's SaaS Management Index. A separate Axios investigation identified one unnamed enterprise that spent $500 million in a single month after deploying AI tools without usage caps or monitoring. Microsoft canceled internal AI coding tool licenses across a major division before its June 30 fiscal year close, citing the same dynamics.

The broader picture is equally sobering. A survey published earlier this year found that 79% of organizations overspent on AI, with mature FinOps teams experiencing an average cost overrun of 30.9%. And 90% of CIOs named AI cost forecasting as their top deployment challenge, according to Flexprice research — meaning the difficulty of predicting these costs is now the number one operational concern for technical leaders, ahead of security, talent, and integration complexity.

What changed? The models didn't suddenly get more expensive. The way enterprises use them did.

What Tokenmaxxing Actually Is

The term "tokenmaxxing" has entered the enterprise vocabulary in 2026 to describe a specific organizational failure mode: defaulting to the most powerful, most expensive AI model for every task regardless of whether that capability is required.

It sounds trivial. It is not. There is currently a 4,500x pricing spread between the cheapest and most expensive AI models available. A junior analyst doing basic document summarization who defaults to a frontier reasoning model for every conversation costs an organization orders of magnitude more than the same task assigned to a lightweight model. Multiply that across hundreds or thousands of employees running dozens of AI interactions per day, and the math compounds into genuine P&L exposure.

Tokenmaxxing has an organizational accelerant. Several companies — including a large tech firm that has since reversed course — ran internal AI adoption leaderboards that measured success by token consumption, treating high usage as a proxy for productivity. The result was an incentive structure that rewarded burning tokens rather than generating business value. When the quarterly spend reports arrived, the damage was done.

The problem is not that employees are lazy or irresponsible. The problem is that enterprises deployed powerful AI tools without the governance infrastructure to match task complexity to model capability. No policy. No defaults. No guardrails. No alerts. Just open access to whatever model the interface defaulted to.

Why Agentic AI Makes This Structurally Different

Before agentic AI, the token consumption math was predictable. An employee sent a prompt, the model responded, and the interaction cost a roughly calculable number of tokens. Budgets could be estimated from usage projections.

Agentic AI broke that model entirely.

When a developer uses an AI coding assistant on a large repository, they do not initiate one API call. The agent plans the task, retrieves relevant context from the codebase, calls tools to inspect files and run tests, verifies its outputs, retries steps that produce errors, and checks its work against the original intent — all before surfacing a result to the user. A single user-initiated task can generate 5 to 30 model calls, according to a March 2026 Gartner analysis. GitHub's May 2026 research found that agentic coding tasks can consume roughly 1,000 times more tokens than a standard single-turn query.

That multiplier detonates any AI budget built on chat-era assumptions. Most enterprise AI budgets for 2026 were set in the fall of 2025, before agentic coding became the default way engineers worked. Finance teams modeled costs based on individual queries. The actual usage pattern was autonomous agents running dozens of calls in the background while the engineer did something else entirely.
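The gap between a chat-era forecast and an agentic one can be sketched with simple arithmetic. The 5,000 engineers and the 5-to-30 calls per task come from the figures above; the tasks per day, tokens per call, price, and working days are illustrative assumptions, not measured data. The model is also linear, so it likely understates real agent loops, which tend to re-send a growing context on each call.

```python
from dataclasses import dataclass


@dataclass
class UsageAssumptions:
    engineers: int
    tasks_per_engineer_per_day: int   # hypothetical
    tokens_per_call: int              # hypothetical average, input + output
    price_per_million_usd: float      # hypothetical flat blended price
    working_days: int = 21            # hypothetical month length


def monthly_cost(a: UsageAssumptions, calls_per_task: int) -> float:
    """Monthly spend in USD if every task triggers `calls_per_task` model calls."""
    tokens = (
        a.engineers
        * a.tasks_per_engineer_per_day
        * a.working_days
        * calls_per_task
        * a.tokens_per_call
    )
    return tokens / 1_000_000 * a.price_per_million_usd


a = UsageAssumptions(
    engineers=5000,
    tasks_per_engineer_per_day=10,
    tokens_per_call=20_000,
    price_per_million_usd=3.0,
)

chat_era = monthly_cost(a, calls_per_task=1)       # what finance modeled
agentic_low = monthly_cost(a, calls_per_task=5)    # low end of the quoted range
agentic_high = monthly_cost(a, calls_per_task=30)  # high end of the quoted range

print(f"chat-era forecast: ${chat_era:,.0f}")
print(f"agentic range:     ${agentic_low:,.0f} to ${agentic_high:,.0f}")
```

Under these assumptions the only thing that changes between the forecast and the bill is the calls-per-task multiplier, which is exactly the variable a query-based budget never modeled.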

Goldman Sachs projects that token consumption will multiply 24-fold — reaching 120 quadrillion tokens per month — between 2026 and 2030. At enterprise pricing tiers, where leading AI models are billed per million tokens for both inputs and outputs, that trajectory carries serious financial implications for any organization that doesn't build governance infrastructure now.
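For a sense of what that projection implies year over year, here is a rough sketch. It uses only the two figures quoted above (24-fold growth and 120 quadrillion tokens per month) and assumes, purely for illustration, that growth is spread evenly over four yearly steps from 2026 to 2030.

```python
# Figures quoted in the text.
GROWTH_MULTIPLE = 24
END_TOKENS_QUADRILLION = 120  # tokens per month at the end of the period

# Assumption: four equal yearly steps (2026 -> 2030) at a constant growth rate.
YEARS = 4

start_tokens_quadrillion = END_TOKENS_QUADRILLION / GROWTH_MULTIPLE
annual_factor = GROWTH_MULTIPLE ** (1 / YEARS)


def projected_tokens(year_offset: int) -> float:
    """Monthly tokens (in quadrillions) `year_offset` years after 2026."""
    return start_tokens_quadrillion * annual_factor ** year_offset


for offset in range(YEARS + 1):
    print(2026 + offset, round(projected_tokens(offset), 1))
```

The implied starting point is about 5 quadrillion tokens per month, and the implied growth factor is roughly 2.2x per year, meaning consumption more than doubles annually if the projection holds.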

The FinOps Gap Nobody Planned For

Cloud computing spent a decade building the financial operations discipline now known as FinOps: the combination of tooling, practices, and organizational habits that lets enterprises understand, forecast, and optimize cloud spend. AWS Cost Explorer, Azure Cost Management, tagging strategies, reserved instance planning, idle resource alerts: these are the artifacts of hard lessons learned when cloud bills first started arriving.

AI spending is now in the same position cloud was in 2012. The tools that enterprises built to govern cloud costs don't map cleanly onto AI cost structures. AI billing happens at the token level, varies dramatically by model tier, and scales non-linearly with agentic task complexity rather than linearly with compute instance hours. The mental models don't transfer.

Most enterprises have no idea which department is spending the most on AI. They don't know which model tier their employees are defaulting to. They don't know whether the finance team's Claude usage is running on a $3-per-million-token model when a $0.25-per-million-token model would handle their use cases identically. And they find out the bill has exceeded projections at the end of the month, when there is nothing to be done except pay it.
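To make the tier gap concrete, here is a minimal sketch comparing the two per-million-token prices quoted above. The monthly token volume is an invented figure for one department, and a single flat price per tier is a simplification, since real price lists usually bill input and output tokens at different rates.

```python
# Prices are the two figures quoted in the text; tier names are illustrative.
PRICE_PER_MILLION_USD = {
    "mid_tier": 3.00,
    "lightweight": 0.25,
}


def monthly_cost(tokens: int, price_per_million: float) -> float:
    """Cost in USD for `tokens` tokens at a flat per-million-token price."""
    return tokens / 1_000_000 * price_per_million


# Hypothetical volume: 2 billion tokens per month for one department.
assumed_tokens = 2_000_000_000

costs = {
    tier: monthly_cost(assumed_tokens, price)
    for tier, price in PRICE_PER_MILLION_USD.items()
}
ratio = costs["mid_tier"] / costs["lightweight"]

for tier, cost in costs.items():
    print(f"{tier}: ${cost:,.2f} per month")
print(f"ratio: {ratio:.0f}x")
```

At these prices the same workload costs 12 times more on the mid-tier model, regardless of the volume assumed.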

That gap — the absence of visibility, controls, and governance — is what Anthropic announced it was addressing on July 2, 2026.

Anthropic's Response: Claude Enterprise Gets Spend Controls

On July 2, Anthropic shipped a significant update to Claude Enterprise: model-level entitlements, a richer analytics dashboard, and configurable spend-threshold alerts. The release is available now for all Claude Enterprise customers through the admin console.

The timing is deliberate. Anthropic's own enterprise customers have been living through the billing crisis described above. The company has watched organizations burn budgets they didn't know were at risk, and the July 2 release is a direct response to what enterprise admins have been asking for.

This is not a cosmetic update. The three components work together to give IT and finance teams meaningful governance over AI spend for the first time.

Model-Level Entitlements: Matching Capability to Role

The most structurally important addition is model-level entitlements. Administrators can now set which Claude model starts a conversation by default — across chat, Cowork, and Claude Code — and can restrict which models specific user groups can access at all.

The practical implication: an engineering team working on complex architecture problems can be given access to full frontier models. A sales team using Claude for email drafting and CRM summarization can be restricted to a mid-tier model that handles those tasks at a fraction of the cost. An operations team doing basic data extraction can be assigned to a lightweight model. The org chart becomes the policy layer for AI cost governance.

Critically, this integrates with SCIM, the System for Cross-domain Identity Management protocol that enterprises already use to sync users and groups from identity providers like Okta and Microsoft Entra ID (formerly Azure Active Directory) into SaaS tools. Organizations don't need to build a separate access hierarchy for Claude model governance. The IT team's existing group definitions become the model entitlement definitions automatically. For regulated industries where specific AI models must be used for specific data categories (financial services, healthcare, government contracting), this gives compliance teams an enforceable policy mechanism rather than a policy document that employees can bypass.
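The idea of group-based entitlements can be sketched as a small policy lookup. This is not Anthropic's API or configuration format: the tier names, group names, `GROUP_POLICY` table, and `resolve_model` function are all hypothetical, and clamping an over-tier request down to the ceiling (rather than rejecting it) is just one possible design choice.

```python
from typing import Iterable, Optional

# Hypothetical tiers, ordered from cheapest to most capable.
TIERS = ["lightweight", "mid", "frontier"]

# Hypothetical policy keyed by identity-provider group name.
GROUP_POLICY = {
    "engineering": {"default": "mid", "max": "frontier"},
    "sales": {"default": "mid", "max": "mid"},
    "operations": {"default": "lightweight", "max": "lightweight"},
}

# Users with no recognized group get the cheapest tier.
FALLBACK = {"default": "lightweight", "max": "lightweight"}


def resolve_model(groups: Iterable[str], requested: Optional[str] = None) -> str:
    """Return the tier a user gets, given their groups and an optional request."""
    policies = [GROUP_POLICY[g] for g in groups if g in GROUP_POLICY] or [FALLBACK]
    rank = TIERS.index
    # With several groups, the most permissive ceiling and default apply.
    ceiling = max((p["max"] for p in policies), key=rank)
    default = max((p["default"] for p in policies), key=rank)
    if requested is None:
        return default
    if requested not in TIERS:
        raise ValueError(f"unknown tier: {requested}")
    # Clamp requests above the ceiling instead of failing the conversation.
    return requested if rank(requested) <= rank(ceiling) else ceiling


print(resolve_model(["sales"], "frontier"))        # clamped to the sales ceiling
print(resolve_model(["engineering"], "frontier"))  # allowed
print(resolve_model(["operations"]))               # group default
```

Because the table is keyed by group name, the same group definitions synced from the identity provider can drive both access and defaults without a second hierarchy.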

Spend Alerts: Know Before the Bill Arrives

The second component is configurable spend-threshold alerts. Administrators can set spending limits that trigger notifications before budgets are exhausted — at the organizational level, the group level, or per individual user. An IT leader can configure an alert when a department's AI spend reaches 70% of its monthly allocation, giving finance teams time to investigate and adjust before crossing into overrun territory.

This is the capability that would have prevented the incidents described above. Not after the fact, not in the monthly bill review, but in real time as consumption approaches the limit.
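The threshold logic itself is simple, which is part of the argument for having it everywhere. Below is a minimal sketch of a budget check, using the 70% example above plus two extra thresholds added for illustration. The function name, the threshold set, and the `already_sent` bookkeeping are assumptions, not a description of how Claude Enterprise implements its alerts.

```python
from typing import Iterable, List


def crossed_thresholds(
    spend_usd: float,
    budget_usd: float,
    thresholds: Iterable[float] = (0.7, 0.9, 1.0),
    already_sent: Iterable[float] = (),
) -> List[float]:
    """Return thresholds newly crossed by current spend, lowest first."""
    if budget_usd <= 0:
        raise ValueError("budget must be positive")
    used = spend_usd / budget_usd
    sent = set(already_sent)
    # Skip thresholds that were already notified so alerts fire only once.
    return [t for t in sorted(thresholds) if used >= t and t not in sent]


# Hypothetical department with a $10,000 monthly allocation.
print(crossed_thresholds(5_000, 10_000))                      # nothing crossed yet
print(crossed_thresholds(7_000, 10_000))                      # 70% alert fires
print(crossed_thresholds(9_500, 10_000, already_sent=[0.7]))  # only the 90% alert
```

Running a check like this per organization, per group, and per user mirrors the three alert scopes described above.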

Analytics Dashboard: Visibility That Actually Helps

The upgraded analytics dashboard surfaces cost and usage broken down by group and by individual user, with output metrics displayed alongside their token cost. Admins can see not just what was spent, but what was produced — artifacts created, files edited, skills invoked — which gives some ability to assess whether the spend generated proportionate output.

The dashboard filters use the same SCIM group definitions as the entitlement controls, so the spend view aligns with organizational structure rather than requiring a separate classification exercise.
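For organizations that need the same view across tools that lack such a dashboard, the underlying rollup is a group-level aggregation of cost against an output metric. The sketch below assumes a hypothetical export of usage records; the field names and figures are invented, and "files edited" stands in for whichever output metric is available.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical usage export: one record per user per period.
records = [
    {"group": "engineering", "user": "u1", "cost_usd": 120.0, "files_edited": 40},
    {"group": "engineering", "user": "u2", "cost_usd": 80.0, "files_edited": 10},
    {"group": "sales", "user": "u3", "cost_usd": 30.0, "files_edited": 0},
]


def cost_by_group(rows: List[dict]) -> Dict[str, dict]:
    """Sum cost and output per group and derive cost per unit of output."""
    totals: Dict[str, dict] = defaultdict(lambda: {"cost_usd": 0.0, "files_edited": 0})
    for row in rows:
        totals[row["group"]]["cost_usd"] += row["cost_usd"]
        totals[row["group"]]["files_edited"] += row["files_edited"]
    summary = {}
    for group, t in totals.items():
        files = t["files_edited"]
        # Leave the ratio undefined when a group produced no measured output.
        per_file = t["cost_usd"] / files if files else None
        summary[group] = {**t, "cost_per_file_usd": per_file}
    return summary


summary = cost_by_group(records)
for group, stats in summary.items():
    print(group, stats)
```

A cost-per-output ratio is a crude signal and should be read alongside the kind of work each group does, but it is enough to flag groups whose spend is growing without matching output.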

What Technical Leaders Need to Do Now

Anthropic's release is a useful governance layer for Claude Enterprise deployments, but it doesn't solve the broader problem for organizations running multiple AI tools across multiple vendors. The FinOps discipline for AI has to be built at the organizational level, not vendor by vendor.

For CIOs and VP Engineering: Audit your current AI tool deployments for default model settings. Most tools default to the most capable — and most expensive — model because that's what produces the most impressive demos. Establishing a tiered model policy that matches model capability to task complexity is the single highest-ROI governance action available right now. The 4,500x pricing spread means even a partial shift toward appropriate-tier usage produces significant savings.

For CFOs: Build AI cost forecasting into your quarterly budget process as a first-class item, not a line in the software budget. The variables that drive AI spend — model tier, agentic task complexity, team size, and usage intensity — are different enough from traditional software costs that they require their own planning framework. Require monthly spend reporting with department-level breakdowns, and establish approval thresholds for new AI tool deployments before, not after, the first bill arrives.

For compliance and legal teams: Model-level entitlements matter beyond cost. If your organization handles regulated data — PHI, PII, financial information subject to audit — you need a mechanism to ensure sensitive workloads run only on models that have cleared your internal security review. The Claude Enterprise controls give you that mechanism for Claude deployments; ensure you have equivalent controls for other tools in your stack.

For all leaders: The Uber story is not a cautionary tale about AI adoption — it's a cautionary tale about deploying powerful tools without governance infrastructure. The lesson is not to slow down AI adoption. It's to build the management layer at the same time as the deployment layer, not six months later when the CFO is asking questions.

The Structural Reality

What enterprise leaders are confronting in 2026 is a structural shift, not a temporary anomaly. Agentic AI tasks are inherently more token-intensive than the query-and-response patterns that preceded them. Token consumption will continue to grow as agentic workflows become the default rather than the exception. The cost curves will only become more complex as organizations layer multiple AI tools, models, and agents across their workflows.

The organizations that build FinOps discipline for AI now — visibility, model governance, spend alerts, department-level accountability — will have a structural advantage when the next wave of agentic deployments arrives. The organizations that don't will keep having the conversation Uber's finance team had this spring: staring at a bill they didn't expect, trying to explain to the board why the budget they set in October was gone by April.

The tools exist. The governance frameworks are beginning to mature. The question is whether your organization will build them proactively or reactively.

Based on the 2026 data, with 78% of IT leaders reporting surprise AI charges, most enterprises are choosing reactively. That's still a choice, just not a good one.


Rajesh Beri is the founder of THE D*AI*LY BRIEF, a newsletter focused on Enterprise AI for technical and business leaders. He writes from experience building and deploying AI systems at enterprise scale. Follow on Twitter/X or LinkedIn.

THE DAILY BRIEF

Enterprise AI insights for technology and business leaders, twice weekly.

beri.net

Subscribe at beri.net/subscribe for twice-weekly AI insights delivered to your inbox.

LinkedIn: linkedin.com/in/rberi  |  X: x.com/rajeshberi

© 2026 Rajesh Beri. All rights reserved.

Uber's AI Budget Was Gone in 4 Months. Yours Could Be Too.

Photo by Brett Sayles on Pexels

The quote that should be on every CFO's whiteboard right now came from a major enterprise CTO earlier this year: "I'm back to the drawing board, because the budget I thought I would need is blown away already."

This wasn't a startup that underestimated its runway. It was a tech giant with 5,000 engineers. The company had rolled out an AI coding assistant in December 2025. By April 2026 — four months later — the entire annual AI budget was consumed. Not over-budget by 10 or 20 percent. Gone. The kind of gone that triggers emergency finance reviews and board-level conversations about whether AI programs are governable at all.

This is the enterprise AI budget crisis of 2026, and if your organization has deployed agentic AI tools at any meaningful scale, there is a real chance your finance team is about to have the same conversation.

The Numbers Are Worse Than You Think

The incident above was not an isolated outlier. It was the most visible example of a pattern that has been quietly accumulating across enterprise balance sheets.

78% of IT leaders reported unexpected charges from consumption-based AI pricing models in 2026, according to Zylo's SaaS Management Index. A separate Axios investigation identified one unnamed enterprise that spent $500 million in a single month after deploying AI tools without usage caps or monitoring. Microsoft canceled internal AI coding tool licenses across a major division before its June 30 fiscal year close, citing the same dynamics.

The broader picture is equally sobering. A survey published earlier this year found that 79% of organizations overspent on AI, with mature FinOps teams experiencing an average cost overrun of 30.9%. And 90% of CIOs named AI cost forecasting as their top deployment challenge, according to Flexprice research — meaning the difficulty of predicting these costs is now the number one operational concern for technical leaders, ahead of security, talent, and integration complexity.

What changed? The models didn't suddenly get more expensive. The way enterprises use them did.

What Tokenmaxxing Actually Is

The term "tokenmaxxing" has entered the enterprise vocabulary in 2026 to describe a specific organizational failure mode: defaulting to the most powerful, most expensive AI model for every task regardless of whether that capability is required.

It sounds trivial. It is not. There is currently a 4,500x pricing spread between the cheapest and most expensive AI models available. A junior analyst doing basic document summarization who defaults to a frontier reasoning model for every conversation costs an organization orders of magnitude more than the same task assigned to a lightweight model. Multiply that across hundreds or thousands of employees running dozens of AI interactions per day, and the math compounds into genuine P&L exposure.

Tokenmaxxing has an organizational accelerant. Several companies — including a large tech firm that has since reversed course — ran internal AI adoption leaderboards that measured success by token consumption, treating high usage as a proxy for productivity. The result was an incentive structure that rewarded burning tokens rather than generating business value. When the quarterly spend reports arrived, the damage was done.

The problem is not that employees are lazy or irresponsible. The problem is that enterprises deployed powerful AI tools without the governance infrastructure to match task complexity to model capability. No policy. No defaults. No guardrails. No alerts. Just open access to whatever model the interface defaulted to.

Why Agentic AI Makes This Structurally Different

Before agentic AI, the token consumption math was predictable. An employee sends a prompt, the model responds, the interaction costs a roughly calculable number of tokens. Budgets could be estimated from usage projections.

Agentic AI broke that model entirely.

When a developer uses an AI coding assistant on a large repository, they do not initiate one API call. The agent plans the task, retrieves relevant context from the codebase, calls tools to inspect files and run tests, verifies its outputs, retries steps that produce errors, and checks its work against the original intent — all before surfacing a result to the user. A single user-initiated task can generate 5 to 30 model calls, according to a March 2026 Gartner analysis. GitHub's May 2026 research found that agentic coding tasks can consume roughly 1,000 times more tokens than a standard single-turn query.

That multiplier detonates any AI budget built on chat-era assumptions. Most enterprise AI budgets for 2026 were set in the fall of 2025, before agentic coding became the default way engineers worked. Finance teams modeled costs based on individual queries. The actual usage pattern was autonomous agents running dozens of calls in the background while the engineer did something else entirely.

Goldman Sachs projects that token consumption will multiply 24-fold — reaching 120 quadrillion tokens per month — between 2026 and 2030. At enterprise pricing tiers, where leading AI models are billed per million tokens for both inputs and outputs, that trajectory carries serious financial implications for any organization that doesn't build governance infrastructure now.

The FinOps Gap Nobody Planned For

cloud computing spent a decade building the financial operations discipline now known as FinOps — the combination of tooling, practices, and organizational habits that lets enterprises understand, forecast, and optimize cloud spend. AWS Cost Explorer, Azure Cost Management, tagging strategies, reserved instance planning, idle resource alerts: these are the artifacts of hard lessons learned when cloud bills first started arriving.

AI spending is now in the same position cloud was in 2012. The tools that enterprises built to govern cloud costs don't map cleanly onto AI cost structures. AI billing happens at the token level, varies dramatically by model tier, and scales non-linearly with agentic task complexity rather than linearly with compute instance hours. The mental models don't transfer.

Most enterprises have no idea which department is spending the most on AI. They don't know which model tier their employees are defaulting to. They don't know whether the finance team's Claude usage is running on a $3-per-million-token model when a $0.25-per-million-token model would handle their use cases identically. And they find out the bill has exceeded projections at the end of the month, when there is nothing to be done except pay it.

That gap — the absence of visibility, controls, and governance — is what Anthropic announced it was addressing on July 2, 2026.

Anthropic's Response: Claude Enterprise Gets Spend Controls

On July 2, Anthropic shipped a significant update to Claude Enterprise: model-level entitlements, a richer analytics dashboard, and configurable spend-threshold alerts. The release is available now for all Claude Enterprise customers through the admin console.

The timing is deliberate. Anthropic's own enterprise customers have been living through the billing crisis described above. The company has watched organizations burn budgets they didn't know were at risk, and the July 2 release is a direct response to what enterprise admins have been asking for.

This is not a cosmetic update. The three components work together to give IT and finance teams meaningful governance over AI spend for the first time.

Model-Level Entitlements: Matching Capability to Role

The most structurally important addition is model-level entitlements. Administrators can now set which Claude model starts a conversation by default — across chat, Cowork, and Claude Code — and can restrict which models specific user groups can access at all.

The practical implication: an engineering team working on complex architecture problems can be given access to full frontier models. A sales team using Claude for email drafting and CRM summarization can be restricted to a mid-tier model that handles those tasks at a fraction of the cost. An operations team doing basic data extraction can be assigned to a lightweight model. The org chart becomes the policy layer for AI cost governance.

Critically, this integrates with SCIM — the System for Cross-domain Identity Management protocol that enterprises already use to sync users and groups from identity providers like Okta and Azure Active Directory into SaaS tools. Organizations don't need to build a separate access hierarchy for Claude model governance. The IT team's existing group definitions become the model entitlement definitions automatically. For regulated industries where specific AI models must be used for specific data categories — financial services, healthcare, government contracting — this gives compliance teams an enforceable policy mechanism rather than a policy document that employees can bypass.

Spend Alerts: Know Before the Bill Arrives

The second component is configurable spend-threshold alerts. Administrators can set spending limits that trigger notifications before budgets are exhausted — at the organizational level, the group level, or per individual user. An IT leader can configure an alert when a department's AI spend reaches 70% of its monthly allocation, giving finance teams time to investigate and adjust before crossing into overrun territory.

This is the capability that would have prevented the incidents described above. Not after the fact, not in the monthly bill review, but in real time as consumption approaches the limit.

Analytics Dashboard: Visibility That Actually Helps

The upgraded analytics dashboard surfaces cost and usage broken down by group and by individual user, with output metrics displayed alongside their token cost. Admins can see not just what was spent, but what was produced — artifacts created, files edited, skills invoked — which gives some ability to assess whether the spend generated proportionate output.

The dashboard filters use the same SCIM group definitions as the entitlement controls, so the spend view aligns with organizational structure rather than requiring a separate classification exercise.

What Technical Leaders Need to Do Now

Anthropic's release is a useful governance layer for Claude Enterprise deployments, but it doesn't solve the broader problem for organizations running multiple AI tools across multiple vendors. The FinOps discipline for AI has to be built at the organizational level, not vendor by vendor.

For CIOs and VP Engineering: Audit your current AI tool deployments for default model settings. Most tools default to the most capable — and most expensive — model because that's what produces the most impressive demos. Establishing a tiered model policy that matches model capability to task complexity is the single highest-ROI governance action available right now. The 4,500x pricing spread means even a partial shift toward appropriate-tier usage produces significant savings.

For CFOs: Build AI cost forecasting into your quarterly budget process as a first-class item, not a line in the software budget. The variables that drive AI spend — model tier, agentic task complexity, team size, and usage intensity — are different enough from traditional software costs that they require their own planning framework. Require monthly spend reporting with department-level breakdowns, and establish approval thresholds for new AI tool deployments before, not after, the first bill arrives.

For compliance and legal teams: Model-level entitlements matter beyond cost. If your organization handles regulated data — PHI, PII, financial information subject to audit — you need a mechanism to ensure sensitive workloads run only on models that have cleared your internal security review. The Claude Enterprise controls give you that mechanism for Claude deployments; ensure you have equivalent controls for other tools in your stack.

For all leaders: The Uber story is not a cautionary tale about AI adoption — it's a cautionary tale about deploying powerful tools without governance infrastructure. The lesson is not to slow down AI adoption. It's to build the management layer at the same time as the deployment layer, not six months later when the CFO is asking questions.

The Structural Reality

What enterprise leaders are confronting in 2026 is a structural shift, not a temporary anomaly. Agentic AI tasks are inherently more token-intensive than the query-and-response patterns that preceded them. Token consumption will continue to grow as agentic workflows become the default rather than the exception. The cost curves will only become more complex as organizations layer multiple AI tools, models, and agents across their workflows.

The organizations that build FinOps discipline for AI now — visibility, model governance, spend alerts, department-level accountability — will have a structural advantage when the next wave of agentic deployments arrives. The organizations that don't will keep having the conversation Uber's finance team had this spring: staring at a bill they didn't expect, trying to explain to the board why the budget they set in October was gone by April.

The tools exist. The governance frameworks are beginning to mature. The question is whether your organization will build them proactively or reactively.

Based on the 2026 data, 78% of enterprises are choosing reactively. That's still a choice — just not a good one.


Rajesh Beri is the founder of THE D*AI*LY BRIEF, a newsletter focused on Enterprise AI for technical and business leaders. He writes from experience building and deploying AI systems at enterprise scale. Follow on Twitter/X or LinkedIn.

Share:
THE DAILY BRIEF
Enterprise AIAI BudgetsFinOpsTokenmaxxingClaude Enterprise
Uber's AI Budget Was Gone in 4 Months. Yours Could Be Too.

78% of IT leaders got surprise AI bills in 2026. Uber burned its entire budget in 4 months. Here's what tokenmaxxing is — and how to fix it before it hits you.

By Rajesh Beri·July 5, 2026·11 min read

The quote that should be on every CFO's whiteboard right now came from a major enterprise CTO earlier this year: "I'm back to the drawing board, because the budget I thought I would need is blown away already."

This wasn't a startup that underestimated its runway. It was a tech giant with 5,000 engineers. The company had rolled out an AI coding assistant in December 2025. By April 2026 — four months later — the entire annual AI budget was consumed. Not over-budget by 10 or 20 percent. Gone. The kind of gone that triggers emergency finance reviews and board-level conversations about whether AI programs are governable at all.

This is the enterprise AI budget crisis of 2026, and if your organization has deployed agentic AI tools at any meaningful scale, there is a real chance your finance team is about to have the same conversation.

The Numbers Are Worse Than You Think

The incident above was not an isolated outlier. It was the most visible example of a pattern that has been quietly accumulating across enterprise balance sheets.

78% of IT leaders reported unexpected charges from consumption-based AI pricing models in 2026, according to Zylo's SaaS Management Index. A separate Axios investigation identified one unnamed enterprise that spent $500 million in a single month after deploying AI tools without usage caps or monitoring. Microsoft canceled internal AI coding tool licenses across a major division before its June 30 fiscal year close, citing the same dynamics.

The broader picture is equally sobering. A survey published earlier this year found that 79% of organizations overspent on AI, with mature FinOps teams experiencing an average cost overrun of 30.9%. And 90% of CIOs named AI cost forecasting as their top deployment challenge, according to Flexprice research — meaning the difficulty of predicting these costs is now the number one operational concern for technical leaders, ahead of security, talent, and integration complexity.

What changed? The models didn't suddenly get more expensive. The way enterprises use them did.

What Tokenmaxxing Actually Is

The term "tokenmaxxing" has entered the enterprise vocabulary in 2026 to describe a specific organizational failure mode: defaulting to the most powerful, most expensive AI model for every task regardless of whether that capability is required.

It sounds trivial. It is not. There is currently a 4,500x pricing spread between the cheapest and most expensive AI models available. A junior analyst doing basic document summarization who defaults to a frontier reasoning model for every conversation costs an organization orders of magnitude more than the same task assigned to a lightweight model. Multiply that across hundreds or thousands of employees running dozens of AI interactions per day, and the math compounds into genuine P&L exposure.

Tokenmaxxing has an organizational accelerant. Several companies — including a large tech firm that has since reversed course — ran internal AI adoption leaderboards that measured success by token consumption, treating high usage as a proxy for productivity. The result was an incentive structure that rewarded burning tokens rather than generating business value. When the quarterly spend reports arrived, the damage was done.

The problem is not that employees are lazy or irresponsible. The problem is that enterprises deployed powerful AI tools without the governance infrastructure to match task complexity to model capability. No policy. No defaults. No guardrails. No alerts. Just open access to whatever model the interface defaulted to.

Why Agentic AI Makes This Structurally Different

Before agentic AI, the token consumption math was predictable. An employee sends a prompt, the model responds, the interaction costs a roughly calculable number of tokens. Budgets could be estimated from usage projections.

Agentic AI broke that model entirely.

When a developer uses an AI coding assistant on a large repository, they do not initiate one API call. The agent plans the task, retrieves relevant context from the codebase, calls tools to inspect files and run tests, verifies its outputs, retries steps that produce errors, and checks its work against the original intent — all before surfacing a result to the user. A single user-initiated task can generate 5 to 30 model calls, according to a March 2026 Gartner analysis. GitHub's May 2026 research found that agentic coding tasks can consume roughly 1,000 times more tokens than a standard single-turn query.

That multiplier detonates any AI budget built on chat-era assumptions. Most enterprise AI budgets for 2026 were set in the fall of 2025, before agentic coding became the default way engineers worked. Finance teams modeled costs based on individual queries. The actual usage pattern was autonomous agents running dozens of calls in the background while the engineer did something else entirely.
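One plausible reason the multiplier runs so far ahead of the call count is that an agent loop typically resends its accumulated context on each call. The sketch below assumes exactly that, in simplified form: every call reads the full context so far, and each step appends a fixed amount of new material. The token figures are hypothetical, and real tools differ (caching and context trimming both reduce the total).

```python
def agent_task_tokens(calls, base_context_tokens, tokens_added_per_call):
    """Total input tokens for one task when every call resends the
    accumulated context (an assumed, simplified model of an agent loop)."""
    total = 0
    context = base_context_tokens
    for _ in range(calls):
        total += context                    # this call reads the whole context
        context += tokens_added_per_call    # tool output and replies accumulate
    return total

BASE = 2_000    # assumed size of the initial prompt, in tokens
ADDED = 1_500   # assumed tokens appended to the context after each call

single_turn = agent_task_tokens(1, BASE, ADDED)
for calls in (5, 30):  # the 5-to-30 call range cited above
    total = agent_task_tokens(calls, BASE, ADDED)
    print(f"{calls:>2} calls: {total:>7,} tokens "
          f"({total / single_turn:.1f}x a single-turn query)")
```

With these assumed numbers, 5 calls cost 12.5 times a single-turn query and 30 calls cost roughly 356 times as much. The exact figures matter less than the shape: six times as many calls produces far more than six times the tokens, which is why a budget scaled linearly from chat usage misses.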

Goldman Sachs projects that token consumption will multiply 24-fold — reaching 120 quadrillion tokens per month — between 2026 and 2030. At enterprise pricing tiers, where leading AI models are billed per million tokens for both inputs and outputs, that trajectory carries serious financial implications for any organization that doesn't build governance infrastructure now.
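For planning purposes it helps to turn that projection into a yearly rate. The sketch below assumes growth is constant across four annual periods (2026 to 2030) and that the 24-fold multiple and the 120 quadrillion figure describe the same monthly measure; both are simplifying assumptions, not part of the projection itself.

```python
def implied_annual_multiplier(total_multiplier, years):
    """Constant yearly growth factor that compounds to total_multiplier."""
    return total_multiplier ** (1 / years)

TOTAL_GROWTH = 24               # 24-fold, from the projection above
YEARS = 2030 - 2026             # four compounding periods (assumed)
END_TOKENS_PER_MONTH = 120e15   # 120 quadrillion tokens per month

yearly = implied_annual_multiplier(TOTAL_GROWTH, YEARS)
start_tokens_per_month = END_TOKENS_PER_MONTH / TOTAL_GROWTH

print(f"implied growth: {yearly:.2f}x per year")
print(f"implied 2026 baseline: "
      f"{start_tokens_per_month / 1e15:.0f} quadrillion tokens per month")
```

Read this way, the projection implies token volume slightly more than doubling every year, from a baseline of about 5 quadrillion tokens per month. A budget that assumes flat usage is out of date within a quarter.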

The FinOps Gap Nobody Planned For

Cloud computing spent a decade building the financial operations discipline now known as FinOps, the combination of tooling, practices, and organizational habits that lets enterprises understand, forecast, and optimize cloud spend. AWS Cost Explorer, Azure Cost Management, tagging strategies, reserved instance planning, idle resource alerts: these are the artifacts of hard lessons learned when cloud bills first started arriving.

AI spending is now in the same position cloud was in 2012. The tools that enterprises built to govern cloud costs don't map cleanly onto AI cost structures. AI billing happens at the token level, varies dramatically by model tier, and scales non-linearly with agentic task complexity rather than linearly with compute instance hours. The mental models don't transfer.

Most enterprises have no idea which department is spending the most on AI. They don't know which model tier their employees are defaulting to. They don't know whether the finance team's Claude usage is running on a $3-per-million-token model when a $0.25-per-million-token model would handle their use cases identically. And they find out the bill has exceeded projections at the end of the month, when there is nothing to be done except pay it.

That gap — the absence of visibility, controls, and governance — is what Anthropic announced it was addressing on July 2, 2026.

Anthropic's Response: Claude Enterprise Gets Spend Controls

On July 2, Anthropic shipped a significant update to Claude Enterprise: model-level entitlements, a richer analytics dashboard, and configurable spend-threshold alerts. The release is available now for all Claude Enterprise customers through the admin console.

The timing is deliberate. Anthropic's own enterprise customers have been living through the billing crisis described above. The company has watched organizations burn budgets they didn't know were at risk, and the July 2 release is a direct response to what enterprise admins have been asking for.

This is not a cosmetic update. The three components work together to give IT and finance teams meaningful governance over AI spend for the first time.

Model-Level Entitlements: Matching Capability to Role

The most structurally important addition is model-level entitlements. Administrators can now set which Claude model starts a conversation by default — across chat, Cowork, and Claude Code — and can restrict which models specific user groups can access at all.

The practical implication: an engineering team working on complex architecture problems can be given access to full frontier models. A sales team using Claude for email drafting and CRM summarization can be restricted to a mid-tier model that handles those tasks at a fraction of the cost. An operations team doing basic data extraction can be assigned to a lightweight model. The org chart becomes the policy layer for AI cost governance.

Critically, this integrates with SCIM, the System for Cross-domain Identity Management protocol that enterprises already use to sync users and groups from identity providers like Okta and Microsoft Entra ID (formerly Azure Active Directory) into SaaS tools. Organizations don't need to build a separate access hierarchy for Claude model governance. The IT team's existing group definitions become the model entitlement definitions automatically. For regulated industries where specific AI models must be used for specific data categories (financial services, healthcare, government contracting), this gives compliance teams an enforceable policy mechanism rather than a policy document that employees can bypass.
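The idea of letting group membership decide model access can be sketched in a few lines. This is a hypothetical illustration of the policy logic only: the group names, tier labels, and functions are invented here and do not reflect how the Claude Enterprise admin console is actually configured.

```python
# Hypothetical policy table: group names and tier labels are invented for
# illustration and do not reflect Claude Enterprise's actual configuration.
TIER_RANK = {"lightweight": 0, "mid": 1, "frontier": 2}

GROUP_MAX_TIER = {
    "operations": "lightweight",
    "sales": "mid",
    "engineering": "frontier",
}
FALLBACK_TIER = "lightweight"  # users in no known group get the cheapest tier

def max_tier_for(user_groups):
    """Return the most capable tier any of the user's groups is entitled to."""
    tiers = [GROUP_MAX_TIER[g] for g in user_groups if g in GROUP_MAX_TIER]
    if not tiers:
        return FALLBACK_TIER
    return max(tiers, key=TIER_RANK.__getitem__)

def is_allowed(user_groups, requested_tier):
    """True if the requested tier is at or below the user's entitlement."""
    return TIER_RANK[requested_tier] <= TIER_RANK[max_tier_for(user_groups)]

print(max_tier_for(["sales"]))                 # mid
print(is_allowed(["operations"], "frontier"))  # False
```

Letting the most permissive group win for users in several groups is an assumed design choice. A stricter organization, especially one with regulated data, might pick the least permissive group instead.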

Spend Alerts: Know Before the Bill Arrives

The second component is configurable spend-threshold alerts. Administrators can set spending limits that trigger notifications before budgets are exhausted — at the organizational level, the group level, or per individual user. An IT leader can configure an alert when a department's AI spend reaches 70% of its monthly allocation, giving finance teams time to investigate and adjust before crossing into overrun territory.

This is the capability that could have flagged the incidents described above while there was still time to act: not after the fact, not in the monthly bill review, but in real time as consumption approaches the limit.
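The threshold logic itself is simple, and worth replicating for tools that lack it. Below is a minimal sketch, assuming a monthly budget per department and a straight-line projection from spend to date. The function names, thresholds beyond the 70% example, and dollar figures are all hypothetical and are not part of any vendor's API.

```python
def crossed_thresholds(spend, budget, thresholds=(0.70, 0.90, 1.00)):
    """Return the alert thresholds (as fractions of budget) already reached."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    used = spend / budget
    return [t for t in thresholds if used >= t]

def projected_month_end(spend, day_of_month, days_in_month):
    """Straight-line projection of month-end spend from the run rate so far."""
    return spend / day_of_month * days_in_month

# Hypothetical department: $40,000 monthly allocation, $29,000 spent by day 12.
budget, spend = 40_000, 29_000
print(crossed_thresholds(spend, budget))
print(round(projected_month_end(spend, 12, 30)))
```

In this example the department has just crossed the 70% line, and the run-rate projection puts month-end spend at about $72,500 against a $40,000 allocation. Pairing a threshold alert with a projection is a design choice: the threshold says where spend is, the projection says where it is heading.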

Analytics Dashboard: Visibility That Actually Helps

The upgraded analytics dashboard surfaces cost and usage broken down by group and by individual user, with output metrics displayed alongside their token cost. Admins can see not just what was spent, but what was produced — artifacts created, files edited, skills invoked — which gives some ability to assess whether the spend generated proportionate output.

The dashboard filters use the same SCIM group definitions as the entitlement controls, so the spend view aligns with organizational structure rather than requiring a separate classification exercise.
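Organizations that export usage data from several tools can build a rough version of the same view themselves. The sketch below assumes usage records with a group, a cost, and an output count; the records and field names are invented for illustration and do not mirror any real export format.

```python
from collections import defaultdict

# Hypothetical usage records; field names are invented for illustration.
records = [
    {"group": "engineering", "user": "ana",  "cost_usd": 420.0, "artifacts": 35},
    {"group": "engineering", "user": "raj",  "cost_usd": 180.0, "artifacts": 5},
    {"group": "sales",       "user": "lena", "cost_usd": 60.0,  "artifacts": 30},
]

def summarize_by_group(rows):
    """Total cost and artifacts per group, plus cost per artifact."""
    totals = defaultdict(lambda: {"cost_usd": 0.0, "artifacts": 0})
    for row in rows:
        totals[row["group"]]["cost_usd"] += row["cost_usd"]
        totals[row["group"]]["artifacts"] += row["artifacts"]
    summary = {}
    for group, t in totals.items():
        # Avoid dividing by zero for groups that spent but produced nothing.
        per_artifact = t["cost_usd"] / t["artifacts"] if t["artifacts"] else None
        summary[group] = {**t, "cost_per_artifact": per_artifact}
    return summary

for group, stats in summarize_by_group(records).items():
    print(group, stats)
```

Cost per artifact is a crude proxy, since one architecture document is not equivalent to one drafted email. Its value is in surfacing outliers worth a conversation, not in ranking teams.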

What Technical Leaders Need to Do Now

Anthropic's release is a useful governance layer for Claude Enterprise deployments, but it doesn't solve the broader problem for organizations running multiple AI tools across multiple vendors. The FinOps discipline for AI has to be built at the organizational level, not vendor by vendor.

For CIOs and VP Engineering: Audit your current AI tool deployments for default model settings. Most tools default to the most capable — and most expensive — model because that's what produces the most impressive demos. Establishing a tiered model policy that matches model capability to task complexity is the single highest-ROI governance action available right now. The 4,500x pricing spread means even a partial shift toward appropriate-tier usage produces significant savings.
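The effect of a partial shift is easy to estimate. This sketch reuses the $3 and $0.25 per-million-token prices from the example earlier in the article and assumes a single blended price per model, that the shifted tasks are handled equally well by the lighter model, and that token counts per task stay the same. All three are simplifications.

```python
def blended_price(share_shifted, premium_price, light_price):
    """Average price per million tokens after moving a share of token
    volume from the premium model to the lighter one."""
    if not 0.0 <= share_shifted <= 1.0:
        raise ValueError("share_shifted must be between 0 and 1")
    return (1 - share_shifted) * premium_price + share_shifted * light_price

PREMIUM = 3.00   # USD per million tokens, from the example earlier in the article
LIGHT = 0.25     # USD per million tokens, from the same example

for share in (0.25, 0.50, 0.75):
    price = blended_price(share, PREMIUM, LIGHT)
    saving = 1 - price / PREMIUM
    print(f"shift {share:.0%} of tokens: "
          f"${price:.4f} per million, {saving:.0%} saved")
```

Moving half of token volume to the lighter tier cuts the blended price from $3.00 to $1.625 per million tokens, a saving of roughly 46 percent, without touching the work that genuinely needs the premium model.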

For CFOs: Build AI cost forecasting into your quarterly budget process as a first-class item, not a line in the software budget. The variables that drive AI spend — model tier, agentic task complexity, team size, and usage intensity — are different enough from traditional software costs that they require their own planning framework. Require monthly spend reporting with department-level breakdowns, and establish approval thresholds for new AI tool deployments before, not after, the first bill arrives.

For compliance and legal teams: Model-level entitlements matter beyond cost. If your organization handles regulated data — PHI, PII, financial information subject to audit — you need a mechanism to ensure sensitive workloads run only on models that have cleared your internal security review. The Claude Enterprise controls give you that mechanism for Claude deployments; ensure you have equivalent controls for other tools in your stack.

For all leaders: The Uber story is not a cautionary tale about AI adoption — it's a cautionary tale about deploying powerful tools without governance infrastructure. The lesson is not to slow down AI adoption. It's to build the management layer at the same time as the deployment layer, not six months later when the CFO is asking questions.

The Structural Reality

What enterprise leaders are confronting in 2026 is a structural shift, not a temporary anomaly. Agentic AI tasks are inherently more token-intensive than the query-and-response patterns that preceded them. Token consumption will continue to grow as agentic workflows become the default rather than the exception. The cost curves will only become more complex as organizations layer multiple AI tools, models, and agents across their workflows.

The organizations that build FinOps discipline for AI now — visibility, model governance, spend alerts, department-level accountability — will have a structural advantage when the next wave of agentic deployments arrives. The organizations that don't will keep having the conversation Uber's finance team had this spring: staring at a bill they didn't expect, trying to explain to the board why the budget they set in October was gone by April.

The tools exist. The governance frameworks are beginning to mature. The question is whether your organization will build them proactively or reactively.

Based on the 2026 data, with 78% of IT leaders reporting unexpected AI charges, most enterprises are choosing reactively. That's still a choice, just not a good one.


Rajesh Beri is the founder of THE D*AI*LY BRIEF, a newsletter focused on Enterprise AI for technical and business leaders. He writes from experience building and deploying AI systems at enterprise scale. Follow on Twitter/X or LinkedIn.


LinkedIn: linkedin.com/in/rberi  |  X: x.com/rajeshberi

© 2026 Rajesh Beri. All rights reserved.
