Uber Burned AI Budget in 4 Months: $500-2K Per Engineer

Uber blew through its 2026 AI budget in four months, with AI coding costs of $500-$2K per engineer per month. Token billing exposes a cost crisis CFOs aren't prepared for.

By Rajesh Beri·May 6, 2026·9 min read

THE DAILY BRIEF

AI Cost Management · Enterprise AI · FinOps · AI Coding Tools


Uber CTO Praveen Neppalli Naga confirmed to The Information that the company burned through its entire 2026 AI budget in just four months. The culprit wasn't a failed pilot or overpriced vendor contract. It was Claude Code adoption spreading across 5,000 engineers faster than any budget model anticipated, with individual engineer costs ranging from $500 to $2,000 per month. For CFOs and CTOs managing AI investments, this isn't a cautionary tale about one company's overspend. It's a preview of what token-based billing does to enterprise cost structures when you deploy AI tools that actually work.

Uber rolled out Claude Code access to its full engineering organization in December 2025. By February, adoption jumped from 32% to 63% of engineers using it monthly. By March, 84% were classified as agentic coding users. The tool didn't fail or underdeliver. Engineers loved it, used it constantly, and put it to work on exactly the kinds of tasks it was designed for: parallel agent execution, large-scale codebase refactoring, automated testing, backend code generation.

About 70% of committed code at Uber now comes from AI, and roughly 11% of live backend updates are written by AI agents without any human in the loop. From a product and productivity standpoint, the rollout was a success. From a finance standpoint, the budget was incinerated. Naga was direct: "I'm back to the drawing board because the budget I thought I would need is blown away already."

Token Consumption Is Nothing Like Seat Licenses

The cost mechanics are what make this story different from a simple budget overrun. Claude Code is not priced on a per-seat basis the way a traditional enterprise software license works. It runs on token consumption, meaning the invoice is a function of how many tokens the model processes across all engineer sessions.

A developer running a single autocomplete suggestion at the end of a function consumes a negligible token budget. A developer running Claude Code as an autonomous agent across a monorepo, instructing it to refactor an API layer and generate the associated tests in parallel, can consume thousands of dollars' worth of tokens in a single afternoon session.

Scale that across 5,000 engineers, many of them running multiple agent loops simultaneously, and the math compounds in ways that no annual software budget cycle was built to absorb. Reported individual engineer costs ranged from $500 to $2,000 per month. Naga himself spent $1,200 in two hours during a personal demo session. Those are not edge cases. They are the natural consequence of using agentic AI tools the way they are meant to be used.
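To make the mechanics concrete, here is a back-of-the-envelope sketch of how agentic sessions translate into monthly bills. The per-token rates and token volumes below are illustrative assumptions, not Anthropic's published pricing or Uber's actual usage; the point is only that input tokens dominate when an agent re-reads large slices of a codebase on every loop.

```python
# Illustrative token-cost model for agentic coding sessions.
# Rates and volumes are assumptions for the sketch, not real pricing data.
INPUT_RATE = 3.00 / 1_000_000    # assumed $ per input token
OUTPUT_RATE = 15.00 / 1_000_000  # assumed $ per output token

def session_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one agent session at the assumed per-token rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# An agent loop that repeatedly re-reads monorepo context is input-heavy:
# assume ~25M input / 1M output tokens in a heavy working day.
daily = session_cost(input_tokens=25_000_000, output_tokens=1_000_000)
monthly = 22 * daily  # ~22 working days of similar usage
print(f"per day: ${daily:,.0f}, per month: ${monthly:,.0f}")
```

Under these assumptions a heavy user lands around $90 a day, or roughly $2,000 a month, which is consistent with the upper end of the reported range without any misuse involved.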

The comparison to cloud cost is the most useful frame for understanding where this is going. In 2010, enterprise software teams started provisioning AWS compute with the same mental model they had used for on-premises servers: a capital expenditure, planned in advance, predictable. AWS bills arrived and were triple what finance had modeled.

That pattern repeated in every organization that adopted cloud at scale. A decade of FinOps tooling, reserved instance strategies, tagging frameworks, and cost anomaly alerts was built to correct for that initial miscalibration.

The FinOps Playbook for AI Doesn't Exist Yet

The AI coding cost problem is structurally similar to early cloud adoption, but the management tooling hasn't caught up. A usage-based pricing model has been placed in front of a highly motivated user base with a direct incentive to consume as much as possible, and the tools to monitor, cap, and allocate that spend at the team or engineer level are nowhere near the maturity of cloud cost tooling.

Uber is not unusual for having this problem. It's the first large company to surface it publicly at this level of specificity, which makes Naga's disclosure more valuable than it might appear. The FinOps Foundation released guidance in February 2026 recommending that enterprises with AI spend exceeding $500,000 per year establish formal AI FinOps frameworks, including token budget allocation, model cost chargebacks by business unit, and inference optimization teams. Most organizations haven't built that infrastructure yet.
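The controls the FinOps guidance describes (token budget allocation, chargebacks by business unit, usage caps) are conceptually simple even though mature products for them don't exist yet. A minimal sketch of what such a ledger does, with hypothetical unit names and thresholds:

```python
from collections import defaultdict

class AIBudgetLedger:
    """Minimal sketch of per-unit budget allocation and chargeback.
    Illustrates the FinOps-style controls described above; the class,
    unit names, and cap values are hypothetical, not a real product."""

    def __init__(self):
        self.caps = {}                   # business unit -> monthly $ cap
        self.spend = defaultdict(float)  # business unit -> $ charged back so far

    def allocate(self, unit: str, monthly_cap_usd: float) -> None:
        self.caps[unit] = monthly_cap_usd

    def record(self, unit: str, cost_usd: float) -> bool:
        """Charge usage back to a unit; refuse once the cap is exhausted."""
        if self.spend[unit] + cost_usd > self.caps.get(unit, 0.0):
            return False  # over cap: throttle, or require an approval flow
        self.spend[unit] += cost_usd
        return True

ledger = AIBudgetLedger()
ledger.allocate("payments-eng", 50_000)
assert ledger.record("payments-eng", 1_200)       # a normal agent session
assert not ledger.record("payments-eng", 60_000)  # would blow the monthly cap
```

The hard part in practice is not this bookkeeping but the instrumentation: attributing every token of every session to the right team in near real time.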

GitHub Copilot is moving to token-based billing on June 1, 2026, replacing its Premium Request Units model. For regulated enterprises, this isn't just a pricing change — it's a governance, observability, and cost control challenge. Every major AI coding assistant is following the same path. The shift from predictable per-seat pricing to consumption-based billing is now industry-wide.

Leaderboards Incentivized Consumption, Not Cost Control

The internal leaderboard detail in Uber's rollout is worth examining separately. The company tracked and ranked engineer usage of Claude Code on internal performance visibility dashboards. That's a management choice designed to drive adoption, and it worked precisely as intended.

It also created a cultural dynamic where using more AI tooling was visibly rewarded, and using less was implicitly underperforming. In a token-based billing environment, that incentive structure directly translates into budget acceleration. Engineers competing on leaderboards for AI usage have no obvious reason to be conservative about consumption.

The people who designed the leaderboard were almost certainly not the same people responsible for the AI services budget line. That organizational gap, between the teams driving adoption and the teams managing spend, is the root cause of the overrun more than any pricing quirk of Claude Code itself.

For CTOs and VPs of Engineering implementing AI coding tools, this dynamic is critical to understand. Performance visibility systems that reward adoption without corresponding cost accountability will accelerate consumption faster than finance teams can adapt budgets. The leaderboard worked. The budget didn't survive it.

Three Paths Forward for Enterprise Finance Teams

How large enterprises will respond is already visible in Naga's comments and in moves across the industry.

Path 1: Multi-vendor strategy with competitive pricing leverage. Naga mentioned that Uber intends to give engineers access to OpenAI's Codex in the future, suggesting a multi-vendor approach rather than single-provider lock-in. That choice is likely motivated partly by competitive pricing leverage and partly by risk diversification. Companies watching Uber's experience will accelerate conversations with Anthropic, OpenAI, and other major providers about enterprise framework agreements that replace token-based billing with committed spend deals at negotiated rates.

Microsoft has already done this for Copilot: a flat per-seat model that limits the upside for the vendor but gives enterprise finance teams the predictability they need to budget reliably. As Claude Code usage scales across the industry, Anthropic will face increasing pressure from enterprise procurement teams to offer similar structures, or watch finance departments cap usage and throttle the adoption that's driving their revenue growth.

Path 2: Committed spend agreements to cap variable costs. Enterprise framework agreements with negotiated token rates and committed annual spend caps give finance teams the budget predictability required for annual planning cycles. Instead of paying per-token at published rates, enterprises negotiate bulk discounts in exchange for minimum commitments. This mirrors how cloud vendors moved from pure pay-as-you-go to reserved instances and savings plans.

The trade-off is reduced flexibility. If your engineers don't consume the committed spend, you've overpaid. If they exceed it, you're back to variable pricing or renegotiating mid-year. But for organizations with 2,000+ engineers where AI coding tools are becoming standard, committed spend agreements are becoming the norm.
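The economics of that trade-off are easy to put in numbers. The sketch below assumes a simple deal structure (the full commitment is paid at a discounted rate even if usage falls short, and overage reverts to list price); actual contract terms, discounts, and spend figures vary by negotiation and are assumptions here.

```python
def committed_cost(usage_usd_list: float, commitment_usd: float,
                   discount: float) -> float:
    """Annual cost under an assumed committed-spend structure:
    the full commitment is paid at the discounted rate regardless of usage,
    and usage beyond the commitment is billed at list price."""
    base = commitment_usd * (1 - discount)
    overage = max(0.0, usage_usd_list - commitment_usd)
    return base + overage

# Assumed scenario: 5,000 engineers at a $1,000/month midpoint = $60M/year list.
usage = 5_000 * 1_000 * 12
deal = committed_cost(usage, commitment_usd=50_000_000, discount=0.25)
print(f"pay-as-you-go: ${usage:,.0f}  committed deal: ${deal:,.0f}")

# The downside case: usage comes in at $30M, but the commitment is still owed.
undershoot = committed_cost(30_000_000, commitment_usd=50_000_000, discount=0.25)
print(f"undershoot still costs: ${undershoot:,.0f}")
```

Under these assumed terms the deal saves money when usage tracks the commitment, and overpays when adoption stalls, which is exactly the flexibility trade described above.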

Path 3: Build internal coding agents on open-weight models. This option is gaining credibility precisely because of stories like this one. A company running Qwen 3.6-27B locally on dedicated GPU hardware has predictable per-query costs that are a function of hardware depreciation and electricity, not per-token billing.

The setup cost is higher. The ongoing cost is bounded. For organizations with 5,000+ engineers generating multi-hundred-dollar monthly AI bills per head, the build-versus-buy calculation is no longer theoretical. Uber burned its budget in four months. The next company to do the same will have more options for what happens in month five.

Running inference locally means capital expenditure on GPU infrastructure, ongoing operational costs for power and cooling, and dedicated ML platform teams to maintain model serving infrastructure. But the total cost of ownership becomes predictable in a way that consumption-based API billing never will be. For enterprises already running large-scale ML infrastructure, adding coding assistance inference to existing GPU clusters is an incremental cost, not a greenfield build.
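A rough build-versus-buy comparison falls out of a few lines of arithmetic. Every figure below (GPU capex, amortization period, opex, per-engineer API spend) is an assumption for illustration, not a quoted price from any vendor or from Uber:

```python
def api_monthly(engineers: int, per_engineer_usd: float) -> float:
    """Monthly API bill at a flat assumed per-engineer spend."""
    return engineers * per_engineer_usd

def selfhost_monthly(gpu_capex_usd: float, amort_months: int,
                     opex_monthly_usd: float) -> float:
    """Monthly TCO of local inference: straight-line hardware depreciation
    plus power, cooling, and platform-team costs. All inputs are assumptions."""
    return gpu_capex_usd / amort_months + opex_monthly_usd

# Assumed scenario: 5,000 engineers at the $1,000/month midpoint,
# versus a $30M GPU cluster amortized over 3 years with $1.5M/month opex.
api = api_monthly(5_000, 1_000)
local = selfhost_monthly(30_000_000, 36, 1_500_000)
print(f"API: ${api:,.0f}/mo  self-host: ${local:,.0f}/mo")
```

Under these assumptions self-hosting wins on steady-state cost; the real decision also weighs model quality, time to deploy, and whether the organization already runs ML infrastructure at scale.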

What CFOs and CTOs Should Do This Quarter

If your organization is deploying AI coding tools or planning to, here's what the Uber experience means for your budget and governance:

For CFOs: Treat AI tool spend like cloud spend, not SaaS licenses. Build quarterly reviews, not annual budget allocations. If your AI spend is growing faster than 30% quarter-over-quarter and you don't have real-time cost visibility at the team or product level, you're flying blind. Implement cost allocation tags, chargeback models, and usage caps before adoption scales beyond your budget's ability to absorb surprises.

Ask your CTO: What's our current monthly AI services spend? What was it last quarter? What's the projected run rate if adoption doubles? If they can't answer those three questions with specific numbers, your organization doesn't have AI cost governance yet.
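The third question, projected run rate, is a one-line compounding calculation worth keeping in front of finance. The growth rate and starting spend below are placeholders:

```python
def projected_run_rate(current_monthly_usd: float, qoq_growth: float,
                       quarters: int) -> float:
    """Project monthly AI spend forward at a compound quarter-over-quarter rate."""
    return current_monthly_usd * (1 + qoq_growth) ** quarters

# If spend is $400k/month today and growing 30% per quarter (the threshold
# flagged above), monthly spend nearly triples within a year:
in_a_year = projected_run_rate(400_000, 0.30, quarters=4)
print(f"monthly in 4 quarters: ${in_a_year:,.0f} "
      f"(annualized ${in_a_year * 12:,.0f})")
```

At 30% quarterly growth the multiplier over four quarters is about 2.86x, which is why an annual budget set in January can be exhausted well before December.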

For CTOs: Don't let adoption incentives run ahead of cost accountability. Leaderboards, performance metrics, and adoption KPIs should be paired with cost visibility and individual or team-level budget accountability. If engineers can see their usage rank but not their cost impact, you're optimizing for consumption, not value.

Evaluate committed spend agreements with your top AI vendors now, before you blow through your annual budget mid-year. Negotiate rate cards, volume discounts, and overage protections while you still have leverage. If you're at 1,000+ engineers and adoption is trending above 50%, you have enough scale to negotiate better terms than public API pricing.

For both: Consider a hybrid strategy. Run high-volume, low-complexity inference workloads (autocomplete, linting, documentation) on self-hosted open-weight models. Reserve expensive API-based agentic tools for complex, high-value tasks where the ROI justifies the cost. That split reduces your variable cost exposure while maintaining access to frontier capabilities when you need them.
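That hybrid split can start as nothing more than a routing rule in the inference gateway. The workload categories and tier names below are illustrative, not a vendor taxonomy:

```python
# Sketch of a workload router for a hybrid AI strategy: cheap high-volume
# work goes to a self-hosted open-weight model, complex agentic work to a
# frontier API. Category names here are assumptions for illustration.
LOCAL_WORKLOADS = {"autocomplete", "linting", "docstring", "commit-message"}

def route(task_kind: str) -> str:
    """Return which inference tier should serve a given workload class."""
    return "self-hosted" if task_kind in LOCAL_WORKLOADS else "frontier-api"

assert route("autocomplete") == "self-hosted"
assert route("monorepo-refactor") == "frontier-api"
```

In practice the routing signal gets richer over time (context size, expected token volume, whether the task runs an agent loop), but even a static allowlist like this bounds the variable-cost exposure of the highest-volume traffic.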

Uber burned its budget in four months. The story isn't about overspending. It's about a cost model that scales faster than traditional budget cycles can adapt to, and the organizational gaps that emerge when adoption incentives aren't paired with cost accountability.


Want to calculate your own AI ROI? Try our AI ROI Calculator — takes 60 seconds and shows projected savings, payback period, and 3-year ROI.

Know someone managing AI budgets?

Forward this to a CFO or CTO navigating AI cost management. They can subscribe at beri.net/#newsletter — it's free, twice a week, and I read every reply.


— Rajesh

THE DAILY BRIEF

Enterprise AI insights for technology and business leaders, twice weekly.

thedailybrief.com

Subscribe at thedailybrief.com/subscribe to get every issue in your inbox.

LinkedIn: linkedin.com/in/rberi  |  X: x.com/rajeshberi

© 2026 Rajesh Beri. All rights reserved.

Uber Burned AI Budget in 4 Months: $500-2K Per Engineer

Photo by Markus Spiske on Pexels

Uber CTO Praveen Neppalli Naga confirmed to The Information that the company burned through its entire 2026 AI budget in just four months. The culprit wasn't a failed pilot or overpriced vendor contract. It was Claude Code adoption spreading across 5,000 engineers faster than any budget model anticipated, with individual engineer costs ranging from $500 to $2,000 per month. For CFOs and CTOs managing AI investments, this isn't a cautionary tale about one company's overspend. It's a preview of what token-based billing does to enterprise cost structures when you deploy AI tools that actually work.

Uber rolled out Claude Code access to its full engineering organization in December 2025. By February, adoption jumped from 32% to 63% of engineers using it monthly. By March, 84% were classified as agentic coding users. The tool didn't fail or underdeliver. Engineers loved it, used it constantly, and put it to work on exactly the kinds of tasks it was designed for: parallel agent execution, large-scale codebase refactoring, automated testing, backend code generation.

About 70% of committed code at Uber now comes from AI, and roughly 11% of live backend updates are written by AI agents without any human in the loop. From a product and productivity standpoint, the rollout was a success. From a finance standpoint, the budget was incinerated. Naga was direct: "I'm back to the drawing board because the budget I thought I would need is blown away already."

Token Consumption Is Nothing Like Seat Licenses

The cost mechanics are what make this story different from a simple budget overrun. Claude Code is not priced on a per-seat basis the way a traditional enterprise software license works. It runs on token consumption, meaning the invoice is a function of how many tokens the model processes across all engineer sessions.

A developer running a single autocomplete suggestion at the end of a function consumes a negligible token budget. A developer running Claude Code as an autonomous agent across a monorepo, instructing it to refactor an API layer and generate the associated tests in parallel, can consume thousands of dollars worth of tokens in a single afternoon session.

Scale that across 5,000 engineers, many of them running multiple agent loops simultaneously, and the math compounds in ways that no annual software budget cycle was built to absorb. Reported individual engineer costs ranged from $500 to $2,000 per month. Naga himself spent $1,200 in two hours during a personal demo session. Those are not edge cases. They are the natural consequence of using agentic AI tools the way they are meant to be used.

The comparison to cloud cost is the most useful frame for understanding where this is going. In 2010, enterprise software teams started provisioning AWS compute with the same mental model they had used for on-premise servers: a capital expenditure, planned in advance, predictable. AWS bills arrived and were triple what finance had modeled.

That pattern repeated in every organization that adopted cloud at scale. A decade of FinOps tooling, reserved instance strategies, tagging frameworks, and cost anomaly alerts was built to correct for that initial miscalibration.

The FinOps Playbook for AI Doesn't Exist Yet

The AI coding cost problem is structurally similar to early cloud adoption, but the tooling to manage it doesn't exist at scale yet. A usage-based pricing model has been placed in front of a highly motivated user base with a direct incentive to consume as much of it as possible. The tooling to monitor, cap, and allocate that spend at the individual team or engineer level does not yet exist at the maturity of cloud cost tooling.

Uber is not unusual for having this problem. It's the first large company to surface it publicly at this level of specificity, which makes Naga's disclosure more valuable than it might appear. The FinOps Foundation released guidance in February 2026 recommending that enterprises with AI spend exceeding $500,000 per year establish formal AI FinOps frameworks, including token budget allocation, model cost chargebacks by business unit, and inference optimization teams. Most organizations haven't built that infrastructure yet.

GitHub Copilot is moving to token-based billing on June 1, 2026, replacing its Premium Request Units model. For regulated enterprises, this isn't just a pricing change — it's a governance, observability, and cost control challenge. Every major AI coding assistant is following the same path. The shift from predictable per-seat pricing to consumption-based billing is now industry-wide.

Leaderboards Incentivized Consumption, Not Cost Control

The internal leaderboard detail in Uber's rollout is worth examining separately. The company tracked and ranked engineer usage of Claude Code on internal performance visibility dashboards. That's a management choice designed to drive adoption, and it worked precisely as intended.

It also created a cultural dynamic where using more AI tooling was visibly rewarded, and using less was implicitly underperforming. In a token-based billing environment, that incentive structure directly translates into budget acceleration. Engineers competing on leaderboards for AI usage have no obvious reason to be conservative about consumption.

The people who designed the leaderboard were almost certainly not the same people responsible for the AI services budget line. That organizational gap, between the teams driving adoption and the teams managing spend, is the root cause of the overrun more than any pricing quirk of Claude Code itself.

For CTOs and VPs of Engineering implementing AI coding tools, this dynamic is critical to understand. Performance visibility systems that reward adoption without corresponding cost accountability will accelerate consumption faster than finance teams can adapt budgets. The leaderboard worked. The budget didn't survive it.

Three Paths Forward for Enterprise Finance Teams

The responses large enterprises will take from this situation are already visible in Naga's comments and industry moves.

Path 1: Multi-vendor strategy with competitive pricing leverage. Naga mentioned that Uber intends to give engineers access to OpenAI's Codex in the future, suggesting a multi-vendor approach rather than single-provider lock-in. That choice is likely motivated partly by competitive pricing leverage and partly by risk diversification. Companies watching Uber's experience will accelerate conversations with Anthropic, OpenAI, and other major providers about enterprise framework agreements that replace token-based billing with committed spend deals at negotiated rates.

Microsoft has already done this for Copilot: a flat per-seat model that limits the upside for the vendor but gives enterprise finance teams the predictability they need to budget reliably. As Claude Code usage scales across the industry, Anthropic will face increasing pressure from enterprise procurement teams to offer similar structures, or watch finance departments cap usage and throttle the adoption that's driving their revenue growth.

Path 2: Committed spend agreements to cap variable costs. Enterprise framework agreements with negotiated token rates and committed annual spend caps give finance teams the budget predictability required for annual planning cycles. Instead of paying per-token at published rates, enterprises negotiate bulk discounts in exchange for minimum commitments. This mirrors how cloud vendors moved from pure pay-as-you-go to reserved instances and savings plans.

The trade-off is reduced flexibility. If your engineers don't consume the committed spend, you've overpaid. If they exceed it, you're back to variable pricing or renegotiating mid-year. But for organizations with 2,000+ engineers where AI coding tools are becoming standard, committed spend agreements are becoming the norm.

Path 3: Build internal coding agents on open-weight models. The third option, building internal coding agents on top of open-weight models, is gaining credibility precisely because of stories like this one. A company running Qwen 3.6-27B locally on dedicated GPU hardware has predictable per-query costs that are a function of hardware depreciation and electricity, not per-token billing.

The setup cost is higher. The ongoing cost is bounded. For organizations with 5,000+ engineers generating multi-hundred-dollar monthly AI bills per head, the build-versus-buy calculation is no longer theoretical. Uber burned its budget in four months. The next company to do the same will have more options for what happens in month five.

Running inference locally means capital expenditure on GPU infrastructure, ongoing operational costs for power and cooling, and dedicated ML platform teams to maintain model serving infrastructure. But the total cost of ownership becomes predictable in a way that consumption-based API billing never will be. For enterprises already running large-scale ML infrastructure, adding coding assistance inference to existing GPU clusters is an incremental cost, not a greenfield build.

What CFOs and CTOs Should Do This Quarter

If your organization is deploying AI coding tools or planning to, here's what the Uber experience means for your budget and governance:

For CFOs: Treat AI tool spend like cloud spend, not SaaS licenses. Build quarterly reviews, not annual budget allocations. If your AI spend is growing faster than 30% quarter-over-quarter and you don't have real-time cost visibility at the team or product level, you're flying blind. Implement cost allocation tags, chargeback models, and usage caps before adoption scales beyond your budget's ability to absorb surprises.

Ask your CTO: What's our current monthly AI services spend? What was it last quarter? What's the projected run rate if adoption doubles? If they can't answer those three questions with specific numbers, your organization doesn't have AI cost governance yet.

For CTOs: Don't let adoption incentives run ahead of cost accountability. Leaderboards, performance metrics, and adoption KPIs should be paired with cost visibility and individual or team-level budget accountability. If engineers can see their usage rank but not their cost impact, you're optimizing for consumption, not value.

Evaluate committed spend agreements with your top AI vendors now, before you blow through your annual budget mid-year. Negotiate rate cards, volume discounts, and overage protections while you still have leverage. If you're at 1,000+ engineers and adoption is trending above 50%, you have enough scale to negotiate better terms than public API pricing.

For both: Consider a hybrid strategy. Run high-volume, low-complexity inference workloads (autocomplete, linting, documentation) on self-hosted open-weight models. Reserve expensive API-based agentic tools for complex, high-value tasks where the ROI justifies the cost. That split reduces your variable cost exposure while maintaining access to frontier capabilities when you need them.

Uber burned its budget in four months. The story isn't about overspending. It's about a cost model that scales faster than traditional budget cycles can adapt to, and the organizational gaps that emerge when adoption incentives aren't paired with cost accountability.


Want to calculate your own AI ROI? Try our AI ROI Calculator — takes 60 seconds and shows projected savings, payback period, and 3-year ROI.

Continue Reading

AI Cost Management & Enterprise Strategy:


Know someone managing AI budgets?

Forward this to a CFO or CTO navigating AI cost management. They can subscribe at beri.net/#newsletter — it's free, twice a week, and I read every reply.

If you were forwarded this, click here to subscribe.

— Rajesh

Share:

THE DAILY BRIEF

AI Cost ManagementEnterprise AIFinOpsAI Coding Tools

Uber Burned AI Budget in 4 Months: $500-2K Per Engineer

Uber blew its 2026 AI budget in 4 months. Engineers cost $500-2K/month. Token billing exposes a cost crisis CFOs aren't prepared for.

By Rajesh Beri·May 6, 2026·9 min read

Uber CTO Praveen Neppalli Naga confirmed to The Information that the company burned through its entire 2026 AI budget in just four months. The culprit wasn't a failed pilot or overpriced vendor contract. It was Claude Code adoption spreading across 5,000 engineers faster than any budget model anticipated, with individual engineer costs ranging from $500 to $2,000 per month. For CFOs and CTOs managing AI investments, this isn't a cautionary tale about one company's overspend. It's a preview of what token-based billing does to enterprise cost structures when you deploy AI tools that actually work.

Uber rolled out Claude Code access to its full engineering organization in December 2025. By February, adoption jumped from 32% to 63% of engineers using it monthly. By March, 84% were classified as agentic coding users. The tool didn't fail or underdeliver. Engineers loved it, used it constantly, and put it to work on exactly the kinds of tasks it was designed for: parallel agent execution, large-scale codebase refactoring, automated testing, backend code generation.

About 70% of committed code at Uber now comes from AI, and roughly 11% of live backend updates are written by AI agents without any human in the loop. From a product and productivity standpoint, the rollout was a success. From a finance standpoint, the budget was incinerated. Naga was direct: "I'm back to the drawing board because the budget I thought I would need is blown away already."

Token Consumption Is Nothing Like Seat Licenses

The cost mechanics are what make this story different from a simple budget overrun. Claude Code is not priced on a per-seat basis the way a traditional enterprise software license works. It runs on token consumption, meaning the invoice is a function of how many tokens the model processes across all engineer sessions.

A developer running a single autocomplete suggestion at the end of a function consumes a negligible token budget. A developer running Claude Code as an autonomous agent across a monorepo, instructing it to refactor an API layer and generate the associated tests in parallel, can consume thousands of dollars worth of tokens in a single afternoon session.

Scale that across 5,000 engineers, many of them running multiple agent loops simultaneously, and the math compounds in ways that no annual software budget cycle was built to absorb. Reported individual engineer costs ranged from $500 to $2,000 per month. Naga himself spent $1,200 in two hours during a personal demo session. Those are not edge cases. They are the natural consequence of using agentic AI tools the way they are meant to be used.

The comparison to cloud cost is the most useful frame for understanding where this is going. In 2010, enterprise software teams started provisioning AWS compute with the same mental model they had used for on-premise servers: a capital expenditure, planned in advance, predictable. AWS bills arrived and were triple what finance had modeled.

That pattern repeated in every organization that adopted cloud at scale. A decade of FinOps tooling, reserved instance strategies, tagging frameworks, and cost anomaly alerts was built to correct for that initial miscalibration.

The FinOps Playbook for AI Doesn't Exist Yet

The AI coding cost problem is structurally similar to early cloud adoption, but the tooling to manage it doesn't exist at scale yet. A usage-based pricing model has been placed in front of a highly motivated user base with a direct incentive to consume as much of it as possible. The tooling to monitor, cap, and allocate that spend at the individual team or engineer level does not yet exist at the maturity of cloud cost tooling.

Uber is not unusual for having this problem. It's the first large company to surface it publicly at this level of specificity, which makes Naga's disclosure more valuable than it might appear. The FinOps Foundation released guidance in February 2026 recommending that enterprises with AI spend exceeding $500,000 per year establish formal AI FinOps frameworks, including token budget allocation, model cost chargebacks by business unit, and inference optimization teams. Most organizations haven't built that infrastructure yet.

GitHub Copilot is moving to token-based billing on June 1, 2026, replacing its Premium Request Units model. For regulated enterprises, this isn't just a pricing change — it's a governance, observability, and cost control challenge. Every major AI coding assistant is following the same path. The shift from predictable per-seat pricing to consumption-based billing is now industry-wide.

Leaderboards Incentivized Consumption, Not Cost Control

The internal leaderboard detail in Uber's rollout is worth examining separately. The company tracked and ranked engineer usage of Claude Code on internal performance visibility dashboards. That's a management choice designed to drive adoption, and it worked precisely as intended.

It also created a cultural dynamic where using more AI tooling was visibly rewarded, and using less was implicitly underperforming. In a token-based billing environment, that incentive structure directly translates into budget acceleration. Engineers competing on leaderboards for AI usage have no obvious reason to be conservative about consumption.

The people who designed the leaderboard were almost certainly not the same people responsible for the AI services budget line. That organizational gap, between the teams driving adoption and the teams managing spend, is the root cause of the overrun more than any pricing quirk of Claude Code itself.

For CTOs and VPs of Engineering implementing AI coding tools, this dynamic is critical to understand. Performance visibility systems that reward adoption without corresponding cost accountability will accelerate consumption faster than finance teams can adapt budgets. The leaderboard worked. The budget didn't survive it.

Three Paths Forward for Enterprise Finance Teams

The responses large enterprises will take from this situation are already visible in Naga's comments and industry moves.

Path 1: Multi-vendor strategy with competitive pricing leverage. Naga mentioned that Uber intends to give engineers access to OpenAI's Codex in the future, suggesting a multi-vendor approach rather than single-provider lock-in. That choice is likely motivated partly by competitive pricing leverage and partly by risk diversification. Companies watching Uber's experience will accelerate conversations with Anthropic, OpenAI, and other major providers about enterprise framework agreements that replace token-based billing with committed spend deals at negotiated rates.

Microsoft has already done this for Copilot: a flat per-seat model that limits the upside for the vendor but gives enterprise finance teams the predictability they need to budget reliably. As Claude Code usage scales across the industry, Anthropic will face increasing pressure from enterprise procurement teams to offer similar structures, or watch finance departments cap usage and throttle the adoption that's driving their revenue growth.

Path 2: Committed spend agreements to cap variable costs. Enterprise framework agreements with negotiated token rates and committed annual spend caps give finance teams the budget predictability required for annual planning cycles. Instead of paying per-token at published rates, enterprises negotiate bulk discounts in exchange for minimum commitments. This mirrors how cloud vendors moved from pure pay-as-you-go to reserved instances and savings plans.

The trade-off is reduced flexibility. If your engineers don't consume the committed spend, you've overpaid. If they exceed it, you're back to variable pricing or renegotiating mid-year. But for organizations with 2,000+ engineers where AI coding tools are standard issue, committed spend agreements are fast becoming the norm.
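That under/over-commitment trade-off reduces to a break-even comparison. A minimal sketch, where the engineer count, per-engineer rate, commitment, and discount are all illustrative assumptions, not any vendor's actual enterprise pricing:

```python
# Break-even sketch for committed-spend vs pay-as-you-go token billing.
# Engineer counts, rates, and the discount are illustrative assumptions,
# not any vendor's actual enterprise pricing.

def pay_as_you_go(engineers: int, monthly_per_engineer: float) -> float:
    """Annual cost at list rates with no commitment."""
    return engineers * monthly_per_engineer * 12

def committed_cost(annual_usage: float, commitment: float, discount: float) -> float:
    """Annual cost under a committed-spend deal.

    Usage is billed at a discounted rate, but you pay at least the
    commitment even if engineers consume less than expected.
    """
    discounted_usage = annual_usage * (1 - discount)
    return max(commitment, discounted_usage)

# 2,000 engineers at an assumed $1,000/month is $24M/year at list rates.
usage = pay_as_you_go(2_000, 1_000)                  # 24_000_000
win = committed_cost(usage, 18_000_000, 0.20)        # discount pays off
lose = committed_cost(usage / 2, 18_000_000, 0.20)   # commitment floor: overpaid
```

If adoption lands where you projected, the discount wins; if adoption stalls at half, you eat the commitment floor. That asymmetry is why the negotiation leverage in Path 1 matters.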

Path 3: Build internal coding agents on open-weight models. This option is gaining credibility precisely because of stories like this one. A company running Qwen 3.6-27B locally on dedicated GPU hardware has predictable per-query costs that are a function of hardware depreciation and electricity, not per-token billing.

The setup cost is higher; the ongoing cost is bounded. Running inference locally means capital expenditure on GPU infrastructure, ongoing operational costs for power and cooling, and dedicated ML platform teams to maintain model serving infrastructure. But the total cost of ownership becomes predictable in a way that consumption-based API billing never will be. For enterprises already running large-scale ML infrastructure, adding coding-assistance inference to existing GPU clusters is an incremental cost, not a greenfield build.

For organizations with 5,000+ engineers generating multi-hundred-dollar monthly AI bills per head, the build-versus-buy calculation is no longer theoretical. Uber burned its budget in four months. The next company to do the same will have more options for what happens in month five.
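The bounded-cost claim is easy to sanity-check by amortizing hardware and power into a per-query figure. Every number below (server price, power draw, throughput, utilization) is an illustrative assumption, not a measured benchmark:

```python
# Sanity-check: amortized per-query cost of self-hosted inference.
# Every figure here is an illustrative assumption, not a benchmark.

def self_hosted_cost_per_query(
    gpu_capex: float = 250_000.0,      # assumed price of an 8-GPU server
    depreciation_years: float = 3.0,
    power_kw: float = 10.0,            # assumed draw, servers plus cooling
    usd_per_kwh: float = 0.12,
    queries_per_second: float = 50.0,  # assumed aggregate served throughput
    utilization: float = 0.5,          # fraction of the year under load
) -> float:
    hours_per_year = 24 * 365
    annual_capex = gpu_capex / depreciation_years
    annual_power = power_kw * hours_per_year * usd_per_kwh
    annual_queries = queries_per_second * 3600 * hours_per_year * utilization
    return (annual_capex + annual_power) / annual_queries

cost = self_hosted_cost_per_query()  # fractions of a cent per query
```

The point isn't the specific output; it's the shape of the function. Capex and electricity are fixed inputs you control, so the per-query cost can only fall as utilization rises, whereas per-token API spend rises with every new adopter.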

What CFOs and CTOs Should Do This Quarter

If your organization is deploying AI coding tools or planning to, here's what the Uber experience means for your budget and governance:

For CFOs: Treat AI tool spend like cloud spend, not SaaS licenses. Build quarterly reviews, not annual budget allocations. If your AI spend is growing faster than 30% quarter-over-quarter and you don't have real-time cost visibility at the team or product level, you're flying blind. Implement cost allocation tags, chargeback models, and usage caps before adoption scales beyond your budget's ability to absorb surprises.

Ask your CTO: What's our current monthly AI services spend? What was it last quarter? What's the projected run rate if adoption doubles? If they can't answer those three questions with specific numbers, your organization doesn't have AI cost governance yet.
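Those three questions reduce to arithmetic a finance team can automate against billing exports. A minimal sketch, with illustrative spend figures and a deliberately naive linear-in-adoption projection:

```python
# The three CFO questions as arithmetic. Spend figures are illustrative;
# in practice they come from vendor billing exports, not hand-typed numbers.

def run_rate_projection(
    current_monthly: float,       # this month's AI services spend
    last_quarter_monthly: float,  # average monthly spend last quarter
    adoption: float,              # current share of engineers using the tools
) -> dict:
    qoq_growth = current_monthly / last_quarter_monthly - 1
    # Naive model: spend scales linearly with adoption, capped at 100%.
    doubled_share = min(2 * adoption, 1.0)
    return {
        "current_monthly": current_monthly,
        "qoq_growth": qoq_growth,
        "monthly_if_adoption_doubles": current_monthly * doubled_share / adoption,
    }

# Example: $4M/month now, up from a $2.5M/month average, at 63% adoption.
report = run_rate_projection(4_000_000, 2_500_000, 0.63)
```

Linear scaling is generous: Uber's numbers suggest per-engineer spend also climbs as users shift into agentic workflows, so treat the projection as a floor, not a forecast.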

For CTOs: Don't let adoption incentives run ahead of cost accountability. Leaderboards, performance metrics, and adoption KPIs should be paired with cost visibility and individual or team-level budget accountability. If engineers can see their usage rank but not their cost impact, you're optimizing for consumption, not value.

Evaluate committed spend agreements with your top AI vendors now, before you blow through your annual budget mid-year. Negotiate rate cards, volume discounts, and overage protections while you still have leverage. If you're at 1,000+ engineers and adoption is trending above 50%, you have enough scale to negotiate better terms than public API pricing.

For both: Consider a hybrid strategy. Run high-volume, low-complexity inference workloads (autocomplete, linting, documentation) on self-hosted open-weight models. Reserve expensive API-based agentic tools for complex, high-value tasks where the ROI justifies the cost. That split reduces your variable cost exposure while maintaining access to frontier capabilities when you need them.
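In practice a hybrid split comes down to a routing rule in front of the two backends. The task categories and the hard budget cap below are illustrative assumptions, not a reference implementation:

```python
# Illustrative routing rule for a hybrid self-hosted / frontier-API split.
# The task categories and budget cap are assumptions for the sketch.

LOW_COMPLEXITY = {"autocomplete", "lint", "docstring", "commit_message"}

def route(task_type: str, api_budget_remaining: float) -> str:
    """Pick a backend for one coding-assistant request."""
    if task_type in LOW_COMPLEXITY:
        return "self_hosted"   # high-volume work stays on owned GPUs
    if api_budget_remaining <= 0:
        return "self_hosted"   # hard cap: degrade rather than overspend
    return "frontier_api"      # agentic, high-value tasks
```

The second branch is the one Uber lacked: when the budget hits zero, requests degrade to the cheaper backend instead of silently accruing overage.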

Uber burned its budget in four months. The story isn't about overspending. It's about a cost model that scales faster than traditional budget cycles can adapt to, and the organizational gaps that emerge when adoption incentives aren't paired with cost accountability.


Want to calculate your own AI ROI? Try our AI ROI Calculator — takes 60 seconds and shows projected savings, payback period, and 3-year ROI.

Know someone managing AI budgets?

Forward this to a CFO or CTO navigating AI cost management. They can subscribe at beri.net/#newsletter — it's free, twice a week, and I read every reply.

If you were forwarded this, click here to subscribe.

— Rajesh

THE DAILY BRIEF

Enterprise AI insights for technology and business leaders, twice weekly.

thedailybrief.com


LinkedIn: linkedin.com/in/rberi  |  X: x.com/rajeshberi

© 2026 Rajesh Beri. All rights reserved.
