Managing AI infrastructure just got a lot more complex—and more expensive. As agentic AI workflows proliferate, enterprises are discovering a painful truth: a single user action can trigger hundreds of downstream agent calls, each consuming tokens and driving up costs.
Nutanix is betting that's a $10 billion problem. Today at the company's .NEXT conference, the hybrid cloud infrastructure company announced two new products designed to give enterprises and service providers a unified control plane for AI infrastructure—specifically targeting the runaway costs of agentic workflows.
The announcement introduces Service Provider Central (letting neoclouds build multi-tenant GPU clouds) and an AI Gateway (governing which agents access which models and at what cost). Both sit on top of Nutanix Kubernetes Platform Metal, which the company describes as the only platform supporting VMs, virtualized Kubernetes, and bare metal Kubernetes from a single control plane.
But the real story isn't the technology. It's the emergence of a new discipline: AI FinOps—and why CTOs and CFOs are about to have very uncomfortable conversations about their AI bills.
The Token Cost Problem Nobody Saw Coming
Here's the issue: agentic AI workflows scale with agent fan-out, not with user actions. One prompt in, many billable calls out.
When a user asks an AI agent to "book a meeting with the sales team next Tuesday," that single prompt can spawn:
- An agent calling a calendar API (100 tokens)
- Another agent checking team availability (200 tokens)
- A third agent cross-referencing timezone conflicts (150 tokens)
- A fourth agent sending calendar invites (50 tokens each × 5 people)
That's 700+ tokens for what feels like one simple task. Multiply that by thousands of employees, hundreds of agents, and millions of daily interactions—and your API bills start to look like AWS bills circa 2015.
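The arithmetic above is easy to sketch. In this back-of-envelope script, the token counts mirror the article's example, but the per-token price and the employee/task volumes are illustrative assumptions, not any provider's published rates:

```python
# Illustrative token math for the meeting-booking example above.
# Token counts come from the article; the blended price and daily
# volumes are assumed placeholders, not quoted vendor rates.

PRICE_PER_MILLION_TOKENS = 5.00  # assumed blended $/1M tokens

agent_calls = {
    "calendar_api": 100,
    "availability_check": 200,
    "timezone_conflicts": 150,
    "invites": 50 * 5,  # 50 tokens per invite x 5 people
}

tokens_per_task = sum(agent_calls.values())  # 700 tokens for one "simple" task
daily_tasks = 1_000 * 50                     # assume 1,000 employees x 50 tasks/day
daily_tokens = tokens_per_task * daily_tasks

daily_cost = daily_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS
print(f"{tokens_per_task} tokens/task, ${daily_cost:,.2f}/day, ${daily_cost * 365:,.0f}/year")
```

Even at these modest assumed volumes, one harmless-looking workflow compounds into a line item the CFO will notice.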
"Right now it's very, very easy to get access to a model—it's just an API call," said Dan Ciruli, VP and GM of cloud-native at Nutanix, in an interview with theCUBE. "But they will charge you per token. I think customers will very quickly have to start thinking about, 'Do we call an API where we're going to pay per token? Do we use some infrastructure at a service provider where we're paying for time? Or does it make economic sense to buy hardware and run it on-prem?'"
That's AI FinOps in a nutshell: optimizing the tradeoff between API costs, cloud time-based pricing, and on-premises capex.
Nutanix's Solution: Governance Before the Bills Hit
Nutanix's new AI Gateway is designed to sit between agents and models, enforcing cost and governance policies before tokens get burned.
Key capabilities:
- Agent access control – Which agents can call which models?
- Cost budgeting – Set token limits per agent, per team, per use case
- Model routing – Route low-priority tasks to cheaper models (e.g., GPT-4o-mini vs GPT-4)
- Usage analytics – Track which agents are burning the most tokens
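The four capabilities above can be sketched in a few dozen lines. This is a hypothetical toy model of what such a gateway does—every class name, model name, and threshold here is illustrative, not Nutanix's actual API:

```python
# Hypothetical sketch of AI-gateway policy enforcement: agent access
# control, token budgets, cost-based model routing, and usage tracking.
# All names and limits are illustrative, not Nutanix's actual API.

from dataclasses import dataclass, field


@dataclass
class AgentPolicy:
    allowed_models: set[str]
    token_budget: int      # tokens allowed per billing period
    tokens_used: int = 0   # running usage, for analytics and budget checks


@dataclass
class Gateway:
    policies: dict[str, AgentPolicy] = field(default_factory=dict)
    cheap_model: str = "small-model"  # fallback for low-priority traffic

    def route(self, agent: str, model: str, tokens: int, priority: str = "low") -> str:
        policy = self.policies[agent]
        if model not in policy.allowed_models:           # access control
            raise PermissionError(f"{agent} may not call {model}")
        if policy.tokens_used + tokens > policy.token_budget:  # cost budgeting
            raise RuntimeError(f"{agent} is over its token budget")
        chosen = self.cheap_model if priority == "low" else model  # model routing
        policy.tokens_used += tokens                     # usage analytics
        return chosen


gw = Gateway(policies={
    "scheduler": AgentPolicy(allowed_models={"big-model", "small-model"}, token_budget=10_000),
})
print(gw.route("scheduler", "big-model", 700, priority="high"))  # allowed as requested
print(gw.route("scheduler", "big-model", 700))                   # downgraded to small-model
```

The point of the sketch: the checks happen before the model call, which is what "prevents surprises" means in practice.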
"As agents sprawl, models and tools need to be controlled and governed," explained Anindo Sengupta, VP of product management at Nutanix. "What we've announced is the capability to really drive governance around models and tools using Nutanix's agentic AI."
This is infrastructure that prevents surprises—not just tracks them after the fact.
The Rise of Neoclouds: A New Class of AI Service Providers
The second announcement—Service Provider Central—addresses a different problem: enterprises can't get GPUs fast enough, and hyperscalers aren't always the best answer.
Enter "neoclouds": regional AI cloud providers offering GPU-as-a-service and Kubernetes-as-a-service to enterprises facing long silicon wait times from AWS, Azure, and GCP.
Service Provider Central gives these neoclouds:
- Multi-tenant GPU infrastructure
- AI service catalogs (pre-packaged AI environments)
- Bare metal Kubernetes for latency-sensitive workloads
- Integrated enterprise-grade storage (CN-AOS)
Why does this matter? Because AI workloads don't always belong in hyperscaler clouds:
- Latency: Edge AI and robotics need <10ms response times
- Data residency: EU/healthcare/defense workloads can't leave certain geographies
- Cost: At scale, on-prem or neocloud infrastructure beats per-token API pricing
"I think customers will very quickly have to start thinking about where they run their workloads," Ciruli said. "Absolutely, there'll be AI FinOps to help you optimize that."
What This Means for CTOs and CFOs
For CTOs: Infrastructure decisions just got harder
You now have three choices for running AI agents:
- API-based (OpenAI, Anthropic): Pay per token, zero infrastructure overhead
- Cloud-based (AWS Bedrock, Azure OpenAI): Pay for compute time, more control
- On-prem/Neocloud: Capex investment, full control, electricity costs
The right answer depends on usage patterns, latency requirements, and budget. Nutanix's bet is that enterprises will run all three—and need a single platform to manage them.
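That tradeoff reduces to a break-even calculation: at what monthly token volume does fixed-cost infrastructure beat per-token pricing? Every number in this sketch is an assumption for illustration, not a quoted vendor price:

```python
# Back-of-envelope break-even between per-token API pricing and
# fixed-cost infrastructure (cloud reservation or amortized on-prem).
# All prices are assumptions for illustration, not vendor quotes.

API_PRICE_PER_M_TOKENS = 5.00    # assumed $/1M tokens
MONTHLY_FIXED_COST = 20_000.00   # assumed fixed infrastructure $/month


def monthly_api_cost(tokens_per_month: float) -> float:
    return tokens_per_month / 1_000_000 * API_PRICE_PER_M_TOKENS


# Break-even volume: fixed cost divided by the per-token price
break_even_tokens = MONTHLY_FIXED_COST / API_PRICE_PER_M_TOKENS * 1_000_000
print(f"Break-even at {break_even_tokens / 1e9:.1f}B tokens/month")

for monthly_tokens in (1e9, 4e9, 10e9):
    api = monthly_api_cost(monthly_tokens)
    cheaper = "API" if api < MONTHLY_FIXED_COST else "fixed infrastructure"
    print(f"{monthly_tokens / 1e9:>4.0f}B tokens: API ${api:,.0f} vs fixed ${MONTHLY_FIXED_COST:,.0f} -> {cheaper}")
```

Under these assumed prices the crossover sits around 4 billion tokens a month—the kind of number an AI FinOps team would refine with real usage data, latency requirements, and utilization rates.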
For CFOs: AI budgets are about to get messy
If you thought cloud cost optimization was hard, AI FinOps is worse:
- Unpredictable usage – Agent sprawl is exponential
- No benchmarks yet – The industry hasn't standardized cost-per-task metrics
- Vendor lock-in – Switching AI models mid-project is expensive
Expect to see AI cost management tools emerge as a category in the next 12 months. (Watch for Cloudflare, Datadog, and FinOps vendors to build AI-specific cost dashboards.)
The Bigger Picture: AI Infrastructure Becomes Middleware
What Nutanix is really doing here is positioning itself as the middleware layer between models and chips.
You don't buy Nutanix to train models. You buy it to:
- Orchestrate which agents call which models
- Govern who spends how much on tokens
- Optimize workload placement (cloud vs edge vs on-prem)
That's a smart play. The AI stack is consolidating, and the winners won't be the companies selling GPUs or models—they'll be the ones controlling the orchestration layer.
Think of it like Kubernetes for cloud-native apps. Nobody wanted to manually deploy containers. Nutanix is betting enterprises don't want to manually route agent calls and track token budgets either.
Action Items
For CTOs:
- Audit your current AI agent usage (how many agents, which models, token burn rate)
- Build a cost model for API vs cloud vs on-prem AI infrastructure
- Evaluate neocloud providers if hyperscaler GPU wait times exceed 6 months
- Set up AI gateway policies before agent sprawl gets out of control
For CFOs:
- Budget 15-25% buffer for AI cost overruns in 2026 (agents scale faster than expected)
- Demand per-agent cost tracking from IT (not just "cloud spend")
- Watch for surprise bills from API-based AI services (Anthropic, OpenAI, Cohere)
- Evaluate capex vs opex tradeoffs for high-volume AI workloads
For both:
- Hire AI FinOps expertise (or train existing FinOps teams on token economics)
- Build cross-functional AI budgeting (not just IT—include Sales, Marketing, HR)
- Negotiate volume discounts with AI API providers (if you're burning >$100k/month)
Want to calculate your own AI ROI? Try our AI ROI Calculator — takes 60 seconds and shows projected savings, payback period, and 3-year ROI.
Sources
- Nutanix expands agentic AI infrastructure for neoclouds — SiliconANGLE, April 10, 2026
- Nutanix Extends its Agentic AI tool to boost Neo Clouds — INTLBM, April 10, 2026
- Nutanix to extend Nutanix Agentic AI, empowering Neoclouds — Zawya, April 9, 2026
- .NEXT 2026: Nutanix Delivers Complete Platform for the Agentic AI Era — StorageNewsletter, April 9, 2026