Gimlet Labs Raises $80M to Solve AI's Biggest Waste Problem
Your AI infrastructure is wasting hundreds of billions of dollars. Not because you bought the wrong chips, but because you're using them wrong.
Gimlet Labs just raised $80 million in Series A funding (total $92M) to fix this with what they call the industry's first "multi-silicon inference cloud." Instead of forcing every AI workload to run on a single type of hardware, their software slices workloads across CPUs, GPUs, and specialized chips simultaneously — delivering 3-10x faster inference at the same cost and power.
The company publicly launched just five months ago with eight-figure revenues. It's already tripled its customer base and now serves one of the top three frontier labs and one of the top three hyperscalers (both unnamed). Menlo Ventures led the Series A, with participation from Factory (which led the seed), Eclipse, Prosperity7, and Triatomic.
For enterprise AI leaders, this matters for two reasons: your existing hardware is sitting 70-85% idle, and your 2026 AI datacenter budget just got a lot more defensible.
The Problem: Homogeneous Hardware Hit a Wall
The industry is gearing up to spend $650 billion on AI datacenter CapEx this year, according to Gimlet's press release. McKinsey estimates total data center spending will reach $7 trillion by 2030 if current trends continue.
But here's the problem: most AI infrastructure runs its existing hardware only "somewhere between 15 to 30 percent" of the time, says Gimlet CEO and co-founder Zain Asgar, a Stanford adjunct professor and a founder with a prior exit (Pixie, acquired by New Relic in 2020).
"Another way to think about this: you're wasting hundreds of billions of dollars because you're just leaving idle resources," Asgar told TechCrunch. "Our goal was basically to try to figure out how you can get AI workloads to be 10x more efficient than ever, today."
The root cause isn't lazy engineers. It's architecture. Agentic AI workloads chain together multiple steps, and each step has different compute requirements. As Menlo's Tim Tully writes in a blog post about the funding: "Inference is compute-bound; decode is memory-bound; and tool calls are network-bound."
No single chip handles all three efficiently. GPUs excel at compute-heavy inference but waste energy on memory-bound decode operations. CPUs handle tool calls better but underperform on inference. SRAM-based architectures optimize for specific workloads but can't run general models.
Traditional approaches force everything onto the same hardware anyway. The result: 70-85% of your expensive AI chips sit idle while waiting for bottlenecks to clear.
The Solution: Orchestrate Workloads Across Diverse Hardware
Gimlet Labs' multi-silicon inference cloud solves this by treating diverse hardware as a single orchestrated resource. Their proprietary software stack automatically maps agentic workloads to the most suitable chips without burdening developers.
The platform can even slice a single model across different architectures, using the most optimal chip for each portion of the model. This isn't theoretical: the company claims it delivers 3-10x faster inference speed for the same cost and power, plus "an order of magnitude better performance per watt."
How it works in practice: Imagine you're running an AI coding agent (like Cursor or GitHub Copilot). The agent processes a user prompt (inference step, compute-heavy), generates code (decode step, memory-bound), and calls external tools to test the code (tool-call step, network-bound).
Gimlet's software automatically routes the inference step to a GPU cluster optimized for transformer models, moves the decode operation to high-memory CPUs or SRAM architectures, and handles tool calls on network-optimized processors. All of this happens transparently to the developer — no code changes required.
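To make that routing idea concrete, here's a minimal sketch of mapping each phase of an agentic request to the silicon that matches its bottleneck. Gimlet hasn't published its API, so every name below (the phases, the pools, the `route` function, the prices) is hypothetical; this only illustrates the concept, not Gimlet's implementation.

```python
# Hypothetical sketch of phase-to-hardware routing for an agentic workload.
# None of these names come from Gimlet's actual (unpublished) API; they only
# illustrate "send each phase to the silicon that matches its bottleneck."
from dataclasses import dataclass
from enum import Enum


class Phase(Enum):
    PREFILL = "prefill"      # compute-bound: prompt processing / inference
    DECODE = "decode"        # memory-bound: token generation
    TOOL_CALL = "tool_call"  # network-bound: external API / test execution


@dataclass
class HardwarePool:
    name: str
    kind: str              # e.g. "gpu", "cpu", "sram", "network-optimized"
    cost_per_hour: float   # placeholder prices, not vendor quotes


# Illustrative pools only.
POOLS = {
    Phase.PREFILL:   HardwarePool("gpu-cluster-a", "gpu", 3.00),
    Phase.DECODE:    HardwarePool("highmem-cpu-b", "cpu", 0.80),
    Phase.TOOL_CALL: HardwarePool("net-proc-c", "network-optimized", 0.40),
}


def route(phase: Phase) -> HardwarePool:
    """Pick the pool whose strengths match the phase's bottleneck."""
    return POOLS[phase]


if __name__ == "__main__":
    for phase in Phase:
        pool = route(phase)
        print(f"{phase.value:>9} -> {pool.name} ({pool.kind})")
```

The point of the sketch is the shape of the decision, not the specific mapping: an orchestrator keeps a table of hardware pools and dispatches each phase based on whether it is compute-, memory-, or network-bound.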
The platform works as software you deploy on your own hardware or as an API to Gimlet's managed cloud. Either way, it integrates with the major chip vendors: NVIDIA, AMD, Intel, ARM, Cerebras, and d-Matrix are all partners.
Why This Matters for Enterprise AI Budgets
If you're managing an enterprise AI infrastructure budget, the math here is worth your attention. Let's break down the implications; a back-of-the-envelope sketch follows the breakdown below.
The Multi-Silicon ROI Case
Baseline scenario: 500 GPU instances running 24/7 at $3/hour each = $1.08M/month ($12.96M/year).
Current utilization: 15-30% (industry average per Gimlet).
With multi-silicon orchestration:
- Same workload spread across GPUs (inference), CPUs (decode), and specialized chips (tool calls)
- 3-10x speedup → reduce instance count by 50-70% while maintaining throughput
- New monthly cost: $320K-540K (using cheaper CPU/specialized instances for 50-70% of workload)
- Annual savings: $6.5-9.1M
Additional benefits:
- No vendor lock-in (platform-agnostic across NVIDIA, AMD, Intel, ARM, Cerebras, d-Matrix)
- Reuse aging GPUs for decode/memory-bound tasks (extend depreciation schedule)
- Reduce power consumption (order of magnitude per Gimlet) → lower cooling costs
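As a sanity check on those numbers, here's a back-of-the-envelope sketch of the scenario above. It mirrors the article's assumptions (500 instances, $3/hour, a 30-day month) and treats the cost of the cheaper CPU/specialized instances absorbing the offloaded work as negligible, which is the simplification the quoted savings range implies.

```python
# Back-of-the-envelope reproduction of the ROI scenario above.
# Assumptions: 500 GPU instances at $3/hour running 24/7 over a 30-day month,
# with 50-70% of those instances retired after multi-silicon orchestration.

INSTANCES = 500
RATE_PER_HOUR = 3.00
HOURS_PER_MONTH = 24 * 30

baseline_monthly = INSTANCES * RATE_PER_HOUR * HOURS_PER_MONTH   # $1.08M
baseline_annual = baseline_monthly * 12                          # $12.96M

for reduction in (0.50, 0.70):
    new_monthly = baseline_monthly * (1 - reduction)
    annual_savings = (baseline_monthly - new_monthly) * 12
    print(f"{reduction:.0%} fewer instances: "
          f"${new_monthly/1e3:,.0f}K/month, "
          f"saves ${annual_savings/1e6:.1f}M/year")
```

Running it reproduces the figures quoted above: roughly $540K and $324K per month at 50% and 70% reductions, and annual savings in the $6.5M to $9.1M range.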
The broader strategic implication: you can defend your AI budget expansion by showing concrete efficiency gains. Instead of asking for 50% more GPUs next quarter, you can show how multi-silicon orchestration delivers 3-10x better throughput with the same hardware.
For CFOs evaluating AI infrastructure spend, this changes the conversation from "how do we afford more GPUs?" to "how do we optimize what we already have?"
Who This Is For (And Who It's Not)
Gimlet's product isn't for every AI developer. It's specifically designed for the largest AI model labs and data centers — organizations running inference at scale where inefficiency costs millions per month.
Current customers include one of the top three frontier labs (OpenAI, Anthropic, or Google DeepMind) and one of the top three hyperscalers (AWS, Azure, or Google Cloud), though Gimlet won't name them publicly.
The company launched in October 2025 with eight-figure revenues (at least $10 million). CEO Asgar says the customer base has more than doubled in the last four months; if revenue scaled with customers, that would put the annual run rate at $20M or more.
You're a good fit if you:
- Run thousands of GPU instances for AI inference
- Operate your own data centers or private cloud
- Have platform engineering teams that can integrate orchestration software
- Spend $500K+/month on AI infrastructure
- Need to optimize cost per token at scale
You're not a good fit if you:
- Use managed AI APIs (OpenAI, Anthropic Claude, etc.) without your own infrastructure
- Run small-scale inference workloads (<100 instances)
- Lack platform engineering resources to integrate new orchestration layers
This is infrastructure-level plumbing, not an off-the-shelf SaaS product. Gimlet delivers it either as software you deploy on your hardware stack or through an API to their managed Gimlet Cloud. Either way, expect implementation timelines of weeks, not hours.
Strategic Implications for AI Infrastructure Planning
This funding round and product launch signal three broader trends worth tracking.
Trend 1: Heterogeneous hardware becomes the default. The one-size-fits-all GPU approach worked for training-focused infrastructure. But as inference workloads dominate (quadrillions of tokens per month per Gimlet), specialized hardware optimized for specific tasks will outperform general-purpose chips. Platform teams should plan for multi-vendor hardware strategies, not single-vendor lock-in.
Trend 2: Software orchestration becomes competitive advantage. Just buying more GPUs won't win. The companies that extract maximum value from diverse hardware through intelligent orchestration will have lower costs, faster inference, and better margins. This shifts competitive advantage from hardware procurement to software integration.
Trend 3: Efficiency metrics matter more than raw compute. As datacenter capacity hits bottlenecks (power, cooling, space), performance per watt becomes as important as raw FLOPS. Gimlet's "order of magnitude better performance per watt" claim is the kind of metric that will separate leaders from laggards in constrained environments.
For enterprise AI leaders, the implication is clear: start testing multi-silicon approaches now before your competitors do. Even a 3x efficiency gain becomes a massive competitive advantage when scaled across millions in infrastructure spend.
Action Items for CTOs and Infrastructure Leaders
Immediate (this quarter):
- Measure current GPU utilization across inference workloads (Gimlet's data puts typical utilization at 15-30%)
- Calculate idle capacity cost (instances × hours idle × hourly rate; see the sketch after this list)
- Evaluate multi-silicon orchestration platforms (Gimlet Labs, competitors)
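For the idle-cost item, here's a minimal sketch of the arithmetic with made-up fleet numbers; swap in your own instance counts, rates, and measured utilization. The 15-30% utilization band is Gimlet's claim, not a measurement of your environment.

```python
# Sketch of the "idle capacity cost" line item above, using placeholder inputs.
# The 15-30% utilization band is Gimlet's claim; substitute your own figures.

def idle_capacity_cost(instances: int, hourly_rate: float,
                       utilization: float, hours: float = 24 * 30) -> float:
    """Monthly spend attributable to idle time at a given utilization."""
    return instances * hourly_rate * hours * (1 - utilization)


if __name__ == "__main__":
    for util in (0.15, 0.30):
        cost = idle_capacity_cost(instances=500, hourly_rate=3.00,
                                  utilization=util)
        print(f"utilization {util:.0%}: ~${cost/1e6:.2f}M/month idle")
```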
Near-term (next 2 quarters):
- Pilot multi-silicon orchestration on non-critical workloads
- Map workload types to optimal hardware (inference → GPU, decode → CPU/SRAM, tools → network-optimized)
- Test cost per token improvements (target a 3-5x reduction; a simple tracking sketch follows this list)
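For the cost-per-token item, here's a minimal before/after comparison you might track during a pilot. Nothing here is Gimlet-specific, and all figures are placeholders; substitute your own billing data and token counts.

```python
# Hypothetical before/after cost-per-token comparison for a pilot.
# All figures are placeholders; substitute your own billing and token counts.

def cost_per_million_tokens(monthly_cost_usd: float,
                            tokens_per_month: float) -> float:
    """Blended infrastructure cost per one million tokens served."""
    return monthly_cost_usd / (tokens_per_month / 1_000_000)


baseline = cost_per_million_tokens(monthly_cost_usd=1_080_000,
                                   tokens_per_month=2e12)
pilot = cost_per_million_tokens(monthly_cost_usd=360_000,
                                tokens_per_month=2e12)

print(f"baseline: ${baseline:.3f}/M tokens")
print(f"pilot:    ${pilot:.3f}/M tokens")
print(f"improvement: {baseline / pilot:.1f}x")
```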
Strategic (12-18 months):
- Shift from single-vendor GPU procurement to multi-vendor hardware strategy
- Build platform engineering capability to manage heterogeneous infrastructure
- Redefine AI infrastructure budgets around efficiency metrics (performance per watt, cost per token) not raw capacity
The Funding Story: From Random Encounter to Oversubscribed Round
The origin story here is worth noting. CEO Zain Asgar and his co-founders (Michelle Nguyen, Omid Azizi, and Natalie Serrino) had previously worked together at Pixie, an open source Kubernetes observability startup that New Relic acquired in December 2020, just two months after a $9 million Series A led by Benchmark. Pixie's tech now lives under the Cloud Native Computing Foundation (CNCF), the open source organization that oversees Kubernetes.
About a year ago, Asgar ran into Menlo Ventures' Tim Tully by chance. He had also picked up angel investments from Stanford professors. After the October public launch with eight-figure revenues, VCs started calling. When a term sheet landed on Asgar's desk and word got out, "we got a pretty big swarm of funding," Asgar told TechCrunch. The round was quickly oversubscribed.
Angel investors include Sequoia's Bill Coughran, Stanford Professor Nick McKeown, former VMware CEO Raghu Raghuram, and Intel CEO Lip-Bu Tan — a strong signal of technical credibility and enterprise connections.
The company now employs 30 people and plans to use the funding to expand engineering and customer success teams.
What to Watch
Near-term: Look for customer case studies and public benchmarks. Gimlet's claims (3-10x speedup, order of magnitude performance/watt improvement) need independent validation. If a top frontier lab or hyperscaler publicly endorses the platform, that's a strong signal.
Medium-term: Watch for competitive responses from incumbent infrastructure vendors. NVIDIA, AWS, and Google Cloud all have orchestration layers — if Gimlet's approach gains traction, expect them to build or acquire similar multi-silicon capabilities.
Long-term: The real test is whether Gimlet's platform becomes infrastructure-level plumbing that runs beneath every major AI deployment, or remains a specialized optimization tool for the largest players. The difference between those outcomes determines whether this is a $1B company or a $10B+ platform.
For enterprise AI buyers, the key question isn't whether multi-silicon orchestration works (the early customer traction suggests it does). It's whether you want to be an early adopter capturing efficiency gains now or wait for incumbents to bundle similar features into existing platforms.
Given the potential for 3-10x cost reductions on infrastructure bills already in the millions per month, waiting may be expensive.
Want to calculate your own AI ROI? Try our AI ROI Calculator — takes 60 seconds and shows projected savings, payback period, and 3-year ROI.
Continue Reading
AI Infrastructure & Cost Optimization:
- Palantir's $10B Pentagon Lock-In: What CIOs Must Know — How "Program of Record" status creates vendor dependency (and what that means for your contracts)
- Andromeda AI Hits $1.5B Valuation: On-Demand GPUs Without the Contract Lock-In — GPU-as-a-Service model eliminates long-term commitments (relevant for multi-vendor strategies)
- Surf AI's $57M Series A: Autonomous Execution Beats Detection-Only Security — Infrastructure efficiency through automation (similar ROI thesis)
What's your AI infrastructure utilization rate? If you're tracking cost per token or performance per watt metrics, I'd love to hear how you're optimizing. Connect with me on LinkedIn, Twitter/X, or via the contact form.
— Rajesh