Gimlet Labs Raises $80M to Solve AI's Biggest Waste Problem
Your AI infrastructure is wasting hundreds of billions of dollars. Not because you bought the wrong chips, but because you're using them wrong.
Gimlet Labs just raised $80 million in Series A funding (total $92M) to fix this with what they call the industry's first "multi-silicon inference cloud." Instead of forcing every AI workload to run on a single type of hardware, their software slices workloads across CPUs, GPUs, and specialized chips simultaneously — delivering 3-10x faster inference at the same cost and power.
The company publicly launched just five months ago with eight-figure revenues. It's already tripled its customer base and now serves one of the top three frontier labs and one of the top three hyperscalers (both unnamed). Menlo Ventures led the Series A, with participation from Factory (which led the seed), Eclipse, Prosperity7, and Triatomic.
For enterprise AI leaders, this matters for two reasons: your existing hardware is sitting 70-85% idle, and your 2026 AI datacenter budget just got a lot more defensible.
The Problem: Homogeneous Hardware Hit a Wall
The industry is gearing up to spend $650 billion on AI datacenter CapEx this year, according to Gimlet's press release. McKinsey estimates total data center spending will reach $7 trillion by 2030 if current trends continue.
But here's the problem: most AI infrastructure runs its existing hardware only "somewhere between 15 to 30 percent" of the time, says Gimlet CEO and co-founder Zain Asgar, a Stanford adjunct professor and a founder with a prior exit (Pixie, acquired by New Relic in 2020).
"Another way to think about this: you're wasting hundreds of billions of dollars because you're just leaving idle resources," Asgar told TechCrunch. "Our goal was basically to try to figure out how you can get AI workloads to be 10x more efficient than ever, today."
The root cause isn't lazy engineers. It's architecture. Agentic AI workloads chain together multiple steps, and each step has different compute requirements. As Menlo's Tim Tully writes in a blog post about the funding: "Inference is compute-bound; decode is memory-bound; and tool calls are network-bound."
No single chip handles all three efficiently. GPUs excel at compute-heavy inference but waste energy on memory-bound decode operations. CPUs handle tool calls better but underperform on inference. SRAM-based architectures optimize for specific workloads but can't run general models.
Traditional approaches force everything onto the same hardware anyway. The result: 70-85% of your expensive AI chips sit idle while waiting for bottlenecks to clear.
The Solution: Orchestrate Workloads Across Diverse Hardware
Gimlet Labs' multi-silicon inference cloud solves this by treating diverse hardware as a single orchestrated resource. Their proprietary software stack automatically maps agentic workloads to the most suitable chips without burdening developers.
The platform can even slice a single model across different architectures, using the most optimal chip for each portion of the model. This isn't theoretical: the company claims it delivers 3-10x faster inference speed for the same cost and power, plus "an order of magnitude better performance per watt."
How it works in practice: Imagine you're running an AI coding agent (like Cursor or GitHub Copilot). The agent processes a user prompt (inference step, compute-heavy), generates code (decode step, memory-bound), and calls external tools to test the code (tool-call step, network-bound).
Gimlet's software automatically routes the inference step to a GPU cluster optimized for transformer models, moves the decode operation to high-memory CPUs or SRAM architectures, and handles tool calls on network-optimized processors. All of this happens transparently to the developer — no code changes required.
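To make that routing idea concrete, here's a minimal sketch of mapping each phase of an agentic request to the silicon that matches its bottleneck. Gimlet hasn't published its API, so every name below (the phases, the pools, the `route` function, the prices) is hypothetical; this only illustrates the concept, not Gimlet's implementation.

```python
# Hypothetical sketch of phase-to-hardware routing for an agentic workload.
# None of these names come from Gimlet's actual (unpublished) API; they only
# illustrate "send each phase to the silicon that matches its bottleneck."
from dataclasses import dataclass
from enum import Enum


class Phase(Enum):
    PREFILL = "prefill"      # compute-bound: prompt processing / inference
    DECODE = "decode"        # memory-bound: token generation
    TOOL_CALL = "tool_call"  # network-bound: external API / test execution


@dataclass
class HardwarePool:
    name: str
    kind: str              # e.g. "gpu", "cpu", "sram", "network-optimized"
    cost_per_hour: float   # placeholder prices, not vendor quotes


# Illustrative pools only.
POOLS = {
    Phase.PREFILL:   HardwarePool("gpu-cluster-a", "gpu", 3.00),
    Phase.DECODE:    HardwarePool("highmem-cpu-b", "cpu", 0.80),
    Phase.TOOL_CALL: HardwarePool("net-proc-c", "network-optimized", 0.40),
}


def route(phase: Phase) -> HardwarePool:
    """Pick the pool whose strengths match the phase's bottleneck."""
    return POOLS[phase]


if __name__ == "__main__":
    for phase in Phase:
        pool = route(phase)
        print(f"{phase.value:>9} -> {pool.name} ({pool.kind})")
```

The point of the sketch is the shape of the decision, not the specific mapping: an orchestrator keeps a table of hardware pools and dispatches each phase based on whether it is compute-, memory-, or network-bound.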
The platform works as software you deploy on your own hardware or as an API to Gimlet's managed cloud. Either way, it integrates with the major chip vendors: NVIDIA, AMD, Intel, ARM, Cerebras, and d-Matrix are all partners.
Why This Matters for Enterprise AI Budgets
If you're managing an enterprise AI infrastructure budget, the math here is worth your attention. Let's break down the implications; a back-of-the-envelope sketch follows the breakdown below.
The Multi-Silicon ROI Case
Baseline scenario: 500 GPU instances running 24/7 at $3/hour each = $1.08M/month ($12.96M/year).
Current utilization: 15-30% (industry average per Gimlet).
With multi-silicon orchestration:
- Same workload spread across GPUs (inference), CPUs (decode), and specialized chips (tool calls)
- 3-10x speedup → reduce instance count by 50-70% while maintaining throughput
- New monthly cost: $320K-540K (using cheaper CPU/specialized instances for 50-70% of workload)
- Annual savings: $6.5-9.1M
Additional benefits:
- No vendor lock-in (platform-agnostic across NVIDIA, AMD, Intel, ARM, Cerebras, d-Matrix)
- Reuse aging GPUs for decode/memory-bound tasks (extend depreciation schedule)
- Reduce power consumption (order of magnitude per Gimlet) → lower cooling costs
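As a sanity check on those numbers, here's a back-of-the-envelope sketch of the scenario above. It mirrors the article's assumptions (500 instances, $3/hour, a 30-day month) and treats the cost of the cheaper CPU/specialized instances absorbing the offloaded work as negligible, which is the simplification the quoted savings range implies.

```python
# Back-of-the-envelope reproduction of the ROI scenario above.
# Assumptions: 500 GPU instances at $3/hour running 24/7 over a 30-day month,
# with 50-70% of those instances retired after multi-silicon orchestration.

INSTANCES = 500
RATE_PER_HOUR = 3.00
HOURS_PER_MONTH = 24 * 30

baseline_monthly = INSTANCES * RATE_PER_HOUR * HOURS_PER_MONTH   # $1.08M
baseline_annual = baseline_monthly * 12                          # $12.96M

for reduction in (0.50, 0.70):
    new_monthly = baseline_monthly * (1 - reduction)
    annual_savings = (baseline_monthly - new_monthly) * 12
    print(f"{reduction:.0%} fewer instances: "
          f"${new_monthly/1e3:,.0f}K/month, "
          f"saves ${annual_savings/1e6:.1f}M/year")
```

Running it reproduces the figures quoted above: roughly $540K and $324K per month at 50% and 70% reductions, and annual savings in the $6.5M to $9.1M range.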
The broader strategic implication: you can defend your AI budget expansion by showing concrete efficiency gains. Instead of asking for 50% more GPUs next quarter, you can show how multi-silicon orchestration delivers 3-10x better throughput with the same hardware.
For CFOs evaluating AI infrastructure spend, this changes the conversation from "how do we afford more GPUs?" to "how do we optimize what we already have?"
Who This Is For (And Who It's Not)
Gimlet's product isn't for every AI developer. It's specifically designed for the largest AI model labs and data centers — organizations running inference at scale where inefficiency costs millions per month.
Current customers include one of the top three frontier labs (OpenAI, Anthropic, or Google DeepMind) and one of the top three hyperscalers (AWS, Azure, or Google Cloud), though Gimlet won't name them publicly.
The company launched in October 2025 with eight-figure revenues (at least $10 million). CEO Asgar says the customer base has more than doubled in the last four months; if revenue scaled with customers, that would put the annual run rate at $20M or more.
You're a good fit if you:
- Run thousands of GPU instances for AI inference
- Operate your own data centers or private cloud
- Have platform engineering teams that can integrate orchestration software
- Spend $500K+/month on AI infrastructure
- Need to optimize cost per token at scale
You're not a good fit if you:
- Use managed AI APIs (OpenAI, Anthropic Claude, etc.) without your own infrastructure
- Run small-scale inference workloads (<100 instances)
- Lack platform engineering resources to integrate new orchestration layers
This is infrastructure-level plumbing, not an off-the-shelf SaaS product. Gimlet delivers it either as software you deploy on your hardware stack or through an API to their managed Gimlet Cloud. Either way, expect implementation timelines of weeks, not hours.
Strategic Implications for AI Infrastructure Planning
This funding round and product launch signal three broader trends worth tracking.
Trend 1: Heterogeneous hardware becomes the default. The one-size-fits-all GPU approach worked for training-focused infrastructure. But as inference workloads dominate (quadrillions of tokens per month per Gimlet), specialized hardware optimized for specific tasks will outperform general-purpose chips. Platform teams should plan for multi-vendor hardware strategies, not single-vendor lock-in.
Trend 2: Software orchestration becomes competitive advantage. Just buying more GPUs won't win. The companies that extract maximum value from diverse hardware through intelligent orchestration will have lower costs, faster inference, and better margins. This shifts competitive advantage from hardware procurement to software integration.
Trend 3: Efficiency metrics matter more than raw compute. As datacenter capacity hits bottlenecks (power, cooling, space), performance per watt becomes as important as raw FLOPS. Gimlet's "order of magnitude better performance per watt" claim is the kind of metric that will separate leaders from laggards in constrained environments.
For enterprise AI leaders, the implication is clear: start testing multi-silicon approaches now before your competitors do. Even a 3x efficiency gain becomes a massive competitive advantage when scaled across millions in infrastructure spend.
Action Items for CTOs and Infrastructure Leaders
Immediate (this quarter):
- Measure current GPU utilization across inference workloads (Gimlet's data puts typical utilization at 15-30%)
- Calculate idle capacity cost (instances × hours idle × hourly rate; see the sketch after this list)
- Evaluate multi-silicon orchestration platforms (Gimlet Labs, competitors)
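For the idle-cost item, here's a minimal sketch of the arithmetic with made-up fleet numbers; swap in your own instance counts, rates, and measured utilization. The 15-30% utilization band is Gimlet's claim, not a measurement of your environment.

```python
# Sketch of the "idle capacity cost" line item above, using placeholder inputs.
# The 15-30% utilization band is Gimlet's claim; substitute your own figures.

def idle_capacity_cost(instances: int, hourly_rate: float,
                       utilization: float, hours: float = 24 * 30) -> float:
    """Monthly spend attributable to idle time at a given utilization."""
    return instances * hourly_rate * hours * (1 - utilization)


if __name__ == "__main__":
    for util in (0.15, 0.30):
        cost = idle_capacity_cost(instances=500, hourly_rate=3.00,
                                  utilization=util)
        print(f"utilization {util:.0%}: ~${cost/1e6:.2f}M/month idle")
```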
Near-term (next 2 quarters):
- Pilot multi-silicon orchestration on non-critical workloads
- Map workload types to optimal hardware (inference → GPU, decode → CPU/SRAM, tools → network-optimized)
- Test cost per token improvements (target a 3-5x reduction; a simple tracking sketch follows this list)
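For the cost-per-token item, here's a minimal before/after comparison you might track during a pilot. Nothing here is Gimlet-specific, and all figures are placeholders; substitute your own billing data and token counts.

```python
# Hypothetical before/after cost-per-token comparison for a pilot.
# All figures are placeholders; substitute your own billing and token counts.

def cost_per_million_tokens(monthly_cost_usd: float,
                            tokens_per_month: float) -> float:
    """Blended infrastructure cost per one million tokens served."""
    return monthly_cost_usd / (tokens_per_month / 1_000_000)


baseline = cost_per_million_tokens(monthly_cost_usd=1_080_000,
                                   tokens_per_month=2e12)
pilot = cost_per_million_tokens(monthly_cost_usd=360_000,
                                tokens_per_month=2e12)

print(f"baseline: ${baseline:.3f}/M tokens")
print(f"pilot:    ${pilot:.3f}/M tokens")
print(f"improvement: {baseline / pilot:.1f}x")
```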
Strategic (12-18 months):
- Shift from single-vendor GPU procurement to multi-vendor hardware strategy
- Build platform engineering capability to manage heterogeneous infrastructure
- Redefine AI infrastructure budgets around efficiency metrics (performance per watt, cost per token) not raw capacity
The Funding Story: From Random Encounter to Oversubscribed Round
The origin story here is worth noting. CEO Zain Asgar and his co-founders (Michelle Nguyen, Omid Azizi, and Natalie Serrino) had previously worked together at Pixie, an open source Kubernetes observability startup that New Relic acquired in December 2020, just two months after a $9 million Series A led by Benchmark. Pixie's tech now lives under the Cloud Native Computing Foundation (CNCF), the open source organization that oversees Kubernetes.
About a year ago, Asgar ran into Menlo Ventures' Tim Tully by chance. He had also picked up angel investments from Stanford professors. After the October public launch with eight-figure revenues, VCs started calling. When a term sheet landed on Asgar's desk and word got out, "we got a pretty big swarm of funding," Asgar told TechCrunch. The round was quickly oversubscribed.
Angel investors include Sequoia's Bill Coughran, Stanford Professor Nick McKeown, former VMware CEO Raghu Raghuram, and Intel CEO Lip-Bu Tan — a strong signal of technical credibility and enterprise connections.
The company now employs 30 people and plans to use the funding to expand engineering and customer success teams.
What to Watch
Near-term: Look for customer case studies and public benchmarks. Gimlet's claims (3-10x speedup, order of magnitude performance/watt improvement) need independent validation. If a top frontier lab or hyperscaler publicly endorses the platform, that's a strong signal.
Medium-term: Watch for competitive responses from incumbent infrastructure vendors. NVIDIA, AWS, and Google Cloud all have orchestration layers — if Gimlet's approach gains traction, expect them to build or acquire similar multi-silicon capabilities.
Long-term: The real test is whether Gimlet's platform becomes infrastructure-level plumbing that runs beneath every major AI deployment, or remains a specialized optimization tool for the largest players. The difference between those outcomes determines whether this is a $1B company or a $10B+ platform.
For enterprise AI buyers, the key question isn't whether multi-silicon orchestration works (the early customer traction suggests it does). It's whether you want to be an early adopter capturing efficiency gains now or wait for incumbents to bundle similar features into existing platforms.
Given the potential for 3-10x cost reductions on infrastructure bills already in the millions per month, waiting may be expensive.
Want to calculate your own AI ROI? Try our AI ROI Calculator — takes 60 seconds and shows projected savings, payback period, and 3-year ROI.
Continue Reading
AI Infrastructure & Cost Optimization:
- Palantir's $10B Pentagon Lock-In: What CIOs Must Know — How "Program of Record" status creates vendor dependency (and what that means for your contracts)
- Andromeda AI Hits $1.5B Valuation: On-Demand GPUs Without the Contract Lock-In — GPU-as-a-Service model eliminates long-term commitments (relevant for multi-vendor strategies)
- Surf AI's $57M Series A: Autonomous Execution Beats Detection-Only Security — Infrastructure efficiency through automation (similar ROI thesis)
What's your AI infrastructure utilization rate? If you're tracking cost per token or performance per watt metrics, I'd love to hear how you're optimizing. Connect with me on LinkedIn, Twitter/X, or via the contact form.
— Rajesh