Google Triples Gemini Price Yet Undercuts OpenAI By Half

Google's Gemini 3.5 Flash triples in price yet undercuts OpenAI by half. Speed at 289 tokens/sec is the new enterprise differentiator.

By Rajesh Beri·May 19, 2026·6 min read
Share:

THE DAILY BRIEF

GoogleGeminiEnterprise AIPricing StrategyOpenAIAnthropic

Google Triples Gemini Price Yet Undercuts OpenAI By Half

Google's Gemini 3.5 Flash triples in price yet undercuts OpenAI by half. Speed at 289 tokens/sec is the new enterprise differentiator.

By Rajesh Beri·May 19, 2026·6 min read

Google just shipped its boldest pricing bet yet. At I/O 2026, the company released Gemini 3.5 Flash at $1.50 per million input tokens and $9.00 per million output tokens—a clean 3x jump over the previous Flash model's $0.50/$3.00 rate. Yet CEO Sundar Pichai stood on stage and called it "less than half the price" of frontier competitors.

Both claims are true. And that tension tells you everything about where enterprise AI pricing is heading in 2026.

The Paradox: More Expensive, But Cheaper

Here's what happened. Google's Gemini 3 Flash, launched earlier this year, was the budget option—fast, capable, and priced to undercut everyone. Gemini 3.5 Flash triples that cost structure. By any measure, that's a steep hike.

But context matters. OpenAI's GPT-4 variants run $2-$3 input and $8-$15 output per million tokens. Anthropic's Claude Sonnet sits at $3/$15. Against those numbers, Google's new Flash pricing lands at roughly half the cost for comparable frontier-tier performance.

The real story isn't the 3x increase over Google's own budget tier. It's that Google is repositioning Flash from a budget model to a speed-optimized agent runtime—and still pricing it below OpenAI and Anthropic.

Speed as the New Differentiator

Google's pitch isn't about token price anymore. It's about throughput.

Gemini 3.5 Flash delivers 284-289 tokens per second, according to both CEO Sundar Pichai and independent evaluator Artificial Analysis. That's the fastest output speed recorded in the market. For enterprises running agentic workflows—where agents spawn sub-agents, execute multi-step tasks, and consume thousands of tokens per operation—speed matters more than headline pricing.

DeepMind chief technologist Koray Kavukcuoglu told reporters the model "outperforms our latest frontier model, 3.1 Pro, on nearly all the benchmarks," including coding, agentic tasks, and multimodal reasoning. On Terminal-Bench 2.1, it scored 76.2% versus 3.1 Pro's 70.3%. On GDPval-AA (a benchmark tracking economically valuable work), it jumped 340 Elo points to 1,656 from 3.1 Pro's 1,317.

Translation for CIOs: Your AI agents will complete tasks faster, consume fewer billable seconds, and handle more concurrent operations—even if each token costs more.

The Enterprise "Sticker Shock" Problem

Constellation Research noted that Google's aggressive pricing move addresses something new in 2026: enterprise "sticker shock" as companies see their first real AI agent bills at scale.

When you're running a chatbot answering customer questions, token costs are predictable. When you're running autonomous agents that spawn sub-agents, scrape data, execute workflows, and iterate on failures, token consumption explodes.

Example: Artificial Analysis found Gemini 3.5 Flash cost 5.5x more to run its full benchmark suite than Gemini 3 Flash—not just because of higher token prices, but because agentic workflows consume more input tokens per task.

This is the hidden cost enterprises are discovering in 2026. Google's bet is that faster execution (289 tokens/sec) reduces total cost-per-task, even if each token costs more upfront.

Real Enterprise Deployments (Not Just Demos)

Google's model page confirms several named customers already in production:

  • Salesforce: Integrating 3.5 Flash into Agentforce to automate complex enterprise tasks using multiple context-retaining sub-agents
  • Shopify: Running parallel sub-agents for global merchant-growth forecasting
  • Macquarie Bank: Piloting document processing for onboarding over 100-plus-page financial files
  • Ramp: Using it for invoice OCR at scale
  • Xero: Automating multi-week tax-form workflows

These aren't proof-of-concept pilots. They're production deployments with named brands willing to stake their workflows on a model released the same day.

Box reported a 19.6% improvement over Gemini 3 Flash on its enterprise-work evaluation—one of the few partner metrics Google published directly.

What This Means for Technical Leaders

If you're a CTO, CIO, or VP Engineering evaluating AI infrastructure in 2026, here's the decision tree:

Choose Gemini 3.5 Flash if:

  • You're running agentic workflows (not just chatbots)
  • Latency and throughput matter more than token price
  • You need to process high volumes of concurrent tasks
  • Your workloads involve multi-step reasoning or code generation
  • You're cost-sensitive but need frontier-tier performance

Stick with OpenAI/Anthropic if:

  • You need absolute best-in-class reasoning (3.5 Flash trails on long-context retrieval)
  • Your workflows are already optimized for their APIs
  • You value ecosystem maturity over raw speed
  • Switching costs outweigh per-token savings

Consider Gemini 3 Flash (the old one) if:

  • You're running simple retrieval or Q&A workloads
  • Token price is your primary constraint
  • You don't need agentic capabilities

The Strategic Play: Agents Over Chatbots

Google paired the 3.5 Flash release with a relaunched Antigravity platform—rebuilt as a standalone desktop application specifically for agent development. On stage, Google engineer Varun Mohan demonstrated agents spawning sub-agents to build components in parallel and assemble a full operating system inside Antigravity.

This is the real bet. Google isn't trying to win the chatbot wars. It's betting that 2026-2027 will be defined by agentic AI—autonomous systems that execute multi-step workflows with minimal human intervention.

The 3x price increase isn't a tax on existing customers. It's a signal that Google is moving upmarket—targeting enterprises willing to pay more per token for faster, more reliable agent execution.

What This Means for Business Leaders

For CFOs, COOs, and business VPs evaluating AI investments, the pricing shift has three implications:

1. Operational AI costs are about to rise. If your teams are running AI agents in production, expect token bills to climb 3-5x in 2026 as models shift from chatbot-optimized to agent-optimized pricing tiers.

2. Total cost-per-task may drop anyway. Faster models complete tasks in fewer seconds, reducing compute costs and infrastructure overhead. A 3x token price increase paired with 4x faster execution yields net savings.

3. Vendor lock-in risk is real. OpenAI's $2-3/$8-15 pricing, Anthropic's $3/$15, and Google's $1.50/$9.00 all sit within 2x of each other. But switching between them mid-deployment is expensive. Pick based on long-term workflow compatibility, not short-term price arbitrage.

The Bottom Line

Google's Gemini 3.5 Flash pricing is a bet that speed beats sticker price in the enterprise agent era.

If you're running chatbots, the 3x increase stings. If you're running autonomous agents at scale, the 289 tokens/sec throughput and sub-$10 output pricing versus OpenAI's $15 starts to look strategic.

The real question for enterprises isn't whether Google's pricing is "cheap" or "expensive." It's whether your 2026 AI strategy is built for chatbots or agents. Because the pricing models are diverging fast.


Continue Reading

Explore related enterprise AI infrastructure and vendor analysis:


About the Author: Rajesh Beri is Head of AI Engineering and writes THE DAILY BRIEF, a twice-weekly newsletter on Enterprise AI for technical and business leaders. Subscribe here | LinkedIn | Twitter/X

THE DAILY BRIEF

Enterprise AI insights for technology and business leaders, twice weekly.

thedailybrief.com

Subscribe at thedailybrief.com/subscribe for weekly AI insights delivered to your inbox.

LinkedIn: linkedin.com/in/rberi  |  X: x.com/rajeshberi

© 2026 Rajesh Beri. All rights reserved.

Google Triples Gemini Price Yet Undercuts OpenAI By Half

Photo by fauxels on Pexels

Google just shipped its boldest pricing bet yet. At I/O 2026, the company released Gemini 3.5 Flash at $1.50 per million input tokens and $9.00 per million output tokens—a clean 3x jump over the previous Flash model's $0.50/$3.00 rate. Yet CEO Sundar Pichai stood on stage and called it "less than half the price" of frontier competitors.

Both claims are true. And that tension tells you everything about where enterprise AI pricing is heading in 2026.

The Paradox: More Expensive, But Cheaper

Here's what happened. Google's Gemini 3 Flash, launched earlier this year, was the budget option—fast, capable, and priced to undercut everyone. Gemini 3.5 Flash triples that cost structure. By any measure, that's a steep hike.

But context matters. OpenAI's GPT-4 variants run $2-$3 input and $8-$15 output per million tokens. Anthropic's Claude Sonnet sits at $3/$15. Against those numbers, Google's new Flash pricing lands at roughly half the cost for comparable frontier-tier performance.

The real story isn't the 3x increase over Google's own budget tier. It's that Google is repositioning Flash from a budget model to a speed-optimized agent runtime—and still pricing it below OpenAI and Anthropic.

Speed as the New Differentiator

Google's pitch isn't about token price anymore. It's about throughput.

Gemini 3.5 Flash delivers 284-289 tokens per second, according to both CEO Sundar Pichai and independent evaluator Artificial Analysis. That's the fastest output speed recorded in the market. For enterprises running agentic workflows—where agents spawn sub-agents, execute multi-step tasks, and consume thousands of tokens per operation—speed matters more than headline pricing.

DeepMind chief technologist Koray Kavukcuoglu told reporters the model "outperforms our latest frontier model, 3.1 Pro, on nearly all the benchmarks," including coding, agentic tasks, and multimodal reasoning. On Terminal-Bench 2.1, it scored 76.2% versus 3.1 Pro's 70.3%. On GDPval-AA (a benchmark tracking economically valuable work), it jumped 340 Elo points to 1,656 from 3.1 Pro's 1,317.

Translation for CIOs: Your AI agents will complete tasks faster, consume fewer billable seconds, and handle more concurrent operations—even if each token costs more.

The Enterprise "Sticker Shock" Problem

Constellation Research noted that Google's aggressive pricing move addresses something new in 2026: enterprise "sticker shock" as companies see their first real AI agent bills at scale.

When you're running a chatbot answering customer questions, token costs are predictable. When you're running autonomous agents that spawn sub-agents, scrape data, execute workflows, and iterate on failures, token consumption explodes.

Example: Artificial Analysis found Gemini 3.5 Flash cost 5.5x more to run its full benchmark suite than Gemini 3 Flash—not just because of higher token prices, but because agentic workflows consume more input tokens per task.

This is the hidden cost enterprises are discovering in 2026. Google's bet is that faster execution (289 tokens/sec) reduces total cost-per-task, even if each token costs more upfront.

Real Enterprise Deployments (Not Just Demos)

Google's model page confirms several named customers already in production:

  • Salesforce: Integrating 3.5 Flash into Agentforce to automate complex enterprise tasks using multiple context-retaining sub-agents
  • Shopify: Running parallel sub-agents for global merchant-growth forecasting
  • Macquarie Bank: Piloting document processing for onboarding over 100-plus-page financial files
  • Ramp: Using it for invoice OCR at scale
  • Xero: Automating multi-week tax-form workflows

These aren't proof-of-concept pilots. They're production deployments with named brands willing to stake their workflows on a model released the same day.

Box reported a 19.6% improvement over Gemini 3 Flash on its enterprise-work evaluation—one of the few partner metrics Google published directly.

What This Means for Technical Leaders

If you're a CTO, CIO, or VP Engineering evaluating AI infrastructure in 2026, here's the decision tree:

Choose Gemini 3.5 Flash if:

  • You're running agentic workflows (not just chatbots)
  • Latency and throughput matter more than token price
  • You need to process high volumes of concurrent tasks
  • Your workloads involve multi-step reasoning or code generation
  • You're cost-sensitive but need frontier-tier performance

Stick with OpenAI/Anthropic if:

  • You need absolute best-in-class reasoning (3.5 Flash trails on long-context retrieval)
  • Your workflows are already optimized for their APIs
  • You value ecosystem maturity over raw speed
  • Switching costs outweigh per-token savings

Consider Gemini 3 Flash (the old one) if:

  • You're running simple retrieval or Q&A workloads
  • Token price is your primary constraint
  • You don't need agentic capabilities

The Strategic Play: Agents Over Chatbots

Google paired the 3.5 Flash release with a relaunched Antigravity platform—rebuilt as a standalone desktop application specifically for agent development. On stage, Google engineer Varun Mohan demonstrated agents spawning sub-agents to build components in parallel and assemble a full operating system inside Antigravity.

This is the real bet. Google isn't trying to win the chatbot wars. It's betting that 2026-2027 will be defined by agentic AI—autonomous systems that execute multi-step workflows with minimal human intervention.

The 3x price increase isn't a tax on existing customers. It's a signal that Google is moving upmarket—targeting enterprises willing to pay more per token for faster, more reliable agent execution.

What This Means for Business Leaders

For CFOs, COOs, and business VPs evaluating AI investments, the pricing shift has three implications:

1. Operational AI costs are about to rise. If your teams are running AI agents in production, expect token bills to climb 3-5x in 2026 as models shift from chatbot-optimized to agent-optimized pricing tiers.

2. Total cost-per-task may drop anyway. Faster models complete tasks in fewer seconds, reducing compute costs and infrastructure overhead. A 3x token price increase paired with 4x faster execution yields net savings.

3. Vendor lock-in risk is real. OpenAI's $2-3/$8-15 pricing, Anthropic's $3/$15, and Google's $1.50/$9.00 all sit within 2x of each other. But switching between them mid-deployment is expensive. Pick based on long-term workflow compatibility, not short-term price arbitrage.

The Bottom Line

Google's Gemini 3.5 Flash pricing is a bet that speed beats sticker price in the enterprise agent era.

If you're running chatbots, the 3x increase stings. If you're running autonomous agents at scale, the 289 tokens/sec throughput and sub-$10 output pricing versus OpenAI's $15 starts to look strategic.

The real question for enterprises isn't whether Google's pricing is "cheap" or "expensive." It's whether your 2026 AI strategy is built for chatbots or agents. Because the pricing models are diverging fast.


Continue Reading

Explore related enterprise AI infrastructure and vendor analysis:


About the Author: Rajesh Beri is Head of AI Engineering and writes THE DAILY BRIEF, a twice-weekly newsletter on Enterprise AI for technical and business leaders. Subscribe here | LinkedIn | Twitter/X

Share:

THE DAILY BRIEF

GoogleGeminiEnterprise AIPricing StrategyOpenAIAnthropic

Google Triples Gemini Price Yet Undercuts OpenAI By Half

Google's Gemini 3.5 Flash triples in price yet undercuts OpenAI by half. Speed at 289 tokens/sec is the new enterprise differentiator.

By Rajesh Beri·May 19, 2026·6 min read

Google just shipped its boldest pricing bet yet. At I/O 2026, the company released Gemini 3.5 Flash at $1.50 per million input tokens and $9.00 per million output tokens—a clean 3x jump over the previous Flash model's $0.50/$3.00 rate. Yet CEO Sundar Pichai stood on stage and called it "less than half the price" of frontier competitors.

Both claims are true. And that tension tells you everything about where enterprise AI pricing is heading in 2026.

The Paradox: More Expensive, But Cheaper

Here's what happened. Google's Gemini 3 Flash, launched earlier this year, was the budget option—fast, capable, and priced to undercut everyone. Gemini 3.5 Flash triples that cost structure. By any measure, that's a steep hike.

But context matters. OpenAI's GPT-4 variants run $2-$3 input and $8-$15 output per million tokens. Anthropic's Claude Sonnet sits at $3/$15. Against those numbers, Google's new Flash pricing lands at roughly half the cost for comparable frontier-tier performance.

The real story isn't the 3x increase over Google's own budget tier. It's that Google is repositioning Flash from a budget model to a speed-optimized agent runtime—and still pricing it below OpenAI and Anthropic.

Speed as the New Differentiator

Google's pitch isn't about token price anymore. It's about throughput.

Gemini 3.5 Flash delivers 284-289 tokens per second, according to both CEO Sundar Pichai and independent evaluator Artificial Analysis. That's the fastest output speed recorded in the market. For enterprises running agentic workflows—where agents spawn sub-agents, execute multi-step tasks, and consume thousands of tokens per operation—speed matters more than headline pricing.

DeepMind chief technologist Koray Kavukcuoglu told reporters the model "outperforms our latest frontier model, 3.1 Pro, on nearly all the benchmarks," including coding, agentic tasks, and multimodal reasoning. On Terminal-Bench 2.1, it scored 76.2% versus 3.1 Pro's 70.3%. On GDPval-AA (a benchmark tracking economically valuable work), it jumped 340 Elo points to 1,656 from 3.1 Pro's 1,317.

Translation for CIOs: Your AI agents will complete tasks faster, consume fewer billable seconds, and handle more concurrent operations—even if each token costs more.

The Enterprise "Sticker Shock" Problem

Constellation Research noted that Google's aggressive pricing move addresses something new in 2026: enterprise "sticker shock" as companies see their first real AI agent bills at scale.

When you're running a chatbot answering customer questions, token costs are predictable. When you're running autonomous agents that spawn sub-agents, scrape data, execute workflows, and iterate on failures, token consumption explodes.

Example: Artificial Analysis found Gemini 3.5 Flash cost 5.5x more to run its full benchmark suite than Gemini 3 Flash—not just because of higher token prices, but because agentic workflows consume more input tokens per task.

This is the hidden cost enterprises are discovering in 2026. Google's bet is that faster execution (289 tokens/sec) reduces total cost-per-task, even if each token costs more upfront.

Real Enterprise Deployments (Not Just Demos)

Google's model page confirms several named customers already in production:

  • Salesforce: Integrating 3.5 Flash into Agentforce to automate complex enterprise tasks using multiple context-retaining sub-agents
  • Shopify: Running parallel sub-agents for global merchant-growth forecasting
  • Macquarie Bank: Piloting document processing for onboarding over 100-plus-page financial files
  • Ramp: Using it for invoice OCR at scale
  • Xero: Automating multi-week tax-form workflows

These aren't proof-of-concept pilots. They're production deployments with named brands willing to stake their workflows on a model released the same day.

Box reported a 19.6% improvement over Gemini 3 Flash on its enterprise-work evaluation—one of the few partner metrics Google published directly.

What This Means for Technical Leaders

If you're a CTO, CIO, or VP Engineering evaluating AI infrastructure in 2026, here's the decision tree:

Choose Gemini 3.5 Flash if:

  • You're running agentic workflows (not just chatbots)
  • Latency and throughput matter more than token price
  • You need to process high volumes of concurrent tasks
  • Your workloads involve multi-step reasoning or code generation
  • You're cost-sensitive but need frontier-tier performance

Stick with OpenAI/Anthropic if:

  • You need absolute best-in-class reasoning (3.5 Flash trails on long-context retrieval)
  • Your workflows are already optimized for their APIs
  • You value ecosystem maturity over raw speed
  • Switching costs outweigh per-token savings

Consider Gemini 3 Flash (the old one) if:

  • You're running simple retrieval or Q&A workloads
  • Token price is your primary constraint
  • You don't need agentic capabilities

The Strategic Play: Agents Over Chatbots

Google paired the 3.5 Flash release with a relaunched Antigravity platform—rebuilt as a standalone desktop application specifically for agent development. On stage, Google engineer Varun Mohan demonstrated agents spawning sub-agents to build components in parallel and assemble a full operating system inside Antigravity.

This is the real bet. Google isn't trying to win the chatbot wars. It's betting that 2026-2027 will be defined by agentic AI—autonomous systems that execute multi-step workflows with minimal human intervention.

The 3x price increase isn't a tax on existing customers. It's a signal that Google is moving upmarket—targeting enterprises willing to pay more per token for faster, more reliable agent execution.

What This Means for Business Leaders

For CFOs, COOs, and business VPs evaluating AI investments, the pricing shift has three implications:

1. Operational AI costs are about to rise. If your teams are running AI agents in production, expect token bills to climb 3-5x in 2026 as models shift from chatbot-optimized to agent-optimized pricing tiers.

2. Total cost-per-task may drop anyway. Faster models complete tasks in fewer seconds, reducing compute costs and infrastructure overhead. A 3x token price increase paired with 4x faster execution yields net savings.

3. Vendor lock-in risk is real. OpenAI's $2-3/$8-15 pricing, Anthropic's $3/$15, and Google's $1.50/$9.00 all sit within 2x of each other. But switching between them mid-deployment is expensive. Pick based on long-term workflow compatibility, not short-term price arbitrage.

The Bottom Line

Google's Gemini 3.5 Flash pricing is a bet that speed beats sticker price in the enterprise agent era.

If you're running chatbots, the 3x increase stings. If you're running autonomous agents at scale, the 289 tokens/sec throughput and sub-$10 output pricing versus OpenAI's $15 starts to look strategic.

The real question for enterprises isn't whether Google's pricing is "cheap" or "expensive." It's whether your 2026 AI strategy is built for chatbots or agents. Because the pricing models are diverging fast.


Continue Reading

Explore related enterprise AI infrastructure and vendor analysis:


About the Author: Rajesh Beri is Head of AI Engineering and writes THE DAILY BRIEF, a twice-weekly newsletter on Enterprise AI for technical and business leaders. Subscribe here | LinkedIn | Twitter/X

THE DAILY BRIEF

Enterprise AI insights for technology and business leaders, twice weekly.

thedailybrief.com

Subscribe at thedailybrief.com/subscribe for weekly AI insights delivered to your inbox.

LinkedIn: linkedin.com/in/rberi  |  X: x.com/rajeshberi

© 2026 Rajesh Beri. All rights reserved.

Newsletter

Stay Ahead of the Curve

Weekly enterprise AI insights for technology leaders. No spam, no vendor pitches—unsubscribe anytime.

Subscribe