Google Cross-Cloud Lakehouse: AI Agents Without Egress Fees

Google's Cross-Cloud Lakehouse lets AI agents query AWS S3 and Azure Data Lake as if they were native via Apache Iceberg — zero copies, no egress fees. The CIO and CFO impact.

By Rajesh Beri·April 25, 2026·11 min read

THE DAILY BRIEF

Google Cloud · Data Lakehouse · Apache Iceberg · Enterprise AI · Multi-Cloud · Agentic AI


Google just removed the most expensive friction point in multi-cloud AI.

On April 22, 2026, at Google Cloud Next 2026 in Las Vegas, Google announced the Cross-Cloud Lakehouse — an Apache Iceberg-based data platform that lets AI agents and analytics engines query data sitting in AWS S3 or Azure Data Lake as if it were native to Google Cloud, without copies, without ETL, and without egress fees.

If that sentence sounds incremental, the implications are not. The Cross-Cloud Lakehouse attacks the single biggest economic and architectural blocker to enterprise AI agents at scale — that 80% of enterprise data lives outside the cloud where the AI runs, and moving it has been priced as a tax that compounds with every agent invocation.

For CIOs, CTOs, and CFOs, this announcement reframes a decision that's been pending for two years: do we standardize our AI data layer on a single hyperscaler and pay the migration cost, or accept the per-query egress and latency penalty of cross-cloud federation? Google's answer is neither — federate at the open-format catalog layer and pay for compute, not for data movement.

What Actually Shipped

Cross-Cloud Lakehouse is the headline, but it's the visible tip of a coordinated rebrand and rebuild.

Google quietly renamed BigLake to Google Cloud Lakehouse on April 20, 2026 — a signal that the company is treating the lakehouse as a first-class product, not a BigQuery feature flag. The architecture stack now reads:

Layer       | Component                                            | Status
------------|------------------------------------------------------|--------
Catalog     | Lakehouse Runtime Catalog (Iceberg)                  | GA
Catalog     | REST Catalog Endpoint                                | Preview
Compute     | BigQuery + Managed Service for Apache Spark          | GA
Compute     | Lightning Engine (2× price-performance)              | GA
Cross-Cloud | Cross-Cloud Caching to S3 / Azure                    | Preview
Federation  | AWS Glue, Databricks, Snowflake, SAP catalogs        | Preview
Multimodal  | BigQuery ObjectRefs (unstructured + Iceberg)         | GA
Real-time   | Spanner / AlloyDB / Cloud SQL → Iceberg replication  | Preview
Agents      | BigQuery Data Engineering Agent, Data Science Agent  | Preview

The throughline is Apache Iceberg as the universal interchange format. Every component speaks Iceberg natively. Every cross-vendor integration is at the Iceberg catalog layer, not the data plane. That is a meaningful architectural choice — and a competitive one — because it means you do not have to leave Snowflake, Databricks, or AWS to participate.
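Because every component converges on the Iceberg REST catalog, any engine that speaks the protocol can participate. A minimal sketch with the open-source pyiceberg client (the endpoint URI, warehouse name, and table identifier below are hypothetical placeholders, not Google-published values):

```python
# Sketch: pointing the open-source pyiceberg client at an Iceberg REST
# catalog. Endpoint, warehouse, and table names here are illustrative.

def rest_catalog_props(uri: str, warehouse: str, token: str) -> dict:
    """Build the connection properties a REST-catalog client expects."""
    return {
        "type": "rest",        # Iceberg REST catalog protocol
        "uri": uri,            # the catalog's HTTPS endpoint
        "warehouse": warehouse,
        "token": token,        # bearer credential issued by the provider
    }

# To run against a live endpoint (requires `pip install pyiceberg`):
#   from pyiceberg.catalog import load_catalog
#   catalog = load_catalog("remote", **rest_catalog_props(
#       "https://catalog.example.com/iceberg", "analytics", "TOKEN"))
#   table = catalog.load_table("sales.orders")  # metadata read, no data copy
```

The point of the pattern is the last commented line: loading a table touches only catalog metadata, while the data files stay wherever they already live.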

Andi Gutmans, Google Cloud's VP and GM of Data Cloud, framed the design intent around what he called "agent gravity" — the autonomy loss agents experience when cross-cloud latency, egress costs, or access policies break their reasoning loop. The Cross-Cloud Lakehouse is engineered to let an agent reason across data residing in any of the three major clouds without the agent — or the user — having to know or care.

The Technical Read for CTOs and Data Architects

Strip away the marketing and three engineering decisions matter most.

First, Apache Iceberg is now the de facto enterprise table format. A year ago, Iceberg vs. Delta vs. Hudi was an active debate. Cloud Next 2026 effectively closed it. Snowflake (Polaris), Databricks (Unity Catalog with Iceberg read/write), AWS (Glue Data Catalog), Confluent (Tableflow), Salesforce (zero-copy via Iceberg with Google), and now Google Cloud Lakehouse all interoperate through Iceberg's REST catalog. If you are picking a table format in 2026, Iceberg is the answer. Anything else is a bet against the entire vendor ecosystem.

Second, the federation is bidirectional and catalog-native. Cross-Cloud Lakehouse doesn't ingest your AWS data — it federates at the Iceberg REST Catalog level through Cross-Cloud Caching (Preview). BigQuery and Managed Spark see remote Iceberg tables in S3 as if they were local. Conversely, Databricks and Snowflake can read Google Cloud Iceberg tables via their own catalog federation. That kind of bidirectional, vendor-neutral catalog is what the Hadoop-era Hive Metastore tried and failed to be — and the difference now is that the open-source format is mature, and the hyperscalers are all economically aligned to interoperate.

Third, Cross-Cloud Caching changes the cost curve. The headline feature isn't "we eliminated egress fees" — Google can't unilaterally do that, and AWS still charges egress when you read S3 — it's that Cross-Cloud Caching keeps frequently accessed data in a Google-side cache, which dramatically reduces per-query egress for repeated agent reads. Combined with Cross-Cloud Interconnect's dedicated private networking, the per-token data-access cost for agentic workloads drops materially. Forrester's TEI study cites 117% ROI with payback in under six months — mostly egress and ETL avoidance.

The agent integration layer is what most CTOs will care about operationally:

  • BigQuery Data Engineering Agent and Data Science Agent (Preview) speak Iceberg natively. They can plan and execute queries across federated catalogs without the agent needing to know which cloud the data is in.
  • Agent Developer Kit (ADK) + Model Context Protocol (MCP) sit on top, so agents you build in your own framework get the same data substrate.
  • Real-time replication from Spanner, AlloyDB, and Cloud SQL into Iceberg (Preview) closes the operational/analytical gap. Agents can act on data that is seconds old, not hours.
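In practice, the agent-facing surface is just SQL. A sketch of what that looks like (the dataset and table names below are hypothetical, and the commented execution path assumes the google-cloud-bigquery client):

```python
# Sketch: an agent-issued query over a federated Iceberg table. The SQL is
# identical whether the table is GCS-native or a cached S3 federation;
# dataset and table names below are hypothetical.

def revenue_query(dataset: str, table: str, region: str) -> str:
    """Build a grouped-revenue query; the engine resolves data placement."""
    return (
        f"SELECT region, SUM(amount) AS revenue "
        f"FROM `{dataset}.{table}` "
        f"WHERE region = '{region}' "
        f"GROUP BY region"
    )

# To execute (requires `pip install google-cloud-bigquery` and credentials):
#   from google.cloud import bigquery
#   client = bigquery.Client()
#   sql = revenue_query("lakehouse_demo", "s3_sales_iceberg", "EMEA")
#   for row in client.query(sql).result():
#       print(row["region"], row["revenue"])
```

The design choice that matters: nothing in the query text encodes which cloud holds the data, so an agent's plan does not break when a table moves or federates.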

The pattern that emerges:

Operational DBs (Spanner/AlloyDB) ─┐
AWS S3 Iceberg ────────────────────┤
Azure Data Lake Iceberg ───────────┤── Google Cloud Lakehouse ── Agents
Snowflake Polaris (federated) ─────┤      (Iceberg + REST Catalog)
Databricks Unity (federated) ──────┘

For data architects, this is the borderless lakehouse pattern Iceberg promised in 2022 finally delivered as a managed service.

The Business Read for CFOs and Procurement

The cost story is what makes this real for finance.

Most enterprises today operate one of three painful patterns:

  1. Migrate everything to one cloud. The cost is denominated in data volume times $0.02–$0.09 per GB of egress, plus multi-quarter migration projects, plus ongoing dual-pipeline cost during cutover. For a 5-PB enterprise data estate, the egress alone can exceed $400K, and the all-in migration runs $5–$30M.

  2. Run AI in the cloud where the data already lives. This works only if your AI vendor of choice happens to live in the same cloud — which, given that Anthropic (AWS + GCP), OpenAI (Azure), and Google (GCP) split the field, is rarely true.

  3. Accept the cross-cloud egress tax. A single agentic workload reading 10TB/month across cloud boundaries costs roughly $9K–$11K annually in egress alone at standard list rates; a fleet of 100 such agents runs $900K–$1.1M. Multiply across every agent in production and the AI line item starts looking like an undisclosed infrastructure tax.
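The list-rate arithmetic behind patterns 1 and 3 is worth making explicit. A quick sketch (the $/GB rates are standard list figures; the 100-agent fleet size is an illustrative assumption):

```python
# List-rate egress arithmetic for the migration and egress-tax patterns.
# Rates are standard list $/GB; the fleet size is illustrative.
GB_PER_TB = 1024
GB_PER_PB = 1024 ** 2

def egress_cost(gb: float, rate_per_gb: float) -> float:
    """Dollar cost of moving `gb` gigabytes at a flat per-GB rate."""
    return gb * rate_per_gb

# Pattern 1: one-time migration of a 5 PB estate at the top of the range.
migration = egress_cost(5 * GB_PER_PB, 0.09)             # about $472K

# Pattern 3: one agent reading 10 TB/month, annualized, then a 100-agent fleet.
per_agent_year = 12 * egress_cost(10 * GB_PER_TB, 0.09)  # about $11K
fleet_year = 100 * per_agent_year                        # about $1.1M
```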

Cross-Cloud Lakehouse changes the procurement math in three concrete ways:

It eliminates migration as a precondition for cross-cloud AI. You don't have to move 5PB out of S3 to run Gemini-powered agents on it. That removes the single biggest delay item from most enterprise AI roadmaps — typically 9–18 months of project work that produces zero AI capability while it's underway.

It compresses the per-query data-access cost for repeated reads via Cross-Cloud Caching. An agent that re-reads a sales dataset 50 times an hour pays the egress once, not 50 times. For high-frequency agentic workloads — customer support, fraud detection, supply chain — this is the difference between "pilot" and "production economics."
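The caching claim reduces to simple read-amplification math. A sketch (the dataset size and read frequency are illustrative assumptions, and it idealizes the cache as fully warm after one cold read):

```python
# Sketch: how a read-through cache amortizes egress across repeated agent
# reads. Dataset size and read frequency are illustrative assumptions.

def hourly_egress_gb(reads_per_hour: int, dataset_gb: float,
                     cached: bool) -> float:
    """Uncached, every read crosses the cloud boundary; cached, only the
    first cold read does (assuming the working set fits in the cache)."""
    return dataset_gb if cached else reads_per_hour * dataset_gb

uncached = hourly_egress_gb(50, 200.0, cached=False)  # 10,000 GB crosses
cached = hourly_egress_gb(50, 200.0, cached=True)     # 200 GB crosses
```

A 50x reduction in billable cross-cloud reads is the mechanism behind the "pilot versus production economics" line above.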

It restores cross-cloud negotiation leverage. When migration is the only alternative, AWS and Azure get to set egress prices because they know you can't move. With federated Iceberg, the threat is credible: we can compute against your data from another cloud at near-native cost. That changes the renewal conversation. Expect AWS and Azure to respond with discounted egress for committed-use customers and "lakehouse-friendly" pricing tiers in the next two quarters.

The named customer references back this up. Spotify is using the lakehouse to consolidate analytics across BigQuery, Dataflow, and OSS engines without duplication. Accenture's Global Lead for the Google Business Group — Scott Alfieri — credits the "zero-copy" architecture as what makes "agentic AI with surgical precision" achievable for clients. Those are not pilot quotes; they are reference customers running production patterns.

The Competitive Layer

Google is not the only player here. The honest read is that Iceberg federation is a multi-vendor norm, and Google's contribution is making it the substrate of the agent platform.

  • Snowflake announced Polaris last year and continues expanding catalog federation. Cortex Code is its agent layer.
  • Databricks owns Unity Catalog, has read/write Iceberg interop, and runs its own Genie/Mosaic agent stack.
  • AWS Glue Data Catalog federates with Iceberg natively; AWS DevOps Agent and Security Agent are GA on Bedrock AgentCore.
  • Confluent Tableflow brings streaming Kafka data into Iceberg as managed tables — coming later in 2026.

What Google is betting on is that the agent layer + the data layer + the open format is more valuable as an integrated whole than any of the parts. The bet has merit. Snowflake and Databricks are stronger on data; AWS is stronger on infrastructure breadth; but Google is the only one shipping a unified stack from chips (TPU) to model (Gemini) to agent platform (Gemini Enterprise) to data (Cross-Cloud Lakehouse) to surface (Workspace).

For enterprises, the question is not which vendor wins — Iceberg federation means you don't have to choose — but where you run the agent reasoning loop. That decision will be driven by where your most-used models live, what your existing data gravity looks like, and which surface (Workspace, Office, Salesforce, ServiceNow) your end users already inhabit.

The Decision Framework for the Next 90 Days

For most enterprises, this is a quarter to evaluate, not commit. A practical playbook:

1. Audit your cross-cloud data flows. Most CIOs do not have a clean number for monthly cross-cloud egress. Pull the AWS Cost Explorer + Azure billing + GCP Network egress reports. The number is usually 30–60% higher than expected.

2. Identify your top 3 agentic workloads waiting on data residency. These are the workloads where you've said "we'd love to run this on Gemini, but the data lives in S3." Cross-Cloud Lakehouse moves them from blocked to feasible.

3. Standardize your table format on Apache Iceberg. If you are still on proprietary Delta-only or Parquet without a catalog, this is the quarter to commit. Iceberg's catalog interoperability is the unlock for everything downstream — including non-Google scenarios.

4. Run a pilot on Cross-Cloud Caching for one high-volume agent workload. Measure end-to-end latency vs. native, measure egress savings, measure agent reasoning quality. Most teams will find latency within 10–15% of native and cost 30–50% lower at steady state.

5. Renegotiate cross-cloud egress with AWS and Azure now. With Cross-Cloud Lakehouse on the table as a credible alternative to wholesale migration, your renewal conversation has different gravity. Ask for committed-use discounts on egress, "AI workload" exemptions, and price-protection guarantees. The leverage exists; use it.

6. Update your data governance program for federated catalogs. Knowledge Catalog (formerly Dataplex) provides table-level access controls, lineage, and quality across federated sources. Your DLP, masking, and audit programs need to extend to remote Iceberg tables before you scale agent access.
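For step 1, the AWS side of the audit can be scripted against the Cost Explorer API. A sketch using boto3's real `get_cost_and_usage` entry point (the USAGE_TYPE_GROUP filter value is an assumption; inspect the groups your own account reports before filtering on one):

```python
# Sketch: querying AWS Cost Explorer for data-transfer-out spend, the AWS
# piece of the cross-cloud egress audit. The USAGE_TYPE_GROUP value below
# is an assumption; check your account's actual groups first.

def egress_audit_params(start: str, end: str) -> dict:
    """Request parameters for ce.get_cost_and_usage, monthly granularity."""
    return {
        "TimePeriod": {"Start": start, "End": end},  # ISO dates, end exclusive
        "Granularity": "MONTHLY",
        "Metrics": ["UnblendedCost"],
        "Filter": {"Dimensions": {
            "Key": "USAGE_TYPE_GROUP",
            "Values": ["EC2: Data Transfer - Internet (Out)"],
        }},
    }

# To run (requires `pip install boto3` and AWS credentials):
#   import boto3
#   ce = boto3.client("ce")
#   resp = ce.get_cost_and_usage(**egress_audit_params("2026-01-01",
#                                                      "2026-04-01"))
#   for r in resp["ResultsByTime"]:
#       print(r["TimePeriod"]["Start"],
#             r["Total"]["UnblendedCost"]["Amount"])
```

Repeat the equivalent pull on Azure Cost Management and GCP billing exports, then sum; the combined number is the baseline every later pilot is measured against.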

Teams that take these steps in Q2 2026 will be running production cross-cloud agentic workloads by Q4. Teams that wait will spend the back half of the year explaining why their AI roadmap is gated on a 12-month data-migration project they could have skipped.

The Bottom Line

The Cross-Cloud Lakehouse is not the most exciting announcement from Cloud Next 2026 — Gemini Enterprise Agent Platform and the new TPU generation got more press. It may be the most economically consequential.

For five years, "data gravity" has been the silent governor on multi-cloud AI ambition. You couldn't run the model where the data wasn't. You couldn't move the data without paying a tax that scaled with your data growth. Every enterprise AI roadmap has carried a data-migration line item, and every one of those line items was an excuse for delay.

Open Iceberg federation does not eliminate the problem. AWS still prices egress. Latency still matters. Compliance still requires data-residency decisions. But the strategic premise that you must consolidate data in one cloud to do serious AI is now demonstrably false. You can leave the data where it is and bring the compute — and the agents — to it.

For CIOs, CTOs, and CFOs, the playbook this quarter is straightforward: standardize on Iceberg, audit your cross-cloud flows, pilot federated agents on a workload you've been waiting to unblock, and reopen the egress negotiation with your incumbents. The teams that operationalize this pattern first will run their 2027 AI strategy on infrastructure their competitors are still budgeting to migrate.

That is a structurally better position to compete from.


THE DAILY BRIEF

Enterprise AI insights for technology and business leaders, twice weekly.

thedailybrief.com

Subscribe at thedailybrief.com/subscribe for weekly AI insights delivered to your inbox.

LinkedIn: linkedin.com/in/rberi  |  X: x.com/rajeshberi

© 2026 Rajesh Beri. All rights reserved.

Google Cross-Cloud Lakehouse: AI Agents Without Egress Fees

Photo by Brett Sayles on Pexels

Google just removed the most expensive friction point in multi-cloud AI.

On April 22, 2026, at Google Cloud Next 2026 in Las Vegas, Google announced the Cross-Cloud Lakehouse — an Apache Iceberg-based data platform that lets AI agents and analytics engines query data sitting in AWS S3 or Azure Data Lake as if it were native to Google Cloud, without copies, without ETL, and without egress fees.

If that sentence sounds incremental, the implications are not. The Cross-Cloud Lakehouse attacks the single biggest economic and architectural blocker to enterprise AI agents at scale — that 80% of enterprise data lives outside the cloud where the AI runs, and moving it has been priced as a tax that compounds with every agent invocation.

For CIOs, CTOs, and CFOs, this announcement reframes a decision that's been pending for two years: do we standardize our AI data layer on a single hyperscaler and pay the migration cost, or accept the per-query egress and latency penalty of cross-cloud federation? Google's answer is neither — federate at the open-format catalog layer and pay for compute, not for data movement.

What Actually Shipped

Cross-Cloud Lakehouse is the headline, but it's the visible tip of a coordinated rebrand and rebuild.

Google quietly renamed BigLake to Google Cloud Lakehouse on April 20, 2026 — a signal that the company is treating the lakehouse as a first-class product, not a BigQuery feature flag. The architecture stack now reads:

Layer Component Status
Catalog Lakehouse Runtime Catalog (Iceberg) GA
Catalog REST Catalog Endpoint Preview
Compute BigQuery + Managed Service for Apache Spark GA
Compute Lightning Engine (2× price-performance) GA
Cross-Cloud Cross-Cloud Caching to S3 / Azure Preview
Federation AWS Glue, Databricks, Snowflake, SAP catalogs Preview
Multimodal BigQuery ObjectRefs (unstructured + Iceberg) GA
Real-time Spanner / AlloyDB / Cloud SQL → Iceberg replication Preview
Agents BigQuery Data Engineering Agent, Data Science Agent Preview

The throughline is Apache Iceberg as the universal interchange format. Every component speaks Iceberg natively. Every cross-vendor integration is at the Iceberg catalog layer, not the data plane. That is a meaningful architectural choice — and a competitive one — because it means you do not have to leave Snowflake, Databricks, or AWS to participate.

Andi Gutmans, Google Cloud's VP and GM of Data Cloud, framed the design intent around what he called "agent gravity" — the autonomy loss agents experience when cross-cloud latency, egress costs, or access policies break their reasoning loop. The Cross-Cloud Lakehouse is engineered to let an agent reason across data residing in any of the three major clouds without the agent — or the user — having to know or care.

The Technical Read for CTOs and Data Architects

Strip away the marketing and three engineering decisions matter most.

First, Apache Iceberg is now the de facto enterprise table format. A year ago, Iceberg vs. Delta vs. Hudi was an active debate. Cloud Next 2026 effectively closed it. Snowflake (Polaris), Databricks (Unity Catalog with Iceberg read/write), AWS (Glue Data Catalog), Confluent (Tableflow), Salesforce (zero-copy via Iceberg with Google), and now Google Cloud Lakehouse all interoperate through Iceberg's REST catalog. If you are picking a table format in 2026, Iceberg is the answer. Anything else is a bet against the entire vendor ecosystem.

Second, the federation is bidirectional and catalog-native. Cross-Cloud Lakehouse doesn't ingest your AWS data — it federates at the Iceberg REST Catalog level through Cross-Cloud Caching (Preview). BigQuery and Managed Spark see remote Iceberg tables in S3 as if they were local. Conversely, Databricks and Snowflake can read Google Cloud Iceberg tables via their own catalog federation. That kind of bidirectional, vendor-neutral catalog is what the Hadoop-era Hive Metastore tried and failed to be — and the difference now is that the open-source format is mature, and the hyperscalers are all economically aligned to interoperate.

Third, Cross-Cloud Caching changes the cost curve. The headline feature isn't "we eliminated egress fees" — Google can't unilaterally do that, AWS still charges egress when you read S3 — it's that Cross-Cloud Caching keeps frequently accessed data in a Google-side cache, which dramatically reduces per-query egress for repeated agent reads. Combined with Cross-Cloud Interconnect's dedicated private networking, the per-token data-access cost for agentic workloads drops materially. Forrester's TEI study cites 117% ROI (run the numbers with our ROI calculator) with payback in under six months — that is mostly egress and ETL avoidance.

The agent integration layer is what most CTOs will care about operationally:

  • BigQuery Data Engineering Agent and Data Science Agent (Preview) speak Iceberg natively. They can plan and execute queries across federated catalogs without the agent needing to know which cloud the data is in.
  • Agent Developer Kit (ADK) + Model Context Protocol (MCP) sit on top, so agents you build in your own framework get the same data substrate.
  • Real-time replication from Spanner, AlloyDB, and Cloud SQL into Iceberg (Preview) closes the operational/analytical gap. Agents can act on data that is seconds old, not hours.

The pattern that emerges:

Operational DBs (Spanner/AlloyDB) ─┐
AWS S3 Iceberg ────────────────────┤
Azure Data Lake Iceberg ───────────┤── Google Cloud Lakehouse ── Agents
Snowflake Polaris (federated) ─────┤      (Iceberg + REST Catalog)
Databricks Unity (federated) ──────┘

For data architects, this is the borderless lakehouse pattern Iceberg promised in 2022 finally delivered as a managed service.

The Business Read for CFOs and Procurement

The cost story is what makes this real for finance.

Most enterprises today operate one of three painful patterns:

  1. Migrate everything to one cloud. The cost is denominated in petabytes × $0.02–$0.09 per GB egress + multi-quarter migration projects + ongoing dual-pipeline cost during cutover. For a 5-PB enterprise data estate, the egress alone can exceed $400K, and the all-in migration runs $5–$30M.

  2. Run AI in the cloud where the data already lives. This works only if your AI vendor of choice happens to live in the same cloud — which, given that Anthropic (AWS + GCP), OpenAI (Azure), and Google (GCP) split the field, is rarely true.

  3. Accept the cross-cloud egress tax. A typical agentic workload reading 10TB/month across cloud boundaries costs $900K–$1.1M annually in egress alone at standard list rates. Multiply by every agent in production and the AI line item starts looking like an undisclosed infrastructure tax.

Cross-Cloud Lakehouse changes the procurement math in three concrete ways:

It eliminates migration as a precondition for cross-cloud AI. You don't have to move 5PB out of S3 to run Gemini-powered agents on it. That removes the single biggest delay item from most enterprise AI roadmaps — typically 9–18 months of project work that produces zero AI capability while it's underway.

It compresses the per-query data-access cost for repeated reads via Cross-Cloud Caching. An agent that re-reads a sales dataset 50 times an hour pays the egress once, not 50 times. For high-frequency agentic workloads — customer support, fraud detection, supply chain — this is the difference between "pilot" and "production economics."

It restores cross-cloud negotiation leverage. When migration is the only alternative, AWS and Azure get to set egress prices because they know you can't move. With federated Iceberg, the threat is credible: we can compute against your data from another cloud at near-native cost. That changes the renewal conversation. Expect AWS and Azure to respond with discounted egress for committed-use customers and "lakehouse-friendly" pricing tiers in the next two quarters.

The named customer references back this up. Spotify is using the lakehouse to consolidate analytics across BigQuery, Dataflow, and OSS engines without duplication. Accenture's Global Lead for the Google Business Group — Scott Alfieri — credits the "zero-copy" architecture as what makes "agentic AI with surgical precision" achievable for clients. Those are not pilot quotes; they are reference customers running production patterns.

The Competitive Layer

Google is not the only player here. The honest read is that Iceberg federation is a multi-vendor norm, and Google's contribution is making it the substrate of the agent platform.

  • Snowflake announced Polaris last year and continues expanding catalog federation. Cortex Code is its agent layer.
  • Databricks owns Unity Catalog, has read/write Iceberg interop, and runs its own Genie/Mosaic agent stack.
  • AWS Glue Data Catalog federates with Iceberg natively; AWS DevOps Agent and Security Agent are GA on Bedrock AgentCore.
  • Confluent Tableflow brings streaming Kafka data into Iceberg as managed tables — coming later in 2026.

What Google is betting on is that the agent layer + the data layer + the open format is more valuable as an integrated whole than any of the parts. The bet has merit. Snowflake and Databricks are stronger on data; AWS is stronger on infrastructure breadth; but Google is the only one shipping a unified stack from chips (TPU) to model (Gemini) to agent platform (Gemini Enterprise) to data (Cross-Cloud Lakehouse) to surface (Workspace).

For enterprises, the question is not which vendor wins — Iceberg federation means you don't have to choose — but where you run the agent reasoning loop. That decision will be driven by where your most-used models live, what your existing data gravity looks like, and which surface (Workspace, Office, Salesforce, ServiceNow) your end users already inhabit.

The Decision Framework for the Next 90 Days

For most enterprises, this is a quarter to evaluate, not commit. A practical playbook:

1. Audit your cross-cloud data flows. Most CIOs do not have a clean number for monthly cross-cloud egress. Pull the AWS Cost Explorer + Azure billing + GCP Network egress reports. The number is usually 30–60% higher than expected.

2. Identify your top 3 agentic workloads waiting on data residency. These are the workloads where you've said "we'd love to run this on Gemini, but the data lives in S3." Cross-Cloud Lakehouse moves them from blocked to feasible.

3. Standardize your table format on Apache Iceberg. If you are still on proprietary Delta-only or Parquet without a catalog, this is the quarter to commit. Iceberg's catalog interoperability is the unlock for everything downstream — including non-Google scenarios.

4. Run a pilot on Cross-Cloud Caching for one high-volume agent workload. Measure end-to-end latency vs. native, measure egress savings, measure agent reasoning quality. Most teams will find latency within 10–15% of native and cost 30–50% lower at steady state.

5. Renegotiate cross-cloud egress with AWS and Azure now. With Cross-Cloud Lakehouse on the table as a credible alternative to wholesale migration, your renewal conversation has different gravity. Ask for committed-use discounts on egress, "AI workload" exemptions, and price-protection guarantees. The leverage exists; use it.

6. Update your data governance program for federated catalogs. Knowledge Catalog (formerly Dataplex) provides table-level access controls, lineage, and quality across federated sources. Your DLP, masking, and audit programs need to extend to remote Iceberg tables before you scale agent access.

Teams that take these steps in Q2 2026 will be running production cross-cloud agentic workloads by Q4. Teams that wait will spend the back half of the year explaining why their AI roadmap is gated on a 12-month data-migration project they could have skipped.

The Bottom Line

The Cross-Cloud Lakehouse is not the most exciting announcement from Cloud Next 2026 — Gemini Enterprise Agent Platform and the new TPU generation got more press. It may be the most economically consequential.

For five years, "data gravity" has been the silent governor on multi-cloud AI ambition. You couldn't run the model where the data wasn't. You couldn't move the data without paying a tax that scaled with your data growth. Every enterprise AI roadmap has carried a data-migration line item, and every one of those line items was an excuse for delay.

Open Iceberg federation does not eliminate the problem. AWS still prices egress. Latency still matters. Compliance still requires data-residency decisions. But the strategic premise that you must consolidate data in one cloud to do serious AI is now demonstrably false. You can leave the data where it is and bring the compute — and the agents — to it.

For CIOs, CTOs, and CFOs, the playbook this quarter is straightforward: standardize on Iceberg, audit your cross-cloud flows, pilot federated agents on a workload you've been waiting to unblock, and reopen the egress negotiation with your incumbents. The teams that operationalize this pattern first will run their 2027 AI strategy on infrastructure their competitors are still budgeting to migrate.

That is a structurally better position to compete from.

Sources


Continue Reading

Share:

THE DAILY BRIEF

Google CloudData LakehouseApache IcebergEnterprise AIMulti-CloudAgentic AI

Google Cross-Cloud Lakehouse: AI Agents Without Egress Fees

Google's Cross-Cloud Lakehouse lets AI agents query AWS S3 and Azure Data Lake as native via Apache Iceberg — zero copy, no egress. CIO and CFO impact.

By Rajesh Beri·April 25, 2026·11 min read

Google just removed the most expensive friction point in multi-cloud AI.

On April 22, 2026, at Google Cloud Next 2026 in Las Vegas, Google announced the Cross-Cloud Lakehouse — an Apache Iceberg-based data platform that lets AI agents and analytics engines query data sitting in AWS S3 or Azure Data Lake as if it were native to Google Cloud, without copies, without ETL, and without egress fees.

If that sentence sounds incremental, the implications are not. The Cross-Cloud Lakehouse attacks the single biggest economic and architectural blocker to enterprise AI agents at scale — that 80% of enterprise data lives outside the cloud where the AI runs, and moving it has been priced as a tax that compounds with every agent invocation.

For CIOs, CTOs, and CFOs, this announcement reframes a decision that's been pending for two years: do we standardize our AI data layer on a single hyperscaler and pay the migration cost, or accept the per-query egress and latency penalty of cross-cloud federation? Google's answer is neither — federate at the open-format catalog layer and pay for compute, not for data movement.

What Actually Shipped

Cross-Cloud Lakehouse is the headline, but it's the visible tip of a coordinated rebrand and rebuild.

Google quietly renamed BigLake to Google Cloud Lakehouse on April 20, 2026 — a signal that the company is treating the lakehouse as a first-class product, not a BigQuery feature flag. The architecture stack now reads:

Layer Component Status
Catalog Lakehouse Runtime Catalog (Iceberg) GA
Catalog REST Catalog Endpoint Preview
Compute BigQuery + Managed Service for Apache Spark GA
Compute Lightning Engine (2× price-performance) GA
Cross-Cloud Cross-Cloud Caching to S3 / Azure Preview
Federation AWS Glue, Databricks, Snowflake, SAP catalogs Preview
Multimodal BigQuery ObjectRefs (unstructured + Iceberg) GA
Real-time Spanner / AlloyDB / Cloud SQL → Iceberg replication Preview
Agents BigQuery Data Engineering Agent, Data Science Agent Preview

The throughline is Apache Iceberg as the universal interchange format. Every component speaks Iceberg natively. Every cross-vendor integration is at the Iceberg catalog layer, not the data plane. That is a meaningful architectural choice — and a competitive one — because it means you do not have to leave Snowflake, Databricks, or AWS to participate.

Andi Gutmans, Google Cloud's VP and GM of Data Cloud, framed the design intent around what he called "agent gravity" — the autonomy loss agents experience when cross-cloud latency, egress costs, or access policies break their reasoning loop. The Cross-Cloud Lakehouse is engineered to let an agent reason across data residing in any of the three major clouds without the agent — or the user — having to know or care.

The Technical Read for CTOs and Data Architects

Strip away the marketing and three engineering decisions matter most.

First, Apache Iceberg is now the de facto enterprise table format. A year ago, Iceberg vs. Delta vs. Hudi was an active debate. Cloud Next 2026 effectively closed it. Snowflake (Polaris), Databricks (Unity Catalog with Iceberg read/write), AWS (Glue Data Catalog), Confluent (Tableflow), Salesforce (zero-copy via Iceberg with Google), and now Google Cloud Lakehouse all interoperate through Iceberg's REST catalog. If you are picking a table format in 2026, Iceberg is the answer. Anything else is a bet against the entire vendor ecosystem.

Second, the federation is bidirectional and catalog-native. Cross-Cloud Lakehouse doesn't ingest your AWS data — it federates at the Iceberg REST Catalog level through Cross-Cloud Caching (Preview). BigQuery and Managed Spark see remote Iceberg tables in S3 as if they were local. Conversely, Databricks and Snowflake can read Google Cloud Iceberg tables via their own catalog federation. That kind of bidirectional, vendor-neutral catalog is what the Hadoop-era Hive Metastore tried and failed to be — and the difference now is that the open-source format is mature, and the hyperscalers are all economically aligned to interoperate.

Third, Cross-Cloud Caching changes the cost curve. The headline feature isn't "we eliminated egress fees" — Google can't unilaterally do that, AWS still charges egress when you read S3 — it's that Cross-Cloud Caching keeps frequently accessed data in a Google-side cache, which dramatically reduces per-query egress for repeated agent reads. Combined with Cross-Cloud Interconnect's dedicated private networking, the per-token data-access cost for agentic workloads drops materially. Forrester's TEI study cites 117% ROI (run the numbers with our ROI calculator) with payback in under six months — that is mostly egress and ETL avoidance.

The agent integration layer is what most CTOs will care about operationally:

  • BigQuery Data Engineering Agent and Data Science Agent (Preview) speak Iceberg natively. They can plan and execute queries across federated catalogs without the agent needing to know which cloud the data is in.
  • Agent Developer Kit (ADK) + Model Context Protocol (MCP) sit on top, so agents you build in your own framework get the same data substrate.
  • Real-time replication from Spanner, AlloyDB, and Cloud SQL into Iceberg (Preview) closes the operational/analytical gap. Agents can act on data that is seconds old, not hours.

The pattern that emerges:

Operational DBs (Spanner/AlloyDB) ─┐
AWS S3 Iceberg ────────────────────┤
Azure Data Lake Iceberg ───────────┤── Google Cloud Lakehouse ── Agents
Snowflake Polaris (federated) ─────┤      (Iceberg + REST Catalog)
Databricks Unity (federated) ──────┘

For data architects, this is the borderless lakehouse pattern Iceberg promised in 2022 finally delivered as a managed service.

The Business Read for CFOs and Procurement

The cost story is what makes this real for finance.

Most enterprises today operate one of three painful patterns:

  1. Migrate everything to one cloud. The cost is egress at $0.02–$0.09 per GB across petabytes, plus a multi-quarter migration project and ongoing dual-pipeline costs during cutover. For a 5-PB enterprise data estate, the egress alone can exceed $400K, and the all-in migration runs $5–$30M.

  2. Run AI in the cloud where the data already lives. This works only if your AI vendor of choice happens to live in the same cloud — which, given that Anthropic (AWS + GCP), OpenAI (Azure), and Google (GCP) split the field, is rarely true.

  3. Accept the cross-cloud egress tax. Repeated agent reads compound fast: a workload that re-reads a few-dozen-GB dataset tens of times an hour crosses roughly 1 PB/month of cloud boundary, which costs $900K–$1.1M annually in egress alone at standard list rates. Multiply by every agent in production and the AI line item starts looking like an undisclosed infrastructure tax.
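The list-rate arithmetic behind that egress range is easy to reproduce. The monthly volume and per-GB rates below are illustrative assumptions in the ballpark of published list prices, not quotes:

```python
# Back-of-envelope annual egress cost for a cross-cloud agentic workload.
# Volume and rates are illustrative assumptions, not vendor quotes.

GB_PER_PB = 1_000_000  # decimal convention used in cloud billing

def annual_egress_usd(pb_per_month: float, usd_per_gb: float) -> float:
    return pb_per_month * GB_PER_PB * usd_per_gb * 12

low  = annual_egress_usd(1.0, 0.075)  # discounted list rate
high = annual_egress_usd(1.0, 0.09)   # standard internet egress rate

print(f"${low:,.0f} - ${high:,.0f} per year")  # $900,000 - $1,080,000 per year
```

A petabyte a month accrues faster than it sounds: a 30 GB dataset read roughly 46 times an hour, around the clock, crosses about 1 PB in a 30-day month.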

Cross-Cloud Lakehouse changes the procurement math in three concrete ways:

It eliminates migration as a precondition for cross-cloud AI. You don't have to move 5PB out of S3 to run Gemini-powered agents on it. That removes the single biggest delay item from most enterprise AI roadmaps — typically 9–18 months of project work that produces zero AI capability while it's underway.

It compresses the per-query data-access cost for repeated reads via Cross-Cloud Caching. An agent that re-reads a sales dataset 50 times an hour pays the egress once, not 50 times. For high-frequency agentic workloads — customer support, fraud detection, supply chain — this is the difference between "pilot" and "production economics."
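One way to see the cache effect is a toy cost model in which only cache misses cross the cloud boundary and pay egress. The dataset size, read rate, and hit ratio are assumptions for illustration; Google has not published a cache hit-rate figure:

```python
# Toy model: monthly egress with and without a provider-side cache.
# Dataset size, read rate, egress rate, and hit ratio are all assumptions.

def monthly_egress_usd(dataset_gb: float, reads: int,
                       usd_per_gb: float, cache_hit_ratio: float = 0.0) -> float:
    """Only cache misses cross the cloud boundary and pay egress."""
    misses = reads * (1.0 - cache_hit_ratio)
    return dataset_gb * misses * usd_per_gb

READS = 50 * 24 * 30  # 50 reads/hour, around the clock, for a 30-day month

no_cache = monthly_egress_usd(30, READS, 0.09)                        # every read pays
cached   = monthly_egress_usd(30, READS, 0.09, cache_hit_ratio=0.98)  # warm cache

print(f"no cache: ${no_cache:,.0f}/mo, cached: ${cached:,.0f}/mo")
```

Under these assumptions the cached path is roughly 50× cheaper, which is the shape of the "pilot vs. production economics" gap the caching layer is meant to close.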

It restores cross-cloud negotiation leverage. When migration is the only alternative, AWS and Azure get to set egress prices because they know you can't move. With federated Iceberg, the threat is credible: we can compute against your data from another cloud at near-native cost. That changes the renewal conversation. Expect AWS and Azure to respond with discounted egress for committed-use customers and "lakehouse-friendly" pricing tiers in the next two quarters.

The named customer references back this up. Spotify is using the lakehouse to consolidate analytics across BigQuery, Dataflow, and OSS engines without duplication. Accenture's Global Lead for the Google Business Group — Scott Alfieri — credits the "zero-copy" architecture as what makes "agentic AI with surgical precision" achievable for clients. Those are not pilot quotes; they are reference customers running production patterns.

The Competitive Layer

Google is not the only player here. The honest read is that Iceberg federation is a multi-vendor norm, and Google's contribution is making it the substrate of the agent platform.

  • Snowflake announced Polaris last year and continues expanding catalog federation. Cortex Code is its agent layer.
  • Databricks owns Unity Catalog, has read/write Iceberg interop, and runs its own Genie/Mosaic agent stack.
  • AWS Glue Data Catalog federates with Iceberg natively; AWS DevOps Agent and Security Agent are GA on Bedrock AgentCore.
  • Confluent Tableflow brings streaming Kafka data into Iceberg as managed tables — coming later in 2026.

What Google is betting on is that the agent layer + the data layer + the open format is more valuable as an integrated whole than any of the parts. The bet has merit. Snowflake and Databricks are stronger on data; AWS is stronger on infrastructure breadth; but Google is the only one shipping a unified stack from chips (TPU) to model (Gemini) to agent platform (Gemini Enterprise) to data (Cross-Cloud Lakehouse) to surface (Workspace).

For enterprises, the question is not which vendor wins — Iceberg federation means you don't have to choose — but where you run the agent reasoning loop. That decision will be driven by where your most-used models live, what your existing data gravity looks like, and which surface (Workspace, Office, Salesforce, ServiceNow) your end users already inhabit.

The Decision Framework for the Next 90 Days

For most enterprises, this is a quarter to evaluate, not commit. A practical playbook:

1. Audit your cross-cloud data flows. Most CIOs do not have a clean number for monthly cross-cloud egress. Pull the AWS Cost Explorer + Azure billing + GCP Network egress reports. The number is usually 30–60% higher than expected.

2. Identify your top 3 agentic workloads waiting on data residency. These are the workloads where you've said "we'd love to run this on Gemini, but the data lives in S3." Cross-Cloud Lakehouse moves them from blocked to feasible.

3. Standardize your table format on Apache Iceberg. If you are still Delta-only or on raw Parquet without a catalog, this is the quarter to commit. Iceberg's catalog interoperability is the unlock for everything downstream — including non-Google scenarios.

4. Run a pilot on Cross-Cloud Caching for one high-volume agent workload. Measure end-to-end latency vs. native, measure egress savings, measure agent reasoning quality. Most teams will find latency within 10–15% of native and cost 30–50% lower at steady state.
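The measurement half of that pilot is straightforward to harness. A sketch of the latency comparison using percentile cut points — the timing samples here are synthetic stand-ins; in a real pilot you would time identical queries run natively and through federation:

```python
# Sketch: compare latency percentiles for the same query run natively
# vs. through cross-cloud federation. Sample timings below are synthetic.
import statistics

def p50_p95(samples_ms: list[float]) -> tuple[float, float]:
    qs = statistics.quantiles(samples_ms, n=100)  # qs[49]=p50, qs[94]=p95
    return qs[49], qs[94]

native    = [210, 215, 220, 225, 230, 235, 240, 250, 260, 300]  # ms, synthetic
federated = [230, 238, 242, 248, 255, 260, 268, 275, 290, 340]  # ms, synthetic

n50, n95 = p50_p95(native)
f50, f95 = p50_p95(federated)
print(f"p50 overhead: {100 * (f50 / n50 - 1):.0f}%  "
      f"p95 overhead: {100 * (f95 / n95 - 1):.0f}%")
```

Run the same harness against egress line items from billing exports and you have the two numbers the go/no-go decision actually turns on: latency overhead and steady-state cost delta.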

5. Renegotiate cross-cloud egress with AWS and Azure now. With Cross-Cloud Lakehouse on the table as a credible alternative to wholesale migration, your renewal conversation has different gravity. Ask for committed-use discounts on egress, "AI workload" exemptions, and price-protection guarantees. The leverage exists; use it.

6. Update your data governance program for federated catalogs. Knowledge Catalog (formerly Dataplex) provides table-level access controls, lineage, and quality across federated sources. Your DLP, masking, and audit programs need to extend to remote Iceberg tables before you scale agent access.

Teams that take these steps in Q2 2026 will be running production cross-cloud agentic workloads by Q4. Teams that wait will spend the back half of the year explaining why their AI roadmap is gated on a 12-month data-migration project they could have skipped.

The Bottom Line

The Cross-Cloud Lakehouse is not the most exciting announcement from Cloud Next 2026 — Gemini Enterprise Agent Platform and the new TPU generation got more press. It may be the most economically consequential.

For five years, "data gravity" has been the silent governor on multi-cloud AI ambition. You couldn't run the model where the data wasn't. You couldn't move the data without paying a tax that scaled with your data growth. Every enterprise AI roadmap has carried a data-migration line item, and every one of those line items was an excuse for delay.

Open Iceberg federation does not eliminate the problem. AWS still prices egress. Latency still matters. Compliance still requires data-residency decisions. But the strategic premise that you must consolidate data in one cloud to do serious AI is now demonstrably false. You can leave the data where it is and bring the compute — and the agents — to it.

For CIOs, CTOs, and CFOs, the playbook this quarter is straightforward: standardize on Iceberg, audit your cross-cloud flows, pilot federated agents on a workload you've been waiting to unblock, and reopen the egress negotiation with your incumbents. The teams that operationalize this pattern first will run their 2027 AI strategy on infrastructure their competitors are still budgeting to migrate.

That is a structurally better position to compete from.


LinkedIn: linkedin.com/in/rberi  |  X: x.com/rajeshberi

© 2026 Rajesh Beri. All rights reserved.
