Galtea's $3.2M: AI Testing Becomes Enterprise Infrastructure

Barcelona startup raises $3.2M from 42CAP and Mozilla to solve the problem keeping 95% of AI projects from production.

By Rajesh Beri·March 25, 2026·4 min read
Share:

THE DAILY BRIEF

AI InfrastructureEnterprise AIComplianceAI FundingDeployment

Galtea's $3.2M: AI Testing Becomes Enterprise Infrastructure

Barcelona startup raises $3.2M from 42CAP and Mozilla to solve the problem keeping 95% of AI projects from production.

By Rajesh Beri·March 25, 2026·4 min read

Building AI agents is easy now. Making sure they work in production? That's the $3.2 million problem Galtea just raised money to solve.

The Barcelona-based startup announced its seed round today, led by 42CAP with participation from Mozilla Ventures, bringing total funding to $4.1 million. The company spun out of the Barcelona Supercomputing Center in October 2024, and is building what amounts to quality assurance infrastructure for enterprise AI deployments.

The Production Gap

Here's the reality check: 95% of enterprise AI projects fail to deliver ROI, according to MIT's 2025 research. The models work. The APIs are accessible. The developer tooling has matured. What hasn't kept pace is the infrastructure for knowing whether what you built will behave reliably when real users hit it at scale.

Galtea's platform generates test cases and synthetic user simulations automatically from descriptions of how an AI agent should behave. The idea: create adversarial and edge-case scenarios at scale without requiring engineering teams to write them by hand—a process that is time-consuming, expensive, and rarely complete.

The platform evaluates across hallucination rates, bias, security vulnerabilities, and toxicity, outputting structured metrics that developers and compliance teams can use to make deployment decisions. This week, the company launched a self-service tier with a free trial, broadening access beyond its existing enterprise customer base.

Why Now: Regulation Creates Urgency

The timing isn't accidental. The EU AI Act now requires companies deploying AI in high-risk applications to document and validate their models' safety and compliance, with fines of up to €35 million for violations.

For European enterprises building AI products without systematic testing infrastructure, the regulation has created urgency. Galtea sits directly in that gap: helping development and legal teams produce the evidence of compliance that the regulation demands, without rebuilding workflows from scratch.

Mozilla Ventures' involvement signals that the round was framed partly around trustworthy-AI narrative—a thesis that has defined the fund's portfolio since its 2022 launch.

The Numbers That Matter

Galtea reports:

  • 71% reduction in operational costs for AI validation processes
  • 10× ROI combining direct savings and regulatory risk mitigation
  • 70%+ increase in team efficiency by reducing manual testing tasks
  • 23.6× improvement in vulnerability detection compared to manual processes

Those aren't vanity metrics. For CFOs evaluating AI infrastructure spend, the unit economics are straightforward: automated testing costs less than manual QA, catches more problems, and prevents production failures that damage customer trust.

What's Different

The founders bring unusual depth. CEO Jorge Palomar worked at Amazon and within BSC's Language Technologies research group. CTO Baybars Külebi holds a PhD in Astrophysics, co-founded several earlier language and audio technology projects, and spent years as a machine learning expert at BSC.

The technology was originally developed at BSC to evaluate large language models for internal research, running on MareNostrum 5—one of Europe's most powerful supercomputers. That provenance gives Galtea scientific credibility that eighteen-month-old startups don't typically have.

The Enterprise Calculus

Galtea evaluates the product end-to-end, not individual model calls. Modern AI products are pipelines: intent detection, retrieval, reasoning, output formatting—each node potentially running a different model. The platform tests what users experience, not which model powers it.

That's the right abstraction for enterprise buyers. CIOs don't care if your customer service agent uses GPT-4 or Claude. They care whether it hallucinates in front of customers, whether it's biased, and whether it passes compliance audits.

The platform is model-agnostic and framework-agnostic. LangChain, LlamaIndex, Vercel AI SDK, raw API calls—if your app calls an LLM, Galtea can evaluate it. That's table stakes for enterprise adoption.

What This Means for AI Buyers

For Technical Leaders:

  • Pre-production testing infrastructure is now a category, not a build-it-yourself project
  • Systematic evaluation beats hope-and-manual-QA for production readiness
  • Compliance documentation is automation-ready, not a legal bottleneck

For Business Leaders:

  • The gap between AI demos and production deployments has a measurable cost
  • Regulatory compliance isn't optional in Europe; automated validation is cheaper than manual processes
  • Unit economics favor platforms that prevent production failures over observability tools that report them

Action Items:

  1. Audit your current AI testing infrastructure. If you're shipping AI agents with hand-written test cases, you're in the 95% that stalls.
  2. Evaluate compliance gaps. EU AI Act deadlines are here. Documentation requirements won't go away.
  3. Compare pre-production testing vs. post-deployment monitoring. Both matter, but preventing failures costs less than fixing them in production.

The Bottom Line

Galtea isn't solving a novel research problem. They're building infrastructure for a production problem that already exists at scale. The 95% failure rate isn't a forecast—it's MIT data from 2025.

The regulatory environment creates forcing functions. The technical abstraction (product-level evaluation, not model-level) aligns with how enterprises actually buy AI. The unit economics are defensible.

That's the kind of boring, necessary infrastructure that becomes enterprise-critical. Testing isn't glamorous. But neither is explaining to your board why your AI agent hallucinated in front of a Fortune 500 customer.

$3.2 million to fix the invisible bottleneck. For enterprises deploying AI at scale, that's infrastructure spend, not a bet.


Want to calculate your own AI ROI? Try our AI ROI Calculator — takes 60 seconds and shows projected savings, payback period, and 3-year ROI.

Continue Reading

THE DAILY BRIEF

Enterprise AI insights for technology and business leaders, twice weekly.

thedailybrief.com

Subscribe at thedailybrief.com/subscribe for weekly AI insights delivered to your inbox.

LinkedIn: linkedin.com/in/rberi  |  X: x.com/rajeshberi

© 2026 Rajesh Beri. All rights reserved.

Galtea's $3.2M: AI Testing Becomes Enterprise Infrastructure

ThisisEngineering RAEng

Building AI agents is easy now. Making sure they work in production? That's the $3.2 million problem Galtea just raised money to solve.

The Barcelona-based startup announced its seed round today, led by 42CAP with participation from Mozilla Ventures, bringing total funding to $4.1 million. The company spun out of the Barcelona Supercomputing Center in October 2024, and is building what amounts to quality assurance infrastructure for enterprise AI deployments.

The Production Gap

Here's the reality check: 95% of enterprise AI projects fail to deliver ROI, according to MIT's 2025 research. The models work. The APIs are accessible. The developer tooling has matured. What hasn't kept pace is the infrastructure for knowing whether what you built will behave reliably when real users hit it at scale.

Galtea's platform generates test cases and synthetic user simulations automatically from descriptions of how an AI agent should behave. The idea: create adversarial and edge-case scenarios at scale without requiring engineering teams to write them by hand—a process that is time-consuming, expensive, and rarely complete.

The platform evaluates across hallucination rates, bias, security vulnerabilities, and toxicity, outputting structured metrics that developers and compliance teams can use to make deployment decisions. This week, the company launched a self-service tier with a free trial, broadening access beyond its existing enterprise customer base.

Why Now: Regulation Creates Urgency

The timing isn't accidental. The EU AI Act now requires companies deploying AI in high-risk applications to document and validate their models' safety and compliance, with fines of up to €35 million for violations.

For European enterprises building AI products without systematic testing infrastructure, the regulation has created urgency. Galtea sits directly in that gap: helping development and legal teams produce the evidence of compliance that the regulation demands, without rebuilding workflows from scratch.

Mozilla Ventures' involvement signals that the round was framed partly around trustworthy-AI narrative—a thesis that has defined the fund's portfolio since its 2022 launch.

The Numbers That Matter

Galtea reports:

  • 71% reduction in operational costs for AI validation processes
  • 10× ROI combining direct savings and regulatory risk mitigation
  • 70%+ increase in team efficiency by reducing manual testing tasks
  • 23.6× improvement in vulnerability detection compared to manual processes

Those aren't vanity metrics. For CFOs evaluating AI infrastructure spend, the unit economics are straightforward: automated testing costs less than manual QA, catches more problems, and prevents production failures that damage customer trust.

What's Different

The founders bring unusual depth. CEO Jorge Palomar worked at Amazon and within BSC's Language Technologies research group. CTO Baybars Külebi holds a PhD in Astrophysics, co-founded several earlier language and audio technology projects, and spent years as a machine learning expert at BSC.

The technology was originally developed at BSC to evaluate large language models for internal research, running on MareNostrum 5—one of Europe's most powerful supercomputers. That provenance gives Galtea scientific credibility that eighteen-month-old startups don't typically have.

The Enterprise Calculus

Galtea evaluates the product end-to-end, not individual model calls. Modern AI products are pipelines: intent detection, retrieval, reasoning, output formatting—each node potentially running a different model. The platform tests what users experience, not which model powers it.

That's the right abstraction for enterprise buyers. CIOs don't care if your customer service agent uses GPT-4 or Claude. They care whether it hallucinates in front of customers, whether it's biased, and whether it passes compliance audits.

The platform is model-agnostic and framework-agnostic. LangChain, LlamaIndex, Vercel AI SDK, raw API calls—if your app calls an LLM, Galtea can evaluate it. That's table stakes for enterprise adoption.

What This Means for AI Buyers

For Technical Leaders:

  • Pre-production testing infrastructure is now a category, not a build-it-yourself project
  • Systematic evaluation beats hope-and-manual-QA for production readiness
  • Compliance documentation is automation-ready, not a legal bottleneck

For Business Leaders:

  • The gap between AI demos and production deployments has a measurable cost
  • Regulatory compliance isn't optional in Europe; automated validation is cheaper than manual processes
  • Unit economics favor platforms that prevent production failures over observability tools that report them

Action Items:

  1. Audit your current AI testing infrastructure. If you're shipping AI agents with hand-written test cases, you're in the 95% that stalls.
  2. Evaluate compliance gaps. EU AI Act deadlines are here. Documentation requirements won't go away.
  3. Compare pre-production testing vs. post-deployment monitoring. Both matter, but preventing failures costs less than fixing them in production.

The Bottom Line

Galtea isn't solving a novel research problem. They're building infrastructure for a production problem that already exists at scale. The 95% failure rate isn't a forecast—it's MIT data from 2025.

The regulatory environment creates forcing functions. The technical abstraction (product-level evaluation, not model-level) aligns with how enterprises actually buy AI. The unit economics are defensible.

That's the kind of boring, necessary infrastructure that becomes enterprise-critical. Testing isn't glamorous. But neither is explaining to your board why your AI agent hallucinated in front of a Fortune 500 customer.

$3.2 million to fix the invisible bottleneck. For enterprises deploying AI at scale, that's infrastructure spend, not a bet.


Want to calculate your own AI ROI? Try our AI ROI Calculator — takes 60 seconds and shows projected savings, payback period, and 3-year ROI.

Continue Reading

Share:

THE DAILY BRIEF

AI InfrastructureEnterprise AIComplianceAI FundingDeployment

Galtea's $3.2M: AI Testing Becomes Enterprise Infrastructure

Barcelona startup raises $3.2M from 42CAP and Mozilla to solve the problem keeping 95% of AI projects from production.

By Rajesh Beri·March 25, 2026·4 min read

Building AI agents is easy now. Making sure they work in production? That's the $3.2 million problem Galtea just raised money to solve.

The Barcelona-based startup announced its seed round today, led by 42CAP with participation from Mozilla Ventures, bringing total funding to $4.1 million. The company spun out of the Barcelona Supercomputing Center in October 2024, and is building what amounts to quality assurance infrastructure for enterprise AI deployments.

The Production Gap

Here's the reality check: 95% of enterprise AI projects fail to deliver ROI, according to MIT's 2025 research. The models work. The APIs are accessible. The developer tooling has matured. What hasn't kept pace is the infrastructure for knowing whether what you built will behave reliably when real users hit it at scale.

Galtea's platform generates test cases and synthetic user simulations automatically from descriptions of how an AI agent should behave. The idea: create adversarial and edge-case scenarios at scale without requiring engineering teams to write them by hand—a process that is time-consuming, expensive, and rarely complete.

The platform evaluates across hallucination rates, bias, security vulnerabilities, and toxicity, outputting structured metrics that developers and compliance teams can use to make deployment decisions. This week, the company launched a self-service tier with a free trial, broadening access beyond its existing enterprise customer base.

Why Now: Regulation Creates Urgency

The timing isn't accidental. The EU AI Act now requires companies deploying AI in high-risk applications to document and validate their models' safety and compliance, with fines of up to €35 million for violations.

For European enterprises building AI products without systematic testing infrastructure, the regulation has created urgency. Galtea sits directly in that gap: helping development and legal teams produce the evidence of compliance that the regulation demands, without rebuilding workflows from scratch.

Mozilla Ventures' involvement signals that the round was framed partly around trustworthy-AI narrative—a thesis that has defined the fund's portfolio since its 2022 launch.

The Numbers That Matter

Galtea reports:

  • 71% reduction in operational costs for AI validation processes
  • 10× ROI combining direct savings and regulatory risk mitigation
  • 70%+ increase in team efficiency by reducing manual testing tasks
  • 23.6× improvement in vulnerability detection compared to manual processes

Those aren't vanity metrics. For CFOs evaluating AI infrastructure spend, the unit economics are straightforward: automated testing costs less than manual QA, catches more problems, and prevents production failures that damage customer trust.

What's Different

The founders bring unusual depth. CEO Jorge Palomar worked at Amazon and within BSC's Language Technologies research group. CTO Baybars Külebi holds a PhD in Astrophysics, co-founded several earlier language and audio technology projects, and spent years as a machine learning expert at BSC.

The technology was originally developed at BSC to evaluate large language models for internal research, running on MareNostrum 5—one of Europe's most powerful supercomputers. That provenance gives Galtea scientific credibility that eighteen-month-old startups don't typically have.

The Enterprise Calculus

Galtea evaluates the product end-to-end, not individual model calls. Modern AI products are pipelines: intent detection, retrieval, reasoning, output formatting—each node potentially running a different model. The platform tests what users experience, not which model powers it.

That's the right abstraction for enterprise buyers. CIOs don't care if your customer service agent uses GPT-4 or Claude. They care whether it hallucinates in front of customers, whether it's biased, and whether it passes compliance audits.

The platform is model-agnostic and framework-agnostic. LangChain, LlamaIndex, Vercel AI SDK, raw API calls—if your app calls an LLM, Galtea can evaluate it. That's table stakes for enterprise adoption.

What This Means for AI Buyers

For Technical Leaders:

  • Pre-production testing infrastructure is now a category, not a build-it-yourself project
  • Systematic evaluation beats hope-and-manual-QA for production readiness
  • Compliance documentation is automation-ready, not a legal bottleneck

For Business Leaders:

  • The gap between AI demos and production deployments has a measurable cost
  • Regulatory compliance isn't optional in Europe; automated validation is cheaper than manual processes
  • Unit economics favor platforms that prevent production failures over observability tools that report them

Action Items:

  1. Audit your current AI testing infrastructure. If you're shipping AI agents with hand-written test cases, you're in the 95% that stalls.
  2. Evaluate compliance gaps. EU AI Act deadlines are here. Documentation requirements won't go away.
  3. Compare pre-production testing vs. post-deployment monitoring. Both matter, but preventing failures costs less than fixing them in production.

The Bottom Line

Galtea isn't solving a novel research problem. They're building infrastructure for a production problem that already exists at scale. The 95% failure rate isn't a forecast—it's MIT data from 2025.

The regulatory environment creates forcing functions. The technical abstraction (product-level evaluation, not model-level) aligns with how enterprises actually buy AI. The unit economics are defensible.

That's the kind of boring, necessary infrastructure that becomes enterprise-critical. Testing isn't glamorous. But neither is explaining to your board why your AI agent hallucinated in front of a Fortune 500 customer.

$3.2 million to fix the invisible bottleneck. For enterprises deploying AI at scale, that's infrastructure spend, not a bet.


Want to calculate your own AI ROI? Try our AI ROI Calculator — takes 60 seconds and shows projected savings, payback period, and 3-year ROI.

Continue Reading

THE DAILY BRIEF

Enterprise AI insights for technology and business leaders, twice weekly.

thedailybrief.com

Subscribe at thedailybrief.com/subscribe for weekly AI insights delivered to your inbox.

LinkedIn: linkedin.com/in/rberi  |  X: x.com/rajeshberi

© 2026 Rajesh Beri. All rights reserved.

Newsletter

Stay Ahead of the Curve

Weekly enterprise AI insights for technology leaders. No spam, no vendor pitches—unsubscribe anytime.

Subscribe

Latest Articles

View All →