Resolve AI just raised $40 million in a Series A extension at a $1.5 billion valuation, led by DST Global and Salesforce Ventures. That's the headline. Here's what matters for enterprise leaders: this company is betting big on a new category called "AI for production"—automating the work of diagnosing and fixing software when it breaks in complex enterprise environments. And they've already got Coinbase, DoorDash, MSCI, Salesforce, and a Fortune 500 security company as customers.
The company has raised more than $190 million in just 18 months since emerging from stealth. That rapid capital accumulation signals serious investor confidence in a market that's been burned by AI hype. But Resolve AI isn't selling vaporware—they're solving a problem every CTO and VP of Engineering knows intimately: production incidents are expensive, time-consuming, and getting worse as systems grow more complex.
Traditional observability tools like Datadog, Splunk, and New Relic tell you WHAT broke. Resolve AI automates figuring out WHY and HOW to fix it. That's the fundamental shift—from data collection to automated reasoning and action. And according to Meir Amiel, Salesforce's Chief Trust and Infrastructure Officer, "What used to take hours of manual investigation and coordination across teams now gets resolved in a fraction of the time."
The Production Operations Problem: Why Now?
Enterprise software environments have become fundamentally harder to operate. You've got microservices spread across multi-cloud infrastructure, fragmented telemetry (logs, metrics, traces, infrastructure events), constantly changing dependencies, and the AI-generated code explosion making things worse. Engineering teams spend more time fighting fires than building features.
The traditional playbook—hire more SREs, buy more observability tools—doesn't scale. According to a 2026 LogicMonitor survey, 66% of organizations run 2-3 observability platforms, and 18% run 4-5. That means duplicate data pipelines, overlapping capabilities, integration overhead, and context-switching hell during incidents. You're paying more for tools but getting slower incident resolution.
This is where AI for production comes in. Instead of dashboards that show you 10,000 metrics and expect humans to connect the dots, Resolve AI builds domain-specific AI models that reason across noisy telemetry, understand complex dependencies, and recommend or execute fixes automatically. Think of it as moving from "here's all the data" to "here's what's broken and here's how to fix it."
The ROI case is straightforward: production incidents cost real money. Engineering time is wasted on manual investigation, customer trust is eroded by downtime, and business continuity is disrupted by outages. If you can cut incident resolution from hours to a fraction of that time, as Salesforce's Meir Amiel described, you're talking about meaningful cost savings and competitive advantage.
What Makes Resolve AI Different?
Resolve AI is launching Resolve AI Labs to build domain-specific AI models for production operations. This is critical because general-purpose foundation models like GPT-4 or Claude weren't trained specifically on production telemetry. They aren't tuned for the nuances of log patterns, metric correlations, trace dependencies, or the operational constraints that make production different from development.
The company just hired Dhruv Mahajan as Chief AI Scientist—previously at Meta, where he led post-training for Llama foundation models. His job is to apply that large-scale model training expertise to building AI systems that work reliably in production environments. That means domain-specific post-training, evaluation frameworks for measuring accuracy in operational workflows, synthetic data generation, and governance guardrails for AI taking actions in production.
The technical challenges are legitimate. Production AI systems must reason across fragmented telemetry from dozens of sources, handle long-running multi-step workflows where mistakes have consequences, adapt to constantly changing systems, and meet strict requirements for accuracy, latency, and reliability. Off-the-shelf LLMs can't do this reliably—hence the investment in custom models.
Founded by observability pioneers Spiros Xanthos and Mayank Agarwal, Resolve AI combines custom AI models, production-specific agents, and deep systems expertise. This isn't a ChatGPT wrapper slapped on top of existing tools. It's purpose-built infrastructure for a new operational paradigm.
What Enterprise Leaders Should Watch
1. Cost Consolidation Opportunity: If you're running multiple observability platforms (Datadog + Splunk + New Relic + PagerDuty), you're paying for overlapping capabilities. AI-native platforms like Resolve AI promise to consolidate incident detection, diagnosis, and resolution into a single workflow—potentially reducing tool sprawl and associated costs.
2. Engineering Productivity Gains: The Salesforce quote about cutting hours of manual investigation to a fraction of the time isn't marketing fluff; it describes a real operational metric. If your SRE teams spend 30-40% of their time on incident response, cutting that by 50-70% through automation frees roughly 15-28% of total team time (30% × 50% at the low end, 40% × 70% at the high end) for strategic work.
3. Vendor Lock-In Considerations: Domain-specific AI models trained on your production telemetry create switching costs. That's both a feature (better accuracy over time) and a risk (harder to migrate). Ask about data portability and model export capabilities before committing.
4. Accuracy and Trust Thresholds: AI that takes automated actions in production needs to be RIGHT. Ask vendors for evaluation metrics, false positive rates, and governance controls. Shadow mode deployment (AI recommendations run alongside human workflows for validation) should be table stakes.
5. Competitive Landscape: Watch how incumbents respond. Datadog, Splunk (now Cisco), New Relic, and PagerDuty all have AI initiatives. The question is whether they can retrofit AI into legacy architectures or if AI-native startups like Resolve AI have an architectural advantage.
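The evaluation metrics from item 4 can be made concrete with a small scoring script for a shadow-mode pilot. What follows is a minimal sketch, not any vendor's method: the `ShadowRecord` fields, the metric names, and the sample incidents are all hypothetical, and it assumes that "correct" simply means the AI's root cause matches what the on-call engineers concluded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ShadowRecord:
    """One incident observed in shadow mode (all fields are hypothetical)."""
    incident_id: str
    ai_root_cause: Optional[str]  # None means the AI made no recommendation
    human_root_cause: str         # what the on-call engineers concluded


def score_shadow_run(records):
    """Compare AI recommendations with the human conclusions for each incident."""
    total = len(records)
    answered = [r for r in records if r.ai_root_cause is not None]
    correct = [r for r in answered if r.ai_root_cause == r.human_root_cause]
    wrong = len(answered) - len(correct)
    return {
        # Share of incidents where the AI offered any recommendation
        "coverage": len(answered) / total if total else 0.0,
        # Share of AI recommendations that matched the human conclusion
        "precision": len(correct) / len(answered) if answered else 0.0,
        # Share of all incidents where the AI recommended the wrong cause
        "wrong_recommendation_rate": wrong / total if total else 0.0,
    }


# Illustrative data only; not taken from any real deployment.
sample = [
    ShadowRecord("inc-1", "bad_deploy", "bad_deploy"),
    ShadowRecord("inc-2", "db_saturation", "db_saturation"),
    ShadowRecord("inc-3", "dns_failure", "expired_certificate"),
    ShadowRecord("inc-4", None, "bad_deploy"),
]
scores = score_shadow_run(sample)
```

Tracking these numbers over the pilot period gives a threshold to agree on before any action is automated, for example requiring a minimum precision for a given class of incident.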
The Broader Trend: AI Eating Operations
Resolve AI is part of a larger shift toward autonomous IT operations. Industry analysts forecast that by 2028, 40% of enterprise IT operations tasks will be fully automated through AI. That includes not just incident response, but capacity planning, cost optimization, security threat detection, and configuration management.
The 2026 observability market is consolidating around AI-native platforms. Trends include predictive analytics (catching issues before they cause outages), unified data platforms (breaking down tool silos), and autonomous operations (AI taking action without human intervention). Resolve AI is positioned at the intersection of all three.
For CFOs and business leaders, this matters because IT operations directly impacts revenue. Every minute of downtime costs money. Every engineer stuck debugging production issues instead of building features is opportunity cost. AI that makes operations faster, cheaper, and more reliable translates to better margins and competitive positioning.
For CTOs and technical leaders, the strategic question is whether to build or buy. Can you assemble this capability in-house with foundation models + custom tooling? Or do the domain-specific model training, operational expertise, and proven customer deployments justify buying from specialists like Resolve AI? There's no universal answer: it depends on your scale, engineering talent, and strategic priorities.
What to Do Next
If you're a CTO or VP Engineering at an enterprise running complex production environments:
- Audit your current incident response workflow. How long does it take to detect, diagnose, and resolve typical incidents? Where are the bottlenecks? What percentage of engineering time goes to firefighting vs. feature development?
- Evaluate your observability tool sprawl. How many platforms are you paying for? Are there overlapping capabilities? What would consolidation look like?
- Run a pilot with AI-native incident management. Shadow mode deployment (AI recommendations run alongside human workflows) lets you measure accuracy and build trust before automating actions. Look for vendors offering free trials or POCs.
- Invest in unified telemetry. AI works best with comprehensive, high-quality data. If your logs, metrics, and traces are fragmented across silos, start building connectors and standardized data pipelines.
- Set governance guardrails for production AI. What actions can AI take autonomously? What requires human approval? How do you audit AI decisions? These policies should be in place before deployment, not retrofitted later.
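The three governance questions above (what runs autonomously, what needs approval, how decisions are audited) can be written down as an explicit policy before any tool is deployed. The sketch below is one possible shape for such a policy, under stated assumptions: the class, the action names such as `restart_pod`, and the service names are hypothetical and do not correspond to any vendor's API. It assumes a default-deny stance, where any action not explicitly listed is refused.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"
    DENY = "deny"


@dataclass
class GuardrailPolicy:
    """Hypothetical policy: which AI-proposed actions may run without a human."""
    autonomous_actions: frozenset
    approval_actions: frozenset
    audit_log: list = field(default_factory=list)

    def evaluate(self, action, target):
        if action in self.autonomous_actions:
            decision = Decision.ALLOW
        elif action in self.approval_actions:
            decision = Decision.NEEDS_APPROVAL
        else:
            decision = Decision.DENY  # default-deny anything not listed
        # Record every decision, including denials, so it can be audited later.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "action": action,
            "target": target,
            "decision": decision.value,
        })
        return decision


# Illustrative action names only.
policy = GuardrailPolicy(
    autonomous_actions=frozenset({"restart_pod"}),
    approval_actions=frozenset({"rollback_deploy"}),
)
first = policy.evaluate("restart_pod", "checkout-service")
second = policy.evaluate("rollback_deploy", "checkout-service")
third = policy.evaluate("drop_table", "orders-db")
```

Keeping the policy as data rather than scattered conditionals makes it easier to review with security and compliance teams, and the audit log answers the "how do you audit AI decisions" question from day one.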
The bottom line: AI for production is a nascent category, but the fundamentals are sound. Production incidents cost money. Manual investigation doesn't scale. Domain-specific AI can automate diagnosis and resolution. Whether Resolve AI specifically wins this market or incumbents catch up remains to be seen—but the trend toward autonomous operations is inevitable.
For enterprise leaders, the question isn't whether AI will operate production systems—it's whether you'll adopt early and gain competitive advantage, or wait until competitors force your hand.
Source: Resolve AI Official Announcement
Disclosure: Content reflects industry analysis based on publicly available information. No financial relationship with Resolve AI or competitors.