95% of AI Pilots Fail ROI: What the 5% Do Differently

MIT study reveals 95% of enterprise AI pilots deliver zero P&L impact. Here's the three-layer foundation that separates winners from the $675B spending spree.

By Rajesh Beri · May 16, 2026 · 6 min read

THE DAILY BRIEF

AI ROI · Enterprise AI · AI Strategy · Digital Transformation


Hyperscalers will spend $675 billion on AI infrastructure in 2026, up 63% from last year. Yet MIT research reveals that 95% of enterprise AI pilots deliver zero measurable P&L impact. The gap between AI spending and AI proof has become the defining tension of the current cycle.

The data keeps getting worse. S&P Global found that 42% of companies abandoned most of their AI projects in 2025, more than double the prior year. IBM's CEO study put successful AI initiatives at just 25%. Morgan Stanley reported that only 21% of S&P 500 companies could cite a measurable AI benefit at all.

Investors noticed before most boards did. Citi identified a 30 basis point credit spread penalty for companies spending on AI without evidence of return. Translation: The debt market is already charging a premium for AI theater. The market is pricing the measurement gap in both equity and debt.
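
To make the debt-market math concrete: a spread penalty flows straight into interest expense. A back-of-the-envelope sketch in Python (the $2 billion debt load is a hypothetical, not a figure from the Citi research):

```python
# Hypothetical illustration of what a 30 bps credit spread penalty costs.
# The debt amount is invented for the example; only the 30 bps figure
# comes from the Citi finding above.
debt_outstanding = 2_000_000_000   # $2B of debt outstanding (hypothetical)
spread_penalty_bps = 30            # Citi's observed penalty
extra_annual_interest = debt_outstanding * spread_penalty_bps / 10_000
print(f"Extra interest per year: ${extra_annual_interest:,.0f}")
# -> Extra interest per year: $6,000,000
```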

What Separates the 5% From the 95%

The companies pulling ahead didn't buy better models. They built three foundational layers underneath the technology before deploying it: measurement that proves whether AI tasks work, infrastructure that connects those tasks into automated workflows, and strategy that keeps the system learning.

The layers are sequential and nested. Most companies never built the first one, and the layers above collapsed as a result.

Layer 1: Measurement designed in, not bolted on. The fatal mistake happens at project kickoff. Most AI pilots launch without predefined success criteria, which means there's no way to declare success even if the technology performs exactly as designed. Early adopters tracked usage metrics — how many employees logged in, which teams had access. Those numbers were satisfying to report and completely irrelevant to the question that matters: Did the AI produce better outcomes than what it replaced?
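
One way to picture "measurement designed in" is a success contract written down before the pilot ships: each criterion names a P&L-linked metric, a baseline, and a target, and the pilot graduates to production only if the targets are met. A minimal sketch, assuming invented metric names and thresholds (none of these figures come from the companies cited):

```python
from dataclasses import dataclass

@dataclass
class SuccessCriterion:
    """A P&L-linked success test, agreed on before the pilot launches."""
    metric: str                  # ties to a financial-statement line, not a usage dashboard
    baseline: float              # measured performance of the process being replaced
    target: float                # the outcome that would justify scaling to production
    actual: float | None = None  # filled in after the pilot runs

    def passed(self) -> bool:
        return self.actual is not None and self.actual >= self.target

# Illustrative criteria for a hypothetical claims-processing pilot
criteria = [
    SuccessCriterion("cost_per_claim_reduction_pct", baseline=0.0, target=15.0, actual=18.2),
    SuccessCriterion("first_pass_accuracy_pct", baseline=91.0, target=95.0, actual=93.5),
]
print("Fund for production:", all(c.passed() for c in criteria))  # -> False
```

Here the hypothetical pilot beats its cost target but misses on accuracy, so it doesn't scale. That is exactly the kind of verdict that is impossible to reach when no criteria were set at kickoff.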

Matt Marze, CIO of New York Life Group Benefit Solutions, treats AI investments the same way the company evaluates all capital allocation. "We look at operating expense reduction, margin improvement, top-line revenue growth, customer satisfaction, and client retention, but at the end of the day it boils down to our earnings contribution." That P&L mindset forces discipline. Projects that can't articulate their revenue or margin impact don't get funded.

Layer 2: Infrastructure that connects tasks into workflows. Measurement reveals whether individual AI tasks work. Infrastructure determines whether those tasks can scale into business value. Roughly 80% of the work required to move from pilot to production is data engineering, governance, workflow integration, and measurement infrastructure. Most organizations never budgeted for that reality.

Palo Alto Networks CIO Meerah Rajavel is working toward automating 90% of IT operations with AI. The effort jumped from 12% automated in early 2024 to 75% by late 2024. The key wasn't the AI model — it was having modernized systems, cloud-native infrastructure, and strategic data management in place before deployment. "There is a readiness component to leveraging AI effectively," she explained. "You have to have modernized computing, modernized apps, and cloud-native solutions to take advantage of AI."

Layer 3: Strategy that keeps the system learning. The first two layers enable execution. The third layer creates compound advantage. New York Life prioritizes AI initiatives in areas with available data, systems, and skills, then uses returns from those projects to fund subsequent initiatives. The company designs reusable AI systems so that each new project launches faster and cheaper than the last.

This isn't accidental. It's strategy designed to turn AI into a learning system rather than a cost center. Most companies treat AI as a series of disconnected pilots. Leaders treat it as infrastructure that gets stronger with use.

The Shift From Productivity Theater to P&L Impact

For two years, "AI ROI" meant time saved and employee satisfaction scores. That era is over. Boards are asking how AI contributes to EBITDA. CFOs want to see margin improvement or revenue growth. Productivity gains, the argument that dominated the generative AI pilot phase, are no longer the leading success metric.

This shift explains why agentic AI is suddenly the priority. Unlike traditional AI tools that assist humans, agentic systems complete tasks independently and can be measured on outcomes rather than activity. Financial services firms are deploying agentic fraud detection systems with an average payback period of eight months. Sales agents are measured by signed contracts, not outreach volume. The focus moved from "did employees use it" to "did it move the P&L."
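
The payback framing is simple arithmetic: payback period equals upfront investment divided by monthly net benefit. A sketch with invented dollar figures (only the roughly eight-month benchmark comes from the article):

```python
# Hypothetical payback calculation for an agentic fraud-detection deployment.
# Dollar figures are invented; only the ~8-month benchmark is from the text above.
upfront_cost = 4_000_000        # build + integration cost (hypothetical)
monthly_net_benefit = 500_000   # fraud losses avoided minus run costs (hypothetical)
payback_months = upfront_cost / monthly_net_benefit
print(f"Payback: {payback_months:.1f} months")  # -> Payback: 8.0 months
```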

The market is already pricing this difference. Companies that score as dual leaders on measurement and infrastructure returned 41.38% over twelve months versus the S&P 500's 29.40% — a spread of nearly 1,200 basis points. Companies with only one layer trail the benchmark.
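
That spread is just the return difference restated in basis points, where one percentage point equals 100 bps:

```python
dual_leader_return = 41.38  # twelve-month return, %
sp500_return = 29.40        # S&P 500 over the same window, %
spread_bps = (dual_leader_return - sp500_return) * 100
print(f"Spread: {spread_bps:.0f} bps")  # -> Spread: 1198 bps
```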

Why Most AI Projects Never Had a Chance

The root cause isn't technological. Bank Director's 2025 survey of 141 directors at banks with under $100 billion in assets found that 82% don't measure ROI on any technology investment, not just AI. S&P Global's banking survey revealed that 91% of boards approved AI programs while only 26% had the capability to execute them.

Most AI failures trace back to organizational gaps — culture, governance, workflow design, data strategy — rather than model limitations. When 95% of pilots fail to deliver P&L impact, the problem isn't the technology. It's that organizations deployed pilots without the foundation needed to turn them into production systems.

Terminal X ran a twelve-report analysis across five sectors — financial services, defense/aerospace, healthcare, manufacturing/energy, and enterprise technology. The measurement gap in banking turned out to be the same gap in defense, healthcare, manufacturing, and enterprise tech. The structure is universal.

What Leaders Should Do Monday Morning

Stop measuring activity. Start measuring outcomes. If your AI project can't articulate its revenue or margin impact before deployment, don't fund it. Define success criteria that connect to financial statements, not usage dashboards.
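
Operationally this becomes a funding gate: no stated P&L impact, no budget. A hedged sketch of such a rule (the 25% ROI threshold and the field names are assumptions for illustration, not a policy from the article):

```python
def fund_decision(projected_annual_pnl_impact: float | None,
                  project_cost: float,
                  min_roi: float = 0.25) -> bool:
    """Fund only projects that state a P&L impact up front and clear an ROI bar.

    The 25% threshold is a hypothetical policy choice, not a figure
    from the article.
    """
    if projected_annual_pnl_impact is None:  # activity metrics only: no funding
        return False
    return (projected_annual_pnl_impact - project_cost) / project_cost >= min_roi

print(fund_decision(None, 1_000_000))       # -> False: no P&L case articulated
print(fund_decision(2_000_000, 1_000_000))  # -> True: 100% projected ROI
```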

Audit your infrastructure readiness. AI requires modernized data systems, cloud-native architecture, and workflow integration before it can deliver at scale. If you're running pilots on legacy infrastructure, you're running experiments, not deployments. Budget for the 80% of work that happens after the pilot proves the technology works.

Build a learning system, not a pilot farm. Prioritize AI in areas where your data, systems, and skills are already strong. Use returns from early wins to fund harder problems. Design reusable components so each new project launches faster than the last. Compound advantage comes from treating AI as infrastructure, not innovation theater.

Demand P&L accountability. The era of "soft ROI" is over. If your team can't explain how AI contributes to earnings, you're funding productivity theater while competitors build systems that scale. The 30 basis point credit spread penalty for AI spenders without proof of return isn't going away — it's expanding.

The companies that built measurement, infrastructure, and strategy layers before deploying AI are compounding their advantage. The rest are explaining to boards why $675 billion in spending produced zero P&L impact. The gap between those two outcomes is the entire story of enterprise AI in 2026.



THE DAILY BRIEF

Enterprise AI insights for technology and business leaders, twice weekly.

thedailybrief.com

Subscribe at thedailybrief.com/subscribe for twice-weekly AI insights delivered to your inbox.

LinkedIn: linkedin.com/in/rberi  |  X: x.com/rajeshberi

© 2026 Rajesh Beri. All rights reserved.
