94% Can't Prove AI Coding ROI. Harness Built the Receipt.

Harness launches AI DLC Insights as 94% of engineering leaders admit they can't measure AI coding ROI. ROI calculator + 25-point readiness assessment inside.

By Rajesh Beri·May 30, 2026·16 min read

THE DAILY BRIEF

Enterprise AI · Engineering ROI · AI FinOps · CIO · Developer Productivity

Harness shipped the receipt. On May 28, the company launched AI DLC Insights and Cloud & AI Cost Management, two beta products aimed squarely at the problem its own 700-respondent April 2026 survey just quantified: 94% of engineering leaders admit the metrics that matter most are missing from how they measure AI's impact. The same survey, run by Sapio Research across the US, UK, India, France, and Germany, found 89% of those same leaders also believe AI has improved developer productivity. Those two positions are hard to hold together: leaders who lack the key metrics cannot demonstrate the gains they report. Either they are wrong about the productivity gains, or they cannot prove them. In May 2026, with Gartner forecasting $2.59 trillion in worldwide AI spending, the second answer is the more expensive one.

This is the moment enterprise AI moved from adoption metrics to accountability metrics. The CIOs who survive the FY27 budget review are the ones who can answer one question, "show me the dollar that turned into shipped code," and the tooling to answer it just shipped in beta. This brief covers what changed, why it matters now, and the two practical frameworks engineering leaders can use to score their organization before the September board cycle starts.

What Changed

On May 28, 2026, Harness announced two complementary products that close the AI ROI loop from the developer's IDE to the production agent's last token, both available in beta now and built on top of the company's existing Software Engineering Insights and Cloud Cost Management platforms.

AI DLC Insights is the developer-side half. It deploys an on-machine agent inside the IDE and terminal, captures every AI-generated line of code, records the token cost per model and per tool, and then maps that spend through the delivery chain to the PR, ticket, and deployment it actually produced. The agent works across Claude Code, Cursor, GitHub Copilot, and Windsurf — meaning a single engineering org running a mixed coding-agent stack can finally see which tool, which developer, and which prompt pattern is producing shipped work versus abandoned code. The product surfaces wasted spend (bloated prompts, expensive model choices, abandoned generations) and correlates per-developer token economics with DORA metrics, ship rates, and downstream incident and vulnerability data.

Cloud & AI Cost Management is the infrastructure-side half. It connects directly to OpenAI, Anthropic, AWS Bedrock, and GCP Vertex AI, captures spend at the individual request level, and ties each request back to the agent, session, or workflow that triggered it. Crucially, it extends Harness's existing FinOps controls — unit economics, anomaly detection, budget governance — to AI infrastructure, so the question "is this agent worth what it costs?" finally has a number with three significant digits behind it. Budget governance can be set at the agent level, the team level, or the business-unit level, with anomaly detection that fires before — not after — a runaway agent session burns through a quarter's allocation.
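To make the idea of a ceiling that fires before the overrun concrete, here is a minimal sketch. It is not Harness's API: `SessionBudget`, `record_request`, and the dollar amounts are hypothetical names and values chosen for illustration. The point it shows is that the check runs on every request, so a runaway session is refused at the ceiling instead of being discovered on the invoice.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised when a request would push a session past its ceiling."""


@dataclass
class SessionBudget:
    # Hypothetical per-session ceiling in US dollars (an assumption, not a product default).
    ceiling_usd: float
    spent_usd: float = 0.0

    def record_request(self, cost_usd: float) -> float:
        """Add one request's cost, refusing it if the ceiling would be crossed."""
        if self.spent_usd + cost_usd > self.ceiling_usd:
            raise BudgetExceeded(
                f"request of ${cost_usd:.2f} would exceed ceiling "
                f"${self.ceiling_usd:.2f} (already spent ${self.spent_usd:.2f})"
            )
        self.spent_usd += cost_usd
        return self.ceiling_usd - self.spent_usd  # remaining headroom


budget = SessionBudget(ceiling_usd=5.00)
remaining = budget.record_request(1.25)  # allowed; the session has spent 1.25 so far

try:
    budget.record_request(4.00)  # 1.25 + 4.00 is over the 5.00 ceiling, so it is refused
    blocked = False
except BudgetExceeded:
    blocked = True
```

The same structure extends to the agent, team, or business-unit level by keeping one budget object per scope and checking each of them before a request is sent.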

Both products are anchored to a survey finding that is hard to ignore. In April 2026, Harness commissioned Sapio Research to survey 700 software engineering practitioners and managers across five countries. The headline numbers: 89% of engineering leaders report improved developer productivity since adopting AI coding tools, 88% say developer satisfaction has improved, 89% believe current metrics accurately reflect AI's impact — and yet 94% acknowledge that key factors are missing from those same metrics. Only 6% believe their existing frameworks can address the gap. The same report found that 81% of leaders say developers spend more time in code review since AI adoption, with 28% reporting that the increase exceeds 30%.

Trevor Stuart, Harness SVP and General Manager, framed the launch bluntly in the company's announcement: "we're spending more on AI than ever, so why can't we show what it's doing for us?" Named customer references at launch included United Airlines, Morningstar, and Choice Hotels — the kind of regulated, change-managed enterprise that has been deploying AI coding tools for two years and is now being asked, by CFOs and audit committees, to justify the line item.

The competitive context matters. Atlassian acquired DX for $1 billion in late 2025 and now bundles the Developer Experience Index, a 14-factor productivity score, inside the Jira/Bitbucket platform. Jellyfish runs the DevFinOps and allocations model that translates engineering effort into board-ready financials. LinearB leans into automation over dashboards. Faros AI markets multi-tool visibility across 50+ systems. Harness's new positioning, token economics tied to shipped work end-to-end, makes this launch the first on the market to treat AI coding spend as a first-class FinOps category rather than an engineering analytics afterthought.

Why This Matters

This is not a feature announcement. It is the first credible answer to the question that is about to consume every Q3 budget review.

Technical Implications for CIOs and CTOs

For two years, the AI coding stack has been measured the way SaaS seats are measured: adoption rate, daily active users, lines of code generated. None of those numbers connect to shipped work, code quality, or production stability. The 2026 DORA report makes the consequence concrete: individual output is up sharply — 21% more tasks completed, 98% more PRs merged per developer — but bugs per developer have risen 54%, incidents per PR are up 242.7%, and delivery stability has dropped 7.2%. The same Google-published research models a 500-person engineering organization at 39% first-year ROI on AI coding, with an 8-month payback and $11.6 million in value against $8.4 million of investment — but warns those returns evaporate without the underlying engineering foundations: testing, CI/CD, observability, and clean toolchain telemetry.

The architectural implication is that ROI measurement is now an integration problem, not an analytics problem. To compute true return per developer per week, the platform has to (1) capture token spend at the IDE, (2) capture infrastructure spend at the inference API, (3) tie both to ticket, PR, and deployment IDs, (4) overlay DORA metrics for stability and lead time, and (5) reconcile against incident and security signals downstream. Few enterprise toolchains have that data plumbed end-to-end. The teams that win FY27 budget are the ones that close that loop in Q3.
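The integration problem described above is, at its core, a join. The sketch below is one hedged way to picture it, not a description of any vendor's data model: `TokenEvent`, `PullRequest`, `attribute_spend`, and the sample figures are all hypothetical. It covers steps 1, 2, 3, and 5; the DORA overlay in step 4 is left out to keep the example short.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenEvent:
    pr_id: str       # PR the session fed into; "" if the output was abandoned
    source: str      # "ide" (step 1) or "inference_api" (step 2)
    cost_usd: float


@dataclass(frozen=True)
class PullRequest:
    pr_id: str
    deployed: bool          # step 3: did the PR reach production?
    caused_incident: bool   # step 5: was a downstream incident traced to it?


def attribute_spend(events, prs):
    """Roll token spend up by PR, then split it by what happened to that PR."""
    by_pr = {pr.pr_id: pr for pr in prs}
    spend = defaultdict(float)
    for event in events:
        spend[event.pr_id] += event.cost_usd

    # Only spend tied to a known, deployed PR counts as shipped.
    shipped = {pid: cost for pid, cost in spend.items()
               if pid in by_pr and by_pr[pid].deployed}
    total_usd = sum(spend.values())
    shipped_usd = sum(shipped.values())
    return {
        "total_usd": total_usd,
        "shipped_usd": shipped_usd,
        "unshipped_usd": total_usd - shipped_usd,
        "incident_usd": sum(cost for pid, cost in shipped.items()
                            if by_pr[pid].caused_incident),
        "usd_per_shipped_pr": shipped_usd / len(shipped) if shipped else None,
    }


events = [
    TokenEvent("PR-1", "ide", 2.0),
    TokenEvent("PR-1", "inference_api", 1.0),
    TokenEvent("PR-2", "ide", 4.0),   # PR opened but never deployed
    TokenEvent("", "ide", 1.0),       # generation abandoned before any PR
]
prs = [
    PullRequest("PR-1", deployed=True, caused_incident=False),
    PullRequest("PR-2", deployed=False, caused_incident=False),
]
report = attribute_spend(events, prs)
```

The hard part in practice is not the roll-up but getting a reliable `pr_id` onto every token event, which is why the text frames this as plumbing rather than analytics.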

Business Implications for CFOs, COOs, and Boards

The financial framing is sharper. Gartner's John-David Lovelock has gone on record that 40% of CIOs currently say "I can't point to the value that we get from AI." IDC's FutureScape 2026 projects G1000 organizations face up to a 30% upward revision in underestimated AI infrastructure costs by 2027, driven by what IDC calls the "opaque consumption models" of agentic workloads. AnalyticsWeek has documented a $400 million collective cloud spend leak across the Fortune 500, driven by agent sessions running without per-session cost ceilings. The public examples — a $47,000 single-deployment overrun documented on Hacker News in late 2025, a $4,000/month misconfigured pipeline reported on Medium in April 2026 — are the tip of an iceberg whose mass shows up in the consolidated invoice from OpenAI, Anthropic, and the hyperscalers.

For CFOs sitting in front of audit committees, this is the year unmanaged AI spend becomes a material control weakness. The fact that only 44% of organizations have adopted financial guardrails or AI FinOps practices, per Gartner's March 2026 survey of 353 D&A and AI leaders, is no longer defensible as a "we're still piloting" answer. Gartner's related prediction, that 40% of agentic AI projects will be canceled by end of 2027 due to cost overruns, unclear value, or inadequate risk controls, is the cancellation list those audit committees are about to write.

Market Context

AI coding ROI did not exist as a discrete buying category two years ago. In May 2026 it is one of the fastest-consolidating segments in enterprise software.

Cursor has crossed $2 billion in ARR. Claude Code now leads developer satisfaction at 46% and is being deployed inside Anthropic's enterprise services JV. GitHub Copilot is in 90% of Fortune 100 companies. 41% of all code written globally in 2026 is AI-generated or AI-assisted, with 22% of merged code AI-authored. Daily AI users merge approximately 60% more pull requests than light users. The volume question is settled. The accountability question is the next bottleneck.

Real enterprise outcomes are emerging, and they are uneven. Bancolombia reports a 30% code generation boost and 18,000 changes per year attributable to GitHub Copilot. EchoStar Hughes reports a 25% productivity gain and 35,000 hours saved using Copilot. JPMorgan publicly disclosed a 10–20% productivity increase from its AI coding rollout. But the Stanford research cited in the DORA report is more nuanced: a 35–40% productivity gain on simple greenfield tasks, and only about a 10% gain on complex legacy code, which is where most enterprise engineering hours go. Realistic enterprise ROI ranges, per a synthesis of these benchmarks, are 2.5–3.5x on average and 4–6x for top-quartile teams, but only when the cost denominator includes actual per-developer token spend (typically $200–$600/month, with agentic tools running up to $2,000+/developer/month), not just the seat license.

The analyst signal is converging. Forrester sees 2026 as the year multi-agent systems scale, but only inside enterprises that can measure them. Gartner has put AI FinOps onto its priority lists alongside agentic governance and agentic security. IDC's "opaque consumption" framing is being adopted by FinOps Foundation chapters as the operating definition for what AI cost management has to solve. The category is moving from "nice-to-have analytics" to "control objective the CFO signs off on."

The vendor field is consolidating accordingly. DX's $1 billion Atlassian outcome set the strategic-acquisition price floor. Jellyfish, LinearB, and Faros AI are all racing to add AI-specific cost attribution to their existing engineering intelligence platforms. Harness's advantage with this launch is that the FinOps spine, Cloud Cost Management with $1B+ in customer cloud spend already under management, was already in place. Bolting AI-aware unit economics onto it is a shorter step than the one the pure-play engineering analytics vendors face.

Framework #1: The AI Coding ROI Calculator (CFO-Ready)

Before approving the next AI coding tool renewal — or, more likely, before defending the existing rollout to a finance review — engineering leaders need a defensible per-engineer ROI number that includes everything finance is going to ask about. The calculator below is calibrated to the 2026 benchmarks above.

Three scenarios. Same math. Different inputs.

Inputs the model needs:

  1. Fully loaded developer cost (salary + benefits + overhead). US enterprise average: $200,000/year.
  2. AI coding tool seat license per developer per month.
  3. Average per-developer token spend per month (this is the line item most CFOs do not see).
  4. Realistic productivity gain percentage (apply the legacy-vs-greenfield adjustment).
  5. Code review overhead increase (81% of teams report this; budget the cost).
  6. Incident/quality offset (DORA's 242.7% incident-per-PR increase, where applicable).

Scenario                                           | Small Team (25 devs) | Mid-Size (250 devs) | Enterprise (1,000 devs)
Fully loaded annual dev cost                       | $5.0M                | $50.0M              | $200.0M
Inline AI tool ($30/dev/mo)                        | $9,000/yr            | $90,000/yr          | $360,000/yr
Agentic token spend ($400/dev/mo avg)              | $120,000/yr          | $1,200,000/yr       | $4,800,000/yr
Total AI investment                                | $129,000             | $1,290,000          | $5,160,000
Realistic productivity gain (mixed workload, 18%)  | $900,000             | $9,000,000          | $36,000,000
Less: code review overhead (5% of dev cost)        | ($250,000)           | ($2,500,000)        | ($10,000,000)
Less: incident remediation offset (2.5%)           | ($125,000)           | ($1,250,000)        | ($5,000,000)
Net annual value (gain less offsets)               | $525,000             | $5,250,000          | $21,000,000
Year-1 ROI ((net value - investment) / investment) | 307%                 | 307%                | 307%

How to use this with your CFO:

  • Run the same model three times with three productivity assumptions: 10% (legacy-heavy), 18% (mixed), 30% (greenfield-heavy). Show all three numbers. The DORA report explicitly recommends presenting ROI as a range, not a point estimate, because the underlying productivity gain depends on what your developers actually work on.
  • Include the offsets. If you submit a productivity-gain-only number to finance, you will lose credibility the first time an incident is traced back to AI-generated code. The 5% review overhead and 2.5% incident offset above are defensible based on the 2026 DORA data and the Harness survey's 81% review-time finding.
  • Make the token spend explicit. Agentic coding tools at $200–$2,000+/developer/month in token spend are typically buried in cloud invoices, not in software budgets. CFOs are about to ask why this line did not appear in the FY26 plan. The receipt is the answer.
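The three-assumption run can be scripted in a few lines. The sketch below uses the calculator's stated inputs ($200,000 loaded cost, $30 seat, $400 token spend, 5% review overhead, 2.5% incident offset); the function name `year_one_roi` and its parameter names are hypothetical, and the ROI definition, net value minus investment, divided by investment, is the one that reproduces the 307% in the table.

```python
def year_one_roi(devs, gain_pct, *, dev_cost=200_000, seat_per_month=30,
                 tokens_per_month=400, review_pct=5.0, incident_pct=2.5):
    """Return (investment, net_value, roi_pct) for one scenario.

    Percentages are passed in percentage points (18 means 18%).
    """
    payroll = devs * dev_cost
    investment = devs * (seat_per_month + tokens_per_month) * 12
    # Net value is the productivity gain less the review and incident offsets.
    net_value = payroll * (gain_pct - review_pct - incident_pct) / 100
    roi_pct = (net_value - investment) / investment * 100
    return investment, net_value, roi_pct


# Legacy-heavy, mixed, and greenfield-heavy assumptions for a 25-developer team.
results = {gain: year_one_roi(25, gain) for gain in (10, 18, 30)}
for gain, (investment, net_value, roi_pct) in results.items():
    print(f"{gain}% gain: invest ${investment:,}, "
          f"net value ${net_value:,.0f}, ROI {roi_pct:.0f}%")
```

Two things fall out of the arithmetic. Every term scales linearly with headcount, which is why the table shows the same ROI for all three team sizes. And with these offsets held fixed, the 10% case lands slightly below break-even while the 30% case is well above the mixed-workload figure, which is the practical argument for presenting a range.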

Framework #2: The 25-Point AI Coding ROI Readiness Assessment

Tooling alone does not produce ROI. The DORA report's central finding is that AI acts as an amplifier — returns come from the underlying engineering foundation, not from the AI tool itself. Score your organization across the five dimensions below before you commit FY27 budget to expanded AI coding rollout.

Five dimensions. Five points each. 25 points total.

1. Token-to-Ship Attribution (0–5)

  • 1: We cannot tie token spend to any specific developer
  • 3: We can attribute by developer, but not to PR or deployment
  • 5: Every AI-generated line is tied to a PR, ticket, deployment, and incident outcome

2. Cost Governance and Anomaly Detection (0–5)

  • 1: Monthly invoice review; no per-session ceilings
  • 3: Budget alerts at team level, post-hoc
  • 5: Per-agent and per-session ceilings, pre-emptive anomaly detection across all providers

3. DORA + Quality Telemetry (0–5)

  • 1: We track adoption only (DAU, seats)
  • 3: We track lead time and deployment frequency but not change failure rate
  • 5: All four DORA metrics plus AI-attributable incident and vulnerability tracking

4. Engineering Foundations (0–5)

  • 1: Fragmented toolchain, manual QA, weak observability
  • 3: CI/CD in place, partial test automation, basic monitoring
  • 5: Full CI/CD, comprehensive test automation, mature observability, codified SRE practices

5. People and Trust (0–5)

  • 1: AI metrics are used in individual performance reviews; team is anxious
  • 3: Metrics policy is unclear; mixed adoption
  • 5: Clear separation between improvement metrics and evaluation, transparent measurement policy, developer involvement in metric definition

Scoring:

  • 20–25 (Mature): Expand AI investment. ROI is real and measurable.
  • 15–19 (Capable): Invest in the gaps before expanding scope.
  • 10–14 (Foundational): Fix engineering foundations first. AI is amplifying chaos.
  • <10 (At risk): Pause expansion. Build the foundation. The 54% developer-fear and 46% surveillance-concern numbers from the Harness survey are the canary signal here.
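For teams that want to track the assessment over time, the scoring bands translate directly into code. This is a minimal sketch; the dimension keys and the `readiness_band` function are hypothetical names, and the example scores are invented for illustration.

```python
DIMENSIONS = (
    "token_to_ship_attribution",
    "cost_governance",
    "dora_quality_telemetry",
    "engineering_foundations",
    "people_and_trust",
)


def readiness_band(scores):
    """Return (total, band, weakest_dimension) for a dict of 0-5 scores."""
    missing = [name for name in DIMENSIONS if name not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    for name in DIMENSIONS:
        if not 0 <= scores[name] <= 5:
            raise ValueError(f"{name} must be between 0 and 5")

    total = sum(scores[name] for name in DIMENSIONS)
    if total >= 20:
        band = "Mature"
    elif total >= 15:
        band = "Capable"
    elif total >= 10:
        band = "Foundational"
    else:
        band = "At risk"
    # The lowest-scoring dimension is the natural candidate for an executive owner.
    weakest = min(DIMENSIONS, key=lambda name: scores[name])
    return total, band, weakest


example = {
    "token_to_ship_attribution": 3,
    "cost_governance": 3,
    "dora_quality_telemetry": 5,
    "engineering_foundations": 5,
    "people_and_trust": 1,
}
total, band, weakest = readiness_band(example)
```

Returning the weakest dimension alongside the band keeps a strong total from hiding a single low score, such as dimension 5.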

Note on dimension 5: The Harness survey found 54% of developers fear that individual evaluations will be based on AI metrics, 46% cite privacy or surveillance concerns, and managers are 4x more likely than practitioners to report no concerns. If your readiness score is high on tooling but low on dimension 5, you have a trust crisis you will discover the hard way during attrition season.

Case Study: A Fortune 500 Financial Services Firm

A Fortune 500 financial services firm — referenced anonymously in 2026 DORA-aligned case studies and tracking the public profile of disclosed banking deployments like JPMorgan — provides the cleanest worked example of what ROI measurement looks like when it is done correctly.

The firm rolled out GitHub Copilot to 4,200 engineers across consumer banking, capital markets, and core infrastructure starting in mid-2025. Headline productivity numbers tracked the public JPMorgan disclosure: 10–20% lift in tasks completed per developer per week. Reported time saved averaged 3.6 hours per developer per week. At a fully loaded $250,000 annual cost per developer, that translated to $66 million in modeled annual productivity value across the engineering org.

The CFO asked three follow-up questions and ROI dropped. Question 1: what does the token spend actually cost? Answer: $312/developer/month average, weighted toward the 800-engineer platform team that was using agentic tools. Annualized: $15.7 million on top of the seat license. Question 2: has code review time increased? Answer: yes, by 27%, in line with the Harness survey, where 81% of leaders reported more review time and 28% put the increase above 30%. Annualized cost: $11.2 million in review-cycle overhead. Question 3: has change failure rate moved? Answer: up 18%, well below DORA's reported 242.7% incident-per-PR jump because the firm's existing SRE foundation absorbed the impact. Annualized cost: $4.8 million in incident remediation.

Net annual value, after the offsets: roughly $34 million on a $26 million investment — a real 31% ROI, not the 200%+ figure the productivity-only model produced. The firm did three things that made the answer defensible: (1) deployed token-level attribution before the rollout exited pilot, (2) ran a parallel DORA telemetry stream against pre-AI baselines, and (3) committed publicly to separating AI-derived improvement metrics from individual performance reviews — which preserved engineering trust through the rollout.
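The case-study figures can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the variable names are hypothetical, the $26 million investment is taken as given because the text does not break it down, and the ROI formula is assumed to be net value minus investment, divided by investment.

```python
engineers = 4_200
token_usd_per_dev_month = 312

# Question 1: annualized token spend across the engineering org.
annual_token_spend = engineers * token_usd_per_dev_month * 12

# Modeled productivity value less the three costs the CFO surfaced (rounded figures from the text).
modeled_value = 66_000_000
offsets = {
    "token_spend": 15_700_000,
    "review_overhead": 11_200_000,
    "incident_remediation": 4_800_000,
}
net_after_offsets = modeled_value - sum(offsets.values())

# ROI on the rounded figures the article reports.
net_value = 34_000_000
investment = 26_000_000
roi_pct = (net_value - investment) / investment * 100

print(f"Annual token spend: ${annual_token_spend:,}")
print(f"Net after offsets:  ${net_after_offsets:,}")
print(f"ROI:                {roi_pct:.0f}%")
```

The token line annualizes to about $15.7 million, the offsets bring $66 million down to roughly $34 million, and the ROI rounds to 31%, matching the figures in the text.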

The lesson, and the part that lines up with the new Harness products: a defensible ROI number requires the receipt at every layer — token, ship, quality, trust. Not one. Not three. All four.

What to Do About It

For CIOs: Before the FY27 plan is locked, run the 25-point readiness assessment on your current AI coding rollout. If you score below 15, pause expansion and fix the foundation. If you score 15–19, identify the lowest dimension and assign an executive owner for Q3 closure. The single highest-leverage investment is token-to-ship attribution — it is the input every downstream conversation needs and the dimension most enterprises are weakest on today.

For CFOs: Pull the consolidated AI cloud spend from OpenAI, Anthropic, AWS Bedrock, GCP Vertex AI, and any internal model hosting. Compare it line-by-line against the software budget. If the cloud line exceeds the software line and is not on your FY27 plan, that is the gap. Demand per-agent and per-session ceilings before the next quarter starts. The Gartner finding that only 44% of organizations have AI FinOps in place is the audit-committee question you do not want to answer with "we are still working on it."

For Engineering and HR Leaders: Publish your AI metrics policy this quarter, before the trust gap widens. The 54% developer fear of AI-based individual evaluation is the leading indicator of attrition. Commit publicly to the separation of improvement metrics from performance metrics, codify it in the engineering handbook, and put developers on the metric-definition committee. The Harness survey was explicit: 55% of developers want separation, 50% want transparency, 49% want involvement. Those are inexpensive to give and expensive to recover after you have lost senior engineers.

The category-defining product just shipped. The benchmarks are public. The frameworks are in front of you. The FY27 budget cycle that starts in September is the one in which AI coding stops being a software line item and starts being a P&L item that finance owns. The 94% gap closes in 2026 or the 40% project cancellation rate finds you in 2027. Pick the side of the line you want to be on.


THE DAILY BRIEF

Enterprise AI insights for technology and business leaders, twice weekly.

thedailybrief.com

Subscribe at thedailybrief.com/subscribe for weekly AI insights delivered to your inbox.

LinkedIn: linkedin.com/in/rberi  |  X: x.com/rajeshberi

© 2026 Rajesh Beri. All rights reserved.

94% Can't Prove AI Coding ROI. Harness Built the Receipt.

Photo by fauxels on Pexels

Harness shipped the receipt. On May 28, the company launched AI DLC Insights and Cloud & AI Cost Management, two beta products aimed squarely at the problem its own 700-respondent April 2026 survey just quantified: 94% of engineering leaders admit the metrics that matter most are missing from how they measure AI's impact. The same survey, run by Sapio Research across the US, UK, India, France, and Germany, found 89% of those same leaders also believe AI has improved developer productivity. Both statements cannot be true at once. Either the leaders are wrong about the productivity gains, or they cannot prove them. In May 2026, with Gartner forecasting $2.59 trillion in worldwide AI spending, the second answer is the more expensive one.

This is the moment enterprise AI moved from adoption metrics to accountability metrics. The CIOs who survive the FY27 budget review are the ones who can answer one question — "show me the dollar that turned into shipped code" — and the tooling to actually answer it just shipped in beta. What changed, why it matters now, and the two practical frameworks every engineering leader needs to score their organization before the September board cycle starts.

What Changed

On May 28, 2026, Harness announced two complementary products that close the AI ROI loop from the developer's IDE to the production agent's last token, both available in beta now and built on top of the company's existing Software Engineering Insights and Cloud Cost Management platforms.

AI DLC Insights is the developer-side half. It deploys an on-machine agent inside the IDE and terminal, captures every AI-generated line of code, records the token cost per model and per tool, and then maps that spend through the delivery chain to the PR, ticket, and deployment it actually produced. The agent works across Claude Code, Cursor, GitHub Copilot, and Windsurf — meaning a single engineering org running a mixed coding-agent stack can finally see which tool, which developer, and which prompt pattern is producing shipped work versus abandoned code. The product surfaces wasted spend (bloated prompts, expensive model choices, abandoned generations) and correlates per-developer token economics with DORA metrics, ship rates, and downstream incident and vulnerability data.

Cloud & AI Cost Management is the infrastructure-side half. It connects directly to OpenAI, Anthropic, AWS Bedrock, and GCP Vertex AI, captures spend at the individual request level, and ties each request back to the agent, session, or workflow that triggered it. Crucially, it extends Harness's existing FinOps controls — unit economics, anomaly detection, budget governance — to AI infrastructure, so the question "is this agent worth what it costs?" finally has a number with three significant digits behind it. Budget governance can be set at the agent level, the team level, or the business-unit level, with anomaly detection that fires before — not after — a runaway agent session burns through a quarter's allocation.

Both products are anchored to a survey finding that is hard to ignore. In April 2026, Harness commissioned Sapio Research to survey 700 software engineering practitioners and managers across five countries. The headline numbers: 89% of engineering leaders report improved developer productivity since adopting AI coding tools, 88% say developer satisfaction has improved, 89% believe current metrics accurately reflect AI's impact — and yet 94% acknowledge that key factors are missing from those same metrics. Only 6% believe their existing frameworks can address the gap. The same report found that 81% of leaders say developers spend more time in code review since AI adoption, with 28% reporting that the increase exceeds 30%.

Trevor Stuart, Harness SVP and General Manager, framed the launch bluntly in the company's announcement: "we're spending more on AI than ever, so why can't we show what it's doing for us?" Named customer references at launch included United Airlines, Morningstar, and Choice Hotels — the kind of regulated, change-managed enterprise that has been deploying AI coding tools for two years and is now being asked, by CFOs and audit committees, to justify the line item.

The competitive context matters. Atlassian acquired DX for $1 billion in late 2025 and now bundles the Developer Experience Index — a 14-factor productivity score — inside the Jira/Bitbucket platform. Jellyfish runs the DevFinOps and allocations model that translates engineering effort into board-ready financials. LinearB leans into automation over dashboards. Faros AI markets multi-tool visibility across 50+ systems. Harness's new positioning — token economics tied to shipped work, end-to-end — is the first product on the market that treats AI coding spend as a first-class FinOps category rather than an engineering analytics afterthought.

Why This Matters

This is not a feature announcement. It is the first credible answer to the question that is about to consume every Q3 budget review.

Technical Implications for CIOs and CTOs

For two years, the AI coding stack has been measured the way SaaS seats are measured: adoption rate, daily active users, lines of code generated. None of those numbers connect to shipped work, code quality, or production stability. The 2026 DORA report makes the consequence concrete: individual output is up sharply — 21% more tasks completed, 98% more PRs merged per developer — but bugs per developer have risen 54%, incidents per PR are up 242.7%, and delivery stability has dropped 7.2%. The same Google-published research models a 500-person engineering organization at 39% first-year ROI on AI coding, with an 8-month payback and $11.6 million in value against $8.4 million of investment — but warns those returns evaporate without the underlying engineering foundations: testing, CI/CD, observability, and clean toolchain telemetry.

The architectural implication is that ROI measurement is now an integration problem, not an analytics problem. To compute true return per developer per week, the platform has to (1) capture token spend at the IDE, (2) capture infrastructure spend at the inference API, (3) tie both to ticket, PR, and deployment IDs, (4) overlay DORA metrics for stability and lead time, and (5) reconcile against incident and security signals downstream. Few enterprise toolchains have that data plumbed end-to-end. The teams that win FY27 budget are the ones that close that loop in Q3.

Business Implications for CFOs, COOs, and Boards

The financial framing is sharper. Gartner's John-David Lovelock has gone on record that 40% of CIOs currently say "I can't point to the value that we get from AI." IDC's FutureScape 2026 projects G1000 organizations face up to a 30% upward revision in underestimated AI infrastructure costs by 2027, driven by what IDC calls the "opaque consumption models" of agentic workloads. AnalyticsWeek has documented a $400 million collective cloud spend leak across the Fortune 500, driven by agent sessions running without per-session cost ceilings. The public examples — a $47,000 single-deployment overrun documented on Hacker News in late 2025, a $4,000/month misconfigured pipeline reported on Medium in April 2026 — are the tip of an iceberg whose mass shows up in the consolidated invoice from OpenAI, Anthropic, and the hyperscalers.

For CFOs sitting in front of audit committees, this is the year unmanaged AI spend becomes a material control weakness. The fact that only 44% of organizations have adopted financial guardrails or AI FinOps practices, per Gartner's March 2026 survey of 353 D&A and AI leaders, is no longer defensible as a "we're still piloting" answer. Gartner's correlated finding — that 40% of agentic AI projects will be canceled by end of 2027 due to cost overruns, unclear value, or risk controls — is the cancellation list those audit committees are about to write.

Market Context

The AI coding ROI category did not exist as a discrete buyer two years ago. In May 2026 it is one of the fastest-consolidating segments in enterprise software.

Cursor has crossed $2 billion in ARR. Claude Code now leads developer satisfaction at 46% and is being deployed inside Anthropic's enterprise services JV. GitHub Copilot is in 90% of Fortune 100 companies. 41% of all code written globally in 2026 is AI-generated or AI-assisted, with 22% of merged code AI-authored. Daily AI users merge approximately 60% more pull requests than light users. The volume question is settled. The accountability question is the next bottleneck.

Real enterprise outcomes are emerging — and they are uneven. Bancolombia reports a 30% code generation boost and 18,000 changes per year attributable to GitHub Copilot. EchoStar Hughes reports 25% productivity and 35,000 hours saved using Copilot. JPMorgan publicly disclosed a 10–20% productivity increase from its AI coding rollout. But the Stanford research cited in the DORA report is more nuanced: 35–40% productivity gain on greenfield, simple tasks, and only ~10% impact on complex legacy code — which is where most enterprise engineering hours go. Realistic enterprise ROI ranges, per a synthesis of these benchmarks, are 2.5–3.5x average and 4–6x for top-quartile teams, but only when the cost denominator includes actual per-developer token spend (typically $200–$600/month, with agentic tools running up to $2,000+/developer/month) — not just the seat license.

The analyst signal is converging. Forrester sees 2026 as the year multi-agent systems scale, but only inside enterprises that can measure them. Gartner has put AI FinOps onto its priority lists alongside agentic governance and agentic security. IDC's "opaque consumption" framing is being adopted by FinOps Foundation chapters as the operating definition for what AI cost management has to solve. The category is moving from "nice-to-have analytics" to "control objective the CFO signs off on."

The vendor field is consolidating accordingly. DX's $1 billion Atlassian outcome set the strategic-acquisition price floor. Jellyfish, LinearB, and Faros AI are all racing to add AI-specific cost attribution to their existing engineering intelligence platforms. Harness's advantage, with this launch, is that the FinOps spine — Cloud Cost Management with $1B+ in customer cloud spend already under management — was already in place. Bolting AI-aware unit economics onto it is a closer step than what the pure-play engineering analytics vendors face.

Framework #1: The AI Coding ROI Calculator (CFO-Ready)

Before approving the next AI coding tool renewal — or, more likely, before defending the existing rollout to a finance review — engineering leaders need a defensible per-engineer ROI number that includes everything finance is going to ask about. The calculator below is calibrated to the 2026 benchmarks above.

Three scenarios. Same math. Different inputs.

Inputs the model needs:

  1. Fully loaded developer cost (salary + benefits + overhead). US enterprise average: $200,000/year.
  2. AI coding tool seat license per developer per month.
  3. Average per-developer token spend per month (this is the line item most CFOs do not see).
  4. Realistic productivity gain percentage (apply the legacy-vs-greenfield adjustment).
  5. Code review overhead increase (81% of leaders in the Harness survey report more time in review; budget the cost).
  6. Incident/quality offset (DORA's 242.7% incident-per-PR increase, where applicable).

| Scenario | Small Team (25 devs) | Mid-Size (250 devs) | Enterprise (1,000 devs) |
| --- | --- | --- | --- |
| Fully loaded annual dev cost | $5.0M | $50.0M | $200.0M |
| Inline AI tool ($30/dev/mo) | $9,000/yr | $90,000/yr | $360,000/yr |
| Agentic token spend ($400/dev/mo avg) | $120,000/yr | $1,200,000/yr | $4,800,000/yr |
| Total AI investment | $129,000 | $1,290,000 | $5,160,000 |
| Realistic productivity gain (mixed workload, 18%) | $900,000 | $9,000,000 | $36,000,000 |
| Less: code review overhead (5% of dev cost) | ($250,000) | ($2,500,000) | ($10,000,000) |
| Less: incident remediation offset (2.5%) | ($125,000) | ($1,250,000) | ($5,000,000) |
| Net annual value | $525,000 | $5,250,000 | $21,000,000 |
| Year-1 ROI | 307% | 307% | 307% |
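
The table's arithmetic can be reproduced in a few lines. The sketch below is one reading of the model, not a published formula: the class and function names are illustrative, and the ROI definition (net annual value minus AI investment, divided by AI investment) is inferred from the fact that it is the calculation that yields the 307% shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoiInputs:
    """The six calculator inputs; defaults mirror the table's assumptions."""
    developers: int
    loaded_cost_per_dev: float = 200_000.0  # input 1: fully loaded annual cost
    seat_license_per_month: float = 30.0    # input 2: inline AI tool seat
    token_spend_per_month: float = 400.0    # input 3: agentic token spend
    productivity_gain: float = 0.18         # input 4: mixed workload
    review_overhead: float = 0.05           # input 5: share of dev cost
    incident_offset: float = 0.025          # input 6: share of dev cost


def year_one_roi(inputs: RoiInputs) -> dict[str, float]:
    dev_cost = inputs.developers * inputs.loaded_cost_per_dev
    investment = inputs.developers * 12 * (
        inputs.seat_license_per_month + inputs.token_spend_per_month
    )
    gross_gain = dev_cost * inputs.productivity_gain
    offsets = dev_cost * (inputs.review_overhead + inputs.incident_offset)
    net_value = gross_gain - offsets
    # Assumed ROI definition: value left after paying for the tools,
    # divided by what the tools cost.
    roi = (net_value - investment) / investment
    return {"investment": investment, "net_value": net_value, "roi": roi}


for size in (25, 250, 1_000):
    result = year_one_roi(RoiInputs(developers=size))
    print(
        f"{size:>5} devs  investment ${result['investment']:>12,.0f}  "
        f"net value ${result['net_value']:>12,.0f}  ROI {result['roi']:.0%}"
    )
```

Swapping in your own loaded cost, seat price, and measured token spend is the point of the exercise; the defaults are only the article's illustrative inputs.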

How to use this with your CFO:

  • Run the same model three times with three productivity assumptions: 10% (legacy-heavy), 18% (mixed), 30% (greenfield-heavy). Show all three numbers. The DORA report explicitly recommends presenting ROI as a range, not a point estimate, because the underlying productivity gain depends on what your developers actually work on.
  • Include the offsets. If you submit a productivity-gain-only number to finance, you will lose credibility the first time an incident is traced back to AI-generated code. The 5% review overhead and 2.5% incident offset above are defensible based on the 2026 DORA data and the Harness survey's 81% review-time finding.
  • Make the token spend explicit. Agentic coding tools at $200–$2,000+/developer/month in token spend are typically buried in cloud invoices, not in software budgets. CFOs are about to ask why this line did not appear in the FY26 plan. The receipt is the answer.
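
To show what the three-assumption range looks like, here is a minimal per-developer sketch. It assumes the offsets stay fixed at 5% and 2.5% of developer cost in every scenario and that the $30 seat and $400 token spend do not change with workload; both are simplifications, and the function name is illustrative.

```python
LOADED_COST = 200_000              # fully loaded annual cost per developer
ANNUAL_AI_SPEND = 12 * (30 + 400)  # seat license plus token spend, per developer
OFFSET_RATE = 0.05 + 0.025         # review overhead plus incident remediation


def roi_for_gain(productivity_gain: float) -> float:
    """Year-1 ROI per developer; team size cancels out of the ratio."""
    net_value = LOADED_COST * (productivity_gain - OFFSET_RATE)
    return (net_value - ANNUAL_AI_SPEND) / ANNUAL_AI_SPEND


scenarios = {"legacy-heavy": 0.10, "mixed": 0.18, "greenfield-heavy": 0.30}
for label, gain in scenarios.items():
    print(f"{label:<17} gain {gain:.0%}  ROI {roi_for_gain(gain):+.0%}")
```

Under these assumptions the legacy-heavy case comes out slightly negative (about -3%), the mixed case matches the 307% in the table, and the greenfield-heavy case lands near 772%. That spread is the argument for showing finance a range rather than a single number.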

Framework #2: The 25-Point AI Coding ROI Readiness Assessment

Tooling alone does not produce ROI. The DORA report's central finding is that AI acts as an amplifier — returns come from the underlying engineering foundation, not from the AI tool itself. Score your organization across the five dimensions below before you commit FY27 budget to expanded AI coding rollout.

Five dimensions. Five points each. 25 points total.

1. Token-to-Ship Attribution (0–5)

  • 1: We cannot tie token spend to any specific developer
  • 3: We can attribute by developer, but not to PR or deployment
  • 5: Every AI-generated line is tied to a PR, ticket, deployment, and incident outcome

2. Cost Governance and Anomaly Detection (0–5)

  • 1: Monthly invoice review; no per-session ceilings
  • 3: Budget alerts at team level, post-hoc
  • 5: Per-agent and per-session ceilings, pre-emptive anomaly detection across all providers

3. DORA + Quality Telemetry (0–5)

  • 1: We track adoption only (DAU, seats)
  • 3: We track lead time and deployment frequency but not change failure rate
  • 5: All four DORA metrics plus AI-attributable incident and vulnerability tracking

4. Engineering Foundations (0–5)

  • 1: Fragmented toolchain, manual QA, weak observability
  • 3: CI/CD in place, partial test automation, basic monitoring
  • 5: Full CI/CD, comprehensive test automation, mature observability, codified SRE practices

5. People and Trust (0–5)

  • 1: AI metrics are used in individual performance reviews; team is anxious
  • 3: Metrics policy is unclear; mixed adoption
  • 5: Clear separation between improvement metrics and evaluation, transparent measurement policy, developer involvement in metric definition

Scoring:

  • 20–25 (Mature): Expand AI investment. ROI is real and measurable.
  • 15–19 (Capable): Invest in the gaps before expanding scope.
  • 10–14 (Foundational): Fix engineering foundations first. AI is amplifying chaos.
  • <10 (At risk): Pause expansion. Build the foundation. The 54% developer-fear and 46% surveillance-concern numbers from the Harness survey are the canary signal here.

Note on dimension 5: The Harness survey found 54% of developers fear that individual evaluations will be based on AI metrics, 46% cite privacy or surveillance concerns, and managers are 4x more likely than practitioners to report no concerns. If your readiness score is high on tooling but low on dimension 5, you have a trust crisis you will discover the hard way during attrition season.
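
The rubric is simple enough to encode so that scores are recorded the same way each quarter. The following is a sketch only: the dimension keys and function name are made up for illustration, and the trust warning threshold (a score of 2 or lower on dimension 5 alongside a total of 15 or more) is an assumption, not part of the assessment as written.

```python
DIMENSIONS = (
    "token_to_ship_attribution",
    "cost_governance",
    "dora_quality_telemetry",
    "engineering_foundations",
    "people_and_trust",
)


def readiness_band(scores: dict[str, int]) -> tuple[int, str, bool]:
    """Return (total, band, trust_warning) for five 0-5 dimension scores."""
    missing = set(DIMENSIONS) - set(scores)
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    for name in DIMENSIONS:
        if not 0 <= scores[name] <= 5:
            raise ValueError(f"{name} must be between 0 and 5")

    total = sum(scores[name] for name in DIMENSIONS)
    if total >= 20:
        band = "Mature"
    elif total >= 15:
        band = "Capable"
    elif total >= 10:
        band = "Foundational"
    else:
        band = "At risk"

    # Assumed threshold: strong overall score but weak trust dimension.
    trust_warning = total >= 15 and scores["people_and_trust"] <= 2
    return total, band, trust_warning


example = {
    "token_to_ship_attribution": 3,
    "cost_governance": 3,
    "dora_quality_telemetry": 3,
    "engineering_foundations": 5,
    "people_and_trust": 1,
}
print(readiness_band(example))
```

The example organization totals 15 and lands in the Capable band, but the warning flag is set because its tooling scores are carrying a weak trust score.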

Case Study: A Fortune 500 Financial Services Firm

A Fortune 500 financial services firm, referenced anonymously in 2026 DORA-aligned case studies and similar in profile to publicly disclosed banking deployments such as JPMorgan's, provides the cleanest worked example of what ROI measurement looks like when it is done correctly.

The firm rolled out GitHub Copilot to 4,200 engineers across consumer banking, capital markets, and core infrastructure starting in mid-2025. Headline productivity numbers tracked the public JPMorgan disclosure: 10–20% lift in tasks completed per developer per week. Reported time saved averaged 3.6 hours per developer per week. At a fully loaded $250,000 annual cost per developer, that translated to $66 million in modeled annual productivity value across the engineering org.

The CFO asked three follow-up questions and ROI dropped. Question 1: what does the token spend actually cost? Answer: $312/developer/month average, weighted toward the 800-engineer platform team that was using agentic tools. Annualized: $15.7 million on top of the seat license. Question 2: has code review time increased? Answer: yes, by 27%, in line with the Harness survey, where 81% of leaders reported more time in code review and 28% reported an increase above 30%. Annualized cost: $11.2 million in review-cycle overhead. Question 3: has change failure rate moved? Answer: up 18%, well below DORA's reported 242.7% incident-per-PR jump, because the firm's existing SRE foundation absorbed the impact. Annualized cost: $4.8 million in incident remediation.

Net annual value, after the offsets: roughly $34 million ($66 million less $15.7 million in token spend, $11.2 million in review overhead, and $4.8 million in incident remediation) on a $26 million investment. That is a real 31% ROI, not the 200%+ figure the productivity-only model produced. The firm did three things that made the answer defensible: (1) deployed token-level attribution before the rollout exited pilot, (2) ran a parallel DORA telemetry stream against pre-AI baselines, and (3) committed publicly to separating AI-derived improvement metrics from individual performance reviews, which preserved engineering trust through the rollout.
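
The case-study figures can be checked with the same kind of arithmetic. This sketch takes the $66 million modeled value and the $26 million investment as given, since the text does not break either down, and assumes ROI is net value minus investment, divided by investment; that reading lands within a point of the quoted 31%.

```python
ENGINEERS = 4_200
TOKEN_SPEND_PER_DEV_MONTH = 312  # dollars

# Figures quoted in the case study, in millions of dollars.
MODELED_PRODUCTIVITY_VALUE = 66.0
REVIEW_OVERHEAD = 11.2
INCIDENT_REMEDIATION = 4.8
INVESTMENT = 26.0  # taken as given; not itemized in the text

# Annualize token spend and convert to millions.
token_cost = ENGINEERS * TOKEN_SPEND_PER_DEV_MONTH * 12 / 1_000_000
net_value = (
    MODELED_PRODUCTIVITY_VALUE - token_cost - REVIEW_OVERHEAD - INCIDENT_REMEDIATION
)
# Assumed ROI definition: (net value - investment) / investment.
roi = (net_value - INVESTMENT) / INVESTMENT

print(f"annual token cost: ${token_cost:.1f}M")
print(f"net annual value:  ${net_value:.1f}M")
print(f"ROI on investment: {roi:.1%}")
```

The useful habit here is keeping each offset as its own named line, so that when finance challenges one number the rest of the calculation does not have to be rebuilt.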

The lesson, and the part that lines up with the new Harness products: a defensible ROI number requires the receipt at every layer — token, ship, quality, trust. Not one. Not three. All four.

What to Do About It

For CIOs: Before the FY27 plan is locked, run the 25-point readiness assessment on your current AI coding rollout. If you score below 15, pause expansion and fix the foundation. If you score 15–19, identify the lowest dimension and assign an executive owner for Q3 closure. The single highest-leverage investment is token-to-ship attribution — it is the input every downstream conversation needs and the dimension most enterprises are weakest on today.

For CFOs: Pull the consolidated AI cloud spend from OpenAI, Anthropic, AWS Bedrock, GCP Vertex AI, and any internal model hosting. Compare it line-by-line against the software budget. If the cloud line is larger than the software line and not on your FY27 plan, that is the gap. Demand per-agent and per-session ceilings before the next quarter starts. Gartner's finding that only 44% of organizations have AI FinOps in place is the audit-committee question you do not want to answer with "we are still working on it."

For Engineering and HR Leaders: Publish your AI metrics policy this quarter, before the trust gap widens. The 54% developer fear of AI-based individual evaluation is the leading indicator of attrition. Commit publicly to the separation of improvement metrics from performance metrics, codify it in the engineering handbook, and put developers on the metric-definition committee. The Harness survey was explicit: 55% of developers want separation, 50% want transparency, 49% want involvement. Those are inexpensive to give and expensive to recover after you have lost senior engineers.

The category-defining product just shipped. The benchmarks are public. The frameworks are in front of you. The FY27 budget cycle that starts in September is the one in which AI coding stops being a software line item and starts being a P&L item that finance owns. The 94% gap closes in 2026 or the 40% project cancellation rate finds you in 2027. Pick the side of the line you want to be on.


THE DAILY BRIEF

Enterprise AI insights for technology and business leaders, twice weekly.

thedailybrief.com

Subscribe at thedailybrief.com/subscribe for weekly AI insights delivered to your inbox.

LinkedIn: linkedin.com/in/rberi  |  X: x.com/rajeshberi

© 2026 Rajesh Beri. All rights reserved.
