Enterprise software development has entered a new era. AI coding assistants now generate billions of lines of code per month. Eighty-four percent of developers use them. Over half rely on them daily. Cursor alone reported generating one billion lines of committed code per day by April 2025. Claude Code has contributed 30.7 billion lines to public repositories in the past 90 days, representing four percent of all public commits on GitHub.
The productivity story is real. The security story is a disaster.
The Purple Book Community's State of AI Risk Management 2026 report, a survey of more than 650 senior cybersecurity leaders across seven industries and two continents, reveals a structural disconnect that should alarm every enterprise shipping AI-assisted code. Seventy percent of organizations report confirmed or suspected vulnerabilities introduced by AI-generated code in their production systems. And 92 percent of those organizations say their existing security tools effectively detect those vulnerabilities.
Both numbers cannot be true at the same time. That gap — between what security leaders believe about their detection capabilities and what production systems reveal about actual exposure — is The Confidence Gap. And it is now the most dangerous blind spot in enterprise software security.
The Vulnerability Data
Let the numbers speak first, because the scale of this problem has been quantified from multiple independent angles and the findings converge.
CodeRabbit's analysis of AI-generated versus human-authored code found that AI-generated code produces 1.75 times more logic errors and 1.57 times more security findings. AI-generated code is 2.74 times more likely to introduce cross-site scripting vulnerabilities. The Cloud Security Alliance and Endor Labs report that 62 percent of AI-generated code contains design flaws or known vulnerabilities. Veracode's 2025 analysis found 45 percent of AI-generated code vulnerable per OWASP Top 10 standards, with Java exceeding a 70 percent failure rate for secure code generation.
For the technical audience, the vulnerability patterns are instructive. AI coding assistants do not write malicious code. They write plausible code that omits security-critical operations: input validation, output encoding, parameterized queries, proper authentication checks, rate limiting. The code compiles. It passes unit tests. It does what the developer asked it to do. It just does not do the ten other things a senior engineer would have done reflexively — the defensive patterns that emerge from years of debugging production incidents, reading CVE advisories, and understanding how adversaries actually exploit real systems.
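To make the pattern concrete, here is a minimal, hypothetical Python sketch. The table and function names are invented for illustration; the point is the difference between code that merely works and code that also defends itself.

```python
import re
import sqlite3

# Hypothetical illustration: both functions "work" and pass a happy-path test.
# The first is the shape an assistant typically produces; the second adds the
# defensive steps a reviewer should insist on.

def find_user_unsafe(conn: sqlite3.Connection, email: str):
    # Plausible, compiles, passes unit tests, and is injectable, because the
    # untrusted value is interpolated directly into the SQL string.
    cur = conn.execute(f"SELECT id, email FROM users WHERE email = '{email}'")
    return cur.fetchone()

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def find_user_hardened(conn: sqlite3.Connection, email: str):
    # Input validation plus a parameterized query: the reflexive defensive
    # pattern the generated version omits.
    if not EMAIL_RE.fullmatch(email or ""):
        raise ValueError("invalid email address")
    cur = conn.execute("SELECT id, email FROM users WHERE email = ?", (email,))
    return cur.fetchone()
```

Both versions return the right row for a well-formed address and pass the same happy-path test. Only the second refuses `' OR '1'='1` as input.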
Georgetown University research tested five leading models and found that approximately 48 percent of generated code contained flagged bugs, while only 30 percent passed security verification. These are not edge cases. This is the baseline performance of the tools that now write a significant share of enterprise production code.
The CVE data adds a forensic dimension. Between May 2025 and March 2026, researchers at Georgia Tech's SSLab tracked 74 confirmed CVEs attributed to AI-generated code across 43,849 security advisories. Claude Code accounted for 49 of them, including 11 rated critical. GitHub Copilot contributed 15, with two critical. The trajectory is accelerating: August 2025 showed two AI-attributed CVEs; by March 2026, that number hit 35 in a single month.
Hanqing Zhao, the Georgia Tech researcher leading the analysis, is direct about the limitation: "Those 74 cases are confirmed instances where we found clear evidence that AI-generated code contributed to the vulnerability." He estimates the actual number is five to ten times higher. Most AI-generated code has its attribution stripped before commit. The 74 CVEs are what survived the erasure.
The Confidence Gap
The Purple Book Community report maps the disconnect across four dimensions, and each one tells the same story.
AI Inventory and Visibility. Eighty-six percent of security leaders claim they maintain a complete AI inventory across their organization. Fifty-nine percent admit that shadow AI is present and ungoverned in their environment. When the report cross-tabulated these responses, 57 percent of the organizations claiming complete inventory also acknowledged that shadow AI tools exist in their environment. They believe they have full visibility. They simultaneously know they do not.
Detection and Vulnerability Management. Ninety-two percent trust their tools to find AI-code vulnerabilities. Seventy percent have already observed those vulnerabilities in production. The report's most striking finding: 92 percent of organizations with confirmed AI-code vulnerabilities in production say their security tools effectively detect those vulnerabilities. They have the vulnerabilities. They trust the tools. They do not appear to have connected these facts.
Development Pace and Control. Seventy-three percent admit that AI-accelerated development makes security oversight harder. This is the mechanical explanation for the confidence gap. Code volume has increased faster than review capacity. Pull requests per author are up 20 percent year over year, according to the Cortex Engineering 2026 Benchmark Report. But incidents per pull request have risen 23.5 percent, and change failure rates have climbed approximately 30 percent. Developers are shipping faster. They are also shipping worse.
Tool Sprawl and Prioritization. Eighty-two percent report that tool sprawl actively impairs their remediation capabilities. Security teams are not lacking tools. They are drowning in tools that generate findings they cannot prioritize, triage, or fix at the speed AI-assisted development demands.
The Purple Book Community describes this as organizations possessing "awareness without the ability to convert that awareness into governed action at the pace AI demands." That is an accurate diagnosis. It is also an institutional crisis masquerading as a metrics problem.
The Hallucination Problem
Beyond the code that AI writes incorrectly, there is the code that depends on things that do not exist.
A USENIX Security Symposium 2025 study analyzed 576,000 code samples and found that nearly 20 percent of AI-recommended package dependencies refer to packages that do not exist. Open-source models hallucinated at approximately 22 percent; commercial models such as GPT-4 at roughly five percent. Forty-three percent of these hallucinated package names repeat consistently across queries, meaning an attacker who registers the phantom package name owns every project that follows the AI's recommendation.
This is not a theoretical risk. Dependency confusion and typosquatting attacks are among the most common software supply chain attack vectors. AI coding assistants have industrialized the creation of new attack surface by generating reproducible references to packages that a threat actor can preemptively register and poison.
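A first line of defense is cheap to build: reconcile every declared dependency against the public registry before anything is installed. A minimal Python sketch, assuming a requirements.txt-style manifest and PyPI's public JSON endpoint; other ecosystems would swap in their own registries.

```python
import urllib.error
import urllib.request
from pathlib import Path

def exists_on_registry(name: str) -> bool:
    # PyPI's JSON endpoint returns HTTP 404 for names nobody has registered.
    try:
        urllib.request.urlopen(f"https://pypi.org/pypi/{name}/json", timeout=10)
        return True
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise

def find_phantom_dependencies(requirements_file: str = "requirements.txt") -> list[str]:
    # Flag declared packages that do not exist on the public registry:
    # exactly the names an attacker could register and poison first.
    phantoms = []
    for line in Path(requirements_file).read_text().splitlines():
        name = line.split("==")[0].split(">=")[0].strip().lower()
        if name and not name.startswith("#") and not exists_on_registry(name):
            phantoms.append(name)
    return phantoms

if __name__ == "__main__":
    for name in find_phantom_dependencies():
        print(f"'{name}' does not exist on PyPI; possibly hallucinated, and claimable.")
```

An existence check only catches names nobody has registered yet. Once an attacker claims the phantom name, the package exists and installs cleanly, which is why the curated allowlists discussed later still matter.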
For the business audience, the implication is concrete: your development team's AI assistant is not just writing vulnerable code. It is writing code that depends on software libraries that do not exist, creating entry points that an adversary can claim before anyone notices. GitGuardian's analysis adds another dimension: 6.4 percent of repositories using GitHub Copilot leak at least one secret — API keys, credentials, tokens — a rate 40 percent higher than the 4.6 percent baseline in non-AI repositories.
Why Traditional Security Is Failing
The reason the confidence gap persists is architectural. Enterprise security tooling was designed for a world where code was written by humans at human speed. Static application security testing (SAST), dynamic application security testing (DAST), and software composition analysis (SCA) tools all assume a review cadence that matches human commit velocity. When a developer writes 200 lines of code per day, a security scanner that runs nightly and produces a report by morning provides adequate coverage.
When that same developer, augmented by an AI assistant, generates 2,000 lines per day, the scanner still runs nightly. But the attack surface has expanded tenfold. The scanner finds more issues than the security team can triage. The team prioritizes the findings they understand and defers the rest. The deferred findings accumulate. The confidence gap widens.
Kusari's Application Security in Practice report found that 85 percent of developers use AI coding assistants, but only 38 percent apply AI to support code review in pull requests — the exact place where security feedback is most actionable. Only nine percent consider AI-driven security analysis a "must have." The industry has embraced AI for code generation and largely ignored it for code defense.
The result is predictable. Privilege escalation paths in AI-assisted codebases have increased 322 percent. Architectural design flaws are up 153 percent. Two-thirds of security teams spend up to 20 hours weekly responding to supply chain security incidents. Over half report two-to-five-day code review approval times. By the time a vulnerability is found, fixed, and deployed, another hundred have been introduced.
What The Enterprise Must Do Now
The response to the AI code security crisis requires changes at three levels: the development workflow, the security architecture, and the organizational model.
At the development workflow level, every AI-generated code change needs a mandatory human security review before merge. Not a cursory approval. A review by an engineer who understands the security implications of the code in its deployment context. AI-assisted development tools should be configured to prioritize security-hardened libraries, and organizations should maintain curated dependency allowlists that prevent hallucinated or unvetted packages from entering the build pipeline. Software Bills of Materials must track not just what is in the build, but what AI tool generated it and when.
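What that provenance tracking can look like in its simplest form, sketched here as a plain record rather than any particular SBOM standard; every field name below is hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

@dataclass
class AIProvenanceRecord:
    # Hypothetical fields to attach alongside an SBOM entry; a real deployment
    # would map them onto whatever SBOM format the organization already uses.
    component: str          # file or module the change touched
    commit_sha: str
    generated_by: str       # which assistant produced the change, if any
    tool_version: str
    human_reviewer: str     # the engineer who performed the security review
    reviewed_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

record = AIProvenanceRecord(
    component="billing/payment_service.py",
    commit_sha="3f9c2ab",
    generated_by="example-assistant",
    tool_version="1.4.2",
    human_reviewer="j.doe",
)
print(json.dumps(asdict(record), indent=2))
```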
At the security architecture level, the shift is from point-in-time scanning to continuous, inline analysis. Security checks must execute within the pull request workflow, not after merge. AI-generated code should be flagged automatically for enhanced review. Dependency verification must happen at commit time, not at build time, catching hallucinated packages before they enter the codebase. Organizations that currently run nightly SAST scans on codebases that change hourly are performing security theater.
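Commit-time dependency verification does not require heavy tooling. A hypothetical pre-commit hook of a few lines can refuse any newly added package that is not on the curated allowlist; the allowlist contents and the requirements.txt convention below are placeholders.

```python
import subprocess
import sys

# Placeholder allowlist; in practice this is the curated, centrally reviewed list.
ALLOWED_DEPENDENCIES = {"requests", "flask", "sqlalchemy"}

def unvetted_additions() -> list[str]:
    # Inspect only the lines being added to requirements.txt in this commit.
    diff = subprocess.run(
        ["git", "diff", "--cached", "--unified=0", "--", "requirements.txt"],
        capture_output=True, text=True, check=True,
    ).stdout
    blocked = []
    for line in diff.splitlines():
        if line.startswith("+") and not line.startswith("+++"):
            name = line[1:].split("==")[0].split(">=")[0].strip().lower()
            if name and not name.startswith("#") and name not in ALLOWED_DEPENDENCIES:
                blocked.append(name)
    return blocked

if __name__ == "__main__":
    blocked = unvetted_additions()
    for name in blocked:
        print(f"Blocked: '{name}' is not on the curated dependency allowlist.")
    sys.exit(1 if blocked else 0)
```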
At the organizational level, the confidence gap is ultimately a leadership problem. Security leaders who report 92 percent detection confidence while sitting on 70 percent vulnerability rates are not lying. They are trusting dashboards that measure tool coverage, not outcome effectiveness. The metrics need to change. Detection rate is not a useful metric when code volume has outpaced remediation capacity by an order of magnitude. Mean time to remediation, vulnerability escape rate, and the ratio of findings-to-fixes are the numbers that actually describe your security posture.
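Those outcome metrics can be computed from the finding records most teams already keep. A minimal sketch, assuming each finding records when it was opened, when (if ever) it was fixed, and whether it was discovered only after reaching production.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Finding:
    opened: datetime
    fixed: Optional[datetime]       # None if still open
    escaped_to_production: bool     # discovered after release rather than before

def security_posture(findings: list[Finding]) -> dict[str, float]:
    fixed = [f for f in findings if f.fixed is not None]
    remediation_days = [(f.fixed - f.opened).days for f in fixed]
    return {
        # How long a confirmed issue lives before it is closed.
        "mean_time_to_remediation_days":
            sum(remediation_days) / len(remediation_days) if remediation_days else 0.0,
        # Share of issues discovered only once they were already in production.
        "vulnerability_escape_rate":
            sum(f.escaped_to_production for f in findings) / len(findings) if findings else 0.0,
        # Findings opened per finding fixed; anything above 1.0 means the backlog is growing.
        "findings_to_fixes_ratio":
            len(findings) / len(fixed) if fixed else float("inf"),
    }
```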
The non-human identity problem compounds the challenge. Machine and AI identities are projected to outnumber human employees by a ratio of 80 to one. Human identities are typically over-permissioned by 70 percent. AI identities see rates as high as 90 percent. Every AI coding agent with over-broad permissions is a lateral movement vector waiting to be exploited. Enterprises must treat each AI coding tool as a distinct identity within their IAM framework, with least-privilege access, behavioral baselining, and anomaly detection.
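Behavioral baselining for an AI identity can start very simply: record what each identity normally does, then flag anything outside that baseline for review. A minimal sketch with invented identity and action names; a real deployment would sit on the IAM and audit-log platform already in place.

```python
from collections import defaultdict

class BehavioralBaseline:
    # Minimal sketch: learn, per identity, the set of actions it normally
    # performs, then flag anything outside that baseline for review.

    def __init__(self) -> None:
        self._seen: dict[str, set[str]] = defaultdict(set)

    def learn(self, identity: str, action: str) -> None:
        # Build the baseline from a trusted observation window.
        self._seen[identity].add(action)

    def is_anomalous(self, identity: str, action: str) -> bool:
        # A coding agent suddenly reading production secrets, for example,
        # should trip this check.
        return action not in self._seen[identity]

baseline = BehavioralBaseline()
for identity, action in [("coding-agent-svc", "repo:read"), ("coding-agent-svc", "pr:comment")]:
    baseline.learn(identity, action)

print(baseline.is_anomalous("coding-agent-svc", "secrets:read"))   # True, raise an alert
print(baseline.is_anomalous("coding-agent-svc", "pr:comment"))     # False, within baseline
```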
The Uncomfortable Truth
The enterprise is in a position it has never been in before. The tools that make developers dramatically more productive are simultaneously introducing vulnerabilities at a rate that existing security infrastructure cannot absorb. The developers know they are more productive. The security leaders believe the tools are catching the problems. The production data says otherwise.
This is not a problem that can be solved by buying more security tools. The Purple Book report found that 82 percent of organizations say tool sprawl already impairs their remediation capability. Adding another scanner to a pipeline that already generates more findings than the team can process does not reduce risk. It increases noise.
The path forward is uncomfortable because it requires accepting a trade-off that the industry has been avoiding: AI-assisted development at maximum velocity is not compatible with enterprise-grade security using current tooling and processes. Something has to give. Either development velocity needs governance constraints — not a return to pre-AI speed, but guardrails that match the pace of generation with the pace of review — or security architecture needs a fundamental redesign to operate at AI speed.
The organizations that will navigate this successfully are the ones that stop treating the confidence gap as a metrics problem and start treating it as what it is: an institutional failure to update security assumptions for a world where the majority of production code is no longer written by humans.
Seventy percent of enterprises already have AI-generated vulnerabilities in production. Ninety-two percent think they are covered. The gap between those numbers is where the next wave of breaches will originate.
Rajesh Beri is Head of AI Engineering at Zscaler and writes about enterprise AI strategy, security, and the gap between what vendors promise and what production environments deliver.
