Google's threat intelligence team disclosed on April 23, 2026 that malicious indirect prompt injection (IPI) attempts on the open web rose 32% between November 2025 and February 2026, after scanning 2-3 billion crawled pages per month from Common Crawl snapshots. Researchers Thomas Brunner, Yu-Han Liu, and Moni Pande documented attacks ranging from SEO manipulation and resource-exhaustion traps to fully specified PayPal transactions and Stripe donation redirects designed to trigger autonomous AI agents with payment credentials. Forcepoint's X-Labs and Palo Alto Unit 42 corroborated the findings the same week, with Unit 42 cataloguing live IPI payloads on advertising review sites and SEO-poisoned domains. The headline is uncomfortable for any enterprise that has shipped — or is about to ship — an autonomous browsing agent: the open web is now a hostile input channel, and your agent's integrations are the blast radius.
The data lands at the worst possible moment. Salesforce shipped Agentforce Operations on April 29, OpenAI's Bedrock Managed Agents went live in early May, and Nvidia's Agent Toolkit signed Adobe, Salesforce, and SAP at GTC 2026. Enterprises are wiring agents into payment rails, email, ticketing, and code repos at a pace that outstrips their detection stack. Google's window-to-exploit data — five months in 2023, ten hours by 2026 — means the same LLMs we are deploying defensively are being used to weaponize web content offensively. For CIOs and CISOs, this is no longer a research demo. It is a control-plane problem with audit, regulatory, and balance-sheet consequences.
What Google actually found
The research used Common Crawl monthly snapshots of the English-speaking web, excluding logged-in social platforms. Google split observed injections into four categories:
- Harmless pranks (e.g., "act like a baby bird")
- Defensive instructions from publishers telling crawlers to stay out
- SEO manipulation prompts directing AI summaries to favor specific brands
- Malicious attacks — the only category that grew — split between data exfiltration (credentials, IPs, API keys) and destruction (file deletion commands)
The malicious bucket grew 32% in three months. Sophistication remained low — most payloads still used the well-worn "ignore previous instructions" pattern — but Forcepoint flagged "shared injection templates across multiple domains" suggesting a tooling layer is forming. That is the part to watch. When kits emerge, scale follows.
The most concrete enterprise threat surfaced in the financial-fraud examples. One page contained a fully specified PayPal transaction with step-by-step instructions targeting AI agents with integrated payment capabilities, using jailbreak framing to bypass safety alignments. A second deployed meta-tag namespace injection plus a "persuasion amplifier" keyword to route AI-mediated payments toward a Stripe donation link. A third looked like reconnaissance — designed to fingerprint which agents were susceptible, which is exactly what you build before you build a campaign.
Why this is a control-plane problem, not a model problem
The instinct in security is to treat IPI like a content moderation failure and push the fix into the model. That is the wrong layer. An AI agent with legitimate payment credentials executing a malicious instruction leaves logs that look identical to authorized activity. No anomalous login. No brute force. No exfiltration spike. The system did exactly what it was told — by the wrong source. Existing SIEM, DLP, and CASB tooling cannot tell the difference, because the difference is provenance, not behavior.
Unit 42's severity matrix is a useful frame for the boardroom version of this conversation:
- Critical — system compromise, credential leakage, data destruction
- High — financial fraud, content moderation bypass
- Medium — decision-making manipulation (hiring screens, vendor reviews)
- Low — resource exhaustion, output disruption
Risk scales with agent privilege. A summarization tool reading a poisoned page produces a bad summary. An agent with payments.execute scope reading the same page wires money. The control surface is not the model; it is the entitlement boundary around the agent.
The defense Google is recommending — and where it falls short
Google proposes a three-tier filtering stack: pattern matching for known signatures, LLM-based classification (Gemini, in their case) for intent assessment, and human validation in the loop. They pair it with red-team pressure testing and the AI Vulnerability Reward Program. It is a reasonable starting point, but enterprises should not stop there. Two gaps stand out.
First, pattern matching and intent classification both run on the same untrusted input. An attacker who can hide an instruction can hide a classifier-evading instruction. Unit 42's research shows 85.2% of observed jailbreaks already use social-engineering framing rather than literal "ignore previous instructions" strings. The signature game is losing.
Second, human-in-the-loop only works when the agent operates at human speed. The whole point of agentic workflows is to remove humans from the inner loop. If you bolt approvals back in for every payment, web fetch, or tool call, you have rebuilt the same RPA bottleneck you were trying to escape.
The more durable design pattern is content provenance and capability separation:
- Treat all web-derived text as untrusted data, not trusted instructions. Render it inside a structurally separated "document" channel that the orchestrator parses for facts, not commands.
- Strip and re-tokenize. Drop pixel-sized text, off-screen positioning, color-drained spans, opacity 0 layers, HTML comments, and metadata tags before the model ever sees the content. Most live IPI payloads die at this layer.
- Bind tool entitlements to user intent, not session. A payment tool should require a user-initiated, signed request — not a model-initiated call triggered by web content.
- Log the source of every instruction. Every tool call should carry the provenance of the prompt that produced it: user message, tool output, retrieved document, or scraped page. Without that, you cannot do forensics. With it, your existing detection stack can finally see IPI.
- Run an offline IPI red team. Test against the Unit 42 catalog and Forcepoint's templates. Measure how often your agent follows instructions embedded in a fetched page versus the user's actual prompt.
What this means for engineering leaders
If you are building or operating enterprise AI agents, three questions matter this quarter:
1. What is the entitlement blast radius of every agent in production? List every tool, every API key, every OAuth scope each agent holds. Anything with write, pay, send, delete, or execute semantics is now a high-value target. Cut entitlements to the minimum needed for the next planned action; revoke them after.
2. Where does untrusted text enter the prompt? Every retrieval source — the open web, customer-uploaded files, ticket bodies, email content, vendor documentation — is an injection vector. Inventory them. Mark each as untrusted in your prompt template so reviewers and detectors can tell signal from data.
3. Can you replay an agent run end-to-end? If a fraudulent transaction posts at 02:14 UTC, can you reconstruct the prompt, the retrieved content, the tool calls, the model response, and the entitlement check that approved it? If not, you have a forensics gap. Vendors like W&B Weave, LangSmith, Galileo, and the new wave of AI runtime security players (Capsule, Splx, the Agentic Runtime Security category we covered last week) are converging on this.
What this means for security and compliance leaders
This is the first IPI dataset large enough to brief a board. Google scanned billions of pages and named specific financial payment payloads. That is enough to anchor a real risk conversation, not a hypothetical one. Three asks for the next risk-committee meeting:
- Update third-party AI risk reviews to include "agent execution scope" as a category. Most TPRMs ask whether a vendor uses AI; few ask what the AI agent is allowed to do on the customer's behalf.
- Map IPI to existing frameworks. It maps cleanly to OWASP LLM01 (prompt injection), NIST AI RMF Govern-1.5 / Manage-2.2, and ISO 42001 Annex A.7 controls. Get the mapping into your control library so audit and engineering speak the same language.
- Add an IPI clause to AI vendor contracts. Specifically, require disclosure of (a) what the agent can do without human approval, (b) what monitoring the vendor performs for prompt injection, and (c) liability allocation when an agent executes a fraudulent instruction injected by a third-party site. Decrypt's reporting flagged a real liability gap; contracts are where you close it.
The bigger picture
The honest read on Google's data is that 2026 is the year IPI becomes operational. Sophistication is still low because attackers have not needed to be clever yet — most enterprise agents are still in pilot, and the targets are not lit up. That changes the moment Agentforce Operations, Bedrock Managed Agents, Gemini Enterprise, and Microsoft 365 E7 push agentic workflows into production accounts payable, customer support, and procurement. The attacker ROI calculation flips on the same curve as agent adoption.
The defenders who win this round will be the ones who treated the agent runtime as a control plane from day one — separating data from instructions, binding tool privileges to user intent, and logging provenance everywhere. The defenders who lose will be the ones who shipped agents with broad scopes and trusted the model to figure out what is real. Google just gave everyone a 32% warning. The next data point is the one you do not want to read in the Wall Street Journal.
Sources:
- AI threats in the wild: The current state of prompt injections on the web — Google Online Security Blog (April 23, 2026)
- AI threats in the wild — blog.google/security
- Indirect prompt injection is taking hold in the wild — Help Net Security (April 24, 2026)
- Fooling AI Agents: Web-Based Indirect Prompt Injection Observed in the Wild — Palo Alto Unit 42
- Malicious Web Pages Are Hijacking AI Agents, And Some Are Going After Your PayPal — Decrypt
- Malicious AI Prompt Injection Attacks Increasing, but Sophistication Still Low — SecurityWeek
- Google Workspace's continuous approach to mitigating indirect prompt injections — Google Online Security Blog
Want to calculate your own AI ROI? Try our AI ROI Calculator — takes 60 seconds and shows projected savings, payback period, and 3-year ROI.
Continue Reading
- Scotiabank Cuts Manual Work 70% With Scotia Intelligence AI
- [$40B/Year: Anthropic's Google Lock-In Reshapes AI Strategy](/article/anthropic-google-200b-cloud-lock-in)
- ServiceNow's Universal Agent Control Plane Play