The National Security Agency is testing Anthropic's Mythos AI model to identify security vulnerabilities in Microsoft products, marking a significant shift in how government agencies approach cybersecurity defense. According to Bloomberg, NSA officials have been "impressed by its speed and efficiency" in finding potential security flaws that human security researchers would take days to discover.
This isn't theoretical. The UK AI Security Institute (AISI) conducted rigorous government evaluations of Mythos and found it achieved a 73% success rate on expert-level capture-the-flag (CTF) cybersecurity challenges—tasks that no AI model could complete before April 2025. More striking: Mythos became the first AI model to autonomously complete "The Last Ones," a 32-step corporate network attack simulation that takes human security professionals an estimated 20 hours to finish.
The dual-use dilemma is stark. While the NSA is testing Mythos to find vulnerabilities before adversaries do, the same capability that makes it valuable for defense also makes it a powerful offensive tool. And here's the problem every CISO should understand: according to one source, more than 99% of the vulnerabilities Mythos discovered were, as of April 7, 2026, still unpatched in widely deployed systems.
What Mythos Can Do That Traditional Security Tools Can't
Mythos doesn't just scan for known vulnerabilities—it discovers them. Traditional automated security tools rely on signature-based detection and known vulnerability databases (CVEs). Mythos, by contrast, can:
- Execute multi-stage attacks across network segments (spanning reconnaissance, lateral movement, privilege escalation, and exfiltration)
- Discover and exploit zero-day vulnerabilities in major operating systems and web browsers, including flaws that have existed for decades
- Autonomously chain dozens of attack steps together without human guidance
- Scale performance with increased compute (UK AISI testing showed continued improvement up to 100 million token budgets)
In the UK government's evaluation, Mythos completed an average of 22 out of 32 steps in the corporate network attack simulation across all attempts—compared to the next-best model (Claude Opus 4.6), which averaged only 16 steps.
For context: Expert-level CTF challenges test specific cybersecurity skills in isolation (code injection, privilege escalation, network pivoting). Real-world attacks require chaining these skills across multiple systems over hours or days. Mythos is the first AI model to bridge that gap in controlled testing environments.
The Enterprise Angle: Defense vs. Offense
The NSA isn't the only organization testing Mythos for defensive purposes. Project Glasswing—a consortium including Microsoft, AWS, Apple, and Google—is using Mythos to "harden software against potential threats," according to Anthropic. The goal: use AI to find vulnerabilities before attackers do, then patch them at scale.
But there's a catch. The UK AISI evaluation noted important limitations in their testing:
- Ranges lacked active defenders and defensive tooling (real enterprise environments have EDR, SIEM, SOC monitoring)
- No penalties for actions that would trigger security alerts (in production, attackers face detection risk)
- Focused on weakly defended and vulnerable systems (not hardened enterprise environments)
This means Mythos can autonomously attack "small, weakly defended and vulnerable enterprise systems where access to a network has been gained," but we don't yet know how it would perform against well-defended Fortune 500 infrastructure with mature security programs.
The business impact for CISOs and security teams is twofold:
- Defensive opportunity: AI-powered vulnerability discovery can dramatically accelerate security testing cycles—from quarterly penetration tests to continuous AI-assisted scanning
- Offensive risk: Adversaries with access to similar AI models can automate reconnaissance and exploitation at scale, making "time to exploit" a more critical metric than ever
Cost and ROI: AI-Powered Security Testing vs. Traditional Pentesting
Traditional penetration testing costs enterprises $15,000–$50,000+ per engagement (depending on scope), with testing cycles typically running quarterly or annually. A senior security consultant charges $200–$400/hour, and a comprehensive enterprise pentest can take 200–400 hours.
AI-powered security testing changes the economics:
- Continuous testing instead of point-in-time assessments
- Automated vulnerability chaining (finding exploit paths across systems)
- Inference cost scales with compute (Anthropic hasn't published Mythos API pricing, but inference costs for frontier models typically run $5–$30 per million tokens)
The ROI equation for security leaders: If Mythos can autonomously execute what takes a senior security researcher 20 hours in a fraction of the time, the cost per vulnerability discovered drops dramatically—even accounting for AI inference costs and false positive triage.
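That ROI equation can be sketched in a few lines. This is a back-of-the-envelope comparison using illustrative midpoints of the figures above (consultant rates, the 20-hour simulation, the 100M-token budget, and typical frontier-model inference pricing); the triage-hours figure is an assumption, not a published number.

```python
# Back-of-the-envelope: human pentest hours vs. AI inference plus triage.
# All constants are illustrative assumptions drawn from ranges in the text,
# not published Mythos pricing.

HUMAN_RATE_PER_HOUR = 300        # midpoint of the $200-$400/hr range
HUMAN_HOURS = 20                 # estimated human time for the 32-step simulation
TOKENS_USED = 100_000_000        # the 100M-token budget from UK AISI scaling tests
COST_PER_M_TOKENS = 15           # midpoint of the $5-$30 per million tokens range
TRIAGE_HOURS = 4                 # assumed human time to validate AI findings

human_cost = HUMAN_RATE_PER_HOUR * HUMAN_HOURS
ai_cost = (TOKENS_USED / 1_000_000) * COST_PER_M_TOKENS \
          + TRIAGE_HOURS * HUMAN_RATE_PER_HOUR

print(f"Human engagement: ${human_cost:,.0f}")  # $6,000
print(f"AI run + triage:  ${ai_cost:,.0f}")     # $2,700
```

Even with generous triage overhead, the per-run cost advantage compounds once testing becomes continuous rather than quarterly.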
But there's a hidden cost: deploying AI-powered security testing requires model access governance, prompt engineering expertise, and integration with existing security workflows. Early adopters will face integration challenges that traditional pentest vendors handle out of the box.
Vendor Comparison: Mythos vs. Traditional Security Tools
| Capability | Mythos (AI-Powered) | Traditional Pentesting | Automated Scanners (e.g., Nessus, Qualys) |
|---|---|---|---|
| Zero-day discovery | ✅ Finds novel vulnerabilities | ✅ Expert-dependent | ❌ Signature-based only |
| Multi-step attack chaining | ✅ Autonomous 32-step attacks | ✅ Manual expertise required | ❌ Single-step detection |
| Speed | ✅ Hours (vs. days for humans) | ⚠️ Days to weeks | ✅ Minutes to hours |
| Cost (per engagement) | ⚠️ TBD (inference + integration) | ❌ $15K–$50K+ | ✅ $2K–$10K/year (subscription) |
| False positive rate | ⚠️ Unknown (early-stage) | ✅ Low (expert validation) | ❌ High (requires triage) |
| Compliance reporting | ❌ Not yet integrated | ✅ Standard deliverables | ✅ Compliance templates |
Key takeaway: AI-powered security testing excels at speed and autonomous discovery but lacks the compliance integration and human validation workflows that mature security programs require. Expect hybrid approaches: AI for continuous discovery, humans for validation and remediation prioritization.
What Security Leaders Should Do Now
The UK AISI's advice to enterprises is blunt: "This highlights the importance of cybersecurity basics." Specifically:
- Patch aggressively. If 99% of Mythos-discovered vulnerabilities remain unpatched, your attack surface is wider than you think.
- Implement Zero Trust access controls. Mythos succeeds when it gains network access and can move laterally—segment networks and enforce least-privilege access.
- Invest in comprehensive logging. AI-powered attacks still leave digital footprints—but you'll only see them if you're logging comprehensively and monitoring actively.
- Adopt the UK Cyber Essentials framework. Security updates, robust access controls, security configuration, and logging are table stakes.
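The logging advice above is concrete enough to sketch. A simple heuristic that comprehensive logs enable is flagging a single source that authenticates to many distinct hosts in a short window—a classic lateral-movement signal. The event format, thresholds, and sample data here are all illustrative assumptions, not a production detection rule.

```python
from collections import defaultdict

# Hypothetical auth-log events: (timestamp_minute, source_ip, target_host).
events = [
    (0, "10.0.0.5", "db-01"),
    (1, "10.0.0.5", "db-02"),
    (2, "10.0.0.5", "web-01"),
    (3, "10.0.0.5", "web-02"),
    (4, "10.0.0.9", "db-01"),
]

WINDOW_MINUTES = 10   # illustrative window, not a tuned value
HOST_THRESHOLD = 3    # distinct hosts from one source within the window

def flag_lateral_movement(events):
    """Return source IPs that touch many distinct hosts within the window."""
    hosts_by_source = defaultdict(set)
    for ts, src, host in events:
        if ts <= WINDOW_MINUTES:
            hosts_by_source[src].add(host)
    return sorted(src for src, hosts in hosts_by_source.items()
                  if len(hosts) >= HOST_THRESHOLD)

print(flag_lateral_movement(events))  # ['10.0.0.5']
```

The point isn't this specific rule—it's that without centralized auth logs, no rule like it can run at all.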
For forward-looking CISOs, consider these strategic questions:
- Can we use AI-powered security testing defensively? (Pilot programs with Anthropic, OpenAI, or similar models for internal vulnerability discovery)
- How do we detect AI-powered attacks in our environment? (Traditional signature-based detection won't catch novel AI-discovered exploits)
- What's our mean time to patch (MTTP) for critical vulnerabilities? (If it's measured in weeks, you're vulnerable to AI-accelerated exploitation)
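The MTTP question is easy to answer if patch data is tracked. A minimal sketch, assuming records are simple (id, disclosed, patched) tuples—the vulnerability IDs and dates here are made up for illustration:

```python
from datetime import date

# Hypothetical patch records: (vulnerability id, disclosed, patched).
records = [
    ("VULN-A", date(2026, 1, 5), date(2026, 1, 12)),
    ("VULN-B", date(2026, 1, 10), date(2026, 2, 14)),
    ("VULN-C", date(2026, 2, 1), date(2026, 2, 4)),
]

def mean_time_to_patch(records):
    """Average days from disclosure to patch across all records."""
    deltas = [(patched - disclosed).days for _, disclosed, patched in records]
    return sum(deltas) / len(deltas)

print(f"MTTP: {mean_time_to_patch(records):.1f} days")  # MTTP: 15.0 days
```

If the number that comes out is measured in weeks, the article's warning applies: AI-accelerated exploitation will get there first.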
The dual-use nature of Mythos means enterprises face both opportunity and risk. The same AI that can harden your defenses can also be used by adversaries to find your weaknesses faster than ever.
The Geopolitical Context: Why the NSA Is Involved
The political backdrop is messy. The Trump administration previously designated Anthropic as a "supply chain risk" due to disagreements over AI use for autonomous weapons and mass surveillance. This theoretically restricts federal agencies from using Anthropic's models—yet Bloomberg reports the NSA is actively testing Mythos anyway.
Why? Because the cybersecurity implications are too significant to ignore. If an AI model can autonomously discover vulnerabilities in critical infrastructure (Microsoft products are deployed across government and enterprise networks), the national security calculus changes. The White House is reportedly exploring ways for federal agencies to access Mythos while maintaining supply chain restrictions.
For enterprise leaders, this signals a broader trend: AI-powered cybersecurity is transitioning from research novelty to operational necessity. If the NSA is testing Mythos despite political complications, it's because the defensive value outweighs bureaucratic friction.
What Comes Next: The AI Security Arms Race
Mythos represents a capability threshold. For the first time, an AI model can autonomously execute complex, multi-step cyberattacks that previously required human expertise. This isn't a lab demo—it's a government-validated capability that has already found thousands of unpatched vulnerabilities in production systems.
The implications for enterprise security leaders are urgent:
- Assume adversaries have similar capabilities. If Mythos can do this, so can models built by nation-state actors or well-funded cybercrime groups.
- Accelerate patch cycles. The window between vulnerability discovery and exploitation is shrinking from months to days (or hours).
- Invest in AI-powered defense. The only way to defend against AI-powered attacks at scale is with AI-powered defense at scale.
The good news: The same organizations testing Mythos offensively (NSA, UK AISI) are also exploring defensive use cases. Project Glasswing's collaboration with Microsoft, AWS, Apple, and Google suggests the industry recognizes the dual-use opportunity.
The bad news: We're in an arms race. Enterprises that treat this as a distant future threat will find themselves defending against AI-accelerated attacks with yesterday's tools.
Want to calculate your own AI ROI? Try our AI ROI Calculator — takes 60 seconds and shows projected savings, payback period, and 3-year ROI.
Continue Reading
- Microsoft Agent 365 Ships: The $99 SKU Is Not Your Bill — Enterprise AI governance and security sprawl
- AI Workforce Automation: Real Enterprise ROI Data from 2026 Deployments — Cost savings and productivity benchmarks
- Zero Trust Architecture: The 2026 Enterprise Playbook — Network segmentation and least-privilege access
Sources:
- Bloomberg: NSA Testing Anthropic's Mythos to Find Flaws in Microsoft Tech (April 30, 2026)
- UK AI Security Institute: Our evaluation of Claude Mythos Preview's cyber capabilities
- Anthropic: Project Glasswing