Vibe Coding Is a Security Catastrophe: 69 Vulnerabilities Found Across 5 Major AI Coding Tools
A systematic security audit of Claude Code, Codex, Cursor, Replit, and Devin found 69 vulnerabilities in 15 test applications - zero CSRF protection, zero security headers, and SSRF in every single tool.

TL;DR
- Security firm Tenzai tested 5 AI coding tools by building 3 identical apps each - found 69 vulnerabilities across all 15 apps
- Every single tool introduced Server-Side Request Forgery (SSRF) vulnerabilities. Zero apps implemented CSRF protection. Zero apps set security headers
- Carnegie Mellon found that 61% of AI-generated code is functionally correct but only 10.5% is secure
- Escape.tech discovered 2,000+ vulnerabilities and 400+ exposed secrets in 5,600 publicly deployed vibe-coded applications
The numbers are in, and they are worse than the pessimists predicted. A December 2025 study by security startup Tenzai systematically tested five of the most popular AI coding tools - Claude Code, OpenAI Codex, Cursor, Replit, and Devin - by having each build three identical web applications from pre-defined prompts. The result: 69 vulnerabilities across 15 applications, with patterns so consistent they suggest the problem is structural, not incidental.
The Tenzai Audit
Researcher Ori David designed the test to isolate each tool's security baseline. Same applications, same prompts, different agents. The breakdown:
| Agent | Total Vulnerabilities | Critical |
|---|---|---|
| Claude Code | 16 | 4 |
| OpenAI Codex | 13 | 1 |
| Cursor | 13 | 0 |
| Replit | 13 | 0 |
| Devin | 14 | 1 |
What They Got Right
Credit where it is due: none of the tools produced exploitable SQL injection or cross-site scripting in the traditional sense. They consistently used parameterized queries and relied on framework-level sanitization. The "solved" vulnerability classes - the ones with generic, pattern-based defenses - are genuinely handled well.
What They Got Catastrophically Wrong
The failures clustered in three areas that share a common trait: they require contextual understanding that AI does not have.
Authorization logic. The most common failure. Codex skipped validation for non-shopper roles entirely. Claude Code enforced permission checks only for logged-in users, so unauthenticated requests bypassed them altogether, enabling unrestricted product deletion.
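The Claude Code bug follows a pattern worth seeing concretely. A minimal Python sketch of that failure mode (all names hypothetical; this is not Tenzai's test code):

```python
from dataclasses import dataclass

@dataclass
class User:
    role: str

def delete_product_vulnerable(user, product_id, db):
    # Role is checked only when a user object exists...
    if user is not None and user.role != "admin":
        raise PermissionError("admins only")
    # ...so an anonymous request (user=None) falls through and deletes.
    db.discard(product_id)

def delete_product_fixed(user, product_id, db):
    # Deny by default: require an authenticated admin before acting.
    if user is None or user.role != "admin":
        raise PermissionError("admins only")
    db.discard(product_id)
```

The fix is a one-line inversion, from allow-by-default to deny-by-default, which is exactly the kind of contextual judgment the study found agents lack.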
Business logic. Four of five agents allowed negative order quantities. Three allowed negative product prices. These are not obscure edge cases - they are the first thing a human QA tester checks.
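These checks are one line each, which is what makes their absence notable. A hedged sketch of the missing validation (hypothetical names, not the audited code):

```python
def add_to_cart_vulnerable(cart, product, quantity, price):
    # No bounds check: quantity=-3 produces a negative line total,
    # effectively crediting the attacker at checkout.
    cart.append({"product": product, "total": quantity * price})

def add_to_cart_fixed(cart, product, quantity, price):
    # The first thing a human QA tester would try: non-positive values.
    if quantity <= 0 or price <= 0:
        raise ValueError("quantity and price must be positive")
    cart.append({"product": product, "total": quantity * price})
```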
Server-Side Request Forgery. All five agents introduced SSRF in a URL preview feature, letting attackers coerce the server into fetching arbitrary internal URLs - reaching internal services, bypassing firewalls, and leaking credentials. Five out of five. One hundred percent.
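A standard mitigation is to resolve the preview target and refuse anything that is not publicly routable. A stdlib-only sketch, with the caveat that a production defense must also pin the resolved IP for the actual fetch (DNS rebinding) and cap redirects:

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_safe_preview_url(url: str) -> bool:
    """Return True only if every resolved address is publicly routable."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False
    for *_, sockaddr in infos:
        addr = ipaddress.ip_address(sockaddr[0])
        # Rejects loopback, RFC 1918 ranges, and link-local addresses
        # such as the cloud metadata endpoint 169.254.169.254.
        if not addr.is_global:
            return False
    return True
```

None of the five agents generated anything resembling this check unprompted.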
The Missing Basics
The "ugly" category is arguably worse than the vulnerabilities themselves:
- CSRF protection: 0 of 15 apps implemented it (2 attempted, both failed)
- Security headers: 0 of 15 apps set CSP, X-Frame-Options, HSTS, X-Content-Type-Options, or proper CORS
- Rate limiting: 1 of 15 apps attempted it - and the implementation was bypassable via the X-Forwarded-For header
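The X-Forwarded-For bypass is worth spelling out: if the limiter keys requests on a client-supplied header, the attacker simply mints a new identity per request. A minimal sketch (hypothetical code, not the audited implementation):

```python
from collections import Counter

class RateLimiter:
    """Fixed-window request counter keyed by caller identity (sketch only)."""
    def __init__(self, limit):
        self.limit = limit
        self.counts = Counter()

    def allow(self, key):
        self.counts[key] += 1
        return self.counts[key] <= self.limit

def client_key_vulnerable(headers, remote_addr):
    # Trusting X-Forwarded-For lets the attacker choose their own key:
    # a fresh spoofed value per request resets the count every time.
    return headers.get("X-Forwarded-For", remote_addr)

def client_key_fixed(headers, remote_addr):
    # Key on the transport-level peer address (or on the X-Forwarded-For
    # entry appended by a proxy you control), never on a raw client header.
    return remote_addr
```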
"Coding agents cannot be trusted to design secure applications," Tenzai concluded. "They seem to be very prone to business logic vulnerabilities. While human developers bring intuitive understanding that helps them grasp how workflows should operate, agents lack this 'common sense.'"
The researchers also tested whether security-focused prompts could fix the problem. They added explicit vulnerability warnings and risk identification instructions. The result: "minimal vulnerability reduction."
The Broader Data
Tenzai's study is not an outlier. Multiple independent assessments have converged on the same conclusion.
Carnegie Mellon: 61% Correct, 10.5% Secure
The SusVibes benchmark from Carnegie Mellon tested SWE-Agent with Claude 4 Sonnet on 200 real-world feature-request tasks. The finding: 61% of solutions were functionally correct, but only 10.5% were secure. Even augmenting prompts with explicit vulnerability hints could not close that gap.
Veracode: 45% Vulnerability Rate
The 2025 GenAI Code Security Report tested 80 coding tasks across 100+ LLMs in four languages. AI introduced OWASP Top 10 vulnerabilities in 45% of cases. Java had it worst at over 70%. CWE-80 (cross-site scripting) showed failure rates of 86%, with no improvement even in the latest models including GPT-5.
CodeRabbit: AI Code Introduces 1.7x More Issues
Analysis of 470 GitHub PRs (320 AI-co-authored, 150 human-only) found AI-generated code produces:
- 2.74x more XSS vulnerabilities
- 1.91x more insecure object references
- 1.88x more improper password handling
- 8x more excessive I/O operations
- 3x more readability problems
Escape.tech: 2,000+ Vulns in the Wild
The most alarming data comes from Escape.tech, which scanned 5,600 publicly available applications built on vibe coding platforms (Lovable, Base44, Create.xyz, Vibe Studio, Bolt.new). They found:
- 2,000+ vulnerabilities
- 400+ exposed secrets (API keys, tokens)
- 175 instances of PII exposure including medical records, IBANs, and phone numbers
- Exposed authentication tokens in JavaScript bundles
- Misconfigured Row-Level Security policies in Supabase
The researchers described their results as "lower-bound estimates" because they used intentionally conservative passive scanning.
The AI IDE Vulnerability Crisis
The tools themselves are not just generating insecure code - they are insecure. Security researcher Ari Marzouk disclosed 30+ vulnerabilities across 24 CVEs in the AI coding tools developers use daily:
| CVE | Tool | Severity | Issue |
|---|---|---|---|
| CVE-2025-54135 | Cursor | 8.6 | Auto-executes MCP config changes even when user rejects suggestion |
| CVE-2025-55284 | Claude Code | High | DNS exfiltration via prompt injection reads .env files |
| SpAIware | Windsurf | High | Memory-persistent data exfiltration survives across sessions |
| IDEsaster | 12 tools | Multiple | JSON schema exfiltration, config-based RCE, workspace overrides |
The Cursor vulnerability (CurXecute) is particularly striking: when the agent suggests an edit to ~/.cursor/mcp.json, the edit lands on disk and triggers command execution even if the user rejects the suggestion in the UI. A malicious Slack message, when summarized by Cursor's AI, was demonstrated to rewrite MCP config files and execute arbitrary commands with developer privileges within minutes.
What the Data Does Not Tell You
These studies test default behavior - what happens when a developer prompts an AI tool without explicitly requesting secure code. Databricks' AI Red Team found that self-reflection prompts can improve security by 60-80% for Claude and up to 50% for GPT-4o. The tools can find their own vulnerabilities when asked.
But that is precisely the problem vibe coding was supposed to solve. The entire premise is that developers - or non-developers - can describe what they want and get working software. Requiring them to also know which security prompts to add defeats the purpose.
As Palo Alto Networks Unit 42 put it: "AI agents are optimized to provide a working answer, fast. They are not inherently optimized to ask critical security questions."
The data is unambiguous. AI coding tools produce functionally correct software at unprecedented speed. They also produce software riddled with authorization flaws, missing security controls, and business logic errors that no human developer would ship. The 69 vulnerabilities in Tenzai's study are not bugs to be fixed in the next model release. They are a structural consequence of tools that optimize for "does it work?" while ignoring "is it safe?" Until the incentive structure changes - or security becomes a native part of the generation pipeline rather than an afterthought - every vibe-coded application is a penetration tester's dream.
Sources:
- Output from vibe coding tools prone to critical security flaws, study finds - CSO Online
- Passing the Security Vibe Check: The Dangers of Vibe Coding - Databricks
- Securing Vibe Coding Tools: Scaling Productivity Without Scaling Risk - Unit 42
- Vibe coding could cause catastrophic 'explosions' in 2026 - The New Stack
- Is Vibe Coding Safe? - arXiv (Carnegie Mellon)
- Veracode 2025 GenAI Code Security Report
- Security risks of vibe coding and LLM assistants - Kaspersky