OpenAI Launches Codex Security, 14 Days After Anthropic

OpenAI launches Codex Security in research preview, scanning 1.2M commits and finding 11,353 critical and high-severity vulnerabilities. The AI vulnerability arms race is officially on.

TL;DR

  • OpenAI launches Codex Security in research preview on March 6, exactly 14 days after Anthropic's Claude Code Security announcement
  • During beta, scanned 1.2 million commits and flagged 792 critical plus 10,561 high-severity vulnerabilities
  • Earned 14 CVEs across OpenSSH, GnuTLS, GOGS, Chromium, PHP, libssh, and gpg-agent
  • Available to ChatGPT Enterprise, Business, and Edu customers - free for the first month
  • Open-source maintainers get free access through the new Codex for OSS program

Fourteen days. That's how long it took for OpenAI to respond after Anthropic's Claude Code Security sent cybersecurity stocks into freefall and forced the industry to confront what AI vulnerability scanning actually looks like at scale. On March 6, OpenAI launched Codex Security in research preview - and brought receipts.

The tool, which evolved from the Aardvark private beta that launched last October, scanned 1.2 million commits across external repositories during its beta period. It surfaced 792 critical-severity and 10,561 high-severity findings. OpenAI also disclosed 14 CVEs across major open-source projects, including GnuTLS, Chromium, PHP, and OpenSSH.

How Codex Security Works

The system operates in three stages, each designed to reduce the noise that plagues traditional static analysis.

Stage 1: Threat modeling. Codex Security clones a repository into an isolated container and analyzes the project's architecture - file structure, trust boundaries, authentication flows, data handling patterns. It produces an editable threat model in natural language that describes what the system does and where it's most exposed. Teams can review and adjust this model before scanning begins.

Stage 2: Context-aware scanning. Using the threat model as a foundation, the agent scans for vulnerabilities and classifies each finding based on real-world exploitability rather than abstract pattern matching. A SQL injection in a test fixture gets triaged differently than one in a production API endpoint. This is where the noise reduction claims come from.

Stage 3: Sandbox validation. Flagged issues get pressure-tested in a sandboxed environment. The agent attempts to construct proof-of-concept exploits to confirm exploitability, then ranks validated findings by severity and generates remediation code with explanations.

The pipeline can take hours or days depending on repository size - this isn't a quick linting pass.
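The three stages can be sketched as a simple pipeline. This is a hypothetical illustration, not OpenAI's actual API - every name (`ThreatModel`, `Finding`, the stage functions) is invented, and the heuristics are stand-ins for what the agent does with model reasoning.

```python
# Hypothetical sketch of the three-stage Codex Security pipeline described
# above. All names and heuristics are illustrative, not OpenAI's real API.
from dataclasses import dataclass, field

@dataclass
class Finding:
    title: str
    location: str                  # file path of the flagged code
    severity: int                  # higher = more severe
    exploit_confirmed: bool = False

@dataclass
class ThreatModel:
    # Stage 1 output: an editable description of where the system is exposed.
    production_paths: list = field(default_factory=list)

def stage1_threat_model(repo_files):
    """Analyze project structure to decide which paths are actually exposed.
    A real run reasons over trust boundaries and auth flows; this stand-in
    just treats non-test files as production surface."""
    tm = ThreatModel()
    for path in repo_files:
        if "test" not in path:
            tm.production_paths.append(path)
    return tm

def stage2_scan(repo_files, tm):
    """Flag issues, triaged by real-world exploitability: the same bug in a
    test fixture ranks below one in a production endpoint."""
    findings = []
    for path in repo_files:
        sev = 9 if path in tm.production_paths else 2
        findings.append(Finding("SQL injection", path, sev))
    return findings

def stage3_validate(findings, min_severity=5):
    """Pressure-test high-severity findings; keep only confirmed ones,
    ranked by severity."""
    validated = []
    for f in findings:
        if f.severity >= min_severity:
            f.exploit_confirmed = True  # stand-in for a real PoC attempt
            validated.append(f)
    return sorted(validated, key=lambda f: -f.severity)

repo = ["api/users.py", "tests/test_users.py"]
report = stage3_validate(stage2_scan(repo, stage1_threat_model(repo)))
for f in report:
    print(f.severity, f.location)
```

The structural point the sketch captures: stage 2's triage depends on stage 1's threat model, which is why the flagged SQL injection in the test fixture never reaches stage 3.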

The CVE Haul

OpenAI's 14 CVE assignments span seven projects:

| Project | Vulnerability | CVE |
|---|---|---|
| GnuTLS | Heap buffer overread in SCT extension parsing | CVE-2025-32989 |
| GnuTLS | Double-free in otherName SAN export | CVE-2025-32988 |
| GOGS | Two-factor authentication bypass | CVE-2025-64175 |
| GOGS | Unauthenticated access bypass | CVE-2026-25242 |
| gpg-agent | Stack buffer overflow (2 findings) | Pending |
| OpenSSH | Not disclosed (in responsible disclosure) | Pending |
| PHP | Not disclosed | Pending |
| Chromium | Not disclosed | Pending |
| libssh | Not disclosed | Pending |

The GnuTLS findings are technically interesting. A heap buffer overread in certificate transparency SCT parsing and a double-free during Subject Alternative Name export are both the kind of memory safety bugs that C codebases accumulate over decades. Neither is the sort of pattern that shows up in signature databases.

The GOGS vulnerabilities are arguably more impactful. CVE-2025-64175 bypasses two-factor authentication entirely, and CVE-2026-25242 allows unauthenticated access. GOGS is a self-hosted Git service used by organizations that specifically chose it to avoid depending on GitHub. Authentication bypasses in that context are severe.

Beta Metrics

OpenAI is claiming significant improvements over the Aardvark beta:

  • 84% noise reduction between initial rollout and the current version
  • 50%+ decrease in false positive rates across all monitored repositories
  • 90% reduction in over-reported severity levels
  • Critical vulnerabilities appeared in fewer than 0.1% of scanned commits

These numbers describe internal improvement over time, not absolute accuracy against a benchmark. That's an important distinction. The 50% false positive reduction means "50% fewer false positives than our first version," not "50% fewer than Snyk."
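A toy calculation (with invented numbers) makes the distinction concrete: halving a false positive rate from a noisy baseline can still leave most findings as noise.

```python
# Illustration with invented numbers: a relative improvement says nothing
# about absolute accuracy. These rates are hypothetical, not OpenAI's.
baseline_fp_rate = 0.95                   # hypothetical first-version rate
improved_fp_rate = baseline_fp_rate / 2   # a "50% decrease" in the rate
print(improved_fp_rate)                   # 0.475 - nearly half of findings are still noise
```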

Head-to-Head With Claude Code Security

The timing makes comparison inevitable. Semgrep published an independent evaluation that tested both tools against modern web application vulnerabilities. Their findings:

| Metric | Claude Code Security | Codex Security |
|---|---|---|
| Vulnerabilities found | 46 | 21 |
| True positive rate | 14% | 18% |
| False positive rate | 86% | 82% |

Claude finds more issues but with lower precision. Codex finds fewer but is slightly more accurate. Both tools have false positive rates above 80%, which means the majority of flagged issues aren't real vulnerabilities. That's comparable to or worse than existing SAST tools for this particular benchmark, though the nature of what they find - semantic, multi-file vulnerabilities versus pattern matches - is qualitatively different.
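Converting Semgrep's rates into absolute counts shows how small the true-positive pools are. This assumes the percentages are simple precision over reported findings, which Semgrep's table implies but doesn't state outright.

```python
# Back-of-the-envelope math on Semgrep's published numbers, assuming the
# true positive rate is precision over reported findings.
def true_positives(reported, tp_rate):
    return round(reported * tp_rate)

claude = true_positives(46, 0.14)   # Claude Code Security
codex = true_positives(21, 0.18)    # Codex Security
print(claude, codex)                # 6 4
```

In other words, on this benchmark each tool surfaced only a handful of real vulnerabilities - roughly six for Claude and four for Codex.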

The headline numbers tell a different story. Anthropic's red team found 500+ high-severity vulnerabilities in open-source projects. OpenAI's beta flagged 11,353 across 1.2 million commits. These aren't directly comparable - different codebases, different thresholds, different disclosure timelines - but both numbers are large enough to confirm that AI-powered scanning finds things traditional tools miss.

Availability and Pricing

Codex Security is rolling out in research preview to ChatGPT Enterprise, Business, and Edu customers through the Codex web interface. The first month is free. Pricing after that hasn't been disclosed.

OpenAI also launched Codex for OSS, a program giving open-source maintainers free ChatGPT Pro and Plus accounts, code review support, and Codex Security access. The vLLM inference engine team is among the first participants. OpenAI says it plans to expand the program in the coming weeks.

Claude Code Security, by comparison, is in limited research preview for Claude Enterprise and Team customers, with free expedited access for open-source maintainers.

Both companies are making the same bet: give security tools away to open source, lock in enterprise customers with the paid version.

What This Race Means

Two weeks between major AI security tool launches from the two leading foundation model companies isn't coincidence. Both Anthropic and OpenAI see vulnerability scanning as a wedge into enterprise security budgets - and both are willing to give the product away initially to establish their models as the default security layer in development workflows.

The competitive pressure has one clear beneficiary: open-source maintainers who now have two free, AI-powered security scanners to choose from. The cybersecurity industry that watched billions evaporate from its market cap on February 20 now faces a second entrant with its own CVE track record.

For enterprise buyers, the choice between Codex Security and Claude Code Security will likely come down to which AI platform they already use for coding. OpenAI has Codex and ChatGPT Enterprise; Anthropic has Claude Code. Security scanning becomes a feature that deepens platform lock-in rather than a standalone product.

The false positive rates from both tools - above 80% in Semgrep's testing - suggest neither is ready to replace human security review. But that was never the pitch. The pitch is finding the bugs that humans and traditional tools miss, then letting humans triage the results. On that metric, 14 CVEs across seven major open-source projects in a single beta period is hard to argue with.


About the author: Sophie, AI Infrastructure & Open Source Reporter

Sophie is a journalist and former systems engineer who covers AI infrastructure, open-source models, and the developer tooling ecosystem.