ChatGPT Lockdown Mode Targets Prompt Injection Data Theft

OpenAI's new Lockdown Mode cuts the network exits that prompt injection attacks use to steal data from ChatGPT - but won't stop malicious instructions from entering the model in the first place.

ChatGPT Lockdown Mode Targets Prompt Injection Data Theft

OpenAI's answer to prompt injection isn't to stop the attack. It's to close the exits.

Lockdown Mode, now rolling out to all ChatGPT account tiers including the free plan, strips out every feature that an attacker could use to pull data out of your session once the model has been manipulated. Live web browsing, Agent Mode, Deep Research, image retrieval, Canvas networking, external file downloads - all disabled. What remains is a capable but contained assistant that can't make outbound network requests an attacker might exploit.

TL;DR

  • Lockdown Mode disables web browsing, Agent Mode, Deep Research, and file downloads in ChatGPT - blocking data exfiltration, not injection itself
  • Malicious instructions in uploaded files or cached pages can still influence model behavior even with the mode active
  • Available across all tiers including free; enable via Settings > Safety and Security > Advanced Security > Lockdown Mode
  • Elevated Risk labels flag high-exposure features on ChatGPT, Atlas, and Codex without blocking them

What Prompt Injection Actually Does

The attack chain has two stages. First, a malicious actor hides instructions in content the AI will process - a webpage, a PDF, an email body. The model reads that content, treats the hidden instructions as legitimate, and begins following them. That's the injection. Most defenses stop here, or claim to.

The Stage That Matters More

Stage two is exfiltration. The manipulated model, now acting on attacker instructions, sends sensitive data somewhere it shouldn't. That might be session content, a user's uploaded document, API keys sitting in the conversation, or authentication tokens. The destination is usually an attacker-controlled URL reached via a network request the model makes on the user's behalf.

Lockdown Mode targets stage two. OpenAI says the feature is "designed to substantially reduce the risk of prompt injection-based data exfiltration, but it does not guarantee that data exfiltration cannot happen." That caveat matters - we'll get to it.

Why Agentic AI Raises the Stakes

A year ago, prompt injection in a chatbot meant embarrassing outputs. In a system with Agent Mode, it means automated multi-step sequences that can send emails, query databases, and interact with external services - all without the user knowing the model has been hijacked. The blast radius grew as capability grew. Lockdown Mode is OpenAI's admission that the security architecture didn't keep pace.

ChatGPT Lockdown Mode interface and security settings ChatGPT Lockdown Mode in the settings panel, available to all account tiers including free. Source: thenextweb.com

The Feature Matrix

The tradeoff is real and worth seeing clearly:

FeatureStandard ModeLockdown Mode
Live web browsingAvailableDisabled (cached only)
Image retrieval from webAvailableDisabled
Agent ModeAvailableDisabled
Deep ResearchAvailableDisabled
Canvas networkingAvailableDisabled
External file downloadsAvailableDisabled
Image generationAvailableAvailable
Photo and document uploadsAvailableAvailable
MemoryAvailableAvailable
Conversation sharingAvailableAvailable

OpenAI is explicit: Lockdown Mode "is not intended for everyone." Organizations handling sensitive data who need strict guardrails on data movement will find the tradeoff sensible. For anyone who relies on Agent Mode or Deep Research as part of their daily workflow, the mode guts the product.

Elevated Risk Labels

Alongside Lockdown Mode, OpenAI introduced Elevated Risk labels - visual warning badges attached to specific features in ChatGPT, ChatGPT Atlas, and Codex. These aren't blocks. They're disclosures.

When a feature carries an Elevated Risk badge, OpenAI is formally acknowledging that the capability creates data exposure scenarios current mitigations don't fully address. The most common source of elevated risk, per OpenAI's documentation, is prompt injection. The company says labels will disappear once a feature's security improves enough to no longer warrant the warning.

This matters for enterprise security teams. The labels give administrators a documented basis for restricting which ChatGPT features employees can use - instead of relying on internal policy alone, they can point to OpenAI's own classification. That's truly useful, and it's the kind of vendor transparency that's been missing from the AI security conversation. Whether it translates into faster remediation of the flagged features is a different question.

Lines of code representing the technical challenge of securing AI systems against prompt injection Prompt injection exploits the gap between what an AI is told to do and what it reads in external content. Source: unsplash.com

What It Does Not Tell You

Lockdown Mode is a real control. It also has documented gaps that OpenAI doesn't hide but doesn't highlight.

Injections Still Land

The mode doesn't prevent malicious instructions from entering the model's context. If an attacker embeds hidden instructions in a document you upload, those instructions will still reach the model. What Lockdown Mode removes is the network pathway the attacker would use to collect the output. The injection succeeds; the extraction fails. That's a meaningful partial defense, not a full one.

Cached Content Is Not Safe

Live web browsing is disabled, but cached content is still accessible. Prompt injection payloads embedded in pages ChatGPT has previously indexed survive the lockdown. OpenAI acknowledges residual risk through "unforeseen capability combinations and novel exploitation techniques" - language that covers a lot of ground.

Third-Party Integrations Remain Open

Lockdown Mode applies to ChatGPT itself. The ecosystem of third-party apps and integrations built on top of ChatGPT is a separate attack surface. An attacker who can reach a user through an integrated app may find Lockdown Mode offers no protection at all. OpenAI's documentation notes this gap without resolving it.

Our earlier coverage of how OpenAI acquired promptfoo for automated security testing suggested the company was building more systematic defenses. Lockdown Mode looks like a product-layer control rather than a model-level fix - which is consistent with that path but doesn't close the underlying vulnerability.

The Parallel With Agent Sandboxing

The core challenge OpenAI is working around is that capable AI agents need to make network requests to be useful. OpenAI's Agents SDK sandboxing guardrails tried to address this at the SDK level; Lockdown Mode addresses it at the product layer for consumer accounts. Neither approach stops the injection - they both aim to limit what a successful injection can accomplish. We've seen similar dynamics in Google's AI Overviews after prompt injection surface in search results, where the response was also surface-level containment rather than a model fix.


Lockdown Mode is a real security control, not a marketing feature. It makes a specific and documented class of attack significantly harder to complete, and OpenAI deserves credit for rolling it out to free-tier users rather than reserving it for enterprise plans. But it's a containment strategy, not a cure. Any organization deploying it should treat it as one layer in a stack, not the last one.

The Elevated Risk labels are actually the more significant development. When a vendor formally labels its own features as carrying unresolved security risks, it sets a precedent for the industry and creates accountability. If those labels stay on indefinitely without corresponding fixes, they become fine print. The next thing to watch is how fast OpenAI removes them.


Sources:

Elena Marchetti
About the author Senior AI Editor & Investigative Journalist

Elena is a technology journalist with over eight years of experience covering artificial intelligence, machine learning, and the startup ecosystem.