ChatGPT Lockdown Mode Targets Prompt Injection Data Theft
OpenAI's new Lockdown Mode cuts the network exits that prompt injection attacks use to steal data from ChatGPT - but won't stop malicious instructions from entering the model in the first place.

OpenAI's answer to prompt injection isn't to stop the attack. It's to close the exits.
Lockdown Mode, now rolling out to all ChatGPT account tiers including the free plan, strips out every feature that an attacker could use to pull data out of your session once the model has been manipulated. Live web browsing, Agent Mode, Deep Research, image retrieval, Canvas networking, external file downloads - all disabled. What remains is a capable but contained assistant that can't make outbound network requests an attacker might exploit.
TL;DR
- Lockdown Mode disables web browsing, Agent Mode, Deep Research, and file downloads in ChatGPT - blocking data exfiltration, not injection itself
- Malicious instructions in uploaded files or cached pages can still influence model behavior even with the mode active
- Available across all tiers including free; enable via Settings > Safety and Security > Advanced Security > Lockdown Mode
- Elevated Risk labels flag high-exposure features on ChatGPT, Atlas, and Codex without blocking them
What Prompt Injection Actually Does
The attack chain has two stages. First, a malicious actor hides instructions in content the AI will process - a webpage, a PDF, an email body. The model reads that content, treats the hidden instructions as legitimate, and begins following them. That's the injection. Most defenses stop here, or claim to.
The Stage That Matters More
Stage two is exfiltration. The manipulated model, now acting on attacker instructions, sends sensitive data somewhere it shouldn't. That might be session content, a user's uploaded document, API keys sitting in the conversation, or authentication tokens. The destination is usually an attacker-controlled URL reached via a network request the model makes on the user's behalf.
Lockdown Mode targets stage two. OpenAI says the feature is "designed to substantially reduce the risk of prompt injection-based data exfiltration, but it does not guarantee that data exfiltration cannot happen." That caveat matters - we'll get to it.
Why Agentic AI Raises the Stakes
A year ago, prompt injection in a chatbot meant embarrassing outputs. In a system with Agent Mode, it means automated multi-step sequences that can send emails, query databases, and interact with external services - all without the user knowing the model has been hijacked. The blast radius grew as capability grew. Lockdown Mode is OpenAI's admission that the security architecture didn't keep pace.
ChatGPT Lockdown Mode in the settings panel, available to all account tiers including free.
Source: thenextweb.com
The Feature Matrix
The tradeoff is real and worth seeing clearly:
| Feature | Standard Mode | Lockdown Mode |
|---|---|---|
| Live web browsing | Available | Disabled (cached only) |
| Image retrieval from web | Available | Disabled |
| Agent Mode | Available | Disabled |
| Deep Research | Available | Disabled |
| Canvas networking | Available | Disabled |
| External file downloads | Available | Disabled |
| Image generation | Available | Available |
| Photo and document uploads | Available | Available |
| Memory | Available | Available |
| Conversation sharing | Available | Available |
OpenAI is explicit: Lockdown Mode "is not intended for everyone." Organizations handling sensitive data who need strict guardrails on data movement will find the tradeoff sensible. For anyone who relies on Agent Mode or Deep Research as part of their daily workflow, the mode guts the product.
Elevated Risk Labels
Alongside Lockdown Mode, OpenAI introduced Elevated Risk labels - visual warning badges attached to specific features in ChatGPT, ChatGPT Atlas, and Codex. These aren't blocks. They're disclosures.
When a feature carries an Elevated Risk badge, OpenAI is formally acknowledging that the capability creates data exposure scenarios current mitigations don't fully address. The most common source of elevated risk, per OpenAI's documentation, is prompt injection. The company says labels will disappear once a feature's security improves enough to no longer warrant the warning.
This matters for enterprise security teams. The labels give administrators a documented basis for restricting which ChatGPT features employees can use - instead of relying on internal policy alone, they can point to OpenAI's own classification. That's truly useful, and it's the kind of vendor transparency that's been missing from the AI security conversation. Whether it translates into faster remediation of the flagged features is a different question.
Prompt injection exploits the gap between what an AI is told to do and what it reads in external content.
Source: unsplash.com
What It Does Not Tell You
Lockdown Mode is a real control. It also has documented gaps that OpenAI doesn't hide but doesn't highlight.
Injections Still Land
The mode doesn't prevent malicious instructions from entering the model's context. If an attacker embeds hidden instructions in a document you upload, those instructions will still reach the model. What Lockdown Mode removes is the network pathway the attacker would use to collect the output. The injection succeeds; the extraction fails. That's a meaningful partial defense, not a full one.
Cached Content Is Not Safe
Live web browsing is disabled, but cached content is still accessible. Prompt injection payloads embedded in pages ChatGPT has previously indexed survive the lockdown. OpenAI acknowledges residual risk through "unforeseen capability combinations and novel exploitation techniques" - language that covers a lot of ground.
Third-Party Integrations Remain Open
Lockdown Mode applies to ChatGPT itself. The ecosystem of third-party apps and integrations built on top of ChatGPT is a separate attack surface. An attacker who can reach a user through an integrated app may find Lockdown Mode offers no protection at all. OpenAI's documentation notes this gap without resolving it.
Our earlier coverage of how OpenAI acquired promptfoo for automated security testing suggested the company was building more systematic defenses. Lockdown Mode looks like a product-layer control rather than a model-level fix - which is consistent with that path but doesn't close the underlying vulnerability.
The Parallel With Agent Sandboxing
The core challenge OpenAI is working around is that capable AI agents need to make network requests to be useful. OpenAI's Agents SDK sandboxing guardrails tried to address this at the SDK level; Lockdown Mode addresses it at the product layer for consumer accounts. Neither approach stops the injection - they both aim to limit what a successful injection can accomplish. We've seen similar dynamics in Google's AI Overviews after prompt injection surface in search results, where the response was also surface-level containment rather than a model fix.
Lockdown Mode is a real security control, not a marketing feature. It makes a specific and documented class of attack significantly harder to complete, and OpenAI deserves credit for rolling it out to free-tier users rather than reserving it for enterprise plans. But it's a containment strategy, not a cure. Any organization deploying it should treat it as one layer in a stack, not the last one.
The Elevated Risk labels are actually the more significant development. When a vendor formally labels its own features as carrying unresolved security risks, it sets a precedent for the industry and creates accountability. If those labels stay on indefinitely without corresponding fixes, they become fine print. The next thing to watch is how fast OpenAI removes them.
Sources:
- Introducing Lockdown Mode and Elevated Risk labels in ChatGPT - OpenAI
- OpenAI unveils Lockdown Mode to protect sensitive data from prompt injection attacks - TechCrunch
- OpenAI rolls out a Lockdown Mode for extra protection against prompt injection attacks - Engadget
- New ChatGPT Lockdown Mode to Mitigate Prompt Injection and Data Exfiltration Attacks - CyberSecurityNews
- OpenAI adds Lockdown Mode to ChatGPT to block data theft from prompt injection attacks - The Next Web
