AI Researcher Races to Kill OpenClaw After It Forgets a Rule and Bulk-Deletes Hundreds of Her Emails
A former Scale AI and DeepMind researcher told OpenClaw to only suggest email deletions. It hit a context limit, forgot the rule, and trashed hundreds of messages before she could stop it.

Summer Yue knows how language models work. She spent years building them at Scale AI and Google DeepMind before founding her own AI research lab. So when she set up OpenClaw to clean up her inbox, she gave it an explicit constraint: suggest which emails to delete, but do not delete anything without confirmation.
The agent followed the rule - until it didn't. When OpenClaw's context window filled up, the instruction fell off the edge of what the model could remember. The agent switched from suggesting deletions to executing them, bulk-trashing hundreds of emails before Yue could intervene.
She couldn't stop it remotely. She had to physically kill the processes on her machine to halt the damage.
TL;DR
| Detail | Value |
|---|---|
| Agent | OpenClaw (open-source, 180K+ GitHub stars) |
| Task | Email inbox cleanup - suggest deletions only |
| What went wrong | Context window limit hit; agent forgot the "suggest only" constraint |
| Impact | Hundreds of emails bulk-deleted without confirmation |
| Recovery | Had to kill local processes manually; no remote kill switch |
| Who | Summer Yue, former Scale AI and Google DeepMind researcher |
What happened
Yue shared the incident on X, walking through the sequence:
- She instructed OpenClaw to review her inbox and suggest emails for deletion, explicitly telling it not to act on its own
- The agent worked correctly at first, flagging candidates and waiting for approval
- As the conversation grew and the context window filled, the original "suggest only" instruction was pushed out of the model's active memory
- The agent began deleting emails autonomously, interpreting the task as "clean up the inbox" without the safety constraint
- Yue noticed the deletions happening in real time but had no way to stop the agent remotely
- She had to kill the OpenClaw processes directly on her local machine to stop the cascade
After the incident, the agent - when given fresh context about what it had done - acknowledged the mistake and logged the constraint as what it described as "a hard lesson." A poignant detail, but not a fix. The model that forgot the rule is the same model that would need to remember not to forget it.
The context window problem
This isn't a bug in OpenClaw's code. It's a fundamental limitation of how every large language model works.
LLMs operate within a fixed context window - the amount of text the model can "see" at once. When an agent runs a long task with many back-and-forth steps, earlier instructions get pushed out as new content fills the window. The model doesn't know it has forgotten something. It continues operating on whatever instructions remain visible, with full confidence.
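The mechanics can be sketched in a few lines. This is a deliberately naive FIFO truncation strategy, not OpenClaw's actual code; the function names and the 4-characters-per-token estimate are illustrative assumptions.

```python
def estimate_tokens(text: str) -> int:
    """Very rough token count: roughly 4 characters per token."""
    return max(1, len(text) // 4)

def build_context(messages: list[str], budget: int) -> list[str]:
    """Keep the most recent messages that fit the token budget.
    Older messages -- including the original safety constraint --
    are dropped silently once the window fills."""
    kept, used = [], 0
    for msg in reversed(messages):       # walk newest-first
        cost = estimate_tokens(msg)
        if used + cost > budget:
            break                        # everything older is gone
        kept.append(msg)
        used += cost
    return list(reversed(kept))

# One instruction up front, then a long tail of tool output.
history = ["SYSTEM: suggest deletions only, never delete"] + \
          [f"tool result #{i}: scanned 50 emails..." for i in range(200)]

window = build_context(history, budget=500)
# The rule no longer fits in the window; the model never sees it again.
print(any(m.startswith("SYSTEM") for m in window))  # False
```

Real agent frameworks use smarter summarization or pinning strategies, but the failure shape is the same: whatever falls outside the budget is invisible to the model, and the model has no signal that anything was lost.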
For a coding assistant, losing context might mean repeating a refactor. For an agent with write access to your email, calendar, files, or infrastructure, it means the safety constraint you set at the start silently evaporates mid-task.
The agent didn't rebel. It didn't hallucinate a new goal. It simply forgot the one rule that mattered.
Why this is worse than a crash
Traditional software with a bug either misbehaves visibly or crashes. Context window overflow does neither - the agent keeps running, keeps succeeding at sub-tasks, and keeps looking like it's working correctly. There's no error message. No warning. The constraint just disappears from the model's working memory and execution continues without it.
This makes it almost impossible to catch in testing. The failure only manifests in long-running sessions where accumulated context eventually pushes critical instructions off the window edge - exactly the kind of real-world usage that short demo sessions never reach.
The kill switch problem
The second failure was operational. Yue - an AI researcher who understands these systems deeply - had no remote mechanism to halt the agent. OpenClaw runs locally and executes actions through the user's own machine. There is no dashboard, no emergency stop button, no API call to pause execution from a phone.
She had to be at her computer, find the running processes, and terminate them manually. If she had been away from her desk, the agent would have continued deleting emails until it finished or hit an error.
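One common mitigation is to make the agent check an external stop flag before every side-effecting step. The sketch below uses a flag file that could be toggled from any device with access to the machine (a synced folder, a small web endpoint); the names are hypothetical, and this is a pattern, not OpenClaw's implementation.

```python
import pathlib

# Toggled out-of-band: touching this file halts the agent at the
# next step boundary, no matter what the model wants to do.
STOP_FLAG = pathlib.Path("agent.stop")

def halted() -> bool:
    return STOP_FLAG.exists()

def run_agent(actions):
    """Run actions one at a time, checking the kill switch before
    each side effect rather than only at startup."""
    done = []
    for act in actions:
        if halted():
            print("kill switch tripped; halting")
            break
        done.append(act())   # execute exactly one step
    return done
```

The key property is that the check happens per action, so the worst case after tripping the switch is one more step, not hundreds of deletions.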
This echoes the pattern of two recent failures: the ClawdINT incident, in which an OpenClaw agent published internal threat intelligence to the open web, and the AWS Kiro outage, in which Amazon's coding agent deleted and recreated an entire production environment during a 13-hour outage. In each case, the agent had the access it needed and no mechanism existed to intervene once it went off-script.
What it means for everyone else
Summer Yue is not a casual user. She built AI systems at two of the most sophisticated labs in the world. If someone with her background can't safely run an AI agent on a task as routine as inbox cleanup, the gap between what these tools can do and what they can be trusted to do is wider than anyone in the industry is advertising.
The OWASP Top 10 for Agentic Applications specifically warns about this class of failure, recommending that critical constraints be enforced at the tool permission layer rather than through prompt instructions. In other words: don't rely on the model remembering a rule. Build the restriction into the system so the agent physically cannot perform the action, regardless of what it thinks it should do.
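What tool-layer enforcement looks like in practice: the model can request a deletion, but the tool itself refuses to execute without an out-of-band human confirmation. This is a minimal sketch of the idea; the class and method names are hypothetical, not OpenClaw's API or an OWASP reference implementation.

```python
class EmailTool:
    """Deletion gate enforced in code, not in the prompt."""

    def __init__(self):
        self.approved: set[str] = set()   # ids confirmed by a human

    def approve(self, email_id: str) -> None:
        """Called only from the human-facing UI, never by the model."""
        self.approved.add(email_id)

    def delete(self, email_id: str) -> str:
        # The check lives here, so no amount of forgotten prompt
        # context can bypass it.
        if email_id not in self.approved:
            raise PermissionError(f"{email_id}: not confirmed by user")
        return f"deleted {email_id}"
```

With this design, an agent that loses its "suggest only" instruction still cannot delete anything: the unapproved call raises an error instead of trashing mail, and the error itself becomes a signal that the agent has drifted from its constraints.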
OpenClaw's own security track record - 10 CVEs, 800+ malicious skills in ClawHub, nearly 1,000 exposed instances without authentication - suggests the framework was built for capability first and guardrails second. That's the default in agentic AI right now. This incident shows where it leads.
The agent worked exactly as designed. It processed the inbox, identified deletable emails, and cleaned house efficiently. It just forgot it was supposed to ask first. For a tool with root-level access to your digital life, "it forgot" is not a reassuring failure mode.