Cloudflare Sandboxes Hit GA - Real Computers for AI Agents
Cloudflare Sandboxes are now generally available, giving AI agents persistent isolated environments with shell access, filesystem, PTY terminals, and background processes that start on demand and sleep when idle.

TL;DR
- Cloudflare Sandboxes and Containers are now generally available - persistent isolated environments where AI agents get a real shell, filesystem, and background processes
- Agents can clone repos, run Python/JS, start dev servers, expose preview URLs, and debug via PTY terminal over WebSocket
- State persists across requests: same sandbox ID from anywhere in the world returns the same environment with all files and variables intact
- You only pay for active CPU cycles - idle time while waiting for LLM responses costs nothing
- Figma is already using it for Figma Make; snapshots (full disk state preservation) coming soon
AI agents have a gap between what they can reason about and what they can actually do. They can write code but can't run it. They can plan a debugging session but can't open a terminal. Most agent frameworks work around this by shelling out to local processes or spinning up containers that die between requests.
Cloudflare's answer, now generally available, is to give agents a persistent computer.
What a Sandbox actually is
A Cloudflare Sandbox is an isolated computing environment powered by Cloudflare Containers. It has a shell, a filesystem, and background processes. Request a sandbox by name and you get the same environment back - from anywhere, every time. When nobody's using it, it sleeps. When a request arrives, it wakes.
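The "same name, same environment" contract can be sketched as a registry keyed by sandbox ID. This is an illustrative mock, not the SDK: in production, Cloudflare routes the ID to one container wherever the request originates, and the names here (Sandbox, getSandboxByName) are ours.

```typescript
// Illustrative sketch of the naming contract: one ID maps to one instance,
// so any caller asking for "build-42" sees the same filesystem state.
class Sandbox {
  files = new Map<string, string>(); // stand-in for a real filesystem
}

const registry = new Map<string, Sandbox>();

function getSandboxByName(id: string): Sandbox {
  let sb = registry.get(id);
  if (!sb) {
    sb = new Sandbox(); // first request "wakes" (here: creates) the sandbox
    registry.set(id, sb);
  }
  return sb;
}
```

Two handles obtained with the same ID observe each other's writes; different IDs stay isolated.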
The core SDK (@cloudflare/sandbox) exposes what amounts to a remote operating system API:
- exec() runs shell commands with real-time streaming output
- gitCheckout() clones repositories
- createCodeContext() creates persistent Python/JS/TypeScript interpreters where variables survive between calls (think Jupyter notebooks, but headless)
- startProcess() launches background services like dev servers
- exposePort() generates public preview URLs for running services
- terminal() opens a full PTY session over WebSocket, compatible with xterm.js
- watch() monitors filesystem changes in real time via Linux inotify
The practical effect: an agent can clone a repo, install dependencies, run the test suite, read the failures, edit code, and re-run tests - the same tight loop a human developer uses, without any of it touching the user's machine.
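That loop can be sketched in a few lines. This is an offline illustration of the loop's shape, not the real SDK: FakeSandbox and its canned responses stand in for a live container's exec(), which actually streams output from a running shell.

```typescript
// Hypothetical stand-in for a sandbox's exec(); the real one runs the
// command in an isolated container and streams stdout/stderr.
interface ExecResult {
  exitCode: number;
  stdout: string;
}

class FakeSandbox {
  private testsPass = false;

  async exec(command: string): Promise<ExecResult> {
    if (command.startsWith("git clone")) return { exitCode: 0, stdout: "Cloning..." };
    if (command === "npm test") {
      return this.testsPass
        ? { exitCode: 0, stdout: "all tests passed" }
        : { exitCode: 1, stdout: "1 test failed" };
    }
    if (command.startsWith("apply-fix")) {
      this.testsPass = true; // pretend the agent's edit fixed the failure
      return { exitCode: 0, stdout: "patched" };
    }
    return { exitCode: 0, stdout: "" };
  }
}

// The agent's loop: run tests, read the failure, edit, re-run.
async function fixUntilGreen(sandbox: FakeSandbox, maxAttempts = 3): Promise<boolean> {
  await sandbox.exec("git clone https://example.com/repo.git .");
  for (let i = 0; i < maxAttempts; i++) {
    const result = await sandbox.exec("npm test");
    if (result.exitCode === 0) return true;
    await sandbox.exec(`apply-fix --based-on "${result.stdout}"`);
  }
  return false;
}
```

The key property is that every iteration reuses the same working tree, so partial progress (installed dependencies, applied edits) carries forward.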
Persistent code interpreters
The code interpreter contexts are where this gets interesting for agent workflows. Variables and imports persist across execution calls:
```python
# First call: load and process data
import pandas as pd
df = pd.read_csv("sales.csv")
margins = df["revenue"] - df["cost"]

# Second call: the dataframe is still there
summary = df.groupby("region").agg({"revenue": "sum"})
```
Output supports matplotlib charts, Pandas HTML tables, and structured JSON. For agents doing data analysis, this eliminates the constant context re-initialization that makes stateless code execution feel like Groundhog Day.
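The persistence semantics can be illustrated without the real interpreter: each call runs against a shared variable environment, so state written by an earlier call is visible to a later one. CodeContext and run() here are our own illustration of the Jupyter-style contract, not the SDK's API.

```typescript
// Toy model of a persistent code context: a shared variable environment
// that survives across independent run() calls.
class CodeContext {
  private vars = new Map<string, unknown>();

  run<T>(fn: (vars: Map<string, unknown>) => T): T {
    return fn(this.vars);
  }
}

const ctx = new CodeContext();

// "First call": load data into the context.
ctx.run((vars) => {
  vars.set("rows", [
    { region: "east", revenue: 100 },
    { region: "west", revenue: 250 },
    { region: "east", revenue: 50 },
  ]);
});

// "Second call": the rows set earlier are still there.
const eastTotal = ctx.run((vars) => {
  const rows = vars.get("rows") as { region: string; revenue: number }[];
  return rows
    .filter((r) => r.region === "east")
    .reduce((sum, r) => sum + r.revenue, 0);
});
// eastTotal === 150
```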
Security: credentials never enter the sandbox
The security model assumes the agent is untrusted. Credentials are injected at the network layer through a programmable egress proxy - the sandbox process never sees them. Outbound HTTP requests get authentication headers added by a Cloudflare Worker sitting between the sandbox and the internet.
```javascript
outboundByHost: {
  "api.github.com": (req, env) => {
    req.headers.set("Authorization", `Bearer ${env.GITHUB_TOKEN}`);
    return req;
  }
}
```
This means an agent can push to GitHub without ever having access to the token. If the sandbox gets compromised, the credentials aren't there to steal.
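The proxy's core mechanic is a per-host transform table applied to every outbound request. A minimal sketch, assuming the dispatch function (applyEgressRules) and Env type are our own; the outboundByHost shape follows the snippet above:

```typescript
// Assumed shape of the per-host rules; GITHUB_TOKEN lives only in the
// Worker's environment, never inside the sandbox.
type Env = { GITHUB_TOKEN: string };
type Transform = (req: Request, env: Env) => Request;

const outboundByHost: Record<string, Transform> = {
  "api.github.com": (req, env) => {
    req.headers.set("Authorization", `Bearer ${env.GITHUB_TOKEN}`);
    return req;
  },
};

// Illustrative dispatch: the proxy looks up the destination host and
// applies its transform; unmatched hosts pass through unchanged.
function applyEgressRules(req: Request, env: Env): Request {
  const host = new URL(req.url).hostname;
  const transform = outboundByHost[host];
  return transform ? transform(req, env) : req;
}
```

Because the token is added on the Worker side of the boundary, a dump of the sandbox's memory or environment yields nothing worth stealing.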
Snapshots (coming soon)
The announced but not-yet-shipped feature that matters most: disk snapshots. When a sandbox sleeps, its full state - OS configuration, installed dependencies, modified files, data - gets persisted to R2 storage with tiered caching.
The performance difference is significant: cloning a repo and running npm install takes 30 seconds. Restoring from a snapshot takes 2 seconds. Future releases promise live memory snapshots that resume running processes exactly where they stopped.
Pricing and scale
The billing model charges only for active CPU time. When an agent is waiting for an LLM to respond - which is most of the time - the sandbox incurs zero cost. Current capacity: 15,000 concurrent lite instances, 6,000 basic instances, and 1,000+ of the larger configurations.
Figma is already in production with it. Alex Mullans from Figma told Cloudflare: "Cloudflare Containers is that solution" for running untrusted agent and user code at scale. They use it to power Figma Make, their concept-to-production design tool.
What this enables
The broader context is Cloudflare's Agents Week, a suite of announcements positioning the company as the infrastructure layer for AI agents. Alongside Sandboxes:
- cf CLI unifies ~3,000 Cloudflare API operations into a single command-line tool
- Durable Object Facets provide isolated SQLite databases for stateful agent workers
- Sandbox Auth adds identity-aware outbound networking
The pitch is clear: agents need more than a prompt and an API key. They need a computer. Cloudflare is selling the computer.