Cursor Ships Cloud Agents That Build, Test, and Demo Code

Cursor just gave its AI coding agents their own computers.

On February 24, the $29.3 billion AI coding startup shipped Cloud Agents with Computer Use - a feature that lets autonomous agents run inside isolated virtual machines with full development environments. Two days later, Cursor followed up with Bugbot Autofix, which spawns cloud agents to automatically find and fix bugs in pull requests. The company says more than 30% of the pull requests it merges internally are now created by agents operating autonomously in cloud sandboxes.

That is not a typo. Nearly a third of the code changes shipping inside the company building one of the most popular AI coding assistants are written, tested, and submitted by the assistant itself.

TL;DR

Cursor cloud agents run in isolated VMs with full dev environments, producing merge-ready PRs with video recordings, screenshots, and logs
Bugbot Autofix spawns agents to automatically detect and fix PR issues - over 35% of proposed fixes get merged
Over 30% of Cursor's own merged PRs are now created by autonomous agents
Available on web, desktop, mobile, Slack, and GitHub at $20-$200/month depending on plan

What Cloud Agents Actually Do

Previous Cursor agents ran locally inside your editor. They could write code, but they competed for your machine's resources and could not test what they built. Cloud agents remove both constraints.

Each agent gets its own isolated VM with a complete development environment. It can spin up servers, open browsers, navigate web pages, manipulate spreadsheets, run test suites, and interact with the software it creates - all without touching your local setup. When it finishes, it packages the result as a pull request with video recordings, screenshots, and logs so you can review the work without checking out the branch.

Image: Cursor's cloud agents operate inside isolated VMs with full development environments, navigating browsers and testing software autonomously.

The VM Architecture

The isolation model is straightforward. Each agent operates in its own sandbox, preventing agents from interfering with each other or with your local workspace. This enables a key scaling property: you can run 10 to 20 agents in parallel, each working on a different task, without your machine slowing down.
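The fan-out pattern described above can be sketched in a few lines. This is only an illustration under stated assumptions: `run_agent` is a hypothetical stand-in that sleeps instead of calling any real service, and nothing here reflects Cursor's actual API. The only figure taken from the article is the upper bound of 20 parallel agents.

```python
import asyncio

MAX_PARALLEL_AGENTS = 20  # upper bound quoted in the article


async def run_agent(task: str) -> str:
    """Hypothetical stand-in for handing one task to a sandboxed agent."""
    await asyncio.sleep(0.01)  # simulates remote work; the local machine just waits
    return f"PR ready: {task}"


async def run_all(tasks: list[str], limit: int = MAX_PARALLEL_AGENTS) -> list[str]:
    """Run every task, never allowing more than `limit` agents at once."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(task: str) -> str:
        async with semaphore:
            return await run_agent(task)

    # gather returns results in the same order as the input tasks
    return await asyncio.gather(*(bounded(t) for t in tasks))


if __name__ == "__main__":
    results = asyncio.run(run_all([f"task-{i}" for i in range(30)]))
    print(len(results), "tasks finished")
```

Because the work happens remotely, the local process only waits on results, which is why the article's claim about your machine not slowing down is plausible in principle.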

Cursor also allows you to control the agent's remote desktop directly, making edits without pulling the branch locally. The agents are accessible from the Cursor editor, the web at cursor.com/agents, mobile apps, Slack, and GitHub.

Real-World Use Cases

Cursor's own engineering team has been using cloud agents for production work. In internal testing, agents handled marketplace plugin development, built GitHub source code linking, managed git operations including rebasing and conflict resolution, and performed thorough documentation site walkthroughs.

One remarkable example: a cloud agent reproduced a clipboard exfiltration vulnerability by building an exploit demonstration, starting backend servers, and documenting the complete attack flow - the kind of security testing that normally requires a dedicated engineer.

Bugbot Autofix Closes the Loop

Two days after cloud agents shipped, Cursor moved Bugbot Autofix out of beta. Where the original Bugbot only flagged issues in pull requests, Autofix goes further - it spawns cloud agents that independently fix the problems it finds.

The numbers are encouraging. Bugbot's issue detection rate has nearly doubled in the past six months, while its resolution rate climbed from 52% to 76%, suggesting fewer false positives. Of the fixes Autofix proposes, over 35% get merged directly into the base PR without modification.

Metric	Before Autofix	With Autofix
Resolution rate	52%	76%
Auto-proposed fixes	None	35% merge rate
Agent capabilities	Flagging only	Fix, test, and verify
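To put the resolution-rate figures in perspective, it helps to separate the percentage-point change from the relative change. The short calculation below uses only the 52% and 76% figures reported above; the function names are illustrative.

```python
def point_change(before: float, after: float) -> float:
    """Absolute change, in percentage points."""
    return after - before


def relative_change(before: float, after: float) -> float:
    """Change relative to the starting value, as a fraction."""
    return (after - before) / before


before, after = 52.0, 76.0  # resolution rates reported in the article

print(point_change(before, after))                    # 24.0 percentage points
print(round(relative_change(before, after) * 100, 1)) # 46.2 (% relative increase)
```

In other words, the jump is 24 percentage points, or roughly a 46% improvement over the earlier rate.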

Teams can configure Autofix through their Bugbot dashboard. Once enabled, every PR automatically receives proposed fixes alongside the review comments. Cursor says it plans to extend the system beyond code review into custom automations and continuous codebase monitoring.

Image: Bugbot Autofix now handles the full cycle from detection to fix, with agents testing their own changes in isolated environments before submitting PRs.

Where It Falls Short

The 30% internal PR stat is impressive but comes with caveats. Cursor's own codebase is unusually well-suited for agent-driven development - the team has spent years building tooling and infrastructure that agents can leverage. A typical enterprise codebase with legacy dependencies, undocumented APIs, and complex deployment pipelines will likely see lower adoption rates initially.

Pricing is another consideration. Cloud agents require the Pro+ plan at $60/month or Ultra at $200/month. Background agents bill separately from subscription credits and require MAX mode, which adds a 20% surcharge on model usage. For a solo developer, the economics work out. For a 50-person team at the Business tier ($40/user/month), the seats alone come to $2,000 a month, and the cumulative cost of running dozens of parallel agents on top of that could add up quickly.
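A rough back-of-the-envelope model makes the team-level math concrete. The seat price, team size, and 20% surcharge come from the figures above; the monthly agent usage amounts are hypothetical placeholders, since the article gives no usage data, and the formula is a simplification rather than Cursor's actual billing logic.

```python
def monthly_cost(seats: int, seat_price_usd: float,
                 agent_usage_usd: float, surcharge: float = 0.20) -> float:
    """Estimate a team's monthly bill: seats plus surcharged agent usage.

    `agent_usage_usd` is the raw model usage before the MAX mode surcharge.
    """
    subscription = seats * seat_price_usd
    agent_spend = agent_usage_usd * (1 + surcharge)
    return subscription + agent_spend


if __name__ == "__main__":
    # Usage levels below are hypothetical, chosen only to show the trend.
    for usage in (0.0, 500.0, 1500.0, 5000.0):
        total = monthly_cost(seats=50, seat_price_usd=40.0, agent_usage_usd=usage)
        print(f"usage ${usage:>8,.2f} -> total ${total:>9,.2f}")
```

With no agent usage the team pays $2,000 a month for seats; every additional $1,000 of raw model usage adds $1,200 once the surcharge is applied.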

There are also the standard concerns around AI-generated code - security vulnerabilities, hallucinated dependencies, and the gradual erosion of architectural understanding among developers who lean too heavily on agents. Cursor's approach of producing video recordings and detailed logs for every agent run is a sensible mitigation, but it shifts the bottleneck from writing code to reviewing code, and code review is already the slowest part of most development workflows.

The Competitive Landscape

Cursor isn't operating in a vacuum. GitHub Copilot recently opened Claude and Codex to all paid users, and Apple's Xcode 26.3 just shipped its own agentic coding capabilities. Mistral launched Vibe 2.0 to challenge Copilot directly, while Claude Code continues to gain traction among terminal-first developers.

What distinguishes Cursor's cloud agents is the computer use component. Most competing tools generate code and maybe run tests. Cursor's agents can open a browser, click through a UI, fill out forms, take screenshots, and record video - proving that the code works rather than just asserting it. That's a meaningful differentiator if the execution holds up at scale.

Feature	Cursor Cloud Agents	GitHub Copilot	Claude Code
Isolated VMs	Yes	No	No
Computer use	Browser, UI, screenshots	No	Terminal only
Video proof	Yes	No	No
Background execution	Up to 20 parallel	Limited	Background tasks
Starting price	$60/month (Pro+)	$10/month	$20/month

Image: The field of AI coding agents is getting crowded, with Cursor, GitHub Copilot, Claude Code, and Mistral Vibe all vying for developer adoption.

What to Watch

Cursor hit $1.2 billion in annualized revenue in 2025, growing over 1,000% year-over-year. Its client list includes Coinbase, OpenAI, eBay, Datadog, and Sentry. The $2.3 billion Series D at a $29.3 billion valuation, backed by Thrive, a16z, NVIDIA, and Google, gives the company significant runway to iterate on the agent infrastructure.

The real test is whether cloud agents can handle the messiness of real-world codebases. Cursor's internal 30% PR stat is a signal, not a guarantee. If the number holds for external teams with diverse stacks and legacy systems, Cursor will have built a genuine moat in the AI coding agent market. If it doesn't, cloud agents become an expensive demo that works best on Cursor's own code.

Either way, the direction is clear. AI agents are moving from suggesting code to shipping code, and Cursor just made the most concrete bet yet on that transition.
