Item: Augment Cosmos
Author: Elena Marchetti

Augment Code has always had a grander theory of what AI-assisted engineering should look like. When the company shipped Intent back in February, the pitch was spec-driven multi-agent orchestration on a single developer's machine. Cosmos, which entered public preview on May 4, 2026, shifts the frame completely. This isn't a better coding assistant. It's an attempt to build the coordination layer that sits underneath all the agents your team is already running - a shared nervous system for the whole engineering organization.

TL;DR

7.5/10 - truly novel platform architecture, but $200 per developer per month is an aggressive ask for software that's still in public preview
The Context Engine and Expert Registry solve a real organizational problem: knowledge doesn't compound when every agent session starts from scratch
Public preview means rough edges, limited availability (MAX and Enterprise plans only), and a feature set that's still being built out
Right for enterprise teams managing large multi-repo codebases; wrong fit for solo developers, small teams, or anyone without compliance-driven budget authority

From IDE Plugin to Infrastructure

The simplest way to understand Cosmos is to understand what Augment thinks has gone wrong with AI coding adoption. Every engineering team now has agents. Some engineers use Claude Code, others use Cursor, others have wired up their own workflows. The code volume goes up. But when Augment looked at the organizational level, they found that productivity gains weren't compounding. Knowledge built up in one engineer's agent context reset when the session ended. Code review bottlenecks got worse, not better, as agents pushed more PRs into the queue. Quality signals degraded because no one had wired the agents into the full delivery loop.

Cosmos is the answer to that analysis. Instead of giving each developer a smarter assistant, it gives the team a platform: a shared runtime, a shared memory layer, and a set of coordinated agents that trigger across the software development lifecycle. The company describes it as "the operating system for agentic software development," which is the kind of marketing language I'd normally treat with skepticism. After spending time with the platform, though, I'd say the framing is closer to accurate than most vendor claims.

The Context Engine: Why the Numbers Hold Up

The first thing to understand about Augment's performance claims is where they come from. In April 2026, Augment's Auggie CLI agent scored 51.80% on Scale AI's SWE-bench Pro benchmark - the highest of any tested agent system at that time, ahead of Cursor at 50.21%, Claude Code at 49.75%, and OpenAI Codex at 46.47%. (The leaderboard has since moved on, with Claude Mythos Preview posting 77.8% in May 2026 under its own scaffolding, but the agent-system comparison remains instructive.)
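Leaderboard rankings can hide how narrow a lead is. A quick calculation over the April 2026 scores quoted above makes the margins explicit; the helper name `margins` is just for illustration.

```python
# SWE-bench Pro scores (percent) as quoted in this review for April 2026.
scores = {
    "Augment (Auggie CLI)": 51.80,
    "Cursor": 50.21,
    "Claude Code": 49.75,
    "OpenAI Codex": 46.47,
}
leader = "Augment (Auggie CLI)"


def margins(scores: dict[str, float], leader: str) -> dict[str, float]:
    """Percentage-point lead of `leader` over every other entry, rounded to 2 decimals."""
    top = scores[leader]
    return {name: round(top - s, 2) for name, s in scores.items() if name != leader}


for name, gap in margins(scores, leader).items():
    print(f"{leader} leads {name} by {gap:.2f} points")
```

The lead over Cursor works out to 1.59 points: narrow in absolute terms, so the more interesting question is less the ranking than what produced the gap.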

The gap between Augment and its nearest competitor, Cursor, wasn't a model difference - both were routing through Claude Opus 4.7 at the time. The gap was the Context Engine.

Augment's Context Engine maintains a persistent, always-updated semantic index of your entire repository: code, dependencies, documentation, and commit history. On large codebases - the kinds with 400,000+ files and multi-repo interdependencies - this changes what an agent can actually do. A model with good context doesn't hallucinate the shape of your codebase. It reads it. The 51.80% score on SWE-bench Pro is, in Augment's own framing, "a context retrieval result, not a model result," and the same-model comparison with Cursor supports that reading.
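Augment hasn't published how the Context Engine is built, so the following is only a sketch of the general idea behind a persistent, incrementally updated index: hash each file, re-embed only what changed, drop what was deleted, and rank files against a query. Everything here is an assumption: `IncrementalIndex`, `embed`, and `cosine` are hypothetical names, and the bag-of-tokens "embedding" stands in for a learned embedding model.

```python
import hashlib
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a learned embedding: counts of lowercase identifier tokens."""
    return Counter(re.findall(r"[a-z_][a-z0-9_]*", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class IncrementalIndex:
    """Hypothetical persistent index that re-embeds only files whose content changed."""

    def __init__(self) -> None:
        self.hashes: dict[str, str] = {}       # path -> content hash at last sync
        self.vectors: dict[str, Counter] = {}  # path -> embedding

    def sync(self, files: dict[str, str]) -> list[str]:
        """Update the index from a path -> content snapshot; return the paths re-embedded."""
        changed = []
        for path, content in files.items():
            digest = hashlib.sha256(content.encode()).hexdigest()
            if self.hashes.get(path) != digest:
                self.hashes[path] = digest
                self.vectors[path] = embed(content)
                changed.append(path)
        # Forget files that were deleted from the repository.
        for path in set(self.hashes) - set(files):
            del self.hashes[path]
            del self.vectors[path]
        return changed

    def search(self, query: str, k: int = 3) -> list[str]:
        """Return the k indexed paths most similar to the query."""
        q = embed(query)
        ranked = sorted(self.vectors, key=lambda p: cosine(q, self.vectors[p]), reverse=True)
        return ranked[:k]


repo = {
    "billing/invoice.py": "def total_invoice(items): return sum(item.price for item in items)",
    "auth/session.py": "def refresh_session(token): return token",
}
index = IncrementalIndex()
index.sync(repo)  # first sync embeds both files
repo["auth/session.py"] = "def refresh_session(token, ttl): return token"
print(index.sync(repo))                    # ['auth/session.py']: only the edited file
print(index.search("invoice price", k=1))  # ['billing/invoice.py']
```

The content hash is what keeps the index "always updated" cheaply: an unchanged file costs one hash comparison per sync rather than a fresh embedding.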

Augment's Cosmos landing page, headlined "Is your engineering org more productive yet?", pitches organizational transformation over individual productivity gains. Source: augmentcode.com

Expert Registry: Agents That Get Better

The novel architectural piece in Cosmos is the Expert Registry - a system for creating, storing, and sharing specialized agents across a team. Cosmos ships with four reference experts: Deep Code Review, PR Author, E2E Testing, and Incident Response. But the real promise is what happens when teams build their own.

An expert is defined by three things: a narrow task scope, domain-specific memory that builds up across sessions, and a feedback mechanism that converts corrections into persistent knowledge. The "Milo" case study from Augment's own engineering team is illustrative: their internal testing agent initially received comprehensive instructions upfront, but performance degraded as the interaction history grew. The fix was moving to a coaching model where engineers correct reasoning in real time, and those corrections persist across every future session for every team member.

The distinction Augment draws between task corrections and mental model corrections is where this gets interesting. If you tell an agent it got the wrong output format, you've fixed one task. If you teach it why your team structures API responses the way it does, you've changed how it reasons about every related problem from now on. Cosmos stores the second kind of correction in team-accessible memory, so the next engineer who triggers that expert inherits the built-up context.
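The task-versus-mental-model distinction is easier to see as a small data model. This is a hypothetical sketch, not Cosmos's API: `Expert`, `TeamMemory`, `CorrectionKind`, and the rule that only mental-model corrections persist are assumptions drawn from the description above.

```python
from dataclasses import dataclass, field
from enum import Enum


class CorrectionKind(Enum):
    TASK = "task"                  # fixes one output, e.g. a wrong response format
    MENTAL_MODEL = "mental_model"  # explains why the team does something a certain way


@dataclass
class TeamMemory:
    """Hypothetical team-wide store that every engineer's session reads from."""
    lessons: dict[str, list[str]] = field(default_factory=dict)

    def add(self, expert: str, lesson: str) -> None:
        self.lessons.setdefault(expert, []).append(lesson)

    def for_expert(self, expert: str) -> list[str]:
        return list(self.lessons.get(expert, []))


@dataclass
class Expert:
    name: str
    scope: str          # the narrow task this expert handles
    memory: TeamMemory  # shared across the team, not tied to one session

    def start_session(self, engineer: str) -> dict:
        """Every new session, for any engineer, starts from the accumulated lessons."""
        return {"engineer": engineer, "scope": self.scope,
                "context": self.memory.for_expert(self.name)}

    def correct(self, kind: CorrectionKind, note: str) -> bool:
        """Persist mental-model corrections; a task correction fixes only the current output."""
        if kind is CorrectionKind.MENTAL_MODEL:
            self.memory.add(self.name, note)
            return True
        return False


memory = TeamMemory()
reviewer = Expert("api-review", "Review REST API response shapes", memory)
reviewer.correct(CorrectionKind.TASK, "Use ISO 8601 for the timestamp in this response")
reviewer.correct(CorrectionKind.MENTAL_MODEL,
                 "Every response is wrapped in a data/error envelope so clients parse one shape")
print(reviewer.start_session("a different engineer")["context"])
```

The key design choice is that the memory object is shared rather than owned by a session, which is what lets a correction from one engineer reach the next.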

The next decade of engineering won't be won by the teams creating the most code. It'll be won by the teams running the best system.

In principle, this makes the platform self-improving in a way that session-based tools aren't. Whether the quality of those improvements holds up over months of real team use is something only post-GA data will confirm.

Prism: Multi-Model Routing

Cosmos is model-agnostic by design, and Prism is the mechanism that makes the claim real. Prism is a routing layer that dynamically assigns each task to whichever model it judges best suited, from a pool that currently includes Claude Opus 4.7, Claude Sonnet 4.6, Gemini Flash 3.0, GPT-5.5, GPT-5.4, and Kimi K2.6.

Augment's own data puts the savings at 20-30% per task versus always routing to the frontier model, at similar or better quality. Teams sending 10,000 user messages a month can expect roughly $20,000 in token cost savings per the company's published estimates. Given that Cosmos sits on the MAX plan at $200 per developer per month, every efficiency gain on the token cost side improves the total cost of ownership math.
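Augment's two figures can be cross-checked with simple arithmetic. Assuming the $20,000 estimate is monthly (the figure as quoted doesn't state a period), it implies about $2 saved per message, and therefore a frontier-only baseline of roughly $6.67 to $10 per message. The helper name below is hypothetical.

```python
SAVINGS_USD = 20_000  # Augment's published estimate (assumed here to be monthly)
MESSAGES = 10_000     # user messages per month, from the same estimate
SEAT_PRICE = 200      # MAX plan, USD per developer per month


def implied_baseline_per_message(savings_usd: float, messages: int, savings_rate: float) -> float:
    """Per-message cost of always using the frontier model that a savings claim implies."""
    saved_per_message = savings_usd / messages
    return saved_per_message / savings_rate


for rate in (0.20, 0.30):
    baseline = implied_baseline_per_message(SAVINGS_USD, MESSAGES, rate)
    print(f"{rate:.0%} savings implies ~${baseline:.2f} per message at frontier-only pricing")

print(f"${SAVINGS_USD:,} would cover {SAVINGS_USD // SEAT_PRICE} MAX seat-months")
```

If your team's per-message spend is well below that implied baseline, the absolute savings will shrink proportionally even if the 20-30% rate holds.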

The two primary Prism configurations are "GPT + Kimi" (targeting GPT-5.5-level quality) and "Claude + Gemini" (targeting Claude Opus 4.7-level quality). The routing logic is opaque to users - you don't choose per task; you pick a configuration and the system decides. For teams with strong preferences about which model touches which code, that opacity may be a limitation.
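Augment doesn't document how Prism decides, so this is only a sketch of what configuration-based routing generally looks like. The `CONFIGS` table, the strong/fast split within each pairing, the keyword-and-length difficulty heuristic, and the 0.5 threshold are all invented for illustration; a production router would more plausibly use a learned classifier.

```python
from dataclasses import dataclass

# Hypothetical configurations modelled on the two Prism pairings named above.
# Treating the quality-target model as "strong" and its partner as "fast" is an assumption.
CONFIGS = {
    "claude+gemini": {"strong": "Claude Opus 4.7", "fast": "Gemini Flash 3.0"},
    "gpt+kimi": {"strong": "GPT-5.5", "fast": "Kimi K2.6"},
}


@dataclass
class RoutingDecision:
    task: str
    model: str
    difficulty: float


def estimate_difficulty(task: str) -> float:
    """Crude stand-in for a difficulty classifier: longer and multi-file tasks score higher."""
    words = len(task.split())
    multi_file = any(k in task.lower() for k in ("refactor", "migrate", "across"))
    return min(1.0, words / 40 + (0.5 if multi_file else 0.0))


def route(task: str, config: str, threshold: float = 0.5) -> RoutingDecision:
    """Send hard tasks to the strong model and everything else to the cheaper one."""
    pair = CONFIGS[config]
    difficulty = estimate_difficulty(task)
    model = pair["strong"] if difficulty >= threshold else pair["fast"]
    return RoutingDecision(task, model, difficulty)


log = [route(t, "claude+gemini") for t in (
    "Fix typo in README",
    "Refactor the auth middleware across all three services",
)]
for decision in log:
    print(f"{decision.model:<18} difficulty={decision.difficulty:.2f}  {decision.task}")
```

Whatever the real heuristic, keeping a `RoutingDecision` record per task is the design choice that would address the opacity concern: a router that logs which model handled which task can be audited.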

SWE-bench in Context: What the Numbers Mean

It's worth being careful about what the SWE-bench Pro results actually measure. The benchmark covers 1,865 long-horizon tasks from 41 real repositories across Python, Go, TypeScript, and JavaScript. Augment's 51.80% score was strong, but it came from Auggie CLI running against the benchmark's public subset, not the full task pool. Real codebases are messier, weirder, and more idiosyncratic than benchmark repos.

The more significant finding is the gap that agent scaffolding creates. On SWE-bench Verified, a separate benchmark, Cline in autonomous mode using Claude Sonnet 4.5 scores 59.8%, while the same model in a basic harness scores 43.2% - a 16.6-point gap from orchestration quality alone. If scaffolding is worth that much with the model held constant, context retrieval is a plausible bottleneck, and that's exactly what Augment has spent four years building.
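Percentage points and relative improvement tell different stories, so it helps to compute both from the two SWE-bench Verified scores quoted above.

```python
# SWE-bench Verified scores (percent) for Claude Sonnet 4.5, as quoted in this review.
agentic_harness = 59.8  # Cline, autonomous mode
basic_harness = 43.2    # same model, basic harness

gap_points = agentic_harness - basic_harness
relative_lift = gap_points / basic_harness

print(f"absolute gap:  {gap_points:.1f} percentage points")
print(f"relative lift: {relative_lift:.0%} over the basic harness")
```

A 16.6-point absolute gap is roughly a 38% relative improvement over the basic harness, with the model held constant.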

What Cosmos adds, beyond the Context Engine, is the organizational layer: the knowledge doesn't disappear when an agent session ends. For teams generating hundreds of PRs per week through agents, that compounding effect is worth serious evaluation.

Scale AI's SWE-bench Pro leaderboard, the benchmark where Augment's Auggie CLI scored 51.80% in April 2026. Source: labs.scale.com

The $200 Question

Cosmos is exclusively available on Augment's MAX plan at $200 per developer per month, or on Enterprise pricing (custom). For comparison, Claude Code is billed through Anthropic's Pro and Max subscriptions or at standard API rates. GitHub Copilot Enterprise is around $39 per user per month. Cursor's Teams plan is $40 per user per month. Even Augment's own Standard plan - which includes the Context Engine and its coding agent - is $60 per developer per month.

The price gap reflects what Cosmos actually is: not a coding tool, but a platform. It includes the full agent runtime, Expert Registry, event bus, shared organizational memory, and the Prism routing layer on top of everything Standard already includes. The compliance posture - SOC 2 Type II and ISO 42001 certification plus GDPR compliance - is also enterprise-grade and priced accordingly.

For a team of five developers, that's $1,000 per month. For a team of twenty, it's $4,000. At that level, the question isn't whether Cosmos is impressive. It's whether the productivity gains are measurable and attributable. Augment doesn't publish customer ROI data beyond their own benchmarks, which is a gap worth noting for any serious evaluation.
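Attribution is easier to reason about once the premium is expressed in engineering time. The sketch below reproduces the team totals above and adds a break-even calculation against Augment's Standard plan. The $100-per-hour loaded cost is a placeholder assumption, not a figure from Augment or from this review, and the function names are hypothetical.

```python
# Per-developer monthly list prices quoted in this review (USD).
MAX_PLAN = 200       # required for Cosmos
STANDARD_PLAN = 60   # Context Engine + coding agent, no Cosmos

# Assumption for illustration only: a fully loaded engineering cost per hour.
# Substitute your own organization's figure.
LOADED_HOURLY_COST = 100


def team_cost(per_dev: int, developers: int) -> int:
    """Monthly spend for a team at a flat per-developer price."""
    return per_dev * developers


def breakeven_hours(premium_per_dev: float, hourly_cost: float) -> float:
    """Engineer-hours each developer must save per month for the premium to pay for itself."""
    return premium_per_dev / hourly_cost


for devs in (5, 20):
    monthly = team_cost(MAX_PLAN, devs)
    print(f"{devs} developers on MAX: ${monthly:,}/month, ${monthly * 12:,}/year")

premium = MAX_PLAN - STANDARD_PLAN  # what Cosmos adds over Standard, per developer
hours = breakeven_hours(premium, LOADED_HOURLY_COST)
print(f"Break-even vs Standard: {hours:.1f} hours saved per developer per month")
```

Under that assumption, each developer would need to save about 1.4 hours a month relative to Standard. The arithmetic is trivial; proving the hours were saved, and that Cosmos saved them, is the hard part.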

The public preview caveat matters too. The company is explicit that there are rough edges. For most organizations, running mission-critical development workflows through preview software is a non-starter regardless of the potential.

Strengths

Context Engine sets a quality floor that's truly hard to match on large codebases - 400,000+ file indexing with semantic dependency tracking changes what agents can actually resolve
Expert Registry solves the knowledge decay problem that makes most agent deployments stall after initial novelty; institutional knowledge can now accumulate and compound
Prism routing delivers cost savings (20-30% per task, by Augment's own figures) without requiring developers to think about model selection
Workflow reduction from eight interruption points to three deliberate checkpoints should cut context-switching overhead in ways that show up in throughput, not just in demos
Compliance posture (SOC 2 Type II, ISO 42001, GDPR) is complete enough for enterprise deployment decisions without additional procurement work
Multi-platform - CLI, web, and mobile - with consistent agent behavior across all surfaces

Weaknesses

$200 per developer per month is steep for public preview software and requires significant enterprise budget authority to justify
Public preview limitations: rough edges, incomplete feature set, no GA timeline published
Model routing opacity - Prism doesn't expose per-task routing decisions, which matters for teams with compliance or quality requirements around specific models
Organizational lock-in: the Expert Registry and shared memory layer create significant switching costs once a team has invested in building custom experts
No evidence on long-term quality: the self-improving expert model is compelling in theory; post-GA production data doesn't exist yet
Solo and small team market is entirely excluded - Cosmos isn't designed for fewer than five developers, and it doesn't pretend to be

Verdict

Augment Cosmos is the most architecturally serious attempt to solve a real problem in AI-assisted engineering: individual productivity gains that don't translate to organizational improvement. The Context Engine, Expert Registry, and Prism routing are each meaningful capabilities, and together they build a platform that's qualitatively different from any coding assistant on the market today.

The price reflects that ambition. At $200 per developer per month, Cosmos is a procurement decision, not an individual tool purchase. Teams that can justify the cost - large engineering organizations with compliance requirements, high PR volumes, and complex multi-repo codebases - have a compelling reason to evaluate it. Teams that can't probably aren't the intended audience, and Augment seems comfortable with that tradeoff.

Public preview means the platform isn't ready for teams who need stability before committing. Watch for the GA release; that's when the organizational productivity claims will have real data behind them. For now, Cosmos scores 7.5/10: the vision and engineering are there, but the price and preview status mean most teams should wait.

Augment Cosmos Review: Building the Agent OS