LongCat-2.0

Meituan's 1.6T-parameter open-source MoE coding model, trained end-to-end on 50,000 domestic Chinese ASICs, with native 1M token context and a 59.5 SWE-bench Pro score.

LongCat-2.0

TL;DR

  • Best-in-class SWE-bench Pro at 59.5, edging GPT-5.5 (58.6) and Gemini 3.1 Pro (54.2), though still behind Claude Opus 4.7/4.8 on broader agent tasks
  • 1.6T total parameters, ~48B active per token, native 1M context via LongCat Sparse Attention - MIT license, open weights pending
  • First trillion-parameter model trained and launched completely on domestic Chinese ASICs; ran as anonymous "Owl Alpha" on OpenRouter for two months before the reveal

Overview

LongCat-2.0 is Meituan's open-source coding model, released June 30, 2026. It's a 1.6-trillion-parameter Mixture-of-Experts system that activates roughly 48 billion parameters per token, with the active count swinging between 33B and 56B depending on query complexity. The context window is a native 1 million tokens, sustained through a sparse attention mechanism the team calls LongCat Sparse Attention (LSA) that keeps complexity linear rather than quadratic. License is MIT - permissive for commercial use.

What makes this release unusual is the backstory and the hardware. For two months before its public reveal, the model ran anonymously on OpenRouter under the alias "Owl Alpha," building up approximately 10.1 trillion monthly tokens and reaching first place on Hermes Agent workspace, second on Claude Code, and third across OpenClaw deployments by call volume. Meituan disclosed that ranking after the fact as evidence the model holds up under real developer load without the benefit of a marketing campaign. Training happened on a 50,000-card cluster of domestic Chinese ASICs with no NVIDIA hardware anywhere in the stack - the first time a trillion-parameter model has been trained and served end-to-end on domestic compute. That's a meaningful milestone in its own right regardless of benchmark position.

Competitively, LongCat-2.0 sits at the boundary between near-frontier and frontier. It clears GPT-5.5 on SWE-bench Pro by 0.9 points and beats Gemini 3.1 Pro by a wider margin. It trails Claude Opus 4.8 on broader general-agent benchmarks including FORTE and BrowseComp. For teams whose primary workload is long-context coding or agentic software engineering, the price-to-performance ratio is better than almost anything else available via API today.

Key Specifications

SpecificationDetails
ProviderMeituan
Model FamilyLongCat
ArchitectureMixture-of-Experts with LongCat Sparse Attention
Total Parameters1.6T
Active Parameters~48B per token (33-56B dynamic range)
N-gram Embeddings135B additional parameters for 5-gram token combinations
Context Window1M tokens (native)
Training Data30T+ tokens (code, Chinese, English, multilingual)
Training Compute50,000 domestic Chinese ASICs
Input Price (standard)$0.75 per million tokens
Output Price (standard)$2.95 per million tokens
Input Price (promo)$0.30 per million tokens
Output Price (promo)$1.20 per million tokens
Cached context readsFree
Release DateJune 30, 2026
LicenseMIT

Benchmark Performance

All scores below are vendor-reported from Meituan's internal evaluation suite. Independent reproduction hasn't landed yet at time of writing.

BenchmarkLongCat-2.0GPT-5.5Claude Opus 4.6Gemini 3.1 Pro
SWE-bench Pro59.558.6n/a54.2
SWE-bench Multilingual77.3n/an/an/a
Terminal-Bench 2.170.8n/an/an/a
FORTE73.277.873.2n/a
BrowseComp79.9n/an/an/a
RWSearch78.8n/an/an/a

The SWE-bench Pro lead over GPT-5.5 is 0.9 points - inside evaluation noise at this scale, so "narrowly ahead" is the right read, not "clearly superior." FORTE at 73.2 ties Claude Opus 4.6 but trails GPT-5.5 (77.8), which confirms the model's sweet spot is coding-specific tasks rather than general workflow simulation. BrowseComp at 79.9 and RWSearch at 78.8 are strong for an open-weight model, though the agentic AI benchmarks leaderboard tracks Claude Opus 4.8 scores on the same benchmarks that sit above this range.

The 59.5 on SWE-bench Pro is the headline number to watch for independent verification. See the SWE-bench coding agent leaderboard for ongoing ranked scores as labs submit independent results.

Competitor Pricing Context

ModelInputOutputSWE-bench Pro
LongCat-2.0 (standard)$0.75/M$2.95/M59.5
LongCat-2.0 (promo)$0.30/M$1.20/M59.5
GPT-5.5$5.00/M$30.00/M58.6
Claude Sonnet 5$2.00/M$10.00/Mn/a

Even at standard pricing, LongCat-2.0 is 6-7x cheaper per token than GPT-5.5 on input and 10x cheaper on output. The zero-cost cache reads make it especially attractive for long-context workflows where cached tokens dominate the bill.

Key Capabilities

Long-context coding. The 1M token native window is the core engineering claim, enabled by LongCat Sparse Attention. Standard transformer attention scales quadratically with context length; LSA selects only the most relevant tokens to attend to, dropping the scaling to linear. This isn't a sliding-window approximation - Meituan claims full 1M token access across all layers. For codebases measured in millions of tokens, that's the difference between summarization hacks and actual whole-repo comprehension.

Zero-computation experts. The ScMoE component routes simple tokens through minimal subnetworks while complex queries engage more expert capacity. The result is a dynamic per-token compute budget rather than fixed active parameters, which is what produces the 33-56B active parameter range. Meituan reports 1.5x MFU improvement through this mechanism versus their earlier models.

MOPD expert integration. Post-training splits across three expert clusters: Agent Experts (tool use and self-correction), Reasoning Experts (multi-hop logic and adaptive compute), and Interaction Experts (instruction following and hallucination reduction). These are distilled together via Multi-Teacher On-Policy Distillation rather than fine-tuned sequentially. The practical outcome is a single model that doesn't degrade on instruction following when pushed through agentic tool-call chains - a common failure mode in models optimized only for coding.

The Owl Alpha blind trial. The two-month anonymous period on OpenRouter is the most useful real-world signal available. Developers chose the model on quality alone with no brand recognition attached, driving it to top-3 by call volume across Hermes Agent workspace, Claude Code, and OpenClaw. That's harder to fake than a benchmark table. The open source LLM leaderboard will track it against GLM-5.1 and DeepSeek V4 as independent evaluations come in.

Pricing and Availability

LongCat-2.0 is accessible through three channels: the native platform at longcat.ai, OpenRouter (where it already ran as Owl Alpha), and the LongCat API at longcat.chat. Weights are listed as "coming soon" on Hugging Face and GitHub - the model is currently API-only despite the MIT license announcement.

Standard pricing is $0.75 per million input tokens and $2.95 per million output tokens, with cached context reads free. The launch promotion brings that to $0.30/$1.20 through an unspecified window. Cache hit pricing at zero is a significant advantage for long-context sessions where the same file tree or codebase gets passed repeatedly.

Flash-sale token packs release four times daily at Beijing time 10:00, 16:00, 21:00, and 23:00 - a somewhat unusual distribution mechanism for an API model, presumably tied to compute availability on the domestic ASIC cluster.

No enterprise pricing tier has been announced. Rate limits aren't publicly documented beyond the flash-sale structure.

Strengths and Weaknesses

Strengths

  • SWE-bench Pro leader at 59.5 among verified models, ahead of GPT-5.5 and Gemini 3.1 Pro
  • Native 1M token context with linear-complexity attention, not windowed approximation
  • Zero-cost cached context reads meaningfully reduce cost for long-context agentic loops
  • Proven real-world adoption via Owl Alpha blind trial on OpenRouter
  • MIT license permits commercial self-hosting once weights publish
  • Pricing undercuts GPT-5.5 by 6-10x at standard rates
  • First frontier-class model trained end-to-end on domestic Chinese compute

Weaknesses

  • Weights aren't published yet; MIT license means nothing without the actual files
  • All benchmark numbers are vendor-reported; independent third-party confirmation pending
  • SWE-bench Pro edge over GPT-5.5 is 0.9 points, inside noise margin
  • Trails Claude Opus 4.8 and GPT-5.5 on FORTE (73.2 vs 77.8)
  • Flash-sale token pack structure suggests limited compute capacity at launch
  • No public rate limits or enterprise SLA documentation
  • Self-hosting a 1.6T MoE still requires substantial multi-GPU infrastructure even with dynamic activation

Sources

✓ Last verified July 3, 2026

James Kowalski
About the author AI Benchmarks & Tools Analyst

James is a software engineer turned tech writer who spent six years building backend systems at a fintech startup in Chicago before pivoting to full-time analysis of AI tools and infrastructure.