Name: LongCat-2.0
Author: Meituan

TL;DR

Vendor-reported SWE-bench Pro score of 59.5, narrowly ahead of GPT-5.5 (58.6) and well clear of Gemini 3.1 Pro (54.2), though still behind Claude Opus 4.7/4.8 on broader agent tasks
1.6T total parameters, ~48B active per token, native 1M context via LongCat Sparse Attention - MIT license, open weights pending
First trillion-parameter model trained and launched completely on domestic Chinese ASICs; ran as anonymous "Owl Alpha" on OpenRouter for two months before the reveal

Overview

LongCat-2.0 is Meituan's open-source coding model, released June 30, 2026. It's a 1.6-trillion-parameter Mixture-of-Experts system that activates roughly 48 billion parameters per token, with the active count ranging from 33B to 56B depending on query complexity. The context window is a native 1 million tokens, sustained through a sparse attention mechanism the team calls LongCat Sparse Attention (LSA), which keeps attention cost linear in context length rather than quadratic. The license is MIT, which permits commercial use, although the weights themselves are not yet published.

What makes this release unusual is the backstory and the hardware. For two months before its public reveal, the model ran anonymously on OpenRouter under the alias "Owl Alpha," processing approximately 10.1 trillion tokens per month and ranking first on Hermes Agent workspace, second on Claude Code, and third across OpenClaw deployments by call volume. Meituan disclosed those rankings after the fact as evidence that the model holds up under real developer load without the benefit of a marketing campaign. Training happened on a 50,000-card cluster of domestic Chinese ASICs with no NVIDIA hardware anywhere in the stack - the first time a trillion-parameter model has been trained and served end-to-end on domestic compute. That's a meaningful milestone in its own right, regardless of benchmark position.

Competitively, LongCat-2.0 sits at the boundary between near-frontier and frontier. It leads GPT-5.5 on SWE-bench Pro by 0.9 points and Gemini 3.1 Pro by 5.3 points. It trails Claude Opus 4.8 on broader general-agent benchmarks including FORTE and BrowseComp. For teams whose primary workload is long-context coding or agentic software engineering, the price-to-performance ratio is better than almost anything else available via API today.

Key Specifications

Specification	Details
Provider	Meituan
Model Family	LongCat
Architecture	Mixture-of-Experts with LongCat Sparse Attention
Total Parameters	1.6T
Active Parameters	~48B per token (33-56B dynamic range)
N-gram Embeddings	135B additional parameters for 5-gram token combinations
Context Window	1M tokens (native)
Training Data	30T+ tokens (code, Chinese, English, multilingual)
Training Compute	50,000 domestic Chinese ASICs
Input Price (standard)	$0.75 per million tokens
Output Price (standard)	$2.95 per million tokens
Input Price (promo)	$0.30 per million tokens
Output Price (promo)	$1.20 per million tokens
Cached context reads	Free
Release Date	June 30, 2026
License	MIT

Benchmark Performance

All scores below are vendor-reported from Meituan's internal evaluation suite. Independent reproduction hasn't landed yet at time of writing.

Benchmark	LongCat-2.0	GPT-5.5	Claude Opus 4.6	Gemini 3.1 Pro
SWE-bench Pro	59.5	58.6	n/a	54.2
SWE-bench Multilingual	77.3	n/a	n/a	n/a
Terminal-Bench 2.1	70.8	n/a	n/a	n/a
FORTE	73.2	77.8	73.2	n/a
BrowseComp	79.9	n/a	n/a	n/a
RWSearch	78.8	n/a	n/a	n/a

The SWE-bench Pro lead over GPT-5.5 is 0.9 points - small enough to sit inside evaluation noise, so "narrowly ahead" is the right read, not "clearly superior." FORTE at 73.2 ties Claude Opus 4.6 but trails GPT-5.5 (77.8), which suggests the model's sweet spot is coding-specific tasks rather than general workflow simulation. BrowseComp at 79.9 and RWSearch at 78.8 are strong for a model slated for open-weight release, though the agentic AI benchmarks leaderboard lists higher Claude Opus 4.8 scores on the same benchmarks.

The 59.5 on SWE-bench Pro is the headline number to watch for independent verification. See the SWE-bench coding agent leaderboard for ongoing ranked scores as labs submit independent results.

Competitor Pricing Context

Model	Input	Output	SWE-bench Pro
LongCat-2.0 (standard)	$0.75/M	$2.95/M	59.5
LongCat-2.0 (promo)	$0.30/M	$1.20/M	59.5
GPT-5.5	$5.00/M	$30.00/M	58.6
Claude Sonnet 5	$2.00/M	$10.00/M	n/a

Even at standard pricing, LongCat-2.0 is roughly 6.7x cheaper per token than GPT-5.5 on input ($0.75 vs $5.00) and roughly 10x cheaper on output ($2.95 vs $30.00). The zero-cost cache reads make it especially attractive for long-context workflows where cached tokens dominate the bill.
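The multiples can be checked directly from the pricing table. The short Python sketch below does only that arithmetic; the dictionary and function names are illustrative, and the prices are copied from the table rather than fetched from any API.

```python
# Prices in USD per million tokens, copied from the pricing table above.
PRICES = {
    "LongCat-2.0 (standard)": (0.75, 2.95),
    "LongCat-2.0 (promo)": (0.30, 1.20),
    "GPT-5.5": (5.00, 30.00),
    "Claude Sonnet 5": (2.00, 10.00),
}


def price_ratio(expensive: str, cheap: str) -> tuple[float, float]:
    """Return (input_ratio, output_ratio): how many times pricier `expensive` is."""
    exp_in, exp_out = PRICES[expensive]
    cheap_in, cheap_out = PRICES[cheap]
    return exp_in / cheap_in, exp_out / cheap_out


if __name__ == "__main__":
    for rival in ("GPT-5.5", "Claude Sonnet 5"):
        in_ratio, out_ratio = price_ratio(rival, "LongCat-2.0 (standard)")
        print(f"{rival}: {in_ratio:.1f}x input, {out_ratio:.1f}x output")
```

Against standard LongCat-2.0 pricing this gives about 6.7x on input and 10.2x on output for GPT-5.5, and about 2.7x and 3.4x for Claude Sonnet 5.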

Key Capabilities

Long-context coding. The 1M token native window is the core engineering claim, enabled by LongCat Sparse Attention. Standard transformer attention scales quadratically with context length; LSA attends only to the most relevant tokens for each query, so cost grows linearly as long as that selection stays roughly fixed in size. This isn't a sliding-window approximation - Meituan claims full 1M token access across all layers. For codebases that approach a million tokens, that's the difference between summarization hacks and actual whole-repo comprehension.
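To give a feel for what quadratic versus linear means at this scale, the sketch below counts query-key pairs scored per attention head per layer. It is back-of-the-envelope arithmetic, not a description of LSA itself: the per-query budget K is a hypothetical value chosen for illustration, and the cost of deciding which tokens to select is ignored.

```python
def dense_pairs(n_tokens: int) -> int:
    """Query-key pairs scored by full attention: every token against every token."""
    return n_tokens * n_tokens


def sparse_pairs(n_tokens: int, k_selected: int) -> int:
    """Query-key pairs scored when each query attends to at most k_selected keys."""
    return n_tokens * min(k_selected, n_tokens)


if __name__ == "__main__":
    K = 2048  # hypothetical per-query budget, NOT a published LSA parameter
    for n in (8_000, 128_000, 1_000_000):
        d, s = dense_pairs(n), sparse_pairs(n, K)
        print(f"n={n:>9,}  dense={d:.2e}  sparse={s:.2e}  saving={d / s:,.0f}x")
```

Under that assumed budget, a 1M-token context needs about 488 times fewer scored pairs than dense attention, and the gap widens in proportion to context length.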

Zero-computation experts. The ScMoE component routes simple tokens through minimal subnetworks while complex queries engage more expert capacity. The result is a dynamic per-token compute budget rather than a fixed number of active parameters, which is what produces the 33-56B active parameter range. Meituan reports a 1.5x improvement in model FLOPs utilization (MFU) through this mechanism versus their earlier models.
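A toy router shows how a per-token budget can fall out of this kind of design. The sketch below is an assumption-laden illustration, not Meituan's implementation: the expert names, the top-2 routing, and the parameter sizes (in arbitrary units) are all hypothetical, and a zero-computation expert is modeled simply as one that contributes no parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expert:
    name: str
    params: float  # arbitrary units; 0.0 marks a zero-computation expert


def route(scores: dict[str, float], experts: dict[str, Expert], top_k: int) -> list[Expert]:
    """Pick the top_k experts by router score for a single token."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [experts[name] for name in ranked[:top_k]]


def active_params(chosen: list[Expert], shared: float) -> float:
    """Always-on (shared) parameters plus whichever real experts were chosen."""
    return shared + sum(e.params for e in chosen)


# Hypothetical pool: two real experts and two zero-computation experts.
EXPERTS = {
    "e0": Expert("e0", 5.0),
    "e1": Expert("e1", 5.0),
    "z0": Expert("z0", 0.0),
    "z1": Expert("z1", 0.0),
}

if __name__ == "__main__":
    tokens = {
        "simple": {"e0": 0.1, "e1": 0.2, "z0": 0.9, "z1": 0.8},
        "mixed": {"e0": 0.9, "e1": 0.1, "z0": 0.8, "z1": 0.2},
        "complex": {"e0": 0.9, "e1": 0.7, "z0": 0.1, "z1": 0.2},
    }
    for label, scores in tokens.items():
        chosen = route(scores, EXPERTS, top_k=2)
        print(label, [e.name for e in chosen], active_params(chosen, shared=10.0))
```

With these made-up numbers the active count moves between 10 and 20 units depending on how many real experts the router picks, which is the same shape of behavior as the reported 33-56B range.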

MOPD expert integration. Post-training splits across three expert clusters: Agent Experts (tool use and self-correction), Reasoning Experts (multi-hop logic and adaptive compute), and Interaction Experts (instruction following and hallucination reduction). These are distilled together via Multi-Teacher On-Policy Distillation rather than fine-tuned sequentially. The practical outcome is a single model that doesn't degrade on instruction following when pushed through agentic tool-call chains - a common failure mode in models optimized only for coding.

The Owl Alpha blind trial. The two-month anonymous period on OpenRouter is the most useful real-world signal available. Developers chose the model on quality alone with no brand recognition attached, driving it to top-3 by call volume across Hermes Agent workspace, Claude Code, and OpenClaw. That's harder to fake than a benchmark table. The open source LLM leaderboard will track it against GLM-5.1 and DeepSeek V4 as independent evaluations come in.

Pricing and Availability

LongCat-2.0 is accessible through three channels: the native platform at longcat.ai, OpenRouter (where it already ran as Owl Alpha), and the LongCat API at longcat.chat. Weights are listed as "coming soon" on Hugging Face and GitHub - the model is currently API-only despite the MIT license announcement.

Standard pricing is $0.75 per million input tokens and $2.95 per million output tokens, with cached context reads free. The launch promotion cuts that to $0.30/$1.20 for an unspecified period. Zero-cost cache hits are a significant advantage for long-context sessions where the same file tree or codebase gets passed repeatedly.
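The effect of free cache reads is easiest to see with a worked estimate. The sketch below prices a hypothetical agentic session at the standard rates quoted above; the workload (20 turns, a 400,000-token repository context re-sent each turn, 2,000 new input tokens and 1,500 output tokens per turn) is an assumption made up for illustration, as is the simplification that everything after the first turn hits the cache.

```python
def session_cost_usd(
    fresh_in: int,
    cached_in: int,
    out: int,
    in_price: float,
    out_price: float,
    cached_price: float = 0.0,
) -> float:
    """Cost of a session given token counts and USD prices per million tokens."""
    total = fresh_in * in_price + cached_in * cached_price + out * out_price
    return total / 1_000_000


if __name__ == "__main__":
    turns, repo, new_in, new_out = 20, 400_000, 2_000, 1_500  # hypothetical workload
    fresh = repo + turns * new_in   # repo is paid for once, then served from cache
    cached = (turns - 1) * repo     # repo context re-read on every later turn
    output = turns * new_out

    with_cache = session_cost_usd(fresh, cached, output, 0.75, 2.95)
    no_cache = session_cost_usd(fresh, cached, output, 0.75, 2.95, cached_price=0.75)
    print(f"free cache reads: ${with_cache:.2f}   no cache discount: ${no_cache:.2f}")
```

For this made-up session the bill comes to about $0.42 with free cache reads versus about $6.12 if cached tokens were billed at the full input rate, so nearly all of the cost sits in the re-read context.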

Flash-sale token packs are released four times daily, at 10:00, 16:00, 21:00, and 23:00 Beijing time - a somewhat unusual distribution mechanism for an API model, presumably tied to compute availability on the domestic ASIC cluster.
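For teams outside China, the release times are easier to plan around in UTC. The sketch below converts the four Beijing-time slots using only the Python standard library; the function and constant names are illustrative, and it relies on the fact that China observes UTC+8 year-round with no daylight saving time.

```python
from datetime import date, datetime, time, timedelta, timezone

# China uses a single UTC+8 offset all year, so a fixed offset is sufficient.
BEIJING = timezone(timedelta(hours=8), "UTC+8")
DROP_TIMES = [time(10, 0), time(16, 0), time(21, 0), time(23, 0)]


def drops_in_utc(day: date) -> list[datetime]:
    """Return the four release instants for a Beijing calendar day, in UTC."""
    return [
        datetime.combine(day, t, tzinfo=BEIJING).astimezone(timezone.utc)
        for t in DROP_TIMES
    ]


if __name__ == "__main__":
    for instant in drops_in_utc(date(2026, 7, 1)):
        print(instant.strftime("%Y-%m-%d %H:%M UTC"))
```

The slots land at 02:00, 08:00, 13:00, and 15:00 UTC on the same calendar day. Calling `.astimezone()` with no argument on each result converts it to the machine's local zone instead.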

No enterprise pricing tier has been announced. Rate limits aren't publicly documented beyond the flash-sale structure.

Strengths and Weaknesses

Strengths

SWE-bench Pro leader at 59.5 (vendor-reported) among the models compared, ahead of GPT-5.5 and Gemini 3.1 Pro
Native 1M token context with linear-complexity attention, not windowed approximation
Zero-cost cached context reads meaningfully reduce cost for long-context agentic loops
Proven real-world adoption via Owl Alpha blind trial on OpenRouter
MIT license permits commercial self-hosting once weights publish
Pricing undercuts GPT-5.5 by 6-10x at standard rates
First trillion-parameter model trained end-to-end on domestic Chinese compute

Weaknesses

Weights aren't published yet; MIT license means nothing without the actual files
All benchmark numbers are vendor-reported; independent third-party confirmation pending
SWE-bench Pro edge over GPT-5.5 is 0.9 points, inside noise margin
Trails GPT-5.5 on FORTE (73.2 vs 77.8) and Claude Opus 4.8 on broader agent benchmarks
Flash-sale token pack structure suggests limited compute capacity at launch
No public rate limits or enterprise SLA documentation
Self-hosting a 1.6T MoE still requires substantial multi-GPU infrastructure even with dynamic activation

Related

SWE-bench coding agent leaderboard - Ranked scores across frontier and open models
Agentic AI benchmarks leaderboard - FORTE, BrowseComp, RWSearch rankings
Open source LLM leaderboard - Open-weight model rankings
Long context benchmarks leaderboard - 1M token capability comparisons
GLM-5.1 model profile - Closest Chinese open-weight competitor
GLM-5.1 trained on Huawei chips - Parallel story on domestic Chinese compute
China AI chip subsidies and self-sufficiency - Policy context behind domestic ASIC investment

Sources

LongCat-2.0 official model page - Architecture specs, benchmark tables, training details
LongCat-2.0 release announcement - Official release post
Novalogiq: Meituan open sources LongCat-2.0 - Technical breakdown with benchmark tables
Decrypt: LongCat-2.0, the stealth AI model that was quietly topping OpenRouter - Owl Alpha backstory and OpenRouter ranking detail
felloai: LongCat-2.0 China's 1.6T open-source coding model - Architecture and pricing comparison
Meituan LongCat on Hugging Face - Organization page (weights pending publication)
CryptoBriefing: Meituan reveals LongCat-2.0, undercuts GPT-5.5 and Claude Sonnet 5 on pricing - Pricing comparison
AI-Market-Watch: Meituan open sources LongCat-2.0 - Domestic compute angle