DeepSeek V4 Hits Frontier Benchmarks at One Tenth the Price
DeepSeek releases V4 Flash and V4 Pro with frontier-class benchmarks at prices 7-9x below OpenAI and Anthropic, built entirely on Huawei Ascend chips.

DeepSeek dropped two models on Thursday that should make OpenAI and Anthropic's finance teams uncomfortable. V4-Pro charges $3.48 per million output tokens. OpenAI charges $30. Anthropic charges $25. The performance gap, on at least one widely-used benchmark, is 0.2 points.
TL;DR
- DeepSeek releases V4-Pro (1.6T parameters) and V4-Flash (284B parameters) on April 24, 2026, both with 1M token context and MIT licensing
- V4-Pro prices at $3.48/M output tokens vs. $30 for OpenAI and $25 for Anthropic - an 86-88% cost discount for near-identical coding benchmark performance
- V4-Flash at $0.28/M output tokens is the cheapest model in its tier, undercutting even OpenAI's lightest offerings
- Both models were trained on Huawei Ascend 950 chips (manufactured by SMIC), a direct challenge to the logic behind US export controls
- SMIC shares jumped 10% on the news; Chinese AI competitors MiniMax and Knowledge Atlas fell 9%
The Benchmark Picture
The numbers that landed on Thursday tell a story US labs have spent two years trying to avoid hearing.
| Model | SWE-bench Verified | Output ($/M tokens) | Context | Open source |
|---|---|---|---|---|
| DeepSeek V4-Pro | 80.6% | $3.48 | 1M | Yes (MIT) |
| Claude Opus 4.6 | 80.8% | $25.00 | 200K | No |
| GPT-5.4 | ~83% | $30.00 | 128K | No |
| Gemini 3.1-Pro | ~84% | $18.00 | 2M | No |
| DeepSeek V4-Flash | n/a | $0.28 | 1M | Yes (MIT) |
| GPT-5.2 | ~74% | $15.00 | 128K | No |
DeepSeek's own technical report is candid about where V4-Pro falls short: it "trails state-of-the-art frontier models by approximately 3 to 6 months" on world knowledge tasks, and "falls marginally short of GPT-5.4 and Gemini-3.1-Pro" on the highest-order reasoning tests. But on coding - where enterprise demand is concentrated - Claude Opus 4.6 holds a 0.2-point edge on SWE-bench Verified over a model that costs 86% less. For any buyer running cost-sensitive workloads, that premium is difficult to justify.
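The discount arithmetic is easy to verify from the list prices above. A minimal sketch, using only the per-million-token output prices reported in the table:

```python
# Output prices per million tokens, as reported in the comparison table.
prices = {
    "Claude Opus 4.6": 25.00,
    "GPT-5.4": 30.00,
}
V4_PRO = 3.48  # DeepSeek V4-Pro output price

for model, price in prices.items():
    discount = (1 - V4_PRO / price) * 100
    print(f"V4-Pro vs {model}: {discount:.1f}% cheaper")
# V4-Pro vs Claude Opus 4.6: 86.1% cheaper
# V4-Pro vs GPT-5.4: 88.4% cheaper
```

The 86% figure in the coding comparison is measured against Anthropic's price; the 88% headline number is against OpenAI's.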
The Flash tier is starker. At $0.14 input and $0.28 output per million tokens, V4-Flash undercuts every comparable model in the market. DeepSeek's own documentation describes it as offering "similar reasoning to Pro but with faster response times" - a positioning that puts it directly against GPT-5.5 Nano and Gemini 3.1 Flash at a fraction of the listed price.
The Two Models
V4-Pro: A Trillion-Parameter Bet on Price
V4-Pro carries 1.6 trillion total parameters with 49 billion active at inference - a significant step up from DeepSeek V3.2's 685 billion total. The mixture-of-experts architecture means actual compute cost per query remains manageable despite the headline parameter count.
Efficiency gains are the architectural story here. Compared to V3.2 running over a one-million-token context, V4-Pro uses only 27% of the single-token FLOPs and 10% of the KV cache footprint. That is not a rounding error. It reflects a deliberate architectural rebuild aimed at reducing per-query costs while expanding context length - the operational profile of an enterprise API product, not a research demo.
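The scale of the MoE savings is visible in the parameter counts alone. A quick sketch using the figures DeepSeek disclosed (total and active parameters, plus the reported efficiency ratios vs. V3.2):

```python
# Parameter counts from the V4-Pro release, in billions.
total_params = 1600   # 1.6T total
active_params = 49    # active at inference per token

# Only a small slice of the model participates in any single forward pass.
active_fraction = active_params / total_params
print(f"Active fraction: {active_fraction:.1%}")  # ≈ 3.1%

# Reported efficiency vs. V3.2 at 1M-token context.
flops_ratio = 0.27    # fraction of V3.2's single-token FLOPs
kv_ratio = 0.10       # fraction of V3.2's KV cache footprint
print(f"FLOPs cut: {1 - flops_ratio:.0%}, KV cache cut: {1 - kv_ratio:.0%}")
```

Roughly 3% of the weights are active per token, which is why a 1.6-trillion-parameter model can price like a much smaller one.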
Both models carry MIT licenses. The weights are available today.
DeepSeek's V4 models are available via API and open-source weights - no closed license required.
V4-Flash: Cheap and Fast
V4-Flash, at 284 billion total and 13 billion active parameters, is engineered for latency-sensitive applications. At $0.28/M output tokens, it undercuts Kimi K2.5 and every Flash-tier model currently available from a US lab.
DeepSeek describes V4-Flash as reaching only 10% of V3.2's single-token FLOPs and 7% of the KV cache at 1M context - extraordinary efficiency numbers that, if they hold under independent evaluation, suggest the inference cost curve is moving faster than most analysts expected.
The Huawei Variable
This is where the geopolitical story gets uncomfortable for Washington. DeepSeek confirmed V4 was trained on Huawei's Ascend 950 processors, manufactured by SMIC using domestic Chinese fab technology. Huawei announced "full support" for the V4 series, framing the launch as validation of the Ascend ecosystem.
US export controls on Nvidia H100, H200, and Blackwell chips were designed exactly to slow this trajectory. The controls accelerated it instead. Facing restricted access to leading-edge silicon, Chinese developers were forced to optimize for whatever hardware they could access. The result is a model stack that hits frontier-class coding performance at a fraction of the inference cost, running on domestically manufactured chips.
SMIC shares jumped 10% on the news. The market read was clear: V4's release is evidence that the Chinese semiconductor supply chain is producing commercially viable AI infrastructure.
"China has effectively closed the performance gap with US rivals," according to Stanford's AI Index 2026, published this month. "America still produces more top-tier models and higher-impact patents, but the margin has compressed materially."
DeepSeek also disclosed that it is in discussions with Tencent and Alibaba about a funding round that would value the company at $20 billion. The timing isn't incidental - a landmark model release creates negotiating leverage.
SMIC shares jumped 10% on the day DeepSeek V4 launched, reflecting market confidence in China's domestic chip ecosystem.
The Counter-Argument
V4's limitations are real and worth pricing in.
No multimodal. Both V4-Pro and V4-Flash are text-only at launch. OpenAI's and Anthropic's premium tiers handle images, documents, audio, and video. For enterprise buyers who have built workflows around multimodal capabilities, V4 is not yet a drop-in replacement.
Knowledge benchmarks lag. DeepSeek's own disclosure is that V4-Pro trails GPT-5.4 and Gemini 3.1-Pro on factual world-knowledge tasks - the kind of retrieval and synthesis that powers legal, medical, and financial workloads. The 3-to-6-month estimate is the company's own framing and should be read as optimistic.
The distillation allegations remain unresolved. US officials have publicly accused DeepSeek of illegally distilling American frontier models - essentially using outputs from closed systems to train open ones. China's foreign ministry called the accusations "groundless." The legal and reputational questions aren't settled, and enterprise procurement teams in regulated industries will have to factor them in.
API availability is constrained. DeepSeek's direct API has historically struggled with demand surges. Third-party hosting via OpenRouter and other aggregators distributes load but introduces latency and pricing variability.
What the Market Is Missing
The framing of this release as a "Chinese model catching up to US labs" understates the structural pressure it creates. This isn't about whether V4-Pro can replace GPT-5.4 for every use case today. It cannot. The question is what the existence of a MIT-licensed, 80%-SWE-bench model at $3.48/M output tokens does to the pricing expectations of every enterprise buyer negotiating with OpenAI and Anthropic right now.
The answer is: it sets a floor. When a credible open-source alternative exists at 88% below list price for the same benchmark category, the premium for closed models has to be justified by capability gaps that users can actually measure. On coding - the category where AI ROI is clearest and most quantifiable - V4-Pro has made that justification materially harder.
GPT-5.5 launched the day before V4's preview. The timing, almost certainly deliberate, didn't blunt the story. DeepSeek's release landed in the same 24-hour news cycle and sent SMIC's stock up ten percent. The market is telling you what it thinks about who won the week.
Sources:
- DeepSeek V4 Preview Release - DeepSeek API Docs
- DeepSeek previews new AI model that "closes the gap" with frontier models - TechCrunch
- China's DeepSeek unveils latest models a year after upending global tech - Al Jazeera
- DeepSeek V4 - Almost on the frontier, a fraction of the price - Simon Willison
- DeepSeek V4-Flash - Hugging Face
- DeepSeek V4 Pro Costs 98% Less Than GPT 5.5 Pro - Yahoo Tech
- DeepSeek V4 launches on Huawei chips after OpenAI GPT-5.5 - Yicai Global
