DeepSeek V4

DeepSeek V4 is an unreleased trillion-parameter MoE model with ~32B active parameters, native multimodal capabilities, a 1M-token context window, and optimization for Huawei Ascend chips - expected in the first week of March 2026.

TL;DR

  • ~1 trillion total parameters with ~32B active per token - a 50% increase in total model size over V3.2 while reducing active parameters from 37B to ~32B
  • Natively multimodal (text, image, video, audio) - a first for DeepSeek's flagship line
  • 1 million token context window powered by Engram Conditional Memory
  • Optimized for Huawei Ascend and Cambricon chips, not Nvidia or AMD
  • Expected open-weight release (MIT or Apache 2.0), with leaked benchmarks suggesting frontier-competitive performance

Note: DeepSeek V4 has not been officially released. All specifications below are based on reporting from the Financial Times, Reuters, CNBC, leaked internal benchmarks, and the V4 Lite leak. This page will be updated with verified data once DeepSeek publishes official specifications.

Overview

DeepSeek V4 is the most anticipated model release of 2026 so far. Confirmed by the Financial Times on February 27, V4 will arrive in the first week of March - timed to coincide with China's annual Two Sessions parliamentary meetings starting March 4. Based on the available reporting, V4 represents a full generational leap from V3.2: a trillion-parameter Mixture-of-Experts model that is natively multimodal, processes up to 1 million tokens of context, and was optimized from the ground up for Chinese hardware rather than Nvidia GPUs.

The architecture builds on V3.2's foundation - Multi-head Latent Attention (MLA) is retained - but adds three new innovations previewed in DeepSeek's January 2026 research papers: Manifold-Constrained Hyper-Connections for training stability at trillion-parameter scale, Engram Conditional Memory for efficient retrieval from million-token contexts, and an enhanced DeepSeek Sparse Attention system with a new Lightning Indexer. The expert routing system scales from V3.2's top-2/top-4 selection to 16 expert pathways per token, drawn from hundreds of available experts per MoE layer.
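Routing details are unconfirmed, but the reported top-16-of-hundreds scheme can be sketched with a toy softmax gate. Everything below - the sizes, the function name `topk_moe_route`, and the plain softmax renormalization - is illustrative, not DeepSeek's actual router design.

```python
import numpy as np

def topk_moe_route(hidden, gate_w, k=16):
    """Toy top-k MoE router: score every expert, keep the top k,
    and renormalize their gate weights with a softmax."""
    logits = hidden @ gate_w                      # one score per expert
    topk = np.argsort(logits)[-k:]                # indices of the k best experts
    weights = np.exp(logits[topk] - logits[topk].max())
    weights /= weights.sum()                      # softmax over selected experts only
    return topk, weights

rng = np.random.default_rng(0)
hidden = rng.standard_normal(64)                  # one token's hidden state
gate_w = rng.standard_normal((64, 256))           # router weights: 256 stand-in experts
experts, weights = topk_moe_route(hidden, gate_w)
print(len(experts), round(float(weights.sum()), 6))   # → 16 1.0
```

The efficiency argument in the TL;DR falls out of this structure: only the 16 selected experts' parameters participate in the forward pass for a given token, so active compute stays near ~32B even as total parameters reach ~1T.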

The hardware story is the geopolitical headline. As we reported, DeepSeek deliberately excluded Nvidia and AMD from the pre-release optimization pipeline, building V4's inference stack around Huawei Ascend and Cambricon chips. This means V4 will run best on Chinese hardware at launch - a first for a frontier AI model and exactly the kind of parallel ecosystem that US export controls aimed to prevent.

Leaked benchmarks suggest V4 is competitive with the current frontier. HumanEval scores around 90% and SWE-bench Verified above 80% would put it in the same tier as Claude Opus 4.6 and GPT-5.3 Codex on coding tasks. The V4 Lite variant that leaked through inference providers under NDA showed breakthrough SVG code generation, with one tester describing it as producing "more optimized code than DeepSeek 3.2, Claude Opus 4.6, and Gemini 3.1." These claims remain unverified.

Key Specifications

| Specification | Details |
|---|---|
| Provider | DeepSeek |
| Model Family | DeepSeek V4 |
| Architecture | Transformer MoE with MLA, mHC, Engram Memory, DSA + Lightning Indexer |
| Total Parameters | ~1T (leaked) |
| Active Parameters | ~32B per token (leaked) |
| Expert Routing | 16 experts active per token (up from V3.2's top-2/top-4) |
| Context Window | 1,000,000 tokens |
| Input Price | ~$0.14/M tokens (estimated) |
| Output Price | ~$0.28/M tokens (estimated) |
| Release Date | Expected March 3-7, 2026 |
| License | Expected MIT or Apache 2.0 |
| Input Modalities | Text, Image, Video, Audio (native) |
| Output Modality | Text |
| Hardware Optimization | Huawei Ascend, Cambricon (primary); Nvidia (secondary/later) |
| Model ID | TBD |

Benchmark Performance (Leaked/Estimated)

| Benchmark | DeepSeek V4 (leaked) | DeepSeek V3.2 | Claude Opus 4.6 | GPT-5.3 Codex |
|---|---|---|---|---|
| HumanEval (code generation) | ~90% | - | 88% | 93% |
| SWE-bench Verified (GitHub issues) | 80%+ | 73.1% | 80.8% | 80.0% |
| MMLU-Pro (knowledge/reasoning) | TBD | 85.0 | 85.8 | 82.9 |
| GPQA Diamond (PhD-level science) | TBD | 82.4 | 91.3 | 73.8 |
| AIME 2025 (competition math) | TBD | 93.1 | 87.2 | 94% |
| Codeforces (competitive programming) | TBD | 2386 | 2100 | - |
| BrowseComp (web research) | TBD | 51.4-67.6 | 84.0 | 77.9 |

These leaked numbers come from internal testing and code repository analysis, not independent verification. DeepSeek plans to publish a technical note at launch, with a comprehensive engineering report to follow roughly a month later. Until those reports are public, treat all V4 benchmark claims as unverified.

The two data points we have - HumanEval ~90% and SWE-bench Verified 80%+ - suggest V4 closes the coding gap that was V3.2's main weakness against the proprietary frontier. V3.2 scored 73.1% on SWE-bench Verified, trailing Claude Opus 4.6 by nearly 8 points. If V4 genuinely hits 80%+, that gap disappears.

Key Capabilities

Native Multimodality. V4 is described as multimodal from the ground up - trained on text, image, video, and audio data from the start rather than bolting vision and audio onto a text-only base model. This is a fundamental architectural departure from DeepSeek's previous approach, where multimodal capabilities were handled by separate models like DeepSeek-VL. How V4's native multimodal pipeline compares to Kimi K2.5's MoonViT-3D or Gemini 3.1 Pro's remains to be seen.

Manifold-Constrained Hyper-Connections (mHC). Training stability at trillion-parameter scale is notoriously difficult. DeepSeek's mHC system, detailed in a January 2026 paper, provides a theoretical framework for stable optimization across the full parameter space. If it works as described, it addresses a problem that has historically required expensive trial-and-error on training runs at this scale.

Engram Conditional Memory. The 1M-token context window is not just a number - DeepSeek published a paper on January 13 describing Engram Conditional Memory, a system for efficient retrieval from extremely long contexts. DeepSeek silently expanded the context window on existing API models from 128K to 1M on February 11, which was widely interpreted as infrastructure preparation for V4.

DeepSeek Sparse Attention with Lightning Indexer. Building on V3.2's DSA, the Lightning Indexer adds a fast preprocessing step for million-token context processing. This was first previewed in V3.2-Exp and appears to be fully productionized in V4.
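DeepSeek has not published the Lightning Indexer's design, but the general pattern it is reported to follow - a cheap scorer preselects a small subset of keys before exact attention runs - can be sketched as a toy. The function name `indexed_sparse_attention`, the 8-dimensional index projection, and all sizes here are assumptions for illustration only.

```python
import numpy as np

def indexed_sparse_attention(q, K, V, idx_K, keep=8):
    """Toy two-stage sparse attention: a lightweight index scorer picks
    `keep` candidate keys, then exact attention runs only on that subset."""
    # Stage 1: cheap index scores using a low-dimensional slice of q and K
    index_scores = q[:8] @ idx_K.T                # compare only the first 8 dims
    selected = np.argsort(index_scores)[-keep:]   # keep the top-scoring positions
    # Stage 2: full softmax attention over the selected keys only
    scores = q @ K[selected].T / np.sqrt(q.shape[0])
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return probs @ V[selected]

rng = np.random.default_rng(0)
seq, dim = 1024, 64                               # stand-in for a long context
q = rng.standard_normal(dim)
K = rng.standard_normal((seq, dim))
V = rng.standard_normal((seq, dim))
idx_K = K[:, :8]                                  # the indexer sees a cheap slice of each key
out = indexed_sparse_attention(q, K, V, idx_K)
print(out.shape)                                  # → (64,)
```

The point of the two-stage split is cost: the exact attention step touches only `keep` keys instead of all 1,024, which is what makes million-token contexts tractable in principle.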

Chinese Hardware Optimization. V4's inference stack is built around Huawei Ascend and Cambricon chips. For the open-source community running V4 on Nvidia GPUs, performance may be suboptimal at launch. This is an unprecedented hardware bet for a frontier model - DeepSeek is building an inference ecosystem independent of American silicon.

Pricing and Availability

No official pricing has been announced. One analysis estimates input tokens at ~$0.14 per million and output tokens at ~$0.28 per million. For context, V3.2 charges $0.28 input (cache miss) and $0.42 output - so V4 could be roughly 50% cheaper on input and 33% cheaper on output despite being a significantly larger and more capable model.

| Model | Input | Output |
|---|---|---|
| DeepSeek V4 (estimated) | ~$0.14/M | ~$0.28/M |
| DeepSeek V3.2 | $0.28/M | $0.42/M |
| Kimi K2.5 | $0.60/M | $3.00/M |
| Gemini 3.1 Pro | $2.00/M | $12.00/M |
| Claude Opus 4.6 | $5.00/M | $25.00/M |

If these estimates hold, V4 would be 36x cheaper than Claude Opus 4.6 on input and 89x cheaper on output - while potentially matching it on coding benchmarks. The economics of that gap are difficult to overstate for production deployments. See our cost efficiency leaderboard for broader comparisons.
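The per-workload impact of these (unofficial, estimated) prices is simple arithmetic; the sketch below just multiplies token volumes by the per-million rates from the table above.

```python
# Prices in USD per million tokens (input, output). V4 figures are
# unofficial estimates, not published pricing.
prices = {
    "DeepSeek V4 (est.)": (0.14, 0.28),
    "DeepSeek V3.2":      (0.28, 0.42),
    "Claude Opus 4.6":    (5.00, 25.00),
}

def job_cost(model, input_mtok, output_mtok):
    """Cost of a workload given input/output volume in millions of tokens."""
    inp, outp = prices[model]
    return inp * input_mtok + outp * output_mtok

# Example workload: 100M input tokens, 20M output tokens
for model in prices:
    print(f"{model}: ${job_cost(model, 100, 20):,.2f}")
```

At those rates, the example workload costs $19.60 on estimated V4 pricing versus $1,000.00 on Claude Opus 4.6 - the same ~36x/89x input/output gap noted above, expressed per job.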

DeepSeek has open-sourced every major model release under permissive licenses. V3.2 used MIT. Multiple sources expect V4 to follow the same pattern, with some suggesting a move to Apache 2.0. Either way, model weights are expected to be publicly available.

Strengths (Expected)

  • Trillion-parameter scale with only ~32B active parameters - efficient MoE inference economics
  • Native multimodal eliminates the need for separate vision/audio models
  • 1M-token context window with purpose-built retrieval (Engram Conditional Memory)
  • Leaked SWE-bench 80%+ would close V3.2's biggest gap against the proprietary frontier
  • Expected to be the cheapest frontier-class model, continuing DeepSeek's cost leadership
  • Open-weight release makes it the most capable freely available model if performance claims hold
  • Three new architectural innovations (mHC, Engram Memory, Lightning Indexer) that advance the state of MoE design

Weaknesses (Expected/Potential)

  • Huawei Ascend optimization means Nvidia GPU users may see suboptimal performance at launch
  • All benchmark claims are unverified - leaked numbers are not independently confirmed
  • Agentic tool-use performance (V3.2's weakest area) has no leaked data
  • Trillion-parameter model requires massive GPU infrastructure to self-host
  • No official pricing, timeline, or model card - everything could change at launch
  • Chinese company origin may pose compliance concerns for some enterprise deployments
  • V3.2's weakness on BrowseComp (51.4-67.6%) and agentic tasks may persist if the architecture improvements focus on reasoning and coding

About the author

James is an AI benchmarks and tools analyst - a software engineer turned tech writer who spent six years building backend systems at a fintech startup in Chicago before pivoting to full-time analysis of AI tools and infrastructure.