DeepSeek V4

DeepSeek V4 is an unreleased trillion-parameter MoE model with ~32B active parameters, native multimodal capabilities, a 1M-token context window, and optimization for Huawei Ascend chips - expected in the first week of March 2026.

TL;DR

  • ~1 trillion total parameters with ~32B active per token - a 50% increase in total model size over V3.2 while reducing active parameters from 37B to ~32B
  • Natively multimodal (text, image, video, audio) - a first for DeepSeek's flagship line
  • 1 million token context window powered by Engram Conditional Memory
  • Optimized for Huawei Ascend and Cambricon chips, not Nvidia or AMD
  • Expected open-weight release (MIT or Apache 2.0), with leaked benchmarks suggesting frontier-competitive performance

Note: DeepSeek V4 has not been officially released. All specifications below are based on reporting from the Financial Times, Reuters, CNBC, leaked internal benchmarks, and the V4 Lite leak. This page will be updated with verified data once DeepSeek publishes official specifications.

Overview

DeepSeek V4 is the most anticipated model release of 2026 so far. Confirmed by the Financial Times on February 27, V4 will arrive in the first week of March - timed to coincide with China's annual Two Sessions parliamentary meetings starting March 4. Based on the available reporting, V4 represents a full generational leap from V3.2: a trillion-parameter Mixture-of-Experts model that is natively multimodal, processes up to 1 million tokens of context, and was optimized from the ground up for Chinese hardware rather than Nvidia GPUs.

The architecture builds on V3.2's foundation - Multi-head Latent Attention (MLA) is retained - but adds three new innovations previewed in DeepSeek's January 2026 research papers: Manifold-Constrained Hyper-Connections for training stability at trillion-parameter scale, Engram Conditional Memory for efficient retrieval from million-token contexts, and an enhanced DeepSeek Sparse Attention system with a new Lightning Indexer. The expert routing system scales from V3.2's top-2/top-4 selection to 16 expert pathways per token, drawn from hundreds of available experts per MoE layer.
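Routing details are unconfirmed, but the reported top-16-of-hundreds scheme can be sketched with a toy softmax gate. Everything below - the sizes, the function name `topk_moe_route`, and the plain softmax renormalization - is illustrative, not DeepSeek's actual router design.

```python
import numpy as np

def topk_moe_route(hidden, gate_w, k=16):
    """Toy top-k MoE router: score every expert, keep the top k,
    and renormalize their gate weights with a softmax."""
    logits = hidden @ gate_w                      # one score per expert
    topk = np.argsort(logits)[-k:]                # indices of the k best experts
    weights = np.exp(logits[topk] - logits[topk].max())
    weights /= weights.sum()                      # softmax over selected experts only
    return topk, weights

rng = np.random.default_rng(0)
hidden = rng.standard_normal(64)                  # one token's hidden state
gate_w = rng.standard_normal((64, 256))           # router weights: 256 stand-in experts
experts, weights = topk_moe_route(hidden, gate_w)
print(len(experts), round(float(weights.sum()), 6))   # → 16 1.0
```

The efficiency argument in the TL;DR falls out of this structure: only the 16 selected experts' parameters participate in the forward pass for a given token, so active compute stays near ~32B even as total parameters reach ~1T.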

The hardware story is the geopolitical headline. As we reported, DeepSeek deliberately excluded Nvidia and AMD from the pre-release optimization pipeline, building V4's inference stack around Huawei Ascend and Cambricon chips. This means V4 will run best on Chinese hardware at launch - a first for a frontier AI model and exactly the kind of parallel ecosystem that US export controls aimed to prevent.

Leaked benchmarks suggest V4 is competitive with the current frontier. HumanEval scores around 90% and SWE-bench Verified above 80% would put it in the same tier as Claude Opus 4.6 and GPT-5.3 Codex on coding tasks. The V4 Lite variant that leaked through inference providers under NDA showed breakthrough SVG code generation, with one tester describing it as producing "more optimized code than DeepSeek 3.2, Claude Opus 4.6, and Gemini 3.1." These claims remain unverified.

Key Specifications

| Specification | Details |
|---|---|
| Provider | DeepSeek |
| Model Family | DeepSeek V4 |
| Architecture | Transformer MoE with MLA, mHC, Engram Memory, DSA + Lightning Indexer |
| Total Parameters | ~1T (leaked) |
| Active Parameters | ~32B per token (leaked) |
| Expert Routing | 16 experts active per token (up from V3.2's top-2/top-4) |
| Context Window | 1,000,000 tokens |
| Input Price | ~$0.14/M tokens (estimated) |
| Output Price | ~$0.28/M tokens (estimated) |
| Release Date | Expected March 3-7, 2026 |
| License | Expected MIT or Apache 2.0 |
| Input Modalities | Text, Image, Video, Audio (native) |
| Output Modality | Text |
| Hardware Optimization | Huawei Ascend, Cambricon (primary); Nvidia (secondary/later) |
| Model ID | TBD |

Benchmark Performance (Leaked/Estimated)

| Benchmark | DeepSeek V4 (leaked) | DeepSeek V3.2 | Claude Opus 4.6 | GPT-5.3 Codex |
|---|---|---|---|---|
| HumanEval (code generation) | ~90% | - | 88% | 93% |
| SWE-bench Verified (GitHub issues) | 80%+ | 73.1% | 80.8% | 80.0% |
| MMLU-Pro (knowledge/reasoning) | TBD | 85.0 | 85.8 | 82.9 |
| GPQA Diamond (PhD-level science) | TBD | 82.4 | 91.3 | 73.8 |
| AIME 2025 (competition math) | TBD | 93.1 | 87.2 | 94% |
| Codeforces (competitive programming) | TBD | 2386 | 2100 | - |
| BrowseComp (web research) | TBD | 51.4-67.6 | 84.0 | 77.9 |

These leaked numbers come from internal testing and code repository analysis, not independent verification. DeepSeek plans to publish a technical note at launch, with a comprehensive engineering report to follow roughly a month later. Until those reports are public, treat all V4 benchmark claims as unverified.

The two data points we have - HumanEval ~90% and SWE-bench Verified 80%+ - suggest V4 closes the coding gap that was V3.2's main weakness against the proprietary frontier. V3.2 scored 73.1% on SWE-bench Verified, trailing Claude Opus 4.6 by nearly 8 points. If V4 genuinely hits 80%+, that gap disappears.

Key Capabilities

Native Multimodality. V4 is described as multimodal from the ground up - trained on text, image, video, and audio data from the start rather than bolting vision and audio onto a text-only base model. This is a fundamental architectural departure from DeepSeek's previous approach, where multimodal capabilities were handled by separate models like DeepSeek-VL. How V4's native multimodal pipeline compares to Kimi K2.5's MoonViT-3D or Gemini 3.1 Pro's remains to be seen.

Manifold-Constrained Hyper-Connections (mHC). Training stability at trillion-parameter scale is notoriously difficult. DeepSeek's mHC system, detailed in a January 2026 paper, provides a theoretical framework for stable optimization across the full parameter space. If it works as described, it addresses a problem that has historically required expensive trial-and-error on training runs at this scale.

Engram Conditional Memory. The 1M-token context window is not just a number - DeepSeek published a paper on January 13 describing Engram Conditional Memory, a system for efficient retrieval from extremely long contexts. DeepSeek silently expanded the context window on existing API models from 128K to 1M on February 11, which was widely interpreted as infrastructure preparation for V4.

DeepSeek Sparse Attention with Lightning Indexer. Building on V3.2's DSA, the Lightning Indexer adds a fast preprocessing step for million-token context processing. This was first previewed in V3.2-Exp and appears to be fully productionized in V4.
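DeepSeek has not published the Lightning Indexer's design, but the general pattern it is reported to follow - a cheap scorer preselects a small subset of keys before exact attention runs - can be sketched as a toy. The function name `indexed_sparse_attention`, the 8-dimensional index projection, and all sizes here are assumptions for illustration only.

```python
import numpy as np

def indexed_sparse_attention(q, K, V, idx_K, keep=8):
    """Toy two-stage sparse attention: a lightweight index scorer picks
    `keep` candidate keys, then exact attention runs only on that subset."""
    # Stage 1: cheap index scores using a low-dimensional slice of q and K
    index_scores = q[:8] @ idx_K.T                # compare only the first 8 dims
    selected = np.argsort(index_scores)[-keep:]   # keep the top-scoring positions
    # Stage 2: full softmax attention over the selected keys only
    scores = q @ K[selected].T / np.sqrt(q.shape[0])
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return probs @ V[selected]

rng = np.random.default_rng(0)
seq, dim = 1024, 64                               # stand-in for a long context
q = rng.standard_normal(dim)
K = rng.standard_normal((seq, dim))
V = rng.standard_normal((seq, dim))
idx_K = K[:, :8]                                  # the indexer sees a cheap slice of each key
out = indexed_sparse_attention(q, K, V, idx_K)
print(out.shape)                                  # → (64,)
```

The point of the two-stage split is cost: the exact attention step touches only `keep` keys instead of all 1,024, which is what makes million-token contexts tractable in principle.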

Chinese Hardware Optimization. V4's inference stack is built around Huawei Ascend and Cambricon chips. For the open-source community running V4 on Nvidia GPUs, performance may be suboptimal at launch. This is an unprecedented hardware bet for a frontier model - DeepSeek is building an inference ecosystem independent of American silicon.

Pricing and Availability

No official pricing has been announced. One analysis estimates input tokens at ~$0.14 per million and output tokens at ~$0.28 per million. For context, V3.2 charges $0.28 input (cache miss) and $0.42 output - so V4 could be roughly 50% cheaper on input and 33% cheaper on output despite being a significantly larger and more capable model.

| Model | Input | Output |
|---|---|---|
| DeepSeek V4 (estimated) | ~$0.14/M | ~$0.28/M |
| DeepSeek V3.2 | $0.28/M | $0.42/M |
| Kimi K2.5 | $0.60/M | $3.00/M |
| Gemini 3.1 Pro | $2.00/M | $12.00/M |
| Claude Opus 4.6 | $5.00/M | $25.00/M |

If these estimates hold, V4 would be 36x cheaper than Claude Opus 4.6 on input and 89x cheaper on output - while potentially matching it on coding benchmarks. The economics of that gap are difficult to overstate for production deployments. See our cost efficiency leaderboard for broader comparisons.
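The per-workload impact of these (unofficial, estimated) prices is simple arithmetic; the sketch below just multiplies token volumes by the per-million rates from the table above.

```python
# Prices in USD per million tokens (input, output). V4 figures are
# unofficial estimates, not published pricing.
prices = {
    "DeepSeek V4 (est.)": (0.14, 0.28),
    "DeepSeek V3.2":      (0.28, 0.42),
    "Claude Opus 4.6":    (5.00, 25.00),
}

def job_cost(model, input_mtok, output_mtok):
    """Cost of a workload given input/output volume in millions of tokens."""
    inp, outp = prices[model]
    return inp * input_mtok + outp * output_mtok

# Example workload: 100M input tokens, 20M output tokens
for model in prices:
    print(f"{model}: ${job_cost(model, 100, 20):,.2f}")
```

At those rates, the example workload costs $19.60 on estimated V4 pricing versus $1,000.00 on Claude Opus 4.6 - the same ~36x/89x input/output gap noted above, expressed per job.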

DeepSeek has open-sourced every major model release under permissive licenses. V3.2 used MIT. Multiple sources expect V4 to follow the same pattern, with some suggesting a move to Apache 2.0. Either way, model weights are expected to be publicly available.

Strengths (Expected)

  • Trillion-parameter scale with only ~32B active parameters - efficient MoE inference economics
  • Native multimodal eliminates the need for separate vision/audio models
  • 1M-token context window with purpose-built retrieval (Engram Conditional Memory)
  • Leaked SWE-bench 80%+ would close V3.2's biggest gap against the proprietary frontier
  • Expected to be the cheapest frontier-class model, continuing DeepSeek's cost leadership
  • Open-weight release makes it the most capable freely available model if performance claims hold
  • Three new architectural innovations (mHC, Engram Memory, Lightning Indexer) that advance the state of MoE design

Weaknesses (Expected/Potential)

  • Huawei Ascend optimization means Nvidia GPU users may see suboptimal performance at launch
  • All benchmark claims are unverified - leaked numbers are not independently confirmed
  • Agentic tool-use performance (V3.2's weakest area) has no leaked data
  • Trillion-parameter model requires massive GPU infrastructure to self-host
  • No official pricing, timeline, or model card - everything could change at launch
  • Chinese company origin may pose compliance concerns for some enterprise deployments
  • V3.2's weakness on BrowseComp (51.4-67.6%) and agentic tasks may persist if the architecture improvements focus on reasoning and coding

About the author

James is an AI benchmarks and tools analyst - a software engineer turned tech writer who spent six years building backend systems at a fintech startup in Chicago before pivoting to full-time analysis of AI tools and infrastructure.