DeepSeek V3.2 vs V4 - What Changes With a Trillion Parameters
A pre-release comparison of DeepSeek V3.2 and V4 - examining the generational leap from 671B text-only to a trillion-parameter natively multimodal model with 1M context.

DeepSeek V3.2 is the model that broke the pricing floor in late 2025 - 671 billion parameters, 37 billion active per token, MIT licensed, and API pricing so aggressive it forced every competitor to rethink their economics. Now DeepSeek V4 is arriving in the first week of March with roughly 1 trillion parameters, native multimodal input, and a 1 million token context window.
This is not a point release. V4 is a generational leap - new architecture, new modalities, new hardware target, and potentially even cheaper pricing. Every major dimension of the model has changed.
Note: V4 has not been officially released. This comparison uses leaked benchmarks, reporting from FT/Reuters/CNBC, and the V4 Lite leak. We will update this article with verified data after DeepSeek publishes official specifications.
TL;DR
- V3.2 is the known quantity - fully released, benchmarked, MIT licensed, and the cheapest frontier API available. If you need a production model today, V3.2 is shipping.
- V4 is the upgrade worth waiting a few days for - if the leaked benchmarks hold, it closes V3.2's biggest gaps (SWE-bench, multimodal input, context length) while potentially being even cheaper.
Quick Comparison
| Feature | DeepSeek V3.2 | DeepSeek V4 (pre-release) |
|---|---|---|
| Status | Released (Dec 2025) | Expected March 3-7, 2026 |
| Architecture | MoE + MLA + DSA | MoE + MLA + mHC + Engram Memory + DSA/Lightning |
| Total Parameters | 671B | ~1T |
| Active Parameters | 37B | ~32B |
| Expert Routing | Top-2/top-4 | 16 experts per token |
| Context Window | 128K | 1M |
| Input Modalities | Text only | Text, Image, Video, Audio |
| API Pricing (Input) | $0.28/M (cache miss) | ~$0.14/M (estimated) |
| API Pricing (Output) | $0.42/M | ~$0.28/M (estimated) |
| License | MIT | Expected MIT or Apache 2.0 |
| Hardware | Nvidia primary | Huawei Ascend primary |
| SWE-bench Verified | 73.1% | 80%+ (leaked) |
| HumanEval | - | ~90% (leaked) |
| AIME 2025 | 93.1 | TBD |
| Codeforces | 2386 | TBD |
| MMLU-Pro | 85.0 | TBD |
| BrowseComp | 51.4-67.6% | TBD |
V3.2: The Current Champion
V3.2 landed in September 2025 (experimental) and December 2025 (official release), and it reshaped the industry's assumptions about what open-weight models could cost. At $0.28 per million input tokens on cache miss - dropping to $0.028 on cache hit - it is the cheapest frontier-quality API in the world. The automatic context caching alone makes it the obvious choice for any workload with repeated system prompts or overlapping context.
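To see why the caching matters, here is a back-of-the-envelope sketch of the blended input price at different cache-hit rates. The prices are V3.2's published rates from the table above; the 80% hit rate is a hypothetical workload parameter, not a measured figure.

```python
# Blended V3.2 input cost per million tokens, mixing cache hits and misses.
# Prices are DeepSeek's published V3.2 rates; the hit rate is illustrative.
CACHE_MISS_PRICE = 0.28   # $/M input tokens on cache miss
CACHE_HIT_PRICE = 0.028   # $/M input tokens on cache hit

def effective_input_price(hit_rate: float) -> float:
    """Blended $/M input tokens for a given cache-hit fraction (0..1)."""
    return hit_rate * CACHE_HIT_PRICE + (1 - hit_rate) * CACHE_MISS_PRICE

# An agent loop that resends a large system prompt on every call might
# hit the cache ~80% of the time, cutting the effective rate sharply.
print(f"${effective_input_price(0.8):.4f}/M")
```

At an 80% hit rate the blended price lands around $0.078/M - roughly a quarter of the headline cache-miss rate, which is where the caching advantage compounds for prompt-heavy workloads.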
The benchmark profile is strong but has clear gaps. On competition math (AIME 93.1, Codeforces 2386) and coding evaluation (LiveCodeBench 83.3), V3.2 competes with or beats proprietary models costing 10-30x more. On SWE-bench Verified (73.1%), it trails Claude Opus 4.6 by nearly 8 points - a meaningful gap for production code repair. On agentic tasks like BrowseComp (51.4-67.6%), the gap to the proprietary frontier is even larger. For our full assessment, see the V3.2 review.
V3.2 is text-only. No image input, no video, no audio. If your workload involves any visual understanding, V3.2 is out of the running and you need a separate model - DeepSeek-VL, Kimi K2.5, or one of the proprietary options.
V4: What the Leaks Tell Us
V4 changes nearly everything except the core philosophy of maximizing capability per dollar.
The architecture scales from 671B to approximately 1 trillion total parameters while actually reducing active parameters from 37B to ~32B per token. This means per-token inference costs should be comparable to or lower than V3.2's, despite the model being roughly 50% larger overall. The expert routing system expands from V3.2's top-2/top-4 selection to 16 expert pathways per token, drawn from hundreds of available experts per MoE layer.
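The routing mechanism both generations use is top-k gating: score every expert in a layer, keep the k best, and weight their outputs. The sketch below shows that generic mechanism with V4's reported k=16 over a toy pool of 256 experts - the dimensions and softmax gate are illustrative assumptions, not DeepSeek's actual implementation.

```python
import numpy as np

# Generic top-k MoE routing - the mechanism behind V3.2's top-2/top-4
# selection and V4's reported 16-of-N selection. Sizes are toy values.
def route_token(hidden: np.ndarray, gate_weights: np.ndarray, k: int = 16):
    """Pick the k highest-scoring experts for one token; return
    (expert indices, routing weights that sum to 1)."""
    scores = hidden @ gate_weights                 # one score per expert
    topk = np.argsort(scores)[-k:]                 # indices of the k best experts
    w = np.exp(scores[topk] - scores[topk].max())  # softmax over the selected k
    return topk, w / w.sum()

rng = np.random.default_rng(0)
hidden = rng.standard_normal(64)                   # token hidden state (toy size)
gate = rng.standard_normal((64, 256))              # 256 experts in this layer
experts, weights = route_token(hidden, gate, k=16)
print(len(experts))                                # 16 experts activated per token
```

Only the selected experts' parameters are touched per token, which is why a ~1T-parameter model can keep active parameters - and therefore per-token compute - near or below V3.2's level.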
According to the leaks, three new architectural innovations address V3.2's limitations:
- Manifold-Constrained Hyper-Connections (mHC) solves the training stability challenges that historically made trillion-parameter models unreliable to optimize
- Engram Conditional Memory provides the retrieval mechanism for the 8x expansion from 128K to 1M tokens of context
- DeepSeek Sparse Attention with Lightning Indexer extends V3.2's DSA to handle million-token sequences efficiently
The multimodal capability is the most significant functional change. V3.2 is text-only. V4 is described as natively multimodal - trained from the start on text, image, video, and audio data. This means V4 can replace both V3.2 and DeepSeek-VL in a single model, simplifying infrastructure for teams that currently run separate text and vision models.
The Coding Gap May Close
V3.2's most cited weakness is its SWE-bench Verified score of 73.1%, which trails Claude Opus 4.6 (80.8%) and GPT-5.3 Codex (80.0%) by 7-8 points. For teams building automated code repair pipelines, that gap matters.
The leaked V4 benchmarks suggest SWE-bench Verified above 80%. If that holds, V4 would match the proprietary frontier on the single most important real-world coding benchmark - while costing a fraction of the price. HumanEval at ~90% corroborates this picture.
The V4 Lite leak provides additional evidence. The smaller variant demonstrated breakthrough SVG code generation and was described by one inference provider as producing "more optimized code than DeepSeek 3.2, Claude Opus 4.6, and Gemini 3.1." If the smaller variant already surpasses V3.2 on code quality, the full V4 should be a significant step up.
Benchmark Comparison
| Benchmark | DeepSeek V3.2 | DeepSeek V4 (leaked) | Delta |
|---|---|---|---|
| SWE-bench Verified | 73.1% | 80%+ | V4 +7 points or more |
| HumanEval | - | ~90% | V4 (no V3.2 baseline) |
| AIME 2025 | 93.1 | TBD | - |
| MMLU-Pro | 85.0 | TBD | - |
| GPQA Diamond | 82.4 | TBD | - |
| Codeforces | 2386 | TBD | - |
| LiveCodeBench | 83.3 | TBD | - |
| BrowseComp | 51.4-67.6% | TBD | - |
| Context Window | 128K | 1M | V4 (8x longer) |
| Active Params | 37B | ~32B | V4 (fewer, more efficient) |
| Total Params | 671B | ~1T | V4 (50% larger) |
| Modalities | Text | Text, Image, Video, Audio | V4 (native multimodal) |
The comparison is frustratingly incomplete because most V4 benchmark data hasn't leaked. What we can say: on the two leaked benchmarks (SWE-bench and HumanEval), V4 appears to be a substantial improvement. On the architectural dimensions - context length, modality, parameter efficiency - V4 is unambiguously superior.
Pricing Analysis
| Cost Factor | DeepSeek V3.2 | DeepSeek V4 (estimated) |
|---|---|---|
| Input (per 1M tokens) | $0.28 (cache miss) | ~$0.14 |
| Input (cache hit) | $0.028 | TBD |
| Output (per 1M tokens) | $0.42 | ~$0.28 |
| License | MIT | Expected MIT or Apache 2.0 |
If the estimated pricing holds, V4 would be roughly 50% cheaper on input and 33% cheaper on output than V3.2 - despite being a larger, multimodal model with 8x the context window. DeepSeek's efficiency at reducing per-token costs with each generation has been the single most disruptive force in AI pricing. V4 would continue that trend.
For a team processing 10 million output tokens per day, the annual cost drops from ~$1,533 (V3.2) to ~$1,022 (V4 estimated). The savings are meaningful but not dramatic at the per-token level - the real value is getting multimodal, 1M context, and better coding for less money.
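The arithmetic behind those annual figures is straightforward to reproduce. V3.2's output rate is published; V4's is the estimate from the pricing table, not an official rate.

```python
# Reproducing the annual output-cost comparison above.
# V3.2's rate is published; V4's rate is an unofficial estimate.
DAILY_OUTPUT_TOKENS_M = 10        # 10M output tokens per day
V32_OUTPUT = 0.42                 # $/M output tokens, published
V4_OUTPUT_EST = 0.28              # $/M output tokens, estimated

def annual_cost(price_per_m: float) -> float:
    """Yearly spend in dollars at a given $/M output-token rate."""
    return DAILY_OUTPUT_TOKENS_M * price_per_m * 365

print(f"V3.2:      ${annual_cost(V32_OUTPUT):,.0f}/yr")
print(f"V4 (est.): ${annual_cost(V4_OUTPUT_EST):,.0f}/yr")
```

Note the calculation covers output tokens only; input-side savings scale separately and depend heavily on the cache-hit rate.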
DeepSeek V3.2: Pros and Cons
Pros:
- Released, verified, and production-stable - known quantity with months of real-world use
- Cheapest frontier API available today ($0.028/M on cache hit)
- MIT license with massive community and third-party ecosystem
- AIME 93.1 and Codeforces 2386 - best-in-class competition math and coding
- Extensive documentation, model card, and technical report available
- Runs on Nvidia GPUs with optimized inference tooling
Cons:
- Text-only - no vision, video, or audio input
- 128K context window is 8x smaller than V4's expected 1M
- SWE-bench 73.1% trails the proprietary frontier by 7-8 points
- BrowseComp 51.4-67.6% shows real weakness on agentic tasks
- Will likely be superseded by V4 within days
DeepSeek V4: Pros and Cons
Pros (expected):
- ~1T parameters with only ~32B active - potentially better performance per FLOP than V3.2
- Native multimodal replaces the need for separate vision models
- 1M context window with purpose-built Engram Conditional Memory
- Leaked SWE-bench 80%+ would match the proprietary frontier on coding
- Potentially 33-50% cheaper than V3.2 on per-token pricing
- Three new architectural innovations that advance MoE design
Cons (expected/potential):
- Not yet released - all claims are unverified
- Huawei Ascend optimization means potentially slower Nvidia inference at launch
- No leaked data on agentic tasks, math, or reasoning benchmarks
- Trillion-parameter model requires even more infrastructure to self-host than V3.2
- Community tooling and third-party integrations will take time to develop
- V3.2's BrowseComp weakness may persist if the improvements focus on coding
Verdict
If you need a model today, use V3.2. It is released, stable, MIT licensed, thoroughly benchmarked, and the cheapest frontier API available. The community tooling is mature, the documentation is extensive, and it works on Nvidia GPUs without friction.
If you can wait a few days, wait for V4. The generational improvements - native multimodality, an 8x context window, leaked SWE-bench scores closing the proprietary gap, potentially cheaper pricing - are substantial enough that deploying V3.2 for a new project right now only makes sense if your timeline cannot tolerate even a one-week delay.
If multimodal matters, the choice is already clear. V3.2 is text-only. V4 is not. If your workload involves any visual or audio input, V4 eliminates the need to run a separate model entirely.
The biggest unknown is whether V4's agentic performance improves over V3.2. BrowseComp and sustained tool-use tasks were V3.2's weakest areas. If V4 addresses that gap alongside coding, it would be the most complete open-weight model ever released. If V4 focuses narrowly on reasoning and coding while leaving agentic performance flat, there will still be compelling reasons to pair it with proprietary models for complex workflows. We'll know in a matter of days.
