DeepSeek V3.2

DeepSeek V3.2 is a 671B-parameter MoE model activating 37B per token that delivers frontier-class reasoning and coding at the lowest API price in the industry - $0.028/$0.28 per million input tokens (cache hit/miss) and $0.42 per million output tokens.

TL;DR

  • 671B total / 37B active MoE model under MIT license - fully open weights
  • Frontier-class reasoning (MMLU-Pro 85.0, GPQA Diamond 82.4, AIME 2025 93.1) and coding (SWE-bench Verified 73.1%, Codeforces 2386)
  • Cheapest frontier API available: $0.028/M tokens on cache hit, $0.28/M cache miss input, $0.42/M output
  • DeepSeek Sparse Attention enables efficient 128K context processing with linear scaling characteristics

Overview

DeepSeek V3.2 landed on September 29, 2025 as the experimental successor to V3.1-Terminus, and the official non-experimental release followed on December 1, 2025. The headline capability is DeepSeek Sparse Attention (DSA) - a fine-grained sparse attention mechanism that cuts inference costs on long sequences while keeping output quality virtually identical to the dense attention baseline. The numbers back this up. On AIME 2025, V3.2 scores 93.1 (up from 88.4 on V3.1-Terminus). On Codeforces competitive programming, it rates 2386. On SWE-bench Verified, it hits 73.1%. These are not incremental gains - they represent a model that competes directly with GPT-5 on reasoning while costing 10-30x less to run.

The pricing story is the real disruptor. At $0.028 per million input tokens on a cache hit and $0.28 on a cache miss, DeepSeek V3.2 is cheaper than every other frontier model by a wide margin. Claude Opus 4.6 charges $5.00/M input tokens. GPT-5.3 Codex is $1.25/M. Even Gemini 3.1 Pro at $2.00/M is roughly 7x more expensive on input. The automatic context caching makes repeated or partially-overlapping prompts even cheaper - if you are building an agent that sends similar system prompts, your effective input cost drops to near zero.

Where V3.2 falls short is in agentic execution and tool use. BrowseComp scores of 51.4-67.6 and an MCP-Mark of 38.0 lag behind what Claude Opus 4.6 and GPT-5.2 deliver on sustained multi-step tasks. If your workload is interactive chat, code generation, or reasoning-heavy analysis, V3.2 is arguably the best cost-adjusted option available. If you need a model that can reliably orchestrate complex tool chains over many steps, the proprietary frontier models still hold an edge. Read our full review for hands-on testing.

Key Specifications

| Specification | Details |
| --- | --- |
| Provider | DeepSeek |
| Model Family | DeepSeek V3 |
| Architecture | Transformer MoE with DeepSeek Sparse Attention (DSA) |
| Total Parameters | 671B |
| Active Parameters | 37B per token |
| Experts | 256 total |
| Context Window | 128,000 tokens |
| Input Price | $0.028/M tokens (cache hit), $0.28/M tokens (cache miss) |
| Output Price | $0.42/M tokens |
| Release Date | September 29, 2025 (experimental), December 1, 2025 (official) |
| License | MIT |
| Input Modalities | Text |
| Output Modality | Text |
| Quantization | FP8 supported |
| Model ID | deepseek-chat / deepseek-reasoner |

Benchmark Performance

| Benchmark | DeepSeek V3.2 | Claude Opus 4.6 | GPT-5.2 | Gemini 3.1 Pro |
| --- | --- | --- | --- | --- |
| MMLU-Pro (knowledge/reasoning) | 85.0 | 85.8 | 86.2 | 90.1 |
| GPQA Diamond (PhD-level science) | 82.4 | 91.3 | 93.2 | 94.3 |
| AIME 2025 (competition math) | 93.1 | 87.2 | 88.5 | 91.0 |
| Codeforces (competitive programming) | 2386 | 2100 | 2150 | 2439 |
| SWE-bench Verified (GitHub issues) | 73.1% | 80.8% | 80.0% | 76.2% |
| LiveCodeBench | 83.3 | 78.5 | 79.2 | 81.0 |
| BrowseComp (web research) | 51.4-67.6 | 84.0 | 77.9 | 59.2 |
| HMMT Feb 2025 (math olympiad) | 92.5 | - | - | 93.8 |

The pattern is clear: DeepSeek V3.2 trades blows with the proprietary frontier on reasoning and competitive coding, wins outright on several math benchmarks, but drops off on agentic and tool-use tasks. On AIME 2025 (93.1) and Codeforces (2386), it is genuinely best-in-class among non-reasoning-mode models. On SWE-bench (73.1%), it trails Claude and GPT by 7-8 points - meaningful for production code repair pipelines. On BrowseComp (51.4-67.6), the gap to Claude's 84.0 is too large to ignore for web research workloads.

The high-compute variant, V3.2-Speciale, pushes further: 99.2% on HMMT Feb 2025, 96.0% on AIME, and gold medals at IMO and IOI 2025. If your use case demands maximum mathematical reasoning, the Speciale mode is worth the additional compute.

Key Capabilities

DeepSeek Sparse Attention. The core architectural innovation is DSA - a fine-grained sparse attention mechanism that operates within the Multi-head Latent Attention (MLA) module. Unlike standard full attention which scales quadratically with context length, DSA selectively attends to relevant positions, delivering substantial improvements in both training and inference efficiency for long-context workloads. The practical result is that the 128K context window does not come with the latency and cost penalties you would expect from a 671B model.
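DeepSeek has not published DSA's selection mechanism in enough detail here to reproduce it, but the general idea of fine-grained sparse attention can be sketched generically: each query scores all keys cheaply, then computes the softmax over only its top-k positions instead of the full sequence. The NumPy toy below is a schematic single-head illustration of that pattern, not DeepSeek's actual algorithm, and the top-k selection rule is our simplifying assumption.

```python
import numpy as np

def topk_sparse_attention(q, k, v, top_k):
    """Toy single-head sparse attention: each query attends only to its
    top_k highest-scoring key positions. Schematic illustration of the
    sparse-attention idea only -- NOT DeepSeek's actual DSA mechanism."""
    scores = q @ k.T / np.sqrt(q.shape[-1])               # (n_q, n_k) raw scores
    # Threshold at each query's top_k-th score; mask everything below it
    kth = np.partition(scores, -top_k, axis=-1)[:, -top_k][:, None]
    masked = np.where(scores >= kth, scores, -np.inf)
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax over top_k only
    return weights @ v                                    # (n_q, d_v)

rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(4, 8)), rng.normal(size=(64, 8)), rng.normal(size=(64, 8))
out = topk_sparse_attention(q, k, v, top_k=16)  # each query reads 16 of 64 keys
print(out.shape)  # (4, 8)
```

Because each query touches a fixed number of keys rather than all of them, cost grows roughly linearly with sequence length, which is the scaling behavior the article attributes to DSA.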

Cost-Optimized Inference. DeepSeek's automatic context caching means that if you send a prompt that partially overlaps with a previous one - common in agentic loops, chatbot sessions, or batch processing - you pay the cache hit rate of $0.028/M tokens instead of the full $0.28/M. For production workloads with system prompts and repeated context, this effectively makes V3.2 the cheapest frontier model by an order of magnitude. The 37B active parameter count also means inference is fundamentally cheaper per token than dense models of comparable quality.
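The billing effect of automatic caching is easy to model: the effective input price is just a blend of the hit and miss rates weighted by how much of your traffic lands in the cache. A minimal sketch, using the per-million-token prices quoted on this page (the 90% hit rate in the example is an illustrative assumption, not a measured figure):

```python
def blended_input_cost(total_mtok, hit_rate,
                       hit_price=0.028, miss_price=0.28):
    """Effective input cost in USD for `total_mtok` million input tokens
    when `hit_rate` of them land in the automatic context cache.
    Prices are the per-million-token rates quoted on this page."""
    return total_mtok * (hit_rate * hit_price + (1 - hit_rate) * miss_price)

# An agent re-sending a large, stable system prompt might see ~90% cache hits:
print(round(blended_input_cost(100, 0.9), 2))  # 100M input tokens -> 5.32 (USD)
```

At a 90% hit rate the effective input price works out to about $0.053/M tokens, which is where the "order of magnitude cheaper" framing comes from.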

Competitive Coding and Math. The Codeforces rating of 2386 and AIME 2025 score of 93.1 put V3.2 in the top tier for algorithmic and mathematical problem-solving. The LiveCodeBench score of 83.3 confirms this is not benchmark-specific overfitting - performance holds on held-out coding evaluations. For teams using LLMs for competitive programming assistance, algorithmic research, or math-heavy applications, V3.2 delivers performance on par with models costing 10-30x more.

Pricing and Availability

| Tier | Input | Output |
| --- | --- | --- |
| Cache Hit | $0.028/M tokens | - |
| Cache Miss | $0.28/M tokens | $0.42/M tokens |

DeepSeek V3.2 is available through the DeepSeek API (powering both deepseek-chat and deepseek-reasoner endpoints), Google Cloud Vertex AI, Microsoft Azure Foundry, and NVIDIA NIM. The model weights are MIT licensed and available on HuggingFace for self-hosting, though the full 671B parameter model requires significant GPU infrastructure (FP8 serving on multi-GPU nodes).
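To put "significant GPU infrastructure" in rough numbers: FP8 weights take about one byte per parameter, so the weights alone occupy on the order of 671 GB before KV cache and activations. The back-of-envelope estimate below is our own arithmetic; the 20% overhead fraction and the 80 GB-per-GPU figure are assumptions, and real deployments vary widely.

```python
import math

def fp8_serving_estimate(total_params_b=671, overhead_frac=0.2, gpu_mem_gb=80):
    """Rough memory estimate for FP8 self-hosting: ~1 byte per parameter
    for weights, plus an assumed `overhead_frac` for KV cache and
    activations. Back-of-envelope only, not a deployment guide."""
    weights_gb = total_params_b * 1.0        # 1 byte/param at FP8 -> ~671 GB
    total_gb = weights_gb * (1 + overhead_frac)
    gpus = math.ceil(total_gb / gpu_mem_gb)  # assumed 80 GB per accelerator
    return total_gb, gpus

total_gb, gpus = fp8_serving_estimate()
print(total_gb, gpus)  # ~805 GB -> at least 11 x 80 GB GPUs
```

Even under these optimistic assumptions, serving the full model spans a multi-GPU node, which is why most teams will reach for the hosted API instead.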

For cost comparison: processing 1 million input tokens through Claude Opus 4.6 costs $5.00. Through DeepSeek V3.2, the same volume costs $0.28 on cache miss or $0.028 on cache hit. That is an 18x to 178x cost advantage on input. Even accounting for the quality gap on agentic tasks, the economics are hard to argue with for reasoning and coding workloads. See our open source vs proprietary AI guide for a broader framework on when self-hosting makes sense.
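The 18x and 178x figures follow directly from the input prices quoted in this article, which the snippet below simply recomputes as ratios:

```python
# Input-price comparison using the per-million-token rates quoted above.
prices = {
    "Claude Opus 4.6": 5.00,
    "GPT-5.3 Codex": 1.25,
    "Gemini 3.1 Pro": 2.00,
    "DeepSeek V3.2 (cache miss)": 0.28,
    "DeepSeek V3.2 (cache hit)": 0.028,
}
for name, p in prices.items():
    print(f"{name}: {p / prices['DeepSeek V3.2 (cache miss)']:.1f}x vs miss, "
          f"{p / prices['DeepSeek V3.2 (cache hit)']:.1f}x vs hit")
# Claude Opus 4.6 comes out to ~17.9x the cache-miss rate and ~178.6x the cache-hit rate.
```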

Strengths

  • Cheapest frontier API by a wide margin - $0.028/M on cache hit is unmatched
  • Best-in-class competitive math (AIME 93.1) and coding (Codeforces 2386, LiveCodeBench 83.3)
  • MIT license allows unrestricted commercial and research use
  • 671B/37B MoE architecture delivers high capability at low per-token compute
  • Automatic context caching slashes costs for production workloads with repeated context
  • DeepSeek Sparse Attention enables efficient long-context processing

Weaknesses

  • Agentic tool use (BrowseComp 51.4-67.6, MCP-Mark 38.0) trails Claude and GPT significantly
  • SWE-bench Verified (73.1%) falls 7-8 points behind the leading proprietary models
  • Text-only - no image or multimodal input support
  • Self-hosting the full 671B model requires substantial GPU infrastructure
  • GPQA Diamond (82.4%) lags behind the top proprietary models by 9-12 points
  • Chinese company origin may pose compliance concerns for some enterprise deployments

About the author

James, AI Benchmarks & Tools Analyst, is a software engineer turned tech writer who spent six years building backend systems at a fintech startup in Chicago before pivoting to full-time analysis of AI tools and infrastructure.