# DeepSeek V3.2
DeepSeek V3.2 is a 671B-parameter MoE model activating 37B per token that delivers frontier-class reasoning and coding at the lowest API price in the industry - $0.028/M input on a cache hit, $0.28/M on a cache miss, and $0.42/M output.

## TL;DR
- 671B total / 37B active MoE model under MIT license - fully open weights
- Frontier-class reasoning (MMLU-Pro 85.0, GPQA Diamond 82.4, AIME 2025 93.1) and coding (SWE-bench Verified 73.1%, Codeforces 2386)
- Cheapest frontier API available: $0.028/M tokens on cache hit, $0.28/M cache miss input, $0.42/M output
- DeepSeek Sparse Attention enables efficient 128K context processing with linear scaling characteristics
## Overview
DeepSeek V3.2 landed on September 29, 2025 as the experimental successor to V3.1-Terminus, and the official non-experimental release followed on December 1, 2025. The headline capability is DeepSeek Sparse Attention (DSA) - a fine-grained sparse attention mechanism that cuts inference costs on long sequences while keeping output quality virtually identical to the dense attention baseline. The numbers back this up. On AIME 2025, V3.2 scores 93.1 (up from 88.4 on V3.1-Terminus). On Codeforces competitive programming, it rates 2386. On SWE-bench Verified, it hits 73.1%. These are not incremental gains - they represent a model that competes directly with GPT-5 on reasoning while costing 10-30x less to run.
The pricing story is the real disruptor. At $0.028 per million input tokens on a cache hit and $0.28 on a cache miss, DeepSeek V3.2 is cheaper than every other frontier model by a wide margin. Claude Opus 4.6 charges $5.00/M input tokens. GPT-5.3 Codex is $1.25/M. Even Gemini 3.1 Pro at $2.00/M is roughly 7x more expensive on input. The automatic context caching makes repeated or partially-overlapping prompts even cheaper - if you are building an agent that sends similar system prompts, your effective input cost drops to near zero.
Where V3.2 falls short is in agentic execution and tool use. BrowseComp scores of 51.4-67.6 and an MCP-Mark of 38.0 lag behind what Claude Opus 4.6 and GPT-5.2 deliver on sustained multi-step tasks. If your workload is interactive chat, code generation, or reasoning-heavy analysis, V3.2 is arguably the best cost-adjusted option available. If you need a model that can reliably orchestrate complex tool chains over many steps, the proprietary frontier models still hold an edge. Read our full review for hands-on testing.
## Key Specifications
| Specification | Details |
|---|---|
| Provider | DeepSeek |
| Model Family | DeepSeek V3 |
| Architecture | Transformer MoE with DeepSeek Sparse Attention (DSA) |
| Total Parameters | 671B |
| Active Parameters | 37B per token |
| Experts | 256 total |
| Context Window | 128,000 tokens |
| Input Price | $0.028/M tokens (cache hit), $0.28/M tokens (cache miss) |
| Output Price | $0.42/M tokens |
| Release Date | September 29, 2025 (Exp), December 1, 2025 (Official) |
| License | MIT |
| Input Modalities | Text |
| Output Modality | Text |
| Quantization | FP8 supported |
| Model ID | deepseek-chat / deepseek-reasoner |
## Benchmark Performance
| Benchmark | DeepSeek V3.2 | Claude Opus 4.6 | GPT-5.2 | Gemini 3.1 Pro |
|---|---|---|---|---|
| MMLU-Pro (knowledge/reasoning) | 85.0 | 85.8 | 86.2 | 90.1 |
| GPQA Diamond (PhD-level science) | 82.4 | 91.3 | 93.2 | 94.3 |
| AIME 2025 (competition math) | 93.1 | 87.2 | 88.5 | 91.0 |
| Codeforces (competitive programming) | 2386 | 2100 | 2150 | 2439 |
| SWE-bench Verified (GitHub issues) | 73.1% | 80.8% | 80.0% | 76.2% |
| LiveCodeBench | 83.3 | 78.5 | 79.2 | 81.0 |
| BrowseComp (web research) | 51.4-67.6 | 84.0 | 77.9 | 59.2 |
| HMMT Feb 2025 (math olympiad) | 92.5 | - | - | 93.8 |
The pattern is clear: DeepSeek V3.2 trades blows with the proprietary frontier on reasoning and competitive coding, wins outright on competition math, but drops off on agentic and tool-use tasks. On AIME 2025 (93.1), it leads every model in the comparison; on Codeforces (2386), it trails only Gemini 3.1 Pro (2439). On SWE-bench Verified (73.1%), it sits 7-8 points behind Claude and GPT - a meaningful gap for production code-repair pipelines. On BrowseComp (51.4-67.6), the distance to Claude's 84.0 is too large to ignore for web research workloads.
The high-compute variant, V3.2-Speciale, pushes further: 99.2% on HMMT Feb 2025, 96.0% on AIME, and gold medals at IMO and IOI 2025. If your use case demands maximum mathematical reasoning, the Speciale mode is worth the additional compute.
## Key Capabilities
**DeepSeek Sparse Attention.** The core architectural innovation is DSA - a fine-grained sparse attention mechanism that operates within the Multi-head Latent Attention (MLA) module. Unlike standard full attention, which scales quadratically with context length, DSA selectively attends to relevant positions, delivering substantial improvements in both training and inference efficiency for long-context workloads. The practical result is that the 128K context window does not come with the latency and cost penalties you would expect from a 671B model.
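The core idea - each query attends to a selected subset of positions rather than all of them - can be illustrated with a toy top-k attention function. This is a minimal sketch of the general sparse-attention principle, not DeepSeek's actual DSA implementation (which selects tokens with a learned indexer inside MLA):

```python
import numpy as np

def topk_sparse_attention(Q, K, V, k):
    """Toy top-k sparse attention: each query attends only to its k
    highest-scoring key positions; all other logits are masked out.
    Illustrative only - not DeepSeek's actual DSA mechanism."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # (n_q, n_k) attention logits
    # threshold = k-th largest logit per query row
    kth = np.partition(scores, -k, axis=-1)[:, -k:].min(axis=-1, keepdims=True)
    masked = np.where(scores >= kth, scores, -np.inf)
    # softmax over the surviving k positions; masked entries get weight 0
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))    # 4 queries
K = rng.normal(size=(16, 8))   # 16 key positions
V = rng.normal(size=(16, 8))
out = topk_sparse_attention(Q, K, V, k=4)
print(out.shape)  # each output row mixes only 4 of the 16 value rows
```

With k fixed (or growing slowly) relative to sequence length, the per-query cost of the value mixing stops scaling with full context size - the intuition behind DSA's long-context efficiency.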
**Cost-Optimized Inference.** DeepSeek's automatic context caching means that if you send a prompt that partially overlaps with a previous one - common in agentic loops, chatbot sessions, or batch processing - you pay the cache hit rate of $0.028/M tokens instead of the full $0.28/M. For production workloads with system prompts and repeated context, this effectively makes V3.2 the cheapest frontier model by an order of magnitude. The 37B active parameter count also means inference is fundamentally cheaper per token than dense models of comparable quality.
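A back-of-envelope cost model makes the caching effect concrete. The rates below come from this page; the cache-hit ratio is a workload-dependent assumption (here, an agent resending a 20K-token system prompt with a 2K-token fresh tail):

```python
# Tiered input pricing for DeepSeek V3.2 (rates from this page, $ per token)
CACHE_HIT = 0.028 / 1e6
CACHE_MISS = 0.28 / 1e6
OUTPUT = 0.42 / 1e6

def request_cost(input_tokens, output_tokens, hit_ratio):
    """Estimated $ cost of one API call given the fraction of input
    tokens served from the prompt cache."""
    hits = input_tokens * hit_ratio
    misses = input_tokens - hits
    return hits * CACHE_HIT + misses * CACHE_MISS + output_tokens * OUTPUT

# 20K cached system prompt + 2K fresh input, 1K output:
print(f"${request_cost(22_000, 1_000, hit_ratio=20/22):.6f} per call")
# → $0.001540 per call
```

Without caching (hit_ratio=0) the same call would cost about four times as much on the input side, which is why repeated-context workloads see the largest savings.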
**Competitive Coding and Math.** The Codeforces rating of 2386 and AIME 2025 score of 93.1 put V3.2 in the top tier for algorithmic and mathematical problem-solving. The LiveCodeBench score of 83.3 confirms this is not benchmark-specific overfitting - performance holds on held-out coding evaluations. For teams using LLMs for competitive programming assistance, algorithmic research, or math-heavy applications, V3.2 delivers performance on par with models costing 10-30x more.
## Pricing and Availability
| Tier | Input | Output |
|---|---|---|
| Cache Hit | $0.028/M tokens | $0.42/M tokens |
| Cache Miss | $0.28/M tokens | $0.42/M tokens |

Caching applies to input tokens only; output is billed at a flat $0.42/M regardless of tier.
DeepSeek V3.2 is available through the DeepSeek API (powering both deepseek-chat and deepseek-reasoner endpoints), Google Cloud Vertex AI, Microsoft Azure Foundry, and NVIDIA NIM. The model weights are MIT licensed and available on HuggingFace for self-hosting, though the full 671B parameter model requires significant GPU infrastructure (FP8 serving on multi-GPU nodes).
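The DeepSeek API is OpenAI-compatible, so a chat request is a standard chat-completions payload. A minimal sketch (the prompt content is illustrative; the API key is a placeholder you would supply via the `Authorization: Bearer` header):

```python
import json

# Minimal chat-completion payload for DeepSeek's OpenAI-compatible API.
payload = {
    "model": "deepseek-chat",  # or "deepseek-reasoner" for reasoning mode
    "messages": [
        {"role": "system", "content": "You are a concise coding assistant."},
        {"role": "user", "content": "Write a binary search in Python."},
    ],
    "stream": False,
}

# POST this to https://api.deepseek.com/chat/completions with an
# Authorization: Bearer <DEEPSEEK_API_KEY> header - or point the openai
# SDK at base_url="https://api.deepseek.com" and pass the same arguments.
print(json.dumps(payload)[:60])
```

Because the payload shape matches OpenAI's, existing OpenAI-SDK code typically needs only a `base_url` and model-name change to target V3.2.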
For cost comparison: processing 1 million input tokens through Claude Opus 4.6 costs $5.00. Through DeepSeek V3.2, the same volume costs $0.28 on cache miss or $0.028 on cache hit. That is an 18x to 178x cost advantage on input. Even accounting for the quality gap on agentic tasks, the economics are hard to argue with for reasoning and coding workloads. See our open source vs proprietary AI guide for a broader framework on when self-hosting makes sense.
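The "18x to 178x" figures above are just the input-price ratios, which a two-line check reproduces (prices per 1M input tokens, as quoted on this page):

```python
# Input-cost advantage vs. Claude Opus 4.6 ($5.00/M input)
opus_input = 5.00
v32_miss, v32_hit = 0.28, 0.028
print(round(opus_input / v32_miss, 1), round(opus_input / v32_hit, 1))
# → 17.9 178.6
```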
## Strengths
- Cheapest frontier API by a wide margin - $0.028/M on cache hit is unmatched
- Best-in-class competitive math (AIME 93.1) and coding (Codeforces 2386, LiveCodeBench 83.3)
- MIT license allows unrestricted commercial and research use
- 671B/37B MoE architecture delivers high capability at low per-token compute
- Automatic context caching slashes costs for production workloads with repeated context
- DeepSeek Sparse Attention enables efficient long-context processing
## Weaknesses
- Agentic tool use (BrowseComp 51.4-67.6, MCP-Mark 38.0) trails Claude and GPT significantly
- SWE-bench Verified (73.1%) falls 7-8 points behind the leading proprietary models
- Text-only - no image or multimodal input support
- Self-hosting the full 671B model requires substantial GPU infrastructure
- GPQA Diamond (82.4%) lags behind the top proprietary models by 9-12 points
- Chinese company origin may pose compliance concerns for some enterprise deployments
## Related Coverage
- DeepSeek V3.2 Review - Our full hands-on review with coding, reasoning, and tool-use testing
- Open Source LLM Leaderboard - Current rankings for open-weight models
- Coding Benchmarks Leaderboard - SWE-bench, LiveCodeBench, and Codeforces rankings
- Open Source vs Proprietary AI - When to choose open-weight models over APIs
## Sources
- DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models (arXiv)
- DeepSeek V3.2-Exp Model Card (HuggingFace)
- DeepSeek API Pricing Documentation
- DeepSeek V3.2 Release Notes
- DeepSeek V3.2-Exp Cuts API Pricing in Half - VentureBeat
- 2025 LLM Review: GPT-5.2, Gemini 3, Claude 4.5, DeepSeek V3.2 - Atoms.dev
