DeepSeek V3.2

DeepSeek V3.2 is a 671B-parameter MoE model activating 37B per token that delivers frontier-class reasoning and coding at the lowest API price in the industry - $0.028/$0.28 per million input tokens (cache hit/miss) and $0.42 per million output tokens.

TL;DR

  • 671B total / 37B active MoE model under MIT license - fully open weights
  • Frontier-class reasoning (MMLU-Pro 85.0, GPQA Diamond 82.4, AIME 2025 93.1) and coding (SWE-bench Verified 73.1%, Codeforces 2386)
  • Cheapest frontier API available: $0.028/M tokens on cache hit, $0.28/M cache miss input, $0.42/M output
  • DeepSeek Sparse Attention enables efficient 128K context processing with linear scaling characteristics

Overview

DeepSeek V3.2 landed on September 29, 2025 as the experimental successor to V3.1-Terminus, and the official non-experimental release followed on December 1, 2025. The headline capability is DeepSeek Sparse Attention (DSA) - a fine-grained sparse attention mechanism that cuts inference costs on long sequences while keeping output quality virtually identical to the dense attention baseline. The numbers back this up. On AIME 2025, V3.2 scores 93.1 (up from 88.4 on V3.1-Terminus). On Codeforces competitive programming, it rates 2386. On SWE-bench Verified, it hits 73.1%. These are not incremental gains - they represent a model that competes directly with GPT-5 on reasoning while costing 10-30x less to run.

The pricing story is the real disruptor. At $0.028 per million input tokens on a cache hit and $0.28 on a cache miss, DeepSeek V3.2 is cheaper than every other frontier model by a wide margin. Claude Opus 4.6 charges $5.00/M input tokens. GPT-5.3 Codex is $1.25/M. Even Gemini 3.1 Pro at $2.00/M is roughly 7x more expensive on input. The automatic context caching makes repeated or partially-overlapping prompts even cheaper - if you are building an agent that sends similar system prompts, your effective input cost drops to near zero.

Where V3.2 falls short is in agentic execution and tool use. BrowseComp scores of 51.4-67.6 and an MCP-Mark of 38.0 lag behind what Claude Opus 4.6 and GPT-5.2 deliver on sustained multi-step tasks. If your workload is interactive chat, code generation, or reasoning-heavy analysis, V3.2 is arguably the best cost-adjusted option available. If you need a model that can reliably orchestrate complex tool chains over many steps, the proprietary frontier models still hold an edge. Read our full review for hands-on testing.

Key Specifications

| Specification | Details |
| --- | --- |
| Provider | DeepSeek |
| Model Family | DeepSeek V3 |
| Architecture | Transformer MoE with DeepSeek Sparse Attention (DSA) |
| Total Parameters | 671B |
| Active Parameters | 37B per token |
| Experts | 256 total |
| Context Window | 128,000 tokens |
| Input Price | $0.028/M tokens (cache hit), $0.28/M tokens (cache miss) |
| Output Price | $0.42/M tokens |
| Release Date | September 29, 2025 (experimental), December 1, 2025 (official) |
| License | MIT |
| Input Modalities | Text |
| Output Modality | Text |
| Quantization | FP8 supported |
| Model ID | deepseek-chat / deepseek-reasoner |

Benchmark Performance

| Benchmark | DeepSeek V3.2 | Claude Opus 4.6 | GPT-5.2 | Gemini 3.1 Pro |
| --- | --- | --- | --- | --- |
| MMLU-Pro (knowledge/reasoning) | 85.0 | 85.8 | 86.2 | 90.1 |
| GPQA Diamond (PhD-level science) | 82.4 | 91.3 | 93.2 | 94.3 |
| AIME 2025 (competition math) | 93.1 | 87.2 | 88.5 | 91.0 |
| Codeforces (competitive programming) | 2386 | 2100 | 2150 | 2439 |
| SWE-bench Verified (GitHub issues) | 73.1% | 80.8% | 80.0% | 76.2% |
| LiveCodeBench | 83.3 | 78.5 | 79.2 | 81.0 |
| BrowseComp (web research) | 51.4-67.6 | 84.0 | 77.9 | 59.2 |
| HMMT Feb 2025 (math olympiad) | 92.5 | - | - | 93.8 |

The pattern is clear: DeepSeek V3.2 trades blows with the proprietary frontier on reasoning and competitive coding, wins outright on several math benchmarks, but drops off on agentic and tool-use tasks. On AIME 2025 (93.1) and Codeforces (2386), it is genuinely best-in-class among non-reasoning-mode models. On SWE-bench (73.1%), it trails Claude and GPT by 7-8 points - meaningful for production code repair pipelines. On BrowseComp (51.4-67.6), the gap to Claude's 84.0 is too large to ignore for web research workloads.

The high-compute variant, V3.2-Speciale, pushes further: 99.2% on HMMT Feb 2025, 96.0% on AIME, and gold medals at IMO and IOI 2025. If your use case demands maximum mathematical reasoning, the Speciale mode is worth the additional compute.

Key Capabilities

DeepSeek Sparse Attention. The core architectural innovation is DSA - a fine-grained sparse attention mechanism that operates within the Multi-head Latent Attention (MLA) module. Unlike standard full attention which scales quadratically with context length, DSA selectively attends to relevant positions, delivering substantial improvements in both training and inference efficiency for long-context workloads. The practical result is that the 128K context window does not come with the latency and cost penalties you would expect from a 671B model.
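DeepSeek has not published DSA's selection mechanism in enough detail here to reproduce it, but the general idea of fine-grained sparse attention can be sketched generically: each query scores all keys cheaply, then computes the softmax over only its top-k positions instead of the full sequence. The NumPy toy below is a schematic single-head illustration of that pattern, not DeepSeek's actual algorithm, and the top-k selection rule is our simplifying assumption.

```python
import numpy as np

def topk_sparse_attention(q, k, v, top_k):
    """Toy single-head sparse attention: each query attends only to its
    top_k highest-scoring key positions. Schematic illustration of the
    sparse-attention idea only -- NOT DeepSeek's actual DSA mechanism."""
    scores = q @ k.T / np.sqrt(q.shape[-1])               # (n_q, n_k) raw scores
    # Threshold at each query's top_k-th score; mask everything below it
    kth = np.partition(scores, -top_k, axis=-1)[:, -top_k][:, None]
    masked = np.where(scores >= kth, scores, -np.inf)
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax over top_k only
    return weights @ v                                    # (n_q, d_v)

rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(4, 8)), rng.normal(size=(64, 8)), rng.normal(size=(64, 8))
out = topk_sparse_attention(q, k, v, top_k=16)  # each query reads 16 of 64 keys
print(out.shape)  # (4, 8)
```

Because each query touches a fixed number of keys rather than all of them, cost grows roughly linearly with sequence length, which is the scaling behavior the article attributes to DSA.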

Cost-Optimized Inference. DeepSeek's automatic context caching means that if you send a prompt that partially overlaps with a previous one - common in agentic loops, chatbot sessions, or batch processing - you pay the cache hit rate of $0.028/M tokens instead of the full $0.28/M. For production workloads with system prompts and repeated context, this effectively makes V3.2 the cheapest frontier model by an order of magnitude. The 37B active parameter count also means inference is fundamentally cheaper per token than dense models of comparable quality.
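The billing effect of automatic caching is easy to model: the effective input price is just a blend of the hit and miss rates weighted by how much of your traffic lands in the cache. A minimal sketch, using the per-million-token prices quoted on this page (the 90% hit rate in the example is an illustrative assumption, not a measured figure):

```python
def blended_input_cost(total_mtok, hit_rate,
                       hit_price=0.028, miss_price=0.28):
    """Effective input cost in USD for `total_mtok` million input tokens
    when `hit_rate` of them land in the automatic context cache.
    Prices are the per-million-token rates quoted on this page."""
    return total_mtok * (hit_rate * hit_price + (1 - hit_rate) * miss_price)

# An agent re-sending a large, stable system prompt might see ~90% cache hits:
print(round(blended_input_cost(100, 0.9), 2))  # 100M input tokens -> 5.32 (USD)
```

At a 90% hit rate the effective input price works out to about $0.053/M tokens, which is where the "order of magnitude cheaper" framing comes from.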

Competitive Coding and Math. The Codeforces rating of 2386 and AIME 2025 score of 93.1 put V3.2 in the top tier for algorithmic and mathematical problem-solving. The LiveCodeBench score of 83.3 confirms this is not benchmark-specific overfitting - performance holds on held-out coding evaluations. For teams using LLMs for competitive programming assistance, algorithmic research, or math-heavy applications, V3.2 delivers performance on par with models costing 10-30x more.

Pricing and Availability

| Tier | Input | Output |
| --- | --- | --- |
| Cache Hit | $0.028/M tokens | - |
| Cache Miss | $0.28/M tokens | $0.42/M tokens |

DeepSeek V3.2 is available through the DeepSeek API (powering both deepseek-chat and deepseek-reasoner endpoints), Google Cloud Vertex AI, Microsoft Azure Foundry, and NVIDIA NIM. The model weights are MIT licensed and available on HuggingFace for self-hosting, though the full 671B parameter model requires significant GPU infrastructure (FP8 serving on multi-GPU nodes).
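To put "significant GPU infrastructure" in rough numbers: FP8 weights take about one byte per parameter, so the weights alone occupy on the order of 671 GB before KV cache and activations. The back-of-envelope estimate below is our own arithmetic; the 20% overhead fraction and the 80 GB-per-GPU figure are assumptions, and real deployments vary widely.

```python
import math

def fp8_serving_estimate(total_params_b=671, overhead_frac=0.2, gpu_mem_gb=80):
    """Rough memory estimate for FP8 self-hosting: ~1 byte per parameter
    for weights, plus an assumed `overhead_frac` for KV cache and
    activations. Back-of-envelope only, not a deployment guide."""
    weights_gb = total_params_b * 1.0        # 1 byte/param at FP8 -> ~671 GB
    total_gb = weights_gb * (1 + overhead_frac)
    gpus = math.ceil(total_gb / gpu_mem_gb)  # assumed 80 GB per accelerator
    return total_gb, gpus

total_gb, gpus = fp8_serving_estimate()
print(total_gb, gpus)  # ~805 GB -> at least 11 x 80 GB GPUs
```

Even under these optimistic assumptions, serving the full model spans a multi-GPU node, which is why most teams will reach for the hosted API instead.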

For cost comparison: processing 1 million input tokens through Claude Opus 4.6 costs $5.00. Through DeepSeek V3.2, the same volume costs $0.28 on cache miss or $0.028 on cache hit. That is an 18x to 178x cost advantage on input. Even accounting for the quality gap on agentic tasks, the economics are hard to argue with for reasoning and coding workloads. See our open source vs proprietary AI guide for a broader framework on when self-hosting makes sense.
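The 18x and 178x figures follow directly from the input prices quoted in this article, which the snippet below simply recomputes as ratios:

```python
# Input-price comparison using the per-million-token rates quoted above.
prices = {
    "Claude Opus 4.6": 5.00,
    "GPT-5.3 Codex": 1.25,
    "Gemini 3.1 Pro": 2.00,
    "DeepSeek V3.2 (cache miss)": 0.28,
    "DeepSeek V3.2 (cache hit)": 0.028,
}
for name, p in prices.items():
    print(f"{name}: {p / prices['DeepSeek V3.2 (cache miss)']:.1f}x vs miss, "
          f"{p / prices['DeepSeek V3.2 (cache hit)']:.1f}x vs hit")
# Claude Opus 4.6 comes out to ~17.9x the cache-miss rate and ~178.6x the cache-hit rate.
```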

Strengths

  • Cheapest frontier API by a wide margin - $0.028/M on cache hit is unmatched
  • Best-in-class competitive math (AIME 93.1) and coding (Codeforces 2386, LiveCodeBench 83.3)
  • MIT license allows unrestricted commercial and research use
  • 671B/37B MoE architecture delivers high capability at low per-token compute
  • Automatic context caching slashes costs for production workloads with repeated context
  • DeepSeek Sparse Attention enables efficient long-context processing

Weaknesses

  • Agentic tool use (BrowseComp 51.4-67.6, MCP-Mark 38.0) trails Claude and GPT significantly
  • SWE-bench Verified (73.1%) falls 7-8 points behind the leading proprietary models
  • Text-only - no image or multimodal input support
  • Self-hosting the full 671B model requires substantial GPU infrastructure
  • GPQA Diamond (82.4%) lags behind the top proprietary models by 9-12 points
  • Chinese company origin may pose compliance concerns for some enterprise deployments

About the author

James, AI Benchmarks & Tools Analyst, is a software engineer turned tech writer who spent six years building backend systems at a fintech startup in Chicago before pivoting to full-time analysis of AI tools and infrastructure.