DeepSeek V3.2 Review: GPT-5 Performance at a Fraction of the Cost
A thorough review of DeepSeek V3.2, the 671B parameter MoE model that delivers frontier-level performance at dramatically lower cost with an MIT license.

Every so often, a model comes along that rewrites the economics of AI. DeepSeek V3.2 is that model. At $0.028 per million input tokens, it costs roughly one-twentieth as much as GPT-5.2 while delivering performance that is genuinely competitive on the hardest benchmarks in existence. Released under the MIT license, it represents the most significant challenge yet to the proprietary-model business.
Architecture: Mixture of Experts at Scale
DeepSeek V3.2 is a 671-billion-parameter Mixture of Experts (MoE) model, but only a fraction of those parameters are active for any given token. This architectural choice is the key to its cost efficiency. By routing each input to a specialized subset of experts, the model achieves the knowledge capacity of a massive dense model while using the compute budget of a much smaller one.
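To make the routing idea concrete, here is a toy sketch of top-k MoE gating in NumPy. This is an illustrative caricature, not DeepSeek's actual router: the expert count, gating scheme, and dimensions are invented for the example.

```python
import numpy as np

def moe_forward(x, experts, router_w, k=2):
    """Route one token through the top-k experts of a sparse MoE layer.

    x: (d,) token representation
    experts: list of callables, each mapping (d,) -> (d,)
    router_w: (n_experts, d) router weight matrix
    k: number of experts active per token
    """
    logits = router_w @ x                      # score every expert
    top = np.argsort(logits)[-k:]              # keep only the top-k
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                       # softmax over selected experts
    # Only k experts actually run; the rest cost nothing for this token.
    return sum(g * experts[i](x) for g, i in zip(gates, top))

# Toy demo: 8 experts, 2 active per token -> roughly a quarter of the
# dense compute while all 8 experts' parameters remain available.
rng = np.random.default_rng(0)
d, n = 16, 8
experts = [lambda v, W=rng.standard_normal((d, d)) / d: W @ v for _ in range(n)]
router_w = rng.standard_normal((n, d))
out = moe_forward(rng.standard_normal(d), experts, router_w)
print(out.shape)  # (16,)
```

The design point the sketch captures is the review's: capacity scales with the total number of experts, while per-token compute scales only with k.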
The DeepSeek Sparse Attention mechanism is a notable innovation. Rather than computing full attention across the entire context, it identifies and attends to the most relevant portions of the input, dramatically reducing memory and compute requirements for long sequences. In practice, this means the model can handle long documents without the quadratic cost explosion that plagues standard attention.
Benchmark Performance
The numbers are remarkable for an open-source model. DeepSeek V3.2 achieves 96% on AIME, the American Invitational Mathematics Examination, placing it within striking distance of GPT-5.2's perfect score. Even more impressive, it earns a gold medal equivalent on the International Mathematical Olympiad (IMO), a feat that requires not just computation but genuine mathematical creativity and proof construction.
On standard language benchmarks, V3.2 trades blows with models costing 10-20 times more. Its coding performance is particularly strong, with competitive scores on HumanEval, MBPP, and real-world coding tasks. The model writes clean, idiomatic code across multiple languages and handles complex algorithmic problems with confidence.
Thinking with Tools
One of V3.2's most practical innovations is its "thinking with tools" capability. Rather than treating tool use as a separate step, the model integrates tool calls into its chain-of-thought reasoning. It might start solving a math problem symbolically, realize it needs numerical verification, call a calculator or code interpreter, incorporate the result, and continue reasoning, all in a single fluid pass.
This is not just a convenience feature. It fundamentally changes how the model approaches complex problems. Instead of committing to a single solution strategy, it can fluidly switch between analytical reasoning and computational verification. The result is more reliable answers on problems that mix symbolic reasoning with numerical computation.
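The control flow described above can be sketched as a loop that interleaves reasoning steps with tool calls and feeds results back into the trace. Everything here is hypothetical scaffolding: the `calculator` tool and the event format are invented for illustration, not DeepSeek's API.

```python
import ast
import operator as op

# Minimal safe arithmetic evaluator, standing in for a calculator tool.
OPS = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul,
       ast.Div: op.truediv, ast.Pow: op.pow}

def calculator(expr):
    """Evaluate an arithmetic expression without using eval()."""
    def ev(node):
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.BinOp):
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval").body)

def reason_with_tools(steps):
    """Interleave thinking and tool calls in one pass.

    steps: list of ("think", text) or ("tool", expression) events,
    a stand-in for the model's chain of thought.
    """
    trace = []
    for kind, payload in steps:
        if kind == "tool":
            result = calculator(payload)   # verify numerically mid-reasoning
            trace.append(f"tool({payload}) = {result}")
        else:
            trace.append(payload)
    return trace

trace = reason_with_tools([
    ("think", "2^10 should be near 1000; verify exactly."),
    ("tool", "2**10"),
    ("think", "Confirmed 1024; continue with that value."),
])
print(trace[1])  # tool(2**10) = 1024
```

The key property mirrored here is that the tool result lands inside the ongoing trace, so later reasoning can depend on it, rather than tool use being a separate post-hoc phase.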
Real-World Performance
We put V3.2 through extensive real-world testing. For coding tasks, it performed admirably on medium-complexity projects. It generated a working REST API with authentication, database integration, and proper error handling in a single session. It correctly refactored a tangled JavaScript codebase into clean, modular components. On very large codebases (30,000+ lines), it occasionally lost track of dependencies between files, but this is a common limitation shared by most models.
For research and analysis, V3.2 produced summaries and literature reviews that were thorough and well-organized. Its mathematical reasoning is its crown jewel: we presented it with graduate-level problems from topology, abstract algebra, and analysis, and it solved the majority correctly with clear, well-structured proofs.
For general conversation and writing, the model is competent but not exceptional. It tends toward a somewhat dry, academic tone that works well for technical content but falls flat for creative writing or casual chat. This is a clear area where the proprietary models, with their extensive RLHF tuning, maintain an advantage.
The MIT License Factor
DeepSeek V3.2's MIT license is arguably as significant as its performance. Organizations can download, modify, fine-tune, and deploy this model without licensing fees, usage restrictions, or data-sharing requirements. For companies with privacy concerns, regulatory constraints, or simply a desire for full control over their AI stack, this is enormously valuable.
The self-hosting economics are compelling. On modern GPU clusters, V3.2 can be served at costs that make the $0.028/M input token API price look expensive by comparison. For high-volume applications, the total cost of ownership drops to levels that would have been unimaginable a year ago.
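To make the economics tangible, here is a back-of-the-envelope cost model. The $0.028/M input token figure comes from the review; the 20x competitor price and the monthly volume are assumptions for illustration only.

```python
def monthly_token_cost(tokens_per_month, price_per_million):
    """Simple API cost model: volume in tokens times price per million tokens."""
    return tokens_per_month / 1_000_000 * price_per_million

V32_INPUT = 0.028            # $/M input tokens (from the review)
FRONTIER_INPUT = V32_INPUT * 20  # hypothetical ~20x competitor price

volume = 10_000_000_000      # assumed workload: 10B input tokens/month
cheap = monthly_token_cost(volume, V32_INPUT)       # ~$280/month
pricey = monthly_token_cost(volume, FRONTIER_INPUT) # ~$5,600/month
print(f"V3.2: ${cheap:,.0f}  frontier: ${pricey:,.0f}")
```

At this assumed volume the gap is a few hundred dollars versus a few thousand per month on input tokens alone, which is why self-hosting at scale can undercut even the already-low API price.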
Strengths and Weaknesses
Strengths:
- Extraordinary price-to-performance ratio that redefines value
- Near-frontier math and reasoning capabilities (96% AIME, IMO gold)
- MIT license enables unrestricted commercial use and self-hosting
- DeepSeek Sparse Attention enables efficient long-context processing
- Thinking with tools integrates computation into reasoning seamlessly
- Strong coding performance across multiple languages
Weaknesses:
- Creative writing and conversational tone lag behind proprietary models
- Very large model requires significant GPU resources to self-host
- Safety guardrails are less sophisticated than Anthropic or OpenAI offerings
- Long-context retrieval accuracy degrades at shorter context lengths than competitors' does
- Documentation and community support trail the big labs
- Occasional inconsistencies in following complex multi-step instructions
Verdict: 9.1/10
DeepSeek V3.2 is the best value-for-money AI model, period. It proves that frontier-level intelligence does not require frontier-level pricing. For mathematical reasoning, coding, and technical analysis, it competes directly with models that cost 20 times more. The MIT license and self-hosting potential make it the obvious choice for organizations that need powerful AI without vendor lock-in. Its weaknesses in creative writing and safety tooling mean it is not the best choice for every use case, but for technical workloads where cost matters, nothing else comes close to this combination of capability and affordability.