DeepSeek V4 Drops Next Week - 1 Trillion Parameters on Chinese Chips
DeepSeek will release V4, a natively multimodal trillion-parameter model with a 1M token context window, in the first week of March - optimized for Huawei Ascend chips, not Nvidia.

DeepSeek will release its V4 flagship model in the first week of March, the Financial Times reported Friday, giving months of speculation their most concrete timeline yet. The release is timed to coincide with China's annual "Two Sessions" parliamentary meetings starting March 4 - a deliberate staging that positions V4 as a symbol of China's AI self-sufficiency.
TL;DR
- DeepSeek V4 is expected between March 3-7, a trillion-parameter Mixture-of-Experts model with ~32B active parameters per token
- Natively multimodal (text, image, video, audio) with a 1 million token context window
- Optimized for Huawei Ascend and Cambricon chips - Nvidia and AMD were deliberately excluded from pre-release access
- Expected to be open-weight, following DeepSeek's MIT-licensed V3 release
- A V4 Lite variant already leaked through inference providers under NDA, showing breakthrough code generation
What We Know
The Financial Times report, corroborated by CNBC and Reuters reporting from the preceding week, outlines a model that represents a full generational leap from DeepSeek V3.2.
Architecture
V4 uses a sparse Mixture-of-Experts architecture scaled to approximately 1 trillion total parameters - up from V3's 671 billion. Despite the 50% increase in total model size, active parameters per token drop from V3's 37 billion to roughly 32 billion, meaning inference costs should remain competitive with or below V3's already aggressive pricing.
The expert routing system has been significantly expanded. V4 activates 16 experts per token, up from the 8 routed experts of V3's DeepSeekMoE design, drawn from hundreds of available experts per MoE layer. The model retains V3's Multi-head Latent Attention (MLA) while adding three architectural innovations that DeepSeek previewed in research papers published in January 2026:
- Manifold-Constrained Hyper-Connections (mHC) - Solves training stability at the trillion-parameter scale
- Engram Conditional Memory - Efficient retrieval from contexts exceeding 1 million tokens
- DeepSeek Sparse Attention with Lightning Indexer - Million-token context processing first previewed in V3.2-Exp
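The reported top-16 routing can be sketched in a few lines. This is a toy illustration only - the gate function, dimensions, and expert architecture below are assumptions for clarity, not V4's actual design:

```python
import numpy as np

def moe_route(x, router_w, experts, k=16):
    """Route one token through the top-k of many experts (toy sketch).

    Only k experts run per token, so compute scales with k (the "active"
    parameters), not with the total expert count (the "total" parameters).
    """
    logits = x @ router_w                       # one score per expert
    top = np.argsort(logits)[-k:]               # indices of the k best experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                        # softmax over the selected k
    # Weighted sum of the chosen experts' outputs.
    return sum(g * experts[i](x) for g, i in zip(gates, top))

rng = np.random.default_rng(0)
d, n_experts = 64, 256                          # made-up dimensions
x = rng.standard_normal(d)
router_w = rng.standard_normal((d, n_experts))
# Each "expert" here is just a random linear map for illustration.
experts = [lambda v, W=rng.standard_normal((d, d)): v @ W
           for _ in range(n_experts)]
y = moe_route(x, router_w, experts, k=16)
print(y.shape)  # (64,)
```

The arithmetic this enables is the headline claim: scale total parameters to ~1T while per-token compute stays near 32B, because only the routed slice of the network ever runs.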
Multimodal From the Ground Up
Unlike competitors that bolt vision and audio capabilities onto text-only base models, V4 is described as natively multimodal - trained from the start on text, image, video, and audio data. This is a significant architectural departure. DeepSeek's previous models were text-only at the base level, with multimodal capabilities handled by separate models like DeepSeek-VL.
The 1 Million Token Context Window
DeepSeek silently expanded the context window on its existing API models from 128K to 1M tokens on February 11 - widely interpreted as infrastructure preparation for V4. The Engram Conditional Memory system, detailed in DeepSeek's January 13 paper, provides the theoretical basis for efficient retrieval at this scale.
The Hardware Story
As we reported last week, DeepSeek has optimized V4 for Huawei Ascend and Cambricon chips rather than Nvidia GPUs. Reuters confirmed that Nvidia and AMD were excluded from the pre-release optimization pipeline - a first for a major AI lab.
This is not a symbolic gesture. It means V4's inference performance will be best on Chinese hardware from day one, with Nvidia optimization coming later (if at all). For the open-source community running V4 on Nvidia GPUs, performance may be suboptimal at launch.
The geopolitical implications are hard to overstate. DeepSeek is building a parallel software ecosystem that doesn't depend on American silicon - exactly the outcome US export controls were designed to prevent.
Leaked Benchmarks
Internal benchmark numbers have leaked but remain unverified. The reported figures suggest V4 is competitive with the current frontier:
| Benchmark | DeepSeek V4 (leaked) | Current Leaders |
|---|---|---|
| HumanEval (code) | ~90% | Claude Opus 4.5: 88% |
| SWE-bench Verified | 80%+ | Claude Opus 4.5: 80.9% |
These numbers come from internal testing and code repository analysis, not independent verification. Treat them accordingly. DeepSeek plans to publish a technical note at launch, with a comprehensive engineering report to follow roughly a month later.
The V4 Lite leak from last week provided more tangible evidence. The smaller variant demonstrated breakthrough SVG code generation capabilities and was described by one inference provider testing it under NDA as producing "more optimized code than DeepSeek 3.2, Claude Opus 4.6, and Gemini 3.1."
Pricing
No official pricing has been announced. One analysis estimates input tokens at $0.14 per million and output tokens at $0.28 per million - roughly 20-50x cheaper than OpenAI's comparable models. DeepSeek V3's API pricing is already among the cheapest in the industry, and the company has consistently prioritized cost leadership.
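To make the rumored figures concrete, here is what they would imply for a large batch job. The prices are the unconfirmed estimates above, and the token counts are an arbitrary example:

```python
# Rumored (unconfirmed) V4 pricing: $0.14 per 1M input tokens,
# $0.28 per 1M output tokens.
IN_PRICE, OUT_PRICE = 0.14, 0.28

def cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Total API cost in dollars at the rumored per-million-token rates."""
    return input_tokens / 1e6 * IN_PRICE + output_tokens / 1e6 * OUT_PRICE

# Example: a batch job with 10M input tokens and 2M output tokens.
print(f"${cost_usd(10_000_000, 2_000_000):.2f}")  # $1.96
```

At those rates, a workload that costs tens of dollars on frontier proprietary APIs would cost about two dollars - the 20-50x gap the analysis describes.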
Open Source Expectations
DeepSeek has open-sourced every major model release under permissive licenses. V3 used the MIT license. Multiple sources expect V4 to follow the same pattern, with some suggesting a move to Apache 2.0. Either way, the model weights are expected to be publicly available.
For the open-source AI community, a trillion-parameter open-weight model that matches frontier proprietary systems would be the most significant release since V3 disrupted the industry in late 2024.
The Timing
The release is strategically timed around China's Two Sessions, the country's most important annual political gathering. By launching V4 alongside the meetings, DeepSeek positions the model as evidence that China's AI capabilities are not only surviving US export controls but advancing despite them.
DeepSeek has maintained its "characteristic operational silence" throughout. The company has not responded to requests for comment from any outlet. There is no official announcement on deepseek.com, no V4 repository on GitHub, and the API changelog still shows V3.2 as the latest release.
The silence will break sometime in the next seven days. When it does, every benchmark leaderboard in the industry will need updating.
Sources:
- DeepSeek to release V4 next week - Financial Times (via Techmeme)
- DeepSeek to release long-awaited AI model - Chin@Strategy
- DeepSeek new model release market impact - CNBC
- DeepSeek excludes US chipmakers from V4 testing - Reuters (via Investing.com)
- DeepSeek V4 Lite leak - Awesome Agents
- DeepSeek locks Nvidia and AMD out of V4 - Awesome Agents
