# Open-Source LLM Leaderboard: February 2026
Rankings of the best open-weight and open-source large language models in February 2026, including DeepSeek V3.2, Qwen 3.5, Llama 4 Maverick, GLM-5, and Mistral 3.

The open-source AI revolution is no longer a promise. It is a reality. In February 2026, open-weight models routinely match or exceed the performance of proprietary models from just twelve months ago, and in some specialized benchmarks, they compete with the very best closed models available today. For developers, researchers, and organizations that need control over their AI infrastructure, the options have never been better.
This leaderboard ranks open-weight and open-source models exclusively, covering models whose weights are publicly available for download, fine-tuning, and self-hosting.
## Open-Source LLM Rankings
| Rank | Model | Organization | Parameters | License | MMLU-Pro | GPQA Diamond | SWE-Bench Verified | Chatbot Arena Elo |
|---|---|---|---|---|---|---|---|---|
| 1 | DeepSeek V3.2-Speciale | DeepSeek | 685B (MoE) | MIT | 85.9% | 85.3% | 77.8% | 1361 |
| 2 | Qwen 3.5 | Alibaba | 405B (MoE) | Apache 2.0 | 84.6% | 82.1% | 62.5% | 1342 |
| 3 | Llama 4 Maverick | Meta | 402B (MoE) | Llama 4 License | 83.2% | 78.5% | 55.8% | 1320 |
| 4 | GLM-5 | Zhipu AI | 320B (MoE) | Apache 2.0 | 81.5% | 76.8% | 77.8% | 1298 |
| 5 | Mistral 3 | Mistral AI | 240B (MoE) | Apache 2.0 | 82.8% | 79.3% | 54.1% | 1315 |
| 6 | DeepSeek V3.2 | DeepSeek | 685B (MoE) | MIT | 84.1% | 83.8% | 72.4% | 1348 |
| 7 | Qwen 3 235B | Alibaba | 235B (MoE) | Apache 2.0 | 81.2% | 78.4% | 55.2% | 1305 |
| 8 | Llama 4 Scout | Meta | 109B (MoE) | Llama 4 License | 78.5% | 72.1% | 42.3% | 1278 |
| 9 | Mistral 3 Small | Mistral AI | 24B | Apache 2.0 | 74.8% | 65.3% | 38.1% | 1245 |
| 10 | Qwen 3 32B | Alibaba | 32B | Apache 2.0 | 73.5% | 64.8% | 36.5% | 1238 |
## The Leaders
### DeepSeek V3.2-Speciale: The Open-Source King
DeepSeek V3.2-Speciale is, by a comfortable margin, the most capable open-weight model available. Its performance on MMLU-Pro (85.9%) and GPQA Diamond (85.3%) puts it within striking distance of GPT-5.2 Pro and Claude Opus 4.6, and its 77.8% on SWE-Bench Verified actually leads the entire field, including closed models. The MIT license means you can use it for anything, commercial or otherwise, without restriction.
The catch is size. At 685 billion parameters in a mixture-of-experts architecture, running DeepSeek V3.2-Speciale requires serious hardware. You will need multiple high-end GPUs for inference, though the MoE architecture means that only a fraction of parameters are active for any given token, keeping actual compute costs manageable.
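The MoE trade-off above can be sketched with back-of-the-envelope numbers. This is an illustrative estimate, not a published spec: the ~37B active-parameter figure mirrors DeepSeek-V3-style designs and does not appear in the leaderboard, and the one-byte-per-parameter assumption corresponds to FP8 weights.

```python
# Rough MoE sizing sketch. Assumed numbers: ~37B active parameters
# (typical of DeepSeek-V3-style MoE designs, not from the table above)
# and 1 byte/param, i.e. FP8 weight storage.

def moe_footprint(total_params_b: float, active_params_b: float,
                  bytes_per_param: float = 1.0) -> dict:
    """Estimate weight memory (GB) and the fraction of parameters
    that fire for any single token."""
    return {
        "weight_memory_gb": total_params_b * bytes_per_param,
        "active_fraction": active_params_b / total_params_b,
    }

est = moe_footprint(total_params_b=685, active_params_b=37)
print(f"~{est['weight_memory_gb']:.0f} GB of weights at FP8")
print(f"~{est['active_fraction']:.1%} of parameters active per token")
```

Under these assumptions you still need to hold all ~685 GB of weights in GPU memory (hence the multi-GPU requirement), but per-token compute scales with the roughly 5% of parameters that are active, which is why inference cost stays manageable.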
### Qwen 3.5: The Most Permissive Frontier Model
Alibaba's Qwen 3.5 earns the second spot with a strong all-around performance profile. Its Apache 2.0 license is as permissive as it gets, and at 405 billion parameters (MoE), it is more practical to deploy than DeepSeek V3.2-Speciale while still delivering competitive performance. Qwen 3.5 is particularly strong in multilingual tasks, outperforming most competitors in Chinese, Japanese, Korean, and Arabic.
### Llama 4 Maverick: Meta's Best
Llama 4 Maverick represents a significant step up from the Llama 3 generation. At 402 billion parameters with a MoE architecture, it delivers strong general-purpose performance. The Llama 4 License is more permissive than its predecessor but still includes some restrictions for very large commercial deployments (above 700 million monthly active users). For the vast majority of organizations, this is a non-issue.
### GLM-5: The Coding Specialist
GLM-5 from Zhipu AI deserves special mention for its extraordinary coding performance. Its 77.8% on SWE-Bench Verified ties with DeepSeek V3.2-Speciale for the top spot on that benchmark across all models, open or closed. If your primary use case is code generation and software engineering, GLM-5 under its Apache 2.0 license is a compelling choice.
### Mistral 3: European Excellence
Mistral AI continues to punch above its weight. Mistral 3 delivers performance comparable to models with nearly twice its parameter count, reflecting strong training data curation and architectural decisions. As the leading European AI lab, Mistral also offers advantages for organizations with EU data residency requirements.
## Understanding Open-Source Licenses
Not all "open" models are equally open. Here is what the licenses in this leaderboard actually mean:
| License | Commercial Use | Modification | Redistribution | Notable Terms |
|---|---|---|---|---|
| MIT | Yes | Yes | Yes | None |
| Apache 2.0 | Yes | Yes | Yes | Patent grant included |
| Llama 4 License | Yes | Yes | Yes | Restrictions above 700M MAU |
MIT (used by DeepSeek) is the most permissive. You can do anything with the model weights. Apache 2.0 (used by Qwen, GLM, Mistral) is similarly permissive but includes an explicit patent grant, which some legal teams prefer. Llama 4 License is permissive for most use cases but includes a usage threshold that large social media platforms would need to negotiate separately.
## The Self-Hosting Economics
One of the strongest arguments for open-weight models is cost. Running DeepSeek V3.2-Speciale on your own infrastructure costs roughly $0.028 per million input tokens once hardware, power, and operations are amortized at sustained utilization. The equivalent API call to a frontier proprietary model costs $2-15 per million input tokens. That is a 70x to 500x cost difference.
Of course, self-hosting requires upfront investment in GPU hardware, engineering expertise, and operational overhead. But for organizations processing millions of tokens daily, the payback period is measured in weeks, not years.
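The payback argument can be made concrete with a small calculator. The per-token figures come from the article ($0.028 self-hosted; $5 is the midpoint of the quoted $2-15 API range), while the $300k hardware cost and the 1B-tokens-per-day workload are purely illustrative assumptions.

```python
# Hedged payback sketch. Per-million-token prices are from the article;
# the hardware cost and daily token volume below are illustrative
# assumptions, not reported figures.

SELF_HOST_PER_M = 0.028  # $/1M input tokens, self-hosted (from the article)
API_PER_M = 5.00         # $/1M input tokens, midpoint of the $2-15 API range

def payback_days(hardware_cost: float, tokens_per_day_m: float) -> float:
    """Days until self-hosting savings cover the upfront hardware spend."""
    daily_savings = tokens_per_day_m * (API_PER_M - SELF_HOST_PER_M)
    return hardware_cost / daily_savings

days = payback_days(hardware_cost=300_000, tokens_per_day_m=1_000)
print(f"payback in ~{days:.0f} days")
```

Under these assumptions, a $300k cluster processing a billion tokens a day pays for itself in about two months; the break-even point shifts linearly with volume, so lighter workloads stretch it out considerably.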
## Smaller Models Worth Watching
Not every deployment needs a 400B+ parameter model. Mistral 3 Small (24B) and Qwen 3 32B deliver impressive performance for their size class, running comfortably on a single high-end GPU. For latency-sensitive applications or edge deployment, these smaller models offer the best balance of capability and efficiency.
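A quick way to sanity-check the "single high-end GPU" claim is a weight-memory estimate at different quantization levels. This is a rough sketch: the 20% overhead factor for KV cache and activations is an illustrative assumption, and real serving memory depends on context length and batch size.

```python
# Rough single-GPU fit check for the smaller models in the table.
# The 1.2x overhead (KV cache, activations) is an assumed ballpark.

def vram_gb(params_b: float, bits: int, overhead: float = 1.2) -> float:
    """Approximate serving memory: weights at `bits` precision,
    scaled by an overhead factor for KV cache and activations."""
    return params_b * bits / 8 * overhead

for name, params in [("Mistral 3 Small", 24), ("Qwen 3 32B", 32)]:
    for bits in (16, 8, 4):
        print(f"{name}: ~{vram_gb(params, bits):.1f} GB at {bits}-bit")
```

By this estimate, both models fit on a single 80 GB GPU even at 16-bit precision (~58 GB and ~77 GB respectively), and 4-bit quantization brings them within reach of consumer 24 GB cards.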
The open-source LLM ecosystem is advancing at a pace that consistently surprises even optimistic observers. Models that would have been state-of-the-art twelve months ago are now freely downloadable. The question is no longer whether open-source can compete with proprietary models, but how long before the gap closes entirely.