Open-Source LLM Leaderboard: February 2026

Rankings of the best open-weight and open-source large language models in February 2026, including DeepSeek V3.2, Qwen 3.5, Llama 4 Maverick, GLM-5, and Mistral 3.

The open-source AI revolution is no longer a promise. It is a reality. In February 2026, open-weight models routinely match or exceed the performance of proprietary models from just twelve months ago, and in some specialized benchmarks, they compete with the very best closed models available today. For developers, researchers, and organizations that need control over their AI infrastructure, the options have never been better.

This leaderboard ranks open-weight and open-source models exclusively, covering models whose weights are publicly available for download, fine-tuning, and self-hosting.

Open-Source LLM Rankings

| Rank | Model | Organization | Parameters | License | MMLU-Pro | GPQA Diamond | SWE-Bench Verified | Chatbot Arena Elo |
|------|-------|--------------|------------|---------|----------|--------------|--------------------|-------------------|
| 1 | DeepSeek V3.2-Speciale | DeepSeek | 685B (MoE) | MIT | 85.9% | 85.3% | 77.8% | 1361 |
| 2 | Qwen 3.5 | Alibaba | 405B (MoE) | Apache 2.0 | 84.6% | 82.1% | 62.5% | 1342 |
| 3 | Llama 4 Maverick | Meta | 402B (MoE) | Llama 4 License | 83.2% | 78.5% | 55.8% | 1320 |
| 4 | GLM-5 | Zhipu AI | 320B (MoE) | Apache 2.0 | 81.5% | 76.8% | 77.8% | 1298 |
| 5 | Mistral 3 | Mistral AI | 240B (MoE) | Apache 2.0 | 82.8% | 79.3% | 54.1% | 1315 |
| 6 | DeepSeek V3.2 | DeepSeek | 685B (MoE) | MIT | 84.1% | 83.8% | 72.4% | 1348 |
| 7 | Qwen 3 235B | Alibaba | 235B (MoE) | Apache 2.0 | 81.2% | 78.4% | 55.2% | 1305 |
| 8 | Llama 4 Scout | Meta | 109B (MoE) | Llama 4 License | 78.5% | 72.1% | 42.3% | 1278 |
| 9 | Mistral 3 Small | Mistral AI | 24B | Apache 2.0 | 74.8% | 65.3% | 38.1% | 1245 |
| 10 | Qwen 3 32B | Alibaba | 32B | Apache 2.0 | 73.5% | 64.8% | 36.5% | 1238 |
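If you want to slice these rankings yourself, the rows translate directly into plain records. A minimal sketch using the top-five entries (scores are from the table above; the record layout and helper names are illustrative, not from any published dataset):

```python
# Hypothetical sketch: leaderboard rows as plain records, filtered by
# license and re-ranked by a single benchmark. Scores mirror the table.
from dataclasses import dataclass

@dataclass
class ModelEntry:
    name: str
    license: str
    swe_bench: float  # SWE-Bench Verified, %

ENTRIES = [
    ModelEntry("DeepSeek V3.2-Speciale", "MIT", 77.8),
    ModelEntry("Qwen 3.5", "Apache 2.0", 62.5),
    ModelEntry("Llama 4 Maverick", "Llama 4 License", 55.8),
    ModelEntry("GLM-5", "Apache 2.0", 77.8),
    ModelEntry("Mistral 3", "Apache 2.0", 54.1),
]

# Keep only fully permissive licenses, then rank by coding benchmark.
permissive = [e for e in ENTRIES if e.license in ("MIT", "Apache 2.0")]
best_for_code = sorted(permissive, key=lambda e: e.swe_bench, reverse=True)
print([e.name for e in best_for_code])
# -> ['DeepSeek V3.2-Speciale', 'GLM-5', 'Qwen 3.5', 'Mistral 3']
```

DeepSeek V3.2-Speciale and GLM-5 tie at 77.8%; Python's `sorted` is stable, so the original table order breaks the tie.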

The Leaders

DeepSeek V3.2-Speciale: The Open-Source King

DeepSeek V3.2-Speciale is, by a comfortable margin, the most capable open-weight model available. Its performance on MMLU-Pro (85.9%) and GPQA Diamond (85.3%) puts it within striking distance of GPT-5.2 Pro and Claude Opus 4.6, and its 77.8% on SWE-Bench Verified ties GLM-5 for the lead across the entire field, including closed models. The MIT license means you can use it for anything, commercial or otherwise, without restriction.

The catch is size. At 685 billion parameters in a mixture-of-experts architecture, running DeepSeek V3.2-Speciale requires serious hardware. You will need multiple high-end GPUs for inference, though the MoE architecture means that only a fraction of parameters are active for any given token, keeping actual compute costs manageable.
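The MoE accounting above can be made concrete. In a top-k mixture-of-experts model, each token runs through the always-on layers plus only k routed experts. The split of 685B into shared versus expert parameters below is an assumption for illustration, not a published figure for DeepSeek V3.2-Speciale:

```python
# Minimal sketch of top-k MoE parameter accounting: only shared layers
# plus k routed experts are exercised per token. Expert counts here are
# illustrative assumptions, not DeepSeek's published architecture.

def active_params(expert_params: float, num_experts: int,
                  experts_per_token: int, shared_params: float) -> float:
    """Parameters actually used for a single token."""
    per_expert = expert_params / num_experts
    return shared_params + per_expert * experts_per_token

# Assume 645B spread across 256 routed experts, plus 40B always-on
# (attention, embeddings, shared experts), with 8 experts per token.
active = active_params(645e9, num_experts=256, experts_per_token=8,
                       shared_params=40e9)
print(f"{active / 1e9:.1f}B of 685B parameters active per token")
# -> 60.2B of 685B parameters active per token
```

So while you still need enough GPU memory to hold all 685B weights, the per-token compute is closer to that of a dense model an order of magnitude smaller.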

Qwen 3.5: The Most Permissive Frontier Model

Alibaba's Qwen 3.5 earns the second spot with a strong all-around performance profile. Its Apache 2.0 license is as permissive as it gets, and at 405 billion parameters (MoE), it is more practical to deploy than DeepSeek V3.2-Speciale while still delivering competitive performance. Qwen 3.5 is particularly strong in multilingual tasks, outperforming most competitors in Chinese, Japanese, Korean, and Arabic.

Llama 4 Maverick: Meta's Best

Llama 4 Maverick represents a significant step up from the Llama 3 generation. At 402 billion parameters with a MoE architecture, it delivers strong general-purpose performance. The Llama 4 License is more permissive than its predecessor but still includes some restrictions for very large commercial deployments (above 700 million monthly active users). For the vast majority of organizations, this is a non-issue.

GLM-5: The Coding Specialist

GLM-5 from Zhipu AI deserves special mention for its extraordinary coding performance. Its 77.8% on SWE-Bench Verified ties with DeepSeek V3.2-Speciale for the top spot on that benchmark across all models, open or closed. If your primary use case is code generation and software engineering, GLM-5 under its Apache 2.0 license is a compelling choice.

Mistral 3: European Excellence

Mistral AI continues to punch above its weight. Mistral 3 delivers performance comparable to models with nearly twice its parameter count, reflecting strong training data curation and architectural decisions. As the leading European AI lab, Mistral also offers advantages for organizations with EU data residency requirements.

Understanding Open-Source Licenses

Not all "open" models are equally open. Here is what the licenses in this leaderboard actually mean:

| License | Commercial Use | Modification | Redistribution | Notable Restrictions |
|---------|----------------|--------------|----------------|----------------------|
| MIT | Yes | Yes | Yes | None |
| Apache 2.0 | Yes | Yes | Yes | Patent grant included |
| Llama 4 License | Yes | Yes | Yes | Restrictions above 700M MAU |

MIT (used by DeepSeek) is the most permissive. You can do anything with the model weights. Apache 2.0 (used by Qwen, GLM, Mistral) is similarly permissive but includes an explicit patent grant, which some legal teams prefer. Llama 4 License is permissive for most use cases but includes a usage threshold that large social media platforms would need to negotiate separately.

The Self-Hosting Economics

One of the strongest arguments for open-weight models is cost. Running DeepSeek V3.2-Speciale on your own infrastructure costs roughly $0.028 per million input tokens, assuming hardware costs are amortized at sustained high utilization. The equivalent API call to a frontier proprietary model costs $2-15 per million input tokens. That is roughly a 70x to 500x cost difference.

Of course, self-hosting requires upfront investment in GPU hardware, engineering expertise, and operational overhead. But for organizations processing millions of tokens daily, the payback period is measured in weeks, not years.
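The arithmetic behind those claims is simple enough to sketch. The per-million-token figures come from the text above; the daily volume and hardware cost below are hypothetical assumptions chosen only to show the shape of the calculation:

```python
# Back-of-the-envelope version of the self-hosting economics. The
# $0.028 and $2-15 per-million-token figures are from the text; the
# 500M tokens/day volume and $250k cluster cost are hypothetical.

SELF_HOSTED = 0.028            # $/M input tokens, amortized
API_LOW, API_HIGH = 2.0, 15.0  # $/M input tokens, frontier APIs

ratio_low = API_LOW / SELF_HOSTED
ratio_high = API_HIGH / SELF_HOSTED
print(f"cost ratio: {ratio_low:.0f}x to {ratio_high:.0f}x")
# -> cost ratio: 71x to 536x

# Payback on a hypothetical $250k GPU cluster at 500M tokens/day:
daily_tokens_m = 500
hardware_cost = 250_000
for api_price in (API_LOW, API_HIGH):
    daily_savings = daily_tokens_m * (api_price - SELF_HOSTED)
    print(f"vs ${api_price}/M API: payback in "
          f"{hardware_cost / daily_savings:.0f} days")
```

Under these assumed numbers, payback lands in weeks when compared against high-end API pricing and stretches to months against the cheapest APIs, which is why the calculus depends heavily on your actual token volume and which service you would otherwise use.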

Smaller Models Worth Watching

Not every deployment needs a 400B+ parameter model. Mistral 3 Small (24B) and Qwen 3 32B deliver impressive performance for their size class, running comfortably on a single high-end GPU. For latency-sensitive applications or edge deployment, these smaller models offer the best balance of capability and efficiency.
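The "single high-end GPU" claim follows from simple weight-size arithmetic: parameter count times bytes per parameter. A rough sketch (model sizes are from the table above; the helper name is illustrative, and KV cache plus activations need additional headroom on top of this):

```python
# Rough weight-only VRAM estimate. Byte widths are standard precision
# levels (FP16 = 2 bytes, 4-bit = 0.5 bytes); KV cache and activation
# memory come on top of these figures.

def weight_vram_gb(params_billions: float, bytes_per_param: float) -> float:
    """GB needed just to hold the weights: 1B params at 1 byte = 1 GB."""
    return params_billions * bytes_per_param

for name, size_b in [("Mistral 3 Small", 24), ("Qwen 3 32B", 32)]:
    print(f"{name}: {weight_vram_gb(size_b, 2.0):.0f} GB at FP16, "
          f"{weight_vram_gb(size_b, 0.5):.0f} GB at 4-bit")
```

At FP16, both models fit on one 80 GB datacenter GPU; at 4-bit quantization, they drop into the range of a single 24 GB consumer card.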

The open-source LLM ecosystem is advancing at a pace that consistently surprises even optimistic observers. Models that would have been state-of-the-art twelve months ago are now freely downloadable. The question is no longer whether open-source can compete with proprietary models, but how long before the gap closes entirely.

About the author

James is a software engineer turned AI benchmarks and tools analyst. He spent six years building backend systems at a fintech startup in Chicago before pivoting to full-time analysis of AI tools and infrastructure.