The open-source AI revolution is no longer a promise. It is a reality. In April 2026, open-weight models routinely match or exceed the performance of proprietary models from twelve months ago, and in specialized benchmarks they compete with the very best closed models available today. For developers, researchers, and organizations that need control over their AI infrastructure, the options have never been better.

This leaderboard ranks open-weight and open-source models exclusively, covering models whose weights are publicly available for download, fine-tuning, and self-hosting. It is refreshed on a monthly cadence - the current snapshot reflects releases through 22 April 2026.

Open-Source LLM Rankings

Rank	Model	Organization	Parameters	License	MMLU-Pro	GPQA Diamond	SWE-Bench Verified	Chatbot Arena Elo
1	DeepSeek V4	DeepSeek	~1T (dense)	MIT	88.2%	87.6%	83.7%	1394
2	GLM-5.1	Zhipu AI	355B (MoE)	Apache 2.0	83.1%	80.2%	79.5%	1336
3	DeepSeek V3.2-Speciale	DeepSeek	685B (MoE)	MIT	85.9%	85.3%	77.8%	1361
4	Qwen 3.6-35B-A3B	Alibaba	35B / 3B active (MoE)	Apache 2.0	85.2%	86.0%	73.4%	1339
5	Qwen 3.5	Alibaba	405B (MoE)	Apache 2.0	84.6%	82.1%	62.5%	1342
6	Mistral 3	Mistral AI	240B (MoE)	Apache 2.0	82.8%	79.3%	54.1%	1315
7	Llama 4 Maverick	Meta	402B (MoE)	Llama 4 License	83.2%	78.5%	55.8%	1320
8	Qwen 3 235B	Alibaba	235B (MoE)	Apache 2.0	81.2%	78.4%	55.2%	1305
9	Gemma 4 31B	Google	31B (dense)	Gemma License	79.1%	73.8%	49.3%	1282
10	Llama 4 Scout	Meta	109B (MoE)	Llama 4 License	78.5%	72.1%	42.3%	1278
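
The overall rank above does not follow any single column, and the weighting behind it is not stated here. As a rough illustration, the sketch below copies the four score columns from the table and re-sorts them one benchmark at a time, so readers can see how the order shifts with the metric they care about. The `ROWS`, `COLUMNS`, and `rank_by` names are illustrative and not part of any published tooling.

```python
# Rows copied from the table above:
# (model, MMLU-Pro %, GPQA Diamond %, SWE-Bench Verified %, Chatbot Arena Elo)
ROWS = [
    ("DeepSeek V4", 88.2, 87.6, 83.7, 1394),
    ("GLM-5.1", 83.1, 80.2, 79.5, 1336),
    ("DeepSeek V3.2-Speciale", 85.9, 85.3, 77.8, 1361),
    ("Qwen 3.6-35B-A3B", 85.2, 86.0, 73.4, 1339),
    ("Qwen 3.5", 84.6, 82.1, 62.5, 1342),
    ("Mistral 3", 82.8, 79.3, 54.1, 1315),
    ("Llama 4 Maverick", 83.2, 78.5, 55.8, 1320),
    ("Qwen 3 235B", 81.2, 78.4, 55.2, 1305),
    ("Gemma 4 31B", 79.1, 73.8, 49.3, 1282),
    ("Llama 4 Scout", 78.5, 72.1, 42.3, 1278),
]

# Hypothetical short names mapped to tuple positions in ROWS.
COLUMNS = {
    "mmlu_pro": 1,
    "gpqa_diamond": 2,
    "swe_bench_verified": 3,
    "arena_elo": 4,
}


def rank_by(column: str) -> list[str]:
    """Return model names sorted from best to worst on one column."""
    idx = COLUMNS[column]
    ordered = sorted(ROWS, key=lambda row: row[idx], reverse=True)
    return [row[0] for row in ordered]


if __name__ == "__main__":
    for column in COLUMNS:
        top_three = ", ".join(rank_by(column)[:3])
        print(f"{column:>20}: {top_three}")
```

Sorting by a single column is the simplest possible view. A team weighing coding against general reasoning would want its own weighted combination, which this sketch deliberately leaves out.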

The Leaders

DeepSeek V4: The New Open-Source King

DeepSeek V4 took the top spot from V3.2-Speciale in March 2026 and has held it through the April refresh. The ~1T-parameter dense model leads every benchmark where it has been independently evaluated, including all four columns in the table above, and its 83.7% on SWE-Bench Verified is 4.2 points ahead of the next open-weight entry, GLM-5.1. The MIT license permits any use, commercial or otherwise, with no conditions beyond retaining the license notice.

The catch is deployment. V4 is a dense model, not MoE - every parameter activates on every token. Self-hosting realistically requires multi-node tensor parallelism across a cluster of H100s or equivalents, which puts it out of reach for teams that do not already operate production GPU infrastructure. For most engineering teams, the practical alternative is one of the smaller MoE entries below.
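
A back-of-the-envelope check makes the multi-node claim concrete. The sketch below estimates weight memory only and divides it by the capacity of an 80 GB H100. The 8-GPU node size and the two precisions are assumptions for illustration, not published deployment guidance for V4, and real deployments also need memory for the KV cache and activations, so treat the result as a lower bound.

```python
import math

H100_MEMORY_GB = 80   # capacity of an 80 GB H100
GPUS_PER_NODE = 8     # assumption: a common 8-GPU server layout


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory needed for the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


def min_gpus(params: float, bytes_per_param: float,
             gpu_memory_gb: float = H100_MEMORY_GB) -> int:
    """Lower bound on GPU count: weights only, no KV cache or activations."""
    return math.ceil(weight_memory_gb(params, bytes_per_param) / gpu_memory_gb)


def min_nodes(gpus: int, gpus_per_node: int = GPUS_PER_NODE) -> int:
    """Number of servers needed to host the given GPU count."""
    return math.ceil(gpus / gpus_per_node)


if __name__ == "__main__":
    dense_params = 1e12  # "~1T" from the table, taken as exactly 1e12 here
    for label, nbytes in (("BF16", 2), ("FP8", 1)):
        gpus = min_gpus(dense_params, nbytes)
        print(f"{label}: {weight_memory_gb(dense_params, nbytes):.0f} GB of weights, "
              f">= {gpus} GPUs, >= {min_nodes(gpus)} nodes")
```

Even at one byte per parameter the weights alone exceed what a single 8-GPU node holds, which is why a dense model at this scale is a cluster-level deployment.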

GLM-5.1: The Coding Specialist, Refreshed

GLM-5.1 from Zhipu AI is a targeted update to February's GLM-5. Its 79.5% on SWE-Bench Verified is the second-highest score in the table, and the Apache 2.0 license - combined with a parameter count that fits on a single high-end inference node - makes it the coding-first model that actually lands in production stacks. For organizations whose primary use case is code generation and agentic software engineering, GLM-5.1 is frequently the best value in this table.

DeepSeek V3.2-Speciale: Still Elite, Now Cheaper

The V3.2-Speciale release that led our February 2026 leaderboard hasn't gotten worse - the models above it have simply raised the ceiling. It remains an excellent choice for teams that already built infrastructure around its MoE architecture, and with V4's release, V3.2-Speciale is now available at a noticeable discount from hosted-inference providers.

Qwen 3.6-35B-A3B: The Small-Footprint Winner

The release that reset expectations in this size class. At 35 billion total parameters routed through a 256-expert MoE with 3 billion active per token, Qwen 3.6-35B-A3B delivers 73.4 on SWE-Bench Verified - a score that would have required a 400B+ dense model twelve months ago. The 4-bit quantised checkpoint fits on a single RTX 4090 with working room for context. It is the clearest example yet that architectural efficiency, not raw parameter count, is where the open-source frontier is now moving.
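
Two pieces of arithmetic sit behind that paragraph, sketched below under simplifying assumptions. The first is the weight footprint at 4 bits per parameter against the 24 GB of an RTX 4090; real quantization formats add some overhead for scales and zero-points, so the true figure is somewhat higher. The second uses the common rule of thumb of about 2 FLOPs per active parameter per token for a forward pass to compare a 3B-active MoE with a hypothetical 400B dense model. Function names are illustrative.

```python
RTX_4090_MEMORY_GB = 24  # VRAM on an RTX 4090


def quantized_weight_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate weight size in decimal GB, ignoring quantization metadata."""
    return total_params * bits_per_weight / 8 / 1e9


def forward_flops_per_token(active_params: float) -> float:
    """Rule-of-thumb forward-pass cost: about 2 FLOPs per active parameter."""
    return 2 * active_params


if __name__ == "__main__":
    weights_gb = quantized_weight_gb(35e9, 4)
    headroom_gb = RTX_4090_MEMORY_GB - weights_gb
    print(f"4-bit weights: {weights_gb:.1f} GB, headroom: {headroom_gb:.1f} GB")

    moe_flops = forward_flops_per_token(3e9)      # 3B active per token
    dense_flops = forward_flops_per_token(400e9)  # hypothetical 400B dense model
    print(f"dense / MoE compute per token: {dense_flops / moe_flops:.0f}x")
```

All 35B parameters still have to be resident in memory, so sparsity helps compute per token far more than it helps the memory footprint. Quantization is what brings the model within reach of a single consumer card.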

The 262,144-token native context (extensible to 1M via YaRN) also makes it the practical choice for repo-scale agentic coding workflows.
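
For readers unfamiliar with YaRN, the extension is usually expressed as a scaling factor: target context divided by native context. The sketch below computes that factor and packs it into a dictionary shaped like the `rope_scaling` entry used in Hugging Face-style configs for earlier Qwen releases. Whether this model uses exactly these keys is an assumption, as is reading "1M" as 1,048,576 tokens; check the model card before relying on either.

```python
NATIVE_CONTEXT = 262_144  # native context length quoted above


def yarn_rope_scaling(target_context: int,
                      native_context: int = NATIVE_CONTEXT) -> dict:
    """Build a YaRN-style rope_scaling entry (key names are an assumption)."""
    if target_context <= native_context:
        raise ValueError("YaRN scaling is only needed beyond the native context")
    return {
        "rope_type": "yarn",
        "factor": target_context / native_context,
        "original_max_position_embeddings": native_context,
    }


if __name__ == "__main__":
    # Assumption: "1M" is read as 2**20 = 1,048,576 tokens.
    print(yarn_rope_scaling(1_048_576))
```

A static scaling factor applies to every request regardless of length, so it is generally worth enabling only when prompts actually exceed the native window.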

Qwen 3.5: The Multilingual Workhorse

Alibaba's 405B MoE from earlier in the Qwen 3 series continues to deliver strong all-around performance under a permissive Apache 2.0 license. Qwen 3.5 is especially strong in multilingual tasks, outperforming most competitors in Chinese, Japanese, Korean, and Arabic. For English-only coding workloads, Qwen 3.6-35B-A3B is now the better pick in the Alibaba lineup - but for multilingual production use, 3.5 remains the reference.

Llama 4 Maverick: Meta's Flagship Holds Steady

Llama 4 Maverick's benchmark numbers have aged in exactly the way you'd expect from a model released in April 2025: no regressions, no leapfrogs. The 402B MoE is still a capable general-purpose model, and the Llama 4 License (with its 700M-MAU restriction) is permissive enough to be a non-issue for most organizations. For teams already standardized on the Llama tooling ecosystem, Maverick remains a solid choice. For new greenfield deployments picking an open-source stack today, the Chinese labs (DeepSeek, Alibaba, Zhipu) are shipping faster and reaching higher on the benchmarks.

Gemma 4 31B: Google Joins the Open-Weights Table

Google's Gemma 4 31B is a new entry to this leaderboard. A dense 31B model with the Gemma License (permissive, but with a use-policy carve-out rather than a pure Apache/MIT grant), it is designed for the single-GPU deployment target. Performance on general benchmarks sits a tier below the MoE heavyweights, but the dense architecture gives it more predictable inference behaviour and makes it easier to fine-tune on modest hardware.

Understanding Open-Source Licenses

Not all "open" models are equally open. Here is what the licenses in this leaderboard actually mean:

License	Commercial Use	Modification	Redistribution	Notable Restrictions
MIT	Yes	Yes	Yes	None
Apache 2.0	Yes	Yes	Yes	Patent grant included
Llama 4 License	Yes	Yes	Yes	Restrictions above 700M MAU
Gemma License	Yes	Yes	Yes	Prohibited-use policy applies

MIT (used by DeepSeek) is the most permissive: you can do anything with the model weights as long as the license notice is retained. Apache 2.0 (used by Qwen, GLM, Mistral) is similarly permissive but includes an explicit patent grant, which some legal teams prefer. The Llama 4 License is permissive for most use cases but includes a 700M-MAU usage threshold, above which large platforms would need to negotiate separately. The Gemma License is permissive for commercial use but ties compliance to Google's prohibited-use policy, which Google can update unilaterally and which therefore adds a small ongoing review cost.

The Self-Hosting Economics

One of the strongest arguments for open-weight models is cost. Running DeepSeek V3.2-Speciale on your own infrastructure costs roughly $0.028 per million input tokens when amortized over reasonable use. The equivalent API call to a frontier proprietary model costs $2 to $15 per million input tokens. That's a cost difference of roughly 70x to 535x.
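
The multiplier is simple division, and the sketch below reproduces it from the figures quoted in this section. Only the prices come from the article; the constant and function names are illustrative.

```python
SELF_HOSTED_USD_PER_M = 0.028      # self-hosted cost per million input tokens
API_USD_PER_M_RANGE = (2.0, 15.0)  # proprietary API price range per million


def cost_ratio(api_usd_per_m: float, self_hosted_usd_per_m: float) -> float:
    """How many times more expensive the API is per million input tokens."""
    return api_usd_per_m / self_hosted_usd_per_m


if __name__ == "__main__":
    low, high = (cost_ratio(price, SELF_HOSTED_USD_PER_M)
                 for price in API_USD_PER_M_RANGE)
    print(f"API is {low:.0f}x to {high:.0f}x more expensive per input token")
```

The comparison covers input tokens only. Output-token pricing and utilization both move the ratio, so it is best read as an order-of-magnitude guide.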

The calculus shifted in April 2026 with the arrival of Qwen 3.6-35B-A3B. The 3B-active MoE architecture runs at interactive latency on a single consumer GPU, and the Apache 2.0 license removes all commercial-use concerns. For organizations processing millions of tokens daily on coding or retrieval workloads, this is the first open-weight model where the self-hosting payback period is measured in days rather than months.

Self-hosting still requires upfront investment in GPU hardware, engineering expertise, and operational overhead. But the floor has moved - you no longer need a multi-node cluster to run a frontier-grade open model on your own stack.
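
To see how a payback period in days could arise, the sketch below divides an upfront hardware cost by the daily saving from moving traffic off a paid API. Every number in the example call is hypothetical except the $2 per million tokens, which is the low end of the API range quoted above. Engineering time, power, and operational overhead are left out and would lengthen the real figure.

```python
def payback_days(hardware_usd: float, tokens_per_day: float,
                 api_usd_per_m: float, self_hosted_usd_per_m: float) -> float:
    """Days until cumulative API savings cover the upfront hardware spend."""
    daily_saving = tokens_per_day / 1e6 * (api_usd_per_m - self_hosted_usd_per_m)
    if daily_saving <= 0:
        raise ValueError("self-hosting must be cheaper per token to pay back")
    return hardware_usd / daily_saving


if __name__ == "__main__":
    days = payback_days(
        hardware_usd=2500.0,         # hypothetical single-GPU workstation
        tokens_per_day=50e6,         # hypothetical daily volume
        api_usd_per_m=2.0,           # low end of the API range quoted above
        self_hosted_usd_per_m=0.05,  # hypothetical running cost per million tokens
    )
    print(f"payback in about {days:.0f} days")
```

The result scales inversely with volume: halving the daily token count doubles the payback period, which is why low-traffic deployments may still be better served by hosted inference.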

Smaller Models Worth Watching

Not every deployment needs a 400B+ parameter model. Two models in this tier deserve specific callouts:

Qwen 3.6-35B-A3B (ranked #4 above) is the first open-weight model to deliver 70+ SWE-Bench Verified scores on a single-GPU deployment target. For agentic coding on consumer hardware, this is the new baseline.
Gemma 4 31B (ranked #9) is a dense model that fine-tunes cleanly on a single H100 and produces predictable inference latency - important for applications where p99 tail latency matters more than peak throughput.

Mistral 3 Small (24B) and Qwen 3 32B from the February snapshot remain capable for latency-sensitive applications or edge deployment, but the April releases above have raised the baseline meaningfully.

What Changed Since February 2026

Three shifts matter for readers returning to this leaderboard:

DeepSeek V4 displaced V3.2-Speciale at the top of the table - the first time in six months the open-source leader has changed hands.
Qwen 3.6-35B-A3B introduced a new shape of frontier model - sparse enough to run on consumer hardware, capable enough to compete with mid-tier proprietary offerings on coding.
GLM-5.1 surpassed DeepSeek V3.2-Speciale on coding benchmarks while sitting in a more deployable parameter class.

The open-source LLM ecosystem is advancing at a pace that consistently surprises even optimistic observers. Models that would have been state-of-the-art twelve months ago are now freely downloadable. The question is no longer whether open-source can compete with proprietary models, but which axis of the frontier each new release targets - and which deployment tier finally opens up as a result.

Compare against our overall LLM rankings to see how these open-source leaders sit against the frontier of proprietary models.