Cost Efficiency Leaderboard: Best AI Performance Per Dollar
Rankings of AI models by cost efficiency, comparing performance per dollar across frontier and budget models. See which models deliver GPT-4 level performance at roughly one-thousandth of the cost.

Two years ago, GPT-4 was the undisputed best model available, and it cost $30 per million input tokens and $60 per million output tokens. Today, you can get equivalent or better performance for less than $0.10 per million tokens. The collapse in AI pricing is one of the most dramatic cost curves in the history of technology, and understanding which models deliver the best performance per dollar is essential for anyone building AI-powered products.
This leaderboard ranks models not by raw capability but by the ratio of performance to cost. The best model is not the one that scores highest on benchmarks; it is the one that scores highest relative to what you pay.
Cost Efficiency Rankings
| Rank | Model | Provider | Input / Output (per 1M tokens) | MMLU-Pro | Chatbot Arena Elo | Efficiency Score |
|---|---|---|---|---|---|---|
| 1 | DeepSeek V3.2 | DeepSeek | $0.028 / $0.11 | 84.1% | 1348 | 98.5 |
| 2 | Qwen 3 32B (self-hosted) | Alibaba | ~$0.02 / ~$0.02 | 73.5% | 1238 | 95.2 |
| 3 | Llama 4 Scout (self-hosted) | Meta | ~$0.01 / ~$0.01 | 78.5% | 1278 | 94.8 |
| 4 | DeepSeek V3.2-Speciale | DeepSeek | $0.28 / $1.10 | 85.9% | 1361 | 91.3 |
| 5 | Gemini 2.5 Flash | Google DeepMind | $0.15 / $0.60 | 80.2% | 1335 | 90.7 |
| 6 | Mistral 3 Small | Mistral AI | $0.10 / $0.30 | 74.8% | 1245 | 88.2 |
| 7 | Qwen 3.5 | Alibaba | $0.50 / $2.00 | 84.6% | 1342 | 85.6 |
| 8 | Gemini 3 Pro | Google DeepMind | $1.25 / $5.00 | 89.8% | 1389 | 78.4 |
| 9 | GPT-5.2 | OpenAI | $2.50 / $10.00 | 86.3% | 1380 | 62.1 |
| 10 | Claude Opus 4.6 | Anthropic | $15.00 / $75.00 | 88.2% | 1398 | 35.8 |
Efficiency Score is a normalized composite of (benchmark performance / cost). Higher is better. Self-hosted costs assume amortized GPU costs at moderate utilization.
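The exact formula behind the Efficiency Score is not published here, but one plausible construction is to take benchmark performance per blended dollar, log-scale it (since costs span three orders of magnitude), and normalize to 0-100. The sketch below does this for three models from the table; the 25% output-token share and the log-scaling are illustrative assumptions, not the leaderboard's actual method.

```python
import math

models = {
    # name: (input $/1M tokens, output $/1M tokens, MMLU-Pro %)
    "DeepSeek V3.2":    (0.028, 0.11, 84.1),
    "Gemini 2.5 Flash": (0.15,  0.60, 80.2),
    "Claude Opus 4.6":  (15.00, 75.00, 88.2),
}

def blended_cost(inp, out, output_share=0.25):
    """Weighted per-token cost, assuming ~25% of tokens are output."""
    return (1 - output_share) * inp + output_share * out

# Raw efficiency: benchmark points per blended dollar.
raw = {name: m[2] / blended_cost(m[0], m[1]) for name, m in models.items()}

# Log-scale before normalizing, since costs span ~3 orders of magnitude.
logs = {name: math.log10(r) for name, r in raw.items()}
lo, hi = min(logs.values()), max(logs.values())
scores = {name: 100 * (v - lo) / (hi - lo) for name, v in logs.items()}

for name, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:18s} {s:6.1f}")
```

Note that this toy normalization pins the best and worst models to 100 and 0; the table's scores clearly use a different scaling, but the ordering it produces is the same.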
The Price-Performance Frontier
The most striking fact about AI pricing in 2026 is the sheer range. The cheapest model on our list (Llama 4 Scout self-hosted at roughly $0.01 per million tokens) costs roughly 1/1,500th as much as the most expensive (Claude Opus 4.6 at $15 per million input tokens). Yet the performance gap between them is perhaps 15-20% on most benchmarks. You pay a massive premium for that last increment of quality.
This creates a clear decision framework:
The "Good Enough" Tier ($0.01-$0.10 per million tokens): DeepSeek V3.2, Qwen 3 32B, Llama 4 Scout, and Mistral 3 Small all deliver performance that would have been considered frontier-class in early 2024. For the vast majority of applications, including chatbots, content generation, summarization, basic coding assistance, and data extraction, these models are more than sufficient. DeepSeek V3.2 at $0.028 per million input tokens is the standout, offering 84.1% on MMLU-Pro at a price that makes high-volume applications economically viable.
The "Premium Efficient" Tier ($0.15-$2.00 per million tokens): Gemini 2.5 Flash, Qwen 3.5, and DeepSeek V3.2-Speciale occupy the sweet spot for applications that need near-frontier performance without frontier pricing. Gemini 2.5 Flash is particularly notable: its 1335 Arena Elo at $0.15 per million input tokens makes it perhaps the best value in the entire market for conversational AI.
The "Frontier" Tier ($1.25-$15.00 per million tokens): Gemini 3 Pro, GPT-5.2, and Claude Opus 4.6 are for when you need the absolute best results and cost is secondary: research applications, high-stakes decision support, complex coding tasks, and any scenario where a 5% improvement in accuracy has meaningful business value.
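To make the tiers concrete, the sketch below estimates monthly cost for one representative model per tier at a given volume, using prices from the rankings table. The 3:1 input/output token split is an illustrative assumption; your workload's ratio will shift these numbers.

```python
# One representative model per tier (prices from the rankings table).
REPRESENTATIVES = {
    "good-enough":       ("DeepSeek V3.2",    0.028, 0.11),
    "premium-efficient": ("Gemini 2.5 Flash", 0.15,  0.60),
    "frontier":          ("Gemini 3 Pro",     1.25,  5.00),
}

def monthly_cost(total_tokens_m: float, in_price: float, out_price: float,
                 output_share: float = 0.25) -> float:
    """Dollar cost for total_tokens_m million tokens, assuming
    output_share of all tokens are (more expensive) output tokens."""
    return total_tokens_m * ((1 - output_share) * in_price
                             + output_share * out_price)

for tier, (model, inp, out) in REPRESENTATIVES.items():
    cost = monthly_cost(100, inp, out)  # 100M tokens/month
    print(f"{tier:17s} {model:17s} ${cost:,.2f}/month")
```

At 100M tokens per month, the spread runs from under $5 on the cheapest tier to over $200 at the frontier, which is why tier choice dominates most other cost optimizations.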
The Two-Year Price Collapse
To appreciate how dramatically pricing has shifted, consider this comparison:
| Capability Level | Cost in Feb 2024 | Cost in Feb 2026 | Reduction |
|---|---|---|---|
| GPT-4 equivalent | $30.00 / $60.00 per 1M tokens | $0.028 / $0.11 per 1M tokens | ~99.9% |
| GPT-4 Turbo equivalent | $10.00 / $30.00 per 1M tokens | $0.01 / $0.01 per 1M tokens | ~99.9% |
| Frontier (best available) | $30.00 / $60.00 per 1M tokens | $15.00 / $75.00 per 1M tokens | ~50% (input) |
The most dramatic savings come from matching the performance of older frontier models. Getting GPT-4 level performance now costs roughly one-thousandth of what it did two years ago. Even at the frontier, input prices have roughly halved despite substantially better performance, though output prices have actually risen.
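The reduction figures above are simple percentage drops; the snippet below reproduces them from the table's prices, computing input and output separately (a negative reduction means the price went up, as with frontier output tokens).

```python
def reduction(old: float, new: float) -> float:
    """Percent decrease from old price to new (negative = increase)."""
    return 100 * (1 - new / old)

# GPT-4 equivalent: $30 -> $0.028 input, $60 -> $0.11 output
print(f"{reduction(30, 0.028):.2f}%")   # 99.91%
print(f"{reduction(60, 0.11):.2f}%")    # 99.82%

# Frontier: $30 -> $15 input (halved), $60 -> $75 output (rose 25%)
print(f"{reduction(30, 15):.1f}%")      # 50.0%
print(f"{reduction(60, 75):.1f}%")      # -25.0%
```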
This price collapse is driven by three factors: architectural efficiency (mixture-of-experts models like DeepSeek V3.2 activate only a fraction of their parameters per token), hardware improvements (newer GPUs deliver more inference throughput per dollar), and competition (the entry of DeepSeek, Qwen, and others has forced all providers to cut margins).
Self-Hosting vs. API: When to Make the Switch
For high-volume applications, self-hosting open-weight models can be dramatically cheaper than API access. Here is a rough comparison (API figures assume input-dominated workloads priced at the input rate):
| Monthly Volume | Best API Option | API Cost/Month | Best Self-Hosted | Self-Hosted Cost/Month |
|---|---|---|---|---|
| 10M tokens | DeepSeek V3.2 | $0.28 | Not worth it | Higher (overhead) |
| 100M tokens | DeepSeek V3.2 | $2.80 | Not worth it | Higher (overhead) |
| 1B tokens | DeepSeek V3.2 | $28.00 | Qwen 3 32B | ~$20.00 |
| 10B tokens | DeepSeek V3.2 | $280.00 | Llama 4 Scout | ~$100.00 |
| 100B tokens | DeepSeek V3.2 | $2,800.00 | Llama 4 Scout | ~$800.00 |
The break-even point depends on your engineering capacity and operational requirements, but broadly, self-hosting starts making financial sense around 1-10 billion tokens per month. Below that volume, the convenience and reliability of API access usually outweighs the cost savings.
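One way to model the break-even dynamic: API cost scales linearly with volume, while self-hosting has a monthly floor (idle GPU time, ops overhead) plus a low marginal rate. The floor and marginal rate below are illustrative assumptions fitted loosely to the comparison table, not measured figures.

```python
API_PER_M = 0.028      # DeepSeek V3.2 input price, $/1M tokens
SELF_FLOOR = 20.0      # assumed minimum $/month to keep a node running
SELF_PER_M = 0.008     # assumed marginal self-host cost, $/1M tokens

def api_cost(tokens_m: float) -> float:
    """API spend scales linearly with volume."""
    return tokens_m * API_PER_M

def self_host_cost(tokens_m: float) -> float:
    """Self-hosting pays the floor even at low utilization."""
    return max(SELF_FLOOR, tokens_m * SELF_PER_M)

for volume in (10, 100, 1_000, 10_000, 100_000):  # millions of tokens
    a, s = api_cost(volume), self_host_cost(volume)
    winner = "self-host" if s < a else "API"
    print(f"{volume:>7,}M tokens: API ${a:,.2f} vs self ${s:,.2f} -> {winner}")
```

Under these assumptions the crossover lands just below 1B tokens per month, consistent with the table; a higher floor (more realistic for dedicated GPU clusters) pushes break-even toward the 10B end of the range.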
Recommendations
For startups and prototypes: Start with DeepSeek V3.2 or Gemini 2.5 Flash via API. The cost is negligible, and the performance is excellent.
For production applications at scale: Evaluate DeepSeek V3.2-Speciale or Qwen 3.5 via API, or self-host Llama 4 Scout or Qwen 3 32B.
For quality-critical applications: Use Gemini 3 Pro for the best price-to-performance at the frontier tier, or Claude Opus 4.6 and GPT-5.2 when maximum quality justifies the premium.
The most expensive model is not always the best choice. In fact, for the vast majority of real-world applications, it usually is not.