LLM API Pricing Comparison - March 2026
Side-by-side LLM API pricing for GPT-5.4, Claude Opus 4.6, Gemini 3.1 Pro, DeepSeek V3.2, Grok 4.1, and 20+ models normalized to cost per million tokens.

TL;DR
- DeepSeek V3.2 costs $0.28/$0.42 per million tokens (input/output) and handles most production workloads - the clear value pick
- Cheapest usable option: Mistral Nemo at $0.02/$0.04 per million tokens, though limited to simpler tasks
- GPT-5.4 ($2.50/$15.00) and Claude Opus 4.6 ($5.00/$25.00) remain expensive but lead on reasoning quality
- Every major provider now offers batch API discounts of 50%, and prompt caching can cut input costs by 90%
The Bottom Line
If you need the cheapest input price and your task is straightforward (classification, extraction, simple Q&A), Mistral Nemo at $0.02 per million input tokens is hard to beat. For anything requiring real reasoning ability, DeepSeek V3.2 at $0.28/$0.42 delivers quality that rivals models 10x its price. Production teams building agents or handling complex analysis will still reach for GPT-5.4, Claude Opus 4.6, or Gemini 3.1 Pro - but the gap between premium and budget models is narrowing fast.
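The comparisons above all reduce to the same arithmetic: a request's cost is a weighted sum of the two per-million rates. A minimal Python sketch (prices taken from the pricing table; the function name is our own):

```python
def cost_usd(input_tokens: int, output_tokens: int,
             input_price: float, output_price: float) -> float:
    """Request cost in USD, given per-million-token (MTok) prices."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# 1M tokens in, 1M tokens out:
print(round(cost_usd(1_000_000, 1_000_000, 0.28, 0.42), 2))   # DeepSeek V3.2
print(round(cost_usd(1_000_000, 1_000_000, 2.50, 15.00), 2))  # GPT-5.4
```

At that volume DeepSeek V3.2 comes to $0.70 against GPT-5.4's $17.50, which is the gap the TL;DR is pointing at.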
Full Pricing Table
All prices are per million tokens (MTok) in USD. Sorted by input price, cheapest first. Prices verified against official documentation on March 11, 2026.
| Model | Provider | Input (/1M) | Output (/1M) | Context | Notes |
|---|---|---|---|---|---|
| Mistral Nemo | Mistral | $0.02 | $0.04 | 128K | Lightweight, fast |
| Ministral 3B | Mistral | $0.04 | $0.04 | 128K | Edge/mobile use cases |
| Llama 3.1 8B | Groq | $0.05 | $0.08 | 128K | 840 tok/s on Groq LPU |
| Mistral Small 3.1 | Mistral | $0.10 | $0.30 | 128K | Good for structured tasks |
| Gemini 2.5 Flash-Lite | Google | $0.10 | $0.40 | 1M | Budget option, long context |
| Llama 4 Scout | Groq | $0.11 | $0.34 | 128K | MoE, 17Bx16E |
| GPT-4o mini | OpenAI | $0.15 | $0.60 | 128K | Legacy, still popular |
| Llama 4 Maverick | Groq | $0.20 | $0.60 | 128K | MoE, 17Bx128E |
| Grok 4.1 Fast | xAI | $0.20 | $0.50 | 2M | Largest context window |
| DeepSeek V3.2 | DeepSeek | $0.28 | $0.42 | 128K | Cache hit: $0.028 |
| Qwen3 32B | Groq | $0.29 | $0.59 | 131K | Open-weight, strong multilingual |
| Gemini 2.5 Flash | Google | $0.30 | $2.50 | 1M | Solid mid-tier |
| GPT-4.1 mini | OpenAI | $0.40 | $1.60 | 128K | Newer GPT-4 replacement |
| Mistral Large 3 | Mistral | $0.50 | $1.50 | 128K | Strong multilingual |
| Gemini 3 Flash | Google | $0.50 | $3.00 | 1M | Latest Google Flash |
| Llama 3.3 70B | Groq | $0.59 | $0.79 | 128K | Open-weight workhorse |
| Kimi K2 | Groq | $1.00 | $3.00 | 256K | Moonshot AI flagship |
| o4-mini | OpenAI | $1.10 | $4.40 | 128K | Reasoning model, budget |
| Gemini 2.5 Pro | Google | $1.25 | $10.00 | 1M | Pro tier, long context |
| GPT-5 | OpenAI | $1.25 | $10.00 | 128K | Standard GPT-5 |
| GPT-5.2 | OpenAI | $1.75 | $14.00 | 128K | Improved reasoning |
| GPT-5.3 Chat | OpenAI | $1.75 | $14.00 | 128K | Latest chat-optimized |
| o3 | OpenAI | $2.00 | $8.00 | 200K | Heavy reasoning |
| GPT-4.1 | OpenAI | $2.00 | $8.00 | 1M | Strong code, long context |
| Grok 4.20 | xAI | $2.00 | $6.00 | 256K | Flagship Grok |
| Gemini 3.1 Pro | Google | $2.00 | $12.00 | 1M | Google flagship |
| GPT-5.4 | OpenAI | $2.50 | $15.00 | 272K | OpenAI flagship |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | 200K | Best mid-tier for code |
| Claude Opus 4.6 | Anthropic | $5.00 | $25.00 | 200K | Top reasoning, agentic |
| o1 | OpenAI | $15.00 | $60.00 | 128K | Legacy reasoning |
| GPT-5.4 Pro | OpenAI | $30.00 | $180.00 | 272K | Ultra-premium |
For more on how these models compare on benchmarks, see our overall LLM rankings and the cost-efficiency leaderboard.
Hidden Costs Most Comparisons Miss
Rate Limits and Throttling
Raw token price means nothing if your requests get throttled. OpenAI gates access by spend tier - Tier 1 (new accounts) caps at 500 RPM for GPT-5.4, while Tier 4 gets 10,000 RPM. Anthropic uses a similar system with four tiers. DeepSeek doesn't publish formal tiers but has been known to queue requests during peak hours, adding latency that inflates effective cost for real-time applications.
Batch API Discounts
Every major provider now offers 50% off for batch (async) processing. OpenAI, Anthropic, Google, and xAI all match at exactly 50%. If your workload tolerates 24-hour turnaround, this is free money. DeepSeek's automatic caching achieves similar savings without requiring a separate API endpoint.
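The batch discount is a flat multiplier on the full token cost, so estimating it is a one-liner; a sketch under that assumption (function name and the example workload are ours):

```python
def batch_cost_usd(input_tokens: int, output_tokens: int,
                   input_price: float, output_price: float,
                   discount: float = 0.50) -> float:
    """Cost via a batch API: full per-MTok price times (1 - discount)."""
    full = (input_tokens * input_price + output_tokens * output_price) / 1e6
    return full * (1 - discount)

# A hypothetical nightly job on GPT-5.4 ($2.50/$15.00): 10M in, 2M out.
# Full price would be $55.00; batched it drops to $27.50.
print(batch_cost_usd(10_000_000, 2_000_000, 2.50, 15.00))
```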
Context Window Surcharges
Anthropic charges 2x for input and 1.5x for output when Claude Opus 4.6 or Sonnet 4.6 requests exceed 200K tokens. Google doubles input pricing for Pro models above 200K tokens. OpenAI's GPT-5.4 hits 2x input and 1.5x output beyond 272K tokens. Budget accordingly if you're processing long documents.
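A sketch of how the surcharge bites, assuming the multipliers apply to the entire request once the prompt crosses the threshold (that matches how the providers describe it, but verify the exact trigger in each provider's docs):

```python
def long_context_cost(input_tokens: int, output_tokens: int,
                      input_price: float, output_price: float,
                      threshold: int,
                      in_mult: float = 2.0, out_mult: float = 1.5) -> float:
    """Request cost with a long-context surcharge.

    Assumption: once input_tokens exceeds `threshold`, the whole
    request is billed at the surcharge multipliers.
    """
    if input_tokens > threshold:
        input_price *= in_mult
        output_price *= out_mult
    return (input_tokens * input_price + output_tokens * output_price) / 1e6

# Claude Opus 4.6 ($5/$25) with a 250K-token prompt and 4K output:
# billed at $10/$37.50, i.e. $2.50 + $0.15 = $2.65 for one request.
print(long_context_cost(250_000, 4_000, 5.00, 25.00, threshold=200_000))
```

The same 250K-token prompt below the threshold would cost $1.35, so a single long document can nearly double the bill.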
Prompt Caching
Cache hits cost roughly 10% of base input price across OpenAI, Anthropic, and Google. DeepSeek's automatic caching drops input costs to $0.028 per MTok - effectively 90% off. xAI offers cached input at $0.05 for Grok 4.1 Fast (75% off). For repetitive workloads with shared system prompts, caching alone can cut your bill by 80-90%.
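The blended input price under caching is just a hit-rate-weighted average of the two rates; a sketch (the 80% hit rate is an illustrative assumption, not a measured figure):

```python
def effective_input_price(base_price: float, cache_price: float,
                          hit_rate: float) -> float:
    """Blended per-MTok input price given the fraction of cached input tokens."""
    return hit_rate * cache_price + (1 - hit_rate) * base_price

# DeepSeek V3.2: $0.28 base, $0.028 on cache hits, 80% of input cached
# -> about $0.078/MTok effective, a ~72% cut on input cost.
print(round(effective_input_price(0.28, 0.028, 0.8), 4))
```

Running the same formula with a 95% hit rate (plausible for a large shared system prompt) lands near $0.04/MTok, which is where the 80-90% savings claim comes from.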
Tool Use Overhead
Anthropic adds 313-346 tokens per request when tools are enabled. OpenAI's function calling consumes tokens for the schema definition. These add up in agentic workflows making hundreds of tool calls per session. Anthropic's web search costs $10 per 1,000 searches on top of token costs.
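A rough per-session estimate using the figures above (330 is our midpoint of the 313-346 range, and 200 calls per session is a hypothetical agentic workload, not a benchmark):

```python
def tool_overhead_usd(calls: int, input_price: float,
                      overhead_per_call: int = 330) -> float:
    """Estimated extra input cost from tool-definition tokens per session."""
    return calls * overhead_per_call * input_price / 1e6

# 200 tool calls per session on Claude Opus 4.6 ($5.00/MTok input):
# 66,000 overhead tokens, about $0.33 per session before any real work.
print(tool_overhead_usd(200, 5.00))
```

Small per-request overheads like this are easy to ignore at chat scale and impossible to ignore once an agent loops.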
Free Tier Comparison
| Provider | Free Credits | Models Available | Rate Limits | Expiration |
|---|---|---|---|---|
| Google (Gemini) | Unlimited free tier | Flash-Lite, Flash, 2.5 Pro | 5-15 RPM, 100-1,000 RPD | None |
| Groq | Free tier available | All hosted models | Varies by model | None |
| DeepSeek | 5M tokens on signup | All models | Standard limits | No expiration stated |
| xAI | $25 signup credits | All Grok models | Standard limits | Not specified |
| OpenAI | ~$5 trial credits | GPT-4o mini, GPT-3.5 | 3 RPM (free tier) | 3 months |
| Anthropic | ~$5 trial credits | All models | Tier 1 limits | Few months |
| Mistral | Free tier (some models) | Nemo, Small | Limited RPM | None |
Google's free tier stands out by a wide margin. You get access to Gemini 2.5 Pro for zero cost, with rate limits that are workable for prototyping. Groq's free tier is similarly generous, letting you test Llama 4 models at hardware-accelerated speeds. The rest offer small one-time credit allocations that burn through quickly during development.
Price History
Feb 2026 - Anthropic released Claude Opus 4.6 at $5/$25, a 67% cut from Opus 4.1's $15/$75 pricing.
Feb 2026 - OpenAI launched GPT-5.4 at $2.50/$15.00 and GPT-5.4 Pro at $30/$180.
Jan 2026 - xAI released Grok 4.20 at $2.00/$6.00 with multi-agent capabilities.
Dec 2025 - Google quietly reduced Gemini free tier rate limits by 50-80% across most models.
Sep 2025 - DeepSeek unified V3.2 pricing at $0.28/$0.42, roughly halving V3.1's input rate and cutting output by 75% (from $0.60/$1.70).
Jul 2025 - OpenAI launched o4-mini at $1.10/$4.40, making reasoning models affordable for the first time.
The trend is clear: flagship model prices are dropping 40-60% per generation while capabilities improve. DeepSeek and the open-weight ecosystem are putting sustained downward pressure on the entire market. If you locked in pricing assumptions six months ago, they're likely already stale.
FAQ
Which LLM API is cheapest per million tokens?
Mistral Nemo at $0.02/$0.04 per MTok is the cheapest commercial API. For cache-heavy workloads, DeepSeek V3.2's cache hit price of $0.028 input is competitive even with smaller models.
What's the best value LLM API for production?
DeepSeek V3.2 at $0.28/$0.42 per MTok. It scores within 5% of GPT-5 on most benchmarks at roughly one-fifth the price, and its automatic caching drops effective costs further.
Are there free LLM APIs for development?
Google's Gemini API offers the most generous free tier with no credit card required and access to 2.5 Pro. Groq also provides free access to Llama 4 and other open-weight models.
How much does it cost to process 1 million tokens?
Depends on the model. At the low end, $0.06 total (Mistral Nemo). Mid-range: $0.70 (DeepSeek V3.2). Premium: $17.50 (GPT-5.4) to $30.00 (Claude Opus 4.6). Most production apps process 10-50M tokens per day.
Do batch APIs really save 50%?
Yes. OpenAI, Anthropic, Google, and xAI all offer exactly 50% off for async batch processing with 24-hour SLAs. Combined with prompt caching, total savings can reach 90%+ on repetitive workloads.
How do reasoning model costs compare?
OpenAI's o4-mini ($1.10/$4.40) is the cheapest dedicated reasoning model. o3 costs $2.00/$8.00. The premium o1 at $15/$60 is hard to justify unless you need its specific capabilities. Claude Opus 4.6 offers strong reasoning at $5/$25 without a separate "reasoning" tier.
Last verified March 11, 2026
