LLM API Pricing Comparison - April 2026

TL;DR

Cheapest input: Mistral Nemo at $0.02/MTok - unchanged
Claude Opus 4.7 launched April 16 at $5/$25 (same sticker price as 4.6), but a new tokenizer adds up to 35% more tokens for the same text - an effective price increase
Claude Haiku 3 retired April 19; Google locked Pro models behind a paywall April 1
DeepSeek V4 still isn't on the API - official docs still route to V3.2

The Bottom Line

Mistral Nemo holds the floor at $0.02 input per million tokens. For a step up in quality, the new Mistral Small 3.2 at $0.07/$0.20 is now the cheapest sub-$0.10 commercial model with solid benchmark scores. Our March correction showed Mistral Small 3.1 at $0.03/$0.11 - multiple sources now consistently list that model at $0.10/$0.30, suggesting Mistral quietly normalized pricing between March and April.

DeepSeek V3.2 at $0.28/$0.42 stays the best value pick. DeepSeek V4 was expected on the API weeks ago and still isn't there. The official DeepSeek docs explicitly confirm the API routes to V3.2 with a 128K context limit. V4's 1M context at projected $0.30/$0.50 keeps getting delayed.

The headline this period is Claude Opus 4.7, which Anthropic launched on April 16. The price tag - $5/$25 per MTok - is identical to Opus 4.6. The catch: Opus 4.7 ships with a new tokenizer that uses up to 35% more tokens for the same text. That's not a price cut. At 35% token inflation, a workload that cost $5.00 on Opus 4.6 could run $6.75 on Opus 4.7 for identical input.

Full Pricing Table

All prices in USD per million tokens (MTok). Verified against official documentation April 20, 2026. Sorted by input price, cheapest first.

Model	Provider	Input (/1M)	Output (/1M)	Context	Notes
Mistral Nemo	Mistral	$0.02	$0.04	128K	Cheapest commercial option
Llama 3.1 8B	Groq	$0.05	$0.08	128K	840+ tok/s on LPU
GPT-5 Nano	OpenAI	$0.05	$0.40	400K	Largest context under $0.10
Mistral Small 3.2	Mistral	$0.07	$0.20	128K	New April 2026; cheapest Mistral above Nemo
GPT OSS 20B	Groq	$0.075	$0.30	128K	Open-weight; LPU-accelerated
GPT-4.1 Nano	OpenAI	$0.10	$0.40	1M	Classification, routing tasks
Gemini 2.5 Flash-Lite	Google	$0.10	$0.40	1M	Free tier available
Llama 4 Scout	Groq	$0.11	$0.34	128K	New; 17B active / 16E MoE
GPT OSS 120B	Groq	$0.15	$0.60	128K	Replaced Llama 4 Maverick
GPT-4o mini	OpenAI	$0.15	$0.60	128K	Legacy; still widely deployed
Mistral Small 3.1	Mistral	$0.10	$0.30	128K	Price revised up from March
Grok 4.1 Fast	xAI	$0.20	$0.50	2M	Largest context window in this tier
GPT-5 Mini	OpenAI	$0.25	$2.00	128K	Budget GPT-5 generation
Gemini 3.1 Flash-Lite	Google	$0.25	$1.50	1M	Preview; free tier retained
DeepSeek V3.2	DeepSeek	$0.28	$0.42	128K	Cache hit: $0.028; best value pick
Qwen3 32B	Groq	$0.29	$0.59	131K	Open-weight; multilingual
Gemini 2.5 Flash	Google	$0.30	$2.50	1M	Free tier; solid mid-range
GPT-4.1 mini	OpenAI	$0.40	$1.60	1M	Mid-range, 1M context
Mistral Medium 3	Mistral	$0.40	$2.00	128K	Balanced reasoning
Gemini 3 Flash Preview	Google	$0.50	$3.00	1M	New; preview pricing
Llama 3.3 70B	Groq	$0.59	$0.79	128K	70B dense; strong instruction following
Kimi K2.6	Moonshot AI	$0.60	$2.50	256K	Down from $1.00/$3.00 for K2-0905
Claude Haiku 3.5	Anthropic	$0.80	$4.00	200K
Claude Haiku 4.5	Anthropic	$1.00	$5.00	200K	Cheapest active Anthropic model
o4-mini	OpenAI	$1.10	$4.40	200K	Cheapest dedicated reasoning model
Gemini 2.5 Pro	Google	$1.25	$10.00	1M	Free tier ended April 1
GPT-5	OpenAI	$1.25	$10.00	128K
GPT-5.2	OpenAI	$1.75	$14.00	128K
o3	OpenAI	$2.00	$8.00	200K
GPT-4.1	OpenAI	$2.00	$8.00	1M
Grok 4.20	xAI	$2.00	$6.00	2M	Multi-agent beta
Gemini 3.1 Pro	Google	$2.00	$12.00	1M	$4.00/$18.00 above 200K tokens
GPT-5.4	OpenAI	$2.50	$15.00	1.1M	OpenAI flagship
Claude Sonnet 4.6	Anthropic	$3.00	$15.00	1M
Grok 4	xAI	$3.00	$15.00	256K	xAI flagship
Claude Opus 4.6	Anthropic	$5.00	$25.00	1M	Fast Mode available at $30/$150
Claude Opus 4.7	Anthropic	$5.00	$25.00	1M	New; tokenizer inflates effective cost
GPT-5.4 Pro	OpenAI	$30.00	$180.00	1.1M	Ultra-premium tier

For benchmark rankings behind these numbers, see the cost-efficiency leaderboard.

A calculator and pen on a notebook, representing the cost math behind API pricing decisions Working out real API costs requires more than reading the headline price - caching, batching, context surcharges, and tokenizer differences all shift the final number. Source: unsplash.com

April 2026 Changes

Four things moved since our March 30 table:

Claude Opus 4.7 (April 16) - Anthropic's new flagship costs $5/$25 on the sticker, identical to Opus 4.6. Don't let that fool you. Anthropic's own documentation notes the new tokenizer "may use up to 35% more tokens for the same fixed text." Run that math: a 1M-token request on Opus 4.6 becomes up to 1.35M tokens on Opus 4.7 at the same per-token rate. That's $6.75 input instead of $5.00. Opus 4.7 may be worth it on quality grounds, but budget accordingly.

Claude Haiku 3 retired (April 19) - The cheapest Anthropic model ever at $0.25/$1.25 is gone as of today. Haiku 4.5 at $1.00/$5.00 is four times the price. If you were routing lightweight tasks through Haiku 3, you need to reroute now.

Google tightened the free tier (April 1) - Gemini 2.5 Pro is no longer free. Google restricted Pro-tier models behind a payment requirement starting April 1, with mandatory spending caps across all billing tiers. Flash and Flash-Lite remain free, but Pro access now requires a paid account.

Mistral Small 3.2 (April 2026) - A new model at $0.07/$0.20 slots in between Nemo and Small 3.1. Multiple independent sources consistently list it at that price. We also noticed that Mistral Small 3.1, which we reported as corrected to $0.03/$0.11 in March, is now consistently listed at $0.10/$0.30 across multiple sources. The cheaper price may have been a short-lived promotional rate. We'll note this discrepancy and track it.

Hidden Costs

The Opus 4.7 Tokenizer Problem

This deserves its own section. A 35% token inflation means every comparison table that shows $5/MTok for Opus 4.7 is understating the real cost. If you're migrating an existing Opus 4.6 workload, the safe assumption is a 15-35% cost increase even at identical task descriptions. Benchmark against your specific prompts before committing production traffic.

Rate Limits and Spend Tiers

OpenAI gates throughput by spend tier. New accounts (Tier 1) cap at 500 RPM on GPT-5.4; Tier 4 gets 10,000 RPM. Anthropic uses a four-tier system with similar structure. DeepSeek queues requests during peak hours - no published tiers, just latency variability.

Batch API Discounts

OpenAI, Anthropic, Google, and xAI all offer 50% off for async batch processing with 24-hour SLAs. Groq offers 50% off batch jobs with a 24-hour to 7-day processing window. DeepSeek's automatic prompt caching delivers comparable savings without a formal batch endpoint. If your workload isn't latency-sensitive, batch mode halves your bill right away.

Prompt Caching

Cache hit pricing across the major providers:

Anthropic: 10% of standard input rate (cache hits cost $0.50/MTok for Opus 4.7)
OpenAI: 10% of standard input rate (automatic, no setup required)
Google: 10% of standard input rate, plus storage fees ($1.00/1M tokens/hour for Flash)
xAI: 10% of standard input rate for Grok 4 and Grok 4.1 Fast
DeepSeek: Automatic caching; cache hits at $0.028/MTok on V3.2 (90% off)
Groq: 50% off cached input tokens (less aggressive but no storage fee)

For workloads with shared system prompts or repeated document context, caching can cut effective input costs by 80-90%.

Context Window Surcharges

Anthropic's Opus 4.7 and Sonnet 4.6 include the full 1M context at standard rates - no tiered pricing regardless of request size. Gemini 3.1 Pro doubles input pricing above 200K tokens ($2 becomes $4, output goes $12 to $18). Gemini 2.5 Pro similarly steps up above 200K. OpenAI's GPT-5.4 applies 2x input and 1.5x output pricing above 272K tokens.

xAI's Grok 4.1 Fast at 2M context is a genuine outlier - $0.20 input with no surcharge across the full window.

Tool Use Overhead

Anthropic adds 313-346 tokens per request when tools are enabled. OpenAI's function calling consumes tokens for schema definitions. These add up in agentic pipelines. Anthropic's web search costs $10 per 1,000 searches on top of token costs.

Electronic shelf labels showing real-time price tags in a retail setting LLM API prices shift regularly - the Opus 4.7 tokenizer change this week shows how a "same price" launch can quietly increase real costs. Source: commons.wikimedia.org

Free Tier Comparison

Provider	Free Credits	Models Available	Rate Limits	Notes
Google (Gemini)	Unlimited free tier	Flash-Lite, Flash, 3 Flash Preview	5-15 RPM, 100-1,000 RPD	Pro models now paid-only (April 1)
Groq	Free tier available	All hosted models	Varies by model	No card required
DeepSeek	5M tokens on signup	All models	Standard limits	Expires unspecified
xAI	$25 signup credits	All Grok models	Standard limits	Not recurring
OpenAI	~$5 trial credits	GPT-4o mini, limited	3 RPM (free tier)	3-month expiry
Anthropic	~$5 trial credits	All models	Tier 1 limits	Few months expiry
Mistral	Free tier (some)	Nemo	Limited RPM	No card required

Google's free tier tightened in April but remains viable for prototyping. Flash models (including the new Gemini 3 Flash Preview) are still zero-cost with manageable rate limits. Groq keeps offering hardware-accelerated inference on Llama, Qwen, and the GPT OSS series without any upfront payment, and the new Llama 4 Scout on Groq is available on the free tier.

Price History

Apr 2026 - Claude Opus 4.7 launches at $5/$25 per MTok (same as Opus 4.6), but a new tokenizer uses up to 35% more tokens for equivalent inputs. Effective cost is higher despite the unchanged headline rate.
Apr 2026 - Claude Haiku 3 retired April 19. The cheapest Anthropic model in the table drops out. Haiku 4.5 at $1.00/$5.00 is the new Anthropic budget entry point.
Apr 2026 - Google restricted Gemini Pro models (2.5 Pro and up) behind a payment requirement starting April 1. Flash and Flash-Lite retain free tiers.
Apr 2026 - Mistral Small 3.2 launches at $0.07/$0.20, undercutting Small 3.1. Mistral Nemo still cheaper at $0.02 but Small 3.2 now offers the best quality-per-dollar in the Mistral lineup for sub-$0.10 work.
Apr 2026 - Kimi K2.6 replaces K2-0905 at $0.60/$2.50 per MTok, down from $1.00/$3.00. Moonshot AI continues cutting prices as the Kimi K2.5 family matures.
Apr 2026 - Llama 4 Scout (17B active / 16E MoE) added to Groq's lineup at $0.11/$0.34. It's the only Llama 4 architecture currently available on Groq after Maverick's deprecation in February.
Mar 2026 - DeepSeek V4 expected on API; as of April 20 the official API still routes to V3.2 with a 128K context limit. No official ETA from DeepSeek.
Mar 2026 - Anthropic added Fast Mode for Claude Opus 4.6 at $30/$150 per MTok. Remains exclusive to Opus 4.6 - not available on Opus 4.7 at launch.
Feb 2026 - Claude Opus 4.6 launched at $5/$25, a 67% reduction from Opus 4.1's $15/$75.
Feb 2026 - Groq deprecated Llama 4 Maverick, replaced by GPT OSS 120B at $0.15/$0.60.

The "same price" model launch is becoming a pricing tactic. Opus 4.7 is the clearest example yet - identical sticker, new tokenizer, quietly higher effective cost.

FAQ

Which LLM API is cheapest per million tokens?

Mistral Nemo at $0.02/$0.04 per MTok is the cheapest commercial option. Mistral Small 3.2 at $0.07/$0.20 is the next step up and substantially more capable. DeepSeek V3.2 cache hits at $0.028 compete directly for workloads with repeated context.

What's the best value LLM API for production?

DeepSeek V3.2 at $0.28/$0.42 per MTok. It matches frontier models on most benchmarks at roughly one-fifth the price of GPT-5.4. Automatic caching drops effective input to $0.028/MTok. DeepSeek V4 is expected to improve on this when it eventually reaches the API.

Did Claude Opus 4.7 change API pricing?

The nominal price is unchanged at $5/$25. But the new tokenizer uses up to 35% more tokens for the same text. A workload that cost $5.00 on Opus 4.6 could cost up to $6.75 on Opus 4.7.

Are there free LLM APIs for development?

Google's Gemini API offers the most generous free tier - Flash and Flash-Lite models remain free after the April 1 tightening. Groq provides free inference on Llama, Qwen, and GPT OSS models at LPU-accelerated speeds.

Do batch APIs actually save 50%?

Yes. OpenAI, Anthropic, Google, xAI, and Groq all offer 50% off for async batch processing. Combined with prompt caching, total savings can reach 90% on repetitive workloads with a 24-hour processing window.

What happened to Google's free tier?

Starting April 1, 2026, Google restricted Gemini Pro models (2.5 Pro and above) to paid accounts only. Gemini Flash, Flash-Lite, and the new Gemini 3 Flash Preview remain available on the free tier with reduced daily quotas.

Sources: