LLM API Pricing Comparison - April 2026
Current LLM API prices verified April 2026: Mistral Nemo at $0.02/MTok is the cheapest, DeepSeek V3.2 the best value, and Claude Opus 4.7 launches with a tokenizer change that raises effective costs by up to 35%.

TL;DR
- Cheapest input: Mistral Nemo at $0.02/MTok - unchanged
- Claude Opus 4.7 launched April 16 at $5/$25 (same sticker price as 4.6), but a new tokenizer adds up to 35% more tokens for the same text - an effective price increase
- Claude Haiku 3 retired April 19; Google locked Pro models behind a paywall April 1
- DeepSeek V4 still isn't on the API - official docs still route to V3.2
The Bottom Line
Mistral Nemo holds the floor at $0.02 input per million tokens. For a step up in quality, the new Mistral Small 3.2 at $0.07/$0.20 is now the cheapest sub-$0.10 commercial model with solid benchmark scores. Our March correction showed Mistral Small 3.1 at $0.03/$0.11 - multiple sources now consistently list that model at $0.10/$0.30, suggesting Mistral quietly normalized pricing between March and April.
DeepSeek V3.2 at $0.28/$0.42 stays the best value pick. DeepSeek V4 was expected on the API weeks ago and still isn't there. The official DeepSeek docs explicitly confirm the API routes to V3.2 with a 128K context limit. V4's 1M context at projected $0.30/$0.50 keeps getting delayed.
The headline this period is Claude Opus 4.7, which Anthropic launched on April 16. The price tag - $5/$25 per MTok - is identical to Opus 4.6. The catch: Opus 4.7 ships with a new tokenizer that uses up to 35% more tokens for the same text, so the unchanged sticker masks an effective increase. At the full 35% token inflation, a workload that cost $5.00 in input on Opus 4.6 runs $6.75 on Opus 4.7 for identical text.
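To make the tokenizer math concrete, here's a minimal Python sketch of the effective-cost calculation. The 35% figure is Anthropic's stated upper bound, so treat the inflation rate as a parameter and measure it against your own prompts:

```python
def effective_input_cost(rate_per_mtok: float, tokens_m: float,
                         inflation: float = 0.35) -> float:
    """Input cost in USD after tokenizer inflation.

    rate_per_mtok: sticker price per million tokens
    tokens_m: token count (in millions) under the old tokenizer
    inflation: extra tokens the new tokenizer produces (0.35 = up to 35% more)
    """
    return rate_per_mtok * tokens_m * (1 + inflation)

# A 1M-token Opus 4.6 workload at $5/MTok, re-tokenized on Opus 4.7:
worst_case = effective_input_cost(5.00, 1.0)        # full 35% inflation, ~$6.75
low_end = effective_input_cost(5.00, 1.0, 0.15)     # 15% inflation, ~$5.75
```

The same function works for output tokens; plug in the $25/MTok output rate to size the full bill.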
Full Pricing Table
All prices in USD per million tokens (MTok). Verified against official documentation April 20, 2026. Sorted by input price, cheapest first.
| Model | Provider | Input (/1M) | Output (/1M) | Context | Notes |
|---|---|---|---|---|---|
| Mistral Nemo | Mistral | $0.02 | $0.04 | 128K | Cheapest commercial option |
| Llama 3.1 8B | Groq | $0.05 | $0.08 | 128K | 840+ tok/s on LPU |
| GPT-5 Nano | OpenAI | $0.05 | $0.40 | 400K | Largest context under $0.10 |
| Mistral Small 3.2 | Mistral | $0.07 | $0.20 | 128K | New April 2026; cheapest Mistral above Nemo |
| GPT OSS 20B | Groq | $0.075 | $0.30 | 128K | Open-weight; LPU-accelerated |
| GPT-4.1 Nano | OpenAI | $0.10 | $0.40 | 1M | Classification, routing tasks |
| Gemini 2.5 Flash-Lite | Google | $0.10 | $0.40 | 1M | Free tier available |
| Mistral Small 3.1 | Mistral | $0.10 | $0.30 | 128K | Price revised up from March |
| Llama 4 Scout | Groq | $0.11 | $0.34 | 128K | New; 17B active / 16E MoE |
| GPT OSS 120B | Groq | $0.15 | $0.60 | 128K | Replaced Llama 4 Maverick |
| GPT-4o mini | OpenAI | $0.15 | $0.60 | 128K | Legacy; still widely deployed |
| Grok 4.1 Fast | xAI | $0.20 | $0.50 | 2M | Largest context window in this tier |
| GPT-5 Mini | OpenAI | $0.25 | $2.00 | 128K | Budget GPT-5 generation |
| Gemini 3.1 Flash-Lite | Google | $0.25 | $1.50 | 1M | Preview; free tier retained |
| DeepSeek V3.2 | DeepSeek | $0.28 | $0.42 | 128K | Cache hit: $0.028; best value pick |
| Qwen3 32B | Groq | $0.29 | $0.59 | 131K | Open-weight; multilingual |
| Gemini 2.5 Flash | Google | $0.30 | $2.50 | 1M | Free tier; solid mid-range |
| GPT-4.1 mini | OpenAI | $0.40 | $1.60 | 1M | Mid-range, 1M context |
| Mistral Medium 3 | Mistral | $0.40 | $2.00 | 128K | Balanced reasoning |
| Gemini 3 Flash Preview | Google | $0.50 | $3.00 | 1M | New; preview pricing |
| Llama 3.3 70B | Groq | $0.59 | $0.79 | 128K | 70B dense; strong instruction following |
| Kimi K2.6 | Moonshot AI | $0.60 | $2.50 | 256K | Down from $1.00/$3.00 for K2-0905 |
| Claude Haiku 3.5 | Anthropic | $0.80 | $4.00 | 200K | |
| Claude Haiku 4.5 | Anthropic | $1.00 | $5.00 | 200K | Cheapest active Anthropic model |
| o4-mini | OpenAI | $1.10 | $4.40 | 200K | Cheapest dedicated reasoning model |
| Gemini 2.5 Pro | Google | $1.25 | $10.00 | 1M | Free tier ended April 1 |
| GPT-5 | OpenAI | $1.25 | $10.00 | 128K | |
| GPT-5.2 | OpenAI | $1.75 | $14.00 | 128K | |
| o3 | OpenAI | $2.00 | $8.00 | 200K | |
| GPT-4.1 | OpenAI | $2.00 | $8.00 | 1M | |
| Grok 4.20 | xAI | $2.00 | $6.00 | 2M | Multi-agent beta |
| Gemini 3.1 Pro | Google | $2.00 | $12.00 | 1M | $4.00/$18.00 above 200K tokens |
| GPT-5.4 | OpenAI | $2.50 | $15.00 | 1.1M | OpenAI flagship |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | 1M | |
| Grok 4 | xAI | $3.00 | $15.00 | 256K | xAI flagship |
| Claude Opus 4.6 | Anthropic | $5.00 | $25.00 | 1M | Fast Mode available at $30/$150 |
| Claude Opus 4.7 | Anthropic | $5.00 | $25.00 | 1M | New; tokenizer inflates effective cost |
| GPT-5.4 Pro | OpenAI | $30.00 | $180.00 | 1.1M | Ultra-premium tier |
For benchmark rankings behind these numbers, see the cost-efficiency leaderboard.
Working out real API costs requires more than reading the headline price - caching, batching, context surcharges, and tokenizer differences all shift the final number.
April 2026 Changes
Four things moved since our March 30 table:
Claude Opus 4.7 (April 16) - Anthropic's new flagship costs $5/$25 on the sticker, identical to Opus 4.6. Don't let that fool you. Anthropic's own documentation notes the new tokenizer "may use up to 35% more tokens for the same fixed text." Run that math: a 1M-token request on Opus 4.6 becomes up to 1.35M tokens on Opus 4.7 at the same per-token rate. That's $6.75 input instead of $5.00. Opus 4.7 may be worth it on quality grounds, but budget accordingly.
Claude Haiku 3 retired (April 19) - The cheapest Anthropic model ever at $0.25/$1.25 is gone as of April 19. Haiku 4.5 at $1.00/$5.00 is four times the price. If you were routing lightweight tasks through Haiku 3, reroute now.
Google tightened the free tier (April 1) - Gemini 2.5 Pro is no longer free. Google restricted Pro-tier models behind a payment requirement starting April 1, with mandatory spending caps across all billing tiers. Flash and Flash-Lite remain free, but Pro access now requires a paid account.
Mistral Small 3.2 (April 2026) - A new model at $0.07/$0.20 slots in between Nemo and Small 3.1, and multiple independent sources consistently list it at that price. We also noticed that Mistral Small 3.1, which we reported as corrected to $0.03/$0.11 in March, is now consistently listed at $0.10/$0.30. The cheaper price may have been a short-lived promotional rate; we'll keep tracking the discrepancy.
Hidden Costs
The Opus 4.7 Tokenizer Problem
This deserves its own section. A 35% token inflation means every comparison table that shows $5/MTok for Opus 4.7 is understating the real cost. If you're migrating an existing Opus 4.6 workload, the safe assumption is a 15-35% cost increase for identical inputs. Benchmark against your specific prompts before committing production traffic.
Rate Limits and Spend Tiers
OpenAI gates throughput by spend tier. New accounts (Tier 1) cap at 500 RPM on GPT-5.4; Tier 4 gets 10,000 RPM. Anthropic uses a four-tier system with similar structure. DeepSeek queues requests during peak hours - no published tiers, just latency variability.
Batch API Discounts
OpenAI, Anthropic, Google, and xAI all offer 50% off for async batch processing with 24-hour SLAs. Groq offers 50% off batch jobs with a 24-hour to 7-day processing window. DeepSeek's automatic prompt caching delivers comparable savings without a formal batch endpoint. If your workload isn't latency-sensitive, batch mode halves your bill right away.
Prompt Caching
Cache hit pricing across the major providers:
- Anthropic: 10% of standard input rate (cache hits cost $0.50/MTok for Opus 4.7)
- OpenAI: 10% of standard input rate (automatic, no setup required)
- Google: 10% of standard input rate, plus storage fees ($1.00/1M tokens/hour for Flash)
- xAI: 10% of standard input rate for Grok 4 and Grok 4.1 Fast
- DeepSeek: Automatic caching; cache hits at $0.028/MTok on V3.2 (90% off)
- Groq: 50% off cached input tokens (less aggressive but no storage fee)
For workloads with shared system prompts or repeated document context, caching can cut effective input costs by 80-90%.
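For back-of-envelope planning, the effective input rate is just a weighted average of the cached and uncached rates. A minimal sketch, using the rates listed above as inputs:

```python
def blended_input_rate(standard: float, cached: float, hit_ratio: float) -> float:
    """Effective $/MTok input rate given a cache hit ratio between 0 and 1."""
    return hit_ratio * cached + (1 - hit_ratio) * standard

# DeepSeek V3.2: $0.28 standard, $0.028 on cache hits, 90% of tokens cached:
rate = blended_input_rate(0.28, 0.028, 0.9)   # roughly $0.053/MTok
savings = 1 - rate / 0.28                     # roughly 81% off the sticker rate
```

Note this ignores cache-write premiums and storage fees (Google charges hourly storage, and some providers bill cache writes above the standard rate), so treat it as a lower bound on your real blended cost.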
Context Window Surcharges
Anthropic's Opus 4.7 and Sonnet 4.6 include the full 1M context at standard rates - no tiered pricing regardless of request size. Gemini 3.1 Pro doubles input pricing above 200K tokens ($2 becomes $4, output goes $12 to $18). Gemini 2.5 Pro similarly steps up above 200K. OpenAI's GPT-5.4 applies 2x input and 1.5x output pricing above 272K tokens.
xAI's Grok 4.1 Fast at 2M context is a genuine outlier - $0.20 input with no surcharge across the full window.
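To see how the tiered rates bite, here's Gemini 3.1 Pro's published split as a sketch. One assumption to flag: this applies the higher rate to the entire request once it crosses 200K input tokens, rather than only to the overflow; verify the exact tiering rule against Google's docs before budgeting:

```python
def gemini_31_pro_input_cost(input_tokens: int) -> float:
    """Input cost in USD for Gemini 3.1 Pro: $2.00/MTok up to 200K input
    tokens, $4.00/MTok above (assumed to apply to the whole request)."""
    rate = 4.00 if input_tokens > 200_000 else 2.00
    return input_tokens / 1_000_000 * rate

# Under this assumption, a 250K-token request costs roughly 2.5x a
# 200K-token one, not 1.25x - the tier boundary is a real cliff.
```

The same shape applies to GPT-5.4's 272K threshold with 2x input and 1.5x output multipliers.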
Tool Use Overhead
Anthropic adds 313-346 tokens per request when tools are enabled. OpenAI's function calling consumes tokens for schema definitions. These add up in agentic pipelines. Anthropic's web search costs $10 per 1,000 searches on top of token costs.
LLM API prices shift regularly - the Opus 4.7 tokenizer change this week shows how a "same price" launch can quietly increase real costs.
Free Tier Comparison
| Provider | Free Credits | Models Available | Rate Limits | Notes |
|---|---|---|---|---|
| Google (Gemini) | Unlimited free tier | Flash-Lite, Flash, 3 Flash Preview | 5-15 RPM, 100-1,000 RPD | Pro models now paid-only (April 1) |
| Groq | Free tier available | All hosted models | Varies by model | No card required |
| DeepSeek | 5M tokens on signup | All models | Standard limits | Expiry unspecified |
| xAI | $25 signup credits | All Grok models | Standard limits | Not recurring |
| OpenAI | ~$5 trial credits | GPT-4o mini, limited | 3 RPM (free tier) | 3-month expiry |
| Anthropic | ~$5 trial credits | All models | Tier 1 limits | Expires after a few months |
| Mistral | Free tier (some) | Nemo | Limited RPM | No card required |
Google's free tier tightened in April but remains viable for prototyping. Flash models (including the new Gemini 3 Flash Preview) are still zero-cost with manageable rate limits. Groq keeps offering hardware-accelerated inference on Llama, Qwen, and the GPT OSS series without any upfront payment, and the new Llama 4 Scout on Groq is available on the free tier.
Price History
Apr 2026 - Claude Opus 4.7 launches at $5/$25 per MTok (same as Opus 4.6), but a new tokenizer uses up to 35% more tokens for equivalent inputs. Effective cost is higher despite the unchanged headline rate.
Apr 2026 - Claude Haiku 3 retired April 19. The cheapest Anthropic model in the table drops out. Haiku 4.5 at $1.00/$5.00 is the new Anthropic budget entry point.
Apr 2026 - Google restricted Gemini Pro models (2.5 Pro and up) behind a payment requirement starting April 1. Flash and Flash-Lite retain free tiers.
Apr 2026 - Mistral Small 3.2 launches at $0.07/$0.20, undercutting Small 3.1. Mistral Nemo still cheaper at $0.02 but Small 3.2 now offers the best quality-per-dollar in the Mistral lineup for sub-$0.10 work.
Apr 2026 - Kimi K2.6 replaces K2-0905 at $0.60/$2.50 per MTok, down from $1.00/$3.00. Moonshot AI continues cutting prices as the Kimi K2 line matures.
Apr 2026 - Llama 4 Scout (17B active / 16E MoE) added to Groq's lineup at $0.11/$0.34. It's the only Llama 4 architecture currently available on Groq after Maverick's deprecation in February.
Mar 2026 - DeepSeek V4 expected on API; as of April 20 the official API still routes to V3.2 with a 128K context limit. No official ETA from DeepSeek.
Mar 2026 - Anthropic added Fast Mode for Claude Opus 4.6 at $30/$150 per MTok. Remains exclusive to Opus 4.6 - not available on Opus 4.7 at launch.
Feb 2026 - Claude Opus 4.6 launched at $5/$25, a 67% reduction from Opus 4.1's $15/$75.
Feb 2026 - Groq deprecated Llama 4 Maverick, replaced by GPT OSS 120B at $0.15/$0.60.
The "same price" model launch is becoming a pricing tactic. Opus 4.7 is the clearest example yet - identical sticker, new tokenizer, quietly higher effective cost.
FAQ
Which LLM API is cheapest per million tokens?
Mistral Nemo at $0.02/$0.04 per MTok is the cheapest commercial option. Mistral Small 3.2 at $0.07/$0.20 is the next step up and substantially more capable. DeepSeek V3.2 cache hits at $0.028 compete directly for workloads with repeated context.
What's the best value LLM API for production?
DeepSeek V3.2 at $0.28/$0.42 per MTok. It matches frontier models on most benchmarks at roughly a tenth of GPT-5.4's input price. Automatic caching drops effective input to $0.028/MTok. DeepSeek V4 is expected to improve on this when it eventually reaches the API.
Did Claude Opus 4.7 change API pricing?
The nominal price is unchanged at $5/$25. But the new tokenizer uses up to 35% more tokens for the same text. A workload that cost $5.00 on Opus 4.6 could cost up to $6.75 on Opus 4.7.
Are there free LLM APIs for development?
Google's Gemini API offers the most generous free tier - Flash and Flash-Lite models remain free after the April 1 tightening. Groq provides free inference on Llama, Qwen, and GPT OSS models at LPU-accelerated speeds.
Do batch APIs actually save 50%?
Yes. OpenAI, Anthropic, Google, xAI, and Groq all offer 50% off for async batch processing. Combined with prompt caching, total savings can reach 90% on repetitive workloads with a 24-hour processing window.
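As a rough model of how the two discounts compound - assuming, as a simplification, that the batch discount applies on top of the blended cached rate, which you should confirm against each provider's terms:

```python
def stacked_input_rate(standard: float, cached_fraction: float = 0.9,
                       cache_multiplier: float = 0.1,
                       batch_discount: float = 0.5) -> float:
    """Effective $/MTok after prompt caching and batch pricing, assuming
    the two discounts compound multiplicatively (provider terms may differ)."""
    blended = (cached_fraction * standard * cache_multiplier
               + (1 - cached_fraction) * standard)
    return blended * (1 - batch_discount)

# $5.00/MTok input with 90% cache hits, then batch mode:
# blended = 0.9 * 0.50 + 0.1 * 5.00 = $0.95/MTok, halved to ~$0.475/MTok,
# i.e. roughly 90% off the sticker rate.
```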
What happened to Google's free tier?
Starting April 1, 2026, Google restricted Gemini Pro models (2.5 Pro and above) to paid accounts only. Gemini Flash, Flash-Lite, and the new Gemini 3 Flash Preview remain available on the free tier with reduced daily quotas.
Sources:
- Anthropic Claude Pricing (Official)
- Claude Opus 4.7 Launch Announcement
- Google Gemini API Pricing (Official)
- DeepSeek API Pricing (Official)
- xAI Models and Pricing (Official)
- Groq API Pricing (Official)
- Mistral AI Pricing
- Claude Opus 4.7 Tokenizer Cost Analysis
- Google Gemini Free Tier Changes April 2026
- Kimi K2.6 API Pricing
✓ Last verified April 20, 2026
