LLM API Pricing Comparison - June 2026
Verified June 8: Ministral 3B cheapest at $0.04/MTok, DeepSeek V4 Flash best value at $0.14, Claude Opus 4.8 Fast Mode cut to $10/$50, Mistral Large 3 corrected to $0.50/$1.50.

TL;DR
- Cheapest standard input: Ministral 3B (legacy endpoint) at $0.04/MTok; current 2512 version is $0.10
- Claude Opus 4.8 Fast Mode drops to $10/$50 - down 67% from $30/$150 for Opus 4.7/4.6
- DeepSeek V4 Pro 75% discount is now permanent; $0.435/$0.87 is the standard list price
- Mistral Large 3 corrected to $0.50/$1.50 - our May table had the old Mistral Large 2 price
The Bottom Line
DeepSeek V4 Flash remains the best value at $0.14/$0.28 per MTok. Nothing else delivers frontier-class architecture at that price, and the 98% cache discount at $0.0028 makes it even cheaper on repeated-context workloads. The V4 Pro at $0.435/$0.87 is now permanently priced at the promotional rate - DeepSeek cancelled the May 31 expiry and made it the standard list price.
The headline model this month is Claude Opus 4.8. Base pricing matches Opus 4.7 at $5/$25, but the Fast Mode cost fell sharply from $30/$150 to $10/$50 per MTok. That's a 67% reduction for latency-sensitive workloads. Anthropic also cut the tool use overhead for 4.8 to 290 tokens (vs 675 for Opus 4.7 in auto mode), so agentic pipelines get a meaningful secondary discount on every call.
The Mistral table needed a major correction. Our May 25 table listed Mistral Large 3 at $2.00/$6.00, which was Mistral Large 2's price. Mistral Large 3 (the 2512 release from December 2025) launched at $0.50/$1.50 and has been priced there since. We also fixed Mistral Small 4 from $0.15/$0.60 to $0.10/$0.30. Both were stale values.
Full Pricing Table
All prices in USD per million tokens (MTok). Verified against official documentation June 8, 2026. Sorted by input price, cheapest first.
| Model | Provider | Input (/1M) | Output (/1M) | Context | Notes |
|---|---|---|---|---|---|
| Ministral 3B | Mistral | $0.04 | $0.04 | 256K | Legacy endpoint; 2512 version at $0.10 |
| Llama 3.1 8B | Groq | $0.05 | $0.08 | 128K | 840+ tok/s on LPU |
| GPT OSS 20B | Groq | $0.075 | $0.30 | 128K | Open-weight; LPU-accelerated |
| GPT-4.1 nano | OpenAI | $0.10 | $0.40 | 1M | Routing and classification |
| Gemini 2.5 Flash-Lite | Google | $0.10 | $0.40 | 1M | Free tier available |
| Mistral Small 4 | Mistral | $0.10 | $0.30 | 128K | Updated from $0.15/$0.60 |
| Ministral 3 3B (2512) | Mistral | $0.10 | $0.10 | 256K | Current recommended 3B |
| Llama 4 Scout | Groq | $0.11 | $0.34 | 128K | 17B active / 16E MoE |
| DeepSeek V4 Flash | DeepSeek | $0.14 | $0.28 | 1M | Cache hit: $0.0028; best value pick |
| Ministral 3-8B | Mistral | $0.15 | $0.15 | 256K | New in Ministral 3 lineup |
| GPT OSS 120B | Groq | $0.15 | $0.60 | 128K | Open-weight on LPU |
| GPT-5.4 nano | OpenAI | $0.20 | $1.25 | 1M | Budget OpenAI option |
| Ministral 3-14B | Mistral | $0.20 | $0.20 | 256K | New in Ministral 3 lineup |
| Gemini 3.1 Flash-Lite | Google | $0.25 | $1.50 | 1M | GA May 7; free tier retained |
| Qwen3 32B | Groq | $0.29 | $0.59 | 131K | Open-weight; multilingual |
| Gemini 2.5 Flash | Google | $0.30 | $2.50 | 1M | Free tier; solid mid-range |
| GPT-4.1 mini | OpenAI | $0.40 | $1.60 | 1M | 1M context |
| Devstral 2 | Mistral | $0.40 | $2.00 | 256K | Coding/dev agent specialized |
| DeepSeek V4 Pro | DeepSeek | $0.435 | $0.87 | 1M | Permanent pricing; was promo through May 31 |
| Mistral Large 3 | Mistral | $0.50 | $1.50 | 262K | Corrected from $2/$6 (Large 2 price) |
| Llama 3.3 70B | Groq | $0.59 | $0.79 | 128K | Dense 70B; strong instruction following |
| Claude Haiku 4.5 | Anthropic | $1.00 | $5.00 | 200K | Cheapest active Anthropic model |
| Grok Build 0.1 | xAI | $1.00 | $2.00 | 256K | Coding/agentic; public beta Jun 2026 |
| o4-mini | OpenAI | $1.10 | $4.40 | 200K | Cheapest dedicated reasoning model |
| Grok 4.3 | xAI | $1.25 | $2.50 | 1M | Down from $3/$15 |
| Grok 4.20 | xAI | $1.25 | $2.50 | 1M | Reasoning variant; same price as 4.3 |
| Gemini 2.5 Pro | Google | $1.25 | $10.00 | 1M | $2.50/$15 above 200K tokens |
| GPT-5 | OpenAI | $1.25 | $10.00 | 128K | |
| Gemini 3.5 Flash | Google | $1.50 | $9.00 | 1M | Free tier |
| Mistral Medium 3.5 | Mistral | $1.50 | $7.50 | 128K | Balanced mid-range |
| GPT-5.2 | OpenAI | $1.75 | $14.00 | 128K | |
| o3 | OpenAI | $2.00 | $8.00 | 200K | 87% cut from o1; reasoning model |
| GPT-4.1 | OpenAI | $2.00 | $8.00 | 1M | |
| Gemini 3.1 Pro Preview | Google | $2.00 | $12.00 | 1M | $4.00/$18.00 above 200K tokens |
| GPT-5.4 | OpenAI | $2.50 | $15.00 | 1M | |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | 1M | |
| Claude Opus 4.7 | Anthropic | $5.00 | $25.00 | 1M | Superseded by Opus 4.8 for new projects |
| Claude Opus 4.8 | Anthropic | $5.00 | $25.00 | 1M | Fast Mode at $10/$50; new Anthropic flagship |
| GPT-5.5 | OpenAI | $5.00 | $30.00 | 128K | OpenAI flagship |
| GPT-5.4 Pro / GPT-5.5 Pro | OpenAI | $30.00 | $180.00 | 1M | Ultra-premium research tier |
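A minimal sketch of turning the table into a per-request estimate. The `PRICES` dict and `request_cost` helper are hypothetical names; the rates are copied from a few rows above and reflect standard list pricing only, so caching, batch discounts, and long-context surcharges are ignored.

```python
# Standard list prices from the table above: (input, output) in USD per MTok.
# Only a few rows are copied here; add others as needed.
PRICES: dict[str, tuple[float, float]] = {
    "DeepSeek V4 Flash": (0.14, 0.28),
    "Mistral Large 3": (0.50, 1.50),
    "Claude Haiku 4.5": (1.00, 5.00),
    "Grok 4.3": (1.25, 2.50),
    "Claude Opus 4.8": (5.00, 25.00),
    "GPT-5.5": (5.00, 30.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Headline USD cost of one request at standard rates (no caching, batching, or surcharges)."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical request shape: 10K-token prompt, 1K-token answer.
    for model in sorted(PRICES, key=lambda m: request_cost(m, 10_000, 1_000)):
        print(f"{model:18s} ${request_cost(model, 10_000, 1_000):.5f}")
```

For that request shape, DeepSeek V4 Flash comes to about $0.0017 and Claude Opus 4.8 to $0.075, which is why the input/output mix matters as much as the headline input rate.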
For benchmark context behind these prices, see the cost-efficiency leaderboard.
Working out real API costs requires more than reading the headline price - caching, batching, context surcharges, and tokenizer differences all shift the final number.
June 2026 Changes
Five standout changes since the May 25 update.
Claude Opus 4.8 launches (May 28, 2026) - Opus 4.8 carries the same $5/$25 base price as Opus 4.7 but cuts Fast Mode costs from $30/$150 to $10/$50, so latency-sensitive workloads now pay one-third of what Opus 4.7 Fast Mode costs. The tool use system prompt overhead also dropped to 290 tokens (from 675 for Opus 4.7 in auto mode), so agentic pipelines add to the savings on every turn.
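A back-of-envelope sketch of what the smaller tool-use overhead is worth, assuming every call pays the full overhead as uncached input at the $5/MTok standard rate. The one-million-call volume is hypothetical, and prompt caching of that overhead, which would shrink both figures, is ignored.

```python
# Tool-use system-prompt overhead per call, in tokens (figures from the text above).
OVERHEAD_TOKENS = {
    "Opus 4.7 (auto mode)": 675,
    "Opus 4.8": 290,
}
STANDARD_INPUT_PRICE = 5.00  # USD per MTok; identical for Opus 4.7 and 4.8


def overhead_cost(model: str, calls: int) -> float:
    """USD spent on tool-use overhead tokens alone, billed as uncached standard input."""
    return OVERHEAD_TOKENS[model] * calls * STANDARD_INPUT_PRICE / 1_000_000


calls = 1_000_000  # hypothetical monthly call volume
old = overhead_cost("Opus 4.7 (auto mode)", calls)
new = overhead_cost("Opus 4.8", calls)
print(f"Opus 4.7: ${old:,.2f}  Opus 4.8: ${new:,.2f}  saved: ${old - new:,.2f}")
```

At that volume the overhead alone drops from $3,375 to $1,450 a month, before any Fast Mode multiplier is applied.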
DeepSeek V4 Pro permanent pricing (Jun 1, 2026) - The 75% promotional discount didn't expire. DeepSeek made $0.435/$0.87 the new standard list price. The original $1.74/$3.48 rate is a dead reference point. V4 Pro cache hits sit at $0.003625/MTok, and the 1M context window runs at standard pricing with no surcharge - competitive with anything at this price point.
Grok Build 0.1 public beta - xAI released Grok Build 0.1 at $1.00/$2.00 per MTok with 256K context. It targets agentic coding with native MCP tool support and runs at 100+ tokens/second. At $1/$2, it matches Claude Haiku 4.5 on input and costs less than half as much on output ($2 vs $5). Its $2.00 output rate is the same as Devstral 2 ($0.40/$2.00), though Devstral 2 is cheaper on input. If your workload involves agentic code generation with tool calls, benchmark both on a real workload before choosing.
Mistral Large 3 table correction - Our May 25 table had Mistral Large 3 at $2.00/$6.00. That was Mistral Large 2's price. Mistral Large 3 (the 2512 December release) has been $0.50/$1.50 since launch, with a 262K context window. We also corrected Mistral Small 4 to $0.10/$0.30 - the $0.15/$0.60 figure in the May table was stale. Both corrections were errors in sourcing, not price cuts.
New Ministral 3 family - Mistral's Ministral 3 lineup now has three active sizes. The legacy Ministral 3B at $0.04 is still accessible via legacy endpoint. The current 2512 generation runs at $0.10/$0.10 (3B), $0.15/$0.15 (8B), and $0.20/$0.20 (14B). The 8B and 14B support 256K context with optional reasoning mode via the same API endpoint.
Hidden Costs
Claude Opus 4.8 Fast Mode
Fast Mode for Opus 4.8 costs $10/$50, versus $30/$150 for Opus 4.6/4.7. At standard pricing Opus 4.6, 4.7, and 4.8 all cost $5/$25, so Fast Mode on 4.8 is a 2x markup on both input and output, while on 4.6/4.7 it's a 6x markup. If your application already routes through Opus 4.7 in Fast Mode and 4.8 performs comparably, the savings are meaningful at scale. Anthropic hasn't made Fast Mode generally available (it's a "research preview"), but the pricing gap between 4.7 and 4.8 already justifies testing.
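A short sketch of the markup arithmetic, using the list prices above. The 200 MTok input / 40 MTok output month is a hypothetical workload chosen only to show the size of the gap.

```python
STANDARD = (5.00, 25.00)  # Opus 4.6/4.7/4.8 standard (input, output), USD per MTok
FAST_MODE = {
    "Opus 4.7": (30.00, 150.00),
    "Opus 4.8": (10.00, 50.00),
}


def fast_markup(model: str) -> tuple[float, float]:
    """Fast Mode price divided by the standard price, for input and output."""
    fast_in, fast_out = FAST_MODE[model]
    return fast_in / STANDARD[0], fast_out / STANDARD[1]


def fast_mode_bill(model: str, input_mtok: float, output_mtok: float) -> float:
    """Monthly USD cost for a Fast Mode workload measured in millions of tokens."""
    fast_in, fast_out = FAST_MODE[model]
    return input_mtok * fast_in + output_mtok * fast_out


for model in FAST_MODE:
    # Hypothetical month: 200 MTok in, 40 MTok out.
    print(model, fast_markup(model), fast_mode_bill(model, 200, 40))
```

The markups come out to 6x on both sides for Opus 4.7 and 2x for Opus 4.8, and the hypothetical month falls from $12,000 to $4,000.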
GPT-5.5 Output Cost Creep
GPT-5.5 input ($5/MTok) matches Claude Opus 4.8, but its output ($30/MTok) is 20% higher than Opus 4.8's $25/MTok. For reasoning-heavy or document-generation workloads that produce long outputs, that 20% gap adds up across millions of output tokens. GPT-5.5 also costs twice as much as GPT-5.4 ($2.50/$15) on both input and output, so run a real-workload estimate before migrating from GPT-5.4 to GPT-5.5.
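A minimal sketch of that real-workload estimate, assuming a hypothetical output-heavy month of 50 MTok in and 100 MTok out; swap in your own token counts.

```python
PRICES = {  # (input, output) USD per MTok, from the pricing table
    "GPT-5.4": (2.50, 15.00),
    "GPT-5.5": (5.00, 30.00),
    "Claude Opus 4.8": (5.00, 25.00),
}


def workload_cost(model: str, input_mtok: float, output_mtok: float) -> float:
    """USD cost of a workload measured in millions of tokens, at standard rates."""
    in_price, out_price = PRICES[model]
    return input_mtok * in_price + output_mtok * out_price


# Hypothetical output-heavy month: 50 MTok in, 100 MTok out.
costs = {m: workload_cost(m, 50, 100) for m in PRICES}
gap = costs["GPT-5.5"] / costs["Claude Opus 4.8"] - 1
print(costs, f"GPT-5.5 premium over Opus 4.8: {gap:.1%}")
```

On this mix the blended premium is about 18% rather than 20%, because the input price is identical; it approaches 20% as output dominates. GPT-5.5 stays exactly twice GPT-5.4 regardless of the mix.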
Rate Limits and Spend Tiers
OpenAI gates throughput by spend tier. New accounts cap at 500 RPM on frontier models; Tier 4 gets 10,000 RPM. Anthropic uses a four-tier structure. DeepSeek V4 Flash has no published tiers but queues under high load - latency spikes are common during peak hours on the free-tier endpoints.
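Because these caps are enforced per tier on the provider side, a client that knows its limit can pace itself instead of running into rate-limit errors. Below is a minimal sliding-window sketch; `RpmLimiter` and `make_api_call` are hypothetical names, the cap you pass in is whatever your current tier allows (500 RPM is the new-account figure quoted above), and the class calls no provider API.

```python
import time
from collections import deque
from typing import Callable


class RpmLimiter:
    """Client-side sliding-window limiter: at most `max_requests` per `window_s` seconds."""

    def __init__(
        self,
        max_requests: int,
        window_s: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.max_requests = max_requests
        self.window_s = window_s
        self._clock = clock
        self._sleep = sleep
        self._sent: deque[float] = deque()  # send times inside the current window

    def acquire(self) -> None:
        """Block until another request fits in the window, then record it."""
        while True:
            now = self._clock()
            # Forget requests that have aged out of the window.
            while self._sent and now - self._sent[0] >= self.window_s:
                self._sent.popleft()
            if len(self._sent) < self.max_requests:
                self._sent.append(now)
                return
            # Sleep until the oldest request in the window expires.
            self._sleep(self.window_s - (now - self._sent[0]))


# Usage (hypothetical): pace calls for a 500 RPM tier.
# limiter = RpmLimiter(500)
# limiter.acquire(); make_api_call()
```

The clock and sleep functions are injectable so the pacing logic can be tested without waiting a real minute; a limiter like this does not help with provider-side queueing under load, which only shows up as latency.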
Batch API Discounts
OpenAI, Anthropic, Google, and xAI all offer 50% off async batch processing with 24-hour SLAs. Groq offers 50% off batch jobs with a 24-hour to 7-day window. DeepSeek's automatic prompt caching at $0.0028 per cache-hit MTok competes with formal batch discounts without requiring a separate API endpoint. Gemini 3.5 Flash batch tier at $0.75/$4.50 is a clean 50% reduction.
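A short sketch of the batch arithmetic, assuming a flat 50% discount on both input and output. The 1,000-document job is hypothetical; the same factor applied to Opus 4.8's $5/$25 gives the $2.50/$12.50 batch rate quoted in the FAQ.

```python
def batch_rates(standard: tuple[float, float], discount: float = 0.50) -> tuple[float, float]:
    """Apply a flat batch discount to (input, output) USD/MTok rates."""
    in_price, out_price = standard
    return in_price * (1 - discount), out_price * (1 - discount)


def job_cost(rates: tuple[float, float], requests: int, in_tokens: int, out_tokens: int) -> float:
    """USD cost of `requests` identical requests with the given per-request token counts."""
    in_price, out_price = rates
    return requests * (in_tokens * in_price + out_tokens * out_price) / 1_000_000


gemini_35_flash = (1.50, 9.00)
batch = batch_rates(gemini_35_flash)  # (0.75, 4.5), matching the listed batch tier
# Hypothetical job: 1,000 documents, 8K tokens in and 1K tokens out each.
print(job_cost(gemini_35_flash, 1_000, 8_000, 1_000), job_cost(batch, 1_000, 8_000, 1_000))
```

That job costs $21.00 interactively and $10.50 through batch, which is worth it whenever a 24-hour turnaround is acceptable.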
Prompt Caching
Cache hit pricing across major providers:
- DeepSeek V4 Flash: $0.0028/MTok (98% off standard input)
- DeepSeek V4 Pro: $0.003625/MTok (99.2% off list price)
- Anthropic: 10% of standard input ($0.50/MTok for Opus 4.8/4.7, $0.30/MTok for Sonnet 4.6)
- OpenAI: 10% of standard input (automatic, no setup)
- Google: 10% of standard input, plus storage fees ($0.15-$1.00/1M tokens/hour)
- xAI: 10% of standard input for Grok 4.3 and Grok 4.20
- Groq: 50% off cached input tokens
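A minimal sketch of how a cache hit rate turns these numbers into an effective input price. The 80% hit rate is a hypothetical figure, and the function ignores any cache-write premiums and storage fees (such as Google's hourly storage charge listed above).

```python
def effective_input_price(list_price: float, cache_price: float, hit_rate: float) -> float:
    """Blended input USD/MTok when `hit_rate` of input tokens are cache hits.

    Ignores cache-write premiums and storage fees.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cache_price + (1.0 - hit_rate) * list_price


# Hypothetical 80% hit rate, e.g. a long shared system prompt reused across calls.
print(effective_input_price(0.14, 0.0028, 0.8))  # DeepSeek V4 Flash
print(effective_input_price(5.00, 0.50, 0.8))    # Claude Opus 4.8
```

At that hit rate, V4 Flash input falls to roughly $0.03/MTok and Opus 4.8 input to roughly $1.40/MTok; the uncached 20% dominates both bills.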
LLM API prices continued moving in June 2026 - Opus 4.8 Fast Mode drops sharply, DeepSeek V4 Pro goes permanent, and Mistral's table gets a long-overdue correction.
Context Window Surcharges
Anthropic Opus 4.8/4.7 and Sonnet 4.6 include full 1M context at standard rates with no surcharge. Gemini 3.1 Pro doubles input pricing above 200K tokens ($2 becomes $4, output goes $12 to $18). Gemini 2.5 Pro steps similarly above 200K (input $1.25 to $2.50, output $10 to $15). OpenAI GPT-5.4 applies 2x input and 1.5x output above 272K tokens.
Grok 4.3 and Grok 4.20 offer 1M context at a flat $1.25/$2.50 - no tiered surcharges. Grok Build 0.1 offers 256K at a flat $1/$2. Mistral Large 3 covers 262K context at standard rates.
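A sketch of how these thresholds change a single long-prompt request. It assumes that once the prompt crosses the threshold, the whole request is billed at the long-context rate rather than only the tokens past it; check each provider's pricing page for how the threshold is actually applied. `ContextPricing` and `tiered_request_cost` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextPricing:
    threshold: int               # prompt tokens above which the long-context rate applies
    base: tuple[float, float]    # (input, output) USD/MTok at or below the threshold
    long: tuple[float, float]    # (input, output) USD/MTok above the threshold


# Rates from the text above.
GEMINI_31_PRO = ContextPricing(200_000, (2.00, 12.00), (4.00, 18.00))
GEMINI_25_PRO = ContextPricing(200_000, (1.25, 10.00), (2.50, 15.00))
GPT_54 = ContextPricing(272_000, (2.50, 15.00), (5.00, 22.50))  # 2x input, 1.5x output
GROK_43 = ContextPricing(1_000_000, (1.25, 2.50), (1.25, 2.50))  # flat across 1M


def tiered_request_cost(p: ContextPricing, input_tokens: int, output_tokens: int) -> float:
    # Assumption: once the prompt exceeds the threshold, all input and output
    # tokens in the request are billed at the long-context rate.
    in_price, out_price = p.long if input_tokens > p.threshold else p.base
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


for name, p in [("Gemini 3.1 Pro", GEMINI_31_PRO), ("GPT-5.4", GPT_54), ("Grok 4.3", GROK_43)]:
    print(name, tiered_request_cost(p, 300_000, 5_000))
```

Under that assumption, a 300K-token prompt with a 5K-token answer costs about $1.29 on Gemini 3.1 Pro, $1.61 on GPT-5.4, and $0.39 on Grok 4.3's flat rate.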
Free Tier Comparison
| Provider | Free Credits | Models Available | Rate Limits | Notes |
|---|---|---|---|---|
| Google (Gemini) | Ongoing free tier (no credit cap) | Flash-Lite, 2.5 Flash, 3.5 Flash | 5-15 RPM, 100-1,000 RPD | Pro models paid-only |
| Groq | Free tier available | All hosted models | Varies by model | No card required |
| DeepSeek | 5M tokens on signup | V4 Flash, V4 Pro | Standard limits | Non-renewable |
| xAI | $25 signup credits | All Grok models | Standard limits | Not recurring |
| OpenAI | ~$5 trial credits | GPT-4o mini, limited | 3 RPM (free tier) | 3-month expiry |
| Anthropic | ~$5 trial credits | All models | Tier 1 limits | Few months expiry |
| Mistral | Free tier (limited) | Ministral 3B legacy | Rate-limited | No card required |
Google's free tier is still the most generous. Flash models including Gemini 3.5 Flash are free with manageable rate limits. Groq covers all hosted models on the free tier including Llama and Qwen families. The new Ministral 3 8B and 14B are paid-only from launch.
Price History
Jun 1, 2026 - DeepSeek V4 Pro promotional pricing ($0.435/$0.87) becomes permanent. The original list price of $1.74/$3.48 is effectively retired.
May 28, 2026 - Claude Opus 4.8 launches at $5/$25 standard. Fast Mode price drops from $30/$150 to $10/$50, a 67% reduction vs Opus 4.7/4.6.
May 2026 - DeepSeek V4 Flash arrives on the API at $0.14/$0.28, replacing V3.2 routing. Cache hits at $0.0028.
May 2026 - GPT-5.5 launches as OpenAI flagship at $5/$30. Output cost doubles compared to GPT-5.4's $15/MTok.
May 2026 - xAI reprices Grok 4.3 and Grok 4.20 to $1.25/$2.50 per MTok from the old Grok 4 rate of $3/$15.
May 19, 2026 - Gemini 3.5 Flash launches at $1.50/$9.00 per MTok, batch at $0.75/$4.50.
May 7, 2026 - Gemini 3.1 Flash-Lite moves to general availability at $0.25/$1.50 per MTok.
Apr-May 2026 - Mistral overhauls its lineup. Ministral 3B debuts at $0.04/$0.04. Mistral Small 4 replaces Small 3.x at $0.10/$0.30.
Apr 2026 - Claude Opus 4.7 launches at $5/$25 with a tokenizer that uses up to 35% more tokens than Opus 4.6.
Feb 2026 - Claude Opus 4.6 launches at $5/$25, a 67% reduction from Opus 4.1's $15/$75. Fast Mode is added at $30/$150, a rate later carried over to Opus 4.7.
Dec 2025 - Mistral Large 3 (2512) launches at $0.50/$1.50, 75% below Mistral Large 2 ($2/$6). Ministral 3-8B and Ministral 3-14B launch at flat rates of $0.15/MTok and $0.20/MTok respectively, with 256K context.
Mistral Large 3 at $0.50/$1.50 with 262K context is the most under-tracked value in the frontier tier. Most pricing tools still show the Large 2 figure - it's worth a second look.
FAQ
Which LLM API is cheapest per million tokens?
Ministral 3B at $0.04/$0.04 via the legacy endpoint is the cheapest standard commercial option. The current 2512 version runs $0.10/$0.10. For production use, DeepSeek V4 Flash at $0.14/$0.28 offers far better capability per dollar than either.
What's the best value LLM API for production?
DeepSeek V4 Flash at $0.14/$0.28 per MTok. V4 architecture, 1M context, and 98% cache discounts at $0.0028 make it the best cost-per-capability option in the table. V4 Pro is now permanently $0.435/$0.87 and worth benchmarking for complex tasks.
Did xAI really cut Grok prices by 83%?
Yes, on output. Grok 4.3 and Grok 4.20 are both $1.25/$2.50 per MTok now, down from the old Grok 4 rate of $3/$15. The input cut is 58%, the output cut is 83%. Both models carry 1M context with no tiered surcharges.
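A tiny sketch for checking percentage claims like these against old and new list prices. `pct_cut` and the `CHANGES` pairs are illustrative; every pair is taken from figures elsewhere in this article.

```python
def pct_cut(old: float, new: float) -> float:
    """Percentage reduction from an old price to a new one."""
    return (old - new) / old * 100


# (old, new) USD/MTok pairs quoted in this article.
CHANGES = {
    "Grok 4 -> 4.3 input": (3.00, 1.25),
    "Grok 4 -> 4.3 output": (15.00, 2.50),
    "Opus Fast Mode 4.7 -> 4.8": (30.00, 10.00),
    "DeepSeek V4 Pro input": (1.74, 0.435),
    "Mistral Large 2 -> 3 input": (2.00, 0.50),
}

for label, (old, new) in CHANGES.items():
    print(f"{label:28s} {pct_cut(old, new):.0f}% cut")
```

The results line up with the figures quoted here: 58% and 83% for Grok, 67% for Opus Fast Mode, and 75% for both DeepSeek V4 Pro and Mistral Large 3.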
Are there free LLM APIs for development?
Google's Gemini API offers the most generous free tier - Flash models including Gemini 3.5 Flash remain free with rate limits. Groq provides free LPU-accelerated inference on Llama, Qwen, and GPT OSS models. Mistral offers rate-limited free access to the legacy Ministral 3B.
What's new with Claude Opus 4.8 pricing vs 4.7?
Standard pricing is identical at $5/$25. The difference is Fast Mode: Opus 4.8 runs $10/$50 vs $30/$150 for Opus 4.7. If your application uses Fast Mode, Opus 4.8 cuts that cost by two-thirds. Batch pricing is unchanged at $2.50/$12.50 for both.
Is Mistral Large 3 worth using at $0.50/$1.50?
It's a capable December 2025 model with 262K context at a flat rate. The main comparison set is Gemini 3.5 Flash ($1.50/$9.00) and Grok 4.3 ($1.25/$2.50). Mistral Large 3 undercuts both on input and output pricing, though benchmark coverage is thinner than those alternatives.
✓ Last verified June 8, 2026
