LLM API Pricing Comparison - June 2026

Verified June 8: Ministral 3B cheapest at $0.04/MTok, DeepSeek V4 Flash best value at $0.14, Claude Opus 4.8 Fast Mode cut to $10/$50, Mistral Large 3 corrected to $0.50/$1.50.

Cheapest: Ministral 3B Best Value: DeepSeek V4 Flash Updated weekly
LLM API Pricing Comparison - June 2026

TL;DR

  • Cheapest standard input: Ministral 3B (legacy endpoint) at $0.04/MTok; current 2512 version is $0.10
  • Claude Opus 4.8 Fast Mode drops to $10/$50 - down 67% from $30/$150 for Opus 4.7/4.6
  • DeepSeek V4 Pro 75% discount is now permanent; $0.435/$0.87 is the standard list price
  • Mistral Large 3 corrected to $0.50/$1.50 - our May table had the old Mistral Large 2 price

The Bottom Line

DeepSeek V4 Flash remains the best value at $0.14/$0.28 per MTok. Nothing else delivers frontier-class architecture at that price, and the 98% cache discount at $0.0028 makes it even cheaper on repeated-context workloads. The V4 Pro at $0.435/$0.87 is now permanently priced at the promotional rate - DeepSeek cancelled the May 31 expiry and made it the standard list price.

The headline model this month is Claude Opus 4.8. Base pricing matches Opus 4.7 at $5/$25, but the Fast Mode cost fell sharply from $30/$150 to $10/$50 per MTok. That's a 67% reduction for latency-sensitive workloads. Anthropic also cut the tool use overhead for 4.8 to 290 tokens (vs 675 for Opus 4.7 in auto mode), so agentic pipelines get a meaningful secondary discount on every call.

The Mistral table needed a major correction. Our May 25 table listed Mistral Large 3 at $2.00/$6.00, which was Mistral Large 2's price. Mistral Large 3 (the 2512 release from December 2025) launched at $0.50/$1.50 and has been priced there since. We also fixed Mistral Small 4 from $0.15/$0.60 to $0.10/$0.30. Both were stale values.

Full Pricing Table

All prices in USD per million tokens (MTok). Verified against official documentation June 8, 2026. Sorted by input price, cheapest first.

ModelProviderInput (/1M)Output (/1M)ContextNotes
Ministral 3BMistral$0.04$0.04256KLegacy endpoint; 2512 version at $0.10
Llama 3.1 8BGroq$0.05$0.08128K840+ tok/s on LPU
GPT OSS 20BGroq$0.075$0.30128KOpen-weight; LPU-accelerated
GPT-4.1 nanoOpenAI$0.10$0.401MRouting and classification
Gemini 2.5 Flash-LiteGoogle$0.10$0.401MFree tier available
Mistral Small 4Mistral$0.10$0.30128KUpdated from $0.15/$0.60
Ministral 3 3B (2512)Mistral$0.10$0.10256KCurrent recommended 3B
Llama 4 ScoutGroq$0.11$0.34128K17B active / 16E MoE
DeepSeek V4 FlashDeepSeek$0.14$0.281MCache hit: $0.0028; best value pick
Ministral 3-8BMistral$0.15$0.15256KNew in Ministral 3 lineup
GPT OSS 120BGroq$0.15$0.60128KOpen-weight on LPU
GPT-5.4 nanoOpenAI$0.20$1.251MBudget OpenAI option
Ministral 3-14BMistral$0.20$0.20256KNew in Ministral 3 lineup
Gemini 3.1 Flash-LiteGoogle$0.25$1.501MGA May 7; free tier retained
Qwen3 32BGroq$0.29$0.59131KOpen-weight; multilingual
Gemini 2.5 FlashGoogle$0.30$2.501MFree tier; solid mid-range
GPT-4.1 miniOpenAI$0.40$1.601M1M context
Devstral 2Mistral$0.40$2.00256KCoding/dev agent specialized
DeepSeek V4 ProDeepSeek$0.435$0.871MPermanent pricing; was promo through May 31
Mistral Large 3Mistral$0.50$1.50262KCorrected from $2/$6 (Large 2 price)
Llama 3.3 70BGroq$0.59$0.79128KDense 70B; strong instruction following
Claude Haiku 4.5Anthropic$1.00$5.00200KCheapest active Anthropic model
Grok Build 0.1xAI$1.00$2.00256KCoding/agentic; public beta Jun 2026
o4-miniOpenAI$1.10$4.40200KCheapest dedicated reasoning model
Grok 4.3xAI$1.25$2.501MDown from $3/$15
Grok 4.20xAI$1.25$2.501MReasoning variant; same price as 4.3
Gemini 2.5 ProGoogle$1.25$10.001M$2.50/$15 above 200K tokens
GPT-5OpenAI$1.25$10.00128K
Gemini 3.5 FlashGoogle$1.50$9.001MFree tier
Mistral Medium 3.5Mistral$1.50$7.50128KBalanced mid-range
GPT-5.2OpenAI$1.75$14.00128K
o3OpenAI$2.00$8.00200K87% cut from o1; reasoning model
GPT-4.1OpenAI$2.00$8.001M
Gemini 3.1 Pro PreviewGoogle$2.00$12.001M$4.00/$18.00 above 200K tokens
GPT-5.4OpenAI$2.50$15.001M
Claude Sonnet 4.6Anthropic$3.00$15.001M
Claude Opus 4.7Anthropic$5.00$25.001MSuperseded by Opus 4.8 for new projects
Claude Opus 4.8Anthropic$5.00$25.001MFast Mode at $10/$50; new Anthropic flagship
GPT-5.5OpenAI$5.00$30.00128KOpenAI flagship
GPT-5.4 Pro / GPT-5.5 ProOpenAI$30.00$180.001MUltra-premium research tier

For benchmark context behind these prices, see the cost-efficiency leaderboard.

A calculator and pen on a notebook, representing the cost math behind API pricing decisions Working out real API costs requires more than reading the headline price - caching, batching, context surcharges, and tokenizer differences all shift the final number. Source: unsplash.com

June 2026 Changes

Five standout changes since the May 25 update.

Claude Opus 4.8 launches (May 28, 2026) - Claude Opus 4.8 carries the same $5/$25 base price as Opus 4.7 but cuts Fast Mode costs from $30/$150 to $10/$50. That's a one-third cost for latency-sensitive workloads compared to running Opus 4.7 in Fast Mode. The tool use system prompt overhead also dropped to 290 tokens (from 675 for Opus 4.7 in auto mode), so agentic pipelines compound the savings on every turn.

DeepSeek V4 Pro permanent pricing (Jun 1, 2026) - The 75% promotional discount didn't expire. DeepSeek made $0.435/$0.87 the new standard list price. The original $1.74/$3.48 rate is a dead reference point. V4 Pro cache hits sit at $0.003625/MTok, and the 1M context window runs at standard pricing with no surcharge - competitive with anything at this price point.

Grok Build 0.1 public beta - xAI released Grok Build 0.1 at $1.00/$2.00 per MTok with 256K context. It targets agentic coding with native MCP tool support and runs at 100+ tokens/second. At $1/$2, it's cheaper on input than Claude Haiku 4.5 and sits near Devstral 2 ($0.40/$2.00) on output. If your workload involves agentic code generation with tool calls, both are worth running a real-workload comparison.

Mistral Large 3 table correction - Our May 25 table had Mistral Large 3 at $2.00/$6.00. That was Mistral Large 2's price. Mistral Large 3 (the 2512 December release) has been $0.50/$1.50 since launch, with a 262K context window. We also corrected Mistral Small 4 to $0.10/$0.30 - the $0.15/$0.60 figure in the May table was stale. Both corrections were errors in sourcing, not price cuts.

New Ministral 3 family - Mistral's Ministral 3 lineup now has three active sizes. The legacy Ministral 3B at $0.04 is still accessible via legacy endpoint. The current 2512 generation runs at $0.10/$0.10 (3B), $0.15/$0.15 (8B), and $0.20/$0.20 (14B). The 8B and 14B support 256K context with optional reasoning mode via the same API endpoint.

Hidden Costs

Claude Opus 4.8 Fast Mode

Fast Mode for Opus 4.8 costs $10/$50 - versus $30/$150 for Opus 4.6/4.7. At standard pricing all Opus 4.x models cost $5/$25, so Fast Mode on 4.8 is a 2x markup while on 4.7 it's a 6x markup on input. If your application already routes through Opus 4.7 in Fast Mode and performance is comparable on 4.8, the cost difference is meaningful at scale. Anthropic hasn't marked Fast Mode as generally available (it's "research preview") but the pricing asymmetry between 4.7 and 4.8 already justifies testing.

GPT-5.5 Output Cost Creep

GPT-5.5 input ($5/MTok) looks identical to Claude Opus 4.8. The output ($30/MTok) is 20% higher than Opus 4.8's $25/MTok. For reasoning-heavy or document-generation workloads that produce long outputs, that 20% gap compounds across millions of output tokens. Run a real-workload estimate before migrating from GPT-5.4 to GPT-5.5.

Rate Limits and Spend Tiers

OpenAI gates throughput by spend tier. New accounts cap at 500 RPM on frontier models; Tier 4 gets 10,000 RPM. Anthropic uses a four-tier structure. DeepSeek V4 Flash has no published tiers but queues under high load - latency spikes are common during peak hours on the free-tier endpoints.

Batch API Discounts

OpenAI, Anthropic, Google, and xAI all offer 50% off async batch processing with 24-hour SLAs. Groq offers 50% off batch jobs with a 24-hour to 7-day window. DeepSeek's automatic prompt caching at $0.0028 per cache-hit MTok competes with formal batch discounts without requiring a separate API endpoint. Gemini 3.5 Flash batch tier at $0.75/$4.50 is a clean 50% reduction.

Prompt Caching

Cache hit pricing across major providers:

  • DeepSeek V4 Flash: $0.0028/MTok (98% off standard input)
  • DeepSeek V4 Pro: $0.003625/MTok (99.2% off list price)
  • Anthropic: 10% of standard input ($0.50/MTok for Opus 4.8/4.7, $0.30/MTok for Sonnet 4.6)
  • OpenAI: 10% of standard input (automatic, no setup)
  • Google: 10% of standard input, plus storage fees ($0.15-$1.00/1M tokens/hour)
  • xAI: 10% of standard input for Grok 4.3 and Grok 4.20
  • Groq: 50% off cached input tokens

Electronic shelf labels showing real-time price tags in a retail setting LLM API prices continued moving in June 2026 - Opus 4.8 Fast Mode drops sharply, DeepSeek V4 Pro goes permanent, and Mistral's table gets a long-overdue correction. Source: commons.wikimedia.org

Context Window Surcharges

Anthropic Opus 4.8/4.7 and Sonnet 4.6 include full 1M context at standard rates with no surcharge. Gemini 3.1 Pro doubles input pricing above 200K tokens ($2 becomes $4, output goes $12 to $18). Gemini 2.5 Pro steps similarly ($1.25 to $2.50). OpenAI GPT-5.4 applies 2x input and 1.5x output above 272K tokens.

Grok 4.3 and Grok 4.20 offer 1M context at a flat $1.25/$2.50 - no tiered surcharges. Grok Build 0.1 offers 256K at a flat $1/$2. Mistral Large 3 covers 262K context at standard rates.

Free Tier Comparison

ProviderFree CreditsModels AvailableRate LimitsNotes
Google (Gemini)Unlimited free tierFlash-Lite, 2.5 Flash, 3.5 Flash5-15 RPM, 100-1,000 RPDPro models paid-only
GroqFree tier availableAll hosted modelsVaries by modelNo card required
DeepSeek5M tokens on signupV4 Flash, V4 ProStandard limitsNon-renewable
xAI$25 signup creditsAll Grok modelsStandard limitsNot recurring
OpenAI~$5 trial creditsGPT-4o mini, limited3 RPM (free tier)3-month expiry
Anthropic~$5 trial creditsAll modelsTier 1 limitsFew months expiry
MistralFree tier (limited)Ministral 3B legacyRate-limitedNo card required

Google's free tier is still the most generous. Flash models including Gemini 3.5 Flash are free with manageable rate limits. Groq covers all hosted models on the free tier including Llama and Qwen families. The new Ministral 3 8B and 14B are paid-only from launch.

Price History

  • Jun 1, 2026 - DeepSeek V4 Pro promotional pricing ($0.435/$0.87) becomes permanent. The original list price of $1.74/$3.48 is effectively retired.

  • May 28, 2026 - Claude Opus 4.8 launches at $5/$25 standard. Fast Mode price drops from $30/$150 to $10/$50, a 67% reduction vs Opus 4.7/4.6.

  • May 2026 - DeepSeek V4 Flash arrives on the API at $0.14/$0.28, replacing V3.2 routing. Cache hits at $0.0028.

  • May 2026 - GPT-5.5 launches as OpenAI flagship at $5/$30. Output cost doubles compared to GPT-5.4's $15/MTok.

  • May 2026 - xAI reprices Grok 4.3 and Grok 4.20 to $1.25/$2.50 per MTok from the old Grok 4 rate of $3/$15.

  • May 19, 2026 - Gemini 3.5 Flash launches at $1.50/$9.00 per MTok, batch at $0.75/$4.50.

  • May 7, 2026 - Gemini 3.1 Flash-Lite moves to general availability at $0.25/$1.50 per MTok.

  • Dec 2025 - Mistral Large 3 (2512) launches at $0.50/$1.50, 75% below Mistral Large 2 ($2/$6). Ministral 3-8B and Ministral 3-14B launch at flat $0.15 and $0.20/MTok with 256K context.

  • Apr-May 2026 - Mistral overhauls its lineup. Ministral 3B debuts at $0.04/$0.04. Mistral Small 4 replaces Small 3.x at $0.10/$0.30.

  • Apr 2026 - Claude Opus 4.7 launches at $5/$25 with a tokenizer that uses up to 35% more tokens than Opus 4.6.

  • Feb 2026 - Claude Opus 4.6 launches at $5/$25, a 67% reduction from Opus 4.1's $15/$75. Fast Mode added at $30/$150 for Opus 4.6 and 4.7.

Mistral Large 3 at $0.50/$1.50 with 262K context is the most under-tracked value in the frontier tier. Most pricing tools still show the Large 2 figure - it's worth a second look.

FAQ

Which LLM API is cheapest per million tokens?

Ministral 3B at $0.04/$0.04 via the legacy endpoint is the cheapest standard commercial option. The current 2512 version runs $0.10/$0.10. For production use, DeepSeek V4 Flash at $0.14/$0.28 offers far better capability per dollar than either.

What's the best value LLM API for production?

DeepSeek V4 Flash at $0.14/$0.28 per MTok. V4 architecture, 1M context, and 98% cache discounts at $0.0028 make it the best cost-per-capability option in the table. V4 Pro is now permanently $0.435/$0.87 and worth benchmarking for complex tasks.

Did xAI really cut Grok prices by 83%?

Yes, on output. Grok 4.3 and Grok 4.20 are both $1.25/$2.50 per MTok now, down from the old Grok 4 rate of $3/$15. The input cut is 58%, the output cut is 83%. Both models carry 1M context with no tiered surcharges.

Are there free LLM APIs for development?

Google's Gemini API offers the most generous free tier - Flash models including Gemini 3.5 Flash remain free with rate limits. Groq provides free LPU-accelerated inference on Llama, Qwen, and GPT OSS models. Mistral offers rate-limited free access to the legacy Ministral 3B.

What's new with Claude Opus 4.8 pricing vs 4.7?

Standard pricing is identical at $5/$25. The difference is Fast Mode: Opus 4.8 runs $10/$50 vs $30/$150 for Opus 4.7. If your application uses Fast Mode, Opus 4.8 cuts that cost by two-thirds. Batch pricing is unchanged at $2.50/$12.50 for both.

Is Mistral Large 3 worth using at $0.50/$1.50?

It's a capable December 2025 model with 262K context at a flat rate. The main comparison set is Gemini 3.5 Flash ($1.50/$9.00) and Grok 4.3 ($1.25/$2.50). Mistral Large 3 undercuts both on input and output pricing, though benchmark coverage is thinner than those alternatives.


Sources:

✓ Last verified June 8, 2026

LinkedIn
Reddit
Hacker News
Telegram
LLM API Pricing Comparison - June 2026
About the author AI Benchmarks & Tools Analyst

James is a software engineer turned tech writer who spent six years building backend systems at a fintech startup in Chicago before pivoting to full-time analysis of AI tools and infrastructure.