LLM API Pricing Comparison - April 2026

Current LLM API prices verified April 2026: Mistral Nemo at $0.02/MTok cheapest, DeepSeek V3.2 best value, Claude Opus 4.7 launches with a hidden 35% tokenizer cost increase.


TL;DR

  • Cheapest input: Mistral Nemo at $0.02/MTok - unchanged
  • Claude Opus 4.7 launched April 16 at $5/$25 (same sticker price as 4.6), but a new tokenizer adds up to 35% more tokens for the same text - an effective price increase
  • Claude Haiku 3 retired April 19; Google locked Pro models behind a paywall April 1
  • DeepSeek V4 still isn't on the API - official docs still route to V3.2

The Bottom Line

Mistral Nemo holds the floor at $0.02 input per million tokens. For a step up in quality, the new Mistral Small 3.2 at $0.07/$0.20 is now the cheapest sub-$0.10 commercial model with solid benchmark scores. Our March correction showed Mistral Small 3.1 at $0.03/$0.11 - multiple sources now consistently list that model at $0.10/$0.30, suggesting Mistral quietly normalized pricing between March and April.

DeepSeek V3.2 at $0.28/$0.42 stays the best value pick. DeepSeek V4 was expected on the API weeks ago and still isn't there. The official DeepSeek docs explicitly confirm the API routes to V3.2 with a 128K context limit. V4's 1M context at projected $0.30/$0.50 keeps getting delayed.

The headline this period is Claude Opus 4.7, which Anthropic launched on April 16. The price tag - $5/$25 per MTok - is identical to Opus 4.6. The catch: Opus 4.7 ships with a new tokenizer that uses up to 35% more tokens for the same text. That's not a price cut. At 35% token inflation, a workload that cost $5.00 on Opus 4.6 could run $6.75 on Opus 4.7 for identical input.
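The inflation is easy to fold into a cost model. A minimal sketch (the helper is ours, not an official calculator; 0.35 is Anthropic's stated upper bound, and actual inflation varies by text):

```python
def effective_input_cost(tokens: int, price_per_mtok: float,
                         token_inflation: float = 0.0) -> float:
    """USD cost for `tokens` of input at `price_per_mtok` USD/MTok,
    after a tokenizer inflation factor (0.35 = up to 35% more tokens)."""
    return tokens * (1 + token_inflation) * price_per_mtok / 1_000_000

# The same 1M-token workload, Opus 4.6 vs. Opus 4.7 at the 35% upper bound:
opus_46 = effective_input_cost(1_000_000, 5.00)        # 5.00
opus_47 = effective_input_cost(1_000_000, 5.00, 0.35)  # ~6.75
```

Plug in your own measured inflation rate once you have re-tokenized a sample of production prompts; the 35% figure is a ceiling, not a guarantee.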

Full Pricing Table

All prices in USD per million tokens (MTok). Verified against official documentation April 20, 2026. Sorted by input price, cheapest first.

| Model | Provider | Input (/1M) | Output (/1M) | Context | Notes |
|---|---|---|---|---|---|
| Mistral Nemo | Mistral | $0.02 | $0.04 | 128K | Cheapest commercial option |
| Llama 3.1 8B | Groq | $0.05 | $0.08 | 128K | 840+ tok/s on LPU |
| GPT-5 Nano | OpenAI | $0.05 | $0.40 | 400K | Largest context under $0.10 |
| Mistral Small 3.2 | Mistral | $0.07 | $0.20 | 128K | New April 2026; cheapest Mistral above Nemo |
| GPT OSS 20B | Groq | $0.075 | $0.30 | 128K | Open-weight; LPU-accelerated |
| GPT-4.1 Nano | OpenAI | $0.10 | $0.40 | 1M | Classification, routing tasks |
| Gemini 2.5 Flash-Lite | Google | $0.10 | $0.40 | 1M | Free tier available |
| Mistral Small 3.1 | Mistral | $0.10 | $0.30 | 128K | Price revised up from March |
| Llama 4 Scout | Groq | $0.11 | $0.34 | 128K | New; 17B active / 16E MoE |
| GPT OSS 120B | Groq | $0.15 | $0.60 | 128K | Replaced Llama 4 Maverick |
| GPT-4o mini | OpenAI | $0.15 | $0.60 | 128K | Legacy; still widely deployed |
| Grok 4.1 Fast | xAI | $0.20 | $0.50 | 2M | Largest context window in this tier |
| GPT-5 Mini | OpenAI | $0.25 | $2.00 | 128K | Budget GPT-5 generation |
| Gemini 3.1 Flash-Lite | Google | $0.25 | $1.50 | 1M | Preview; free tier retained |
| DeepSeek V3.2 | DeepSeek | $0.28 | $0.42 | 128K | Cache hit: $0.028; best value pick |
| Qwen3 32B | Groq | $0.29 | $0.59 | 131K | Open-weight; multilingual |
| Gemini 2.5 Flash | Google | $0.30 | $2.50 | 1M | Free tier; solid mid-range |
| GPT-4.1 mini | OpenAI | $0.40 | $1.60 | 1M | Mid-range, 1M context |
| Mistral Medium 3 | Mistral | $0.40 | $2.00 | 128K | Balanced reasoning |
| Gemini 3 Flash Preview | Google | $0.50 | $3.00 | 1M | New; preview pricing |
| Llama 3.3 70B | Groq | $0.59 | $0.79 | 128K | 70B dense; strong instruction following |
| Kimi K2.6 | Moonshot AI | $0.60 | $2.50 | 256K | Down from $1.00/$3.00 for K2-0905 |
| Claude Haiku 3.5 | Anthropic | $0.80 | $4.00 | 200K | Cheapest active Anthropic model |
| Claude Haiku 4.5 | Anthropic | $1.00 | $5.00 | 200K | Current-generation Anthropic budget model |
| o4-mini | OpenAI | $1.10 | $4.40 | 200K | Cheapest dedicated reasoning model |
| Gemini 2.5 Pro | Google | $1.25 | $10.00 | 1M | Free tier ended April 1 |
| GPT-5 | OpenAI | $1.25 | $10.00 | 128K | |
| GPT-5.2 | OpenAI | $1.75 | $14.00 | 128K | |
| o3 | OpenAI | $2.00 | $8.00 | 200K | |
| GPT-4.1 | OpenAI | $2.00 | $8.00 | 1M | |
| Grok 4.20 | xAI | $2.00 | $6.00 | 2M | Multi-agent beta |
| Gemini 3.1 Pro | Google | $2.00 | $12.00 | 1M | $4.00/$18.00 above 200K tokens |
| GPT-5.4 | OpenAI | $2.50 | $15.00 | 1.1M | OpenAI flagship |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | 1M | |
| Grok 4 | xAI | $3.00 | $15.00 | 256K | xAI flagship |
| Claude Opus 4.6 | Anthropic | $5.00 | $25.00 | 1M | Fast Mode available at $30/$150 |
| Claude Opus 4.7 | Anthropic | $5.00 | $25.00 | 1M | New; tokenizer inflates effective cost |
| GPT-5.4 Pro | OpenAI | $30.00 | $180.00 | 1.1M | Ultra-premium tier |

For benchmark rankings behind these numbers, see the cost-efficiency leaderboard.

Working out real API costs requires more than reading the headline price - caching, batching, context surcharges, and tokenizer differences all shift the final number.

April 2026 Changes

Four things moved since our March 30 table:

Claude Opus 4.7 (April 16) - Anthropic's new flagship costs $5/$25 on the sticker, identical to Opus 4.6. Don't let that fool you. Anthropic's own documentation notes the new tokenizer "may use up to 35% more tokens for the same fixed text." Run that math: a 1M-token request on Opus 4.6 becomes up to 1.35M tokens on Opus 4.7 at the same per-token rate. That's $6.75 input instead of $5.00. Opus 4.7 may be worth it on quality grounds, but budget accordingly.

Claude Haiku 3 retired (April 19) - The cheapest Anthropic model ever at $0.25/$1.25 is gone as of today. Haiku 4.5 at $1.00/$5.00 is four times the price. If you were routing lightweight tasks through Haiku 3, you need to reroute now.

Google tightened the free tier (April 1) - Gemini 2.5 Pro is no longer free. Google restricted Pro-tier models behind a payment requirement starting April 1, with mandatory spending caps across all billing tiers. Flash and Flash-Lite remain free, but Pro access now requires a paid account.

Mistral Small 3.2 (April 2026) - A new model at $0.07/$0.20 slots in between Nemo and Small 3.1. Multiple independent sources list it at that price. Mistral Small 3.1, which we reported as corrected to $0.03/$0.11 in March, is now consistently listed at $0.10/$0.30; the cheaper figure may have been a short-lived promotional rate. We're flagging the discrepancy and will keep tracking it.

Hidden Costs

The Opus 4.7 Tokenizer Problem

This deserves its own section. A 35% token inflation means every comparison table showing $5/MTok for Opus 4.7 understates the real cost. If you're migrating an existing Opus 4.6 workload, the safe assumption is a 15-35% cost increase even for identical prompts. Benchmark against your specific prompts before committing production traffic.

Rate Limits and Spend Tiers

OpenAI gates throughput by spend tier. New accounts (Tier 1) cap at 500 RPM on GPT-5.4; Tier 4 gets 10,000 RPM. Anthropic uses a four-tier system with similar structure. DeepSeek queues requests during peak hours - no published tiers, just latency variability.

Batch API Discounts

OpenAI, Anthropic, Google, and xAI all offer 50% off for async batch processing with 24-hour SLAs. Groq offers 50% off batch jobs with a 24-hour to 7-day processing window. DeepSeek's automatic prompt caching delivers comparable savings without a formal batch endpoint. If your workload isn't latency-sensitive, batch mode halves your bill right away.
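At a flat 50% discount the savings math is straightforward. A rough sketch (the helper is illustrative; the example rates are the DeepSeek V3.2 prices from the table):

```python
def batch_cost(input_tok: int, output_tok: int, in_price: float,
               out_price: float, discount: float = 0.50) -> float:
    """USD for an async batch job: full token cost minus the batch discount.
    Prices are per million tokens; discount of 0.50 means 50% off."""
    full = (input_tok * in_price + output_tok * out_price) / 1_000_000
    return full * (1 - discount)

# 10M input / 2M output on DeepSeek V3.2 ($0.28/$0.42):
# full price is $3.64; batched, roughly $1.82
job = batch_cost(10_000_000, 2_000_000, 0.28, 0.42)
```

The same function covers Groq's batch tier; only the SLA differs, not the arithmetic.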

Prompt Caching

Cache hit pricing across the major providers:

  • Anthropic: 10% of standard input rate (cache hits cost $0.50/MTok for Opus 4.7)
  • OpenAI: 10% of standard input rate (automatic, no setup required)
  • Google: 10% of standard input rate, plus storage fees ($1.00/1M tokens/hour for Flash)
  • xAI: 10% of standard input rate for Grok 4 and Grok 4.1 Fast
  • DeepSeek: Automatic caching; cache hits at $0.028/MTok on V3.2 (90% off)
  • Groq: 50% off cached input tokens (less aggressive but no storage fee)

For workloads with shared system prompts or repeated document context, caching can cut effective input costs by 80-90%.
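The blended input rate is just a weighted average of hit and miss prices. A sketch under the rates listed above (function name is ours; the 0.10 default matches the 10%-of-input providers):

```python
def blended_input_price(base_price: float, hit_rate: float,
                        cache_multiplier: float = 0.10) -> float:
    """Effective USD/MTok input price: cache hits billed at
    base_price * cache_multiplier, misses at the full base price."""
    return base_price * (hit_rate * cache_multiplier + (1 - hit_rate))

# Opus 4.7 at $5.00 input with an 80% cache hit rate:
# 5.00 * (0.8 * 0.10 + 0.2) = $1.40/MTok effective (72% off)
effective = blended_input_price(5.00, 0.80)
```

Note this ignores cache-write premiums and storage fees (Anthropic charges extra to write cache entries; Google bills storage hourly), so treat the result as a lower bound on spend.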

Context Window Surcharges

Anthropic's Opus 4.7 and Sonnet 4.6 include the full 1M context at standard rates - no tiered pricing regardless of request size. Gemini 3.1 Pro doubles input pricing above 200K tokens ($2 becomes $4, output goes $12 to $18). Gemini 2.5 Pro similarly steps up above 200K. OpenAI's GPT-5.4 applies 2x input and 1.5x output pricing above 272K tokens.

xAI's Grok 4.1 Fast at 2M context is a genuine outlier - $0.20 input with no surcharge across the full window.
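Tiered context pricing splits a request at the threshold rather than repricing the whole thing. A sketch of the Gemini 3.1 Pro structure described above (helper is ours):

```python
def tiered_input_cost(tokens: int, base_price: float,
                      threshold: int = 200_000, multiplier: float = 2.0) -> float:
    """USD input cost when tokens past `threshold` bill at
    base_price * multiplier (Gemini 3.1 Pro: $2.00 becomes $4.00 above 200K)."""
    below = min(tokens, threshold) * base_price
    above = max(tokens - threshold, 0) * base_price * multiplier
    return (below + above) / 1_000_000

# A 500K-token prompt on Gemini 3.1 Pro:
# $0.40 for the first 200K + $1.20 for the next 300K = $1.60
cost = tiered_input_cost(500_000, 2.00)
```

For GPT-5.4, swap in `threshold=272_000` with the 2x input multiplier; output needs a separate call with `multiplier=1.5` against the output rate.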

Tool Use Overhead

Anthropic adds 313-346 tokens per request when tools are enabled. OpenAI's function calling consumes tokens for schema definitions. These add up in agentic pipelines. Anthropic's web search costs $10 per 1,000 searches on top of token costs.
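Per-request overhead is easy to underestimate until it's multiplied across a pipeline. A quick illustration (numbers from above; ~330 tokens is our midpoint of Anthropic's 313-346 range, and the workload figures are hypothetical):

```python
def tool_overhead_cost(requests: int, overhead_tokens: int,
                       input_price: float) -> float:
    """USD added per run purely from tool-definition tokens,
    at `input_price` USD per million input tokens."""
    return requests * overhead_tokens * input_price / 1_000_000

# 100K agentic requests with ~330 overhead tokens each on Opus 4.7 ($5.00 input):
# 100_000 * 330 * 5.00 / 1e6 = $165 before any actual prompt content
overhead = tool_overhead_cost(100_000, 330, 5.00)
```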

LLM API prices shift regularly - the Opus 4.7 tokenizer change this week shows how a "same price" launch can quietly increase real costs.

Free Tier Comparison

| Provider | Free Credits | Models Available | Rate Limits | Notes |
|---|---|---|---|---|
| Google (Gemini) | Unlimited free tier | Flash-Lite, Flash, 3 Flash Preview | 5-15 RPM, 100-1,000 RPD | Pro models now paid-only (April 1) |
| Groq | Free tier available | All hosted models | Varies by model | No card required |
| DeepSeek | 5M tokens on signup | All models | Standard limits | Expiry unspecified |
| xAI | $25 signup credits | All Grok models | Standard limits | Not recurring |
| OpenAI | ~$5 trial credits | GPT-4o mini, limited | 3 RPM (free tier) | 3-month expiry |
| Anthropic | ~$5 trial credits | All models | Tier 1 limits | Expires after a few months |
| Mistral | Free tier (some models) | Nemo | Limited RPM | No card required |

Google's free tier tightened in April but remains viable for prototyping. Flash models (including the new Gemini 3 Flash Preview) are still zero-cost with manageable rate limits. Groq keeps offering hardware-accelerated inference on Llama, Qwen, and the GPT OSS series without any upfront payment, and the new Llama 4 Scout on Groq is available on the free tier.

Price History

  • Apr 2026 - Claude Opus 4.7 launches at $5/$25 per MTok (same as Opus 4.6), but a new tokenizer uses up to 35% more tokens for equivalent inputs. Effective cost is higher despite the unchanged headline rate.

  • Apr 2026 - Claude Haiku 3 retired April 19. The cheapest Anthropic model in the table drops out. Haiku 3.5 at $0.80/$4.00 is now the cheapest active Anthropic model; Haiku 4.5 at $1.00/$5.00 is the current-generation budget entry point.

  • Apr 2026 - Google restricted Gemini Pro models (2.5 Pro and up) behind a payment requirement starting April 1. Flash and Flash-Lite retain free tiers.

  • Apr 2026 - Mistral Small 3.2 launches at $0.07/$0.20, undercutting Small 3.1. Mistral Nemo still cheaper at $0.02 but Small 3.2 now offers the best quality-per-dollar in the Mistral lineup for sub-$0.10 work.

  • Apr 2026 - Kimi K2.6 replaces K2-0905 at $0.60/$2.50 per MTok, down from $1.00/$3.00. Moonshot AI continues cutting prices as the Kimi K2 line matures.

  • Apr 2026 - Llama 4 Scout (17B active / 16E MoE) added to Groq's lineup at $0.11/$0.34. It's the only Llama 4 architecture currently available on Groq after Maverick's deprecation in February.

  • Mar 2026 - DeepSeek V4 expected on API; as of April 20 the official API still routes to V3.2 with a 128K context limit. No official ETA from DeepSeek.

  • Mar 2026 - Anthropic added Fast Mode for Claude Opus 4.6 at $30/$150 per MTok. Remains exclusive to Opus 4.6 - not available on Opus 4.7 at launch.

  • Feb 2026 - Claude Opus 4.6 launched at $5/$25, a 67% reduction from Opus 4.1's $15/$75.

  • Feb 2026 - Groq deprecated Llama 4 Maverick, replaced by GPT OSS 120B at $0.15/$0.60.

The "same price" model launch is becoming a pricing tactic. Opus 4.7 is the clearest example yet - identical sticker, new tokenizer, quietly higher effective cost.

FAQ

Which LLM API is cheapest per million tokens?

Mistral Nemo at $0.02/$0.04 per MTok is the cheapest commercial option. Mistral Small 3.2 at $0.07/$0.20 is the next step up and substantially more capable. DeepSeek V3.2 cache hits at $0.028 compete directly for workloads with repeated context.

What's the best value LLM API for production?

DeepSeek V3.2 at $0.28/$0.42 per MTok. It matches frontier models on most benchmarks at roughly a ninth of GPT-5.4's input price. Automatic caching drops effective input to $0.028/MTok. DeepSeek V4 is expected to improve on this when it eventually reaches the API.

Did Claude Opus 4.7 change API pricing?

The nominal price is unchanged at $5/$25. But the new tokenizer uses up to 35% more tokens for the same text. A workload that cost $5.00 on Opus 4.6 could cost up to $6.75 on Opus 4.7.

Are there free LLM APIs for development?

Google's Gemini API offers the most generous free tier - Flash and Flash-Lite models remain free after the April 1 tightening. Groq provides free inference on Llama, Qwen, and GPT OSS models at LPU-accelerated speeds.

Do batch APIs actually save 50%?

Yes. OpenAI, Anthropic, Google, xAI, and Groq all offer 50% off for async batch processing. Combined with prompt caching, total savings can reach 90% on repetitive workloads with a 24-hour processing window.

What happened to Google's free tier?

Starting April 1, 2026, Google restricted Gemini Pro models (2.5 Pro and above) to paid accounts only. Gemini Flash, Flash-Lite, and the new Gemini 3 Flash Preview remain available on the free tier with reduced daily quotas.



✓ Last verified April 20, 2026

About the author
AI Benchmarks & Tools Analyst

James is a software engineer turned tech writer who spent six years building backend systems at a fintech startup in Chicago before pivoting to full-time analysis of AI tools and infrastructure.