LLM API Pricing Comparison - March 2026

Side-by-side LLM API pricing for GPT-5.4, Claude Opus 4.6, Gemini 3.1 Pro, DeepSeek V3.2, Grok 4.1, and 20+ models normalized to cost per million tokens.

Cheapest: Mistral Nemo | Best value: DeepSeek V3.2 | Updated weekly

TL;DR

  • DeepSeek V3.2 costs $0.28/$0.42 per million tokens (input/output) and handles most production workloads - the clear value pick
  • Cheapest usable option: Mistral Nemo at $0.02/$0.04 per million tokens, though limited to simpler tasks
  • GPT-5.4 ($2.50/$15.00) and Claude Opus 4.6 ($5.00/$25.00) remain expensive but lead on reasoning quality
  • Every major provider now offers batch API discounts of 50%, and prompt caching can cut input costs by 90%

The Bottom Line

If you need the cheapest input price and your task is straightforward (classification, extraction, simple Q&A), Mistral Nemo at $0.02 per million input tokens is hard to beat. For anything requiring real reasoning ability, DeepSeek V3.2 at $0.28/$0.42 delivers quality that rivals models 10x its price. Production teams building agents or handling complex analysis will still reach for GPT-5.4, Claude Opus 4.6, or Gemini 3.1 Pro - but the gap between premium and budget models is narrowing fast.

Full Pricing Table

All prices are per million tokens (MTok) in USD. Sorted by input price, cheapest first. Prices verified against official documentation on March 11, 2026.

| Model | Provider | Input ($/1M) | Output ($/1M) | Context | Notes |
|---|---|---|---|---|---|
| Mistral Nemo | Mistral | $0.02 | $0.04 | 128K | Lightweight, fast |
| Ministral 3B | Mistral | $0.04 | $0.04 | 128K | Edge/mobile use cases |
| Llama 3.1 8B | Groq | $0.05 | $0.08 | 128K | 840 tok/s on Groq LPU |
| Gemini 2.5 Flash-Lite | Google | $0.10 | $0.40 | 1M | Budget option, long context |
| Mistral Small 3.1 | Mistral | $0.10 | $0.30 | 128K | Good for structured tasks |
| Llama 4 Scout | Groq | $0.11 | $0.34 | 128K | MoE, 17Bx16E |
| GPT-4o mini | OpenAI | $0.15 | $0.60 | 128K | Legacy, still popular |
| Llama 4 Maverick | Groq | $0.20 | $0.60 | 128K | MoE, 17Bx128E |
| Grok 4.1 Fast | xAI | $0.20 | $0.50 | 2M | Largest context window |
| DeepSeek V3.2 | DeepSeek | $0.28 | $0.42 | 128K | Cache hit: $0.028 |
| Qwen3 32B | Groq | $0.29 | $0.59 | 131K | Open-weight, strong multilingual |
| Gemini 2.5 Flash | Google | $0.30 | $2.50 | 1M | Solid mid-tier |
| GPT-4.1 mini | OpenAI | $0.40 | $1.60 | 128K | Newer GPT-4 replacement |
| Mistral Large 3 | Mistral | $0.50 | $1.50 | 128K | Strong multilingual |
| Gemini 3 Flash | Google | $0.50 | $3.00 | 1M | Latest Google Flash |
| Llama 3.3 70B | Groq | $0.59 | $0.79 | 128K | Open-weight workhorse |
| Kimi K2 | Groq | $1.00 | $3.00 | 256K | Moonshot AI flagship |
| o4-mini | OpenAI | $1.10 | $4.40 | 128K | Reasoning model, budget |
| Gemini 2.5 Pro | Google | $1.25 | $10.00 | 1M | Pro tier, long context |
| GPT-5 | OpenAI | $1.25 | $10.00 | 128K | Standard GPT-5 |
| GPT-5.2 | OpenAI | $1.75 | $14.00 | 128K | Improved reasoning |
| GPT-5.3 Chat | OpenAI | $1.75 | $14.00 | 128K | Latest chat-optimized |
| o3 | OpenAI | $2.00 | $8.00 | 200K | Heavy reasoning |
| GPT-4.1 | OpenAI | $2.00 | $8.00 | 1M | Strong code, long context |
| Grok 4.20 | xAI | $2.00 | $6.00 | 256K | Flagship Grok |
| Gemini 3.1 Pro | Google | $2.00 | $12.00 | 1M | Google flagship |
| GPT-5.4 | OpenAI | $2.50 | $15.00 | 272K | OpenAI flagship |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | 200K | Best mid-tier for code |
| Claude Opus 4.6 | Anthropic | $5.00 | $25.00 | 200K | Top reasoning, agentic |
| o1 | OpenAI | $15.00 | $60.00 | 128K | Legacy reasoning |
| GPT-5.4 Pro | OpenAI | $30.00 | $180.00 | 272K | Ultra-premium |
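Per-request cost follows directly from the table: token counts times the per-million rates. A minimal sketch in Python (rates hard-coded from the table; the dictionary keys are informal labels, not official API model identifiers):

```python
# Per-MTok (input, output) rates in USD, copied from the table above.
# Keys are informal labels for this sketch, not official API model IDs.
PRICES = {
    "mistral-nemo": (0.02, 0.04),
    "deepseek-v3.2": (0.28, 0.42),
    "gpt-5.4": (2.50, 15.00),
    "claude-opus-4.6": (5.00, 25.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return USD cost for one request at the listed per-MTok rates."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Example: a 10K-token-in / 1K-token-out request on DeepSeek V3.2
print(round(request_cost("deepseek-v3.2", 10_000, 1_000), 6))  # 0.00322
```

The same request on Claude Opus 4.6 would run about $0.075 — a 23x spread for identical traffic, which is why model choice dominates every other cost lever.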

For more on how these models compare on benchmarks, see our overall LLM rankings and the cost-efficiency leaderboard.

Hidden Costs Most Comparisons Miss

Rate Limits and Throttling

Raw token price means nothing if your requests get throttled. OpenAI gates access by spend tier - Tier 1 (new accounts) caps at 500 RPM for GPT-5.4, while Tier 4 gets 10,000 RPM. Anthropic uses a similar system with four tiers. DeepSeek doesn't publish formal tiers but has been known to queue requests during peak hours, adding latency that inflates effective cost for real-time applications.
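Rate limits translate into a hard throughput ceiling regardless of price. A back-of-envelope sketch using the RPM figures above (the average request size is an assumption you would replace with your own traffic profile):

```python
def daily_token_ceiling(rpm: int, avg_tokens_per_request: int) -> int:
    """Upper bound on tokens/day if every minute is fully utilized.

    avg_tokens_per_request is an illustrative assumption, not a
    provider-published figure.
    """
    return rpm * 60 * 24 * avg_tokens_per_request

# GPT-5.4 tiers from the paragraph above, assuming ~2K tokens/request
print(daily_token_ceiling(500, 2_000))     # Tier 1: 1,440,000,000
print(daily_token_ceiling(10_000, 2_000))  # Tier 4
```

Even Tier 1's theoretical ceiling is high, but bursty real-world traffic rarely spreads evenly across minutes, so throttling bites well before these numbers.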

Batch API Discounts

Every major provider now offers 50% off for batch (async) processing. OpenAI, Anthropic, Google, and xAI all match at exactly 50%. If your workload tolerates 24-hour turnaround, this is free money. DeepSeek's automatic caching achieves similar savings without requiring a separate API endpoint.
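The batch discount is simple arithmetic, but it compounds quickly at scale. A sketch comparing synchronous and batch cost for a month of traffic (rates taken from the pricing table; traffic volume is illustrative):

```python
def monthly_cost(mtok_in: float, mtok_out: float,
                 in_rate: float, out_rate: float,
                 batch: bool = False) -> float:
    """USD cost for a month's traffic, in millions of tokens.

    batch=True applies the flat 50% async discount described above.
    """
    cost = mtok_in * in_rate + mtok_out * out_rate
    return cost * 0.5 if batch else cost

# Illustrative: 100 MTok in / 20 MTok out on GPT-5.4 ($2.50/$15.00)
print(monthly_cost(100, 20, 2.50, 15.00))              # 550.0 sync
print(monthly_cost(100, 20, 2.50, 15.00, batch=True))  # 275.0 batch
```

For any pipeline that already runs on a schedule (nightly ETL, bulk classification, evals), routing through the batch endpoint is the single easiest line item to cut.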

Context Window Surcharges

Anthropic charges 2x for input and 1.5x for output when Claude Opus 4.6 or Sonnet 4.6 requests exceed 200K tokens. Google doubles input pricing for Pro models above 200K tokens. OpenAI's GPT-5.4 hits 2x input and 1.5x output beyond 272K tokens. Budget accordingly if you're processing long documents.
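The surcharge math is worth modeling before committing to long-document workloads. A sketch assuming the whole request is billed at the premium rate once input crosses the threshold (verify per provider: some may bill only the marginal tokens above it):

```python
def long_context_cost(input_tokens: int, output_tokens: int,
                      in_rate: float, out_rate: float,
                      threshold: int = 200_000,
                      in_mult: float = 2.0, out_mult: float = 1.5) -> float:
    """USD cost with a long-context surcharge.

    Assumes whole-request premium billing above the threshold; the
    multipliers default to the 2x input / 1.5x output figures above.
    """
    over = input_tokens > threshold
    eff_in = in_rate * (in_mult if over else 1.0)
    eff_out = out_rate * (out_mult if over else 1.0)
    return (input_tokens * eff_in + output_tokens * eff_out) / 1_000_000

# Claude Opus 4.6 ($5/$25): a 250K-token input with 4K output
print(round(long_context_cost(250_000, 4_000, 5.00, 25.00), 4))  # 2.65
```

Note the cliff: the same request at 199K input tokens would cost about half as much, so chunking documents to stay under the threshold can pay for itself.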

Prompt Caching

Cache hits cost roughly 10% of base input price across OpenAI, Anthropic, and Google. DeepSeek's automatic caching drops input costs to $0.028 per MTok - effectively 90% off. xAI offers cached input at $0.05 for Grok 4.1 Fast (75% off). For repetitive workloads with shared system prompts, caching alone can cut your bill by 80-90%.
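The blended input rate depends on what share of your tokens hit the cache. A sketch using the ~10%-of-base cache-hit pricing cited above (the hit ratio is workload-specific, not a provider figure):

```python
def effective_input_rate(base_rate: float, cache_hit_ratio: float,
                         cache_multiplier: float = 0.10) -> float:
    """Blended per-MTok input rate given the share of input tokens
    served from cache. cache_multiplier=0.10 reflects the ~10%-of-base
    cache-hit pricing described above."""
    return base_rate * (cache_hit_ratio * cache_multiplier
                        + (1 - cache_hit_ratio))

# DeepSeek V3.2 ($0.28 base) with 80% of input tokens cached
print(round(effective_input_rate(0.28, 0.80), 4))  # 0.0784
```

At a 100% hit ratio this reproduces DeepSeek's published $0.028 cache-hit price; in practice, workloads with long shared system prompts and short user turns get closest to that ceiling.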

Tool Use Overhead

Anthropic adds 313-346 tokens per request when tools are enabled. OpenAI's function calling consumes tokens for the schema definition. These add up in agentic workflows making hundreds of tool calls per session. Anthropic's web search costs $10 per 1,000 searches on top of token costs.
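For agentic workloads, this per-request overhead is easy to estimate. A sketch using the ~330-token midpoint of Anthropic's range above (call count and rate are illustrative):

```python
def tool_overhead_cost(calls: int, overhead_tokens: int,
                       in_rate: float) -> float:
    """USD cost of tool-schema tokens billed on every request
    across a session of `calls` tool invocations."""
    return calls * overhead_tokens * in_rate / 1_000_000

# 500 tool calls/session, ~330 overhead tokens each,
# on Claude Opus 4.6 ($5.00/MTok input)
print(round(tool_overhead_cost(500, 330, 5.00), 4))  # 0.825
```

Under a dollar per session sounds small until you multiply by thousands of daily sessions — and it scales with input price, so the overhead is 25x larger on Opus than on DeepSeek for identical tool schemas.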

Free Tier Comparison

| Provider | Free Credits | Models Available | Rate Limits | Expiration |
|---|---|---|---|---|
| Google (Gemini) | Unlimited free tier | Flash-Lite, Flash, 2.5 Pro | 5-15 RPM, 100-1,000 RPD | None |
| Groq | Free tier available | All hosted models | Varies by model | None |
| DeepSeek | 5M tokens on signup | All models | Standard limits | No expiration stated |
| xAI | $25 signup credits | All Grok models | Standard limits | Not specified |
| OpenAI | ~$5 trial credits | GPT-4o mini, GPT-3.5 | 3 RPM (free tier) | 3 months |
| Anthropic | ~$5 trial credits | All models | Tier 1 limits | Few months |
| Mistral | Free tier (some models) | Nemo, Small | Limited RPM | None |

Google's free tier stands out by a wide margin. You get access to Gemini 2.5 Pro for zero cost, with rate limits that are workable for prototyping. Groq's free tier is similarly generous, letting you test Llama 4 models at hardware-accelerated speeds. The rest offer small one-time credit allocations that burn through quickly during development.

Price History

  • Feb 2026 - Anthropic released Claude Opus 4.6 at $5/$25, a 67% cut from Opus 4.1's $15/$75 pricing.

  • Feb 2026 - OpenAI launched GPT-5.4 at $2.50/$15.00 and GPT-5.4 Pro at $30/$180.

  • Jan 2026 - xAI released Grok 4.20 at $2.00/$6.00 with multi-agent capabilities.

  • Dec 2025 - Google quietly reduced Gemini free tier rate limits by 50-80% across most models.

  • Sep 2025 - DeepSeek unified V3.2 pricing at $0.28/$0.42, cutting V3.1's $0.60/$1.70 rates by roughly half on input and three-quarters on output.

  • Jul 2025 - OpenAI launched o4-mini at $1.10/$4.40, making reasoning models affordable for the first time.

The trend is clear: flagship model prices are dropping 40-60% per generation while capabilities improve. DeepSeek and the open-weight ecosystem are putting sustained downward pressure on the entire market. If you locked in pricing assumptions six months ago, they're likely already stale.

FAQ

Which LLM API is cheapest per million tokens?

Mistral Nemo at $0.02/$0.04 per MTok is the cheapest commercial API. For cache-heavy workloads, DeepSeek V3.2's cache hit price of $0.028 input is competitive even with smaller models.

What's the best value LLM API for production?

DeepSeek V3.2 at $0.28/$0.42 per MTok. It scores within 5% of GPT-5 on most benchmarks at roughly one-fifth the price, and its automatic caching drops effective costs further.

Are there free LLM APIs for development?

Google's Gemini API offers the most generous free tier with no credit card required and access to 2.5 Pro. Groq also provides free access to Llama 4 and other open-weight models.

How much does it cost to process 1 million tokens?

Depends on the model. At the low end, $0.06 total (Mistral Nemo). Mid-range: $0.70 (DeepSeek V3.2). Premium: $17.50 (GPT-5.4) to $30.00 (Claude Opus 4.6). Most production apps process 10-50M tokens per day.

Do batch APIs really save 50%?

Yes. OpenAI, Anthropic, Google, and xAI all offer exactly 50% off for async batch processing with 24-hour SLAs. Combined with prompt caching, total savings can reach 90%+ on repetitive workloads.

How do reasoning model costs compare?

OpenAI's o4-mini ($1.10/$4.40) is the cheapest dedicated reasoning model. o3 costs $2.00/$8.00. The legacy o1 at $15/$60 is hard to justify unless you need its specific capabilities. Claude Opus 4.6 offers strong reasoning at $5/$25 without a separate "reasoning" tier.


About the author: James, AI Benchmarks & Tools Analyst

James is a software engineer turned tech writer who spent six years building backend systems at a fintech startup in Chicago before pivoting to full-time analysis of AI tools and infrastructure.