LLM API Pricing Comparison - March 2026
Side-by-side LLM API pricing for GPT-5.4, Claude Opus 4.6, Gemini 3.1 Pro, DeepSeek V3.2, Grok 4.1, and 20+ models normalized to cost per million tokens.

TL;DR
- DeepSeek V3.2 costs $0.28/$0.42 per million tokens (input/output) and handles most production workloads - the clear value pick
- Cheapest usable option: Mistral Nemo at $0.02/$0.04 per million tokens, though limited to simpler tasks
- GPT-5.4 ($2.50/$15.00) and Claude Opus 4.6 ($5.00/$25.00) remain expensive but lead on reasoning quality
- Every major provider now offers batch API discounts of 50%, and prompt caching can cut input costs by 90%
The Bottom Line
If you need the cheapest input price and your task is straightforward (classification, extraction, simple Q&A), Mistral Nemo at $0.02 per million input tokens is hard to beat. For anything requiring real reasoning ability, DeepSeek V3.2 at $0.28/$0.42 delivers quality that rivals models 10x its price. Production teams building agents or handling complex analysis will still reach for GPT-5.4, Claude Opus 4.6, or Gemini 3.1 Pro - but the gap between premium and budget models is narrowing fast.
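The comparisons above all reduce to the same arithmetic: a request's cost is a weighted sum of the two per-million rates. A minimal Python sketch (prices taken from the pricing table; the function name is our own):

```python
def cost_usd(input_tokens: int, output_tokens: int,
             input_price: float, output_price: float) -> float:
    """Request cost in USD, given per-million-token (MTok) prices."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# 1M tokens in, 1M tokens out:
print(round(cost_usd(1_000_000, 1_000_000, 0.28, 0.42), 2))   # DeepSeek V3.2
print(round(cost_usd(1_000_000, 1_000_000, 2.50, 15.00), 2))  # GPT-5.4
```

At that volume DeepSeek V3.2 comes to $0.70 against GPT-5.4's $17.50, which is the gap the TL;DR is pointing at.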
Full Pricing Table
All prices are per million tokens (MTok) in USD. Sorted by input price, cheapest first. Prices verified against official documentation on March 11, 2026.
| Model | Provider | Input (/1M) | Output (/1M) | Context | Notes |
|---|---|---|---|---|---|
| Mistral Nemo | Mistral | $0.02 | $0.04 | 128K | Lightweight, fast |
| Ministral 3B | Mistral | $0.04 | $0.04 | 128K | Edge/mobile use cases |
| Llama 3.1 8B | Groq | $0.05 | $0.08 | 128K | 840 tok/s on Groq LPU |
| Mistral Small 3.1 | Mistral | $0.10 | $0.30 | 128K | Good for structured tasks |
| Gemini 2.5 Flash-Lite | Google | $0.10 | $0.40 | 1M | Budget option, long context |
| Llama 4 Scout | Groq | $0.11 | $0.34 | 128K | MoE, 17Bx16E |
| GPT-4o mini | OpenAI | $0.15 | $0.60 | 128K | Legacy, still popular |
| Llama 4 Maverick | Groq | $0.20 | $0.60 | 128K | MoE, 17Bx128E |
| Grok 4.1 Fast | xAI | $0.20 | $0.50 | 2M | Largest context window |
| DeepSeek V3.2 | DeepSeek | $0.28 | $0.42 | 128K | Cache hit: $0.028 |
| Qwen3 32B | Groq | $0.29 | $0.59 | 131K | Open-weight, strong multilingual |
| Gemini 2.5 Flash | Google | $0.30 | $2.50 | 1M | Solid mid-tier |
| GPT-4.1 mini | OpenAI | $0.40 | $1.60 | 128K | Newer GPT-4 replacement |
| Mistral Large 3 | Mistral | $0.50 | $1.50 | 128K | Strong multilingual |
| Gemini 3 Flash | Google | $0.50 | $3.00 | 1M | Latest Google Flash |
| Llama 3.3 70B | Groq | $0.59 | $0.79 | 128K | Open-weight workhorse |
| Kimi K2 | Groq | $1.00 | $3.00 | 256K | Moonshot AI flagship |
| o4-mini | OpenAI | $1.10 | $4.40 | 128K | Reasoning model, budget |
| Gemini 2.5 Pro | Google | $1.25 | $10.00 | 1M | Pro tier, long context |
| GPT-5 | OpenAI | $1.25 | $10.00 | 128K | Standard GPT-5 |
| GPT-5.2 | OpenAI | $1.75 | $14.00 | 128K | Improved reasoning |
| GPT-5.3 Chat | OpenAI | $1.75 | $14.00 | 128K | Latest chat-optimized |
| o3 | OpenAI | $2.00 | $8.00 | 200K | Heavy reasoning |
| GPT-4.1 | OpenAI | $2.00 | $8.00 | 1M | Strong code, long context |
| Grok 4.20 | xAI | $2.00 | $6.00 | 256K | Flagship Grok |
| Gemini 3.1 Pro | Google | $2.00 | $12.00 | 1M | Google flagship |
| GPT-5.4 | OpenAI | $2.50 | $15.00 | 272K | OpenAI flagship |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | 200K | Best mid-tier for code |
| Claude Opus 4.6 | Anthropic | $5.00 | $25.00 | 200K | Top reasoning, agentic |
| o1 | OpenAI | $15.00 | $60.00 | 128K | Legacy reasoning |
| GPT-5.4 Pro | OpenAI | $30.00 | $180.00 | 272K | Ultra-premium |
For more on how these models compare on benchmarks, see our overall LLM rankings and the cost-efficiency leaderboard.
Hidden Costs Most Comparisons Miss
Rate Limits and Throttling
Raw token price means nothing if your requests get throttled. OpenAI gates access by spend tier - Tier 1 (new accounts) caps at 500 RPM for GPT-5.4, while Tier 4 gets 10,000 RPM. Anthropic uses a similar system with four tiers. DeepSeek doesn't publish formal tiers but has been known to queue requests during peak hours, adding latency that inflates effective cost for real-time applications.
Batch API Discounts
Every major provider now offers 50% off for batch (async) processing. OpenAI, Anthropic, Google, and xAI all match at exactly 50%. If your workload tolerates 24-hour turnaround, this is free money. DeepSeek's automatic caching achieves similar savings without requiring a separate API endpoint.
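The batch discount is a flat multiplier on the full token cost, so estimating it is a one-liner; a sketch under that assumption (function name and the example workload are ours):

```python
def batch_cost_usd(input_tokens: int, output_tokens: int,
                   input_price: float, output_price: float,
                   discount: float = 0.50) -> float:
    """Cost via a batch API: full per-MTok price times (1 - discount)."""
    full = (input_tokens * input_price + output_tokens * output_price) / 1e6
    return full * (1 - discount)

# A hypothetical nightly job on GPT-5.4 ($2.50/$15.00): 10M in, 2M out.
# Full price would be $55.00; batched it drops to $27.50.
print(batch_cost_usd(10_000_000, 2_000_000, 2.50, 15.00))
```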
Context Window Surcharges
Anthropic charges 2x for input and 1.5x for output when Claude Opus 4.6 or Sonnet 4.6 requests exceed 200K tokens. Google doubles input pricing for Pro models above 200K tokens. OpenAI's GPT-5.4 hits 2x input and 1.5x output beyond 272K tokens. Budget accordingly if you're processing long documents.
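A sketch of how the surcharge bites, assuming the multipliers apply to the entire request once the prompt crosses the threshold (that matches how the providers describe it, but verify the exact trigger in each provider's docs):

```python
def long_context_cost(input_tokens: int, output_tokens: int,
                      input_price: float, output_price: float,
                      threshold: int,
                      in_mult: float = 2.0, out_mult: float = 1.5) -> float:
    """Request cost with a long-context surcharge.

    Assumption: once input_tokens exceeds `threshold`, the whole
    request is billed at the surcharge multipliers.
    """
    if input_tokens > threshold:
        input_price *= in_mult
        output_price *= out_mult
    return (input_tokens * input_price + output_tokens * output_price) / 1e6

# Claude Opus 4.6 ($5/$25) with a 250K-token prompt and 4K output:
# billed at $10/$37.50, i.e. $2.50 + $0.15 = $2.65 for one request.
print(long_context_cost(250_000, 4_000, 5.00, 25.00, threshold=200_000))
```

The same 250K-token prompt below the threshold would cost $1.35, so a single long document can nearly double the bill.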
Prompt Caching
Cache hits cost roughly 10% of base input price across OpenAI, Anthropic, and Google. DeepSeek's automatic caching drops input costs to $0.028 per MTok - effectively 90% off. xAI offers cached input at $0.05 for Grok 4.1 Fast (75% off). For repetitive workloads with shared system prompts, caching alone can cut your bill by 80-90%.
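The blended input price under caching is just a hit-rate-weighted average of the two rates; a sketch (the 80% hit rate is an illustrative assumption, not a measured figure):

```python
def effective_input_price(base_price: float, cache_price: float,
                          hit_rate: float) -> float:
    """Blended per-MTok input price given the fraction of cached input tokens."""
    return hit_rate * cache_price + (1 - hit_rate) * base_price

# DeepSeek V3.2: $0.28 base, $0.028 on cache hits, 80% of input cached
# -> about $0.078/MTok effective, a ~72% cut on input cost.
print(round(effective_input_price(0.28, 0.028, 0.8), 4))
```

Running the same formula with a 95% hit rate (plausible for a large shared system prompt) lands near $0.04/MTok, which is where the 80-90% savings claim comes from.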
Tool Use Overhead
Anthropic adds 313-346 tokens per request when tools are enabled. OpenAI's function calling consumes tokens for the schema definition. These add up in agentic workflows making hundreds of tool calls per session. Anthropic's web search costs $10 per 1,000 searches on top of token costs.
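A rough per-session estimate using the figures above (330 is our midpoint of the 313-346 range, and 200 calls per session is a hypothetical agentic workload, not a benchmark):

```python
def tool_overhead_usd(calls: int, input_price: float,
                      overhead_per_call: int = 330) -> float:
    """Estimated extra input cost from tool-definition tokens per session."""
    return calls * overhead_per_call * input_price / 1e6

# 200 tool calls per session on Claude Opus 4.6 ($5.00/MTok input):
# 66,000 overhead tokens, about $0.33 per session before any real work.
print(tool_overhead_usd(200, 5.00))
```

Small per-request overheads like this are easy to ignore at chat scale and impossible to ignore once an agent loops.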
Free Tier Comparison
| Provider | Free Credits | Models Available | Rate Limits | Expiration |
|---|---|---|---|---|
| Google (Gemini) | Unlimited free tier | Flash-Lite, Flash, 2.5 Pro | 5-15 RPM, 100-1,000 RPD | None |
| Groq | Free tier available | All hosted models | Varies by model | None |
| DeepSeek | 5M tokens on signup | All models | Standard limits | No expiration stated |
| xAI | $25 signup credits | All Grok models | Standard limits | Not specified |
| OpenAI | ~$5 trial credits | GPT-4o mini, GPT-3.5 | 3 RPM (free tier) | 3 months |
| Anthropic | ~$5 trial credits | All models | Tier 1 limits | Few months |
| Mistral | Free tier (some models) | Nemo, Small | Limited RPM | None |
Google's free tier stands out by a wide margin. You get access to Gemini 2.5 Pro for zero cost, with rate limits that are workable for prototyping. Groq's free tier is similarly generous, letting you test Llama 4 models at hardware-accelerated speeds. The rest offer small one-time credit allocations that burn through quickly during development.
Price History
Feb 2026 - Anthropic released Claude Opus 4.6 at $5/$25, a 67% cut from Opus 4.1's $15/$75 pricing.
Feb 2026 - OpenAI launched GPT-5.4 at $2.50/$15.00 and GPT-5.4 Pro at $30/$180.
Jan 2026 - xAI released Grok 4.20 at $2.00/$6.00 with multi-agent capabilities.
Dec 2025 - Google quietly reduced Gemini free tier rate limits by 50-80% across most models.
Sep 2025 - DeepSeek unified V3.2 pricing at $0.28/$0.42, roughly halving V3.1's input rate and cutting output by 75% (from $0.60/$1.70).
Jul 2025 - OpenAI launched o4-mini at $1.10/$4.40, making reasoning models affordable for the first time.
The trend is clear: flagship model prices are dropping 40-60% per generation while capabilities improve. DeepSeek and the open-weight ecosystem are putting sustained downward pressure on the entire market. If you locked in pricing assumptions six months ago, they're likely already stale.
FAQ
Which LLM API is cheapest per million tokens?
Mistral Nemo at $0.02/$0.04 per MTok is the cheapest commercial API. For cache-heavy workloads, DeepSeek V3.2's cache hit price of $0.028 input is competitive even with smaller models.
What's the best value LLM API for production?
DeepSeek V3.2 at $0.28/$0.42 per MTok. It scores within 5% of GPT-5 on most benchmarks at roughly one-fifth the price, and its automatic caching drops effective costs further.
Are there free LLM APIs for development?
Google's Gemini API offers the most generous free tier with no credit card required and access to 2.5 Pro. Groq also provides free access to Llama 4 and other open-weight models.
How much does it cost to process 1 million tokens?
Depends on the model. At the low end, $0.06 total (Mistral Nemo). Mid-range: $0.70 (DeepSeek V3.2). Premium: $17.50 (GPT-5.4) to $30.00 (Claude Opus 4.6). Most production apps process 10-50M tokens per day.
Do batch APIs really save 50%?
Yes. OpenAI, Anthropic, Google, and xAI all offer exactly 50% off for async batch processing with 24-hour SLAs. Combined with prompt caching, total savings can reach 90%+ on repetitive workloads.
How do reasoning model costs compare?
OpenAI's o4-mini ($1.10/$4.40) is the cheapest dedicated reasoning model. o3 costs $2.00/$8.00. The premium o1 at $15/$60 is hard to justify unless you need its specific capabilities. Claude Opus 4.6 offers strong reasoning at $5/$25 without a separate "reasoning" tier.
Last verified March 11, 2026
