LLM API Pricing Comparison - March 2026
Side-by-side LLM API pricing for GPT-5.4, Claude Opus 4.6, Gemini 3.1 Pro, DeepSeek V3.2, Grok 4, and 35+ models normalized to cost per million tokens.

TL;DR
- Mistral Small 3.1 is $0.03/$0.11 - our March 16 table had incorrect pricing ($0.10/$0.30 reflects a different Small variant, not the base 24B model)
- Llama 4 Maverick was deprecated Feb 20 on Groq, replaced by GPT OSS 120B at $0.15/$0.60
- GPT-5 Nano context window corrected to 400K (was listed as 128K); DeepSeek V4 expected on API soon at ~$0.30/$0.50 with 1M context
- Anthropic added Fast Mode for Claude Opus 4.6 at $30/$150 per MTok - 6x standard, for latency-critical workloads
The Bottom Line
For raw cheapness, Mistral Nemo at $0.02 per million input tokens is still the floor. The corrected Mistral Small 3.1 pricing ($0.03/$0.11) makes it the second cheapest commercial option by a wide margin - our March 16 table had it wrong due to a naming collision with a different Mistral Small variant.
DeepSeek V3.2 at $0.28/$0.42 remains the best value pick. It scores within 5% of GPT-5 on most benchmarks at roughly one-fifth the price. DeepSeek V4 - which adds multimodal support and a 1M context window at ~$0.30/$0.50 - is expected on the API soon, but as of March 30 the official API endpoint (api.deepseek.com) still serves V3.2 only.
GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro remain the premium tier anchors. None changed prices this period.
Full Pricing Table
All prices are per million tokens (MTok) in USD. Sorted by input price, cheapest first. Prices verified against official documentation on March 30, 2026.
| Model | Provider | Input (/1M) | Output (/1M) | Context | Notes |
|---|---|---|---|---|---|
| Mistral Nemo | Mistral | $0.02 | $0.04 | 128K | Cheapest commercial model |
| Mistral Small 3.1 | Mistral | $0.03 | $0.11 | 128K | Corrected from previous $0.10 |
| GPT-5 Nano | OpenAI | $0.05 | $0.40 | 400K | 400K context; corrected from 128K |
| Llama 3.1 8B | Groq | $0.05 | $0.08 | 128K | 840+ tok/s on LPU |
| Mistral Small 3.2 | Mistral | $0.075 | $0.20 | 128K | New - replaces Small Creative tier |
| GPT OSS 20B | Groq | $0.075 | $0.30 | 128K | Llama-class; no TPM announcements |
| GPT-4.1 Nano | OpenAI | $0.10 | $0.40 | 1M | Classification, routing |
| GPT OSS 120B | Groq | $0.15 | $0.60 | 128K | Replaced Llama 4 Maverick (deprecated Feb 20) |
| GPT-4o mini | OpenAI | $0.15 | $0.60 | 128K | Legacy, still widely used |
| Grok 4.1 Fast | xAI | $0.20 | $0.50 | 2M | Largest context window in budget tier |
| grok-code-fast-1 | xAI | $0.20 | $1.50 | 256K | Code-specialized variant |
| GPT-5 Mini | OpenAI | $0.25 | $2.00 | 128K | Budget GPT-5 generation |
| Claude Haiku 3 | Anthropic | $0.25 | $1.25 | 200K | Budget Anthropic entry point |
| Gemini 3.1 Flash-Lite | Google | $0.25 | $1.50 | 1M | Preview; replaces 2.5 Flash-Lite tier |
| DeepSeek V3.2 | DeepSeek | $0.28 | $0.42 | 128K | Cache hit: $0.028 |
| Qwen3 32B | Groq | $0.29 | $0.59 | 131K | Open-weight, multilingual |
| Codestral 2508 | Mistral | $0.30 | $0.90 | 262K | Code-optimized; Aug 2025 release |
| Gemini 2.5 Flash | Google | $0.30 | $2.50 | 1M | Reliable mid-tier |
| GPT-4.1 mini | OpenAI | $0.40 | $1.60 | 1M | Mid-range, 1M context |
| Claude Haiku 3.5 | Anthropic | $0.80 | $4.00 | 200K | Improved Haiku generation |
| Claude Haiku 4.5 | Anthropic | $1.00 | $5.00 | 200K | Latest Anthropic budget tier |
| Kimi K2-0905 | Groq | $1.00 | $3.00 | 256K | MoonShot AI flagship |
| o4-mini | OpenAI | $1.10 | $4.40 | 200K | Cheapest dedicated reasoning model |
| GPT-5 | OpenAI | $1.25 | $10.00 | 128K | Standard GPT-5 |
| Gemini 2.5 Pro | Google | $1.25 | $10.00 | 2M | Massive context, strong benchmarks |
| GPT-5.2 | OpenAI | $1.75 | $14.00 | 128K | Improved reasoning over GPT-5 |
| o3 | OpenAI | $2.00 | $8.00 | 200K | Heavy reasoning tasks |
| GPT-4.1 | OpenAI | $2.00 | $8.00 | 1M | Strong code, 1M context |
| Grok 4.20 | xAI | $2.00 | $6.00 | 2M | Multi-agent beta |
| Gemini 3.1 Pro | Google | $2.00 | $12.00 | 1M | 2x input pricing above 200K |
| GPT-5.4 | OpenAI | $2.50 | $15.00 | 1.1M | OpenAI flagship; 1.1M context |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | 1M | Best mid-tier for code and agents |
| Grok 4 | xAI | $3.00 | $15.00 | 256K | xAI flagship reasoning |
| Claude Opus 4.6 | Anthropic | $5.00 | $25.00 | 1M | Top reasoning; Fast Mode available |
| GPT-5.4 Pro | OpenAI | $30.00 | $180.00 | 1.1M | Ultra-premium |
For benchmark rankings behind these numbers, see the overall LLM leaderboard and our cost-efficiency leaderboard.
Working out real API costs requires more than reading the headline price - caching, batching, context surcharges, and tool overhead all shift the final number.
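The baseline, before any of those adjustments, is straightforward to script. Below is a minimal Python sketch using list prices from the table; the model names, prices, and traffic numbers (3K-token prompts, 500-token completions, 100K requests per month) are illustrative assumptions, not a recommendation.

```python
# Rough per-request cost estimator. Prices are USD per million tokens (MTok),
# taken from the table above; treat them as illustrative, not authoritative.
PRICES = {
    "deepseek-v3.2":   {"input": 0.28, "output": 0.42},
    "gpt-5.4":         {"input": 2.50, "output": 15.00},
    "claude-opus-4.6": {"input": 5.00, "output": 25.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a single request at list prices (no caching or batching)."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example: a 3K-token prompt with a 500-token completion, 100K requests/month.
for model in PRICES:
    monthly = request_cost(model, 3_000, 500) * 100_000
    print(f"{model}: ${monthly:,.2f}/month")
```

The sections below cover the adjustments that move these numbers up or down in practice.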
Corrections from March 16
Two corrections in this update:
Mistral Small 3.1 was listed at $0.10/$0.30 in our March 16 table. That pricing reflects Mistral's "Small Creative" variant - a fine-tuned flavor of the Small architecture. The base Mistral Small 3.1 24B has been priced at $0.03/$0.11 since its March 2025 launch. Anyone budgeting for the base model at $0.10 was overestimating input costs by more than 3x.
GPT-5 Nano context window was listed as 128K. Official documentation confirms 400K tokens, released at that spec in August 2025. At $0.05/$0.40, it now offers the largest context window of any sub-$0.10/MTok model.
Pricing data on AI model APIs changes as frequently as retail shelf labels - the number you saw last week may already be wrong.
Hidden Costs Most Comparisons Miss
Rate Limits and Throttling
OpenAI gates access by spend tier - Tier 1 (new accounts) caps at 500 RPM for GPT-5.4, while Tier 4 gets 10,000 RPM. Anthropic uses a similar four-tier system. DeepSeek doesn't publish formal tiers but has queued requests during peak hours, adding latency that inflates effective cost for real-time applications.
Starting March 31, 2026, OpenAI bills container usage per 20-minute session for certain API features. Budget accordingly if you're using code execution or stateful containers.
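The RPM caps matter for cost planning because they put a hard ceiling on throughput. A back-of-the-envelope sketch of how long a fixed workload takes under a requests-per-minute limit, using the tier figures quoted above and a hypothetical job size:

```python
# How long does a fixed workload take under an RPM cap? Purely arithmetic;
# the request count is hypothetical, the RPM values are the tiers quoted above.
def hours_to_drain(total_requests: int, rpm_cap: int) -> float:
    """Wall-clock hours to push total_requests through a requests-per-minute cap."""
    return total_requests / rpm_cap / 60

jobs = 2_000_000  # hypothetical nightly workload
for tier, rpm in [("Tier 1", 500), ("Tier 4", 10_000)]:
    print(f"{tier} ({rpm} RPM): {hours_to_drain(jobs, rpm):.1f} hours")
# Tier 1 (500 RPM): 66.7 hours -- the job physically cannot finish overnight,
# regardless of what the per-token price says.
```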
Batch API Discounts
OpenAI, Anthropic, Google, and xAI all offer 50% off for async batch processing. Groq offers 50% off input tokens on cache hits (not a formal batch API - caching is automatic). DeepSeek's cache hit pricing achieves similar savings without a separate endpoint. If your workload tolerates 24-hour turnaround, the standard batch APIs cut your bill in half immediately.
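On OpenAI, for example, the flow is: write one JSON object per request to a JSONL file, upload it, then create a batch with a 24-hour completion window. A minimal sketch assuming the official `openai` Python SDK; the model name and prompts are placeholders.

```python
import json
from openai import OpenAI  # assumes the official openai SDK is installed

client = OpenAI()

# 1. One JSON object per request, each with a unique custom_id.
with open("batch_input.jsonl", "w") as f:
    for i, prompt in enumerate(["Summarize doc A", "Summarize doc B"]):
        f.write(json.dumps({
            "custom_id": f"req-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {"model": "gpt-5-mini",  # placeholder model name
                     "messages": [{"role": "user", "content": prompt}]},
        }) + "\n")

# 2. Upload the file, then create the batch. Billed at roughly 50% of sync rates.
batch_file = client.files.create(file=open("batch_input.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(batch.id, batch.status)
```

Anthropic, Google, and xAI each have their own submission flow; the 50% discount is the common thread.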
Fast Mode and High-Performance Tiers
Anthropic introduced Fast Mode for Claude Opus 4.6 - essentially a latency-optimized inference tier running at $30/$150 per MTok, 6x the standard rate. It's not available with the Batch API and stacks with other pricing modifiers. For pipelines where response speed matters more than cost, it's an option. For most production teams, it isn't.
OpenAI's o1-pro at $150/$600 per MTok sits at the same extreme end of the market. These ultra-premium tiers exist; just don't confuse them with the standard product.
Context Window Surcharges
Anthropic includes the full 1M token context window for Opus 4.6 and Sonnet 4.6 at standard rates - no surcharge regardless of request size. Earlier Claude 4 models (Sonnet 4, Sonnet 4.5) apply 2x input / 1.5x output pricing above 200K tokens, but only when a specific beta header is used to unlock the extended context.
OpenAI's GPT-5.4 and GPT-5.4 Pro apply long-context pricing (2x input, 1.5x output) on requests above 272K tokens. Google doubles input pricing for Gemini 3.1 Pro above 200K tokens.
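Because the surcharge only applies to tokens above the threshold, long-context input cost is piecewise linear rather than a flat 2x. A quick sketch of the math, using the thresholds and rates quoted above:

```python
# Tiered input cost: tokens below the threshold bill at the base rate,
# tokens above it at a multiple (2x for the models discussed above).
def input_cost(tokens: int, base_per_mtok: float, threshold: int,
               multiplier: float = 2.0) -> float:
    below = min(tokens, threshold)
    above = max(tokens - threshold, 0)
    return (below * base_per_mtok + above * base_per_mtok * multiplier) / 1_000_000

# A 500K-token prompt on GPT-5.4 ($2.50/MTok, surcharge above 272K):
print(f"${input_cost(500_000, 2.50, 272_000):.2f}")   # 272K at 1x + 228K at 2x -> $1.82
# The same prompt on Gemini 3.1 Pro ($2.00/MTok, surcharge above 200K):
print(f"${input_cost(500_000, 2.00, 200_000):.2f}")   # $1.60
```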
DeepSeek V3.2 is capped at 128K context. When V4 lands on the API, its 1M context at ~$0.30/$0.50 will remove that constraint at effectively the same price point.
Prompt Caching
Cache hits cost 10% of base input price across OpenAI and Anthropic. DeepSeek's automatic caching drops input to $0.028/MTok for V3.2 and $0.03/MTok for V4 - roughly 90% off. xAI charges $0.75/MTok for Grok 4 cache hits (75% off standard) and $0.05/MTok for Grok 4.1 Fast (75% off). Groq offers 50% off cached input tokens.
For repetitive workloads with shared system prompts or document context, caching cuts effective input costs dramatically. The math changes which model wins on cost.
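The quickest way to see that reshuffle is to blend cache-hit and cache-miss prices by your expected hit rate. A sketch using the rates above; the 80% hit rate is an assumption standing in for a workload with a long shared system prompt and short user turns.

```python
# Effective input price per MTok once prompt caching is factored in.
def effective_input_price(base: float, cached: float, hit_rate: float) -> float:
    return hit_rate * cached + (1 - hit_rate) * base

hit_rate = 0.80  # assumed: long shared system prompt, short user turns
models = {
    "DeepSeek V3.2":   (0.28, 0.028),  # automatic caching
    "GPT-5.4":         (2.50, 0.25),   # cache hits at 10% of base
    "Claude Opus 4.6": (5.00, 0.50),   # cache hits at 10% of base
}
for name, (base, cached) in models.items():
    print(f"{name}: ${effective_input_price(base, cached, hit_rate):.3f}/MTok effective input")
```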
Tool Use Overhead
Anthropic adds 313-346 tokens per request when tools are enabled. OpenAI's function calling consumes tokens for the schema definition. These add up in agentic workflows making hundreds of tool calls per session. Anthropic's web search costs $10 per 1,000 searches on top of token costs; web fetch has no additional charge.
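The overhead is small per request but compounds across an agent session. A rough sketch, using the midpoint of the 313-346 token range above and hypothetical session numbers:

```python
# Token overhead from tool definitions across a long agent session.
TOOL_OVERHEAD_TOKENS = 330    # midpoint of the 313-346 range quoted above
calls_per_session = 200       # hypothetical agent session
input_price_per_mtok = 3.00   # e.g. Claude Sonnet 4.6 input

overhead_tokens = TOOL_OVERHEAD_TOKENS * calls_per_session
overhead_cost = overhead_tokens * input_price_per_mtok / 1_000_000
print(f"{overhead_tokens:,} extra input tokens = ${overhead_cost:.3f} per session")
# At 1,000 sessions/day that's roughly $200/day of pure schema overhead --
# invisible per request, visible on the monthly invoice.
```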
Data Residency
Anthropic now charges a 1.1x multiplier for US-only inference routing via the inference_geo parameter on Opus 4.6 and newer models. Global routing (the default) uses standard pricing.
Free Tier Comparison
| Provider | Free Credits | Models Available | Rate Limits | Expiration |
|---|---|---|---|---|
| Google (Gemini) | Unlimited free tier | Flash-Lite, Flash, 2.5 Pro | 5-15 RPM, 100-1,000 RPD | None |
| Groq | Free tier available | All hosted models | Varies by model | None |
| DeepSeek | 5M tokens on signup | All models | Standard limits | Not stated |
| xAI | $25 signup credits | All Grok models | Standard limits | Not specified |
| OpenAI | ~$5 trial credits | GPT-4o mini, GPT-3.5 | 3 RPM (free tier) | 3 months |
| Anthropic | ~$5 trial credits | All models | Tier 1 limits | Few months |
| Mistral | Free tier (some models) | Nemo, Small | Limited RPM | None |
Google's free tier remains the most generous: Gemini 2.5 Pro at zero cost, with rate limits workable for prototyping. Groq continues offering fast inference on Llama, Qwen, and GPT OSS models with no upfront cost. The new GPT OSS 120B on Groq - which replaced Llama 4 Maverick - is available on the free tier.
Price History
Mar 2026 - Mistral Small 3.1 pricing corrected: the base 24B model has been at $0.03/$0.11 since its March 2025 release. Prior comparisons (including this page on March 16) used the "Small Creative" variant's pricing of $0.10/$0.30 instead.
Mar 2026 - DeepSeek V4 Lite appeared on the DeepSeek web platform; full API availability still pending as of March 30. Expected pricing ~$0.30/$0.50 per MTok with 1M context. The api.deepseek.com endpoint still routes to V3.2.
Mar 2026 - Anthropic added Fast Mode (beta) for Claude Opus 4.6 at $30/$150 per MTok - 6x the standard rate. Available on the Claude API but not the Batch API.
Mar 2026 - xAI released Grok 4 at $3.00/$15.00 (256K context), positioning it as the flagship above the Grok 4.20 beta tier.
Feb 2026 - Groq deprecated Llama 4 Maverick on Feb 20, replacing it with GPT OSS 120B at $0.15/$0.60. Maverick pricing had been $0.20/$0.60; the new model runs at lower input cost.
Feb 2026 - Anthropic released Claude Opus 4.6 at $5/$25, a 67% cut from Opus 4.1's $15/$75 pricing.
Feb 2026 - OpenAI launched GPT-5.4 at $2.50/$15.00 with a 1.1M token context window.
Jan 2026 - xAI released Grok 4.20 at $2.00/$6.00 with multi-agent capabilities and a 2M context window.
Sep 2025 - DeepSeek unified V3.2 pricing at $0.28/$0.42, roughly halving V3.1 rates.
The pattern holds: flagship input prices drop 40-60% per generation while the budget tier fills with models that would have been mid-tier 12 months ago. Mistral Small 3.1's $0.03 input price - corrected from our earlier erroneous figure - puts a solid 24B model within pennies of Nemo. The floor keeps falling.
FAQ
Which LLM API is cheapest per million tokens?
Mistral Nemo at $0.02/$0.04 per MTok is the cheapest commercial option. Mistral Small 3.1 at $0.03/$0.11 is the second cheapest and considerably more capable. For cache-heavy workloads, DeepSeek V3.2's cache-hit price of $0.028/MTok input competes directly, and V4 is expected to match it at $0.03.
What's the best value LLM API for production?
DeepSeek V3.2 at $0.28/$0.42 per MTok. It matches frontier models on most benchmarks at roughly one-fifth the price of GPT-5.4, and automatic caching drops effective input costs to $0.028/MTok. V4 is expected on the API soon at similar prices with larger context.
Are there free LLM APIs for development?
Google's Gemini API offers the most generous free tier - no credit card required, access to 2.5 Pro. Groq provides free inference on Llama, Qwen, and GPT OSS models at hardware-accelerated speeds.
How much does it cost to process 1 million tokens?
Depends entirely on the model. Assuming 1M input tokens plus 1M output tokens: low end, $0.06 (Mistral Nemo); mid-range, $0.70 (DeepSeek V3.2); premium, $17.50 (GPT-5.4) or $30.00 (Claude Opus 4.6). Most production apps process 10-50M tokens per day.
Do batch APIs really save 50%?
Yes. OpenAI, Anthropic, Google, and xAI all offer 50% off for async batch processing with 24-hour SLAs. Combined with prompt caching, total savings can reach 90%+ on repetitive workloads.
How do reasoning model costs compare?
o4-mini at $1.10/$4.40 is the cheapest dedicated reasoning model. o3 costs $2.00/$8.00. Claude Opus 4.6 at $5/$25 handles complex reasoning without a separate reasoning tier. The legacy o1 at $15/$60 is rarely the right choice today.
Last updated
✓ Last verified March 30, 2026
