AI API Pricing Q2 2026: What Dropped and What Didn't

TL;DR

DeepSeek V4 finally reached the official API in late April - V4-Flash at $0.14/M input is now the best value flagship-tier model
DeepSeek V4-Pro runs at 75% off until May 31 ($0.435/M input during promo); after that date, the "official" price becomes $1.74/M
Overall AI API costs are down 60-80% from Q2 2025 - what cost $5/M a year ago now costs under $1/M in the same quality tier
Anthropic's Opus 4.7 tokenizer change added up to 35% effective cost on workloads that moved from Opus 4.6

The story of Q2 2026 AI pricing isn't a single dramatic cut. It's a sustained compression across the full market. Every major provider dropped at least one model tier, introduced batch discounts, or expanded free access since January. The floor has fallen. What cost $3 per million tokens in mid-2025 now runs under $0.50 for the same capability tier.

The practical effect: developers who built cost models against 2025 API rates are discovering they're overpaying. The teams who haven't renegotiated or re-benchmarked their provider mix in the last six months are leaving real money on the table.

This quarterly report covers what changed in Q2 2026 (April through May), what's coming in June, and where the biggest savings opportunities are right now. For the full static pricing table across all models, see the LLM API pricing comparison.

The Q2 2026 Moves

The three biggest pricing events of the quarter:

DeepSeek V4 hits the official API. As of late April, DeepSeek V4-Flash and V4-Pro are available via the DeepSeek API. V4-Flash at $0.14/M input and $0.28/M output is the most capable sub-$0.50 model now available, with a 1M token context window. V4-Pro launched with a 75% promotional discount that runs through May 31.

Anthropic's Opus 4.7 tokenizer change. Claude Opus 4.7 launched April 16 at $5/$25 per million tokens - identical sticker price to Opus 4.6. The difference: Opus 4.7 uses a new tokenizer that can generate up to 35% more tokens for the same input text. A workload costing $5.00 on Opus 4.6 can run $6.75 on Opus 4.7 without any change in your prompts. Anthropic has not publicly disclosed this change prominently.

Google removed Gemini 2.5 Pro's free tier. On April 1, Google moved Gemini 2.5 Pro entirely behind a paywall. The model is still available but no longer carries a free tier. Gemini 2.5 Flash-Lite retains free access.

AI API costs are down 60-80% from Q2 2025. Teams who haven't re-benchmarked their provider mix in the last six months are paying rates that no longer reflect the market.

Quarterly Changes Table

The table below shows the meaningful price moves in Q2 2026 compared to Q1 2026 (January). Verified against official documentation as of May 24, 2026.

Model	Q1 2026 Price (Input/Output)	Q2 2026 Price	Change
DeepSeek V4 Flash	Not on API	$0.14 / $0.28 /1M	New
DeepSeek V4 Pro	Not on API	$0.435 / $0.87 /1M (promo)	New
Claude Opus 4.7	$15 / $75 (Opus 4.1 equivalent)	$5 / $25 /1M	-67% sticker; +35% effective
GPT-5.5	Not available	$5 / $30 /1M	New
Gemini 2.5 Pro	$1.25/M + free tier	$1.25/M, no free tier	Free tier removed
Kimi K2.6	$1.00 / $3.00 /1M	$0.60 / $2.50 /1M	-40% / -17%
Claude Sonnet 4.6	$3 / $15 /1M	$3 / $15 /1M	No change
GPT-5.4	$2.50 / $15 /1M	$2.50 / $15 /1M	No change

Flat prices in the frontier tier. The cuts are happening in the mid and budget tiers. The top-end models ($3+ input) haven't moved since their launches.

DeepSeek V4 on the API

DeepSeek V4's arrival on the official API is the most consequential pricing event of the quarter. The April comparison showed "V4 still isn't on the API - official docs still route to V3.2." That changed on April 26 with V4-Flash, and V4-Pro followed shortly after.

The architecture is truly efficient. V4 uses 1.6 trillion total parameters in a mixture-of-experts configuration, but only 49 billion activate per inference. That efficiency is what allows DeepSeek to price V4-Flash at $0.14/M input while offering a 1M token context window.

The promotional structure of V4-Pro matters for planning. The current price - $0.435/M input (75% off the list price of $1.74/M) - ends May 31. After that date, DeepSeek says the "official" price becomes approximately $1.74/M input, though it has also stated prices will be "officially adjusted to 1/4 of the original price" after the promo. Reading those two statements together: the post-promo price is $1.74/M unless DeepSeek changes course again.

Calculator and financial documents on a desk representing AI API cost analysis API cost management has become a budget line item for engineering teams in 2026 - the combination of price drops and new discount mechanisms changed the math. Source: unsplash.com

Cache hit pricing changes the calculus further. DeepSeek cut V4-Flash cache hit prices to $0.0028/M on April 26 - a 98% reduction for cache hits. For applications with repeated prompt prefixes (system prompts, retrieved documents, tool definitions), this is an extraordinary discount. A workload that runs 90% cache hits effectively costs $0.014/M, not $0.14/M.

The Batch and Caching Revolution

The headline per-token prices are misleading for most real workloads. Every major provider now offers at least two discount mechanisms that cut effective costs by 50-90%.

Batch API discounts:

Provider	Batch Discount	Processing Time
Anthropic	50% off all models	24 hours
OpenAI	50% off all models	24 hours
Google	50% off all models	24 hours
DeepSeek	Cache pricing (up to 98%)	Real-time

Prompt caching discounts on cache hits:

Provider	Cache Hit Discount	Cache Duration
Anthropic	90% off input	5 min or 1 hour (paid)
OpenAI	50-75% off input	5-10 minutes
Google	Up to 90%	Session-based
DeepSeek	98% off input (V4-Flash)	Session-based

The practical implication: a team running Claude Sonnet 4.6 for document analysis - where the same 50K-token context document is prepended to every query - can hit 90% cache savings on that repeated input. The effective input price drops from $3.00/M to $0.30/M for the cached portion.

Batch processing is the right choice for any workload that isn't user-facing in real time: data pipelines, bulk classification, nightly summarization runs, training data processing. The 50% discount applies unconditionally. No configuration change needed, just the batch API endpoint.

Free Tiers in Q2 2026

Free access to capable models narrowed slightly in Q2. Google removed Gemini 2.5 Pro's free tier; the other providers held their free tiers steady.

Model	Provider	Free Tier	Limit
Gemini 2.5 Flash-Lite	Google	Yes	RPM-limited
Gemini 2.5 Flash	Google	Yes	RPM-limited
Gemini 3.1 Flash-Lite	Google	Yes (preview)	RPM-limited
GPT-4.1 Nano	OpenAI	Limited (trial credits)	One-time
Mistral models	Mistral	Yes (La Plateforme)	RPM-limited
Claude (all models)	Anthropic	No free API tier	-
DeepSeek	DeepSeek	No free tier	-

Anthropic remains the only Tier 1 provider with no free API tier. This has been consistent since Claude 3 launched. Teams evaluating Claude before committing to a subscription need to use the Claude.ai interface for testing or accept the cost of even small-scale API evaluation.

Data center servers illuminated in blue representing cloud AI infrastructure Self-hosting open-weight models remains a cost-effective alternative for teams with sufficient engineering resources - the open-source hosting costs page breaks down the real numbers. Source: unsplash.com

The 60-80% Year-Over-Year Compression

Looking back a year: in Q2 2025, accessing a GPT-4-class model cost $10-30 per million input tokens. Today, GPT-5.4 - a materially better model - runs $2.50/M. The AI Pricing Guru analysis puts the average cost decrease at 60-80% across the industry from Q2 2025 to Q2 2026.

The compression is driven by two factors. First, DeepSeek's ultra-efficient architecture showed that frontier-class performance didn't require OpenAI-level compute costs. Second, cloud providers (Groq, Together.ai, Fireworks) began running open-weight models at scale, creating direct price competition for providers like OpenAI and Anthropic who had operated without comparable alternatives.

The trend line is still sloping down. Google's Gemini 3.1 Flash-Lite preview launched at $0.25/M - below Gemini 2.5 Flash-Lite's $0.10/M, which launched below its predecessor's price. Each generation launches cheaper than the one before it.

What's Coming in June

Two changes are certain for June 2026. First, the DeepSeek V4-Pro promotional 75% discount ends May 31. Teams running V4-Pro should expect the per-token price to roughly 4x on June 1. Either switch to V4-Flash for cost-sensitive workloads, or budget the post-promo V4-Pro cost.

Second, Google's Gemini 3.1 Pro is currently in preview pricing. Preview rates are historically lower than GA pricing. When Gemini 3.1 Pro exits preview (timeline unannounced), expect prices to move upward toward standard rates.

FAQ

Which AI API is cheapest right now?

Mistral Nemo at $0.02/M input is still the lowest absolute price for a commercial model. For a production-capable model at the frontier, DeepSeek V4-Flash at $0.14/M input is the best cost-performance option as of May 2026.

Which API offers the best value for a production coding agent?

DeepSeek V4-Flash for most agentic coding tasks. It scores 93.5 on LiveCodeBench with 1M context at $0.14/M input. The post-promo V4-Pro price on June 1 will be the re-evaluation point.

Does Anthropic offer a free API tier?

No. Anthropic has no free API tier - the Claude.ai web interface has a free plan, but API access requires a paid account from the first token.

How much does it cost to process 1 million tokens with a mid-range model?

With Claude Haiku 4.5: $1.00 input + $5.00 output = $6.00 for 1M tokens each. With Gemini 2.5 Flash: $0.30 + $2.50 = $2.80. With DeepSeek V4-Flash: $0.14 + $0.28 = $0.42. The gap between the cheapest and most expensive mid-tier option is roughly 14x.

Are batch API discounts worth using?

Yes - unconditionally. Any workload that can accept 24-hour turnaround gets 50% off from OpenAI, Anthropic, and Google. There is no trade-off in output quality. The only cost is latency.

Which provider has the best prompt caching implementation?

DeepSeek V4-Flash at 98% cache hit discount is the most aggressive. Anthropic's 90% cache hit discount on longer 1-hour cache writes is the most flexible for production RAG applications.