
LLM API Pricing Comparison - June 2026
Verified June 8: Ministral 3B cheapest at $0.04/MTok, DeepSeek V4 Flash best value at $0.14, Claude Opus 4.8 Fast Mode cut to $10/$50, Mistral Large 3 corrected to $0.50/$1.50.
They summarize our coverage. We write it.
Newsletters like this one rebroadcast our headlines - often without the full review, the source reading, or the analysis underneath. Our weekly briefing sends the work they paraphrase, straight from the desk, before they get to it.
Free, weekly, no spam. One email every Tuesday. Unsubscribe anytime.

Verified June 8: Ministral 3B cheapest at $0.04/MTok, DeepSeek V4 Flash best value at $0.14, Claude Opus 4.8 Fast Mode cut to $10/$50, Mistral Large 3 corrected to $0.50/$1.50.

Verified June 2026: real cost per million tokens for self-hosting Llama 4 Scout, Maverick, Qwen3-235B, and DeepSeek V3.2 - GPU requirements, cost formulas, and when cheap APIs actually win.

Q2 2026 AI API pricing review: DeepSeek V4 hits the API, GPT-5.5 launches at $5/1M, and overall token costs are down 60-80% year-over-year - but a hidden tokenizer change at Anthropic quietly raised effective prices.

Amazon Q Lite stays cheapest at $3/user, ChatGPT Business dropped to $20 annual, and Anthropic shifted Enterprise to pure usage-based billing.

Updated May 2026: DeepSeek V4-Flash reasoning now $0.28/MTok output (8x cheaper than R1), o3-pro launched at $20/$80, Grok 4 retires May 15 - verified pricing across 11 models.

May 2026: Together AI adds Llama 4 and DeepSeek fine-tuning, Fireworks raised deployment prices $1/hr, and H100 rentals fell to under $2.40/hr.

Per-image API costs for GPT Image 2, FLUX.2 Pro, Imagen 4, Ideogram v3, Stable Diffusion, and more - with price corrections and new additions for April 2026.

Normalized per-second pricing for Sora 2, Veo 3, Runway Gen-4, Kling 2.x, Luma Ray2, Seedance 2, and more - Kling and Haiper lead on cost.

True cost breakdown of commercial agent frameworks and platforms - LangGraph, CrewAI, AutoGen, E2B, Modal, Fly.io, and more at 1k, 100k, and 1M runs, including LLM passthrough costs.

Raw GPU rental rates across 20+ providers normalized to per-GPU-hour - H100, H200, A100, L40S, RTX 4090, on-demand vs spot vs reserved, with hidden costs and value-tier recommendations.

Per-image cost comparison for vision APIs across OpenAI, Anthropic, Google, Mistral, Meta Llama 4, xAI, Amazon Nova, and open-source models - with cost-at-scale math for OCR and document processing workloads.

Per-query pricing for search APIs used in AI agents and RAG pipelines - Brave, Tavily, Exa, SerpAPI, Serper, Perplexity Sonar, You.com, Jina Reader, Firecrawl, and more compared at 10k, 100k, and 1M queries.