
Cloud GPU Rental Pricing Compared - April 2026
Raw GPU rental rates across 20+ providers normalized to per-GPU-hour - H100, H200, A100, L40S, RTX 4090, on-demand vs spot vs reserved, with hidden costs and value-tier recommendations.
They summarize our coverage. We write it.
Newsletters like this one rebroadcast our headlines - often without the full review, the source reading, or the analysis underneath. Our weekly briefing sends the work they paraphrase, straight from the desk, before they get to it.
Free, weekly, no spam. One email every Tuesday. Unsubscribe anytime.

Raw GPU rental rates across 20+ providers normalized to per-GPU-hour - H100, H200, A100, L40S, RTX 4090, on-demand vs spot vs reserved, with hidden costs and value-tier recommendations.

Per-image cost comparison for vision APIs across OpenAI, Anthropic, Google, Mistral, Meta Llama 4, xAI, Amazon Nova, and open-source models - with cost-at-scale math for OCR and document processing workloads.

Per-query pricing for search APIs used in AI agents and RAG pipelines - Brave, Tavily, Exa, SerpAPI, Serper, Perplexity Sonar, You.com, Jina Reader, Firecrawl, and more compared at 10k, 100k, and 1M queries.

Per-minute and per-1000-minute transcription API pricing across OpenAI Whisper, Deepgram Nova-3, AssemblyAI, Google Chirp 2, Azure, AWS Transcribe, Groq, ElevenLabs Scribe, and more.

Normalized per-1M-character and per-hour TTS pricing across ElevenLabs, OpenAI, Google, Azure, Amazon Polly, Play.ht, Cartesia, Deepgram Aura, WellSaid, and more.

A 755-point Hacker News thread and a 139-upvote GitHub issue document Claude Code Pro Max 5x users exhausting their quota in 1.5 hours. An independent investigation with 1,500 logged API calls reveals the math behind the drain.

A developer used an HTTP proxy to capture full API requests across four Claude Code versions and found that v2.1.100 adds roughly 20,000 invisible server-side tokens to every request - inflating billing by 40% with no user visibility.

OpenAI released GPT-5.4 mini and nano on March 17, bringing near-flagship performance at 70% and 92% lower cost respectively.

Google's Gemini 3.1 Flash-Lite delivers frontier-class benchmarks at a fraction of the cost of Pro - but a sluggish first-token response and preview-only status mean it's not for every workload.

Anthropic is doubling Claude usage limits outside 8 AM-2 PM ET through March 27 for Free, Pro, Max, and Team plans across web, desktop, mobile, Claude Code, and Excel.

Anthropic made the 1M-token context window generally available for Claude Opus 4.6 and Sonnet 4.6, dropping the long-context pricing premium entirely - a 900K-token request now costs the same per token as a 9K one.

At BlackRock's Infrastructure Summit, Sam Altman laid out OpenAI's vision of selling AI compute by the token like a metered utility, backed by a $110 billion war chest and the Stargate buildout.