Claude Code Max Users Are Burning Through Quotas in 90 Minutes

A 755-point Hacker News thread and a 139-upvote GitHub issue document Claude Code Pro Max 5x users exhausting their quota in 1.5 hours. An independent investigation with 1,500 logged API calls reveals the math behind the drain.

TL;DR

  • Claude Code Pro Max 5x ($100/month) users report exhausting their 5-hour quota window in as little as 90 minutes during normal development (755 points on HN, 139 upvotes on GitHub)
  • An independent investigation logging 1,500+ API calls found the best-fit model is that cache_read tokens don't count toward quota - but 1M context windows, cache misses after 1-hour TTL, background sessions, and auto-compacts create massive token spikes
  • A single auto-compact on a near-full context sends ~966K tokens as cache_creation - the most expensive call happens automatically without user action
  • Anthropic's Boris Cherny acknowledged the problem and said the team is investigating defaulting to 400K context instead of 1M
  • A preliminary finding suggests peak hours cost 25-35% more quota per token-equivalent than off-peak

The Claude Code quota problem we covered two days ago (the 20K invisible token inflation in v2.1.100) is one piece of a larger puzzle. A GitHub issue filed April 9 has become the central gathering point for Pro Max users documenting exactly how fast their quotas drain and why.

The issue has 139 upvotes. The Hacker News thread has 755 points and 656 comments. Anthropic has responded, but the core problem isn't fixed.

The report that started it

User JoeyChen (issue #45756) documented two consecutive quota windows with precise token counts:

Window 1 (5 hours, heavy development): 2,715 API calls, 1,044M cache read tokens, 1.15M output tokens. The session included Express server work, iOS app development, multi-agent coordination, and two auto-compacts. Peak context hit 966,078 tokens. This consumed the full quota over 5 hours. Normal.

Window 2 (1.5 hours, moderate use): 222 API calls in the active session, 91K output tokens, peak context 182,302. Plus two background sessions left open in other terminals that made 469 calls consuming 80.7M cache read tokens. Quota exhausted in 90 minutes. Not normal.

The math problem

JoeyChen's core question: how are cache_read tokens counted against quota?

If cache_read counts at a 1/10 rate (matching Anthropic's billing discount), Window 2's ~13.1M effective tokens in 1.5 hours shouldn't come close to exhausting a quota that sustained ~24.4M token-equivalents per hour across Window 1.

If cache_read counts at full rate, Window 2's 103.9M raw tokens would explain the drain - but it would mean prompt caching provides zero benefit for rate limiting, only a billing discount.
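The comparison can be sketched numerically (illustrative arithmetic only - the exact per-call breakdown of Window 2 isn't public, so this just scales the reported raw cache_read total by each hypothesis):

```python
# Rough test of the two cache_read counting hypotheses against Window 2,
# using the figures reported in the issue.
W1_RATE = 24.4e6            # token-equivalents/hour sustained across Window 1
W2_RAW_CACHE_READ = 103.9e6 # raw cache_read tokens in Window 2
W2_HOURS = 1.5

for rate, label in [(0.1, "billing rate"), (1.0, "full rate")]:
    per_hour = W2_RAW_CACHE_READ * rate / W2_HOURS
    print(f"cache_read @ {label}: {per_hour / 1e6:.1f}M tok-eq/hour "
          f"({per_hour / W1_RATE:.1f}x Window 1's sustained rate)")
```

At the billing rate, Window 2 runs at well under a third of Window 1's sustained pace and shouldn't exhaust anything; at full rate it runs at nearly triple that pace, which would explain the drain.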

The independent investigation

Community member @cnighswonger built an interceptor proxy (claude-code-cache-fix) and logged 1,500+ API calls across six consecutive 5-hour quota windows on April 10. He tested three hypotheses:

Hypothesis                          Implied 100% quota          Consistency (CV)
cache_read = 0.0x (doesn't count)   ~4.66M token-equivalents    34.4%
cache_read = 0.1x (billing rate)    ~24.4M token-equivalents    101.6%
cache_read = 1.0x (full rate)       ~201.6M token-equivalents   123.7%

The cache_read = 0.0x model fits best - it has by far the lowest coefficient of variation, meaning it most consistently predicts the observed quota depletion. Cache read tokens appear not to count toward quota at all; the quota is driven by uncached input (1x weight), output (5x weight, reflecting Opus's output pricing), and cache creation (2x weight).
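As a formula, the best-fit model looks like the following (the weights are the investigation's fitted values, not an Anthropic-confirmed rule):

```python
def effective_tokens(uncached_input, output, cache_creation, cache_read):
    """Community-fitted quota model: cache reads are free,
    output is weighted 5x, cache creation 2x."""
    return (uncached_input * 1.0
            + output * 5.0
            + cache_creation * 2.0
            + cache_read * 0.0)

# A single near-full-context cache miss dwarfs even a huge cache_read total:
print(effective_tokens(0, 1_000, 960_000, 80_700_000))  # 1925000.0
```

Under this model, 80.7M cache read tokens cost nothing, while one 960K-token cache write costs ~1.93M token-equivalents.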

This means the culprit isn't cache_read counting at full rate. It's everything else:

Four things eating your quota

1. The 1M context window trap

Claude Code defaults to a 1M token context window on Max plans. Each API call at near-full context sends ~960K tokens. Even with prompt caching, the first call after a cache miss (which happens every hour when the 1-hour cache TTL expires) creates nearly 1M tokens of cache_creation at 2x quota weight. That's roughly 2M token-equivalents from a single call.
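Against the ~4.66M token-equivalent window implied by the best-fit model, that single post-TTL cache miss is a large fraction of the whole budget (community-fitted numbers, not official figures):

```python
# Cost of one full cache miss at near-full 1M context, under the 0.0x model.
QUOTA = 4.66e6               # fitted token-equivalents per 5-hour window
CACHE_CREATION_WEIGHT = 2    # fitted weight for cache_creation tokens
miss_cost = 960_000 * CACHE_CREATION_WEIGHT  # entire context re-cached

print(f"{miss_cost / QUOTA:.0%} of a 5-hour window")  # 41% of a 5-hour window
```

In other words, stepping away from a near-full 1M-context session for over an hour and then resuming can burn roughly two-fifths of the window in one call.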

2. Background sessions drain the same pool

Sessions left open in other terminals keep making API calls - compacts, retros, hook processing. JoeyChen's two background sessions made 469 calls consuming 80.7M cache read tokens without any active user interaction. Even at 0x quota weight for cache_read, the cache_creation and output from those sessions still count.

3. Auto-compact spikes

When context approaches the limit, Claude Code triggers auto-compact. This sends the entire pre-compact context (~966K tokens) as cache_creation. It's the single most expensive operation, and it happens automatically. JoeyChen observed context growing from 32K to 783K before the first auto-compact, then 39K to 966K before the second.

4. Peak-hour multiplier (unconfirmed)

cnighswonger's preliminary data suggests weekday peak hours (13:00-19:00 UTC) consume 25-35% more quota per token-equivalent than off-peak. This hasn't been confirmed with enough data, but if real, it means the same session costs more at 2pm than at 2am.

Anthropic's response

Boris Cherny from the Claude Code team posted a pinned response on April 12:

"We've been investigating these reports, and a few of the top issues we've found are: Prompt cache misses when using 1M token context window are expensive. Since Claude Code uses a 1 hour prompt cache window for the main agent, if you leave your computer for over an hour then continue a stale session, it's often a full cache miss."

His proposed mitigations:

  • UX improvements to nudge users to /clear before continuing stale sessions
  • Investigating defaulting to 400K context instead of 1M, with an option to configure up to 1M
  • Continued investigation into specific cases

No timeline was given for the context default change.

The community tool

cnighswonger released his analysis tool in claude-code-cache-fix v1.6.4+:

npm install -g claude-code-cache-fix
node $(npm root -g)/claude-code-cache-fix/tools/quota-analysis.mjs

It tests all three counting hypotheses against your own usage.jsonl data and reports which model best fits your observed quota depletion. If enough users run it, the community can confirm or refine the quota formula.
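The core of that hypothesis test can be sketched as follows. The field names assume each usage.jsonl line carries Anthropic-API-style usage fields (input_tokens, output_tokens, cache_creation_input_tokens, cache_read_input_tokens); the tool's actual schema may differ.

```python
import json
import os

# Fitted weights from the investigation; cache_read rate is the variable
# under test.
WEIGHTS = {"input_tokens": 1, "output_tokens": 5,
           "cache_creation_input_tokens": 2}

def effective_totals(path, cache_read_rate):
    """Sum token-equivalents over a JSONL usage log under one hypothesis."""
    total = 0.0
    with open(path) as f:
        for line in f:
            usage = json.loads(line)
            total += sum(usage.get(k, 0) * w for k, w in WEIGHTS.items())
            total += usage.get("cache_read_input_tokens", 0) * cache_read_rate
    return total

if os.path.exists("usage.jsonl"):
    for rate in (0.0, 0.1, 1.0):
        print(f"cache_read @ {rate}x: {effective_totals('usage.jsonl', rate):,.0f}")
```

The real tool then compares each hypothesis's totals against observed quota-exhaustion points and reports which rate fits with the least variance.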

The bigger picture

This issue connects directly to the v2.1.100 phantom token problem and the broader ArkNill investigation that documented 11 bugs affecting token consumption. The common thread: Anthropic is selling a product with opaque resource accounting. Users can't see what's consuming their quota, can't control auto-compact timing, can't prevent background sessions from draining shared pools, and can't audit the quota formula.

For a $100-200/month subscription, that opacity is becoming the product's central friction point.


About the author

Sophie, AI Infrastructure & Open Source Reporter, is a journalist and former systems engineer who covers AI infrastructure, open-source models, and the developer tooling ecosystem.