Claude Code Silently Burns 40% More Tokens Since v2.1.100

A developer used an HTTP proxy to capture full API requests across four Claude Code versions and found that v2.1.100 adds roughly 20,000 invisible server-side tokens to every request - inflating billing by 40% with no user visibility.

TL;DR

  • Claude Code v2.1.100 adds ~20,000 hidden tokens to every API request compared to v2.1.98 - injected server-side, invisible to /context, and billed against your quota
  • A developer's HTTP proxy analysis showed v2.1.100 sends fewer bytes but gets billed 20,196 more tokens per request
  • The extra tokens enter the model's actual context window, potentially diluting your CLAUDE.md instructions and degrading quality in long sessions
  • A separate 14-month investigation cataloged 11 confirmed bugs affecting token consumption on Max plans, nine still unfixed
  • Anthropic has acknowledged some issues but stated "none were over-charging you" - the community disagrees

A developer who goes by Adrian-Mteam on GitHub did something simple that nobody at Anthropic apparently thought users would do: he set up an HTTP proxy between Claude Code and Anthropic's API, captured the full request and response bodies, and compared what each version actually sends and receives.

The results, filed as issue #46917 on April 12, tell a clear story.

The numbers

| Version  | Bytes sent         | Tokens billed | Delta vs v2.1.98 |
|----------|--------------------|---------------|------------------|
| v2.1.98  | 169,514            | 49,726        | baseline         |
| v2.1.100 | 168,536 (-978 B)   | 69,922        | +20,196 tokens   |
| v2.1.101 | 171,903 (+2,389 B) | ~72,000       | +22,274 tokens   |

The critical finding: v2.1.100 sends fewer bytes than v2.1.98 yet gets billed 20,196 more tokens, so the inflation is entirely server-side. The User-Agent header (which includes the version number) appears to be the routing mechanism: the same payload, same account, and same project produce different token counts depending solely on which version string the client sends.
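The headline 40% figure falls directly out of the table's numbers. A quick arithmetic check (values copied from the issue, not independently measured):

```python
# Per-request token counts reported in issue #46917 (--print mode, cold cache).
BASELINE_TOKENS = 49_726   # v2.1.98
INFLATED_TOKENS = 69_922   # v2.1.100

delta = INFLATED_TOKENS - BASELINE_TOKENS      # extra tokens billed per request
overhead_pct = 100 * delta / BASELINE_TOKENS   # per-request inflation, in percent
print(f"+{delta:,} tokens ({overhead_pct:.1f}% overhead)")
```

That works out to +20,196 tokens, about 40.6% on top of the v2.1.98 baseline.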

The methodology is reproducible. Adrian used claude-code-logger as a transparent proxy, tested in --print mode (single API call, no session state, cold cache), and confirmed the pattern across 40+ sessions showing a bimodal distribution: Group A clusters around ~50K tokens (matching v2.1.98) and Group B around ~71K tokens (matching v2.1.100+).
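The grouping Adrian describes is easy to reproduce once you have per-request billed token counts. A minimal sketch of that bucketing, where the 60K cutoff is an illustrative midpoint between the two clusters, not a number from the issue, and the sample values are hypothetical:

```python
# Split billed token counts into the two clusters the issue reports:
# Group A (~50K, matching v2.1.98) and Group B (~71K, matching v2.1.100+).
def classify_sessions(token_counts, threshold=60_000):
    group_a = [t for t in token_counts if t < threshold]
    group_b = [t for t in token_counts if t >= threshold]
    return group_a, group_b

# Hypothetical captured values shaped like the bimodal distribution described.
samples = [49_900, 50_200, 71_100, 69_800, 50_050, 71_400]
a, b = classify_sessions(samples)
print(f"Group A: {len(a)} sessions, mean {sum(a) // len(a):,}")
print(f"Group B: {len(b)} sessions, mean {sum(b) // len(b):,}")
```

On real proxy logs, a clean split like this (rather than a smooth spread) is what points at version routing instead of payload size.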

What those 20K tokens actually do

This isn't just a billing issue. The extra 20,000 tokens are classified as cache_creation_input_tokens, which means they enter the model's actual context window. They're not a metadata surcharge or an accounting artifact. They sit in the same context that your code, your CLAUDE.md, and your conversation history occupy.

The implications:

  1. Your instructions get diluted. If you've carefully written a CLAUDE.md to steer Claude's behavior, 20,000 tokens of invisible content you can't see or audit now compete with it for the model's attention.

  2. Quality degrades faster. In long sessions where context fills up, you lose 20K tokens of effective capacity. That's roughly the equivalent of 50 pages of code or documentation that could have been in your context but isn't.

  3. You can't debug it. When Claude ignores your project rules or makes unexpected tool choices, you have no way to know if the cause is invisible server-side context or something else. The /context command doesn't show it.

The connectors problem

Adrian's earlier investigation (issue #45515, filed April 9) found a related but distinct issue: OAuth connectors from claude.ai inject tool schemas invisibly into every API request.

His account with an Asana connector attached sent 190,983 bytes per request. Without Asana: 189,867 bytes. The difference: exactly 28 mcp__claude_ai_Asana__* tool definitions consuming ~22K tokens per prompt. Crucially, disabling the connector in Claude Code's local MCP config did nothing - these are server-side OAuth integrations from a completely separate path.
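As a rough sanity check on those numbers (straight division, nothing measured here), the per-schema cost is plausible for a verbose JSON tool definition:

```python
CONNECTOR_TOKENS = 22_000  # ~tokens added by the Asana connector (issue #45515)
TOOL_DEFINITIONS = 28      # mcp__claude_ai_Asana__* schemas observed

per_tool = CONNECTOR_TOKENS / TOOL_DEFINITIONS
print(f"~{per_tool:.0f} tokens per injected tool schema")
```

Just under 800 tokens per schema, on every single prompt, whether or not the session ever touches Asana.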

The fix for this specific issue: go to claude.ai/settings, disconnect unused connectors (Asana, Linear, Jira, etc.), and start a new session. But this only addresses connector bloat, not the version-specific inflation in v2.1.100+.

The broader investigation: 11 bugs, 9 unfixed

Adrian's work intersects with a larger investigation by the pseudonymous researcher ArkNill, who has been monitoring Claude Code's token behavior for 14 months. Using a custom transparent proxy (cc-relay v2), ArkNill analyzed 27,708 requests across 218 sessions and cataloged 11 confirmed bugs affecting token consumption on Max plans. Only two have been fixed.

The unfixed issues include:

  • B3 (False Rate Limiter): The client blocks valid API calls with synthetic error messages - 151 events across 65 sessions where users were told they hit limits they hadn't actually reached
  • B5 (Budget Cap): A silent 200K aggregate tool result limit causes context truncation without warning - 167,818 events documented
  • B8 (JSONL Duplication): Extended thinking mode duplicates conversation entries, inflating context by 2.37x on average (max 4.42x observed)
  • B9 (/branch Inflation): The /branch command duplicates messages, inflating context from 6% to 73% in a single operation
  • B10 (TaskOutput Thrashing): A deprecation message injects 21x the expected context (87K vs 4K baseline)
  • B11 (Zero Reasoning): Adaptive thinking emits no reasoning tokens under certain conditions, triggering fabricated output - acknowledged by Anthropic

Since March 2026, Max subscribers have reported quota exhaustion in as little as 19 minutes instead of the expected 5 hours. The ~20K per-request inflation explains part of the gap: on a clean project, that's roughly 40% overhead on every turn. Compound it over a session with the other bugs - JSONL duplication at 2.37x, branch inflation at 12x - and the numbers spiral.
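How those multipliers stack is easiest to see as arithmetic. A sketch that treats the bugs as independent multipliers on a clean baseline (an illustrative model, not a measured session total):

```python
BASELINE = 49_726           # v2.1.98 tokens per request (issue #46917)
VERSION_INFLATION = 20_196  # flat per-request overhead in v2.1.100+
JSONL_DUP = 2.37            # avg context duplication, extended thinking (bug B8)

per_request = BASELINE + VERSION_INFLATION    # v2.1.100+ per-request cost
with_duplication = per_request * JSONL_DUP    # if B8 also triggers
print(f"{per_request:,} -> {with_duplication:,.0f} tokens "
      f"({with_duplication / BASELINE:.1f}x the clean baseline)")
```

Under those assumptions a single affected turn already consumes over 3x the clean baseline, before branch inflation or TaskOutput thrashing enter the picture.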

Anthropic's response

Anthropic's Lydia Hallie commented on April 2: "Peak-hour limits are tighter and 1M-context sessions got bigger, that's most of what you're feeling. We fixed a few bugs along the way, but none were over-charging you."

The community response has been skeptical. The issue has 68 thumbs-up reactions. Multiple commenters have requested an urgent fix. Anthropic assigned issue #46917 to engineer Daniel Hudson (notitatall), but as of April 13, there's no fix or further official comment on the version-specific inflation.

The workarounds

For now, users have three options:

Downgrade to v2.1.98:

npx @anthropic-ai/claude-code@2.1.98

This bypasses the server-side inflation since the version-specific routing appears tied to the User-Agent string. However, auto-updates may overwrite the older binary, and v2.1.98 misses subsequent bug fixes.

Spoof the User-Agent header (suggested by community member @fabifont):

export ANTHROPIC_CUSTOM_HEADERS='User-Agent: claude-cli/2.1.98 (external, sdk-cli)'

This keeps the newer client code while telling the server to use the older, less inflated path. It is unverified whether the spoof actually changes billing behavior or only what the server records as the client version.
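Whether the spoof works can only be checked the same way the bug was found: capture requests through a proxy and compare billing by User-Agent. A sketch assuming a JSONL capture log with `user_agent` and `billed_tokens` fields per request (this log schema is hypothetical; claude-code-logger's actual output format may differ):

```python
import json
from collections import defaultdict

def tokens_by_user_agent(log_path):
    """Average billed token counts, grouped by the User-Agent each request sent."""
    buckets = defaultdict(list)
    with open(log_path) as f:
        for line in f:
            entry = json.loads(line)
            buckets[entry["user_agent"]].append(entry["billed_tokens"])
    return {ua: sum(v) / len(v) for ua, v in buckets.items()}

# Example: run identical prompts with and without the spoofed header, then:
# for ua, avg in tokens_by_user_agent("requests.jsonl").items():
#     print(f"{ua}: avg {avg:,.0f} tokens")
```

If the spoofed sessions average ~50K while unspoofed ones average ~70K, the header really is the routing key; if both sit at ~70K, the server ignores the client-supplied value.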

Disconnect unused OAuth connectors: Go to claude.ai/settings > Connectors and disconnect anything you don't actively need in terminal sessions. This addresses the ~22K connector-specific bloat separately from the version inflation.

What's actually in those 20K tokens

Nobody outside Anthropic knows. The content is injected server-side and never appears in the request payload, the /context output, or any client-side log. Plausible candidates include expanded system prompts, safety classifiers, capability declarations for server-side tool routing, or telemetry instrumentation. The ArkNill investigation noted that a system prompt section containing "straight to the point" / "do not overdo" language disappeared from local JSONL session files after April 10 - suggesting Anthropic is actively modifying server-side system content without documenting changes.

For a product where users pay $100-200/month for finite token budgets, the inability to audit what consumes those tokens is the core issue. As Adrian put it in his issue: users are "paying more for potentially degraded output with no visibility."


About the author

Sophie is a journalist and former systems engineer who covers AI infrastructure, open-source models, and the developer tooling ecosystem.