Caveman Plugin Hits 34K Stars by Making Claude Talk Like a Caveman

The Caveman plugin for Claude Code cuts 65-87% of output tokens by stripping filler words and using terse fragment-based language. Its new Classical Chinese mode pushes compression even further.


TL;DR

  • Caveman is a Claude Code plugin (33.8K GitHub stars) that cuts 65-87% of output tokens by making Claude respond in terse, fragment-based language
  • New Classical Chinese (文言文) modes achieve 80-90% character reduction using literary Chinese syntax where subjects are omitted and verbs precede objects
  • Benchmarked across 11 dev tasks: average 294 tokens per response vs 1,214 in normal mode - a 65% drop with zero loss in technical accuracy
  • The /caveman:compress command also reduces input tokens by ~46% by rewriting memory files into caveman-speak
  • A March 2026 research paper found brevity constraints actually improved model accuracy by 26 percentage points on certain benchmarks

The most popular Claude Code plugin right now doesn't add a new capability. It removes words.

Caveman, by Julius Brussee, makes Claude respond in terse, fragment-based sentences that strip articles, filler words, pleasantries, and hedging while preserving technical accuracy. It hit 33,800 GitHub stars and just shipped its most aggressive compression mode yet: Classical Chinese.

What it does

Normal Claude explaining a React re-render (69 tokens):

"The reason your React component is re-rendering is likely because you're creating a new object reference on each render cycle..."

Caveman Claude (19 tokens):

"New object ref each render. Inline object prop = new ref = re-render. Wrap in useMemo."

Same information. 72% fewer tokens. The pattern is consistent: [thing] [action] [reason]. [next step].

Code blocks, URLs, file paths, commands, and technical content stay untouched. Caveman only compresses the natural language around them.
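That split, prose compressed, code left verbatim, can be sketched in a few lines. This is a hypothetical illustration of the idea, not the plugin's actual implementation; the function name, the filler-phrase list, and the fence-splitting approach are all assumptions made for the example:

```python
import re

# Hypothetical filler-phrase list; the real plugin's rules are more extensive.
FILLER = re.compile(
    r"\b(the reason|is likely because|you might want to|"
    r"it's worth noting that|basically|essentially)\b\s*",
    re.IGNORECASE,
)

def caveman_compress(text: str) -> str:
    # Split on fenced code blocks; the capturing group keeps them in the output.
    parts = re.split(r"(```.*?```)", text, flags=re.DOTALL)
    out = []
    for part in parts:
        if part.startswith("```"):
            out.append(part)              # code stays untouched
        else:
            out.append(FILLER.sub("", part))  # prose gets compressed
    return "".join(out)

before = (
    "It's worth noting that you might want to wrap it in useMemo.\n"
    "```js\nuseMemo(() => obj, []);\n```"
)
print(caveman_compress(before))
```

The key design point the sketch mirrors is the quoting boundary: compression only ever touches the natural-language segments between fences, so technical content survives byte-for-byte.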

The benchmarks

Task                            Normal   Caveman   Saved
React re-render explanation      1,180       159     87%
Auth middleware token expiry       704       121     83%
PostgreSQL connection pool       2,347       380     84%
React error boundary             3,454       456     87%
Average across 11 tasks          1,214       294     65%

The savings range from 22% (already-terse responses) to 87% (explanatory content). Code-heavy tasks save less; explanation-heavy tasks save dramatically more.
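The per-task percentages follow directly from the token counts, with saved = (normal − caveman) / normal. A quick check against the table's four named tasks:

```python
# Token counts (normal, caveman) from the benchmark table above.
tasks = {
    "React re-render explanation": (1180, 159),
    "Auth middleware token expiry": (704, 121),
    "PostgreSQL connection pool": (2347, 380),
    "React error boundary": (3454, 456),
}

for name, (normal, caveman) in tasks.items():
    saved = (normal - caveman) / normal
    print(f"{name}: {saved:.0%}")
```

Running this reproduces the 87% / 83% / 84% / 87% column from the table.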

An important caveat: Caveman only affects output tokens. Thinking and reasoning tokens are untouched. As the README puts it: "Caveman no make brain smaller. Caveman make mouth smaller."

The Classical Chinese twist

The newest modes replace English caveman-speak with 文言文 (wenyanwen) - Classical Chinese literary syntax. Six intensity levels are now available:

Mode            Style                                 Compression
lite            Filler removed, grammar preserved     ~40%
full (default)  Articles dropped, fragments, terse    ~65%
ultra           Telegraphic, abbreviated everything   ~75%
wenyan-lite     Classical Chinese, readable           ~70%
wenyan-full     Full literary Chinese                 ~80%
wenyan-ultra    Maximum compression, classical        ~90%

An example of wenyan-lite output:

"組件頻重繪,以每繪新生對象參照故。以 useMemo 包之。"

(Roughly: "Component re-renders often, because each render creates a new object reference. Wrap it in useMemo.")

And wenyan-ultra:

"新參照→重繪。useMemo Wrap。"

(Roughly: "New ref → re-render. Wrap in useMemo.")

Classical Chinese achieves extreme compression because the language is inherently denser than English: subjects are routinely omitted, verbs precede objects directly, and classical particles (之/乃/為/其) replace multi-word constructions. The trade-off is readability - wenyan modes are useful for developers who read Chinese or who are optimizing purely for token budget rather than human comprehension.

Input compression too

Beyond output, the /caveman:compress command rewrites your project's memory files (CLAUDE.md, session context) into caveman-speak, cutting input tokens by ~46% (range: 36-59%). It preserves the originals so you can revert. For long sessions where context window inflation is already eating your budget, shaving 46% off the input side is substantial.

Why this works

A March 2026 paper ("Brevity Constraints Reverse Performance Hierarchies") found that constraining models to brief responses didn't just save tokens - it improved accuracy by 26 percentage points on certain benchmarks. The hypothesis: when models are forced to be terse, they skip the hedging and filler that introduce ambiguity and go straight to the correct answer.

This tracks with what Caveman users report. The compressed responses aren't just shorter - they're often clearer because there's no space for the "it depends" and "there are several approaches" padding that LLMs default to.

Installation

One command for Claude Code:

claude plugin marketplace add JuliusBrussee/caveman && claude plugin install caveman@caveman

Works with Cursor, Windsurf, Cline, Copilot, and Gemini via npx skills add. Activate with /caveman, /caveman ultra, or /caveman wenyan-full. Deactivate with "normal mode."

For anyone paying per token or hitting Max plan usage limits, a 65% output reduction is the single highest-impact optimization available that doesn't require changing models or workflows.


About the author

AI Infrastructure & Open Source Reporter

Sophie is a journalist and former systems engineer who covers AI infrastructure, open-source models, and the developer tooling ecosystem.