Caveman Plugin Hits 34K Stars by Making Claude Talk Like a Caveman
The Caveman plugin for Claude Code cuts output tokens by 65% on average - up to 87% on explanation-heavy tasks - by stripping filler words and using terse, fragment-based language. Its new Classical Chinese mode pushes compression even further.

TL;DR
- Caveman is a Claude Code plugin (33.8K GitHub stars) that cuts output tokens by 65% on average (up to 87%) by making Claude respond in terse, fragment-based language
- New Classical Chinese (文言文) modes achieve 80-90% character reduction using literary Chinese syntax where subjects are omitted and verbs precede objects
- Benchmarked across 11 dev tasks: 294 output tokens per response on average vs 1,214 in normal mode (a 65% mean per-task reduction), with no reported loss in technical accuracy
- The /caveman:compress command also reduces input tokens by ~46% by rewriting memory files into caveman-speak
- A March 2026 research paper found brevity constraints actually improved model accuracy by 26 percentage points on certain benchmarks
The most popular Claude Code plugin right now doesn't add a new capability. It removes words.
Caveman, by Julius Brussee, makes Claude respond in terse, fragment-based sentences that strip articles, filler words, pleasantries, and hedging while preserving technical accuracy. It hit 33,800 GitHub stars and just shipped its most aggressive compression mode yet: Classical Chinese.
What it does
Normal Claude explaining a React re-render (69 tokens):
"The reason your React component is re-rendering is likely because you're creating a new object reference on each render cycle..."
Caveman Claude (19 tokens):
"New object ref each render. Inline object prop = new ref = re-render. Wrap in useMemo."
Same information. 87% fewer tokens. The pattern is consistent: [thing] [action] [reason]. [next step].
Code blocks, URLs, file paths, commands, and technical content stay untouched. Caveman only compresses the natural language around them.
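The prose-only scope can be sketched in a few lines: split the text on fenced code blocks, compress only the non-code parts. This is a hypothetical illustration - the filler list and regex here are assumptions, not the plugin's actual rules:

```python
import re

FENCE = "`" * 3  # triple-backtick fence marker

# Hypothetical filler words a compressor might strip; not Caveman's real list.
FILLER = re.compile(r"\b(the|a|an|likely|really|basically|simply)\b\s*", re.IGNORECASE)

def compress(text: str) -> str:
    """Strip filler from prose while leaving fenced code blocks untouched."""
    # Capturing group keeps the fenced blocks in the split result.
    parts = re.split(r"(`{3}.*?`{3})", text, flags=re.DOTALL)
    return "".join(p if p.startswith(FENCE) else FILLER.sub("", p) for p in parts)
```

The real plugin works through a system-prompt instruction rather than post-processing, but the invariant is the same: code spans pass through byte-for-byte.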
The benchmarks
| Task | Normal | Caveman | Saved |
|---|---|---|---|
| React re-render explanation | 1,180 | 159 | 87% |
| Auth middleware token expiry | 704 | 121 | 83% |
| PostgreSQL connection pool | 2,347 | 380 | 84% |
| React error boundary | 3,454 | 456 | 87% |
| Average across 11 tasks | 1,214 | 294 | 65% |
The savings range from 22% (already-terse responses) to 87% (explanatory content). Code-heavy tasks save less; explanation-heavy tasks save dramatically more.
An important caveat: Caveman only affects output tokens. Thinking and reasoning tokens are untouched. As the README puts it: "Caveman no make brain smaller. Caveman make mouth smaller."
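At API prices, the table above translates directly into dollars. A back-of-envelope sketch (the per-token price is a placeholder, not a current rate):

```python
def output_cost(tokens: int, price_per_mtok: float) -> float:
    """Dollar cost for a number of output tokens at a per-million-token price."""
    return tokens / 1_000_000 * price_per_mtok

PRICE = 15.0  # hypothetical $/M output tokens; check current pricing
normal = output_cost(1_214, PRICE)   # average normal-mode response
caveman = output_cost(294, PRICE)    # average Caveman response

# Note: the ratio of the averages is ~76%; the article's 65% figure is the
# mean of the per-task percentages, which weights tasks differently.
saving = 1 - caveman / normal
print(f"{saving:.0%} cheaper per response")  # prints "76% cheaper per response"
```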
The Classical Chinese twist
The newest modes replace English caveman-speak with 文言文 (wenyanwen) - Classical Chinese literary syntax. Six intensity levels are now available:
| Mode | Style | Compression |
|---|---|---|
| lite | Filler removed, grammar preserved | ~40% |
| full (default) | Articles dropped, fragments, terse | ~65% |
| ultra | Telegraphic, abbreviated everything | ~75% |
| wenyan-lite | Classical Chinese, readable | ~70% |
| wenyan-full | Full literary Chinese | ~80% |
| wenyan-ultra | Maximum compression classical | ~90% |
An example of wenyan-lite output:
"組件頻重繪,以每繪新生對象參照故。以 useMemo 包之。" ("Component re-renders often, because each render creates a new object reference. Wrap it in useMemo.")
And wenyan-ultra:
"新參照→重繪。useMemo Wrap。" ("New reference → re-render. Wrap with useMemo.")
Classical Chinese achieves extreme compression because the language is inherently denser than English: subjects are routinely omitted, verbs precede objects directly, and classical particles (之/乃/為/其) replace multi-word constructions. The trade-off is readability - wenyan modes are useful for developers who read Chinese or who are optimizing purely for token budget rather than human comprehension.
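The density claim is easy to check against the article's own examples by counting characters - a crude proxy for tokens, since real tokenizers treat CJK text differently than ASCII:

```python
english = ("The reason your React component is re-rendering is likely because "
           "you're creating a new object reference on each render cycle.")
wenyan_lite = "組件頻重繪,以每繪新生對象參照故。以 useMemo 包之。"

# Character count only; token counts depend on the tokenizer's CJK handling.
reduction = 1 - len(wenyan_lite) / len(english)
print(f"{reduction:.0%} fewer characters")
```

Character reduction overstates token savings somewhat, because a single CJK character often costs more than one token, but the direction of the comparison holds.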
Input compression too
Beyond output, the /caveman:compress command rewrites your project's memory files (CLAUDE.md, session context) into caveman-speak, cutting input tokens by ~46% (range: 36-59%). It preserves the originals so you can revert. For long sessions where context window inflation is already eating your budget, shaving 46% off the input side is substantial.
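A hypothetical sketch of what such a command could do: back up the original, then overwrite it with a filler-stripped copy. The backup naming convention and the filler rules here are assumptions for illustration, not the plugin's actual behavior:

```python
import re
import shutil
from pathlib import Path

# Hypothetical filler pattern; the real rewrite is model-driven, not regex-based.
FILLER = re.compile(r"\b(the|a|an|please|just|really)\b\s*", re.IGNORECASE)

def compress_memory(path: Path) -> Path:
    """Back up a memory file, then overwrite it with a filler-stripped version."""
    backup = path.with_name(path.name + ".orig")  # assumed backup filename
    shutil.copy2(path, backup)                    # preserve original for revert
    path.write_text(FILLER.sub("", path.read_text(encoding="utf-8")), encoding="utf-8")
    return backup
```

The key design point the plugin shares with this sketch is reversibility: the original file survives, so compression of long-lived context is never a one-way operation.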
Why this works
A March 2026 paper ("Brevity Constraints Reverse Performance Hierarchies") found that constraining models to brief responses didn't just save tokens - it improved accuracy by 26 percentage points on certain benchmarks. The hypothesis: when models are forced to be terse, they skip the hedging and filler that introduces ambiguity and go straight to the correct answer.
This tracks with what Caveman users report. The compressed responses aren't just shorter - they're often clearer because there's no space for the "it depends" and "there are several approaches" padding that LLMs default to.
Installation
One command for Claude Code:
claude plugin marketplace add JuliusBrussee/caveman && claude plugin install caveman@caveman
Works with Cursor, Windsurf, Cline, Copilot, and Gemini via npx skills add. Activate with /caveman, /caveman ultra, or /caveman wenyan-full. Deactivate with "normal mode."
For anyone paying per token or hitting Max plan usage limits, a 65% output reduction is the single highest-impact optimization available that doesn't require changing models or workflows.