why use many token when few do trick
Install • Benchmarks • Before/After • Intensity Levels • Compress • Why
A Claude Code skill/plugin and Codex plugin that makes agent talk like caveman — cutting ~75% of output tokens while keeping full technical accuracy. Plus a companion tool that compresses your memory files to cut ~45% of input tokens every session.
Based on the viral observation that caveman-speak dramatically reduces LLM token usage without losing technical substance. So we made it a one-line install.
Same fix. 75% less word. Brain still big.
Sometimes too much caveman. Sometimes not enough:
Same answer. You pick how many word.
Real token counts from the Claude API (reproduce it yourself):
| Task | Normal (tokens) | Caveman (tokens) | Saved |
|---|---|---|---|
| Explain React re-render bug | 1180 | 159 | 87% |
| Fix auth middleware token expiry | 704 | 121 | 83% |
| Set up PostgreSQL connection pool | 2347 | 380 | 84% |
| Explain git rebase vs merge | 702 | 292 | 58% |
| Refactor callback to async/await | 387 | 301 | 22% |
| Architecture: microservices vs monolith | 446 | 310 | 30% |
| Review PR for security issues | 678 | 398 | 41% |
| Docker multi-stage build | 1042 | 290 | 72% |
| Debug PostgreSQL race condition | 1200 | 232 | 81% |
| Implement React error boundary | 3454 | 456 | 87% |
| Average | 1214 | 294 | 65% |
Range: 22%–87% savings across prompts.
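No API call needed to check the table's math; a small sketch that recomputes the "Saved" column and the range from the raw counts above:

```python
# Recompute the "Saved" column from the raw token counts in the table above.
BENCHMARKS = {
    "Explain React re-render bug": (1180, 159),
    "Fix auth middleware token expiry": (704, 121),
    "Set up PostgreSQL connection pool": (2347, 380),
    "Explain git rebase vs merge": (702, 292),
    "Refactor callback to async/await": (387, 301),
    "Architecture: microservices vs monolith": (446, 310),
    "Review PR for security issues": (678, 398),
    "Docker multi-stage build": (1042, 290),
    "Debug PostgreSQL race condition": (1200, 232),
    "Implement React error boundary": (3454, 456),
}

def saved_pct(normal: int, caveman: int) -> int:
    """Percent of output tokens cut, rounded to the nearest whole percent."""
    return round(100 * (normal - caveman) / normal)

pcts = [saved_pct(n, c) for n, c in BENCHMARKS.values()]
print(f"range: {min(pcts)}%-{max(pcts)}%")
print(f"mean of per-task savings: {sum(pcts) / len(pcts):.1f}%")
```

Note the table's "Average | 65%" is the mean of the per-task percentages (64.5%), not the savings of the averaged token counts, which comes out higher.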
> [!IMPORTANT]
> Caveman only affects output tokens; thinking/reasoning tokens are untouched. Caveman no make brain smaller. Caveman make mouth smaller. The biggest win is readability and speed; cost savings are a bonus.
A March 2026 paper "Brevity Constraints Reverse Performance Hierarchies in Language Models" found that constraining large models to brief responses improved accuracy by 26 percentage points on certain benchmarks and completely reversed performance hierarchies. Verbose not always better. Sometimes less word = more correct.
```bash
npx skills add JuliusBrussee/caveman
```
npx skills supports 40+ agents — Claude Code, GitHub Copilot, Cursor, Windsurf, Cline, and more. To install for a specific agent:
```bash
npx skills add JuliusBrussee/caveman -a cursor
npx skills add JuliusBrussee/caveman -a copilot
npx skills add JuliusBrussee/caveman -a cline
npx skills add JuliusBrussee/caveman -a windsurf
```
Or with Claude Code plugin system:
```bash
claude plugin marketplace add JuliusBrussee/caveman
claude plugin install caveman@caveman
```
Codex:
`/plugins` → caveman

Install once. Use in all sessions after that.
One rock. That it.
Trigger with:
`/caveman` (Codex: `$caveman`)

Stop with: "stop caveman" or "normal mode"
Sometimes full caveman too much. Sometimes not enough. Now you pick:
| Level | Trigger | What it do |
|---|---|---|
| Lite | `/caveman lite` or `$caveman lite` | Drop filler, keep grammar. Professional but no fluff |
| Full | `/caveman full` or `$caveman full` | Default caveman. Drop articles, fragments, full grunt |
| Ultra | `/caveman ultra` or `$caveman ultra` | Maximum compression. Telegraphic. Abbreviate everything |
Level stick until you change it or session end.
| Thing | Caveman Do? |
|---|---|
| English explanation | 🪨 Caveman smash filler words |
| Code blocks | ✍️ Write normal (caveman not stupid) |
| Technical terms | 🧠 Keep exact (polymorphism stay polymorphism) |
| Error messages | 📋 Quote exact |
| Git commits & PRs | ✍️ Write normal |
| Articles (a, an, the) | 💀 Gone |
| Pleasantries | 💀 "Sure I'd be happy to" is dead |
| Hedging | 💀 "It might be worth considering" extinct |
```
┌─────────────────────────────────────┐
│ TOKENS SAVED        ████████  75%   │
│ TECHNICAL ACCURACY  ████████ 100%   │
│ SPEED INCREASE      ████████  ~3x   │
│ VIBES               ████████  OOG   │
└─────────────────────────────────────┘
```
Caveman not dumb. Caveman efficient.
Normal LLM waste token on:

- Pleasantries ("Sure, I'd be happy to help with that!")
- Hedging ("it might be worth considering...")
- Filler words

Caveman say what need saying. Then stop.
Caveman makes Claude speak with fewer tokens. Caveman Compress makes Claude read fewer tokens.
Your CLAUDE.md loads on every session start. A 1000-token project memory file costs you tokens every single time you open a project. Caveman Compress rewrites those files into caveman-speak so Claude reads less — without you losing the human-readable original.
```
/caveman-compress CLAUDE.md
```
```
CLAUDE.md           ← compressed (Claude reads this every session: fewer tokens)
CLAUDE.original.md  ← human-readable backup (you read and edit this)
```
A Python pipeline that shells out to claude --print for the actual compression, then validates the result locally — no tokens wasted on checking.
```
detect file type (local)
  → compress with Claude (1 call)
  → validate (local)
      ↓ if errors: targeted fix (1 call, cherry-pick only)
      ↓ retry up to 2×, restore original on failure
```
Code blocks, inline code, URLs, file paths, commands, headings, table structure, dates, version numbers — anything technical passes through untouched. Only natural language prose gets compressed.
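This is not the plugin's actual pipeline (the real compression is a Claude call); a minimal sketch of the protect-then-compress idea, where a regex fences off code blocks, inline code, and URLs, and a trivial article-dropper stands in for the model:

```python
import re

# Protected spans pass through untouched; only the prose between them
# is handed to the compressor. The capturing group makes re.split()
# alternate prose / protected / prose / ...
PROTECTED = re.compile(r"(```.*?```|`[^`]*`|https?://\S+)", re.DOTALL)

def compress_prose(text: str) -> str:
    """Stand-in for the real Claude call: drop articles and filler words."""
    out = re.sub(r"\b(?:a|an|the|just|really|very)\b", "", text,
                 flags=re.IGNORECASE)
    return re.sub(r"[ \t]{2,}", " ", out)  # collapse doubled spaces

def compress_markdown(doc: str) -> str:
    parts = PROTECTED.split(doc)
    # Odd indices are protected spans; even indices are prose.
    return "".join(
        part if i % 2 else compress_prose(part)
        for i, part in enumerate(parts)
    )
```

The protected-span regex here is illustrative; the real tool also shields file paths, commands, headings, tables, dates, and version numbers.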
| File | Original | Compressed | Saved |
|---|---|---|---|
| claude-md-preferences.md | 706 | 285 | 59.6% |
| project-notes.md | 1145 | 535 | 53.3% |
| claude-md-project.md | 1122 | 687 | 38.8% |
| todo-list.md | 627 | 388 | 38.1% |
| mixed-with-code.md | 888 | 574 | 35.4% |
| Average | 898 | 494 | 45% |
| Tool | What it cuts | Savings |
|---|---|---|
| caveman | Output tokens (Claude's responses) | ~65% |
| caveman-compress | Input tokens (memory files loaded per session) | ~45% |
| Both together | The whole conversation | Output + input both shrunk |
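Rough arithmetic for the "both together" row, using this repo's own averages (memory file 898 → 494 input tokens, response 1214 → 294 output tokens); real savings depend on your conversation shape:

```python
def combined_savings(in_before: int, in_after: int,
                     out_before: int, out_after: int) -> float:
    """Fraction of total (input + output) tokens saved per exchange."""
    before = in_before + out_before
    after = in_after + out_after
    return (before - after) / before

# Repo averages: caveman-compress on input, caveman on output.
pct = combined_savings(898, 494, 1214, 294)
print(f"{pct:.0%} of round trip gone")
```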
See the full caveman-compress README for install, usage, and validation details.
If caveman save you mass token, mass money — leave mass star. ⭐
MIT — free like mass mammoth on open plain.