Why Enterprises Are Making AI Talk Like a Caveman To Cut Costs

Legrand memos and Uber budget blowouts show firms cutting AI verbosity costs, though prose tokens often represent under 5% of total spend

Jun 30, 2026



Key Takeaways

The caveman plugin cuts AI output tokens by 65–75%, but real-world session savings may reach only 4–5%.
Uber and Walmart have capped employee AI usage; Uber reportedly exhausted its annual AI budget in four months.
Structural fixes like prompt pruning and RAG outperform politeness-stripping as true token budget strategies.

Enterprise AI coding assistants are being rationed, not because they failed but because they succeeded too well. Uber reportedly burned through its entire annual AI budget in four months, according to 404 Media. Walmart introduced usage caps. GitHub Copilot Business shifted from flat subscriptions to per-token billing. Tokens, each roughly three-quarters of a word, are the billing unit for every LLM interaction, and input and output both count. Every “Certainly, happy to help!” from Claude costs real money at scale. The scale of investment behind these pressures is underscored by the Stargate Project, a $500 billion AI infrastructure initiative that signals how much is at stake. Enter caveman.
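To make the billing unit concrete, here is a minimal Python sketch of the three-quarters-of-a-word rule of thumb. The per-million-token prices are hypothetical placeholders, not any vendor's actual rates, and real tokenizers will count differently from a simple word split.

```python
# Rough estimate only: real tokenizers vary by model and language.
WORDS_PER_TOKEN = 0.75  # the article's rule of thumb


def estimate_tokens(text: str) -> int:
    """Approximate a token count from a whitespace word count."""
    words = len(text.split())
    return round(words / WORDS_PER_TOKEN)


def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of one request; input and output are both billed."""
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


# Hypothetical prices in dollars per million tokens (assumptions for illustration).
INPUT_PRICE = 3.0
OUTPUT_PRICE = 15.0

greeting_tokens = estimate_tokens("Certainly, happy to help!")
# Cost of emitting that greeting across one million responses.
greeting_cost = request_cost(0, greeting_tokens * 1_000_000, INPUT_PRICE, OUTPUT_PRICE)
print(greeting_tokens, greeting_cost)
```

Swapping in a provider's published prices and a real tokenizer count would turn this into a usable back-of-envelope check.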

Brain Still Big. Mouth Small.

A lightweight plugin strips AI pleasantries while preserving every line of code.

Developer Julius Brussee built caveman after noticing how much token spend disappeared into hedging, transitions, and chatbot politeness inside agent loops. The tool is a simple markdown config file compatible with Claude Code, Codex, Gemini, and over 30 other coding agents — installed with a single command (npx skills add JuliusBrussee/caveman). “It makes the model speak less like a polite chatbot and more like a terse tool,” Brussee told 404 Media. “Same substance, fewer words.”

The numbers hold up — partially. Brussee’s tests showed 65–75% output token reduction versus default verbose output. Elastic Labs independently measured 63.6% average reduction across eight Elasticsearch scenarios with zero accuracy loss. A separate technical walkthrough found roughly 45% output savings and approximately 39% cost reduction. The honest caveat: one deeper analysis found that in typical coding sessions, prose accounts for a small fraction of total tokens, so real-world session savings may land closer to 4–5%.
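The gap between the headline figure and the session figure is simple multiplication. The sketch below assumes, purely for illustration, that output prose makes up 6.5% of a session's tokens; the article says only that the share is small, so that number is a placeholder.

```python
def session_savings(prose_share: float, prose_reduction: float) -> float:
    """Fraction of all session tokens saved when only output prose shrinks."""
    return prose_share * prose_reduction


def implied_prose_share(session_saving: float, prose_reduction: float) -> float:
    """Work backwards: what prose share would produce a given session saving?"""
    return session_saving / prose_reduction


# Assumption for illustration: prose is 6.5% of total session tokens.
ASSUMED_PROSE_SHARE = 0.065

for reduction in (0.65, 0.70, 0.75):
    saved = session_savings(ASSUMED_PROSE_SHARE, reduction)
    print(f"{reduction:.0%} less prose -> {saved:.1%} fewer session tokens")
```

Under that assumption, a 65–75% cut in prose removes roughly 4 to 5% of the session's tokens, which is consistent with the caveat above. The rest of the bill sits in code, tool output, and input context that the plugin does not touch.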

When “Please” Costs Tens of Millions

The token bill problem extends far beyond chatty AI responses.

Sam Altman has noted that users typing “please” and “thank you” into LLMs collectively costs OpenAI tens of millions of dollars in electricity. Legrand, an electrical and data center infrastructure company, distributed an internal memo, obtained by 404 Media, that explicitly lists caveman as one of four high-impact cost practices, alongside avoiding powerful models and high reasoning settings by default. Uber’s CTO capped employee AI usage after that four-month budget blowout. Walmart followed with its own restrictions.

Critics rightly point out that output tokens are often the smaller cost driver. Long input contexts, bloated prompt histories, and agent loops burning tokens in the background do more damage. Structural fixes matter more:

Prompt pruning
RAG (injecting only relevant data instead of entire databases)
Small-model routing for intake tasks
Token caching at roughly 10% of standard input price
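Two of these fixes are easy to sketch in Python. The functions below are illustrative only: the names, the $3-per-million input price, and the word-count tokenizer are assumptions, and the caching math ignores any cache-write surcharge a provider may apply.

```python
def prune_history(messages, budget, count_tokens):
    """Keep the newest messages whose combined token count fits the budget."""
    kept = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


def input_cost(prompt_tokens, cached_tokens, price_per_m, cached_multiplier=0.10):
    """Dollar cost of a prompt when part of it is billed at the cached rate."""
    fresh_tokens = prompt_tokens - cached_tokens
    billed = fresh_tokens * price_per_m + cached_tokens * price_per_m * cached_multiplier
    return billed / 1_000_000


# Hypothetical usage: a crude word-count tokenizer and a made-up price.
history = ["a b c", "d e", "f"]
recent = prune_history(history, budget=3, count_tokens=lambda m: len(m.split()))

uncached = input_cost(100_000, 0, price_per_m=3.0)
mostly_cached = input_cost(100_000, 90_000, price_per_m=3.0)
print(recent, uncached, mostly_cached)
```

In this example a 100,000-token prompt with 90,000 tokens served from cache costs about a fifth of the uncached price, a far larger lever than trimming a few words of output.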

Caveman is useful. It is not a budget strategy on its own. Teams looking for broader efficiency gains may also benefit from exploring AI-Powered Websites that complement these cost-saving approaches.

Something telling is happening regardless. OpenAI’s director of engineering Shayne Sweeney contributed Codex plugin support directly to the caveman repository. Engineers at Nvidia and GitHub are reportedly experimenting with it. Formal AI style guides specifying token budgets per workflow, and a new specialty called “token economist,” may arrive sooner than anyone planned.