How Prompt Caching Cuts Your API Costs by Up to 90%
What is prompt caching?
Prompt caching is an Anthropic API feature that lets you mark certain parts of your prompt as cacheable. When the API sees the same content in a subsequent request, it reads from cache instead of reprocessing — and charges you just 10% of the normal input token price for cached tokens.
Overclock configures this automatically. You don't need to do anything — caching is built into how we structure API requests.
What gets cached?
In a typical Overclock session, your API requests contain several types of content:
- System prompt — The instructions that tell Claude how to behave as a coding assistant
- Tools definition — The specifications for all available tools (Read, Write, Edit, Bash, Glob, Grep, etc.)
- CLAUDE.md — Your project-specific instructions file
- Conversation history — Previous messages in the session
- Current message — Your latest prompt
The first three items (system prompt, tools definition, and CLAUDE.md) are static — they don't change between prompts in the same session. This is the content that benefits most from caching.
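That layout can be sketched as a Messages API payload. The field shapes (`system`, `tools`, `cache_control`) follow Anthropic's Messages API; the model id, tool definition, and text contents below are illustrative stand-ins, not Overclock's actual values:

```python
# Sketch of a single request, with the static components marked cacheable.
request = {
    "model": "claude-sonnet-4-20250514",  # illustrative model id
    "max_tokens": 1024,
    # Static content first: the system prompt is marked as a cache point.
    "system": [
        {
            "type": "text",
            "text": "You are a coding assistant...",  # illustrative
            "cache_control": {"type": "ephemeral"},   # 5-minute TTL
        },
    ],
    # Tool definitions are also static within a session; marking the last
    # tool caches the whole tools array up to that point.
    "tools": [
        {
            "name": "read_file",  # illustrative tool
            "description": "Read a file from disk",
            "input_schema": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
            },
            "cache_control": {"type": "ephemeral"},
        },
    ],
    # Conversation history and the current message change every turn,
    # so they sit after the cacheable prefix.
    "messages": [
        {"role": "user", "content": "Refactor utils.py to remove duplication"},
    ],
}
```

Note that everything cacheable sits before everything that changes — that ordering matters, as the section on configuration explains.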
The two TTL tiers
Anthropic's caching system has two time-to-live (TTL) tiers:
5-minute TTL (standard cache)
Most cached content has a 5-minute TTL. If you send another request within 5 minutes, the cached tokens are read at 10% cost. The timer resets with each cache hit, so continuous coding keeps the cache warm.
1-hour TTL (extended cache)
Content marked with the extended cache control gets a 1-hour TTL. Overclock uses this for your CLAUDE.md file, so your project context survives longer gaps between prompts. Step away for lunch, come back, and your CLAUDE.md is still cached.
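The extended tier is requested by adding a `ttl` field to the `cache_control` block, per Anthropic's extended-cache syntax. A minimal sketch (the file contents are illustrative):

```python
claude_md_text = "# Project conventions\nUse TypeScript strict mode.\n"  # illustrative

claude_md_block = {
    "type": "text",
    "text": claude_md_text,
    # A bare "ephemeral" cache_control gives the 5-minute TTL;
    # adding "ttl": "1h" requests the extended 1-hour tier instead.
    "cache_control": {"type": "ephemeral", "ttl": "1h"},
}
```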
The cost math
Let's look at a concrete example. A typical Overclock request might contain:
| Component | Tokens | Without caching | With caching |
|---|---|---|---|
| System prompt | ~2,000 | $0.0060 | $0.0006 |
| Tools definition | ~15,000 | $0.0450 | $0.0045 |
| CLAUDE.md | ~3,000 | $0.0090 | $0.0009 |
| Conversation history | ~5,000 | $0.0150 | $0.0150 |
| Current message | ~500 | $0.0015 | $0.0015 |
| Total input | ~25,500 | $0.0765 | $0.0225 |
Prices based on Claude Sonnet 4 input pricing at $3/MTok. Cached reads at $0.30/MTok (10% of the input rate).
That's a 70% reduction on this single request. The savings are even greater when the tools definition is larger or when conversation history starts getting cached too.
Over a full session of 20+ prompts, the cumulative savings on static content are substantial — often reaching 80–90% reduction in input token costs.
How Overclock configures it
You don't need to set up anything. Overclock automatically:
- Marks the system prompt with cache control headers
- Marks the tools definition with cache control headers
- Marks your CLAUDE.md with extended (1-hour) cache control
- Structures requests so cached content appears first (cache requires prefix matching)
The first request in a session pays full price for input tokens plus a small cache write fee. Every subsequent request benefits from cached reads.
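In the API's response accounting, that first-versus-later difference shows up in the `usage` block. The field names below follow Anthropic's Messages API; the token numbers are the illustrative ones from the cost table:

```python
# First request of a session: the static prefix is written to cache
# (billed at 125% of the input rate), nothing is read back yet.
first_request_usage = {
    "input_tokens": 5_500,                  # dynamic content, full price
    "cache_creation_input_tokens": 20_000,  # static prefix, write premium
    "cache_read_input_tokens": 0,
}

# A later request within the TTL: the same prefix is read from cache
# at 10% of the input rate; only the dynamic content pays full price.
later_request_usage = {
    "input_tokens": 5_500,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 20_000,
}
```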
Cache write costs
There's a small premium for the initial cache write — Anthropic charges 25% more than the normal input price the first time content is written to cache. But this is a one-time cost per cache entry, and a single subsequent cached read (at a 90% discount) more than recoups it.
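The break-even arithmetic, in units of the base input price per token (write at 1.25x, cached reads at 0.10x, uncached sends at 1.0x):

```python
WRITE_MULT = 1.25  # first request: cache write premium
READ_MULT = 0.10   # later requests: cached read discount

def total_cost(reuses: int) -> tuple[float, float]:
    """Relative cost of one cache write plus `reuses` cached reads,
    versus resending the same tokens uncached every time."""
    with_cache = WRITE_MULT + reuses * READ_MULT
    without_cache = 1.0 + reuses * 1.0
    return with_cache, without_cache

for n in (0, 1, 2):
    with_cache, without_cache = total_cost(n)
    print(f"{n} reuses: cached {with_cache:.2f}x vs uncached {without_cache:.2f}x")
```

With zero reuses caching costs slightly more (1.25x vs 1.0x); after one reuse it is already ahead (1.35x vs 2.0x), and the gap widens with every additional request.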
What you see in the dashboard
Your Overclock usage dashboard at overclock.win shows:
- Cache read tokens — How many tokens were read from cache (at 10% cost)
- Cache write tokens — How many tokens were written to cache (at 125% cost)
- Uncached input tokens — Tokens that weren't cached (full price)
- Effective savings — The percentage you saved compared to no caching
This transparency is part of the BYOK model — you see exactly what you're paying for.
Tips for maximizing cache savings
- Keep sessions alive: The cache stays warm as long as you keep prompting. Rapid-fire prompts in a focused session maximize savings.
- Write a good CLAUDE.md: Since it gets the 1-hour TTL, a comprehensive CLAUDE.md means more context is cached for longer.
- Don't worry about prompt length: Longer static context actually benefits more from caching, since the absolute savings are larger.
Prompt caching is one of the key reasons the BYOK model works so well. You're paying Anthropic directly at their rates, and caching ensures those rates are as low as possible.