← Home

OpenAI Prompt Caching: Complete Technical Guide

Published April 2025 · 7 min read

What Is Prompt Caching?

Prompt caching lets OpenAI reuse large, static portions of your prompt (system messages, documents, instructions) across requests. When a request starts with a prefix OpenAI has seen recently, those input tokens are billed at a discounted rate — on GPT-4o, cached input tokens cost 50% of the normal input price.

Cost comparison: a 100K-token system message is billed in full on the first request (100,000 input tokens at the normal rate), but on subsequent cache hits those same tokens are billed at the cached rate — with a 50% discount, the equivalent of 50,000 tokens per request. There is no separate cache-creation fee.
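The savings math is easy to sanity-check. A quick sketch (the per-token price here is an illustrative assumption — check OpenAI's pricing page for your model):

```python
# Sketch of the cost math for a cached 100K-token system message.
INPUT_PRICE_PER_M = 2.50   # $ per 1M input tokens (assumed, illustrative)
CACHED_DISCOUNT = 0.50     # cached input billed at 50% of the normal rate

def input_cost(tokens: int, cached_tokens: int = 0) -> float:
    """Dollar cost of one request's input tokens, given how many hit the cache."""
    uncached = tokens - cached_tokens
    cached = cached_tokens * CACHED_DISCOUNT
    return (uncached + cached) * INPUT_PRICE_PER_M / 1_000_000

first_request = input_cost(100_000)                        # cache miss: full price
cache_hit = input_cost(100_000, cached_tokens=100_000)     # full prefix hit
print(f"first request: ${first_request:.4f}, cache hit: ${cache_hit:.4f}")
```

Over thousands of requests against the same document, that per-request halving of input cost dominates the bill.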

How Caching Works

  1. Request 1: Send the full prompt. If it is 1,024 tokens or longer, OpenAI automatically caches its prefix — no cache-control markers or headers are needed. You pay full price for all input tokens.
  2. Request 2+: Send a prompt that begins with the same prefix. OpenAI matches the longest cached prefix (in 128-token increments) and bills those tokens at the discounted rate; the response's `usage` field reports `cached_tokens` so you can verify the hit.
  3. Caches are typically evicted after 5–10 minutes of inactivity (and cleared within about an hour regardless). Any change to the prefix simply misses the cache and starts a fresh entry.
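Because matching is prefix-based, the only structural rule is: static content first, variable content last, byte-identical every time. A minimal sketch of a cache-friendly request builder (the SDK call in the comment requires an API key and is not run here; the model name is a placeholder):

```python
def build_messages(system_prompt: str, document: str, question: str) -> list[dict]:
    """Build a chat request whose large static prefix can be cached.

    The system prompt and document are byte-identical across requests, so
    OpenAI's automatic prefix caching can match them; only the trailing
    user question varies from request to request.
    """
    return [
        {"role": "system", "content": f"{system_prompt}\n\n{document}"},
        {"role": "user", "content": question},
    ]

# Usage with the OpenAI Python SDK (sketch — needs an API key):
#   from openai import OpenAI
#   client = OpenAI()
#   resp = client.chat.completions.create(
#       model="gpt-4o",
#       messages=build_messages(SYSTEM, DOC, "What does clause 4.2 say?"),
#   )
#   resp.usage.prompt_tokens_details.cached_tokens  # > 0 on a cache hit
```

Putting the question in its own trailing user message, rather than interpolating it into the system prompt, is what keeps the cached prefix stable.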

Best Use Cases

  • Document Q&A: Cache the full document + system prompt, vary only the question.
  • Multi-turn conversations: The system message plus prior turns form a stable prefix; each new user message only adds full-price tokens for the new part.
  • Code review: Cache large codebase + review guidelines, vary only the file being reviewed.
  • Content moderation: Cache policies + examples, vary incoming content.
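The multi-turn case works because conversations are naturally append-only: as long as earlier messages are never edited or reordered, each request's prompt starts with the same bytes as the last one. A sketch (names are illustrative):

```python
def append_turn(history: list[dict], user_input: str) -> list[dict]:
    """Extend a conversation without disturbing its cacheable prefix.

    Earlier messages are returned unchanged, so the next request's prompt
    begins with exactly the bytes OpenAI already has cached.
    """
    return history + [{"role": "user", "content": user_input}]

history = [{"role": "system", "content": "You are a support agent."}]
history = append_turn(history, "My order is late.")
# ...send the request, then append the assistant's reply verbatim...
history.append({"role": "assistant", "content": "Sorry to hear that — let me check."})
history = append_turn(history, "Can I get a refund?")
```

The common mistake is summarizing or trimming old turns mid-conversation; that rewrites the prefix and forfeits the discount on every token after the edit.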

Important Caveats

  • Cache hits require an exact prefix match. Any change early in the prompt — whitespace, reformatting, a timestamp — invalidates everything after it, so keep static content first and variable content last.
  • Supported on GPT-4o, GPT-4o mini, and newer models; it is not available on older models such as GPT-4 Turbo.
  • Eviction timing is not controllable: caches expire after 5–10 minutes of inactivity, and you can't pin a cache or force immediate eviction.
  • Prompts shorter than 1,024 tokens are never cached (check OpenAI's docs for per-model details).
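A practical way to catch accidental cache-busting changes during development is to fingerprint the static prefix and warn when it drifts between requests. This is a local debugging aid, not an OpenAI feature — the names here are hypothetical:

```python
import hashlib

def prefix_fingerprint(messages: list[dict]) -> str:
    """Hash everything before the final (variable) message.

    If this fingerprint changes between requests, the cached prefix will
    miss — usually because a timestamp, reformatting, or reordered
    instruction crept into the supposedly static portion.
    """
    prefix = messages[:-1]
    blob = "\x1f".join(f"{m['role']}:{m['content']}" for m in prefix)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()

a = prefix_fingerprint([
    {"role": "system", "content": "Moderation policy v1"},
    {"role": "user", "content": "first item to review"},
])
b = prefix_fingerprint([
    {"role": "system", "content": "Moderation policy v1"},
    {"role": "user", "content": "second item to review"},
])
assert a == b  # same static prefix, different payload: cache-safe
```

Logging this fingerprint alongside `cached_tokens` from the API response makes it obvious which deploys silently broke caching.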

CachePilot automates prompt caching decisions so you don't have to track prefix stability by hand. Learn more in our docs.