Your OpenAI bill doubled. You don't know why.
Seven things make OpenAI bills spike. Four of them are invisible from the OpenAI dashboard alone. Here's how to tell which one hit you.
What happened to your bill?
The 7 causes of OpenAI bill spikes
Retry storms
Invisible in OpenAI dashboard
OpenAI returns 429 or 503 and your client retries automatically. Each retry is a full new request — same tokens, same cost, zero additional output. A spike of retries can double your bill in a single hour.
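A minimal sketch of what this looks like in client code. `call_with_retries` and `flaky_call` are hypothetical stand-ins for your OpenAI client call, not real SDK functions; the point is that the attempt count is also your cost multiplier for that request.

```python
import time

def call_with_retries(call, max_retries=3, base_delay=0.0):
    """Retry on rate-limit errors, counting every attempt.

    Each attempt is billed as a full request, so the number of
    attempts is also the cost multiplier for this call.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return call(), attempts
        except RuntimeError:  # stand-in for a 429/503 error class
            if attempts > max_retries:
                raise
            time.sleep(base_delay * 2 ** (attempts - 1))  # exponential backoff

# Simulate a call that returns 429 twice before succeeding.
failures = iter([True, True, False])
def flaky_call():
    if next(failures):
        raise RuntimeError("429 Too Many Requests")
    return "ok"

result, attempts = call_with_retries(flaky_call)
print(result, attempts)  # ok 3 — this one request cost 3x its nominal token price
```

Unless you log `attempts` yourself, those extra two requests are indistinguishable from real traffic on the usage page.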
Silent cache misses
Invisible in OpenAI dashboard
You think your cache is hitting. But a deploy changed one character in your system prompt, silently invalidating the prefix hash. Every request now pays full price for input tokens that could have been 50% off.
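One way to catch this before the bill does: hash your cacheable prefix per deploy and alert when it changes. This is an illustrative sketch — OpenAI's internal cache key is not exposed, but any stable hash of the prefix will detect drift.

```python
import hashlib

def prefix_hash(system_prompt: str, tools_json: str = "") -> str:
    """Hash the cacheable prefix (system prompt + tool schemas).

    Store this per deploy; if the hash changes between deploys,
    prefix caching was invalidated and every request pays the
    full input-token price again.
    """
    return hashlib.sha256((system_prompt + tools_json).encode()).hexdigest()[:12]

before = prefix_hash("You are a helpful assistant.")
after = prefix_hash("You are a helpful assistant. ")  # one trailing space added in a deploy
print(before != after)  # True — a single character breaks the prefix match
```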
Drifting request shape
Invisible in OpenAI dashboard
Your request format changed: more input tokens per call, longer output budget, additional tool schemas. These compound silently — your bill goes up and the usage page shows the aggregate number with no breakdown.
Runaway output tokens
An OpenAI update increased default max_tokens, or your prompt encourages verbose responses. Output tokens are the most expensive part of a request — a 2x increase in output length can come close to doubling your per-request cost.
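The arithmetic is worth seeing. The per-million-token rates below are illustrative, not current OpenAI pricing; the point is that because output is priced several times higher than input, output length dominates per-request cost.

```python
def request_cost(input_tokens, output_tokens,
                 in_price_per_m=2.50, out_price_per_m=10.00):
    """Per-request cost in dollars. Rates are illustrative
    per-million-token prices, not current OpenAI pricing."""
    return input_tokens * in_price_per_m / 1e6 + output_tokens * out_price_per_m / 1e6

base = request_cost(1_000, 500)
verbose = request_cost(1_000, 1_000)  # output length doubled, input unchanged
print(round(verbose / base, 2))  # 1.67 — a 67% cost jump from output alone
```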
Tool-call overhead
Invisible in OpenAI dashboard
Multi-step tool use adds tokens to both input (tool schema + results) and output (tool use formatting). A 5-step loop can add 5,000+ tokens to each request. The usage page shows tokens used — not where they came from.
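A rough model of why tool loops compound, with illustrative token sizes (the per-step figures are assumptions, not measured values): schemas are re-sent on every step, and prior tool results accumulate in the context.

```python
def tool_loop_tokens(steps, schema_tokens=400, result_tokens=600):
    """Rough input-token overhead of a multi-step tool loop.

    Each step re-sends the tool schemas and carries all prior
    tool results, so input grows roughly linearly per step and
    the total grows quadratically with step count.
    """
    total = 0
    context = 0
    for _ in range(steps):
        context += result_tokens          # prior tool results accumulate
        total += schema_tokens + context  # schemas re-sent every step
    return total

print(tool_loop_tokens(5))  # 11000 extra input tokens across a 5-step loop
```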
Model mismatch
Tasks running on GPT-4o that GPT-4o-mini could handle at roughly 6% of the cost. Most teams can't see their per-route model distribution, so over-provisioning stays invisible.
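The 6% figure comes straight from the input-token price ratio. The rates below are illustrative (verify against OpenAI's current pricing page before relying on them):

```python
PRICES_PER_M_INPUT = {"gpt-4o": 2.50, "gpt-4o-mini": 0.15}  # illustrative $/1M input tokens

def monthly_input_cost(model, tokens_per_month):
    return tokens_per_month * PRICES_PER_M_INPUT[model] / 1e6

big = monthly_input_cost("gpt-4o", 500_000_000)
small = monthly_input_cost("gpt-4o-mini", 500_000_000)
print(round(small / big, 2))  # 0.06 — mini costs ~6% of 4o on input tokens
```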
o-series reasoning effort at default
Invisible in OpenAI dashboard
O-series models run at a default reasoning effort (medium) that can generate thousands of internal reasoning tokens, all charged at output rates. If your use case doesn't need that much reasoning, you're paying for tokens you'll never see.
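Turning effort down is a one-line change. The sketch below uses the Chat Completions-style `reasoning_effort` parameter; check the current OpenAI API reference for the exact parameter name and accepted values on your model.

```python
# Lowering reasoning effort on an o-series call (Chat Completions style).
# `reasoning_effort` accepts "low" | "medium" | "high" on reasoning models;
# verify against current OpenAI docs before shipping.
request = {
    "model": "o3-mini",
    "reasoning_effort": "low",  # fewer hidden reasoning tokens billed as output
    "messages": [{"role": "user", "content": "Summarize this ticket."}],
}
print(request["reasoning_effort"])
```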
Four of these are invisible from the OpenAI dashboard
The OpenAI usage page shows tokens and dollars. It doesn't show retries as separate requests. It doesn't show which prompts missed the cache and why. It doesn't show when a deploy changed your request shape and quietly broke prefix caching across your top route.
If your bill jumped and the usage page isn't explaining it — that's because four of the seven causes live outside what the OpenAI dashboard can see.
Stop guessing. Start seeing.
CachePilot gives you a per-request, per-route view of cost, cache source, and retries — content-free. Free tier, connect in 5 minutes.
Connect free →

Frequently Asked Questions
Does OpenAI's dashboard show retry counts?
No. OpenAI's usage dashboard aggregates total tokens and requests. A retry after a 429 shows up as just another request — nothing in the response or the dashboard marks it as a retry, and there is no separate retry count.
How do I know if my cache is actually hitting?
OpenAI's response includes cached_tokens in the usage object, but it doesn't tell you why a cache miss happened or which route has a stable prefix. For that you need per-request prefix hash tracking.
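A small helper for reading that field per request. The `usage` dict mirrors the `usage` object on a chat completion response; the field names follow OpenAI's API but should be verified against the current docs.

```python
def cache_hit_ratio(usage: dict) -> float:
    """Fraction of prompt tokens served from OpenAI's prefix cache.

    Reads `prompt_tokens` and `prompt_tokens_details.cached_tokens`
    from a response's usage object (field names per OpenAI's API).
    """
    prompt = usage.get("prompt_tokens", 0)
    cached = usage.get("prompt_tokens_details", {}).get("cached_tokens", 0)
    return cached / prompt if prompt else 0.0

usage = {"prompt_tokens": 2048, "prompt_tokens_details": {"cached_tokens": 1024}}
print(cache_hit_ratio(usage))  # 0.5 — half the prompt came from cache
```

Logging this ratio per route is enough to spot a deploy that silently broke your prefix, even without knowing why the miss happened.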
What's the fastest way to reduce my bill?
The highest-leverage quick win is identifying which routes are over-provisioned to GPT-4o when GPT-4o-mini would work. That alone can cut costs on eligible routes by up to 94%. But you need per-route telemetry to see which routes qualify.