Free Diagnostic

Your OpenAI bill doubled. You don't know why.

Seven things make OpenAI bills spike. Four of them are invisible from the OpenAI dashboard alone. Here's how to tell which one hit you.

What happened to your bill?

The 7 causes of OpenAI bill spikes

01

Retry storms

Invisible in OpenAI dashboard

OpenAI returns 429 or 503 and your client retries automatically. Each retry is a full new request — same tokens, same cost, zero additional output. A spike of retries can double your bill in a single hour.

Cost impact: 2–5x multiplier during degraded API conditions
Fix: Add per-request retry counting and alert on retry rate > 5%. Use CachePilot's retry telemetry to see retry-as-request-count in production.
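A minimal sketch of per-request retry counting over a sliding window. The RetryRateMonitor class, its window size, and the 5% threshold are illustrative choices, not CachePilot's actual telemetry API:

```python
from collections import deque

class RetryRateMonitor:
    """Sliding-window retry-rate monitor (illustrative sketch)."""

    def __init__(self, window: int = 1000, threshold: float = 0.05):
        # True = this request was an automatic retry, False = first attempt.
        self.events = deque(maxlen=window)
        self.threshold = threshold

    def record(self, is_retry: bool) -> None:
        self.events.append(is_retry)

    @property
    def retry_rate(self) -> float:
        return sum(self.events) / len(self.events) if self.events else 0.0

    def should_alert(self) -> bool:
        # Fire once retries exceed the threshold share of recent requests.
        return self.retry_rate > self.threshold
```

Call record(False) on each first attempt and record(True) whenever your client's retry logic fires on a 429 or 503; should_alert() then tells you when retries exceed 5% of the window.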
02

Silent cache misses

Invisible in OpenAI dashboard

You think your cache is hitting. But a deploy changed one character in your system prompt, silently invalidating the prefix hash. Every request now pays full price for input tokens that could have been 50% off.

Cost impact: 50% more on input tokens for affected routes
Fix: Track prefix_hash per request. Alert when the hash changes on a route that previously had a stable hash. This is what CachePilot's drift detection does automatically.
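One way to implement that fix, assuming the cacheable prefix is the system prompt plus any serialized tool schemas. The function names and the in-memory store are illustrative; production tracking would persist hashes per route:

```python
import hashlib

def prefix_hash(system_prompt: str, tools_json: str = "") -> str:
    """Hash the cacheable prefix. A one-character change in the system
    prompt produces a new hash, which flags cache invalidation."""
    return hashlib.sha256((system_prompt + tools_json).encode()).hexdigest()[:16]

# Illustrative in-memory store; replace with a real datastore in production.
last_hash_by_route: dict[str, str] = {}

def check_drift(route: str, h: str) -> bool:
    """Return True when the prefix hash changed on a previously seen route."""
    prev = last_hash_by_route.get(route)
    last_hash_by_route[route] = h
    return prev is not None and prev != h
```

The drift check fires exactly when a deploy changes the prefix, which is the moment cached input-token pricing silently stops applying.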
03

Drifting request shape

Invisible in OpenAI dashboard

Your request format changed: more input tokens per call, longer output budget, additional tool schemas. These compound silently — your bill goes up and the usage page shows the aggregate number with no breakdown.

Cost impact: 10–40% gradual increase from structural drift
Fix: Log request shape metadata: input token count, tool schema version, output budget. Track these per route over time.
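A sketch of the logging side with a hypothetical RequestShape record. The field names are assumptions for illustration, not a CachePilot schema:

```python
import json
import time
from dataclasses import dataclass, asdict

@dataclass
class RequestShape:
    route: str
    input_tokens: int
    output_budget: int        # the max_tokens set on the request
    tool_schema_version: str
    ts: float

def log_shape(route: str, input_tokens: int, output_budget: int,
              tool_schema_version: str) -> str:
    """Serialize one request's shape as a JSON log line."""
    shape = RequestShape(route, input_tokens, output_budget,
                         tool_schema_version, time.time())
    # In production this line would go to your log pipeline.
    return json.dumps(asdict(shape))
```

Charting these fields per route over time turns "the bill went up" into "input tokens on /summarize grew 30% after the v3 tool schema shipped".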
04

Runaway output tokens

An OpenAI update increased default max_tokens, or your prompt encourages verbose responses. Output tokens are the most expensive part of a request — a 2x increase in output length doubles your per-request cost.

Cost impact: Up to 10x if max_output was raised significantly
Fix: Set explicit output budgets per route. Track output_tokens per request and alert on statistical outliers.
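A simple z-score check on output_tokens per route could look like this. The 3-sigma threshold and 30-request minimum are arbitrary illustrative choices:

```python
import statistics

def is_output_outlier(history: list[int], latest: int, z: float = 3.0) -> bool:
    """Flag a request whose output_tokens sit more than `z` standard
    deviations above the route's recent mean."""
    if len(history) < 30:
        return False  # not enough data for a stable baseline
    mean = statistics.fmean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        return latest > mean
    return (latest - mean) / stdev > z
```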
05

Tool-call overhead

Invisible in OpenAI dashboard

Multi-step tool use adds tokens to both input (tool schema + results) and output (tool use formatting). A 5-step loop can add 5,000+ tokens to each request. The usage page shows tokens used — not where they came from.

Cost impact: Varies by call depth, but often 20–100% token inflation
Fix: Separate tool-call token costs from raw completion costs. Track tool call count per request as a separate dimension.
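One way to split the accounting, assuming you record token usage per API round-trip inside the agent loop. The tuple shape and result keys are illustrative:

```python
def split_token_costs(steps: list[tuple[int, int, bool]]) -> dict:
    """Attribute tokens to tool-call overhead vs. the final completion.
    Each step is (input_tokens, output_tokens, is_tool_call) for one
    API round-trip in a single agent loop."""
    tool_in = tool_out = final_in = final_out = 0
    for inp, out, is_tool in steps:
        if is_tool:
            tool_in += inp
            tool_out += out
        else:
            final_in += inp
            final_out += out
    return {
        "tool_input_tokens": tool_in,
        "tool_output_tokens": tool_out,
        "completion_input_tokens": final_in,
        "completion_output_tokens": final_out,
        "tool_call_count": sum(1 for *_, t in steps if t),
    }
```

With tool tokens broken out as their own dimension, a 5-step loop stops hiding inside "tokens used" and shows up as its own line.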
06

Model mismatch

Requests are routed to GPT-4o when GPT-4o-mini could have handled them at roughly 6% of the cost. Most teams can't see their per-route model distribution, so the over-provisioning stays invisible.

Cost impact: Up to 17x over-provisioning on eligible requests
Fix: Audit which routes use which models. Route appropriate requests to cheaper models. Use CachePilot's per-route telemetry to identify over-provisioning.
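A sketch of such an audit. The per-million-token prices here are illustrative snapshots; check OpenAI's current pricing page before relying on them:

```python
from collections import defaultdict

# Illustrative input-token prices in USD per 1M tokens (verify against
# OpenAI's current pricing before use).
PRICE_PER_MTOK = {"gpt-4o": 2.50, "gpt-4o-mini": 0.15}

def audit_routes(records: list[tuple[str, str, int]]) -> dict:
    """Aggregate input tokens per (route, model) and estimate the saving
    if each route moved to gpt-4o-mini. Each record is
    (route, model, input_tokens)."""
    usage: dict[tuple[str, str], int] = defaultdict(int)
    for route, model, toks in records:
        usage[(route, model)] += toks
    report = {}
    for (route, model), toks in usage.items():
        cost = toks / 1e6 * PRICE_PER_MTOK[model]
        mini_cost = toks / 1e6 * PRICE_PER_MTOK["gpt-4o-mini"]
        report[(route, model)] = {
            "cost": round(cost, 4),
            "potential_saving": round(cost - mini_cost, 4),
        }
    return report
```

Whether a route is actually eligible for the cheaper model is a quality judgment this arithmetic can't make; the report only shows where the money is.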
07

o-series reasoning effort at default

Invisible in OpenAI dashboard

The o-series models run at medium reasoning effort by default, which can generate thousands of internal reasoning tokens charged at output rates. If your use case doesn't need that much reasoning, you're paying for tokens you'll never see.

Cost impact: 2–5x extra vs. 'low' reasoning effort on the same request
Fix: Set reasoning_effort explicitly to 'low' or 'medium' unless you need maximum reasoning. Track reasoning token counts separately.
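A minimal request-builder sketch that sets the chat completions reasoning_effort parameter explicitly. The helper function and model name are placeholders, not part of any SDK:

```python
def build_reasoning_request(prompt: str, effort: str = "low") -> dict:
    """Build a chat completions request body with an explicit
    reasoning_effort, so reasoning-token spend is a deliberate choice
    rather than a default."""
    assert effort in {"low", "medium", "high"}
    return {
        "model": "o3-mini",          # placeholder o-series model name
        "reasoning_effort": effort,  # explicit, so billing is predictable
        "messages": [{"role": "user", "content": prompt}],
    }
```

Pass this dict to your OpenAI client's chat completions call; the point is simply that the effort level appears in code review instead of being inherited silently.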

Four of these are invisible from the OpenAI dashboard

The OpenAI usage page shows tokens and dollars. It doesn't show retries as separate requests. It doesn't show which prompts missed the cache and why. It doesn't show when a deploy changed your request shape and quietly broke prefix caching across your top route.

If your bill jumped and the usage page isn't explaining it — that's because four of the seven causes live outside what the OpenAI dashboard can see.

Stop guessing. Start seeing.

CachePilot gives you a per-request, per-route view of cost, cache source, and retries — content-free. Free tier, connect in 5 minutes.

Connect free →

Frequently Asked Questions

Does OpenAI's dashboard show retry counts?

No. OpenAI's usage dashboard aggregates total tokens and requests. A retried request that previously hit a 429 shows up as just another identical request: there is no retry marker in the usage data and no separate retry count.

How do I know if my cache is actually hitting?

OpenAI's response includes cached_tokens in the usage object, but it doesn't tell you why a cache miss happened or which route has a stable prefix. For that you need per-request prefix hash tracking.
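Given a list of usage objects from chat completions responses, an aggregate cache hit ratio can be computed like this. cache_hit_ratio is an illustrative helper, not an OpenAI or CachePilot function:

```python
def cache_hit_ratio(usages: list[dict]) -> float:
    """Fraction of input tokens served from the prompt cache, given
    usage dicts shaped like the chat completions `usage` object
    (cached tokens live under prompt_tokens_details.cached_tokens)."""
    total = sum(u["prompt_tokens"] for u in usages)
    cached = sum(
        u.get("prompt_tokens_details", {}).get("cached_tokens", 0)
        for u in usages
    )
    return cached / total if total else 0.0
```

A ratio that drops after a deploy is the cache-miss signal the dashboard alone won't give you.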

What's the fastest way to reduce my bill?

The highest-leverage quick win is identifying which routes are over-provisioned to GPT-4o when GPT-4o-mini would work. That alone can cut eligible route costs by 94%. But you need per-route telemetry to see that.

Cost is the symptom. Visibility is the fix. CachePilot gives you per-request cost telemetry with no prompts stored.

Start free →