OpenAI API Cost Calculator
Model-by-model pricing for the Responses API, Chat Completions, and the o-series. Input tokens, output tokens, cache discount — all accounted for.
Estimate Your Monthly Spend
This estimate is the ideal case. Want to see the real number from your production traffic?
See the real number → Connect CachePilot free

Retry waste
OpenAI's usage page doesn't distinguish retries from original requests. A retry storm can multiply your bill 2–5x without any additional useful output.
Silent cache misses
You think your cache is hitting. But a one-character change in your system prompt after a deploy silently invalidates the cache across every route that shares it. OpenAI's dashboard can't show you that.
Tool-call overhead
Every tool call adds tokens to both input and output. Multi-step loops compound this. The calculator handles the happy path — production traffic rarely stays there.
Current OpenAI Pricing (Standard Tier)
| Model | Input / 1M | Cached / 1M | Output / 1M |
|---|---|---|---|
| gpt-5 | $1.25 | $0.13 | $10.00 |
| gpt-5-mini | $0.25 | $0.03 | $2.00 |
| gpt-4.1 | $2.00 | $0.50 | $8.00 |
| gpt-4.1-mini | $0.40 | $0.10 | $1.60 |
| gpt-4o | $2.50 | $1.25 | $10.00 |
| gpt-4o-mini | $0.15 | $0.07 | $0.60 |
| o1 | $15.00 | $7.50 | $60.00 |
| o3 | $2.00 | $0.50 | $8.00 |
| o4-mini | $1.10 | $0.28 | $4.40 |
| o3-mini | $1.10 | $0.55 | $4.40 |
Full pricing for all models is available via the API. Batch pricing is 50% off input tokens.
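For readers who want to check the arithmetic, here is a minimal sketch of the per-token math using prices from the table above. The token volumes in the example are illustrative assumptions, not real traffic.

```python
# USD per 1M tokens: (input, cached_input, output), from the table above.
PRICES = {
    "gpt-5":       (1.25, 0.13, 10.00),
    "gpt-4o":      (2.50, 1.25, 10.00),
    "gpt-4o-mini": (0.15, 0.07, 0.60),
}

def monthly_cost(model, input_tok, cached_tok, output_tok):
    """Cached input tokens are billed at the cached rate instead of the input rate."""
    inp, cached, out = PRICES[model]
    uncached = input_tok - cached_tok
    return (uncached * inp + cached_tok * cached + output_tok * out) / 1_000_000

# Illustrative month: 100M input tokens (40M of them cached), 20M output tokens on gpt-4o
print(monthly_cost("gpt-4o", 100e6, 40e6, 20e6))  # 400.0
```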
How Caching Changes the Math
Prompt caching discounts cached input tokens. Here's what that looks like at your current inputs:
The real question isn't "what could I save with caching?" but "what is my actual hit rate in production right now?" Your savings depend on your token mix (input vs. output ratio) and your real hit rate. CachePilot shows you both exactly, per route, on every request.
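To make the hit-rate dependence concrete, here is a small sketch of the blended input price. The prices are gpt-4o's from the table above; the 80% hit rate is an assumed value, not a measurement.

```python
def effective_input_price(input_price, cached_price, hit_rate):
    """Blended per-1M input price given a cache hit rate in [0, 1]."""
    return hit_rate * cached_price + (1 - hit_rate) * input_price

# gpt-4o: $2.50 input, $1.25 cached, at an assumed 80% hit rate
print(effective_input_price(2.50, 1.25, 0.8))  # 1.5
```

A hit rate measured per route matters because the blend is linear: halving your real hit rate moves you halfway back toward the full input price.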
Read the prompt caching guide →

Frequently Asked Questions
How much does GPT-4o cost per 1M tokens?
GPT-4o is $2.50 per 1M input tokens (cached at $1.25) and $10.00 per 1M output tokens on the standard tier. Batch pricing is 50% less.
How does prompt caching discount work?
OpenAI discounts cached input tokens — 50% off on GPT-4o, 75% off on GPT-4.1, and up to 90% off on GPT-5. The exact discount depends on the model. Your actual savings depend on cache hit rate, which the OpenAI dashboard doesn't show per route.
Why is my actual bill higher than this estimate?
Three things this calculator can't see: (1) retries as separate requests — a retry storm can multiply your bill by 2–5x, (2) cache misses you think are hits — prefix changes from deploys silently break caching, (3) tool-call overhead — each tool call adds tokens to input and output. For real production numbers, you need per-request telemetry.
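As a rough sketch of point (1): if retries are billed as full requests while only one attempt produces useful output, the average attempts per request act as a direct multiplier on spend. The 2.5 attempts figure below is an assumed value for illustration.

```python
def billed_cost(useful_cost, avg_attempts_per_request):
    """Retries are billed as full requests; only one attempt yields useful output."""
    return useful_cost * avg_attempts_per_request

# An assumed retry storm averaging 2.5 attempts per request turns
# $400 of useful work into $1000 of billed usage.
print(billed_cost(400.0, 2.5))  # 1000.0
```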
Does the calculator include reasoning token costs?
Yes, for the o-series models (o1, o3, o4-mini). Reasoning tokens are billed at the output token rate. The calculator uses separate input/cached/output fields.
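A minimal sketch of how reasoning tokens enter the math, using o4-mini's prices from the table above; the token counts are illustrative assumptions.

```python
def o_series_cost(prices, input_tok, cached_tok, reasoning_tok, visible_out_tok):
    """prices is (input, cached_input, output) in USD per 1M tokens."""
    inp, cached, out = prices
    # Reasoning tokens are billed at the output rate, alongside visible output.
    billable_out = reasoning_tok + visible_out_tok
    uncached = input_tok - cached_tok
    return (uncached * inp + cached_tok * cached + billable_out * out) / 1_000_000

# o4-mini: $1.10 / $0.28 / $4.40 per 1M. Illustrative traffic:
# 10M input (none cached), 5M reasoning, 1M visible output tokens.
print(round(o_series_cost((1.10, 0.28, 4.40), 10e6, 0, 5e6, 1e6), 2))  # 37.4
```

Note how the 5M hidden reasoning tokens cost more than the 10M input tokens here, since they are priced at the 4x-higher output rate.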