OpenAI API Cost Calculator
Model-by-model pricing for the Responses API, Chat Completions, and the o-series. Input tokens, output tokens, cache discount — all accounted for.
Estimate Your Monthly Spend
This estimate is the ideal case. Want to see the real number from your production traffic?
See the real number → Connect CachePilot free

Retry waste
OpenAI's usage page doesn't distinguish retries from original requests. A retry storm can multiply your bill 2–5x without any additional useful output.
Silent cache misses
You think your cache is hitting. But a one-character change in your system prompt after a deploy silently invalidates the cache across every route that shares it. OpenAI's dashboard can't show you that.
Tool-call overhead
Every tool call adds tokens to both input and output. Multi-step loops compound this. The calculator handles the happy path — production traffic rarely stays there.
Current OpenAI Pricing (Standard Tier)
| Model | Input / 1M | Cached / 1M | Output / 1M |
|---|---|---|---|
| gpt-5 | $1.25 | $0.13 | $10.00 |
| gpt-5-mini | $0.25 | $0.03 | $2.00 |
| gpt-4.1 | $2.00 | $0.50 | $8.00 |
| gpt-4.1-mini | $0.40 | $0.10 | $1.60 |
| gpt-4o | $2.50 | $1.25 | $10.00 |
| gpt-4o-mini | $0.15 | $0.07 | $0.60 |
| o1 | $15.00 | $7.50 | $60.00 |
| o3 | $2.00 | $0.50 | $8.00 |
| o4-mini | $1.10 | $0.28 | $4.40 |
| o3-mini | $1.10 | $0.55 | $4.40 |
Full pricing for all models is available via the API. Batch pricing is 50% off input tokens.
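For readers who want to check the arithmetic, here is a minimal sketch of the per-token math using prices from the table above. The token volumes in the example are illustrative assumptions, not real traffic.

```python
# USD per 1M tokens: (input, cached_input, output), from the table above.
PRICES = {
    "gpt-5":       (1.25, 0.13, 10.00),
    "gpt-4o":      (2.50, 1.25, 10.00),
    "gpt-4o-mini": (0.15, 0.07, 0.60),
}

def monthly_cost(model, input_tok, cached_tok, output_tok):
    """Cached input tokens are billed at the cached rate instead of the input rate."""
    inp, cached, out = PRICES[model]
    uncached = input_tok - cached_tok
    return (uncached * inp + cached_tok * cached + output_tok * out) / 1_000_000

# Illustrative month: 100M input tokens (40M of them cached), 20M output tokens on gpt-4o
print(monthly_cost("gpt-4o", 100e6, 40e6, 20e6))  # 400.0
```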
How Caching Changes the Math
Prompt caching discounts cached input tokens. Here's what that looks like at your current inputs:
The real question isn't "what could I save with caching?" but "what is my actual hit rate in production right now?" Your savings depend on your token mix (input vs. output ratio) and your real hit rate. CachePilot shows you both exactly, per route, on every request.
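To make the hit-rate dependence concrete, here is a small sketch of the blended input price. The prices are gpt-4o's from the table above; the 80% hit rate is an assumed value, not a measurement.

```python
def effective_input_price(input_price, cached_price, hit_rate):
    """Blended per-1M input price given a cache hit rate in [0, 1]."""
    return hit_rate * cached_price + (1 - hit_rate) * input_price

# gpt-4o: $2.50 input, $1.25 cached, at an assumed 80% hit rate
print(effective_input_price(2.50, 1.25, 0.8))  # 1.5
```

A hit rate measured per route matters because the blend is linear: halving your real hit rate moves you halfway back toward the full input price.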
Read the prompt caching guide →

Frequently Asked Questions
How much does GPT-4o cost per 1M tokens?
GPT-4o is $2.50 per 1M input tokens (cached at $1.25) and $10.00 per 1M output tokens on the standard tier. Batch pricing is 50% less.
How does prompt caching discount work?
OpenAI discounts cached input tokens — 50% off on GPT-4o, 75% off on GPT-4.1, and up to 90% off on GPT-5. The exact discount depends on the model. Your actual savings depend on cache hit rate, which the OpenAI dashboard doesn't show per route.
Why is my actual bill higher than this estimate?
Three things this calculator can't see: (1) retries as separate requests — a retry storm can multiply your bill by 2–5x, (2) cache misses you think are hits — prefix changes from deploys silently break caching, (3) tool-call overhead — each tool call adds tokens to input and output. For real production numbers, you need per-request telemetry.
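As a rough sketch of point (1): if retries are billed as full requests while only one attempt produces useful output, the average attempts per request act as a direct multiplier on spend. The 2.5 attempts figure below is an assumed value for illustration.

```python
def billed_cost(useful_cost, avg_attempts_per_request):
    """Retries are billed as full requests; only one attempt yields useful output."""
    return useful_cost * avg_attempts_per_request

# An assumed retry storm averaging 2.5 attempts per request turns
# $400 of useful work into $1000 of billed usage.
print(billed_cost(400.0, 2.5))  # 1000.0
```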
Does the calculator include reasoning token costs?
Yes, for the o-series models (o1, o3, o4-mini). Reasoning tokens are billed at the output token rate. The calculator uses separate input/cached/output fields.
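A minimal sketch of how reasoning tokens enter the math, using o4-mini's prices from the table above; the token counts are illustrative assumptions.

```python
def o_series_cost(prices, input_tok, cached_tok, reasoning_tok, visible_out_tok):
    """prices is (input, cached_input, output) in USD per 1M tokens."""
    inp, cached, out = prices
    # Reasoning tokens are billed at the output rate, alongside visible output.
    billable_out = reasoning_tok + visible_out_tok
    uncached = input_tok - cached_tok
    return (uncached * inp + cached_tok * cached + billable_out * out) / 1_000_000

# o4-mini: $1.10 / $0.28 / $4.40 per 1M. Illustrative traffic:
# 10M input (none cached), 5M reasoning, 1M visible output tokens.
print(round(o_series_cost((1.10, 0.28, 4.40), 10e6, 0, 5e6, 1e6), 2))  # 37.4
```

Note how the 5M hidden reasoning tokens cost more than the 10M input tokens here, since they are priced at the 4x-higher output rate.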