Documentation
CachePilot is a governed proxy for the OpenAI Responses API. Swap your base URL, add one header, and every request gets policy enforcement, cache optimization, and auditable receipts.
⚡ Quickstart
1. Get your project key — Create a project in the Dashboard. You'll receive cp_live_... — save it, it's shown once.
2. Swap your base URL — Point your OpenAI client to the CachePilot endpoint:
3. Read the receipt headers — Every response includes X-CP-* headers proving what policy was applied.
The Authorization header (your OpenAI key) passes through untouched: bring your own key (BYOK).
Python / TypeScript SDK:
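A minimal Python sketch of steps 2 and 3, using only the standard library so it runs without the OpenAI SDK. The base URL is derived from the endpoint listed under Deployment. The project-key header name X-CP-Project-Key is a placeholder assumption (this page does not state the real name), so substitute the one shown in your Dashboard. The request is built but not sent.

```python
import json
import urllib.request

# Base URL derived from the Endpoint line in the Deployment section.
CACHEPILOT_BASE_URL = "https://api.cachepilot.clclabs.ai/v1"

# ASSUMPTION: placeholder header name; the real one is not given on this page.
PROJECT_KEY_HEADER = "X-CP-Project-Key"


def build_responses_request(openai_key: str, project_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to the proxy's /responses endpoint."""
    return urllib.request.Request(
        url=f"{CACHEPILOT_BASE_URL}/responses",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            # Your OpenAI key; the proxy forwards it upstream untouched (BYOK).
            "Authorization": f"Bearer {openai_key}",
            # Your CachePilot project key (cp_live_...).
            PROJECT_KEY_HEADER: project_key,
            "Content-Type": "application/json",
        },
    )
```

With the official openai Python package, the same idea is usually expressed by passing base_url and default_headers to the OpenAI client constructor.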
🖥️ Codex CLI
OpenAI Codex CLI works natively with CachePilot. Set up a custom model provider so every Codex session goes through the proxy — policy enforcement, cache optimization, and telemetry included.
1. Set environment variables
CachePilot needs your project key and an OpenAI API key with full api.responses.write scope.
Note that $env:VAR = "value" in PowerShell is session-scoped, so it won't persist across terminals.
2. Configure Codex
Add this to your ~/.codex/config.toml:
Use env_key, not requires_openai_auth. The latter uses Codex's OAuth token, which may lack the api.responses.write scope needed for the Responses API.
3. Run Codex
That's it. Every Codex request now flows through CachePilot with policy enforcement, cache optimization, and full telemetry. Check your Dashboard to see requests in real time.
Supported models
The proxy supports all OpenAI Responses API models:
🛡️ Policies
Each project has a PolicyV1 JSON object that controls what the proxy allows. Policies are enforced deterministically before any request reaches OpenAI, and every change creates a new immutable version — so you can always reconstruct exactly what was applied to a given request from its policy_version and policy_hash.
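As an illustration of how a policy_hash can act as an audit fingerprint, here is a small Python sketch. The canonicalization (sorted keys, compact separators) is an assumption made for the example; the proxy's actual serialization is not documented here, so hashes computed this way may not match the ones it reports.

```python
import hashlib
import json


def policy_hash(policy: dict) -> str:
    """Return a SHA-256 hex digest of a policy object.

    ASSUMPTION: the policy is canonicalized as JSON with sorted keys and
    compact separators before hashing, so key order does not affect the result.
    """
    canonical = json.dumps(policy, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two policies that differ in any field produce different digests, while the same policy written with a different key order produces the same one.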
Shell Access
openai.allow_shell — When false, any request containing {"type":"shell"} or {"type":"computer_use_preview"} is rejected with HTTP 403 before it reaches OpenAI. The telemetry row records shell_requested=true, shell_denied=true.
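The shell check can be sketched roughly as follows. This assumes the shell tool appears in the request's top-level tools array; the proxy may inspect other parts of the request as well, and the function name and return shape are illustrative only.

```python
# Tool types named in the policy description above.
SHELL_TOOL_TYPES = {"shell", "computer_use_preview"}


def check_shell(request_body: dict, allow_shell: bool) -> dict:
    """Decide whether a request should be rejected for asking for shell access.

    ASSUMPTION: only the top-level "tools" list is scanned.
    """
    tools = request_body.get("tools") or []
    requested = any(
        isinstance(tool, dict) and tool.get("type") in SHELL_TOOL_TYPES
        for tool in tools
    )
    denied = requested and not allow_shell
    return {
        "shell_requested": requested,
        "shell_denied": denied,
        # 403 when blocked; None means the request may proceed upstream.
        "http_status": 403 if denied else None,
    }
```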
Skills
Four arrays of skill IDs combine into a deterministic contract:
| Field | Behavior |
|---|---|
| allowed_skill_ids | Whitelist. Empty = no restriction; non-empty = only these can run. |
| deny_skill_ids | Blocklist. Always removed from the final set. |
| default_skill_ids | Applied when the client doesn't request any skills. |
| required_skill_ids | Always injected, even if the client didn't ask for them. |
See the Skills section for the catalog of built-in skill IDs.
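One plausible reading of how the four arrays combine is sketched below. The order of operations (defaults, then whitelist, then required, then deny last) is an assumption inferred from the table, in particular from "always removed from the final set"; the proxy's actual precedence may differ.

```python
def resolve_skills(policy: dict, requested: list) -> list:
    """Combine the four policy arrays with the client's requested skills.

    ASSUMED order: start from the requested skills (or the defaults when none
    are requested), keep only whitelisted ones, inject required ones, and
    finally remove anything on the blocklist.
    """
    allowed = set(policy.get("allowed_skill_ids", []))
    deny = set(policy.get("deny_skill_ids", []))
    default = set(policy.get("default_skill_ids", []))
    required = set(policy.get("required_skill_ids", []))

    selected = set(requested) if requested else default
    if allowed:  # an empty whitelist means "no restriction"
        selected = selected & allowed
    selected = selected | required
    selected = selected - deny
    # Sort so the result is deterministic and can be hashed stably.
    return sorted(selected)
```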
Output Budget
Controls the max_output_tokens parameter forwarded to OpenAI.
| Field | Behavior |
|---|---|
| mode | PASS_THROUGH: client value untouched · DEFAULT: use policy default if client omits it · CLAMP: client value wins but is clamped to [min, hard] · FIXED: always override to policy default. |
| default_max_output_tokens | Used when the client omits a value, or as the fixed override. |
| hard_max_output_tokens | Absolute ceiling. Client values above this are clamped. |
| min_max_output_tokens | Absolute floor. Client values below this are raised. |
| allow_request_override | When false, request-level values are ignored entirely. |
The final value sent upstream is recorded as applied_max_output_tokens on the telemetry row and echoed in X-CP-Output-Budget-Applied.
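The four modes can be sketched as a single resolution function. This is an interpretation of the table above, not the proxy's code: in particular, how allow_request_override interacts with PASS_THROUGH is not specified, so the sketch assumes PASS_THROUGH always forwards the client value and the override flag only affects the other modes.

```python
from typing import Optional


def resolve_output_budget(budget: dict, requested: Optional[int]) -> Optional[int]:
    """Return the max_output_tokens value to forward upstream (None = omit)."""
    mode = budget["mode"]
    default = budget.get("default_max_output_tokens")
    hard = budget.get("hard_max_output_tokens")
    floor = budget.get("min_max_output_tokens")

    if mode == "PASS_THROUGH":
        return requested  # ASSUMPTION: client value is never touched in this mode
    if not budget.get("allow_request_override", True):
        requested = None  # request-level value is ignored entirely
    if mode == "FIXED":
        return default
    value = requested if requested is not None else default
    if mode == "DEFAULT":
        return value
    if mode == "CLAMP":
        if value is None:
            return None
        if floor is not None:
            value = max(value, floor)
        if hard is not None:
            value = min(value, hard)
        return value
    raise ValueError(f"unknown output budget mode: {mode}")
```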
Prefix Cache
prefix_cache.mode (DISABLED / AUTO / MANUAL) governs how the proxy derives a stable prompt_cache_key for each request. prompt_cache_key_source selects what that key is seeded from (NONE, PREFIX_HASH, or PROJECT_ID). The goal is to maximize upstream cache hits without leaking content into the key; the exact derivation is an implementation detail of the proxy.
Output Cap Instruction
When output_cap_instruction.enabled is true, the proxy appends a short system-level instruction reinforcing the output budget. Useful for models that treat max_output_tokens as advisory.
Telemetry Controls
telemetry.store_prompts and telemetry.store_outputs both default to false. In the default configuration the proxy is fully content-free — only hashes, counts, and operational metadata are persisted.
project_policies. The policy_hash on each telemetry row is an audit fingerprint: if the policy changed between two requests, you'll see different hashes, and you can always look up the exact JSON that was applied.
🧩 Skills
Skills are the named governance primitives the proxy enforces on every request. Each skill is a small, deterministic contract — the list below is the catalog of built-ins you can reference by ID in a policy's skill arrays.
| ID | Contract |
|---|---|
| prefix_guard | Asserts the instructional prefix matches the expected fingerprint. |
| seed_lock | Pins determinism controls (seed / sampling) for reproducible runs. |
| output_budget | Enforces the output-budget policy on every request. |
| request_cost_guard | Rejects requests whose projected cost exceeds the policy ceiling. |
| tool_whitelist | Applies the skills set-algebra contract to the request's tool list. |
| tool_schema_enforcer | Validates tool definitions against the expected schema shape. |
| tool_output_sanitizer | Strips disallowed fields from tool outputs before they reach the model. |
| receipt_emitter | Produces the X-CP-* receipt headers on every response. |
| hash_redactor | Guarantees only hashes, never raw content, are persisted to telemetry. |
| drift_detector | Compares the live request against pinned golden runs and emits drift events. |
Skills are grouped into tier bundles — core_free, core_starter, and core_pro — which determine which skills are available on your project's plan. See Pricing for bundle membership.
🎯 Drift & Golden Runs
A golden run is a pinned baseline — any request from the dashboard can be promoted, which captures its policy version, prefix hash, and tool/skill fingerprints as the expected state for that route. Subsequent requests on the same route are compared against the baseline, and any mismatch is recorded as a drift_event.
Drift detection is content-free: it only looks at hashes, so it catches structural changes without ever needing to see your prompts or outputs.
| Drift type | Meaning |
|---|---|
| PREFIX_CHANGED | The instructional prefix hash no longer matches the golden run, meaning the prompt has shifted. |
| POLICY_CHANGED | A different policy version was applied than the one pinned on the baseline. |
| SKILL_CHANGED | The applied skills hash differs from the golden run, meaning the governance envelope changed. |
Drift events surface in the dashboard's Determinism and Golden Runs tabs, so you can spot silent prompt or policy regressions before they hit production traffic.
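The comparison against a golden run can be sketched as a hash-by-hash diff. The field names are borrowed from the telemetry table (prefix_hash, policy_version, skills_hash); the mapping of each drift type to exactly one field is an assumption made for illustration.

```python
# (drift type, telemetry field compared) pairs; the pairing is an assumption.
DRIFT_CHECKS = [
    ("PREFIX_CHANGED", "prefix_hash"),
    ("POLICY_CHANGED", "policy_version"),
    ("SKILL_CHANGED", "skills_hash"),
]


def detect_drift(golden: dict, live: dict) -> list:
    """Return the drift types for which the live request differs from the baseline.

    Only hashes and version numbers are compared, so no prompt or output
    content is needed.
    """
    return [
        drift_type
        for drift_type, field in DRIFT_CHECKS
        if golden.get(field) != live.get(field)
    ]
```

An empty list means the live request matches the pinned baseline on all three fingerprints.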
📋 Receipt Headers
Every response from the proxy includes X-CP-* headers. These are your policy receipt — proof of what was enforced, without storing any content.
| Header | Example | Description |
|---|---|---|
| X-CP-Request-Id | b7a3...f2e1 | UUID of this telemetry row |
| X-CP-Policy-Version | 1 | Policy schema version that was applied |
| X-CP-Output-Budget-Applied | 4096 | Actual max_output_tokens sent to OpenAI |
| X-CP-Skills-Applied-Hash | a3f8...c1e2 | SHA-256 of the final tool set after policy |
| X-CP-Prefix-Hash | 7b2d...f491 | SHA-256 of instructional prefix (proves prompt stability) |
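On the client side, the receipt can be collected by filtering response headers on the X-CP- prefix. The helper below is a hypothetical convenience function, not part of any SDK; it takes any mapping of header names to values and lower-cases the names, since HTTP header names are case-insensitive.

```python
def extract_receipt(response_headers) -> dict:
    """Return only the X-CP-* receipt headers, keyed by lower-cased name."""
    return {
        name.lower(): value
        for name, value in dict(response_headers).items()
        if name.lower().startswith("x-cp-")
    }
```

For example, a response carrying X-CP-Policy-Version and X-CP-Output-Budget-Applied alongside Content-Type would yield a dict with just the two receipt entries.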
📊 Telemetry Fields
Every proxied request writes a content-free telemetry row to Postgres. No prompts, no outputs — only operational metadata.
| Field | Type | Description |
|---|---|---|
| request_ts | timestamp | When the request arrived |
| model | text | Model name (e.g. gpt-4.1) |
| stream | boolean | Was this a streaming request? |
| prefix_hash | text | SHA-256 of instructional prefix (16 hex) |
| skills_hash | text | Hash of the applied skills set |
| toolset_hash | text | Hash of the canonicalized tool list |
| schema_hash | text | Hash of the response-format schema, if any |
| policy_version | int | Policy version applied |
| policy_hash | text | SHA-256 of the full policy JSON |
| applied_max_output_tokens | int | Output budget after enforcement |
| reasoning_effort | text | Reasoning effort for o-series models |
| prompt_cache_key | text | Stable key forwarded to upstream for cache routing |
| allow_shell | boolean | Policy shell setting |
| shell_requested | boolean | Did the client request a shell tool? |
| shell_denied | boolean | Was the shell request blocked? |
| http_status | int | Response status code |
| upstream_request_id | text | OpenAI's request ID (for support) |
| latency_ms_total | int | Total round-trip latency |
| latency_ms_api | int | OpenAI API call latency |
| retry_429_count | int | Number of 429 retries |
| error_code | text | Error code (null if success) |
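To illustrate what a content-free field such as prefix_hash can look like, here is a sketch of a truncated SHA-256 fingerprint. It assumes "16 hex" means the first 16 hexadecimal characters of the digest and that the prefix is hashed as UTF-8 text; neither detail is confirmed on this page.

```python
import hashlib


def short_sha256(text: str, length: int = 16) -> str:
    """Return the first `length` hex characters of the SHA-256 of `text`.

    ASSUMPTION: truncation keeps the leading characters of the hex digest.
    """
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:length]
```

Only the fingerprint would be stored on the row; the original text cannot be read back from it.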
Usage row (per request):
| Field | Type | Description |
|---|---|---|
| input_tokens | int | Total input tokens |
| output_tokens | int | Total output tokens |
| cached_tokens | int | Input tokens served from cache |
| uncached_tokens | int (generated) | input_tokens − cached_tokens |
| reasoning_tokens | int | Hidden reasoning tokens billed on o-series / GPT-5 models |
| cache_source | text | Where the cache hit came from: upstream (OpenAI), proxy, or engine |
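The generated uncached_tokens column follows directly from the definition in the table. The cache hit ratio shown alongside it is a derived metric added here for illustration; it is not a documented field.

```python
def uncached_tokens(input_tokens: int, cached_tokens: int) -> int:
    """Input tokens that were not served from cache."""
    return input_tokens - cached_tokens


def cache_hit_ratio(input_tokens: int, cached_tokens: int) -> float:
    """Fraction of input tokens served from cache (0.0 when there is no input).

    Illustrative metric only; not a column in the usage row.
    """
    if input_tokens == 0:
        return 0.0
    return cached_tokens / input_tokens
```

For a request with 1200 input tokens of which 900 were cached, uncached_tokens is 300 and the hit ratio is 0.75.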
🚀 Deployment
The CachePilot proxy runs on a dedicated VPS with TLS via Caddy. Your API key never touches our storage layer — it's forwarded to OpenAI in-memory during each request.
| Component | Stack |
|---|---|
| Proxy | Node.js / Express, Docker |
| TLS | Caddy (auto Let's Encrypt) |
| Database | Neon Postgres (serverless) |
| Dashboard | Next.js on Vercel |
Endpoint: https://api.cachepilot.clclabs.ai/v1/responses
Health check: https://api.cachepilot.clclabs.ai/health