Documentation

CachePilot is a governed proxy for the OpenAI Responses API. Swap your base URL, add one header, and every request gets policy enforcement, cache optimization, and auditable receipts.

OpenAI only. CachePilot currently supports the OpenAI Responses API exclusively. Support for additional providers is on the roadmap.

⚡ Quickstart

1. Get your project key: create a project in the Dashboard. You'll receive a cp_live_... key. Save it, since it's shown only once.

2. Swap your base URL — Point your OpenAI client to the CachePilot endpoint:

```bash
# Before (direct)
curl https://api.openai.com/v1/responses \
  -H "Authorization: Bearer sk-your-key" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4.1","input":"Hello"}'

# After (through CachePilot)
curl https://api.cachepilot.clclabs.ai/v1/responses \
  -H "Authorization: Bearer sk-your-key" \
  -H "Content-Type: application/json" \
  -H "X-CachePilot-Key: cp_live_proj_abc" \
  -d '{"model":"gpt-4.1","input":"Hello"}'
```

3. Read the receipt headers — Every response includes X-CP-* headers proving what policy was applied.

That's it. The proxy is fully transparent to the OpenAI SDK. Your Authorization header (your OpenAI key) passes through untouched: BYOK, bring your own key.

Python / TypeScript SDK:

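```python
# Python
from openai import OpenAI

# api_key defaults to the OPENAI_API_KEY environment variable
client = OpenAI(
    base_url="https://api.cachepilot.clclabs.ai/v1",
    default_headers={
        "X-CachePilot-Key": "cp_live_proj_abc",
    },
)

# Rejected with HTTP 403 if the project policy sets openai.allow_shell = false
stream = client.responses.create(
    model="gpt-4.1",
    input="Refactor the auth module...",
    tools=[{"type": "shell"}, {"type": "code_interpreter", "container": {"type": "auto"}}],
    stream=True,
)

# stream=True returns an iterator of server-sent events
for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="")
```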
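```typescript
// TypeScript
import OpenAI from "openai";

// apiKey defaults to the OPENAI_API_KEY environment variable
const client = new OpenAI({
  baseURL: "https://api.cachepilot.clclabs.ai/v1",
  defaultHeaders: {
    "X-CachePilot-Key": "cp_live_proj_abc",
  },
});

// Rejected with HTTP 403 if the project policy sets openai.allow_shell = false
const stream = await client.responses.create({
  model: "gpt-4.1",
  input: "Refactor the auth module...",
  tools: [{ type: "shell" }, { type: "code_interpreter", container: { type: "auto" } }],
  stream: true,
});

// stream: true returns an async iterable of server-sent events
for await (const event of stream) {
  if (event.type === "response.output_text.delta") {
    process.stdout.write(event.delta);
  }
}
```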

🖥️ Codex CLI

OpenAI Codex CLI works natively with CachePilot. Set up a custom model provider so every Codex session goes through the proxy — policy enforcement, cache optimization, and telemetry included.

1. Set environment variables

CachePilot needs your project key and an OpenAI API key with full api.responses.write scope.

```powershell
# PowerShell (permanent; survives restarts)
[System.Environment]::SetEnvironmentVariable("CACHE_PILOT_KEY", "cp_live_YOUR_KEY", "User")
[System.Environment]::SetEnvironmentVariable("OPENAI_API_KEY", "sk-proj-YOUR_KEY", "User")
```

```bash
# macOS / Linux
export CACHE_PILOT_KEY="cp_live_YOUR_KEY"
export OPENAI_API_KEY="sk-proj-YOUR_KEY"
# Add to ~/.bashrc or ~/.zshrc to persist
```
Open a new terminal after setting environment variables. In PowerShell, $env:VAR = "value" is session-scoped, so it won't persist across terminals.
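Before launching Codex, a quick preflight check can catch a missing or mistyped key. The sketch below is hypothetical: `check_env` is not part of CachePilot or Codex. It assumes only the two variable names above, the `cp_live_` project-key prefix, and the `sk-` prefix of OpenAI API keys.

```python
import os
from typing import Mapping


def check_env(env: Mapping[str, str] = os.environ) -> list[str]:
    """Hypothetical preflight: return a list of problems (empty list means OK)."""
    problems = []
    cp_key = env.get("CACHE_PILOT_KEY", "")
    openai_key = env.get("OPENAI_API_KEY", "")

    if not cp_key:
        problems.append("CACHE_PILOT_KEY is not set")
    elif not cp_key.startswith("cp_live_"):
        problems.append("CACHE_PILOT_KEY should start with cp_live_")

    if not openai_key:
        problems.append("OPENAI_API_KEY is not set")
    elif not openai_key.startswith("sk-"):
        problems.append("OPENAI_API_KEY does not look like an OpenAI API key (sk-...)")

    return problems


if __name__ == "__main__":
    for problem in check_env():
        print("!", problem)
```

Run it from the same new terminal you will launch Codex from. If it prints nothing, both variables are visible to child processes.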

2. Configure Codex

Add this to your ~/.codex/config.toml:

```toml
# ~/.codex/config.toml
[profiles.cachepilot]
name = "CachePilot"
model_provider = "cachepilot"
model = "gpt-4o"  # or gpt-4.1, o3, etc.

[model_providers.cachepilot]
name = "CachePilot Proxy"
base_url = "https://api.cachepilot.clclabs.ai/v1"
wire_api = "responses"
# Use your sk-proj key (NOT OAuth)
env_key = "OPENAI_API_KEY"
# Send your cp_live_ project key
env_http_headers = { "X-CachePilot-Key" = "CACHE_PILOT_KEY" }
```
Important: Use env_key, not requires_openai_auth. The latter uses Codex's OAuth token, which may lack the api.responses.write scope needed for the Responses API.

3. Run Codex

```bash
codex --profile cachepilot
```

That's it. Every Codex request now flows through CachePilot with policy enforcement, cache optimization, and full telemetry. Check your Dashboard to see requests in real time.

Supported models

The proxy supports the following OpenAI Responses API models:

gpt-4o, gpt-4o-mini, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, gpt-5, gpt-5-codex, gpt-5.1, gpt-5.1-codex, gpt-5.1-codex-mini, o3, o3-mini, o4-mini

🛡️ Policies

Each project has a PolicyV1 JSON object that controls what the proxy allows. Policies are enforced deterministically before any request reaches OpenAI, and every change creates a new immutable version — so you can always reconstruct exactly what was applied to a given request from its policy_version and policy_hash.

```json
{
  "version": 1,
  "openai": {
    "allow_shell": false,
    "allowed_skill_ids": [],
    "default_skill_ids": [],
    "required_skill_ids": [],
    "deny_skill_ids": []
  },
  "output_budget": {
    "mode": "CLAMP",
    "default_max_output_tokens": 4096,
    "hard_max_output_tokens": 16384,
    "min_max_output_tokens": 100,
    "allow_request_override": false
  },
  "prefix_cache": {
    "mode": "AUTO",
    "instructions_splitting": true,
    "spine_field": null,
    "prompt_cache_key_source": "PREFIX_HASH"
  },
  "output_cap_instruction": {
    "enabled": false,
    "instruction": ""
  },
  "telemetry": {
    "store_prompts": false,
    "store_outputs": false
  }
}
```

Shell Access

openai.allow_shell — When false, any request containing {"type":"shell"} or {"type":"computer_use_preview"} is rejected with HTTP 403 before it reaches OpenAI. The telemetry row records shell_requested=true, shell_denied=true.
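The shell check can be sketched in a few lines. This is a minimal sketch, not CachePilot's code. It assumes tools arrive in the Responses API `tools` array and that only each tool's top-level `type` is inspected. `gate_shell` and `ShellDecision` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

# Tool types this section treats as shell access
SHELL_TOOL_TYPES = {"shell", "computer_use_preview"}


@dataclass
class ShellDecision:
    shell_requested: bool           # mirrors the shell_requested telemetry field
    shell_denied: bool              # mirrors the shell_denied telemetry field
    status: Optional[int]           # 403 when rejected, None when the request may proceed


def gate_shell(body: dict, allow_shell: bool) -> ShellDecision:
    """Hypothetical gate: reject shell-type tools when the policy disallows them."""
    tools = body.get("tools") or []
    requested = any(tool.get("type") in SHELL_TOOL_TYPES for tool in tools)
    denied = requested and not allow_shell
    return ShellDecision(requested, denied, 403 if denied else None)
```

With `allow_shell = false`, the Quickstart request (which includes `{"type": "shell"}`) would come back as `ShellDecision(shell_requested=True, shell_denied=True, status=403)`.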

Skills

Four arrays of skill IDs combine into a deterministic contract:

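applied = (((requested ∪ default) ∩ allowed) − deny) ∪ required

The deny set is subtracted before required skills are added back, so required skills are always present in the final set.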
| Field | Behavior |
|---|---|
| allowed_skill_ids | Whitelist. Empty = no restriction; non-empty = only these can run. |
| deny_skill_ids | Blocklist. Always removed from the final set. |
| default_skill_ids | Applied when the client doesn't request any skills. |
| required_skill_ids | Always injected, even if the client didn't ask for them. |
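The set algebra above can be sketched in Python. This is a minimal sketch, not CachePilot's implementation. `apply_skills` is a hypothetical helper. Following the table, it treats an empty allowlist as "no restriction". Following the formula, it unions defaults in unconditionally. If defaults should apply only when the client requests nothing, as the default_skill_ids row reads, replace the first line with `requested or default`.

```python
def apply_skills(
    requested: set[str],
    default: set[str],
    allowed: set[str],
    deny: set[str],
    required: set[str],
) -> set[str]:
    """Hypothetical sketch of (((requested ∪ default) ∩ allowed) − deny) ∪ required."""
    # Per the formula; per the table, this could instead be `requested or default`
    candidates = requested | default
    # An empty allowlist means "no restriction"
    if allowed:
        candidates = candidates & allowed
    # Denied skills are removed...
    candidates = candidates - deny
    # ...and required skills are added last, so they are always present
    return candidates | required
```

For example, `sorted(apply_skills({"seed_lock", "drift_detector"}, {"output_budget"}, {"seed_lock", "output_budget"}, set(), {"receipt_emitter"}))` gives `['output_budget', 'receipt_emitter', 'seed_lock']`. drift_detector is dropped because it is not on the allowlist.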

See the Skills section for the catalog of built-in skill IDs.

Output Budget

Controls the max_output_tokens parameter forwarded to OpenAI.

| Field | Behavior |
|---|---|
| mode | PASS_THROUGH: client value untouched · DEFAULT: use the policy default if the client omits it · CLAMP: client value wins but is clamped to [min, hard] · FIXED: always override to the policy default. |
| default_max_output_tokens | Used when the client omits a value, or as the fixed override. |
| hard_max_output_tokens | Absolute ceiling. Client values above this are clamped. |
| min_max_output_tokens | Absolute floor. Client values below this are raised. |
| allow_request_override | When false, request-level values are ignored entirely. |

The final value sent upstream is recorded as applied_max_output_tokens on the telemetry row and echoed in X-CP-Output-Budget-Applied.
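The mode table can be read as a small resolution function. This minimal sketch encodes two assumptions that the table leaves open:

- PASS_THROUGH forwards the client's value regardless of allow_request_override.
- For every other mode, allow_request_override = false means the client's value is treated as if it were omitted.

`OutputBudget` and `apply_output_budget` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OutputBudget:
    mode: str  # "PASS_THROUGH" | "DEFAULT" | "CLAMP" | "FIXED"
    default_max_output_tokens: int
    hard_max_output_tokens: int
    min_max_output_tokens: int
    allow_request_override: bool


def apply_output_budget(policy: OutputBudget, client_value: Optional[int]) -> Optional[int]:
    """Hypothetical sketch: resolve the max_output_tokens value forwarded upstream."""
    if policy.mode == "PASS_THROUGH":
        # Assumption: pass-through forwards the client's value (or its absence) as-is
        return client_value
    # Assumption: with allow_request_override=False the client's value is treated as absent
    requested = client_value if policy.allow_request_override else None
    if policy.mode == "FIXED":
        return policy.default_max_output_tokens
    if policy.mode == "DEFAULT":
        return requested if requested is not None else policy.default_max_output_tokens
    if policy.mode == "CLAMP":
        value = requested if requested is not None else policy.default_max_output_tokens
        # Clamp into [min_max_output_tokens, hard_max_output_tokens]
        return max(policy.min_max_output_tokens, min(value, policy.hard_max_output_tokens))
    raise ValueError(f"unknown output_budget.mode: {policy.mode}")
```

Under the sample policy above (CLAMP, override disabled), this sketch resolves every request to 4096, whatever the client sends.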

Prefix Cache

prefix_cache.mode (DISABLED / AUTO / MANUAL) governs how the proxy derives a stable prompt_cache_key for each request. prompt_cache_key_source selects what that key is seeded from (NONE, PREFIX_HASH, or PROJECT_ID). The goal is to maximize upstream cache hits without leaking content into the key; the exact derivation is an implementation detail of the proxy.

Output Cap Instruction

When output_cap_instruction.enabled is true, the proxy appends a short system-level instruction reinforcing the output budget. Useful for models that treat max_output_tokens as advisory.

Telemetry Controls

telemetry.store_prompts and telemetry.store_outputs both default to false. In the default configuration the proxy is fully content-free — only hashes, counts, and operational metadata are persisted.

Immutable versions. Every policy edit creates a new row in project_policies. The policy_hash on each telemetry row is an audit fingerprint — if the policy changed between two requests, you'll see different hashes, and you can always look up the exact JSON that was applied.
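policy_hash is described as the SHA-256 of the full policy JSON, so in principle you can fingerprint a policy locally the same way. The canonicalization here (sorted keys, compact separators, UTF-8) is an assumption. Treat the result as a local comparison aid, not a value guaranteed to match the proxy's. `policy_hash` here is a hypothetical helper.

```python
import hashlib
import json
from typing import Any


def policy_hash(policy: dict[str, Any]) -> str:
    """Hypothetical sketch: SHA-256 over an assumed canonical JSON form of the policy."""
    # Assumed canonical form: sorted keys, no insignificant whitespace, UTF-8 bytes
    canonical = json.dumps(policy, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Canonicalizing first means that reordering keys in the policy JSON does not change the hash. Flipping a single value, such as allow_shell, does.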

🧩 Skills

Skills are the named governance primitives the proxy enforces on every request. Each skill is a small, deterministic contract — the list below is the catalog of built-ins you can reference by ID in a policy's skill arrays.

| ID | Contract |
|---|---|
| prefix_guard | Asserts the instructional prefix matches the expected fingerprint. |
| seed_lock | Pins determinism controls (seed / sampling) for reproducible runs. |
| output_budget | Enforces the output-budget policy on every request. |
| request_cost_guard | Rejects requests whose projected cost exceeds the policy ceiling. |
| tool_whitelist | Applies the skills set-algebra contract to the request's tool list. |
| tool_schema_enforcer | Validates tool definitions against the expected schema shape. |
| tool_output_sanitizer | Strips disallowed fields from tool outputs before they reach the model. |
| receipt_emitter | Produces the X-CP-* receipt headers on every response. |
| hash_redactor | Guarantees only hashes, never raw content, are persisted to telemetry. |
| drift_detector | Compares the live request against pinned golden runs and emits drift events. |

Skills are grouped into tier bundles — core_free, core_starter, and core_pro — which determine which skills are available on your project's plan. See Pricing for bundle membership.

🎯 Drift & Golden Runs

A golden run is a pinned baseline — any request from the dashboard can be promoted, which captures its policy version, prefix hash, and tool/skill fingerprints as the expected state for that route. Subsequent requests on the same route are compared against the baseline, and any mismatch is recorded as a drift_event.

Drift detection is content-free: it only looks at hashes, so it catches structural changes without ever needing to see your prompts or outputs.

| Drift type | Meaning |
|---|---|
| PREFIX_CHANGED | The instructional prefix hash no longer matches the golden run; the prompt has shifted. |
| POLICY_CHANGED | A different policy version was applied than the one pinned on the baseline. |
| SKILL_CHANGED | The applied skills hash differs from the golden run; the governance envelope changed. |

Drift events surface in the dashboard's Determinism and Golden Runs tabs, so you can spot silent prompt or policy regressions before they hit production traffic.
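The comparison in the drift table can be sketched as a pure function over fingerprints. This assumes a golden run stores exactly the three values the drift types refer to. `Fingerprint` and `detect_drift` are hypothetical names, and the real detector may track more fields, such as the toolset hash.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fingerprint:
    policy_version: int
    prefix_hash: str
    skills_hash: str


def detect_drift(golden: Fingerprint, live: Fingerprint) -> list[str]:
    """Hypothetical sketch: compare hashes only, never content, and list drift types."""
    events = []
    if live.prefix_hash != golden.prefix_hash:
        events.append("PREFIX_CHANGED")
    if live.policy_version != golden.policy_version:
        events.append("POLICY_CHANGED")
    if live.skills_hash != golden.skills_hash:
        events.append("SKILL_CHANGED")
    return events
```

The function only sees hashes and a version number, so it can flag a structural change without ever needing the prompt or output text.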

📋 Receipt Headers

Every response from the proxy includes X-CP-* headers. These are your policy receipt — proof of what was enforced, without storing any content.

| Header | Example | Description |
|---|---|---|
| X-CP-Request-Id | b7a3...1e09 | UUID of this telemetry row |
| X-CP-Policy-Version | 1 | Policy schema version that was applied |
| X-CP-Output-Budget-Applied | 4096 | Actual max_output_tokens sent to OpenAI |
| X-CP-Skills-Applied-Hash | a3f8...d6f8 | SHA-256 of the final tool set after policy |
| X-CP-Prefix-Hash | 7b2d...e5d7 | SHA-256 of instructional prefix (proves prompt stability) |
```text
< HTTP/2 200
< X-CP-Request-Id: b7a3c8e2-1f4d-4a9b-8c6e-3d5f7a2b1e09
< X-CP-Policy-Version: 1
< X-CP-Output-Budget-Applied: 4096
< X-CP-Skills-Applied-Hash: a3f829c1e2b4d6f8
< X-CP-Prefix-Hash: 7b2df491a8c3e5d7
```
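The receipt lives in ordinary HTTP headers, so any client can read it. The helper below is a hypothetical sketch that collects the five headers from the table into a typed record. It lowercases header names first, because HTTP header names are case-insensitive.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class Receipt:
    request_id: Optional[str]
    policy_version: Optional[int]
    output_budget_applied: Optional[int]
    skills_applied_hash: Optional[str]
    prefix_hash: Optional[str]


def _int_or_none(value: Optional[str]) -> Optional[int]:
    return int(value) if value is not None else None


def parse_receipt(headers: Mapping[str, str]) -> Receipt:
    """Hypothetical sketch: extract X-CP-* receipt headers; missing ones become None."""
    # Normalize names, since HTTP header names are case-insensitive
    h = {key.lower(): value for key, value in headers.items()}
    return Receipt(
        request_id=h.get("x-cp-request-id"),
        policy_version=_int_or_none(h.get("x-cp-policy-version")),
        output_budget_applied=_int_or_none(h.get("x-cp-output-budget-applied")),
        skills_applied_hash=h.get("x-cp-skills-applied-hash"),
        prefix_hash=h.get("x-cp-prefix-hash"),
    )
```

With the OpenAI Python SDK, `client.responses.with_raw_response.create(...)` returns a raw response. You can pass its `.headers` to `parse_receipt`, and its `.parse()` method still yields the usual Response object.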

📊 Telemetry Fields

Every proxied request writes a content-free telemetry row to Postgres. No prompts, no outputs — only operational metadata.

| Field | Type | Description |
|---|---|---|
| request_ts | timestamp | When the request arrived |
| model | text | Model name (e.g. gpt-4.1) |
| stream | boolean | Was this a streaming request? |
| prefix_hash | text | SHA-256 of instructional prefix (16 hex characters) |
| skills_hash | text | Hash of the applied skills set |
| toolset_hash | text | Hash of the canonicalized tool list |
| schema_hash | text | Hash of the response-format schema, if any |
| policy_version | int | Policy version applied |
| policy_hash | text | SHA-256 of the full policy JSON |
| applied_max_output_tokens | int | Output budget after enforcement |
| reasoning_effort | text | Reasoning effort for o-series models |
| prompt_cache_key | text | Stable key forwarded to upstream for cache routing |
| allow_shell | boolean | Policy shell setting |
| shell_requested | boolean | Did the client request a shell tool? |
| shell_denied | boolean | Was the shell request blocked? |
| http_status | int | Response status code |
| upstream_request_id | text | OpenAI's request ID (for support) |
| latency_ms_total | int | Total round-trip latency |
| latency_ms_api | int | OpenAI API call latency |
| retry_429_count | int | Number of 429 retries |
| error_code | text | Error code (null if success) |
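The 16-hex prefix_hash and the canonicalized toolset_hash can be approximated as below. Both functions are hypothetical, and this sketch rests on three assumptions:

- The "instructional prefix" is the request's instructions string.
- Hashes are truncated to their first 16 hex characters.
- Canonicalization means sorted JSON keys plus an order-independent tool list.

CachePilot's exact scheme may differ, so don't expect these values to match your telemetry rows.

```python
import hashlib
import json
from typing import Any


def prefix_hash(instructions: str) -> str:
    """Hypothetical sketch: SHA-256 of the prefix, truncated to 16 hex characters."""
    return hashlib.sha256(instructions.encode("utf-8")).hexdigest()[:16]


def toolset_hash(tools: list[dict[str, Any]]) -> str:
    """Hypothetical sketch: hash of an assumed canonical form of the tool list."""
    # Serialize each tool with sorted keys, then sort the list so that
    # tool order in the request does not change the hash
    canonical = sorted(json.dumps(tool, sort_keys=True, separators=(",", ":")) for tool in tools)
    return hashlib.sha256(json.dumps(canonical).encode("utf-8")).hexdigest()[:16]
```

Sorting the serialized tools is what makes the hash a property of the tool *set* rather than of the order in which a client happens to list them.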

Usage row (per request):

| Field | Type | Description |
|---|---|---|
| input_tokens | int | Total input tokens |
| output_tokens | int | Total output tokens |
| cached_tokens | int | Input tokens served from cache |
| uncached_tokens | int (generated) | input_tokens − cached_tokens |
| reasoning_tokens | int | Hidden reasoning tokens billed on o-series / GPT-5 models |
| cache_source | text | upstream (OpenAI), proxy, or engine: where the cache hit came from |
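Aggregating usage rows makes the cache effect visible. This is a hypothetical sketch:

- `UsageRow` mirrors three columns from the table and derives uncached_tokens the same way as the generated column.
- `cache_hit_ratio` is a token-weighted ratio of cached to total input tokens. That is one reasonable definition, not a documented CachePilot metric.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class UsageRow:
    input_tokens: int
    output_tokens: int
    cached_tokens: int

    @property
    def uncached_tokens(self) -> int:
        # Mirrors the generated column: input_tokens - cached_tokens
        return self.input_tokens - self.cached_tokens


def cache_hit_ratio(rows: Iterable[UsageRow]) -> float:
    """Hypothetical metric: share of all input tokens that were served from cache."""
    rows = list(rows)
    total_input = sum(row.input_tokens for row in rows)
    total_cached = sum(row.cached_tokens for row in rows)
    return total_cached / total_input if total_input else 0.0
```

Weighting by tokens rather than averaging per-request ratios keeps a few tiny requests from skewing the number.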

🚀 Deployment

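Proxy runs on our infra (BYOK). You bring your own OpenAI API key, and it passes through untouched. With the default telemetry settings (telemetry.store_prompts and telemetry.store_outputs both false), we never store prompts or outputs; only content-free telemetry metadata is persisted.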

The CachePilot proxy runs on a dedicated VPS with TLS via Caddy. Your API key never touches our storage layer — it's forwarded to OpenAI in-memory during each request.

| Component | Stack |
|---|---|
| Proxy | Node.js / Express, Docker |
| TLS | Caddy (auto Let's Encrypt) |
| Database | Neon Postgres (serverless) |
| Dashboard | Next.js on Vercel |

Endpoint: https://api.cachepilot.clclabs.ai/v1/responses

Health check: https://api.cachepilot.clclabs.ai/health