Documentation

CachePilot is a governed proxy for the OpenAI Responses API. Swap your base URL, add one header, and every request gets policy enforcement, cache optimization, and auditable receipts.

OpenAI only. CachePilot currently supports the OpenAI Responses API exclusively. Support for additional providers is on the roadmap.

⚡ Quickstart

1. Get your project key — Create a project in the Dashboard. You'll receive a cp_live_... key. Save it; it's shown only once.

2. Swap your base URL — Point your OpenAI client to the CachePilot endpoint:

```bash
# Before (direct)
curl https://api.openai.com/v1/responses \
  -H "Authorization: Bearer sk-your-key" \
  -d '{"model":"gpt-4.1","input":"Hello"}'

# After (through CachePilot)
curl https://api.cachepilot.clclabs.ai/v1/responses \
  -H "Authorization: Bearer sk-your-key" \
  -H "X-CachePilot-Key: cp_live_proj_abc" \
  -d '{"model":"gpt-4.1","input":"Hello"}'
```

3. Read the receipt headers — Every response includes X-CP-* headers proving what policy was applied.

That's it. The proxy is fully transparent to the OpenAI SDK. Your Authorization header (your OpenAI key) passes through untouched — BYOK.

Python / TypeScript SDK:

```python
# Python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cachepilot.clclabs.ai/v1",
    default_headers={
        "X-CachePilot-Key": "cp_live_proj_abc",
    },
)

response = client.responses.create(
    model="gpt-4.1",
    input="Refactor the auth module...",
    tools=[{"type": "shell"}, {"type": "code_interpreter"}],
    stream=True,
)
```
```typescript
// TypeScript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.cachepilot.clclabs.ai/v1",
  defaultHeaders: {
    "X-CachePilot-Key": "cp_live_proj_abc",
  },
});

const response = await client.responses.create({
  model: "gpt-4.1",
  input: "Refactor the auth module...",
  tools: [{ type: "shell" }, { type: "code_interpreter" }],
  stream: true,
});
```

🖥️ Codex CLI

OpenAI Codex CLI works natively with CachePilot. Set up a custom model provider so every Codex session goes through the proxy — policy enforcement, cache optimization, and telemetry included.

1. Set environment variables

CachePilot needs your project key and an OpenAI API key with full api.responses.write scope.

```powershell
# PowerShell (permanent — survives restarts)
[System.Environment]::SetEnvironmentVariable("CACHE_PILOT_KEY", "cp_live_YOUR_KEY", "User")
[System.Environment]::SetEnvironmentVariable("OPENAI_API_KEY", "sk-proj-YOUR_KEY", "User")
```

```bash
# macOS / Linux — add to ~/.bashrc or ~/.zshrc to persist
export CACHE_PILOT_KEY="cp_live_YOUR_KEY"
export OPENAI_API_KEY="sk-proj-YOUR_KEY"
```
Open a new terminal after setting environment variables. $env:VAR = "value" in PowerShell is session-scoped — it won't persist across terminals.

2. Configure Codex

Add this to your ~/.codex/config.toml:

```toml
# ~/.codex/config.toml
[profiles.cachepilot]
name = "CachePilot"
model_provider = "cachepilot"
model = "gpt-4o"  # or gpt-4.1, o3, etc.

[model_providers.cachepilot]
name = "CachePilot Proxy"
base_url = "https://api.cachepilot.clclabs.ai/v1"
wire_api = "responses"
# Use your sk-proj key (NOT OAuth)
env_key = "OPENAI_API_KEY"
# Send your cp_live_ project key
env_http_headers = { "X-CachePilot-Key" = "CACHE_PILOT_KEY" }
```
Important: Use env_key, not requires_openai_auth. The latter uses Codex's OAuth token which may lack the api.responses.write scope needed for the Responses API.

3. Run Codex

```bash
codex --profile cachepilot
```

That's it. Every Codex request now flows through CachePilot with policy enforcement, cache optimization, and full telemetry. Check your Dashboard to see requests in real time.

Supported models

The proxy supports all OpenAI Responses API models:

`gpt-4o`, `gpt-4o-mini`, `gpt-4.1`, `gpt-4.1-mini`, `gpt-4.1-nano`, `gpt-5`, `gpt-5-codex`, `gpt-5.1`, `gpt-5.1-codex`, `gpt-5.1-codex-mini`, `o3`, `o3-mini`, `o4-mini`

🛡️ Policies

Each project has a PolicyV1 JSON object that controls what the proxy allows. Policies are enforced deterministically before any request reaches OpenAI.

Shell Access

allow_shell — When false, any request containing {"type":"shell"} or {"type":"computer_use_preview"} is rejected with HTTP 403 before it reaches OpenAI. The telemetry row records shell_requested=true, shell_denied=true.
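The check described above can be sketched in Python. This is an illustrative model of the documented behavior, not CachePilot's actual source; `check_shell_policy` and `BLOCKED_TOOL_TYPES` are hypothetical names.

```python
# Hypothetical sketch of the allow_shell check (names are illustrative).
BLOCKED_TOOL_TYPES = {"shell", "computer_use_preview"}

def check_shell_policy(request_body: dict, allow_shell: bool) -> dict:
    """Return the telemetry flags and the status to reject with, if any."""
    tools = request_body.get("tools", [])
    shell_requested = any(t.get("type") in BLOCKED_TOOL_TYPES for t in tools)
    shell_denied = shell_requested and not allow_shell
    return {
        "shell_requested": shell_requested,
        "shell_denied": shell_denied,
        # Denied requests are rejected with 403 before reaching OpenAI
        "http_status": 403 if shell_denied else None,
    }

# A shell tool requested while the policy forbids it:
result = check_shell_policy(
    {"model": "gpt-4.1", "input": "ls", "tools": [{"type": "shell"}]},
    allow_shell=False,
)
# result["shell_denied"] is True, result["http_status"] == 403
```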

Skills Allowlist

Skills are applied with deterministic set algebra:

final_tools = (requested ∩ allowed) − denied + required

Four modes are available:

| Mode | Behavior |
| --- | --- |
| ALLOW | Only tools in the allowlist can pass through |
| DENY | All tools pass except those in the denylist |
| DEFAULT | No filtering — all requested tools pass through |
| REQUIRE | These tools are always injected, even if the client didn't request them |
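The formula `final_tools = (requested ∩ allowed) − denied + required` maps directly onto Python set operations. A minimal sketch (tool names here are illustrative, and `apply_skills_policy` is a hypothetical helper):

```python
def apply_skills_policy(requested: set, allowed: set, denied: set, required: set) -> set:
    """final_tools = (requested ∩ allowed) − denied + required"""
    return ((requested & allowed) - denied) | required

final = apply_skills_policy(
    requested={"shell", "code_interpreter", "web_search"},
    allowed={"code_interpreter", "web_search"},
    denied={"web_search"},
    required={"file_search"},
)
# final == {"code_interpreter", "file_search"}
```

Note that required tools are added last, so a REQUIRE entry survives even if the same tool appears in the denylist.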

Output Budget

Controls the max_output_tokens parameter sent to OpenAI:

| Mode | Behavior |
| --- | --- |
| CLAMP | If the client requests more than the policy limit, it's clamped down. If less, the client value wins. |
| FIXED | Always overrides to the policy value, ignoring the client |
| PASS_THROUGH | No enforcement — the client's value passes through unchanged |
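The three modes reduce to a few lines of arithmetic. A sketch of the documented behavior (`apply_output_budget` is a hypothetical name; how CLAMP treats a missing client value is an assumption):

```python
def apply_output_budget(client_value, policy_limit: int, mode: str):
    """Resolve the max_output_tokens actually sent to OpenAI."""
    if mode == "FIXED":
        return policy_limit                 # policy always wins
    if mode == "CLAMP":
        if client_value is None:
            return policy_limit             # assumption: limit applies if client sent nothing
        return min(client_value, policy_limit)
    return client_value                     # PASS_THROUGH: client value untouched
```

For example, under CLAMP with a policy limit of 4096, a client request of 8192 becomes 4096, while a request of 1024 passes through as 1024.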

Cache Optimization

In Hardened mode, CachePilot automatically optimizes your requests for maximum cache efficiency and reduced token spend. These optimizations are applied transparently — no changes needed on your side.

Policy Hash. Every request logs a SHA-256 of the canonical JSON policy that was applied (policy_hash). This is an immutable audit fingerprint — if a policy changes between requests, you'll see a different hash.
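The key property is that hashing canonical JSON makes the fingerprint independent of key order. A sketch of the idea (the exact canonicalization CachePilot uses is an assumption; sorted keys with no insignificant whitespace is one common choice):

```python
import hashlib
import json

def policy_hash(policy: dict) -> str:
    """SHA-256 over canonical JSON: sorted keys, compact separators.
    Assumption: this mirrors the canonicalization the proxy uses."""
    canonical = json.dumps(policy, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Key order doesn't change the fingerprint; any value change does.
a = policy_hash({"allow_shell": False, "output_budget": 4096})
b = policy_hash({"output_budget": 4096, "allow_shell": False})
# a == b
```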

📋 Receipt Headers

Every response from the proxy includes X-CP-* headers. These are your policy receipt — proof of what was enforced, without storing any content.

| Header | Example | Description |
| --- | --- | --- |
| X-CP-Request-Id | b7a3...f2e1 | UUID of this telemetry row |
| X-CP-Policy-Version | 1 | Policy schema version that was applied |
| X-CP-Output-Budget-Applied | 4096 | Actual max_output_tokens sent to OpenAI |
| X-CP-Skills-Applied-Hash | a3f8...c1e2 | SHA-256 of the final tool set after policy |
| X-CP-Prefix-Hash | 7b2d...f491 | SHA-256 of instructional prefix (proves prompt stability) |

```
< HTTP/2 200
< X-CP-Request-Id: b7a3c8e2-1f4d-4a9b-8c6e-3d5f7a2b1e09
< X-CP-Policy-Version: 1
< X-CP-Output-Budget-Applied: 4096
< X-CP-Skills-Applied-Hash: a3f829c1e2b4d6f8
< X-CP-Prefix-Hash: 7b2df491a8c3e5d7
```
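If you want to log or assert on the receipt in client code, a minimal sketch of pulling it out of a response's header map follows. `Receipt` and `parse_receipt` are illustrative helpers, not part of any SDK; the header names come from the table above.

```python
from dataclasses import dataclass

@dataclass
class Receipt:
    request_id: str
    policy_version: int
    output_budget_applied: int
    skills_applied_hash: str
    prefix_hash: str

def parse_receipt(headers: dict) -> Receipt:
    # Lowercase the keys: HTTP header casing varies by client library.
    h = {k.lower(): v for k, v in headers.items()}
    return Receipt(
        request_id=h["x-cp-request-id"],
        policy_version=int(h["x-cp-policy-version"]),
        output_budget_applied=int(h["x-cp-output-budget-applied"]),
        skills_applied_hash=h["x-cp-skills-applied-hash"],
        prefix_hash=h["x-cp-prefix-hash"],
    )

receipt = parse_receipt({
    "X-CP-Request-Id": "b7a3c8e2-1f4d-4a9b-8c6e-3d5f7a2b1e09",
    "X-CP-Policy-Version": "1",
    "X-CP-Output-Budget-Applied": "4096",
    "X-CP-Skills-Applied-Hash": "a3f829c1e2b4d6f8",
    "X-CP-Prefix-Hash": "7b2df491a8c3e5d7",
})
```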

📊 Telemetry Fields

Every proxied request writes a content-free telemetry row to Postgres. No prompts, no outputs — only operational metadata.

| Field | Type | Description |
| --- | --- | --- |
| request_ts | timestamp | When the request arrived |
| model | text | Model name (e.g. gpt-4.1) |
| stream | boolean | Was this a streaming request? |
| prefix_hash | text | SHA-256 of prompt prefix |
| policy_version | int | Policy version applied |
| policy_hash | text | SHA-256 of the full policy JSON |
| applied_max_output_tokens | int | Output budget after enforcement |
| skills_hash | text | Hash of final tool set |
| allow_shell | boolean | Policy shell setting |
| shell_requested | boolean | Did the client request a shell tool? |
| shell_denied | boolean | Was the shell request blocked? |
| http_status | int | Response status code |
| latency_ms_total | int | Total round-trip latency |
| latency_ms_api | int | OpenAI API call latency |
| retry_429_count | int | Number of 429 retries |
| error_code | text | Error code (null if success) |

Usage row (per request):

| Field | Type | Description |
| --- | --- | --- |
| input_tokens | int | Total input tokens |
| output_tokens | int | Total output tokens |
| cached_tokens | int | Tokens served from OpenAI's cache |
| uncached_tokens | int (generated) | input_tokens − cached_tokens |
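One derived metric worth computing from these fields is the cache hit rate. This is an illustrative calculation over the usage row, not a stored CachePilot field:

```python
def cache_hit_rate(input_tokens: int, cached_tokens: int) -> float:
    """Fraction of input tokens served from OpenAI's cache.
    Derived from the usage row; uncached_tokens is the generated
    complement (input_tokens - cached_tokens)."""
    return cached_tokens / input_tokens if input_tokens else 0.0

rate = cache_hit_rate(input_tokens=13200, cached_tokens=9900)  # 0.75
```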

🚀 Deployment

The proxy runs on our infrastructure, but you bring your own OpenAI API key (BYOK) and it passes through untouched. We never store prompts or outputs; only content-free telemetry metadata is persisted.

The CachePilot proxy runs on a dedicated VPS with TLS via Caddy. Your API key never touches our storage layer — it's forwarded to OpenAI in-memory during each request.

| Component | Stack |
| --- | --- |
| Proxy | Node.js / Express, Docker |
| TLS | Caddy (auto Let's Encrypt) |
| Database | Neon Postgres (serverless) |
| Dashboard | Next.js on Vercel |

Endpoint: https://api.cachepilot.clclabs.ai/v1/responses

Health check: https://api.cachepilot.clclabs.ai/health