Documentation
CachePilot is a governed proxy for the OpenAI Responses API. Swap your base URL, add one header, and every request gets policy enforcement, cache optimization, and auditable receipts.
⚡ Quickstart
1. Get your project key — Create a project in the Dashboard. You'll receive a key of the form cp_live_...; save it now, as it's shown only once.
2. Swap your base URL — Point your OpenAI client to the CachePilot endpoint:
3. Read the receipt headers — Every response includes X-CP-* headers proving what policy was applied.
Your Authorization header (your OpenAI key) passes through untouched — BYOK.

Python / TypeScript SDK:
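With the official SDKs the swap is just a base_url pointing at the proxy plus one default header. As a self-contained sketch using only Python's standard library (the X-CP-Project-Key header name is a hypothetical placeholder; use the header name your Dashboard shows):

```python
import urllib.request

req = urllib.request.Request(
    "https://api.cachepilot.clclabs.ai/v1/responses",
    data=b'{"model": "gpt-4.1", "input": "hello"}',
    headers={
        "Authorization": "Bearer sk-...",    # your OpenAI key, forwarded untouched (BYOK)
        "X-CP-Project-Key": "cp_live_...",   # hypothetical header name for your project key
        "Content-Type": "application/json",
    },
)
# Sending with urllib.request.urlopen(req) returns a response whose
# headers carry the X-CP-* policy receipt.
```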
🖥️ Codex CLI
OpenAI Codex CLI works natively with CachePilot. Set up a custom model provider so every Codex session goes through the proxy — policy enforcement, cache optimization, and telemetry included.
1. Set environment variables
CachePilot needs your project key and an OpenAI API key with full api.responses.write scope.
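The variable names below are assumptions (the project-key variable in particular is not prescribed here); a bash example:

```shell
# Placeholder values; real keys come from your Dashboard and OpenAI account.
export CACHEPILOT_PROJECT_KEY="cp_live_..."   # assumed variable name
export OPENAI_API_KEY="sk-..."                # must carry the api.responses.write scope
```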
Note: $env:VAR = "value" in PowerShell is session-scoped — it won't persist across terminals.

2. Configure Codex
Add this to your ~/.codex/config.toml:
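A sketch of that provider block, assuming the Codex CLI's model_providers config shape; the provider id and model are illustrative, so verify the keys against the Codex CLI config reference:

```toml
# Illustrative sketch; confirm field names against the Codex CLI config reference.
model = "gpt-4.1"                 # any Responses API model
model_provider = "cachepilot"

[model_providers.cachepilot]
name = "CachePilot"
base_url = "https://api.cachepilot.clclabs.ai/v1"
env_key = "OPENAI_API_KEY"        # reads the key set in step 1 (not requires_openai_auth)
wire_api = "responses"
```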
Use env_key, not requires_openai_auth. The latter uses Codex's OAuth token, which may lack the api.responses.write scope needed for the Responses API.

3. Run Codex
That's it. Every Codex request now flows through CachePilot with policy enforcement, cache optimization, and full telemetry. Check your Dashboard to see requests in real time.
Supported models
The proxy supports all OpenAI Responses API models.
🛡️ Policies
Each project has a PolicyV1 JSON object that controls what the proxy allows. Policies are enforced deterministically before any request reaches OpenAI.
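As a rough illustration, a PolicyV1 object might combine the controls described below; allow_shell and the mode names come from this page, but the exact field layout is an assumption, not the authoritative schema (web_search is an illustrative tool name):

```json
{
  "version": 1,
  "allow_shell": false,
  "skills": { "mode": "ALLOW", "allowlist": ["web_search"] },
  "output_budget": { "mode": "CLAMP", "max_output_tokens": 4096 }
}
```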
Shell Access
allow_shell — When false, any request containing {"type":"shell"} or {"type":"computer_use_preview"} is rejected with HTTP 403 before it reaches OpenAI. The telemetry row records shell_requested=true, shell_denied=true.
Skills Allowlist
Skills are applied with deterministic set algebra:
Four modes are available:
| Mode | Behavior |
|---|---|
| ALLOW | Only tools in the allowlist can pass through |
| DENY | All tools pass except those in the denylist |
| DEFAULT | No filtering — all requested tools pass through |
| REQUIRE | These tools are always injected, even if the client didn't request them |
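The four modes reduce to simple set operations. A Python sketch of the assumed semantics (not the proxy's actual implementation):

```python
def apply_skills(requested: set[str], policy: set[str], mode: str) -> set[str]:
    # Assumed semantics for the four modes, expressed as set algebra.
    if mode == "ALLOW":
        return requested & policy      # intersection: keep only allowlisted tools
    if mode == "DENY":
        return requested - policy      # difference: drop denylisted tools
    if mode == "REQUIRE":
        return requested | policy      # union: inject policy tools unconditionally
    return set(requested)              # DEFAULT: no filtering
```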
Output Budget
Controls the max_output_tokens parameter sent to OpenAI:
| Mode | Behavior |
|---|---|
| CLAMP | If the client requests more than the policy limit, it's clamped down. If less, the client value wins. |
| FIXED | Always overrides to the policy value, ignoring the client |
| PASS_THROUGH | No enforcement — the client's value passes through unchanged |
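The budget modes can likewise be sketched in Python; how a missing client value is handled under CLAMP is a guess here, not documented behavior:

```python
from typing import Optional

def apply_output_budget(client_value: Optional[int], policy_value: int, mode: str) -> Optional[int]:
    if mode == "FIXED":
        return policy_value                      # always override the client
    if mode == "CLAMP":
        if client_value is None:
            return policy_value                  # assumption: default to the policy limit
        return min(client_value, policy_value)   # clamp down; a smaller client value wins
    return client_value                          # PASS_THROUGH: untouched
```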
Cache Optimization
In Hardened mode, CachePilot automatically optimizes your requests for maximum cache efficiency and reduced token spend. These optimizations are applied transparently — no changes needed on your side.
Every telemetry row also records the policy hash (policy_hash). This is an immutable audit fingerprint — if a policy changes between requests, you'll see a different hash.

📋 Receipt Headers
Every response from the proxy includes X-CP-* headers. These are your policy receipt — proof of what was enforced, without storing any content.
| Header | Example | Description |
|---|---|---|
| X-CP-Request-Id | b7a3...f2e1 | UUID of this telemetry row |
| X-CP-Policy-Version | 1 | Policy schema version that was applied |
| X-CP-Output-Budget-Applied | 4096 | Actual max_output_tokens sent to OpenAI |
| X-CP-Skills-Applied-Hash | a3f8...c1e2 | SHA-256 of the final tool set after policy |
| X-CP-Prefix-Hash | 7b2d...f491 | SHA-256 of instructional prefix (proves prompt stability) |
📊 Telemetry Fields
Every proxied request writes a content-free telemetry row to Postgres. No prompts, no outputs — only operational metadata.
| Field | Type | Description |
|---|---|---|
| request_ts | timestamp | When the request arrived |
| model | text | Model name (e.g. gpt-4.1) |
| stream | boolean | Was this a streaming request? |
| prefix_hash | text | SHA-256 of prompt prefix |
| policy_version | int | Policy version applied |
| policy_hash | text | SHA-256 of the full policy JSON |
| applied_max_output_tokens | int | Output budget after enforcement |
| skills_hash | text | Hash of final tool set |
| allow_shell | boolean | Policy shell setting |
| shell_requested | boolean | Did the client request a shell tool? |
| shell_denied | boolean | Was the shell request blocked? |
| http_status | int | Response status code |
| latency_ms_total | int | Total round-trip latency |
| latency_ms_api | int | OpenAI API call latency |
| retry_429_count | int | Number of 429 retries |
| error_code | text | Error code (null if success) |
Usage row (per request):
| Field | Type | Description |
|---|---|---|
| input_tokens | int | Total input tokens |
| output_tokens | int | Total output tokens |
| cached_tokens | int | Tokens served from OpenAI's cache |
| uncached_tokens | int (generated) | input_tokens − cached_tokens |
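Since uncached_tokens is generated as input_tokens − cached_tokens, a cache hit rate falls out of the usage row directly; a small illustrative helper:

```python
def cache_hit_rate(input_tokens: int, cached_tokens: int) -> float:
    # Fraction of input tokens served from OpenAI's cache.
    # uncached would be input_tokens - cached_tokens, matching the generated column.
    return cached_tokens / input_tokens if input_tokens else 0.0
```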
🚀 Deployment
The CachePilot proxy runs on a dedicated VPS with TLS via Caddy. Your API key never touches our storage layer — it's forwarded to OpenAI in-memory during each request.
| Component | Stack |
|---|---|
| Proxy | Node.js / Express, Docker |
| TLS | Caddy (auto Let's Encrypt) |
| Database | Neon Postgres (serverless) |
| Dashboard | Next.js on Vercel |
Endpoint: https://api.cachepilot.clclabs.ai/v1/responses
Health check: https://api.cachepilot.clclabs.ai/health