# SBproxy: full documentation > Concatenation of every Markdown file under `sbproxy/docs/` plus the top-level `README.md`, `MIGRATION.md`, and `CHANGELOG.md`. Designed for AI tools (Claude, ChatGPT, Cursor) that want the entire SBproxy corpus in one request per the [llmstxt.org](https://llmstxt.org/) convention. Pairs with `/llms.txt` (the small AI-discoverable feature catalog at `docs/llms.txt`). For per-doc URLs see `docs/README.md`. Regenerated by `scripts/regen-llms-full.sh`. Generated; do not hand-edit. Source: https://github.com/soapbucket/sbproxy Generated: 2026-06-09T01:12:41Z --- ## Table of contents - `README.md`: README.md - `MIGRATION.md`: Migrating from v0.1.x (Go) to v1.0 (Rust) - `CHANGELOG.md`: Changelog - `docs/README.md`: SBproxy documentation - `docs/402-challenge.md`: 402 Challenge contract - `docs/a2a-gateway.md`: A2A gateway - `docs/access-log.md`: Access log - `docs/admin-api-reference.md`: Admin API reference - `docs/adr-ai-hub-format.md`: ADR: AI gateway hub format and the `ChatFormat` trait - `docs/adr-outbound-credential-resolver.md`: ADR: outbound credential resolver, OSS vs enterprise line - `docs/agent-budget.md`: agent_budget policy - `docs/agent-skills.md`: Agent Skills v0.2.0 - `docs/ai-crawl-control.md`: AI Crawl Control + Pay Per Crawl - `docs/ai-gateway.md`: SBproxy AI gateway guide - `docs/ai-lb-benchmark.md`: AI router load-balancing benchmark - `docs/architecture.md`: SBproxy architecture and deployment guide - `docs/audit-log.md`: Audit log - `docs/auth-oidc.md`: OIDC Relying-Party login - `docs/build.md`: Build pipeline - `docs/bulk-redirects.md`: Bulk redirects - `docs/cache-reserve.md`: Cache Reserve - `docs/clickhouse-attribution.md`: ClickHouse attribution - `docs/cloudflare-code-mode.md`: Cloudflare Code Mode - `docs/comparison.md`: How SBproxy compares - `docs/config-stability.md`: Config stability tiers - `docs/configuration.md`: SBproxy Configuration Reference - `docs/content-digest.md`: content_digest policy - 
`docs/content-for-agents.md`: Content for agents - `docs/degradation.md`: Dependency degradation matrix - `docs/enterprise.md`: Enterprise - `docs/events.md`: SBproxy events - `docs/exposed-credentials.md`: Exposed credentials check - `docs/faq.md`: Frequently asked questions - `docs/feature-flags.md`: Edge feature flags - `docs/features.md`: SBproxy features manual - `docs/getting-started-agent-identity.md`: Getting started: Agent identity issuance and enforcement - `docs/getting-started-ai-estate.md`: Getting started: AI estate (LLM gateway in front of model providers) - `docs/getting-started-api-estate.md`: Getting started: API estate governance (reverse proxy in front of existing APIs) - `docs/getting-started-content-estate.md`: Getting started: Content estate (HTML-to-markdown / content transformation for agents) - `docs/getting-started-sovereign-multicloud.md`: Getting started: Sovereign / multi-cloud deployment - `docs/glossary.md`: Glossary - `docs/headers-reference.md`: Response headers reference - `docs/headless-detection.md`: Headless detection - `docs/json-schema.md`: JSON Schema for `sb.yml` - `docs/kubernetes.md`: Running sbproxy on Kubernetes - `docs/l402.md`: L402 (Lightning HTTP 402) - `docs/listings.md`: Listings - `docs/manual.md`: SBproxy Runtime Manual - `docs/mcp-schema-drift.md`: MCP schema-drift detection - `docs/mcp.md`: MCP gateway - `docs/metrics-stability.md`: Metrics stability - `docs/migration-credentials.md`: Migration: credentials block - `docs/migration-mcp-rbac.md`: Migrating MCP tool access policies - `docs/model-pinning.md`: Model pinning - `docs/multi-tenant.md`: Multi-tenant deployment - `docs/object-authz.md`: object_authz policy - `docs/observability.md`: Observability - `docs/openapi-emission.md`: OpenAPI Emission - `docs/openapi-validation.md`: OpenAPI schema validation - `docs/operator-runbook.md`: Operator runbook - `docs/outbound-peer-pricing.md`: Outbound peer-pricing pre-flight - `docs/performance.md`: Performance - 
`docs/policy.md`: Policy engine - `docs/prompt-injection-v2.md`: prompt_injection_v2 - `docs/providers.md`: Supported providers - `docs/quickstart-operator.md`: Operator quickstart: first 24 hours - `docs/README.md`: SBproxy documentation - `docs/routing-strategies.md`: Routing Strategies - `docs/rsl.md`: RSL 1.0 licensing cookbook - `docs/scripting.md`: SBproxy scripting reference: CEL, Lua, JavaScript, and WASM - `docs/secrets.md`: Secret backends - `docs/sidecar-deployment.md`: Sidecar deployment - `docs/threat-model.md`: SBproxy threat model - `docs/troubleshooting.md`: Troubleshooting - `docs/upgrade.md`: Upgrade Guide - `docs/wasm-development.md`: WASM transform development guide - `docs/web-bot-auth.md`: Web Bot Auth --- ================================================================ # README.md ================================================================

SBproxy

*Last modified: 2026-05-16*

The AI gateway built like a real proxy.

---

## Why SBproxy

Most teams run one tool for HTTP traffic and another for LLM traffic. That's two systems to configure, deploy, and monitor. SBproxy handles both in one binary.

- **One config file** replaces your reverse proxy, AI gateway, and the middleware glue between them.
- **200+ LLM models** behind an OpenAI-compatible API, with fallback chains, guardrails, and budgets.
- **Secure by default.** Auth, rate limiting, WAF, DDoS protection, and CSRF protection are built in.
- **Hot reload** with no dropped connections.
- **Sub-millisecond p99 overhead.** Idle RSS in single-digit megabytes.

---

## Install

curl (macOS / Linux):

```bash
curl -fsSL https://download.sbproxy.dev | sh
```

The script detects your OS and architecture, fetches the matching release binary from GitHub, and drops it in `~/.local/bin`. Override with `SBPROXY_INSTALL=` for a custom location or `SBPROXY_VERSION=` to pin a release.

Homebrew (macOS / Linux):

```bash
brew tap soapbucket/tap
brew install sbproxy
```

Docker:

```bash
docker pull ghcr.io/soapbucket/sbproxy:latest
```

From source (needs Rust 1.82+):

```bash
git clone https://github.com/soapbucket/sbproxy
cd sbproxy
make build-release
```

---

## Quick start

We host a public HTTP echo service at `test.sbproxy.dev` (request inspection, like httpbin) so you can wire up a real upstream without leaving the SoapBucket ecosystem. Try it directly:

```bash
curl https://test.sbproxy.dev/get
```

Now run the gateway in front of it. Drop this into `sb.yml`:

```yaml
proxy:
  http_bind_port: 8080

origins:
  # Hostname the client sends in the Host header
  "myapp.example.com":
    action:
      type: proxy
      url: https://test.sbproxy.dev
```

```bash
make run CONFIG=sb.yml
curl -H "Host: myapp.example.com" http://127.0.0.1:8080/get
```

`myapp.example.com` is the host your client sees; SBproxy matches it against `origins:` and forwards to the upstream. Use any hostname you want here; `example.com` is reserved (RFC 2606), so it never collides with anything real.

That's a reverse proxy.
Add AI routing, auth, and rate limiting in the same file. See [`examples/`](examples/) for runnable end-to-end configurations covering each feature.

---

## Documentation

The full documentation lives in [`docs/README.md`](docs/README.md): manual, configuration reference, AI gateway guide, scripting reference, performance, troubleshooting, architecture, and more. Running the operator for the first time? Start with [`docs/quickstart-operator.md`](docs/quickstart-operator.md). For contributors: [CONTRIBUTING.md](CONTRIBUTING.md).

---

## Community

- [Issue Tracker](https://github.com/soapbucket/sbproxy/issues) for bug reports and feature requests.
- Looking for a managed offering? [SBproxy Enterprise](https://sbproxy.dev/enterprise).

---

## Upgrading from v0.1.x (Go)

SBproxy v1.0 is a Rust rewrite. The Go implementation that previously occupied this repository is archived at [soapbucket/sbproxy-go](https://github.com/soapbucket/sbproxy-go) and tagged `v0.1.2-go-final`. New work happens here. See [MIGRATION.md](./MIGRATION.md) for the upgrade path; existing `sb.yml` files should compile unchanged.

---

## License

Licensed under the [Apache License 2.0](LICENSE). Free for any use, including production and commercial, with no field-of-use restriction. See also [NOTICE](NOTICE) and [TRADEMARKS](TRADEMARKS.md).

A [Soap Bucket LLC](https://www.soapbucket.com) project.

================================================================
# MIGRATION.md
================================================================

## Migrating from v0.1.x (Go) to v1.0 (Rust)

*Last modified: 2026-04-28*

SBproxy v1.0 replaces the Go implementation with a Rust rewrite built on Cloudflare's Pingora. This document covers what changes for operators upgrading from a v0.1.x Go binary to a v1.0 Rust binary.

The v0.1.x Go binary continues to be available at `github.com/soapbucket/sbproxy-go` (archived, read-only) at the `v0.1.2` release tag. New development happens only on v1.0 and later.
## TL;DR

- Your `sb.yml` is mostly portable. Field names match. Most operators upgrade by swapping the binary and re-deploying.
- The install command and binary name are unchanged (`sbproxy`, `brew install sbproxy`, `ghcr.io/soapbucket/sbproxy:latest`).
- No v0.1.x flags, environment variables, or `sb.yml` fields were removed or renamed in v1.0, but two defaults changed. See `Breaking changes` below.
- Performance improves substantially (3x throughput, 3-4x lower p99 on the AI path) with no config changes required.

## What's the same

- **Config language**. `sb.yml` field names, structure, and semantics are preserved across the proxy, AI gateway, auth, policy, transform, and modifier surfaces.
- **Binary name and install paths**. The binary is still `sbproxy`. `brew install sbproxy` and `docker pull ghcr.io/soapbucket/sbproxy:latest` continue to work.
- **Hot reload**. Send `SIGHUP` (or save the config file when watcher mode is on) and the new pipeline atomically swaps in.
- **Admin endpoint**. `/api/health`, `/api/metrics`, `/api/openapi.{json,yaml}` work the same way.
- **CEL and Lua scripts**. Existing CEL expressions and Lua transform scripts run unchanged on the Rust extension engine.
- **Provider catalog**. The 90+ AI provider catalog is the same data file; existing AI routes continue to resolve providers by the same names.

## What's new in v1.0

These are additive and do not require config changes:

- **Cloudflare-style edge security policies**: `ai_crawl_control` (Pay Per Crawl), `exposed_credentials`, `page_shield`, `bulk_redirects`, `cache_reserve`, `dlp_catalog`, `web_bot_auth`. See `docs/` for each.
- **OpenAPI emission**. The gateway publishes its live config as OpenAPI 3.0 at `/api/openapi.json` (admin) and per-host `/.well-known/openapi.json` (opt-in via `expose_openapi: true` on the origin).
- **Storage action with real backends**. The `storage` action now drives S3, GCS, Azure Blob, or local filesystem via `object_store`.
- **JavaScript and WASM scripting** alongside CEL and Lua.
- **Pattern-aware PII redaction at the request boundary** for AI routes.
- **Single-digit-MB idle RSS** and sub-millisecond p99 added latency.
- **Hierarchical budgets across team/project/user/model** with downgrade-on-exceed.

## Breaking changes

### Removed

- No CLI flags or environment variables from v0.1.x have been removed in v1.0. If your v0.1.x deployment uses a non-default flag and you cannot find the equivalent in v1.0, file an issue tagged `migration`.

### Renamed

- No `sb.yml` field renames between the v0.1.x Go config schema and the v1.0 Rust config schema. (The internal config schema is also referred to as `schema-v1`; that label has not changed.) The compatibility promise is pinned by the `v1_compat::v1_fixtures_compile_unmodified` test in `crates/sbproxy-config/`. If a real-world v0.1.x config fails to compile under v1.0, that is a bug; file an issue tagged `migration`.

### Default changes

- The upstream `Host` header now defaults to the upstream URL's hostname (matching nginx and Envoy `auto_host_rewrite`). Set `host_override: ` per action to keep the v0.1.x client-Host pass-through behavior.
- `proxy.trusted_proxies` is now strictly enforced. When the immediate TCP peer is not in the trust list, inbound `X-Forwarded-*` headers are stripped on ingress (forgery defense). v0.1.x had a more permissive default.

## Recommended upgrade procedure

1. **Read `CHANGELOG.md`** for the full list of changes between your starting v0.1.x version and v1.0.0.
2. **Stage v1.0 alongside v0.1.x** in a non-production environment. Point a copy of your `sb.yml` at the v1.0 binary and run `sbproxy validate sb.yml`. Address any validation errors.
3. **Run a smoke test** against a small percentage of real traffic. Observe `/api/metrics` and `/api/health/targets` for any regressions in 4xx/5xx rates or upstream latency.
4. **Verify signed binary** before promoting to production. v1.0 ships with cosign signatures and an SBOM; see `SUPPLY-CHAIN.md` for the verification commands.
5. **Promote to production** once smoke is clean.
6. **Keep v0.1.x available for rollback** for at least one full deployment cycle. The v0.1.x binary at the `v0.1.2` tag of `github.com/soapbucket/sbproxy-go` is the recommended rollback target.

## Help

- File migration questions as an issue tagged `migration` on `github.com/soapbucket/sbproxy`.
- Security-sensitive issues go through `SECURITY.md`.
- For paid migration support (e.g., enterprise customers with non-trivial v0.1.x customizations), contact support@soapbucket.dev.

================================================================
# CHANGELOG.md
================================================================

## Changelog

All notable changes to SBproxy v1.x. Versions before v1.0 shipped as the Go implementation and now live in the archived [`soapbucket/sbproxy-go`](https://github.com/soapbucket/sbproxy-go) repository.

## [Unreleased]

Work that has merged to `main` since the v1.1.0 tag and is queued for the next version cut. No promises about backward compatibility for any of the new YAML fields below until the version that ships them.

## [1.1.0] - 2026-06-06

First minor release on the Rust v1.x line. This release carries breaking changes to the MCP tool-access policy (now closed-by-default and principal-aware); read the Breaking section and `docs/migration-mcp-rbac.md` before upgrading. It also ships 66 native AI providers behind one OpenAI-compatible API.

### Breaking

- **MCP default-deny**: `ToolAccessPolicy` flipped from open-by-default to closed-by-default. An unknown caller (no matching ACL rule) is denied every tool. An empty `allowed: []` list under an ACL rule means "deny all", not "allow all". Operators who want the legacy behaviour add `default_allow: true` on the origin's MCP action. The legacy `key_permissions: { key: [tools] }` shape is gone; rewrite to the principal-aware `tool_access[]` selector list. See `docs/migration-mcp-rbac.md`.
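
The closed-by-default semantics in the MCP default-deny entry can be illustrated with a small sketch. This is not SBproxy's Rust implementation: `AclRule` and `ToolAccessSketch` are hypothetical names, principals are simplified to plain strings, and the "first matching rule decides" ordering is an assumption. Only the three behaviours stated in the entry (unknown caller denied, empty `allowed` means deny all, `default_allow: true` restores the legacy open default) come from the changelog.

```python
from dataclasses import dataclass, field


@dataclass
class AclRule:
    """Hypothetical stand-in for one tool_access[] rule."""
    principals: set  # caller identifiers this rule matches (simplified to strings)
    allowed: list    # tool names; an empty list means "deny all"


@dataclass
class ToolAccessSketch:
    """Illustrative model of a closed-by-default tool ACL."""
    rules: list = field(default_factory=list)
    default_allow: bool = False  # closed by default, as in v1.1.0

    def check(self, principal: str, tool: str) -> bool:
        # Assumption: the first rule that matches the principal decides.
        for rule in self.rules:
            if principal in rule.principals:
                return tool in rule.allowed
        # No rule matched: fall back to the default (deny unless opted in).
        return self.default_allow


policy = ToolAccessSketch(rules=[
    AclRule(principals={"key-a"}, allowed=["search"]),
    AclRule(principals={"key-b"}, allowed=[]),  # matched, but nothing allowed
])
legacy = ToolAccessSketch(rules=policy.rules, default_allow=True)
```

With these definitions, `policy.check("unknown", "search")` is denied while `legacy.check("unknown", "search")` is allowed; a caller matched by a rule with an empty `allowed` list is denied under both.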
- **MCP principal-aware ACL**: `ToolAccessPolicy` now carries `tool_access[]` rules with `principals[]` selectors (`virtual_key`, `sub`, `team`, `project`, `user`, `role`, `tenant_id`) plus an `allowed[]` tool list. The legacy `key_permissions: HashMap>` map is removed along with `ToolAccessPolicy::is_tool_allowed(key, tool)`; the new surface is `policy.check(&principal, tool) -> ToolAccessDecision` and `policy.filter_tools(&principal, &tools)`. `tools/list` now filters by RBAC against the inbound principal (the legacy schema leaked tool names through `tools/list` even when the gate would deny the matching `tools/call`). A new `tool_quotas[]` table enforces per-tool sliding-window quotas keyed on `(tenant_id, principal_id, tool_name)`. See `docs/migration-mcp-rbac.md`. ### Added - **66 native AI providers behind one OpenAI-compatible API.** The embedded `ai_providers.yml` registry ships 66 providers (up from 43), adding Hugging Face Inference, GitHub Models, Vercel AI Gateway, Nebius, Baseten, Lambda, FriendliAI, Scaleway, Nscale, DigitalOcean Gradient, OVHcloud, Inference.net, kluster.ai, OpenPipe, Writer, Upstage, Aleph Alpha, MiniMax, Volcengine Ark (Doubao), Tencent Hunyuan, Baidu Qianfan (ERNIE), StepFun, and Mixedbread. The catalog is plain YAML and operator-extensible at runtime via `proxy.ai_providers_file`; the `model` field passes through to the upstream, so any model a provider serves is reachable without per-model config. The "200+ models" reach is native (bring your own keys); OpenRouter is one provider among the 66, not a dependency. See `docs/providers.md#extending-the-provider-catalog`. 
- **Session ledger from live MCP traffic.** A new top-level `session_ledger:` block makes SBproxy emit the canonical `session-ledger-v1` run record (shared with mcptest) from its `tools/call` path: one `header` per session, then one `tool_call` record per call carrying `session_id`, a zero-based `hop_index`, the bare tool name and server, redacted `params` / `result`, an error flag, and the round-trip `duration_ms`. `sink: logging` (default) emits each record as a `session_ledger` tracing line; `sink: file` with a `path:` appends NDJSON. Off unless `enabled: true`; when off the tool-call path pays only a single atomic load. Payloads are redacted with the same secret-stripping the access log uses. See `docs/mcp.md` and `examples/mcp-federation/sb.yml`. - **Structured-log schema v2 (`SCHEMA_VERSION = "2"`).** Three changes land together so downstream tooling can read them in one swing: optional `session_id` and `user_id` top-level fields parallel the `RequestEvent` envelope (cross-surface JOIN no longer relies on `request_id` alone); the field-key redaction marker is normalised to `[REDACTED:]` everywhere (was `` in v1) so the schema-v1 layer matches the existing PII-rule replacement shape; the schema bump is additive on the field set (a v1 reader parsing a v2 line keeps working because every new field is `skip_serializing_if = Option::is_none`). Marker normalisation is a string change; downstream tooling that greps for the old `` form must update. - **Phase-timing breakdown on the access log + new `sbproxy_phase_duration_seconds` Prometheus histogram.** The access log carried `latency_ms` end to end and that was it; an operator looking at a slow request could not tell from the log whether the time went to the auth provider, the upstream, or a response transform. 
Three new optional fields land on every `AccessLogEntry`: `auth_ms` (request_start → auth provider returned), `upstream_ttfb_ms` (request_start → first upstream response byte), `response_filter_ms` (first upstream byte → end of `response_filter`). All three are `Option` and `serde-skip` when None, so origins that short-circuit (cache hit, auth deny) keep compact lines. The same observations also feed a new `sbproxy_phase_duration_seconds{phase, origin}` histogram with buckets identical to `sbproxy_request_duration_seconds` for cross-cut dashboards. See `docs/access-log.md` and `docs/metrics-stability.md`. - **Nine standard HTTP fields on the access log: `host`, `query`, `protocol`, `scheme`, `user_agent`, `referer`, `upstream_status`, `response_content_type`, `response_content_encoding`.** The log was missing the canonical fields most HTTP access-log consumers expect (Apache, NGINX, Envoy, the cookie-cutter ELK pipeline). `host` is the client-supplied Host header (distinct from `origin`, the matched virtual-host pattern); `upstream_status` is the upstream's response code when the proxy rewrote the status the client sees. All nine are `Option`, `serde-skip` when not applicable. Promoted from the generic header allowlist because nearly every analytics consumer wants them. See `docs/access-log.md`. - **Opt-in OpenTelemetry metrics mirror alongside the canonical Prometheus surface.** New `telemetry.export_metrics: true` (with `telemetry.metrics_interval_secs` cadence, default 30s) installs an OTel `MeterProvider` that ships observations to the same OTLP collector the trace pipeline targets. The first two mirrored instruments are `sbproxy.phase.duration` and `sbproxy.request.duration`; record-paths fall back to OTel's global no-op meter when the export is off, so operators pay nothing for the mirror unless they opt in. 
The Prometheus surface remains canonical; this is for operators who already aggregate via Mimir / Datadog / Honeycomb and want to skip the Prometheus scrape. - **OIDC Relying-Party stack shipped end to end.** `/oidc/callback` (auth-code + PKCE + sealed session cookie) plus the helpers + config wiring for `/.well-known/openid-configuration` discovery, refresh-token rotation, RP-initiated logout at `/oidc/logout`, userinfo → `X-Auth-*` trust headers, an optional server-side session store (in-memory + KV-backed redb/file/Redis) for targeted revocation. See `docs/configuration.md` § OIDC auth. - **OpenAI Apps SDK / MCP Apps (SEP-1865) compatibility.** Gateway-side `_meta.mcpApps` passthrough for tool definitions, `params.audit.cause` plumbing on `tools/call`, and a typed validator set (`apps.template_declared`, `apps.iframe_sandbox`, `apps.csp_present`, `apps.cache_metadata`) usable by sbproxy, the enterprise extension, and any CI gate over the `sbproxy-plugin` surface. - **Web Bot Auth full conformance, publish + sign sides.** SBproxy now publishes its own JWKS-shaped directory at `/.well-known/http-message-signatures-directory` and a Signature Agent Card at `/.well-known/web-bot-auth/agent-card` (opt in via `web_bot_auth_publish` per origin). New `sbproxy-middleware::signatures::MessageSignatureSigner` primitive signs outbound requests per RFC 9421, round-trips through the existing verifier. See `docs/web-bot-auth.md` and `examples/web-bot-auth-publish/`. - **Three previously-undocumented OSS policies now have docs + runnable examples:** `object_authz` (BOLA + BFLA with enumeration detection), `content_digest` (RFC 9530 request-body verification), `agent_budget` (per-agent semantic rate limit). See `docs/object-authz.md`, `docs/content-digest.md`, `docs/agent-budget.md`. - **Discoverable FAQ.** `docs/faq.md` covers install, common 401 causes, OIDC minimal config, log levels, OSS-vs-enterprise scope, and pointers into the rest of `docs/`. 
Wired into `docs/README.md` under "Getting started". - **Explicit SIGINT/SIGTERM handling with a structured shutdown event and a 30s default drain budget.** Pingora's `Server::run_forever` already trapped SIGTERM and SIGINT, but the proxy emitted no operator-facing log line on receipt, so a pod eviction or `docker stop` looked the same as a crash in the log stream. This change subscribes to Pingora's execution-phase broadcast and emits `shutdown_signal_received`, `shutdown_grace_period`, and `shutdown_complete` tracing events with the resolved grace budget. The Kubernetes operator (`sbproxy-k8s-operator`) now installs the same SIGINT/SIGTERM handlers via `tokio::signal::ctrl_c` and `tokio::signal::unix::signal(SignalKind::terminate())`; before this change the operator relied on the orchestrator SIGKILL at `terminationGracePeriodSeconds`. The drain budget is the new `SBPROXY_SHUTDOWN_GRACE_MS` env var (or `--shutdown-grace-ms` CLI flag) which defaults to 30000ms, matching Kubernetes' default `terminationGracePeriodSeconds`. The legacy `SB_GRACE_TIME` / `--grace-time` (seconds) still works and takes precedence when explicitly set; an unset legacy var lets the new 30s default apply. Operator exits 0 on a clean drain, 1 when the grace window is exceeded, so the orchestrator can alert. Documented in `docs/manual.md` §3 and `docs/kubernetes.md` §Graceful shutdown. - **Idempotency middleware now engages on AI gateway origins (`action: ai_proxy`).** Before this change, the RFC 8594 middleware only ran on general HTTP origins (`action: proxy`). AI customers using `Idempotency-Key` headers for Stripe-style retries were double-billed by the upstream provider because the proxy did not replay from cache. The fix engages the same primitive in `handle_ai_proxy` after the request body is buffered (the AI gateway already buffers for the JSON parser, model router, and guardrails) and before the upstream call. 
On a cache hit the gateway writes the cached `(status, headers, body)` triple directly to the client with `x-sbproxy-idempotency: HIT` and never contacts the provider. On a body conflict the gateway returns 409 `ledger.idempotency_conflict` per the RFC. On a miss the gateway forwards, then records the post-translation OpenAI-shape bytes the client actually saw so retries replay byte-identical. Reuses the same per-request and pool caps shipped on `CompiledIdempotency`: `max_request_body_bytes`, `max_response_body_bytes`, `max_concurrent_buffers`. The four skip markers (`SKIPPED-OVERSIZE-REQUEST`, `SKIPPED-POOL-FULL`, `SKIPPED-OVERSIZE-RESPONSE`, `SKIPPED-MULTIPART`) stamp on the outgoing response so operators see graceful degradation in dashboards. Multipart bodies (audio transcription, image edit / variation, file upload) skip caching with `SKIPPED-MULTIPART` because the cache primitive stores raw bytes and multipart boundaries may be regenerated by clients on retry. Streaming (SSE) chat completion responses abandon the cache record on oversize because framing-aware capture is out of scope for v1. - **`proxy_status` and `problem_details` now cover upstream failures.** Before this change, `proxy_status.enabled: true` stamped the `Proxy-Status` header on proxy-generated errors (auth deny, policy deny, default 404) but **not** on upstream failures routed through Pingora's `fail_to_proxy` path (connect refused, connect timeout, TLS handshake error, mid-stream connection loss). The fix wires both blocks into the upstream-failure path so dashboards consuming `Proxy-Status` see consistent coverage across error sources. 
The status code + RFC 9209 `error` token derive from the Pingora `ErrorType` via a new `map_upstream_failure` translator: 504 + `connection_timeout` for `ConnectTimedout` / `ReadTimedout`; 502 + `connection_refused` for `ConnectRefused`; 502 + `tls_protocol_error` for TLS errors; 502 + `connection_terminated` for mid-stream loss; 502 + `http_request_error` as the catch-all. When `problem_details.enabled: true` the body is now rendered as `application/problem+json` for upstream failures too, with the RFC 9209 error token in the `detail` field so both signals share the same vocabulary. - **Idempotency cache check moved to `request_filter`.** Before this change, the cache lookup ran in `request_body_filter`, after Pingora had already opened the upstream TCP connection. On a cache hit the upstream observed one aborted partial request before the proxy served the cached response to the client. The check now runs before Pingora's upstream-peer phase: cache hits and body conflicts write the response from inside `request_filter` and return `Ok(true)`, so the upstream is never contacted at all. On cache miss the proxy buffers the body (bounded by `max_request_body_bytes` from PR #139), then re-injects it via `request_body_filter` at end-of-stream so Pingora's normal upstream forwarding picks it up. Existing e2e tests now assert the upstream-not-contacted invariant; the previous "may observe one aborted partial request" caveat has been removed from `docs/configuration.md` and the example README. - **Idempotency middleware: per-request and pool caps.** Three new fields on the `idempotency:` block bound memory usage and let the middleware gracefully degrade under pressure rather than buffering unbounded bodies. `max_request_body_bytes` (default 1 MiB) caps the per-request buffer; bodies above the cap skip caching with `x-sbproxy-idempotency: SKIPPED-OVERSIZE-REQUEST` stamped on the response. 
`max_response_body_bytes` (default 1 MiB) caps the per-response cache buffer; responses above the cap stream through uncached. `max_concurrent_buffers` (default 256) is a per-origin pool over concurrent buffered requests; pool exhaustion skips the cache with `x-sbproxy-idempotency: SKIPPED-POOL-FULL`. Worst-case memory is bounded at `max_concurrent_buffers * max_request_body_bytes` per origin. - **RFC 8594 idempotency middleware (`idempotency:`).** Per-origin block that engages on POST / PUT / PATCH (configurable via `methods:`) when an `Idempotency-Key` header is present. The middleware sits ahead of policies in the handler chain, hashes the request body, and short-circuits the three branches per the RFC: cache hits replay the cached `(status, headers, body)` verbatim with `x-sbproxy-idempotency: HIT`; conflicts (same key, different body) return 409 with the `ledger.idempotency_conflict` JSON body; misses forward to the upstream and capture the response for the next retry. Workspace-isolated keys prevent cross-tenant collisions. Memory backend (default) is per-origin and per-replica; `backend: redis` binds to `proxy.l2_store` at config-compile time for cluster-wide replay. Cached replays do not consume rate-limit slots. Documented in `docs/configuration.md` and demonstrated by `examples/idempotency/`. Known v1 limitation: the cache check fires in `request_body_filter`, after Pingora has already opened the upstream connection. On a cache hit the upstream observes one aborted partial handshake before the proxy serves the cached response to the client; future work moves the check earlier so the upstream never sees the replay. - **RFC 9457 problem-details default renderer (`problem_details:`).** New per-origin block that opts in to `application/problem+json` for proxy-generated errors (authentication denials, policy denials, default 404) that are not matched by an authored `error_pages` entry. 
The two blocks compose: per-status custom pages still win when authored; `problem_details` catches everything else with a structured `type` / `title` / `status` / `detail` / `instance` body. `type_base_uri` produces stable per-status `type` URIs; `include_detail: false` suppresses the internal error string. Documented in `docs/configuration.md` and demonstrated by `examples/problem-details/`. - **Typed `error_pages` config.** The opaque `error_pages: Option` field is now typed as `Option>`. Public types `ErrorPageEntry`, `StatusSpec`, and `ProblemDetailsConfig` live in `sbproxy-config`. The authored YAML shape is unchanged: every existing `error_pages:` list keeps parsing, including the `status:` single-int / `[status]` list shorthand and `template: true` substitution. The OpenAPI emitter now walks typed entries to populate per-status `responses` keys (the previous code inspected the field as an object and silently produced no entries; this is a bug fix on top of the migration). - **AI gateway Realtime WebSocket dispatch (Phase 7, Option C).** `GET /v1/realtime` requests with `Upgrade: websocket` against an `ai_proxy` origin are now dispatched through the AI gateway pipeline: - Pre-upgrade gating runs the same surface classification, 501 capability check (only providers in `provider_supports_realtime` are eligible; today: OpenAI), per-surface rate limit, and provider selection as the rest of the AI surface set. - After the gating passes, Pingora forwards bytes between client and provider transparently through the upgraded connection. The dispatcher does not terminate the WebSocket; per-frame guardrails and frame-exact audio metering are reserved for a future enterprise terminate-and-relay path so every AI gateway feature added to `handle_action` continues to apply to realtime through one shared code path. 
- `sbproxy_ai_realtime_sessions_active` (gauge), `sbproxy_ai_realtime_session_duration_seconds` (histogram), `sbproxy_ai_realtime_audio_seconds_total` (counter), and `sbproxy_ai_realtime_frames_forwarded_total` (counter) are registered. The OSS dispatch ticks the gauge on session open and observes the duration histogram on close. Documented in `docs/metrics-stability.md`. - At session close, `logging` emits a session-end `AiBillingEvent` with `AudioSeconds { seconds }` valued at the wall-clock session duration so realtime usage appears on the standard billing-event bus alongside chat/image/audio. - `RealtimeSessionTracker` (lock-free atomic counters) and `audio_seconds_from_frame(bytes, sample_rate, channels)` ship in `sbproxy-ai::realtime` for the eventual terminate-and-relay path to consume. - `docs/ai-gateway.md` documents the new dispatch path with a YAML example and the per-surface rate-limit knob. - **AI gateway OpenAI surface dispatch (Option A).** The `ai_proxy` action now routes every OpenAI-compatible surface through a single classifier with per-surface observability and gating: - New `AiSurface` enum + `classify_surface(method, path)` cover chat completions, models, embeddings, assistants and threads (full v2 surface), batches, fine-tuning, files, realtime, image generation/edits/variations, audio transcription/speech, moderations, and reranking. Marked `#[non_exhaustive]` so future variants don't break downstream pattern matches. - Method coverage extended past GET/POST: DELETE, PUT, PATCH, HEAD, and OPTIONS dispatch through `AiClient::forward_with_method` without engaging the JSON body-parse pipeline. - Multipart bodies (image edits/variations, audio transcription, file uploads) byte-forward via `AiClient::forward_bytes` with the inbound `Content-Type` preserved. Previously these surfaces returned a 400 "invalid JSON body" from the chat-path body parse. 
  - Provider capability matrix in `api_routes.rs` corrected: Anthropic no longer claims audio/reranking/moderations support, Gemini no longer claims moderations. A new `provider_supports_surface` matrix gates non-universal surfaces with **501 Not Implemented** when no configured provider supports the surface.
  - Per-surface observability: new `sbproxy_ai_surface_requests_total{surface, method}` counter and `sbproxy_ai_surface_request_duration_seconds{surface, method}` histogram. Sibling of the existing per-provider metrics so dashboards can pivot between surface and provider views. Documented in `docs/metrics-stability.md`.
  - Per-surface input guardrails: image generation, audio speech, reranking, and moderations bodies now have their input field (`prompt`, `input`, `query`, `input`) extracted and run through the same guardrail pipeline as chat-style `messages`.
  - Per-surface rate limits: new `per_surface_rate_limits` field on the AI handler config, keyed by surface label. 429 fires before any upstream call when the cap is hit.
  - Surface-aware billing event: new `AiBillingEvent` carrying `AiUsage` with `Tokens`, `Images { count, resolution }`, `AudioSeconds`, `Characters`, `RerankUnits`, and `PerCall` variants. Every dispatched request emits exactly one event. Image generation, audio speech, and reranking emit real cost via per-surface pricing tables (`lookup_image_price`, `lookup_audio_speech_price`, `lookup_rerank_price`, `lookup_audio_transcription_price`). `docs/ai-gateway.md` documents the new surface, methods, guardrails, and rate-limit knobs.
- **Policy verdict audit bus + Plugin dispatch.** Wires the previously-dead `Policy::Plugin` arm in `server.rs` to call the trait's `enforce()`, folds the returned `PolicyDecision` into the existing chain reducer, and emits a `PolicyVerdictEvent` for every decision on a bounded `tokio::sync::mpsc` audit bus per `docs/adr-policy-audit-binding.md`.
  The OSS substrate ships an in-memory drain stub; enterprise replaces the consumer with a NATS-backed audit-chain subscriber. Multi-policy resolution rules from `docs/adr-policy-verdict-shape.md` are implemented at the chain level: any Deny wins, the first Confirm wins over AllowWithHeaders, AllowWithHeaders accumulate, otherwise Allow. `Confirm` in OSS routes through the existing AllowWithHeaders mechanism with `X-Policy-Confirm: ` stamped on the response; an `expires_at` already in the past synthesises a 410 and an SSRF-blocked `webhook_url` synthesises a 502 at decision time. New metrics: `sbproxy_policy_audit_events_total{verdict, surface, policy_id}`, `sbproxy_policy_audit_events_dropped_total{tenant}`, `sbproxy_policy_decision_duration_seconds{surface}`. New Grafana dashboard `sbproxy-policy-verdicts` covers the surface. ([crates/sbproxy-observe/src/events.rs], [crates/sbproxy-observe/src/metrics.rs], [crates/sbproxy-core/src/policy_bus.rs], [crates/sbproxy-core/src/policy_dispatch.rs], [crates/sbproxy-core/src/server.rs], [crates/sbproxy-plugin/src/traits.rs], [dashboards/grafana/sbproxy-policy-verdicts.json])
- **Synthetic-transaction `/readyz` probe.** Optional background driver that fires an in-process request through the compiled handler chain on a fixed cadence and reports the verdict as a `synthetic_pipeline` component on `/readyz`. Disabled by default; opt in via `proxy.synthetic_probe.enabled: true` and define an origin for the configured sentinel hostname (default `__synthetic.local`) pointing at a non-network action (`static`, `mock`, `echo`, `noop`). Failures bump the new `sbproxy_synthetic_probe_failures_total{reason}` counter so they do not pollute real-traffic error metrics.
  ([crates/sbproxy-config/src/types.rs], [crates/sbproxy-core/src/synthetic.rs], [crates/sbproxy-observe/src/synthetic.rs], [crates/sbproxy-observe/src/metrics.rs], [e2e/tests/synthetic_probe.rs])
- **`GET /admin/drift` config drift endpoint.** Returns whether the on-disk config file has diverged from what the running proxy has loaded, without triggering a reload. Compares a content-hash baseline captured at startup (and refreshed on every `/admin/reload`) against a fresh hash of the current file. K8s operators and dashboards scrape this so they can flag an edited config that has not been hot-reloaded yet. Documented in `docs/configuration.md` § Admin fields. ([crates/sbproxy-core/src/admin.rs], [crates/sbproxy-core/src/server.rs], [docs/configuration.md])
- **Deterministic clock-skew testing hooks.** `ClockSkewMonitor` now accepts an injected clock source for tests while production continues to use the system clock. ([crates/sbproxy-observe/src/clock_skew.rs])
- **Operator runbook hooks and fast-track ADR template.** Added a dashboard-oriented operator runbook, linked all Grafana panels to the relevant triage sections, and added a fast-track ADR amendment template plus OSS threat-model refresh checklist. ([docs/operator-runbook.md], [docs/adr-fast-track-amendment.md], [docs/threat-model.md], [dashboards/grafana/])
- **Live reverse-DNS resolver for agent verification.** `SystemResolver` now uses `hickory-resolver` for PTR and forward-confirmation lookups, replacing the previous typed PTR stub. ([crates/sbproxy-security/src/agent_verify.rs])
- **Multi-window SLO burn-rate replay harness.** `sbproxy-observe` now includes a burn-rate evaluator and `AlertSnapshot` replay helper for substrate availability and latency alert taxonomy tests.
  ([crates/sbproxy-observe/src/alerting/burn_rate.rs], [e2e/tests/slo_burn_rate.rs])
- **Vault-style quote-token seed references.** `ai_crawl_control.quote_token.secret_ref` now accepts `secret:` references resolved through `sbproxy-vault` with the existing environment fallback, in addition to the older `secret_ref.env` and inline `seed_hex` paths. ([crates/sbproxy-modules/src/policy/ai_crawl.rs])
- **Operator first-24-hours quickstart.** Added a concise `docs/quickstart-operator.md` covering deploy, `/readyz`, metrics, Grafana, logs, and rollback, linked from the README and Kubernetes docs. ([docs/quickstart-operator.md])
- **Hostname cardinality override for metrics.** `proxy.metrics.cardinality.hostname_cap` can lower the `hostname` label budget independently from the default per-label cap, enabling deterministic overflow tests and tighter multi-tenant Prometheus budgets. ([crates/sbproxy-config/src/types.rs], [crates/sbproxy-observe/src/cardinality.rs])
- **`release-fast` build profile for CI images.** Docker-based CI and local kind smoke-test builds can now use `CARGO_PROFILE=release-fast` to skip fat LTO and use more codegen units, cutting link memory/time while leaving production release artifacts on the existing `release` profile. ([Cargo.toml], [Dockerfile.ci], [Dockerfile.cloudbuild])
- **Reproducible build probe workflow.** CI now has an informational double-build lane that builds the release binary twice on independent GitHub-hosted runners, uploads each binary and SHA-256, and publishes a comparison report without yet treating non-identical output as a failure. ([.github/workflows/reproducible-build.yml], [SUPPLY-CHAIN.md])
- **Phase 2: CEL `features[...]` namespace.** Per-request flags parsed from the `x-sb-flags` header and `?_sb.` query prefix are now exposed to CEL expressions.
  Built-in flags surface as bools (`features.debug`, `features.trace`, `features["no-cache"]`, `features.any_set`); free-form `k=v` extras surface as strings (`features["env"]`). Wired into the rate-limit CEL evaluator and `ExpressionPolicy::evaluate_with_views`. ([crates/sbproxy-extension/src/cel/context.rs])
- **`SB_WORKER_THREADS` env var.** Positive integer overrides the auto-detected Pingora worker thread count (`std::thread::available_parallelism()`). Useful for benchmarking with a fixed worker count or capping the pool below a cgroup quota. ([crates/sbproxy-core/src/server.rs])
- **`/live`, `/livez`, `/ready`, `/healthz`, and rich `/health` admin endpoints.** `/livez` returns `{"alive":true}` on every call and never 503s, so K8s liveness probes don't trip on transient readiness failures. `/live` is a bare alias. `/ready` is an alias for `/readyz`. `/healthz` stays a fixed liveness body, while `/health` now returns version, build hash, timestamp, uptime, and readiness checks for dashboards / SIEM ingestion. Existing `/readyz` behavior unchanged. ([crates/sbproxy-observe/src/health.rs], [crates/sbproxy-core/src/admin.rs])
- **`--request-log-level` and `SB_REQUEST_LOG_LEVEL`.** Operators can now tune request/access logging independently from application logs. The setting appends an `access_log=` target directive to the effective `tracing-subscriber` filter while preserving the existing per-target `RUST_LOG` escape hatch. ([crates/sbproxy/src/main.rs])
- **Access-log forced emission and file output.** `access_log` now supports `slow_request_threshold_ms` and `always_log_errors` so slow requests and 5xxs bypass sampling after status/method filters match. It also supports `output: { type: file, path, max_size_mb, max_backups, compress }` for direct JSON-line access-log files with size-based rotation and optional gzip compression of rotated files.
  ([crates/sbproxy-config/src/types.rs], [crates/sbproxy-core/src/server.rs], [crates/sbproxy-observe/src/access_log.rs])
- **OCSP stapling for the manual fallback cert.** `OcspStapler` (which previously existed but was unwired) now does an immediate fetch on startup, refreshes every 12 hours, and pushes the bytes into `CertResolver::update_fallback_ocsp` so subsequent rustls handshakes staple the response on the wire. No-op when no manual cert is configured or when the cert lacks an AIA extension. ([crates/sbproxy-tls/src/ocsp.rs], [crates/sbproxy-tls/src/cert_resolver.rs])
- **Readiness synthetic probe primitive.** `sbproxy-observe` now ships a `SyntheticProbe` type so startup or test wiring can register an in-process readiness probe that exercises a caller-provided path and reports through the same `/readyz` component model as built-in probes. ([crates/sbproxy-observe/src/health.rs])

### Removed

- **`sbproxy_ai::IdempotencyCache`.** The OSS AI gateway never wired this cache; it was publicly re-exported but had zero callers in the workspace. The new `idempotency:` block on general HTTP origins (above) supersedes it. AI gateway integration is a follow-up tracked in `docs/missing.md`. Plugin authors that imported the removed type can switch to `sbproxy_middleware::idempotency::{IdempotencyCache, InMemoryIdempotencyCache, KvIdempotencyCache}` which carries the richer surface (workspace isolation, body-hash conflict detection, conflict body builder).

### Changed

- **mTLS now wired on the ACME path.** Previously, an operator who configured `mtls:` alongside `acme:` got plain TLS until they noticed clients reaching the upstream without the expected cert headers. The ACME branch now mirrors the manual-cert branch: builds `TlsSettings` with the configured `ClientCertVerifier` and falls back to plain TLS only when mTLS setup itself fails.
  ([crates/sbproxy-core/src/server.rs])
- **Examples and Kubernetes smoke checks are local-only.** The Docker-backed examples smoke lane and kind-based Kubernetes operator smoke lane no longer run automatically on pull requests. They remain available as `make examples-smoke` and `make k8s-operator-smoke` for explicit local / release validation. ([Makefile], [docs/kubernetes.md])
- **Reload drain state is now one coherent atomic snapshot.** The drain flag and active request count are packed into one `AtomicU64`, so `is_draining()` no longer combines two independent relaxed loads. Added loom coverage for the last-request-finish interleaving. ([crates/sbproxy-core/src/reload.rs])
- **Optional readiness dependencies no longer fail `/readyz` by default.** The default admin health registry now registers absent ledger and bot-auth-directory probes as `not_configured`, matching the existing future-wave stubs and keeping `/readyz` green when those optional services are not wired in a deployment. ([crates/sbproxy-observe/src/health.rs], [crates/sbproxy-core/src/admin.rs])
- **`docs/manual.md` rewrites** matching what actually ships:
  - §6 Health checks: `/livez`, `/readyz`, `/healthz`, and rich `/health` semantics, replacing the old per-endpoint URL fork diagram and stale `/health` alias wording.
  - §10 Feature flags: CEL accessor table, kill-switch note, and a "planned, not yet wired" note for Lua / JS / WASM features namespaces and workspace-level pub/sub flags.
  - §3 CPU detection: documents the new `SB_WORKER_THREADS` knob.
  - §13 env-var table: adds `SB_WORKER_THREADS` and `SB_DISABLE_SB_FLAGS`; later updates add `SB_REQUEST_LOG_LEVEL` and access-log file/forced-emit examples.

### Fixed

- **CAP `sub` binding only fires for a genuinely resolved agent.** The CAP verifier binds a token's `sub` to the request's resolved agent id (rejecting a mismatch with `403`).
  Because the agent-class resolver is installed with the built-in catalog by default and always stamps *some* id (falling through to the `human` sentinel when no signal matches), the binding would have rejected every CAP token whose `sub` was not literally `"human"`, even on origins that never configured agent classes. The binding now skips the resolver's fallback / `human` verdict and engages only when the resolver actually identified an agent, so an unauthenticated caller falls through to the normal CAP validation path. Set `cap.require_agent_binding: true` to fail closed when no agent is resolved.
- **Virtual-key model allow/block lists are now enforced.** A virtual key (or `ai_provider` credential) with `models.allow` / `models.block` declared its scope but the AI dispatch path never checked it, so a key confined to a subset of the gateway's models could still call any model the gateway served. The matched key's allow/block lists are now enforced against the effective model (after any `route_to_model` rewrite): a request for a disallowed model is rejected with `403` before any upstream call, the block-list taking precedence over the allow-list. Keys with no `models.allow` are unaffected. See `examples/ai-virtual-keys/`.
- **Licensing-projection wire formats now match the canonical specs [BREAKING].** Two projection emitters were producing document shapes that didn't match their cited specifications. `/licenses.xml` previously declared the namespace `https://rsl.ai/spec/1.0` and emitted a flat `...` document. The canonical RSL Collective spec at uses the namespace `https://rslstandard.org/rsl` and a nested `...` shape; the `` `url` attribute is the canonical wildcard `https:///*` for the origin-wide license.
  `/.well-known/tdmrep.json` previously wrapped its policies in a `{"version", "generated", "policies": [...]}` envelope; the W3C TDMRep CG-FINAL spec mandates a bare JSON array at the document root with `location`, `tdm-reservation` (integer 0 or 1), and `tdm-policy` (URL of the policy document) fields per entry. Both emitters now produce the canonical shapes. Operators consuming `/licenses.xml` or `/.well-known/tdmrep.json` programmatically must update their parsers to the new shapes; the in-process JSON envelope and the response middleware that stamps `TDM-Reservation: 1` and the URN-bearing `license` field are unaffected. Conformance is asserted by the active structure-shape tests; the earlier schema-validation tests were removed because neither standard publishes a machine-readable schema to validate against (RSL 1.0 is prose-only; W3C TDMRep ships no JSON Schema). ([crates/sbproxy-modules/src/projections/licenses.rs], [crates/sbproxy-modules/src/projections/tdmrep.rs], [e2e/tests/rsl_licenses_projection_e2e.rs], [e2e/tests/tdmrep_projection_e2e.rs])
- **Build under prometheus 0.14 type inference.** Sites in `sbproxy-observe::metrics` and `sbproxy-core::server` that passed heterogeneous `&[&String, &str]` arrays to `prometheus::with_label_values` no longer compile on prometheus 0.14 because Rust unifies the array element type to `&String` and rejects bare `&str` literals. Coerced all such call sites to uniform `&[&str]` via `.as_str()` so the workspace builds clean again. No behavioural change. ([crates/sbproxy-observe/src/metrics.rs], [crates/sbproxy-core/src/server.rs])
- **WASM extension docs corrected.** `CLAUDE.md` previously labeled the WASM surface as "WASM stub" while marketing docs claimed production-grade support; the runtime is real (`wasmtime` + WASI preview-1 with sandboxed memory and CPU caps, stderr capture, no FS or network).
  `llms.txt` also incorrectly claimed "WASI networking with host allowlist" but `allowed_hosts` is parsed-but-inert until WASI sockets land. CLAUDE.md and llms.txt now match the shipped surface. ([CLAUDE.md], [llms.txt], [crates/sbproxy-extension/src/wasm/mod.rs])
- **E2E proxy startup flake under CPU contention.** The e2e `ProxyHarness` keeps its HTTP-level readiness probe, but now gives release/debug proxy boots a 10-second window instead of 5 seconds so tests like `action_graphql` do not fail spuriously while cargo is competing for CPU. ([e2e/src/lib.rs])
- **Docs CI Rust snippet failures.** Workspace-dependent documentation examples that cannot compile as standalone `rust-script` programs are now tagged `rust,no_run`, keeping docs-ci focused on executable snippets instead of illustrative API fragments. ([docs/architecture.md], [docs/audit-log.md], [docs/cache-reserve.md])
- **Unsafe-code drift guardrails.** Crates that do not need unsafe now forbid it at the crate root, while `sbproxy-vault` explicitly allows its narrowly-scoped volatile zeroization unsafe with an inline justification. ([crates/sbproxy-*/src/lib.rs])
- **Outbound webhook delivery identity headers.** Signed customer webhooks now include `Sbproxy-Subscription-Id`, `Sbproxy-Delivery-Id`, and 1-based `Sbproxy-Attempt` headers, with a fresh delivery ULID on every retry attempt. ([crates/sbproxy-observe/src/notify.rs])
- **AI client retry resilience.** `MemoryBatchStore` now uses `parking_lot::Mutex` so a panic in one worker cannot poison the in-memory batch map for every later operation. Provider retries now honor `provider.max_retries` as same-provider retry attempts with bounded jittered exponential backoff before recording provider failure and moving to the next eligible provider.
  ([crates/sbproxy-ai/src/batch.rs], [crates/sbproxy-ai/src/client.rs])
- **Dynamic Web Bot Auth directory dispatch.** The main request auth path now invokes `BotAuthProvider::verify_async` when a configured hosted directory and `Signature-Agent` header are present, so dynamic directory failures surface distinctly instead of falling through the static inline-agent verifier. ([crates/sbproxy-core/src/server.rs])
- **ACME/Pebble order polling.** Certificate issuance now polls the authorization to `valid` after responding to the HTTP-01 challenge before polling the order to `ready`, matching Pebble's stricter state progression. Finalization also parses the order returned by the finalize response and falls back to polling the original order URL, avoiding accidental POST-as-GET polling of the finalize URL when `Location` is absent. ([crates/sbproxy-tls/src/acme.rs])
- **JWKS unknown-`kid` key rotation.** JWTs that reference an unseen `kid` now trigger one rate-limited JWKS refetch before failing closed, with a Prometheus counter for success / failure / rate-limited outcomes. This avoids requiring operator intervention for routine IdP key rotation. ([crates/sbproxy-modules/src/auth/jwks.rs], [crates/sbproxy-modules/src/auth/mod.rs], [crates/sbproxy-observe/src/metrics.rs])
- **Rate-limit LRU pollution bypass.** Per-key local token buckets now preserve deny state in a bounded cold tier after hot LRU eviction, so a spray of attacker keys cannot reset an already-throttled legitimate client. ([crates/sbproxy-modules/src/policy/mod.rs])

### Open follow-ups

Tracked in Linear, not in this changeset:

- the upstream issue full configurable synthetic transaction through the live request pipeline. The `SyntheticProbe` readiness primitive has landed; config and pipeline execution remain.
- Phase 2.5: Lua / JS / WASM `features` namespace, plus workspace-level flags via messenger pub/sub
- the upstream issue remaining rate-limiter proptest coverage.
  The reload-drain loom portion has landed.

## [1.0.1] - 2026-05-04

Patch release. No runtime behavior changes.

### Fixed

- **Container image publish**: the `release.yml` workflow's docker prepare step extracted the flat-layout tarballs into `/tmp/` directly, which tripped a sticky-bit `Cannot utime` error on the archive's `./` entry and caused `ghcr.io/soapbucket/sbproxy:1.0.0` to never publish. Each platform tarball now extracts to a per-arch staging dir before the binary moves into the docker context.

## [1.0.0] - 2026-05-03

First Rust release of SBproxy on this repository.

### What changed

- **Implementation**: SBproxy is now written in Rust on Cloudflare's Pingora. The Go implementation that previously occupied this repo (`v0.1.0` through `v0.1.2`) has moved to [`soapbucket/sbproxy-go`](https://github.com/soapbucket/sbproxy-go), preserved as the `v0.1.2-go-final` branch and tag, and is now in maintenance-only mode.
- **Data plane**: routing, AI gateway, MCP gateway, guardrails, security policies, and scripting (CEL, Lua, JavaScript, WebAssembly) all ship open source in this release. See [`docs/architecture.md`](docs/architecture.md) for the request pipeline shape.
- **Enterprise tier**: see [`docs/enterprise.md`](docs/enterprise.md) for what enterprise adds on top of the OSS data plane and how to request access.

### Upgrading from v0.1.x (Go)

The internal config schema (`schema-v1`) is supported by both the Go `v0.1.x` line and this Rust `v1.x` line, so existing `sb.yml` files should compile unchanged. See [`MIGRATION.md`](MIGRATION.md) for the full upgrade path.

================================================================
# docs/README.md
================================================================

## SBproxy documentation

*Last modified: 2026-06-08*

The AI gateway built like a real proxy. One binary, built on Pingora.

## Where to start

New here? Read [manual.md](manual.md) for install and CLI, then [configuration.md](configuration.md) for the schema.
The [examples](../examples/) folder has runnable configs you can point the binary at right away.

## Documentation index

### Getting started

- [manual.md](manual.md) - install, CLI, runtime, TLS, deployment patterns.
- [getting-started-api-estate.md](getting-started-api-estate.md) - put SBproxy in front of existing APIs with auth, rate limits, and header rewrites.
- [getting-started-content-estate.md](getting-started-content-estate.md) - HTML-to-markdown and content transformation for agents.
- [getting-started-ai-estate.md](getting-started-ai-estate.md) - run SBproxy as the LLM gateway in front of model providers.
- [getting-started-agent-identity.md](getting-started-agent-identity.md) - issue and enforce agent identity at the edge.
- [getting-started-sovereign-multicloud.md](getting-started-sovereign-multicloud.md) - Kubernetes, sidecar, and secret-backend deployment.
- [configuration.md](configuration.md) - every `sb.yml` field with examples.
- [json-schema.md](json-schema.md) - JSON Schema for editor autocomplete + validation of `sb.yml`.
- [mcp-schema-drift.md](mcp-schema-drift.md) - CI-friendly schema-drift detection for converted MCP servers (the `sbproxy-mcp-drift` CLI).
- [features.md](features.md) - tour of every feature with copy-paste configs.
- [troubleshooting.md](troubleshooting.md) - common failure modes and fixes.
- [faq.md](faq.md) - quick answers to the questions operators hit most often.

### AI gateway

- [ai-gateway.md](ai-gateway.md) - providers, routing strategies, guardrails, budgets, streaming.
- [ai-lb-benchmark.md](ai-lb-benchmark.md) - P50/P95/P99/P99.9 latency comparison across AI router strategies under skewed load.
- [providers.md](providers.md) - the catalog of supported LLM providers.
- [scripting.md](scripting.md) - CEL, Lua, JavaScript, and WASM scripting reference.
- [wasm-development.md](wasm-development.md) - writing WebAssembly modules for the `wasm` transform against the WASI preview-1 contract.
- [mcp.md](mcp.md) - the MCP gateway: wire shape, capabilities, and `experimental.agentSkillsUrl` advertising.
- [a2a-gateway.md](a2a-gateway.md) - the `a2a` action: typed AgentCard, capability discovery, and modality negotiation helpers.
- [agent-skills.md](agent-skills.md) - Agent Skills v0.2.0 well-known projection: schema, integrity, archive safety, no-script-execution contract.
- [cloudflare-code-mode.md](cloudflare-code-mode.md) - typed TypeScript module emission for Cloudflare Code Mode agents over the MCP federation registry.
- [ai-crawl-control.md](ai-crawl-control.md) - the `ai_crawl_control` policy: Pay Per Crawl token challenge, ledger trait, OSS-advertises / enterprise-settles split.
- [content-for-agents.md](content-for-agents.md) - operator guide to agent-aware content delivery: shape negotiation, body transforms, well-known license posture.
- [rsl.md](rsl.md) - RSL 1.0 licensing cookbook: expressing license stance via YAML and the resulting `/licenses.xml` projection.
- [web-bot-auth.md](web-bot-auth.md) - the `bot_auth` provider: verifying RFC 9421-signed AI crawlers against a published key directory.
- [auth-oidc.md](auth-oidc.md) - the `oidc` auth provider: OpenID Connect Relying-Party login flow (authorization-code + PKCE, sealed session cookie, optional userinfo trust-header projection, RP-initiated logout).
- [prompt-injection-v2.md](prompt-injection-v2.md) - the v2 guardrail: swappable detector returning score + label, with score-to-action mapping.

### Operations

- [access-log.md](access-log.md) - structured JSON access log: filters, sampling, header capture, redaction.
- [audit-log.md](audit-log.md) - tamper-evident audit log of admin actions.
- [observability.md](observability.md) - metrics, logs, traces, and the bundled dashboards.
- [clickhouse-attribution.md](clickhouse-attribution.md) - access-log schema, pre-aggregations, and sample attribution queries.
- [migration-credentials.md](migration-credentials.md) - migrating the legacy `virtual_keys:` shape to the unified `credentials:` block.
- [migration-mcp-rbac.md](migration-mcp-rbac.md) - upgrading MCP `ToolAccessPolicy` to the principal-aware ACL and the default-deny flip.
- [secrets.md](secrets.md) - vault backend setup for HashiCorp Vault, AWS Secrets Manager, and Kubernetes Secrets.
- [multi-tenant.md](multi-tenant.md) - when to use the multi-tenant shape, the three scopes, isolation guarantees, the synthetic `__default__` tenant.
- [operator-runbook.md](operator-runbook.md) - dashboard triage and rollback actions.
- [threat-model.md](threat-model.md) - OSS trust boundaries and per-wave review checklist.
- [events.md](events.md) - the event bus, callback hooks, and emitted event types.
- [openapi-emission.md](openapi-emission.md) - publishing an OpenAPI 3.0 document from the live config.
- [policy.md](policy.md) - the policy engine: `semantic_constraint`, the NL linter L001-L009, and the OSS / enterprise capability boundary.
- [object-authz.md](object-authz.md) - `object_authz` policy: BOLA + BFLA enforcement with tenant-isolation and enumeration detection.
- [headless-detection.md](headless-detection.md) - header-only headless / stealth-browser indicator heuristics surfaced under `request.agent.headless_*`.
- [content-digest.md](content-digest.md) - `content_digest` policy: RFC 9530 request-body verification for integrity-critical inboxes.
- [agent-budget.md](agent-budget.md) - `agent_budget` policy: semantic rate-limit primitive keyed on resolved agent identity.
- [performance.md](performance.md) - tuning guide, benchmark methodology, profiling.
- [degradation.md](degradation.md) - failure modes and graceful degradation behavior.
- [upgrade.md](upgrade.md) - migration notes between releases.
- [quickstart-operator.md](quickstart-operator.md) - first 24 hours running the Kubernetes operator.
- [kubernetes.md](kubernetes.md) - the OSS Kubernetes operator and its CRDs.
- [sidecar-deployment.md](sidecar-deployment.md) - running sbproxy as a per-pod sidecar: traffic capture (iptables / eBPF), service-mesh integration (Istio, Linkerd), and the kustomize overlay under `deploy/k8s/sidecar/`.

### Reference

- [402-challenge.md](402-challenge.md) - wire-format contract for the `402 Payment Required` body, including the OSS-advertises / enterprise-settles split.
- [l402.md](l402.md) - L402 (Lightning HTTP 402) macaroon bearer credential surface: issuer, verifier, attenuation, payment-hash binding.
- [outbound-peer-pricing.md](outbound-peer-pricing.md) - the `peer_pricing_preflight` policy: parse a peer's `llms.txt`, gate egress on budget, return a structured 402 to the agent on overflow.
- [admin-api-reference.md](admin-api-reference.md) - per-route schema for the embedded admin server (`/api/*`, `/admin/*`, and the unauthenticated probe routes).
- [config-stability.md](config-stability.md) - field stability guarantees and versioning.
- [listings.md](listings.md) - the repo-native `Listing` primitive: schema, loader, three pinning modes, plan-validation rules.
- [bulk-redirects.md](bulk-redirects.md) - the `redirect` action's source-to-destination row list, compiled at load time into an O(1) path lookup.
- [cache-reserve.md](cache-reserve.md) - long-tail cold tier under the response cache: backends (memory, filesystem, Redis) and admission sampling.
- [exposed-credentials.md](exposed-credentials.md) - the `exposed_credentials` policy: detect known-leaked basic-auth passwords and tag or block.
- [feature-flags.md](feature-flags.md) - the sticky-bucketing flag store plus the `flag_enabled(name, key)` CEL helper.
- [routing-strategies.md](routing-strategies.md) - the `RoutingStrategy` trait: opt-in extension point for custom upstream selection inside `load_balancer`.
- [openapi-validation.md](openapi-validation.md) - the `openapi_validation` policy: validating request bodies against an OpenAPI 3.0 document at startup.
- [enterprise.md](enterprise.md) - what the enterprise tier adds on top of the OSS data plane and how to request access.
- [glossary.md](glossary.md) - vocabulary used in this documentation set.
- [headers-reference.md](headers-reference.md) - every response header the proxy can emit, with the config that triggers it.
- [metrics-stability.md](metrics-stability.md) - Prometheus metric naming and stability.
- [model-pinning.md](model-pinning.md) - how SHA-256 hashes get computed and pinned for the classifier known-model registry.
- [adr-ai-hub-format.md](adr-ai-hub-format.md) - hub `ChatFormat` trait and the canonical `ChatRequest` / `ChatResponse` shape that backs `/v1/chat/completions`, `/v1/messages`, and `/v1/responses`.
- [adr-outbound-credential-resolver.md](adr-outbound-credential-resolver.md) - the OSS vs enterprise line for the outbound credential resolver (RFC 8693 exchange, client-credentials, and vault resolution in OSS).
- [comparison.md](comparison.md) - how SBproxy compares to other proxies and AI gateways.

### Contributing

- [architecture.md](architecture.md) - internals: pipeline, hot reload, plugin system.
- [build.md](build.md) - building from source, supported platforms, optional features.
- [CONTRIBUTING.md](../CONTRIBUTING.md) - how to set up a dev environment and submit changes.

### AI-discoverable corpora

- [llms.txt](llms.txt) - flat capability catalog (one line per shipped feature), per the [llmstxt.org](https://llmstxt.org/) convention. The small index AI tools fetch first.
- [llms-full.txt](llms-full.txt) - the entire docs corpus (this directory + the top-level `README.md`, `MIGRATION.md`, `CHANGELOG.md`) flattened into one file so AI tools that want the full set get it in one HTTP request. Generated; do not hand-edit. Regenerate with `scripts/regen-llms-full.sh` after any docs change.
  Mirrored live at .

## Quick start

```bash
## Build
make build-release

## Run with a config
make run CONFIG=examples/basic-proxy/sb.yml
```

Minimal `sb.yml`:

```yaml
proxy:
  http_bind_port: 8080

origins:
  "api.example.com":
    action:
      type: proxy
      url: http://backend:3000
```

## What's in the box

- Reverse proxy: HTTP/1.1, HTTP/2, WebSocket, gRPC, connection pooling, hot reload.
- AI gateway: 200+ LLM models, 15 routing strategies, OpenAI-compatible API, guardrails, budgets, virtual keys, MCP server.
- Authentication: API key, basic, bearer, JWT, digest, forward auth, noop.
- Policies: rate limiting, IP filter, CEL expressions, WAF, DDoS, CSRF, security headers.
- Transforms: 18 request and response transforms (JSON, HTML, Markdown, CSS, Lua, JavaScript, encoding, and more).
- Scripting: CEL via cel-rust, Lua via mlua/Luau, JavaScript via QuickJS, WebAssembly via wasmtime.
- Caching: response cache with pluggable backends (memory, file, Redis).
- Load balancing: 7 algorithms with sticky sessions and health checks.
- Observability: Prometheus metrics, structured logging, typed event bus, OpenTelemetry tracing.
- Hot reload: config changes apply with no dropped connections.

================================================================
# docs/402-challenge.md
================================================================

## 402 Challenge contract

*Last modified: 2026-05-25*

The wire format the proxy uses when it returns `402 Payment Required` to an AI crawler. This document is the canonical reference for the challenge body shape and for the line that splits OSS-advertises from enterprise-settles.

The behavioural policy that emits these bodies is `ai_crawl_control`; see [`ai-crawl-control.md`](ai-crawl-control.md) for configuration, agent classes, ledger, and tiered pricing.

## Two challenge shapes

The OSS proxy emits one of two 402 shapes, picked per request:

1. **Single-rail (default).** Returned to legacy crawlers and to any request that has not opted in to multi-rail negotiation. Carries the `Crawler-Payment` response header and a flat JSON body with the price and currency. This is the long-standing Pay Per Crawl shape.
2. **Multi-rail (opt-in).** Returned when the agent opts in via either the `Accept-Payment` request header (a q-value list of rail names) or one of the multi-rail `Accept` MIME types (`application/sbproxy-multi-rail+json`, `application/x402+json`, `application/mpp+json`). Carries `Content-Type: application/sbproxy-multi-rail+json` and a JSON body that lists one entry per advertised rail, each with its own per-rail quote-token JWS.

The multi-rail body is the negotiation contract. It is fully defined in OSS so the same proxy binary can advertise rails whether or not the operator is running an enterprise build that can settle them.

## OSS advertises, enterprise settles

The split between what OSS does and what the enterprise build does is deliberate, and matches the framing the rail-Lightning example PR uses (see `examples/rail-lightning/README.md`).

What the OSS proxy does today:

- Parses the `Accept-Payment` header (RFC-style q-values) and the multi-rail `Accept` MIME types.
- Filters the agent's preference set against the operator's per-tier `rails:` override and the top-level `rails:` block.
- Emits the multi-rail 402 body with one entry per surviving rail, each carrying its own quote-token JWS (separate nonce per rail).
- Responds 406 `no_acceptable_rail` when the preference set has no overlap with the offered rails, listing the operator's offered set on the response.
- Falls back to the single-rail format for legacy crawlers that did not opt in.
- Honours the in-memory ledger (`valid_tokens:`) and the HTTPS-only HTTP ledger client for accept-payment redemption.

What the OSS proxy cannot do today:

- Settle a real-money payment on a stablecoin or fiat rail.
- Verify an x402 redemption token against a facilitator.
- Capture a Stripe `payment_intent`.
- Open or close a Lightning invoice.

Settlement on those rails requires the enterprise build, gated behind cargo features:

| Feature              | Settles                                        |
|----------------------|------------------------------------------------|
| `stripe`             | Stripe fiat (cards, ACH).                      |
| `x402`               | x402 v2 stablecoin-on-chain via a facilitator. |
| `mpp`                | Stripe Multi-Party Payments.                   |
| `lightning-cln`      | Core Lightning node.                           |
| `lightning-lnd`      | LND node.                                      |
| `lightning-phoenixd` | Phoenix self-custodial daemon.                 |

Each enterprise feature registers a `BillingRail` impl into the OSS plugin trait registry under the canonical rail name the OSS schema already understands (`x402`, `mpp`, `lightning`). The OSS YAML schema in `sb.yml` does not change across enterprise backends; only the settlement code does. That is the property this contract pins: operators write the same `sb.yml` whether they run OSS or an enterprise build.

## Single-rail body

The default 402 body for legacy crawlers. Returned with the `Crawler-Payment` response header and `Content-Type: application/json`.

```json
{
  "error": "payment_required",
  "price": "0.001",
  "currency": "USD",
  "target": "blog.example.com/article",
  "header": "crawler-payment"
}
```

The `header` field tells the crawler which header name to set on its retry. The default is `crawler-payment`; operators override it via the policy's `header:` config field.

## Multi-rail body

Emitted when the agent opted in. `Content-Type: application/sbproxy-multi-rail+json`.

```json
{
  "rails": [
    {
      "kind": "x402",
      "version": "2",
      "chain": "base",
      "facilitator": "https://facilitator-base.x402.org",
      "asset": "USDC",
      "amount_micros": 1000,
      "currency": "USD",
      "pay_to": "0x0000000000000000000000000000000000000000",
      "expires_at": "2026-05-08T12:34:56Z",
      "quote_token": "eyJhbGc..."
}, { "kind": "mpp", "version": "1", "amount_micros": 1000, "currency": "USD", "expires_at": "2026-05-08T12:34:56Z", "quote_token": "eyJhbGc..." } ], "agent_choice_method": "header_negotiation", "policy": "first_match_wins" } ``` Notes: - `rails[].kind` is a closed enum: `x402`, `mpp`, `lightning`. Adding a rail follows the closed-enum amendment rule in [`adr-fast-track-amendment.md`](adr-fast-track-amendment.md). - `rails[].quote_token` is a JWS. One nonce per rail per response, so the agent cannot replay a quote across rails. JWKS publication and token replay are covered by the `examples/quote-token-replay-jwks/` example. - `rails[]` order is the operator's declared preference. Agents break ties on this order after q-value sorting their own preference set. - Lightning entries appear in the body only when an enterprise `lightning-*` feature has registered a `BillingRail` named `lightning` into the trait registry. With the OSS-default build, a per-tier `rails: [lightning, x402]` declaration parses cleanly (the `Rail::Lightning` enum variant ships in OSS) and the proxy still negotiates against the `lightning` token on the wire; the body just carries the next surviving rail (here `x402`). ## Cloudflare Pay Per Crawl interop Set `cloudflare_compat: true` on the `ai_crawl_control` policy to speak Cloudflare's exact Pay Per Crawl wire contract. A crawler that already transacts with a Cloudflare origin works against an SBproxy origin unchanged, and the differentiator is that SBproxy settles on the operator's own rails with no Merchant-of-Record cut. In this mode the negotiation uses Cloudflare's header set instead of the single-rail JSON body: - The 402 response carries `crawler-price: `, for example `crawler-price: USD 0.01`. A JSON body mirrors the price for clients that read the body instead of the header. 
- The crawler retries with `crawler-exact-price` (commit to a precise amount) or `crawler-max-price` (a cap), plus its payment token on the configured header (`crawler-payment` by default). The token settles through the same self-hosted ledger the single-rail path uses.
- A `crawler-max-price` below the quote, or a `crawler-exact-price` that does not equal the quote, re-quotes with a fresh 402 and does not spend the token.
- A settled request is served with `crawler-charged: ` so the crawler learns exactly what it paid.

```yaml
policies:
  - type: ai_crawl_control
    price: 0.01
    currency: USD
    cloudflare_compat: true
    free_paths:
      - "/feed/*"
    valid_tokens:
      - ppc-token-1
```

### Always-free paths

These well-known operational endpoints are never charged, so a crawler can always discover the site's policy without paying to read it:

- `/robots.txt`
- `/sitemap.xml`
- `/security.txt`
- `/.well-known/security.txt`
- `/crawlers.json`

The per-policy `free_paths:` list extends this built-in allowlist (Cloudflare's Configuration-Rules equivalent). A trailing `*` is a prefix match (`/feed/*`); otherwise the entry matches exactly. The built-in allowlist always applies, so an operator cannot accidentally start charging for `robots.txt`.

### Binding the price headers to a Web Bot Auth signature

The crawler's pre-authorization headers (`crawler-max-price` and `crawler-exact-price`) are inbound request headers, so an operator who also runs the `bot_auth` verifier can require them to be signed components by listing the header name in that agent's `required_components`. A retry whose Web Bot Auth signature does not cover the listed price header is then rejected before the ledger is consulted.

Binding the proxy's outbound price headers (`crawler-price`, `crawler-charged`) into a signature the crawler can verify is a separate piece of work: it needs the outbound response-signing path, which is not part of this contract yet.
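The free-path matching and re-quote rules from the Cloudflare interop section above can be modelled in a few lines. This is an illustrative Python sketch, not the proxy's Rust implementation: the function names `is_free_path` and `accepts_quote` are hypothetical, prices are assumed to be parsed into `Decimal` already, and accepting a retry that carries neither price header is an assumption the text does not state.

```python
from decimal import Decimal
from typing import Optional

# Built-in allowlist, copied from the "Always-free paths" list above.
BUILTIN_FREE_PATHS = (
    "/robots.txt",
    "/sitemap.xml",
    "/security.txt",
    "/.well-known/security.txt",
    "/crawlers.json",
)


def is_free_path(path: str, free_paths: list[str]) -> bool:
    """Return True when `path` is never charged (hypothetical helper)."""
    for entry in (*BUILTIN_FREE_PATHS, *free_paths):
        if entry.endswith("*"):
            # A trailing `*` is a prefix match on everything before it.
            if path.startswith(entry[:-1]):
                return True
        elif path == entry:
            # Any other entry matches exactly.
            return True
    return False


def accepts_quote(
    quote: Decimal,
    exact_price: Optional[Decimal] = None,
    max_price: Optional[Decimal] = None,
) -> bool:
    """Return False when the retry should be re-quoted with a fresh 402."""
    if exact_price is not None and exact_price != quote:
        return False
    if max_price is not None and max_price < quote:
        return False
    # Assumption: a retry with neither header is not rejected on price.
    return True
```

Keeping the built-in allowlist in the same loop as the per-policy entries mirrors the rule that `free_paths:` extends the allowlist and can never shrink it.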
### Pluggable pricing model Pricing can be flat (`price:`) or per-path (`tiers:`). For a learned model (an LM-Tree-style pricing model is the motivating example), an embedder injects a `PricingModel` implementation through `AiCrawlControlPolicy::with_pricing_model`. The model is consulted before the static tier table; returning a price overrides the static resolution for that request, and returning nothing defers to the tier table and the flat-price fallback. The OSS build ships only the seam, not a model. ## 406 fallback When the agent's `Accept-Payment` preference set has no overlap with the operator's offered rails, the proxy returns `406 Not Acceptable` with `Content-Type: application/json`: ```json { "error": "no_acceptable_rail", "supported_rails": ["x402", "mpp"], "target": "blog.example.com/article" } ``` `supported_rails` reflects the operator's declared offered set on the matched tier (the per-tier `rails:` override, or the route default if no override is set), not the runtime-emittable subset. The agent retries with one of the listed rails on its `Accept-Payment` header. ## Opt-in signals Per A3.1, any of the following signals on the request opts the agent in to the multi-rail body: - `Accept-Payment` request header carries a q-value list of rail names. Example: `Accept-Payment: lightning;q=1.0, x402;q=0.5`. - `Accept` request header includes `application/sbproxy-multi-rail+json`, `application/x402+json`, or `application/mpp+json`. The latter two are narrowly opt-in: an agent that sends `Accept: application/x402+json` is asking specifically for the x402 entry, not for the full multi-rail body. Without any opt-in signal, the proxy emits the single-rail body so legacy crawlers keep working unchanged. ## Quote-token JWS Each rail entry in the multi-rail body carries its own `quote_token`, signed by the proxy under a key whose JWKS the operator publishes at `/.well-known/sbproxy-quote-jwks`. 
The token binds the rail kind, the amount, the route, and a per-rail nonce so the agent cannot replay a quote across rails or reuse it after expiry. The `accept_payment` policy verifies the JWS on the agent's retry before consulting the ledger. A token whose claims do not match the retry context (different rail, different route, expired) is rejected without a ledger round-trip. The token schema is OSS. The settlement that the token underwrites is enterprise. ## Related - [`ai-crawl-control.md`](ai-crawl-control.md) - policy configuration, agent classes, ledger, tiered pricing. - [`enterprise.md`](enterprise.md) - the OSS / enterprise split, including the rail settlement features. - `examples/rail-x402-base-sepolia/` - x402 rail with a hermetic mock facilitator. - `examples/rail-mpp-stripe-test/` - MPP rail with Stripe test mode and a wiremock fallback. - `examples/multi-rail-accept-payment/` - x402 + MPP wired together with q-value negotiation. - `examples/rail-lightning/` - Lightning rail negotiation contract (settlement is enterprise-only). - `examples/quote-token-replay-jwks/` - JWKS endpoint and single-use quote-token enforcement. ================================================================ # docs/a2a-gateway.md ================================================================ ## A2A gateway *Last modified: 2026-05-31* The `a2a` action proxies agent-to-agent requests to an upstream A2A endpoint and surfaces the agent's typed AgentCard for capability discovery and modality negotiation. Pairs with MCP federation (one gateway, two protocols) and the AP2 / ACP / RAR payment surfaces. ## Wire shape The A2A protocol is JSON-RPC over HTTP. Clients call `POST //tasks/sendSubscribe` (or the streaming variant) with a JSON-RPC envelope; the agent responds with a `Task` document. 
The gateway sits in front of one or more agent endpoints and is responsible for two things the bare proxy cannot do on its own: telling a calling agent what each upstream advertises, and gating the call when the caller and the agent disagree on modality. ## AgentCard ```yaml origins: "agent.example.com": action: type: a2a url: http://backend:9000/a2a agent_card: name: "Reservation assistant" description: "Books and modifies restaurant reservations." version: "0.3.0" url: "https://agent.example.com/" capabilities: streaming: true pushNotifications: false stateTransitionHistory: false defaultInputModes: - "application/json" - "text/plain" defaultOutputModes: - "application/json" skills: - id: "find_table" description: "Find a free table by time + party size" ``` The whole card round-trips through the gateway: SBproxy types only the fields it consumes (`capabilities`, `defaultInputModes`, `defaultOutputModes`, `name`, `description`, `version`, `url`, `skills`). Anything else the operator pastes (the A2A spec's optional `provider`, `authentication`, `supportsAuthenticatedExtendedCard`, etc.) lives on `extensions` and serialises back verbatim. ## Capability discovery The gateway can serve the card itself at `/.well-known/agent.json` so an A2A client can probe SBproxy and get back the agent it would route to. The handler emission is configured by the operator on the action; absent it, the well-known path falls through to the upstream so a real agent that already serves its own card keeps doing so. `capabilities.streaming` and `capabilities.pushNotifications` are surfaced under CEL so policies can branch on what the agent advertises before forwarding. A typical use is gating an A2A request that requests streaming when the agent does not advertise it; the policy rejects with a 400 before the upstream is contacted. 
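The card round-trip and the streaming gate described above can be sketched as follows. This is an illustrative Python model under stated assumptions, not the gateway's Rust code: `split_card`, `join_card`, and `gate_streaming` are hypothetical names, `wants_streaming` stands in for however a policy decides the request asks for streaming, and a missing `capabilities.streaming` is assumed to mean `false`.

```python
from typing import Optional

# Fields the gateway types, as listed in the AgentCard section above.
TYPED_FIELDS = {
    "capabilities", "defaultInputModes", "defaultOutputModes",
    "name", "description", "version", "url", "skills",
}


def split_card(raw: dict) -> tuple[dict, dict]:
    """Separate typed fields from everything else the operator pasted."""
    typed = {k: v for k, v in raw.items() if k in TYPED_FIELDS}
    extensions = {k: v for k, v in raw.items() if k not in TYPED_FIELDS}
    return typed, extensions


def join_card(typed: dict, extensions: dict) -> dict:
    """Serialise back: extension keys are returned verbatim."""
    return {**typed, **extensions}


def gate_streaming(typed: dict, wants_streaming: bool) -> Optional[int]:
    """Return 400 when streaming is requested but not advertised."""
    advertised = typed.get("capabilities", {}).get("streaming", False)
    if wants_streaming and not advertised:
        return 400
    return None  # forward to the upstream
```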
## Modality negotiation

SBproxy ships pure-function helpers `AgentCard::negotiate_input` and `AgentCard::negotiate_output` that pair the caller's `Content-Type` and `Accept` against the agent's advertised `defaultInputModes` and `defaultOutputModes`. Each call returns one of four typed outcomes:

| Outcome | When | Effect on the upstream call |
|---|---|---|
| `Matched(mode)` | the caller's preference overlaps with the agent's advertised modes | proceed with `mode` |
| `NoCallerPreference(mode)` | the caller omitted `Content-Type` / `Accept` | proceed; gateway echoes `mode` |
| `AgentUndeclared(mode)` | the agent's mode list is empty (no restriction) | proceed with the caller's preference |
| `Mismatch { requested, advertised }` | no overlap | gateway returns 406 with both lists in the error body |

The negotiator is case-insensitive on the MIME `type/subtype` head and strips `;`-parameters before comparing, so `application/json; charset=utf-8` matches `application/json`. The output side honours `*/*` by collapsing to the agent's first declared output mode.

## See also

- The A2A x402 payment bridge.
- The agentgateway / Bifrost / SBproxy capability benchmark.
- `crates/sbproxy-modules/src/action/a2a.rs` - the proxy action itself.
- `crates/sbproxy-modules/src/action/a2a_card.rs` - typed AgentCard + negotiator.

================================================================
# docs/access-log.md
================================================================

## Access log

*Last modified: 2026-05-04*

Structured-JSON access logs give every completed request a single line on stdout, ready to ship to ELK, Loki, Datadog, or any pipeline that already speaks JSON. The proxy emits the line via the `access_log` tracing target so log routers can split access logs from application logs without additional plumbing.

## Default behaviour

Off. SBproxy emits no access-log lines unless the top-level `access_log` block is present and `enabled: true`.
Metrics, traces, and the audit log are unaffected by this knob. ## Enabling Add the block to `sb.yml`: ```yaml access_log: enabled: true origins: api.example.com: action: type: proxy url: http://localhost:3000 ``` A request to `api.example.com` now produces a line such as: ```json {"timestamp":"2026-04-27T12:00:03.521Z","request_id":"7f7c","origin":"api.example.com","method":"GET","path":"/health","status":200,"latency_ms":24.7,"auth_ms":1.2,"upstream_ttfb_ms":18.9,"response_filter_ms":4.1,"bytes_in":0,"bytes_out":1024,"client_ip":"203.0.113.10"} ``` The three `*_ms` phase fields (`auth_ms`, `upstream_ttfb_ms`, `response_filter_ms`) split `latency_ms` into the parts of the pipeline that contributed to it. They are emitted whenever the matching phase ran on the request; an origin with no auth provider omits `auth_ms`, an early WAF block omits `upstream_ttfb_ms` and `response_filter_ms`, a cache hit served from the proxy omits both upstream fields. The same observations also feed the `sbproxy_phase_duration_seconds` Prometheus histogram (see [metrics-stability.md](./metrics-stability.md)) so the aggregate view does not require log scraping. Optional fields (`provider`, `model`, `tokens_in`, `tokens_out`, `cache_result`, `trace_id`, `request_headers`, `response_headers`, `upstream_host`) are omitted when not applicable, keeping non-AI lines compact. ## Filters `status_codes` and `methods` narrow the set of requests that get logged: ```yaml access_log: enabled: true status_codes: [500, 502, 503, 504] methods: ["POST", "PUT", "PATCH", "DELETE"] ``` Empty or omitted lists match every value. Method comparison is case-insensitive. ## Sampling `sample_rate` is a probability in `[0.0, 1.0]` applied after the status/method filters: ```yaml access_log: enabled: true sample_rate: 0.05 # log 5% of matching requests ``` `1.0` (the default) logs every match. `0.0` is equivalent to disabling emission entirely. 
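The order of operations above (status and method filters first, then the sampler) can be sketched like this. It is an illustrative Python model, not the proxy's implementation: `should_log` is a hypothetical name, and the injectable `rng` parameter exists only so the sampler can be exercised deterministically.

```python
import random
from typing import Callable, Sequence


def should_log(
    status: int,
    method: str,
    *,
    status_codes: Sequence[int] = (),
    methods: Sequence[str] = (),
    sample_rate: float = 1.0,
    rng: Callable[[], float] = random.random,
) -> bool:
    """Decide whether one completed request produces an access-log line."""
    # Empty or omitted lists match every value.
    if status_codes and status not in status_codes:
        return False
    # Method comparison is case-insensitive.
    if methods and method.upper() not in {m.upper() for m in methods}:
        return False
    # rng() is in [0.0, 1.0): a rate of 1.0 always logs, 0.0 never does.
    return rng() < sample_rate
```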
### Forced emission Two knobs bypass `sample_rate` after the status/method filters match: ```yaml access_log: enabled: true sample_rate: 0.05 slow_request_threshold_ms: 1000 always_log_errors: true ``` `slow_request_threshold_ms` logs every matching request whose end-to-end latency is at or above the threshold. `always_log_errors: true` logs every matching `5xx` response. Both knobs are off by default, preserving the sampler-only behavior for existing configs. ## Header capture Opt in by listing header names in `access_log.capture_headers.request` and / or `access_log.capture_headers.response`. Captured values land in the `request_headers` and `response_headers` fields of the emitted entry. ```yaml access_log: enabled: true capture_headers: request: ["user-agent", "x-request-id", "x-ratelimit-*"] response: ["x-sbproxy-cache", "content-length"] max_value_bytes: 1024 redact_pii: false ``` Three pattern shapes are accepted: * Exact name: `"user-agent"`, `"x-cache"`. * `"*"`: capture every header (subject to the sensitive-header denylist below). * Trailing glob: `"x-ratelimit-*"` captures every header whose name starts with the prefix before the `*`. Only one trailing `*` is supported; embedded wildcards are treated as literal. Header names are matched case-insensitively. Captured values are truncated to `max_value_bytes` (default 1024) with a trailing `"..."` that counts toward the cap. A hardcoded denylist of sensitive headers (`authorization`, `cookie`, `set-cookie`, `proxy-authorization`, `x-api-key`) is excluded from `*` and glob matches. To capture one of these, list it by exact name; the proxy logs a `WARN` at config load so the choice is visible. When `redact_pii: true`, the `sbproxy-security` PII redactor runs over captured header values. `redact_pii_rules` (empty by default) optionally restricts the rule set; accepted names are `email`, `us_ssn`, `credit_card`, `phone_us`, `ipv4`, `openai_key`, `anthropic_key`, `aws_access`, `github_token`. 
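The three pattern shapes, the sensitive-header denylist, and the truncation rule above can be modelled as below. This is an illustrative Python sketch rather than the proxy's code: `is_captured` and `truncate` are hypothetical names, and the sketch assumes `max_value_bytes` is at least 3 so the `"..."` marker fits inside the cap.

```python
# Denylist quoted from the paragraph above.
SENSITIVE = {
    "authorization", "cookie", "set-cookie",
    "proxy-authorization", "x-api-key",
}


def is_captured(name: str, patterns: list[str]) -> bool:
    """Return True when header `name` matches the capture allowlist."""
    name = name.lower()  # header names match case-insensitively
    for pattern in patterns:
        pattern = pattern.lower()
        if pattern == name:
            # Exact names always match, including sensitive headers.
            return True
        if pattern.endswith("*") and name.startswith(pattern[:-1]):
            # `*` alone has an empty prefix, so it matches every name.
            # Glob and `*` matches skip the sensitive denylist.
            if name not in SENSITIVE:
                return True
    return False


def truncate(value: str, max_value_bytes: int = 1024) -> str:
    """Cap a value at `max_value_bytes`, counting the trailing marker."""
    raw = value.encode("utf-8")
    if len(raw) <= max_value_bytes:
        return value
    kept = raw[: max_value_bytes - 3]
    # Drop a multi-byte character that the cut split in half.
    return kept.decode("utf-8", errors="ignore") + "..."
```

Because only a trailing `*` is treated as a wildcard, a pattern such as `x-*-id` falls through to the exact comparison and matches nothing but a header literally named `x-*-id`.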
## Record shape | Field | Type | Notes | |-------|------|-------| | `timestamp` | string | RFC 3339 (UTC) of when the response was sent. | | `request_id` | string | Unique per request. Reuses the propagated `X-Request-Id` when set; otherwise a fresh UUIDv4. | | `origin` | string | Hostname routing matched. | | `method` | string | HTTP method. | | `path` | string | Request path, no query string. | | `status` | int | HTTP response status code. | | `latency_ms` | float | Wall-clock end-to-end latency in milliseconds. | | `auth_ms` | float? | Time spent in the auth check (provider dispatch, JWT verify, forward-auth subrequest, OIDC cookie open). Absent when the origin has no auth provider. | | `upstream_ttfb_ms` | float? | Time from request start to the first byte of the upstream response header. Absent when the request never reached an upstream (early auth/policy short-circuit, cache hit). | | `response_filter_ms` | float? | Time spent running response transforms between first upstream byte and end of `response_filter`. Absent when no response_filter ran. | | `query` | string? | Request query string without the leading `?`. Captured separately from `path` so per-route aggregations on `path` are not split by every distinct query. Absent when no query was supplied. | | `protocol` | string? | HTTP version on the wire (`HTTP/1.1`, `HTTP/2.0`, `HTTP/3.0`). | | `scheme` | string? | Scheme the client used to reach the proxy (`http` or `https`). Distinct from `upstream_host`'s scheme. | | `host` | string? | Client-supplied `Host` header. May differ from `origin` (the matched virtual-host pattern, which can be a wildcard) and from `upstream_host` (where the proxy forwarded to). | | `user_agent` | string? | Client `User-Agent` header. Pulled out as a primary field because nearly every analytics consumer wants it; the header allowlist still works as a redundant capture path. | | `referer` | string? | Client `Referer` header (the canonical RFC 7231 misspelling). 
| | `upstream_status` | int? | Upstream's response status code, when it differs from `status`. Populated when a retry chain, fallback, or `response_modifier` rewrote the status the client sees; absent when the proxy passed the upstream status through unchanged. | | `response_content_type` | string? | Response `Content-Type` as sent to the client. | | `response_content_encoding` | string? | Response `Content-Encoding` (`gzip`, `br`, `zstd`, ...) when the body was compressed; absent when uncompressed. | | `bytes_in` | int | Inbound request body bytes (post header-decode). | | `bytes_out` | int | Bytes written to the client. | | `client_ip` | string | Post-trust-boundary client IP. | | `provider` | string? | AI provider when an AI gateway route handled the request. | | `model` | string? | Selected AI model identifier. | | `tokens_in` | int? | Prompt tokens, when known. | | `tokens_out` | int? | Completion tokens, when known. | | `trace_id` | string? | W3C trace id when distributed tracing is active, for span correlation. | | `cache_result` | string? | One of `hit`, `miss`, `stale`, `bypass` for cached responses. | | `upstream_host` | string? | Upstream host the proxy contacted; absent on short-circuited requests (auth deny, WAF block, cache hit). | | `request_headers` | object? | Captured request headers, lowercased keys. Absent when no allowlist or no matches. | | `response_headers` | object? | Captured response headers, same shape as `request_headers`. | | `attribution` | object? | Resolved business attribution tags (project, feature, okr, team, customer, environment, agent_type, risk_tier, trace_id) merged from the credential `attrs:` and `SB-Attr-*` headers. Same tag set the per-attribution spend metric is labeled by. Absent when none resolved. | | `custom` | object? | Operator-defined custom fields from `observability.log.custom_fields:`. See below. Absent when none configured or none resolved. 
| Optional fields are omitted from the JSON object when their value is `None`. ## Custom fields `observability.log.custom_fields:` adds operator-defined keys to each line's `custom` object, so you can pivot logs on dimensions the built-in schema does not carry (region, deployment, a derived tier, a routing decision) without forking the binary. Each field's value is computed per request from either a static string with `${...}` variable interpolation or a script. ```yaml proxy: observability: log: custom_fields: - name: region # static value + interpolation value: "${env.REGION}" - name: caller_tier # CEL expression engine: cel source: 'has(request.headers["x-tier"]) ? request.headers["x-tier"] : "standard"' - name: route_class # Lua script (returns the value) engine: lua source: 'return string.find(ctx.request.method, "GET") and "read" or "write"' - name: upper_method # JS script engine: js source: "ctx.request.method.toUpperCase()" ``` Rules: - Each field sets exactly one of `value` or (`source` + `engine`). Both, or neither, is a config error. - `engine` is one of `cel`, `lua`, `js`. WASM is not supported for log fields because it is a compiled module, not inline source. - Static `value` interpolation variables: `${env.NAME}`, `${tenant_id}`, `${method}`, `${path}`, `${host}`, `${status}`, `${provider}`, `${model}`, `${request.header.NAME}`, `${attribution.KEY}`. An unresolved variable becomes the empty string. - CEL expressions see the context keys as top-level variables (`request`, `response`, `tenant_id`, `provider`, `model`, `attribution`). Lua and JS scripts see the whole context as a `ctx` global and `return` (Lua) / evaluate to (JS) the value to log. - A field whose script errors, or that resolves to the empty string, is omitted from the line rather than failing the request. - Custom values pass through the same redaction as every other field. 
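The static-value rules above (an unresolved variable becomes the empty string, and a field that resolves to the empty string is omitted) can be sketched as follows. This is an illustrative Python model only: `interpolate` and `resolve_static_fields` are hypothetical names, and passing the variables as one flat dict keyed by the text inside `${...}` is an assumption made for brevity.

```python
import re

# Matches `${name}` and captures the text between the braces.
_VAR = re.compile(r"\$\{([^}]+)\}")


def interpolate(template: str, variables: dict[str, str]) -> str:
    """Replace each `${...}`; unresolved variables become ''."""
    return _VAR.sub(lambda m: variables.get(m.group(1), ""), template)


def resolve_static_fields(
    fields: list[dict], variables: dict[str, str]
) -> dict[str, str]:
    """Build the `custom` object from static `value` fields."""
    custom = {}
    for field in fields:
        value = interpolate(field["value"], variables)
        if value:  # a field that resolves to '' is left off the line
            custom[field["name"]] = value
    return custom
```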
### Scopes `custom_fields:` can be declared at three scopes: `proxy.observability.log`, `tenants[].observability.log`, and `origins..observability.log`. They compose per request as **proxy then tenant then origin**: the tenant set is resolved from the request's `tenant_id`, the origin set from the matched origin, and a more-specific scope's field overrides a less-specific field of the same `name` (the broader definition is not evaluated at all for that name). Fields with distinct names from every scope are unioned. This is the same composition order redaction uses (see the sink-scope and tenant/origin redaction sections in the observability guide). A worked example covering all three scopes is in `examples/custom-log-fields/`. ## Redaction Every line is passed through the same secret redactor that protects metric labels and audit events. Bearer tokens, API keys with recognisable prefixes (`sk-`, `pk-`, `ghp_`, ...), and JWT-shaped strings are replaced with `[REDACTED]` before the line reaches stdout. Apply additional masking at your log shipper if your origin embeds custom secrets in URLs or other places the line carries verbatim. The PII redactor described under [Header capture](#header-capture) runs before secret redaction, but only over captured header values. Other fields (`path`, `request_id`, `client_ip`) are not PII-redacted. ## Routing the lines Every line carries `target = "access_log"` in tracing metadata. Common patterns: * Filter via `RUST_LOG=info,access_log=info,sbproxy=warn` to keep operator logs quiet while keeping access logs. * Use the JSON log subscriber (default in `sbproxy-observe`) and let your collector tag by `target`. * Pipe stdout through `vector` or `fluent-bit` to split on `target`. 
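The composition order described under Scopes above (proxy, then tenant, then origin, with the more specific scope winning on a name clash) can be sketched as a merge over field definitions. This is an illustrative Python model, not the proxy's code; `compose_fields` is a hypothetical name. Merging the definitions before anything is evaluated is what keeps an overridden broader definition from running at all.

```python
def compose_fields(
    proxy: list[dict], tenant: list[dict], origin: list[dict]
) -> list[dict]:
    """Merge custom-field definitions from broadest to most specific."""
    merged: dict[str, dict] = {}
    for scope in (proxy, tenant, origin):
        for field in scope:
            # A later (more specific) scope replaces a field of the
            # same name; distinct names are unioned.
            merged[field["name"]] = field
    return list(merged.values())
```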
### File output

To write access logs directly to disk instead of the tracing target:

```yaml
access_log:
  enabled: true
  output:
    type: file
    path: /var/log/sbproxy/access.log
    max_size_mb: 100
    max_backups: 7
    compress: true
```

When the active file reaches `max_size_mb`, SBproxy rotates it before writing the next line. Rotated files use suffixes like `access.log.1` or `access.log.1.gz`; `max_backups` caps how many rotated files are retained. `compress: true` gzips rotated files.

Omitting `output` keeps the default behavior: emit JSON through the `access_log` tracing target.

================================================================
# docs/admin-api-reference.md
================================================================

## Admin API reference

*Last modified: 2026-06-06*

The embedded admin server publishes a small set of HTTP routes for operator tooling: liveness probes, request log, per-target health, hot reload, drift detection, and the emitted OpenAPI document. This page is the per-route reference. For the operator workflow (enabling the server, picking a port, IP allowlisting), see [manual.md section 9 - Hot reload](manual.md#9-hot-reload) and [manual.md section 5 - Metrics and observability](manual.md#5-metrics-and-observability).

## Enabling the admin server

```yaml
proxy:
  admin:
    enabled: true
    port: 9090
    username: admin
    password: !env ADMIN_PASSWORD
    max_log_entries: 1000
```

When `enabled: false` (the default) the admin listener does not bind and every route below is unreachable. The server binds on `127.0.0.1:` so the admin surface is loopback-only by default; expose it via a reverse proxy or sidecar with an IP allowlist when an operator console needs remote access.
## Authentication Routes split into two tiers: - **Unauthenticated probe routes** are reachable without credentials so load balancers and orchestrators can probe liveness without configuring secrets: `/healthz`, `/health`, `/readyz`, `/ready`, `/livez`, `/live`, `/.well-known/sbproxy/quote-keys.json`. - **Authenticated routes** require HTTP Basic auth using the `username` and `password` from the config block. Every route under `/api/*` and `/admin/*` is in this tier. Send credentials with `curl -u admin:secret ` or an `Authorization: Basic ` header. ## Rate limiting The admin server enforces an in-process rate limit with both per-IP and global caps. The per-IP cap is 60 requests / minute by default; the global cap is 10x that (600 / minute). A request that exceeds either cap returns `429` and is not counted against future windows. The per-IP tracking map is capped at 10000 entries to prevent unique-IP floods from growing memory. ## Error envelope All authenticated routes return JSON errors as: ```json {"error":""} ``` Status codes follow conventional HTTP: `401` for missing or invalid credentials, `405` for wrong method on a method-gated route, `409` when a hot reload is already in flight, `429` when rate-limited, `5xx` for server-side failures. --- ## Probe routes (unauthenticated) ### `GET /healthz` Kubernetes-style liveness probe. Returns `200` with body `{"status":"ok"}` whenever the process is up. Does **not** consult the live config or any dependency; treat it as "the process is running and the listener accepted my connection". ### `GET /health` Component-aware liveness with version and git SHA. 
Returns `200` with a JSON document that includes the proxy version, build commit, and a per-component status table: ```json { "status": "ok", "version": "1.1.0", "commit": "abc1234", "components": [ {"name": "config", "status": "ok"}, {"name": "cache_store", "status": "ok"} ] } ``` A component reporting `"status": "degraded"` returns the same `200` because the proxy still serves traffic on degraded components. Components in `"status": "failed"` flip the top-level status. ### `GET /readyz`, `GET /ready` Kubernetes-style readiness probe. Returns `200` once all required components are ready to serve traffic, `503` while any required component is still initialising or has failed. K8s polls this to gate traffic shifting during rolling restarts. ### `GET /livez`, `GET /live` Bare liveness probe. Like `/healthz` but with a different name for load balancers that hardcode this path. ### `GET /.well-known/sbproxy/quote-keys.json` JWKS document publishing every Ed25519 public key the live config uses to sign Wave 3 quote tokens (the `402 Payment Required` flow's agent-verifiable payment quotes). External verifiers (ledger clients, agent SDKs) fetch this to verify a quote without contacting the issuer. Response: ```json { "keys": [ { "kty": "OKP", "crv": "Ed25519", "kid": "", "x": "" } ] } ``` Served unauthenticated because the keys themselves are public. The document aggregates keys across every `ai_crawl_control` policy so a multi-tenant deployment publishes one document for all of its issuers. --- ## Read routes (authenticated) ### `GET /api/requests` Returns the most recent request log entries, newest first. The ring buffer size is `proxy.admin.max_log_entries` (default `1000`). 
Response body: an array of `RequestLogEntry`:

```json
[
  {
    "timestamp": "2026-05-12T10:15:32.456Z",
    "origin": "api.example.com",
    "method": "GET",
    "path": "/v1/orders?limit=10",
    "status": 200,
    "latency_ms": 42.7,
    "client_ip": "10.0.0.5"
  }
]
```

| Field | Type | Description |
|---|---|---|
| `timestamp` | string | RFC 3339 timestamp when the request finished. |
| `origin` | string | Configured origin hostname that handled the request. |
| `method` | string | HTTP method. |
| `path` | string | Request path including query string. |
| `status` | int | Response status code. |
| `latency_ms` | float | End-to-end latency in milliseconds. |
| `client_ip` | string | Client IP as observed by the proxy. |

This is an in-memory ring buffer; entries are lost when the process exits. For durable request logs, enable the structured access log (see [access-log.md](access-log.md)).

### `GET /api/health`

Aggregate liveness summary. Returns `200` with:

```json
{"status":"ok","origins":[]}
```

The `origins` array is currently a placeholder; per-origin health detail lives at `/api/health/targets` below.

### `GET /api/health/targets`

Per-target health for every origin whose action is a `load_balancer`. Walks the live pipeline and reports the exact state that `select_target` consults: active health probe result, outlier detector eject state, and circuit breaker state. Use this to confirm that an upstream that operators believe is healthy really is, or to diagnose why a load balancer is short on candidates.
```json
{
  "config_revision": "abc123...",
  "origins": [
    {
      "hostname": "api.example.com",
      "origin_id": "api",
      "targets": [
        {
          "index": 0,
          "url": "https://upstream-1.internal:8443",
          "eligible": true,
          "healthy": true,
          "outlier_ejected": false,
          "circuit_breaker_state": "closed",
          "weight": 10,
          "backup": false,
          "group": null,
          "zone": "us-west-1a"
        }
      ]
    }
  ]
}
```

| Field | Type | Description |
|---|---|---|
| `config_revision` | string | Current pipeline revision; matches the `x-sbproxy-debug-config-rev` header when debug mode is on. |
| `origins[].hostname` | string | Origin hostname. |
| `origins[].origin_id` | string | Stable identifier for this origin within its workspace. |
| `origins[].targets[].index` | int | Position in the configured target list. |
| `origins[].targets[].url` | string | Upstream URL. |
| `origins[].targets[].eligible` | bool | True when `healthy && !outlier_ejected && circuit_breaker_state != "open"`; matches what `select_target` honours. |
| `origins[].targets[].healthy` | bool | Latest active-health-check verdict. |
| `origins[].targets[].outlier_ejected` | bool | True when the outlier detector has temporarily ejected this target. |
| `origins[].targets[].circuit_breaker_state` | string \| null | `"closed"`, `"open"`, `"half_open"`, or null when the breaker is unconfigured. |
| `origins[].targets[].weight` | int | Authored weight. |
| `origins[].targets[].backup` | bool | True when this is a backup target. |
| `origins[].targets[].group` | string \| null | Authored group tag, if any. |
| `origins[].targets[].zone` | string \| null | Authored zone tag, if any. |

Origins whose action is not `load_balancer` (e.g. `proxy`, `ai_proxy`, `static`, `redirect`) are omitted from `origins`.

### `GET /api/stats`

Basic counters summary.

```json
{"request_log_entries": 42}
```

This is a placeholder; the authoritative metrics surface is the Prometheus `/metrics` endpoint exposed on the health port (see [metrics-stability.md](metrics-stability.md)).
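A dashboard that consumes `/api/health/targets` may want to cross-check the `eligible` flag against the three fields it is derived from. The sketch below restates the documented rule in Rust. `TargetHealth`, `is_eligible`, and `eligible_count` are hypothetical names invented for this example, not SBproxy types, and treating a `null` breaker state as "not open" is an assumption read off the rule as written.

```rust
/// Hypothetical mirror of the health fields in one `origins[].targets[]` entry.
#[derive(Debug, Clone)]
struct TargetHealth {
    healthy: bool,
    outlier_ejected: bool,
    /// `None` models a JSON `null` (assumed to mean the breaker is unconfigured).
    circuit_breaker_state: Option<String>,
}

/// Recompute `eligible` from the other three fields:
/// `healthy && !outlier_ejected && circuit_breaker_state != "open"`.
fn is_eligible(t: &TargetHealth) -> bool {
    t.healthy
        && !t.outlier_ejected
        && t.circuit_breaker_state.as_deref() != Some("open")
}

/// Count the targets a load balancer could currently pick from.
fn eligible_count(targets: &[TargetHealth]) -> usize {
    targets.iter().filter(|t| is_eligible(t)).count()
}
```

Under this reading a `half_open` breaker still counts as eligible, because the rule only excludes `"open"`. A client that finds `eligible_count` at zero for an origin has the "short on candidates" condition the endpoint exists to diagnose.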
### `GET /api/openapi.json`, `GET /api/openapi.yaml` The live pipeline's emitted OpenAPI 3.0 document. The proxy renders the document once per pipeline revision and caches both JSON and YAML renderings; the cache invalidates on hot reload. The shape and the per-origin mapping are documented in [openapi-emission.md](openapi-emission.md). The `.json` route returns `Content-Type: application/json`; the `.yaml` route returns `Content-Type: application/yaml`. --- ## Control routes (authenticated) ### `POST /admin/reload` Re-reads `proxy.admin.config_path` from disk, recompiles the pipeline, and hot-swaps the in-memory pipeline. The route uses the same single-flight guard as the file watcher, so a manual reload during a file-watcher reload returns `409`. `GET /admin/reload` returns `405`; the route is gated on POST. Success response (`200`): ```json { "config_revision": "abc123...", "loaded_at": "2026-05-12T10:15:32.456Z" } ``` | Status | When | |---|---| | `200` | Reload succeeded; pipeline swapped. | | `400` | YAML parse failed. Error body carries the parse error with the config path scrubbed. | | `405` | Method other than POST. | | `409` | Another reload is already in flight. | | `500` | Could not read the config file (permissions, ENOENT), or pipeline compile failed. | | `503` | The admin server has no `config_path` wired (in-memory / test mode). | See [manual.md section 9](manual.md#9-hot-reload) for the full operator workflow including curl examples and the Kubernetes operator integration. ### `GET /admin/drift` Compares the on-disk config file at `proxy.admin.config_path` against the content hash captured the last time the proxy loaded a config (startup, file-watcher reload, or `POST /admin/reload`). Use this to detect when the running proxy has diverged from the declared config without triggering a reload. 
```json { "config_path": "/etc/sbproxy/sb.yml", "loaded_revision": "abc123...", "loaded_content_hash": "sha256:...", "on_disk_content_hash": "sha256:...", "drift": false, "on_disk_size_bytes": 8421, "checked_at": "2026-05-12T10:15:32.456Z" } ``` | Field | Type | Description | |---|---|---| | `config_path` | string | Absolute path the admin server reads. | | `loaded_revision` | string | Pipeline `config_revision` of the running proxy. | | `loaded_content_hash` | string | Content hash of the bytes that produced the running pipeline. | | `on_disk_content_hash` | string | Content hash of the bytes the admin server just read off disk. | | `drift` | bool | True when `loaded_content_hash != on_disk_content_hash`. | | `on_disk_size_bytes` | int | Size in bytes of the on-disk config. | | `checked_at` | string | RFC 3339 timestamp of this check. | | Status | When | |---|---| | `200` | Drift check completed. The body always describes the comparison. | | `500` | Could not read the on-disk config file. Path is scrubbed from the error message. | | `503` | The admin server has no `config_path` wired, or no content-hash baseline has been captured yet. | Operators typically scrape this every few seconds from their dashboard or alert pipeline. When `drift: true` is sustained for more than the expected reload window, page the operator: either the watcher is stuck, the deploy pipeline forgot to call `POST /admin/reload`, or someone hand-edited the file out of band. --- ## Admin UI (`GET /admin/ui`, `GET /`) The OSS admin server serves a minimal browser UI at `/admin/ui` for configuration inspection, drift status, recent requests, and the runtime prompt-store overlay (see `/admin/prompts` below). `GET /` redirects to `/admin/ui` so browsing to the admin port lands on the UI without typing the path. Both routes are authenticated like the rest of `/api/*` and `/admin/*`. Response: `200 text/html`. 
The UI is a static SPA bundled into the binary; it does not require a separate build step or asset directory. --- ## Prompt store admin (`GET /admin/prompts`, `POST /admin/prompts/...`) Exposes the runtime prompt-store overlay. `GET /admin/prompts` returns the in-memory snapshot (every active prompt + pinned version + last-mutation metadata) as JSON. `POST /admin/prompts` mutators add a new version, pin a version, or roll back; mutations persist to the operator-configured redb file when `admin.prompt_store_path` is set, so changes survive restart. The full set of POST shapes and request schemas is documented in [ai-gateway.md](./ai-gateway.md) under "Stored prompts". This reference only catalogues the route surface; the request/response contracts live with the feature. --- ## Chat playground (`POST /admin/api/playground/chat`) A stub handler for the dashboard's interactive chat surface. The admin UI scaffold + cargo feature ship today; the wiring that routes the request through `proxy_router.oneshot` and streams a model's response back is deferred to a follow-up ticket so the front-end scaffold and the production integration can land independently. Today the route returns `501 Not Implemented` with a JSON envelope naming the follow-up: ```json { "error": "not implemented", "detail": "chat playground stub; real handler will route through proxy_router.oneshot and stream the model response back to /admin/ui" } ``` Other verbs return `405 Method Not Allowed`. The route shares the admin port's basic-auth gate, so a curious operator pinging it without credentials still sees `401 Unauthorized` first. This route is OSS, ships in every build, and lives on the admin server (next to `/admin/reload`) rather than the production proxy listener. The path is stable; the follow-up that lights up the real handler does not move it. --- ## Curl recipes ```bash ## Reload the running config. curl -s -X POST -u admin:secret \ http://127.0.0.1:9090/admin/reload ## Check for config drift. 
curl -s -u admin:secret \ http://127.0.0.1:9090/admin/drift | jq ## Watch per-target health. curl -s -u admin:secret \ http://127.0.0.1:9090/api/health/targets | jq '.origins[].targets' ## Inspect the last 50 requests. curl -s -u admin:secret \ http://127.0.0.1:9090/api/requests | jq '.[0:50]' ## Pull the emitted OpenAPI spec for a Postman import. curl -s -u admin:secret \ http://127.0.0.1:9090/api/openapi.json > openapi.json ``` --- ## See also - [manual.md](manual.md) - install, CLI, hot reload workflow. - [configuration.md](configuration.md) - the `proxy.admin:` block. - [openapi-emission.md](openapi-emission.md) - the emitted OpenAPI document's shape and per-origin mapping. - [access-log.md](access-log.md) - the durable structured request log. - [metrics-stability.md](metrics-stability.md) - the Prometheus `/metrics` surface. - [audit-log.md](audit-log.md) - tamper-evident log of admin actions. ================================================================ # docs/adr-ai-hub-format.md ================================================================ ## ADR: AI gateway hub format and the `ChatFormat` trait *Last modified: 2026-05-12* Status: proposed. Drives the hub `ChatFormat` trait plus `/v1/messages` and `/v1/responses` inbound surfaces. ## Context SBproxy's AI gateway today accepts the OpenAI `POST /v1/chat/completions` shape from clients and either passes it through (OpenAI-compatible upstreams: Groq, Together, DeepSeek, Mistral, Perplexity, OpenRouter, vLLM, Ollama) or hands it to a per-provider translator that rewrites request and response bytes (Anthropic Messages today; Gemini and Bedrock left as TODO in `crates/sbproxy-ai/src/translators/mod.rs:36`). The translator API is two free functions, `translate_request` and `translate_response`, branching on a small `ProviderFormat` enum. That worked while the only inbound shape was OpenAI chat-completions and the only translated upstream was Anthropic. It does not generalize. 
Operators are already asking for two more inbound shapes: 1. `POST /v1/messages` (the Anthropic Messages shape, so the Anthropic SDK and Claude Code can point at SBproxy directly). 2. `POST /v1/responses` (the OpenAI Responses API, which the OpenAI Python and TypeScript SDKs are migrating to). And five outbound shapes are in scope: 1. OpenAI (and every OpenAI-compatible upstream). 2. Anthropic Messages. 3. Google Gemini and Vertex AI (same wire, two transports). 4. AWS Bedrock InvokeModel / Converse. 5. Custom (per-provider plugin, owned by the operator). Three inbound shapes times five outbound shapes is fifteen translation pairs. Building each pair by hand would mean fifteen code paths, fifteen test matrices, and fifteen places where a new tool-call field has to be threaded. We have already seen the cost in miniature: the existing Anthropic translator strips seven OpenAI-only fields, hoists `system` messages, defaults `max_tokens`, and rewrites a path; adding a Gemini translator in the same style would duplicate ninety percent of that code. The cost shows up most clearly in three places. First, streaming. SSE event shapes differ for every provider. OpenAI emits `delta.content` chunks; Anthropic emits `event: content_block_delta` with a JSON-Patch-like body; Bedrock wraps everything in an AWS event-stream envelope with `:event-type` headers; Gemini emits its own `streamGenerateContent` shape. A per-pair translator means writing the same stream demuxer N times. Second, observability. We want to emit OpenInference / OTel GenAI spans that name the model, tokens, tools, and finish reason regardless of inbound or outbound format. With per-pair translators we either repeat the extraction logic per translator or add a parallel "extract telemetry from raw bytes" code path. Third, guardrails. 
The prompt-injection classifier, PII redactor, response-cache key, semantic cache, cost router, and budget gate all need a stable view of "what the user said" and "what the model said." Today those features only see the inbound OpenAI shape; they will go blind the moment the inbound is Anthropic Messages.

The hub format solves all three by collapsing N times M into N plus M. Every inbound parser writes into one canonical Rust value; every outbound emitter reads from the same canonical Rust value; everything in between (telemetry, guardrails, caching, routing) speaks one shape.

## Decision

We will introduce a `ChatFormat` trait under `crates/sbproxy-ai/src/format/` that owns translation in both directions, and a canonical `ChatRequest` / `ChatResponse` pair that every translator round-trips through. Each format implements the trait once and serves in two roles: as an inbound parser (bytes from the client become a `ChatRequest`) and as an outbound emitter (a `ChatRequest` becomes bytes for the upstream). Streaming follows the same pattern with `ChatEvent` chunks.

The pseudo-Rust surface is short on purpose. The trait is the contract the whole pipeline depends on, so the smaller it is the fewer places have to change when we add a sixth provider.

```rust,ignore
// crates/sbproxy-ai/src/format/mod.rs

/// A bidirectional translator between a wire format and the hub.
///
/// Implementors are stateless and cheap to construct; the gateway
/// holds one instance per registered format inside a registry.
pub trait ChatFormat: Send + Sync + 'static {
    /// Stable identifier used in config and logs (`openai`,
    /// `anthropic`, `gemini`, `bedrock`, `responses`).
    fn id(&self) -> &'static str;

    /// Inbound path this format claims (`/v1/chat/completions`,
    /// `/v1/messages`, `/v1/responses`). Returned as a slice because a
    /// format may claim several paths (Bedrock has both
    /// `InvokeModel` and `Converse`).
    fn inbound_paths(&self) -> &'static [&'static str];

    // --- Request direction ---

    /// Parse client bytes on an inbound path into the hub request.
    /// Errors here are HTTP 400 to the client: malformed JSON, missing
    /// required fields, an unsupported feature the format cannot
    /// represent in the hub at all.
    fn parse_request(&self, bytes: &[u8]) -> Result<ChatRequest, ChatError>;

    /// Emit upstream bytes for the hub request, plus the upstream
    /// path. Returned path is the path the AI client should hit on the
    /// upstream (Anthropic rewrites to `/v1/messages`; OpenAI keeps
    /// `/v1/chat/completions`).
    fn emit_request(&self, req: &ChatRequest) -> Result<EmittedRequest, ChatError>;

    // --- Response direction ---

    /// Parse a non-streaming upstream response body into the hub
    /// response.
    fn parse_response(&self, bytes: &[u8]) -> Result<ChatResponse, ChatError>;

    /// Emit the hub response back to the client in this format's
    /// wire shape.
    fn emit_response(&self, resp: &ChatResponse) -> Result<Vec<u8>, ChatError>;

    // --- Streaming ---

    /// Parse a single SSE frame (the bytes between two blank lines)
    /// into zero or more hub events. A single upstream frame can
    /// expand to several hub events (Anthropic's `message_start`
    /// frame emits both `MessageStart` and a first `Usage` event).
    fn parse_event(&self, frame: &SseFrame) -> Result<Vec<ChatEvent>, ChatError>;

    /// Emit hub events back to the client as SSE frames. The
    /// translator owns terminator framing (`data: [DONE]` for OpenAI,
    /// `event: message_stop` for Anthropic).
    fn emit_event(&self, ev: &ChatEvent) -> Result<Vec<SseFrame>, ChatError>;
}

pub struct EmittedRequest {
    pub path: String,
    pub body: Vec<u8>,
    pub headers: Vec<(String, String)>, // `anthropic-version`, etc.
}
```

The trait makes four deliberate choices.

First, parse-and-emit are separate methods, not a single round-trip. The pipeline often parses on one format and emits on another; baking that asymmetry into the trait means there is no temptation to write a "translator" that only works for one direction.

Second, the trait is bytes-in / bytes-out at the edges and a typed `ChatRequest` / `ChatResponse` in the middle. That keeps wire formats out of the rest of the codebase: telemetry, guardrails, and cache code never look at raw JSON.

Third, streaming is opaque-frame in, hub-event out, not "parse the whole stream." A frame is the unit Pingora's response body filter sees, and the SSE framing layer (`event:` / `data:` / blank line) is identical across providers. Only the payload differs.

Fourth, `ChatError` is the formats' error type, with HTTP status carried inline. Format errors map directly to client errors; transport errors are caught upstream and never reach the format layer.

## Hub format shape

The hub `ChatRequest` and `ChatResponse` shapes are deliberately close to the OpenAI chat-completions JSON shape. OpenAI's chat-completions is the closest existing shape to a lowest common denominator: it has roles, message-level content arrays, tool calls, tool results, finish reasons, usage tokens, and streaming deltas, and every other provider's shape can be projected into it without losing the load-bearing fields.
```rust,ignore
// crates/sbproxy-ai/src/format/types.rs

pub struct ChatRequest {
    pub model: String,
    pub messages: Vec<ChatMessage>,
    pub tools: Vec<Tool>,
    pub tool_choice: ToolChoice,
    pub max_tokens: Option<u32>,
    pub temperature: Option<f32>,
    pub top_p: Option<f32>,
    pub top_k: Option<u32>,        // hub keeps it even though OpenAI lacks it
    pub stop: Vec<String>,
    pub stream: bool,
    pub system: Option<String>,    // hoisted out of messages on parse
    pub metadata: ChatMetadata,    // request id, user id, workspace id
    pub extensions: BTreeMap<String, Value>, // see below
}

pub struct ChatMessage {
    pub role: Role,                // System | User | Assistant | Tool
    pub content: Vec<ContentPart>,
    pub name: Option<String>,
    pub tool_call_id: Option<String>, // set when role == Tool
}

pub enum ContentPart {
    Text { text: String },
    Image { source: ImageSource, media_type: String },
    ToolUse { id: String, name: String, input: Value },
    ToolResult { tool_call_id: String, content: String, is_error: bool },
}

pub struct ToolCall {
    pub id: String,
    pub name: String,
    pub arguments: Value,          // typed JSON, not the OpenAI string-of-JSON
}

pub struct ChatResponse {
    pub id: String,
    pub model: String,
    pub content: Vec<ContentPart>,
    pub tool_calls: Vec<ToolCall>,
    pub finish_reason: FinishReason,
    pub usage: Usage,
    pub extensions: BTreeMap<String, Value>,
}

pub enum FinishReason {
    Stop,
    Length,
    ToolCalls,
    ContentFilter,
    Other(String), // a provider can survive a finish_reason we have not seen
}
```

Three places the hub deliberately diverges from OpenAI's shape:

1. **Tool-call `arguments` are typed JSON, not a string.** OpenAI ships `function.arguments` as a string containing JSON, because the OpenAI streaming protocol assembles that string token by token. Anthropic ships it as a real JSON object. Storing the typed value in the hub means the OpenAI emitter is responsible for stringification (a one-line `serde_json::to_string`) and every other consumer (Anthropic, Gemini, Bedrock, telemetry, guardrails) gets the structured form for free.
2. **`top_k` is in the hub even though OpenAI lacks it.** Anthropic, Gemini, and Bedrock all accept `top_k`, and dropping it on the OpenAI inbound would silently degrade sampling control for users routing OpenAI-shape requests at an Anthropic upstream. The OpenAI emitter drops it on the way out.
3. **`system` is a single optional string, not interleaved.** OpenAI permits `system` messages anywhere in the array; Anthropic requires a single top-level `system` field. The hub stores `system` as a single string (concatenated with `\n\n` on parse if the inbound had several system turns) and every emitter that wants per-turn system has to re-derive it. In practice no upstream wants per-turn system; the round-trip is lossy at the wire level (you cannot tell after the fact whether the original had one system message or three concatenated ones), but lossless at the semantic level (the model sees the same prompt).

The `extensions` map is the escape valve for provider-specific knobs the hub does not model. Anthropic `cache_control` blocks land in `extensions["anthropic.cache_control"]`; OpenAI `response_format: json_object` lands in `extensions["openai.response_format"]`. Each emitter looks for the extensions namespaced to its own format and applies them; everyone else ignores them. The namespacing rule is enforced at parse time so a misnamed key is a 400 to the client, not a silent drop on the upstream.

`ChatEvent` is the streaming counterpart and has a deliberately small vocabulary, covered in its own section below.

## Inbound endpoints

Three inbound parsers, registered into a parser registry keyed by inbound path:

- `/v1/chat/completions` (OpenAI): the existing route, refactored to call `OpenAiFormat::parse_request`. This is the pass-through path; the registry can short-circuit it when both inbound and outbound are OpenAI, skipping the hub entirely so the no-translation hot path is byte-for-byte identical.
- `/v1/messages` (Anthropic): new route. Backed by `AnthropicFormat::parse_request`. Existing Anthropic clients (the Anthropic SDK, Claude Code, Cursor) point at this path and Just Work, including when the configured upstream is OpenAI or Gemini.
- `/v1/responses` (OpenAI Responses): new route. Backed by `OpenAiResponsesFormat::parse_request`. The Responses shape is OpenAI's stateful-conversation API; the hub parser flattens it into a stateless `ChatRequest` and the response emitter re-wraps the result.

The registry is a small struct in `crates/sbproxy-ai/src/format/registry.rs` that holds a map from inbound path to `Arc<dyn ChatFormat>`. Outbound is selected from the provider config (each provider declares its format in `ai_providers.yml`), so the runtime never has to guess which emitter to use.

Configuration touches one new field on the AI gateway block, and inbound-path support is opt-in:

```yaml
ai:
  inbound_formats:
    - openai            # /v1/chat/completions, always on for back-compat
    - anthropic         # /v1/messages, opt-in
    - openai_responses  # /v1/responses, opt-in
  providers:
    - id: claude-sonnet
      format: anthropic
      url: https://api.anthropic.com
      models: [claude-3-5-sonnet]
```

Making inbound formats opt-in is the conservative default. If we turn on `/v1/messages` for every operator who upgrades, we hijack any operator who happens to already route `/v1/messages` to a real Anthropic upstream through SBproxy as a transparent proxy.

## Streaming translation

Streaming is the highest-leverage and the highest-risk part of this design, so the hub event vocabulary is deliberately tiny.

```rust,ignore
pub enum ChatEvent {
    MessageStart { id: String, model: String },
    ContentDelta { index: usize, part: ContentPartDelta },
    ToolCallDelta { index: usize, delta: ToolCallDelta },
    Usage(Usage),
    MessageStop { finish_reason: FinishReason },
}

pub enum ContentPartDelta {
    Text(String),
    // Image / ToolResult are non-streaming today; they appear in full
    // inside MessageStart-adjacent metadata, not as deltas.
}

pub struct ToolCallDelta {
    pub id: Option<String>,              // present in the first delta
    pub name: Option<String>,            // present in the first delta
    pub arguments_chunk: Option<String>, // raw JSON chunk for OpenAI;
                                         // Anthropic emits whole objects
}
```

Five events cover every provider we have looked at. The mapping table:

| Hub event | OpenAI SSE | Anthropic SSE | Gemini SSE | Bedrock event-stream |
|---|---|---|---|---|
| `MessageStart` | first `data:` with `id` | `event: message_start` | first chunk with `responseId` | `:event-type: messageStart` |
| `ContentDelta` | `delta.content` | `event: content_block_delta` (text) | `candidates[0].content.parts[].text` | `:event-type: contentBlockDelta` (text) |
| `ToolCallDelta` | `delta.tool_calls[]` | `event: content_block_delta` (input_json_delta) | `functionCall.args` partials | `:event-type: contentBlockDelta` (toolUse) |
| `Usage` | last chunk (`usage` block when `stream_options.include_usage`) | `event: message_delta` (`usage`) | `usageMetadata` on final chunk | `:event-type: metadata` |
| `MessageStop` | `data: [DONE]` after `finish_reason` chunk | `event: message_stop` | `finishReason` field | `:event-type: messageStop` |

Three rules keep the streaming path honest.

First, **frames are the unit, not bytes.** Every translator gets a complete SSE frame (parsed by the same SSE framer in `sbproxy-transport`, which already exists for HTTP/2 push and gRPC). A translator never sees a partial frame, so it never has to buffer.

Second, **a single upstream frame may produce zero or many hub events.** Anthropic's `message_start` frame carries enough state to emit both `MessageStart` and a "seed" usage record; OpenAI's first chunk emits only `MessageStart`. Returning `Vec<ChatEvent>` makes that explicit.

Third, **emitters own terminator framing.** OpenAI requires a trailing `data: [DONE]`; Anthropic does not. Bedrock has a binary event-stream framing layer that wraps the SSE payload. Each emitter is responsible for getting the goodbye right.
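The "frames are the unit" rule is easier to picture with a toy framer. The sketch below is a standalone illustration, not the framer in `sbproxy-transport`: the `SseFrame` fields, `parse_frame`, and `split_frames` are assumptions made for this example, and it only handles LF-terminated text that has already been fully received.

```rust
/// Hypothetical frame type: the optional `event:` name plus the joined `data:` lines.
#[derive(Debug, PartialEq)]
struct SseFrame {
    event: Option<String>,
    data: String,
}

/// Parse the lines of one frame (the text between two blank lines).
/// Lines that are neither `event:` nor `data:` (comments, `id:`, `retry:`) are ignored.
fn parse_frame(raw: &str) -> SseFrame {
    let mut event = None;
    let mut data_lines: Vec<&str> = Vec::new();
    for line in raw.lines() {
        if let Some(rest) = line.strip_prefix("event:") {
            event = Some(rest.trim_start().to_string());
        } else if let Some(rest) = line.strip_prefix("data:") {
            // SSE drops a single leading space after the colon, if present.
            data_lines.push(rest.strip_prefix(' ').unwrap_or(rest));
        }
    }
    SseFrame { event, data: data_lines.join("\n") }
}

/// Split a fully received stream into frames on blank lines.
/// Assumes `\n` line endings; a production framer would also accept `\r\n`
/// and would buffer across partial reads.
fn split_frames(stream: &str) -> Vec<SseFrame> {
    stream
        .split("\n\n")
        .filter(|chunk| !chunk.trim().is_empty())
        .map(parse_frame)
        .collect()
}
```

With a framer of this shape in front, a format's `parse_event` only inspects `event` and `data`, and an emitter can recognise or append its own terminator (for OpenAI, a frame whose `data` is `[DONE]`).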
The pass-through hot path is unchanged: when inbound and outbound are both OpenAI, the registry detects the match and the streaming bytes are forwarded with zero parsing. This matters because OpenAI-compatible upstreams are still the common case and any streaming overhead is paid per token. ## Cross-format lossiness Three classes of feature do not survive every cross-format hop, and the hub will say so out loud rather than dropping silently. **Anthropic `cache_control` blocks** mark message content for Anthropic's prompt caching. There is no OpenAI analog. When the inbound is Anthropic and the outbound is OpenAI: 1. The parser stashes the blocks in `extensions["anthropic.cache_control"]` so they round-trip if the outbound is also Anthropic. 2. The OpenAI emitter drops the extension and adds one entry to the request's `lossiness` log (a `Vec` on `ChatRequest` that telemetry exports as a span attribute). 3. The classifier logs a `sbproxy_ai_format_lossy_field_total{field="anthropic.cache_control",direction="downgrade"}` counter so operators can see it on a dashboard. This is "warn and best-effort." The request still goes through; the model still answers; the operator can see in metrics and traces that the cache hint was dropped. **Anthropic thinking blocks** (`type: thinking` content blocks) come back from extended-thinking models. OpenAI o1 and o3 emit a similar concept (`reasoning_content`) but with different framing and no streamable shape. The hub keeps thinking as a first-class `ContentPart::Thinking { signature, text }` variant so any inbound parser that sees it preserves it on the way to any outbound emitter that knows what to do with it; emitters that do not (OpenAI Chat Completions today) drop it with a `lossiness` note. **OpenAI `response_format: json_schema`** is a structured-output mode OpenAI implements at decoding time. Anthropic and Gemini have similar features with different schemas and different field names. 
The hub does not model structured output as a first-class field today; it lives in `extensions["openai.response_format"]` and only the OpenAI emitter applies it. Cross-emitting from OpenAI to Anthropic with a `response_format` request adds a lossiness note and the operator's tests are likely to fail. This is the loudest of the three: we will document it in `ai-gateway.md` as a known limitation and revisit when WOR-... follow-ups land. Lossiness notes carry three fields: the field name, the direction (`downgrade` or `unsupported`), and a short string explaining the effect. They surface in OpenInference spans (as a `lossiness` attribute on the parent span) and in structured logs at WARN level once per request. They do not block the request. ## Migration path The existing Anthropic translator at `crates/sbproxy-ai/src/translators/anthropic.rs` becomes two halves of one `AnthropicFormat` implementor. `request_to_native` is the bones of `emit_request`; `response_to_openai` is the bones of `parse_response` plus a no-op `emit_response`. The free-function API in `translators/mod.rs` stays as a deprecated shim for one release so any out-of-tree callers do not break. Implementation breaks into roughly six to eight chunks. Each one is small enough to land on its own and CI gate, in line with the workspace's tracer-bullet preference. 1. **Hub types and registry.** Land `ChatRequest`, `ChatResponse`, `ChatMessage`, `ContentPart`, `ToolCall`, `ChatEvent`, the `ChatFormat` trait, and an empty `FormatRegistry`. No wire integration yet; the crate compiles and has unit tests for the types. 2. **OpenAI format as the identity.** Implement `OpenAiFormat: ChatFormat` so the existing `/v1/chat/completions` path can go through the hub on a feature flag. Round-trip every existing AI e2e test through the hub under the flag; flip the flag once green. 3. **Anthropic format migration.** Port the current translator into `AnthropicFormat`. 
Add an outbound test matrix (OpenAI inbound, Anthropic outbound) that proves byte-equivalent behavior with the legacy free-function path. Delete the free functions once the matrix is green for two releases. 4. **`/v1/messages` inbound.** Register `AnthropicFormat` as an inbound parser, gated by `inbound_formats: [..., anthropic]`. Add a route handler that picks the format from path. New e2e: Anthropic SDK against SBproxy against an OpenAI upstream. 5. **`/v1/responses` inbound.** Add `OpenAiResponsesFormat`. The Responses shape has stateful conversation handling that the hub will flatten; add a stateless emitter back to Responses for the round-trip. 6. **Streaming.** Implement `parse_event` / `emit_event` for OpenAI, Anthropic, and OpenAI Responses. Add a streaming conformance test (one fixture per provider, replayed deterministically). 7. **Gemini format.** Add `GeminiFormat` (request + response + streaming). Lights up Gemini and Vertex upstreams without a Google-side translator code path elsewhere. 8. **Bedrock format.** Add `BedrockFormat`. Bedrock's binary event-stream wrapping is the tricky part; SigV4 stays in the existing auth layer. Six chunks ship a working hub with three inbound shapes and three outbound shapes. Chunks seven and eight are independent and can ship in either order. ## Alternatives considered **Per-pair translators (the status quo).** Keep adding `translate_request_anthropic_to_openai`, `translate_request_gemini_to_openai`, and so on, fanning out to one function per pair. The translator file already has Gemini and Bedrock as TODO comments. Cost: N times M code paths, duplicated streaming logic, observability hooks duplicated per pair. Wins: zero new types, no abstraction, easy to grep. We rejected this because the duplication compounds with every provider and the streaming demuxer in particular is too large to write five times. 
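To make the N times M versus N plus M trade-off concrete, here is a toy sketch of the hub pattern. Everything in it is invented for illustration: `HubRequest` and `ToyFormat` are drastically reduced stand-ins for `ChatRequest` and `ChatFormat`, and the two wire shapes (`model|prompt` and `prompt@model`) do not correspond to any real provider.

```rust
/// Toy hub value; the real `ChatRequest` carries far more fields.
#[derive(Debug, Clone, PartialEq)]
struct HubRequest {
    model: String,
    prompt: String,
}

/// Hypothetical, much-reduced stand-in for `ChatFormat`.
trait ToyFormat {
    fn parse(&self, wire: &str) -> Option<HubRequest>;
    fn emit(&self, req: &HubRequest) -> String;
}

/// Invented wire shape: `model|prompt`.
struct PipeFormat;
/// Invented wire shape: `prompt@model`.
struct AtFormat;

impl ToyFormat for PipeFormat {
    fn parse(&self, wire: &str) -> Option<HubRequest> {
        let (model, prompt) = wire.split_once('|')?;
        Some(HubRequest { model: model.into(), prompt: prompt.into() })
    }
    fn emit(&self, req: &HubRequest) -> String {
        format!("{}|{}", req.model, req.prompt)
    }
}

impl ToyFormat for AtFormat {
    fn parse(&self, wire: &str) -> Option<HubRequest> {
        let (prompt, model) = wire.rsplit_once('@')?;
        Some(HubRequest { model: model.into(), prompt: prompt.into() })
    }
    fn emit(&self, req: &HubRequest) -> String {
        format!("{}@{}", req.prompt, req.model)
    }
}

/// Any inbound format to any outbound format, always via the hub value.
fn translate(inbound: &dyn ToyFormat, outbound: &dyn ToyFormat, wire: &str) -> Option<String> {
    inbound.parse(wire).map(|req| outbound.emit(&req))
}

/// Translation code paths needed with one function per (inbound, outbound) pair.
fn per_pair_paths(inbound: usize, outbound: usize) -> usize {
    inbound * outbound
}

/// Code paths needed with a hub: one parser per inbound, one emitter per outbound.
fn hub_paths(inbound: usize, outbound: usize) -> usize {
    inbound + outbound
}
```

`translate` never needs to know which pair it is bridging, which is why adding a format costs one `impl` rather than one function per counterpart. For the three inbound and five outbound shapes in scope, `per_pair_paths(3, 5)` is fifteen while `hub_paths(3, 5)` is eight.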
**Upstream-only routing through OpenRouter or LiteLLM.** Send every non-OpenAI provider through OpenRouter or a sidecar LiteLLM. Wins: zero in-process translation; OpenRouter's pricing is already integrated. Cost: an extra network hop, opaque routing decisions, no control over guardrails or PII redaction (they fire after the hop), no streaming visibility, vendor lock to OpenRouter's evolution. We rejected this because the whole pitch of "the AI gateway built like a real proxy" is that everything happens in process; an external hop defeats that. **Fork OpenAI's Python SDK shapes and use them verbatim as the hub.** Mirror OpenAI's Python `Pydantic` types in Rust and treat the OpenAI shape (with `.arguments` as a string, no `top_k`) as the canonical form. Wins: zero invention; copy from a working spec. Cost: locks the hub to OpenAI's evolution (Responses already obsoletes parts of it), forces every Anthropic-only field through a string-of-JSON keyhole, and makes structured tool arguments awkward to inspect. We rejected this because the OpenAI shape is the closest existing shape, not a correct hub. The hub diverges in three places (typed `arguments`, hub-only `top_k`, single `system`) on purpose. **One trait, but bytes-in / bytes-out at the trait surface (no hub types).** Make `ChatFormat` a `(format_a, format_b, bytes_in) -> bytes_out` API and skip the canonical types. Wins: minimum allocations on the no-translation path. Cost: telemetry, guardrails, caching, and cost routing all have to re-parse the bytes; we are back to N times M for those features. We rejected this because the bytes-in / bytes-out surface only solves the translation problem and leaves four other features uncovered. ## Open questions These are genuinely undecided and need an answer before this ADR closes; do not treat the absence of an answer as a sign the design will not change. 1. **Cost routing and inbound model names.** Today the cost router keys on the OpenAI model name. 
When the inbound is Anthropic Messages with `model: claude-3-5-sonnet`, does the router look up Anthropic pricing, or does it expect the operator's `ai_providers.yml` to declare an alias? Probably the latter, but the alias-resolution path needs a design.
2. **Guardrail input scope on multi-turn conversations.** The prompt-injection classifier inspects the latest user message today. With Anthropic-style messages where a `tool_result` block can carry attacker-controlled text from a previous tool call, the "latest user message" is the wrong scope. Hub-level: scan every `Tool` role message too? Open.
3. **Streaming back-pressure.** The hub emits `Vec` per upstream frame. If a slow client cannot keep up with the upstream's frame rate, we either buffer (memory pressure) or drop (correctness loss). Pingora already has body-write back-pressure; need to confirm that the trait surface composes with it cleanly when the emitter produces several SSE frames per hub event.
4. **`extensions` versioning.** Provider wire formats evolve. If Anthropic adds a new `cache_control` mode, every old parser will silently drop it. Do we pin a wire-version per format, fail closed on unknown extensions, or warn? Probably "warn and pass through under a versioned key," but the policy is not written yet.
5. **`/v1/responses` stateful mode.** The Responses API has a `previous_response_id` field that points at a prior conversation. The hub flattens to stateless requests; the operator-facing question is whether SBproxy stores those conversations itself or refuses the field. Refusing is the conservative answer for v1, but it breaks `client.responses.create(previous_response_id=...)` calls.
6. **Schema discipline for `extensions`.** Today the rule is "namespace by format id" but it is not enforced beyond a runtime check. A JSON Schema fragment per format would let the config compiler validate at load time. Worth doing in chunk one or worth deferring? Open.
7.
**Where does the AWS event-stream wrapper live?** Bedrock's streaming layer is non-trivial. Inside `BedrockFormat::parse_event`, or in a `sbproxy-transport` helper that other AWS services could share? Leaning toward the helper, but not certain until the second AWS-shape provider lands.

================================================================
# docs/adr-outbound-credential-resolver.md
================================================================

## ADR: outbound credential resolver, OSS vs enterprise line

*Last modified: 2026-05-24*

Status: accepted. Drives the move of outbound-credential-resolver basics into OSS.

## Context

SBproxy's stated differentiator is the outbound credential resolver: the gateway mints or exchanges the right credential for each upstream so the agent or client never handles a per-upstream secret. A request arrives with one identity; the proxy presents a different, correctly-scoped credential to each upstream it talks to.

Until now the whole resolver was an enterprise capability. The OSS binary shipped `sbproxy-vault` (secret resolution and rotation) but no outbound *minting*: RFC 8693 token exchange, the OAuth client-credentials grant, broker JWT re-sign, DPoP, and stored per-user OAuth grants were all paid.

Two things changed that make this line wrong:

1. **The basic mechanism is no longer category-unique.** Per-upstream outbound credential brokering is now offered by AWS Bedrock AgentCore Gateway, Pomerium, Auth0 / Okta Token Vault, Arcade, and Scalekit. RFC 8693 token exchange is generally available in Keycloak 26.2 and Okta. A self-hostable gateway whose headline differentiator is paywalled looks behind on its own pitch.
2. **Two open competitors are racing the same square.** agentgateway (Rust, open) and Bifrost (Go, open) target the self-hostable agent gateway niche. If the OSS binary cannot even demonstrate the resolver, the wedge is undefended.

The differentiator has to move up the stack.
The basic minting mechanism becomes table-stakes that OSS must show; the durable, monetizable value moves to operating that mechanism at scale.

## Decision

OSS ships the **mechanism**: enough to resolve a per-upstream outbound credential three ways, single-tenant, statically configured, with the safety rails that make exchange safe to run.

Enterprise keeps **operation at scale**: per-user delegated identity, sender-constrained tokens, broker-as-issuer, multi-tenant and multi-source entitlements, and the hardware-backed and compliance tooling around all of it.

This mirrors the split already used elsewhere in the product: the mechanism is OSS; the operational, multi-tenant, hardware-backed, and compliance-grade layers are enterprise.

### OSS (the basics)

- **RFC 8693 token exchange.** Exchange a subject token for an upstream-audience token (`grant_type=urn:ietf:params:oauth:grant-type:token-exchange`).
- **OAuth client-credentials grant** per upstream.
- **Vault-resolved static secret** per upstream (already in OSS; exposed through the unified resolver).
- **The unified `outbound_credential_resolver` config surface**: per origin, select one of the three modes. This is the artifact that demonstrates the wedge.
- **The safety rails that ride with exchange**, shipped together with it and never separable: `subject_token_issuers` and `allowed_token_exchange_audiences` allowlists, the `act` delegation chain with a depth cap, and a single-process minted-token cache with TTL. A basic feature must not ship in an unsafe configuration; security rails are not a paid add-on.

### Enterprise (operation at scale)

- **Stored OAuth grants / per-user token vault**: device-code and interactive-consent flows, refresh-token lifecycle, per-user delegated identity. This is the operationally hard, high-value capability that comparable products charge for.
- **Broker JWT re-sign and issuer-vouched / broker-augmented identity (CIMD)**: the broker becomes the issuer.
Needs hardware-backed keys and is compliance-grade.
- **Sender-constrained tokens (DPoP, mTLS-bound).**
- **Multi-source entitlements, multi-tenant credential isolation, and hardware-backed broker keys.** Combining identity across an identity provider, workload identity, and an entitlement service, isolated per tenant, is the enterprise operational job.

### The crux: RFC 8693 itself is OSS

The one genuinely debatable item is token exchange. It is OSS. Keeping it paid is indefensible now that it is generally available across the IdP market, and an open binary that cannot show token exchange cedes the narrative to the open competitors. The differentiator survives because the operational layer (stored per-user grants, broker-as-issuer, multi-tenant, hardware-backed, audited) stays enterprise, and that is where buyers actually spend.

## Consequences

- The OSS binary can demonstrate, end to end and without a license: "per-upstream credentials, minted three ways, no client-side secret handling, self-hosted." That is the wedge, defended.
- Enterprise sells the operational story: "operate that for thousands of users across dozens of upstreams, sender-constrained, broker-issued, and audited."
- The OSS resolver is single-tenant and statically configured by design. Multi-tenant isolation and dynamic, per-user credential lifecycle are the natural upgrade boundary, so the line is legible to operators rather than arbitrary.
- The resolver is a closed enum of modes, so an operator who needs a mode the OSS binary does not implement gets a config-load error rather than a silent fallback to an unsafe default.

## Implementation

PR 1 lands this ADR and the OSS resolver subsystem: the config surface, the three minting modes, the allowlists, and the `act`-chain depth cap, with unit coverage including a mock token endpoint.
A follow-up wires the resolver into the outbound request path per upstream and adds the end-to-end test (request to upstream A gets credential A; request to upstream B gets credential B).

================================================================
# docs/agent-budget.md
================================================================

## agent_budget policy

*Last modified: 2026-05-31*

The `agent_budget` policy is a semantic rate-limit primitive keyed on the resolved `agent_id`. Standard per-IP / per-user / per-key limits assume humans pause between requests; agents driven by an LLM loop fire at network speed and trip those buckets immediately. Datadog reports roughly a third of LLM-span errors in production are rate-limit denials for exactly that reason. One bucket per named agent collapses "every request from the Cursor instance" or "every request from the same OpenAI Assistant" into a single budget that an operator can actually size.

The `agent_id` comes from the agent-class resolver (`sbproxy-agent-detect` / `sbproxy-classifiers`); when no `agent_id` resolved, the policy applies the `on_anonymous` rule.

## Config

```yaml
origins:
  "ai.example.com":
    upstream: https://api.openai.com
    auth:
      type: bearer
    policies:
      - type: agent_budget
        # Token-bucket refill rate, per agent_id.
        requests_per_minute: 60
        # Rolling LLM-token budget per agent_id. The token bucket
        # exists in the policy API; consumption is wired in via the
        # AI-usage tracker. Configuring without that wiring is a no-op
        # on the token field today.
        tokens_per_hour: 100000
        # Max simultaneous in-flight requests per agent_id. RAII guard
        # releases the slot when the request completes.
        burst: 10
        # What to do when the cap fires.
        #   - deny (default): respond 429.
        #   - log: emit the decision metric, pass the request through.
        #   - downgrade: dispatcher routes to a cheaper model.
        on_exceed: deny
        # What to do when the request has no resolved agent_id.
        #   - skip (default): no enforcement.
        #   - shared: all anonymous requests share one bucket.
        on_anonymous: skip
```

## Decisions

The policy reports its verdict to the dispatcher; the dispatcher maps the verdict to a real action:

| Verdict | `on_exceed` | HTTP outcome |
|---|---|---|
| Within budget | n/a | pass through |
| Cap fired, deny | `deny` | 429 with `Retry-After` |
| Cap fired, log | `log` | pass through, metric increments |
| Cap fired, downgrade | `downgrade` | dispatcher picks the cheaper AI provider for this request |

## Observability

* `sbproxy_policy_triggers_total{origin, policy_type="agent_budget", action="block"}` increments on `deny` denials.
* `sbproxy_ai_budget_utilization_ratio{origin, agent_id}` gauge reports the current utilisation per agent.
* Access log: `policy_action` set to the verdict; `agent_id`, `agent_class`, `agent_vendor` carry the resolved agent identity.

## Why per-agent

A standard rate-limit policy keyed on IP or API key cannot distinguish "Cursor making 200 background completions while the user types" from "an attacker fanning out 200 distinct concurrent prompts". Both look identical to an IP-keyed bucket. Keying on `agent_id` (the resolved agent identity, not the network address) lets the operator size the legitimate background traffic without hardening to it, and lets the abuse path get blocked cleanly because the attacker cannot produce a fresh `agent_id` per request without re-resolving against the agent registry.

## Out of scope for slice 1

* Cluster-shared budgets. Each proxy enforces its own local view; an attacker spreading across replicas sees N times the per-instance budget. A cluster-shared backend (Redis or shared KV) is the obvious follow-up; for now, treat the per-instance budget as the floor.
* Upstream token accounting. `tokens_per_hour` is wired into the policy API but only consumed when the AI gateway calls `AgentBudgetPolicy::consume_tokens`. A follow-up wires that into `sbproxy-ai`'s usage tracker.
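
To make the per-agent bucket on this page concrete, here is a minimal Python sketch of a token bucket keyed on `agent_id`, including the `on_anonymous` rule. It is not SBproxy's implementation: the class names, the `__anonymous__` key, the verdict strings, and the choice to set the bucket capacity equal to `requests_per_minute` are all assumptions made for illustration. Because each instance of the sketch holds its own buckets, it also shows why N replicas yield N times the per-instance budget.

```python
class TokenBucket:
    """Continuously refilled token bucket; one token per request."""

    def __init__(self, rate_per_minute: float, capacity: float, now: float) -> None:
        self.rate_per_second = rate_per_minute / 60.0
        self.capacity = capacity
        self.tokens = capacity  # start full
        self.last_refill = now

    def try_take(self, now: float) -> bool:
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_per_second)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class AgentBudgetSketch:
    """Hypothetical stand-in for the policy: one local bucket per agent_id."""

    ANONYMOUS_KEY = "__anonymous__"  # illustrative key for the shared bucket

    def __init__(self, requests_per_minute: float, on_anonymous: str = "skip") -> None:
        self.requests_per_minute = requests_per_minute
        self.on_anonymous = on_anonymous
        self.buckets = {}

    def check(self, agent_id, now: float) -> str:
        if agent_id is None:
            if self.on_anonymous == "skip":
                return "within_budget"  # no enforcement for anonymous traffic
            agent_id = self.ANONYMOUS_KEY  # "shared": one bucket for everyone
        bucket = self.buckets.get(agent_id)
        if bucket is None:
            # Assumption: bucket capacity equals requests_per_minute.
            bucket = TokenBucket(self.requests_per_minute, self.requests_per_minute, now)
            self.buckets[agent_id] = bucket
        return "within_budget" if bucket.try_take(now) else "cap_fired"
```

A caller would map `cap_fired` to the configured `on_exceed` action (`deny`, `log`, or `downgrade`); the clock is passed in explicitly so the behaviour can be replayed deterministically.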

## See also

* [features.md](./features.md) - tour with policy examples.
* [examples/agent-budget/](../examples/agent-budget/) - runnable per-agent rate-limit fixture.
* [ai-gateway.md](./ai-gateway.md) - the AI surfaces the budget protects.
* [configuration.md](./configuration.md) - the full schema.

================================================================
# docs/agent-skills.md
================================================================

## Agent Skills v0.2.0

*Last modified: 2026-05-09*

SBproxy serves an Agent Skills v0.2.0 discovery manifest at `/.well-known/agent-skills/index.json`. Cooperative agents fetch the manifest to discover the skills the origin advertises, then fetch each artifact at the URL the manifest pins. Every artifact body is hashed (SHA-256) at config-load time and re-hashed on every serve.

The schema lives at `https://schemas.agentskills.io/discovery/0.2.0/schema.json`. The originating RFC is at `https://github.com/cloudflare/agent-skills-discovery-rfc`.

## What it does

The Agent Skills projection is a sibling of the four Wave 4 projections (`robots.txt`, `llms.txt`, `licenses.xml`, `tdmrep.json`). All five are derived from the compiled config snapshot and refreshed atomically on every config reload.

Each entry in the manifest carries:

- `name` - stable identifier.
- `type` - closed enum, `skill-md` or `archive`.
- `description` - one-line capability summary.
- `url` - relative, path-absolute, or fully-qualified.
- `digest` - `sha256:` of the artifact body.

URLs are resolved per RFC 3986 against the request authority at serve time, so the manifest's URLs stay portable across hostnames and schemes.

## Configuration

```yaml
proxy:
  http_bind_port: 8080

origins:
  "test.sbproxy.dev":
    action:
      type: proxy
      url: https://test.sbproxy.dev
    agent_skills:
      - name: "deploy-via-pr"
        type: skill-md
        description: "Open a PR to deploy a config change."
        url: "/skills/deploy-via-pr.md"
        visibility: public
      - name: "internal-rotate-secret"
        type: skill-md
        description: "Rotate a service credential via vault."
        url: "/skills/internal-rotate-secret.md"
        visibility: authenticated
```

Every field except `name`, `type`, `description`, and `url` is optional. Skills can declare an inline `body:` literal, an explicit filesystem `path:`, or rely on the workspace-relative resolution that the URL implies (the example above resolves `/skills/deploy-via-pr.md` against the directory `sbproxy serve` was invoked from).

### Visibility

`public` (the default) returns the entry to every caller. `authenticated` filters the entry out of the manifest served to anonymous callers. Callers that present an `Authorization` header receive the full set.

The serve-time filter walks the manifest fresh on every request, so an authenticated upgrade does not require a manifest reload. SHA-256 digests are computed once at config-load and pin the artifact body across all callers.

### Archive entries (`type: archive`)

`archive` entries point at a `.tar.gz` or `.zip` bundle. The proxy sniffs the magic bytes, validates the bundle once at config-load time, and serves it as opaque bytes on every request.

The archive parser refuses to load a bundle that:

- traverses outside the archive root via `..` or absolute paths,
- contains a symlink whose target escapes the archive root (or any symlink at all in the zip case),
- exceeds the configured decompression ratio (default 100:1),
- exceeds the configured entry count (default 1000), or
- exceeds the configured expanded byte budget (default 10 MiB).

Each cap is configurable per entry:

| Field | Default | Purpose |
|---|---|---|
| `max_decompression_ratio` | 100 | Compressed:expanded ratio cap. |
| `max_entries` | 1000 | Max entries per archive. |
| `max_expanded_bytes` | 10485760 | Max expanded archive bytes. |
| `max_clock_skew_secs` | 60 | Tolerance for time-sensitive headers. |

## Integrity contract

Every artifact `GET` re-hashes the served body and compares to the manifest digest. On mismatch the proxy:

1. Returns HTTP 503 with a generic "service unavailable" body.
2. Emits a structured `agent_skill.digest_mismatch` audit event with `{ skill_name, hostname, expected_digest, observed_digest }`.
3. Increments `sbproxy_agent_skill_digest_mismatch_total{skill=""}`.

The runtime check is the contract that lets cooperative agents trust the digest. Operators who wire an audit sink see the mismatch land on their existing audit pipeline.

## No script execution

Per the v0.2.0 spec, SBproxy does not execute pre-/post-hooks or any embedded scripts shipped inside an artifact. Artifacts are served as opaque bytes. Archives are validated for size and traversal safety at config-load time but are never extracted to disk during a request, and the request handler never invokes a subprocess on the artifact body.

## MCP `experimental.agentSkillsUrl` advertising

When the origin's action is an MCP gateway and `agent_skills:` is configured, the `initialize` JSON-RPC response includes a `capabilities.experimental.agentSkillsUrl` field pointing at the manifest. The advertised URL is the absolute URL of the origin's `/.well-known/agent-skills/index.json`, resolved from the request `Host` and the proxy's TLS posture.

```json
{
  "protocol_version": "2025-06-18",
  "capabilities": {
    "tools": {},
    "experimental": {
      "agentSkillsUrl": "https://api.example.com/.well-known/agent-skills/index.json"
    }
  },
  "server_info": { "name": "sbproxy-mcp", "version": "1.0" }
}
```

The advertised path is the same regardless of caller identity; the manifest itself filters by visibility at serve time. When `agent_skills:` is not configured for the origin, the field is omitted entirely (no empty advertisement).

## `resources.listChanged` capability and manifest refresh

When `agent_skills:` is configured, the `initialize` response also advertises `capabilities.resources.listChanged: true`.
The manifest is exposed to MCP clients as a resource; `listChanged` is the signal that the resource set can change and the client should subscribe to refresh notifications instead of caching the manifest forever.

```json
"capabilities": {
  "resources": { "listChanged": true },
  "experimental": { "agentSkillsUrl": "..." }
}
```

How a client uses this depends on its transport:

* **Persistent server-push transport** (the MCP streamable HTTP transport's GET-SSE channel, when present): the client opens the SSE channel and waits for a `notifications/resources/list_changed` push. The proxy will emit that frame when the manifest regenerates, once the server-side SSE push channel ships in a future release.
* **Request/response only** (the common case today): the client treats the manifest like any other long-cached HTTP resource and uses the `Cache-Control` / `Last-Modified` headers on the well-known endpoint, polling with `If-Modified-Since` when its internal cadence allows. The advertised `listChanged: true` is the hint that polling IS expected; without it, a client might cache the manifest indefinitely.

The capability is omitted entirely when `agent_skills:` is not configured, so a legacy client that keys off field presence does not subscribe to a channel that has nothing to emit.

## Inspection

```bash
curl -s -H 'Host: api.example.com' \
  http://127.0.0.1:8080/.well-known/agent-skills/index.json | jq

curl -s -H 'Host: api.example.com' -H 'Authorization: Bearer demo' \
  http://127.0.0.1:8080/.well-known/agent-skills/index.json | jq
```

The example bundle at `examples/agent-skills/` is runnable with `sbproxy serve -f sb.yml` and demonstrates the manifest, the visibility filter, and the digest contract end-to-end.

## See also

- [`mcp.md`](mcp.md) for the broader MCP gateway story.
- [`threat-model.md`](threat-model.md) for the OSS trust boundaries that constrain the digest verifier.
- [`features.md`](features.md) for the projection family overview.

================================================================
# docs/ai-crawl-control.md
================================================================

## AI Crawl Control + Pay Per Crawl

*Last modified: 2026-05-08*

The `ai_crawl_control` policy implements the "Pay Per Crawl" pattern: AI crawlers that arrive without a valid `Crawler-Payment` token receive `402 Payment Required` along with a JSON challenge body. A crawler that wants the content reads the challenge, posts a payment to your billing system, and retries with the issued token in the `Crawler-Payment` header. Each token redeems exactly once.

The OSS implementation ships an in-memory ledger seeded from config and an HTTPS-only HTTP ledger client for production. The enterprise build extends the same `Ledger` trait with managed adapters so the proxy can authorise tokens against Stripe, x402, MPP, and Lightning rails.

## OSS scope: challenge body only

The OSS proxy emits two challenge shapes:

1. **Single-rail (default).** A 402 with the `Crawler-Payment` header and a flat JSON body describing the price. This is the path legacy crawlers see.
2. **Multi-rail (opt-in).** When the agent sends `Accept-Payment:` or one of the multi-rail `Accept` MIME types (`application/sbproxy-multi-rail+json`, `application/x402+json`, `application/mpp+json`), the OSS proxy emits a 402 with `Content-Type: application/sbproxy-multi-rail+json` and a body that lists one entry per rail the operator declared (x402, MPP, Lightning), each with its own quote-token JWS.

The multi-rail body is the wire-format contract. The OSS build can negotiate it, advertise rails, mint per-rail quote tokens, and respond 406 when the agent's preference set has no overlap with the operator's offered rails.

What the OSS build cannot do is settle a payment on x402, MPP, Stripe, or Lightning. Settlement code lives in the enterprise build behind the `stripe`, `x402`, `mpp`, `lightning-cln`, `lightning-lnd`, and `lightning-phoenixd` cargo features.
With an OSS-only build, the rails advertised in the multi-rail body are honoured by the in-memory or HTTP ledger; the enterprise BillingRail registrations are what actually authorise a real-money settlement.

This is the same framing the rail-Lightning example uses: see `examples/rail-lightning/README.md`. For the wire-shape contract on its own, see [`402-challenge.md`](402-challenge.md).

## Request flow

```
crawler  GET /article
         User-Agent: GPTBot/1.0
proxy    <- 402 Payment Required
            Crawler-Payment: realm="ai-crawl" currency="USD" price="0.001"
            Content-Type: application/json
            body: {"error":"payment_required","price":"0.001","currency":"USD","target":"blog.example.com/article","header":"crawler-payment"}

crawler  GET /article   (after paying out-of-band)
         User-Agent: GPTBot/1.0
         crawler-payment: tok_a89be2...
proxy    <- 200 OK
            body:

crawler  GET /article   (replay attempt)
         User-Agent: GPTBot/1.0
         crawler-payment: tok_a89be2...
proxy    <- 402 (single-use ledger; token already spent)
```

## Configuration

```yaml
policies:
  - type: ai_crawl_control
    price: 0.001
    currency: USD
    header: crawler-payment   # default
    crawler_user_agents:      # case-insensitive substring match
      - GPTBot
      - ChatGPT-User
      - ClaudeBot
      - anthropic-ai
      - Google-Extended
      - PerplexityBot
      - CCBot
    valid_tokens:             # in-memory ledger
      - tok_a89be2f1
      - tok_b7cf012e
      - tok_c34f9a82
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `price` | float | unset | Price emitted in the challenge body and the `price=` parameter of the challenge header. Used as the fallback when no tier matches. |
| `currency` | string | `USD` | ISO-4217 code surfaced in the challenge header and body. |
| `header` | string | `crawler-payment` | Header the crawler reads from the 402 response and writes to its retry. |
| `crawler_user_agents` | list | covers GPTBot, ChatGPT-User, ClaudeBot, anthropic-ai, Google-Extended, PerplexityBot, CCBot, FacebookBot | Case-insensitive substring matches against the request User-Agent. Empty list treats every GET / HEAD as a crawler. |
| `valid_tokens` | list | `[]` | Seeds the in-memory ledger. Each token redeems once, then leaves the set. |
| `tiers` | list | `[]` | Pricing tiers. First match wins. See "Tiered pricing" below. |
| `ledger` | block | unset | HTTP ledger client config. See "HTTP ledger" below. Mutually exclusive with `valid_tokens`. |

Only `GET` and `HEAD` requests are subject to charging today. `POST`, `PUT`, `PATCH`, and `DELETE` pass through without charge.

## Tiered pricing

A flat per-site price is the right starting point but not the right long-term shape. Different routes carry different commercial value, and the same article in three formats (HTML, Markdown, PDF) is worth three different prices to a training crawler.
The `tiers:` field lets you price by route pattern and content shape without forking the policy.

```yaml
policies:
  - type: ai_crawl_control
    price: 0.0005                 # fallback when no tier matches
    currency: USD
    tiers:
      - route_pattern: /premium/*
        price:
          amount_micros: 5000     # $0.005 per crawl
          currency: USD
        free_preview_bytes: 1024  # cooperative crawlers get 1 KiB free
        paywall_position: hard
      - route_pattern: /articles/*
        price:
          amount_micros: 1000     # $0.001 per crawl
          currency: USD
        content_shape: markdown   # Markdown form only
        free_preview_bytes: 4096
        paywall_position: soft
      - route_pattern: /articles/*
        price:
          amount_micros: 500      # $0.0005 per crawl
          currency: USD
        content_shape: html
      - route_pattern: /docs/*
        price:
          amount_micros: 250
          currency: USD
```

| Field | Type | Description |
|---|---|---|
| `route_pattern` | string | Path matcher. Supports literal paths (`/about`) and a `*` suffix wildcard (`/articles/*`). First match wins; later tiers act as fallbacks. |
| `price.amount_micros` | u64 | Price in micros (1e-6 of one unit of `currency`). 1000 micros = $0.001. Floats never enter the wire format. |
| `price.currency` | string | ISO-4217 code. Must match the policy-level `currency` for now. |
| `content_shape` | enum | One of `html`, `markdown`, `json`, `pdf`, `other`. Advisory; surfaced in metrics and the redeem payload but not yet used as a tier filter. |
| `free_preview_bytes` | u64, optional | Byte budget the crawler may read without paying. Surfaced in the challenge body so cooperative crawlers can decide up front whether the preview alone meets their need. |
| `paywall_position` | enum, optional | Hint to the crawler about where the paywall sits: `hard` (no content without payment), `soft` (preview, then paywall), `metered` (N free per period). |

The first tier whose `route_pattern` matches wins. When no tier matches, the policy falls back to the top-level `price` and `currency`. An empty `tiers` list keeps the original flat-price behaviour.
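
The first-match rule and the fallback described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the proxy's resolver: `Tier`, `pattern_matches`, `resolve_price_micros`, and `micros_to_decimal_string` are hypothetical names, the `*` suffix is treated as a plain prefix match, and the top-level fallback price is passed in already converted to micros. All arithmetic stays in integers, in line with "floats never enter the wire format".

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Tier:
    route_pattern: str
    amount_micros: int
    currency: str = "USD"


def pattern_matches(pattern: str, path: str) -> bool:
    """Literal match, or prefix match when the pattern ends in `*`."""
    if pattern.endswith("*"):
        return path.startswith(pattern[:-1])
    return path == pattern


def resolve_price_micros(path: str, tiers: List[Tier], fallback_micros: int) -> int:
    """Return the price of the first matching tier, else the fallback."""
    for tier in tiers:
        if pattern_matches(tier.route_pattern, path):
            return tier.amount_micros
    return fallback_micros


def micros_to_decimal_string(amount_micros: int) -> str:
    """Render micros as a decimal string using integer arithmetic only."""
    whole, frac = divmod(amount_micros, 1_000_000)
    return f"{whole}.{frac:06d}".rstrip("0").rstrip(".")


tiers = [
    Tier("/premium/*", 5000),
    Tier("/articles/*", 1000),  # listed first, so it wins for /articles/*
    Tier("/articles/*", 500),   # never reached while shape is advisory
    Tier("/docs/*", 250),
]
```

With the tier list above, `/articles/foo` resolves to 1000 micros because the second `/articles/*` tier is shadowed by the first, and an unmatched path such as `/about` falls back to whatever the caller supplies for the top-level price.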

### Per-shape pricing

`content_shape` is advisory: configurations may set the field on a tier so metrics and the redeem payload carry the shape, but the policy does not yet match against it. The wire format is stable, so configurations that set `content_shape` today will keep working when the resolver lands.

## HTTP ledger

The OSS in-memory ledger (`valid_tokens:`) is fine for tests, fixed-token issuance, or one-off content gates. Production deployments with multiple proxy replicas need a network-callable ledger so one token spends across all nodes. The HTTP ledger client speaks a JSON-over-HTTPS protocol with HMAC-SHA256 envelope signatures over a fixed eight-line canonical form.

```yaml
policies:
  - type: ai_crawl_control
    price: 0.001
    currency: USD
    ledger:
      endpoint: "https://ledger.internal"
      key_id: "sb-ledger-2026-q2"
      key_file: "${SBPROXY_LEDGER_HMAC_KEY_FILE}"
      workspace_id: "default"
      agent_id: "openai-gptbot"   # forwarded into the redeem payload
      agent_vendor: "OpenAI"
      per_attempt_timeout_ms: 5000
      total_timeout_ms: 30000
      max_attempts: 5             # hard-capped at 5 by the ADR
      breaker:
        failure_threshold: 10
        success_threshold: 1
        open_duration_ms: 5000
```

The client refuses to construct against a non-HTTPS endpoint at config-load time. Plain HTTP is a hard error because the request envelope carries an HMAC over the body, and TLS is the only thing keeping the body itself confidential.

### Request envelope

Every redeem call carries the eight-line canonical envelope:

```json
{
  "v": 1,
  "request_id": "01HZX...",
  "timestamp": "2026-04-30T12:34:56.789Z",
  "nonce": "8f4a...32-hex...",
  "agent_id": "openai-gptbot",
  "agent_vendor": "OpenAI",
  "workspace_id": "default",
  "payload": {
    "token": "tok_abc...",
    "host": "blog.example.com",
    "path": "/articles/foo",
    "amount_micros": 1000,
    "currency": "USD",
    "content_shape": "markdown"
  }
}
```

The signature is HMAC-SHA256 over the canonical signing string (eight `\n`-separated fields, last one being the SHA-256 of the request body).
The signature lands in the `X-Sb-Ledger-Signature: v1=` header. The `v1=` prefix reserves room for future MAC migrations without breaking peers.

### Idempotency

Every attempt carries an `Idempotency-Key` header (a fresh ULID per logical operation). Retries reuse the same key; the ledger short-circuits the second attempt with the cached response. A different body under the same key returns 409 `ledger.idempotency_conflict`, which protects against accidental key reuse across operations.

`Idempotency-Key` is distinct from the envelope's `request_id`: the request id identifies the inbound 402 from the agent, while the idempotency key identifies a single conversation with the ledger about that request.

### Retry and circuit breaker

Exponential backoff with full jitter, max 5 attempts, per-attempt deadline 5 s, total deadline 30 s. The base schedule is 0 ms, 250 ms, 500 ms, 1 s, 2 s, each with `[0, base)` jitter added.

Retries fire only on:

- network errors (DNS, TCP RST, TLS handshake, read timeout)
- HTTP 429 (with `Retry-After` honoured)
- HTTP 502 / 503 / 504
- error envelopes with `retryable: true`

Hard failures (`ledger.token_already_spent`, `ledger.signature_invalid`, `ledger.bad_request`) translate directly to a 402 to the crawler. There is no point retrying a token the ledger already rejected as spent.

The circuit breaker opens after 10 consecutive failures over a 30 s window, half-opens after 5 s with one probe, and closes on probe success. While the breaker is open, the client returns a synthetic `ledger.unavailable` error without making the network call. The policy treats that as "ledger is down" and applies the configured `on_ledger_failure` action (default fail-closed).

A 503 response with `Retry-After` propagates straight to the crawler: the 402 response carries `Retry-After` so the crawler knows when to come back. This is the one case where the policy emits `Retry-After` on a 402.
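
As a rough illustration of the retry rules in this section, the Python sketch below computes the per-attempt delay schedule and classifies which outcomes are retryable. The function names are hypothetical, and the jitter is read literally from the text above: each delay is the base value plus a uniform draw from `[0, base)`. The real client may compute jitter differently, and this sketch leaves out the deadlines, `Retry-After` handling, and the circuit breaker.

```python
import random
from typing import List, Optional

BASE_SCHEDULE_MS = [0, 250, 500, 1000, 2000]  # one base delay per attempt
RETRYABLE_STATUSES = {429, 502, 503, 504}


def attempt_delays_ms(rng: random.Random) -> List[float]:
    """Delay before each attempt: the base plus jitter drawn from [0, base)."""
    return [base + rng.random() * base for base in BASE_SCHEDULE_MS]


def is_retryable(status: Optional[int], envelope_retryable: bool = False) -> bool:
    """Decide whether another attempt is allowed.

    `status` is None for a network error (DNS, TCP RST, TLS handshake,
    read timeout); `envelope_retryable` mirrors `retryable: true` in an
    error envelope.
    """
    if status is None:
        return True
    return status in RETRYABLE_STATUSES or envelope_retryable


delays = attempt_delays_ms(random.Random(7))  # seeded for repeatable runs
```

The first attempt always fires immediately because its base is 0 ms, and a hard failure such as a 409 without `retryable: true` is rejected by `is_retryable`, matching the "no point retrying" rule above.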

### Failure modes

| Ledger response | Policy action |
|---|---|
| 200 success, redeemed | Pass the request through. |
| 200 success, not redeemed | 402 with the challenge body. The token was valid format but the ledger refused (out of balance, expired). |
| 409 `token_already_spent` | 402, no retry. |
| 4xx other | 402, no retry, log at WARN. |
| 5xx, transient envelope, breaker open | Apply `on_ledger_failure` (default fail-closed -> 503). |

## Agent classes and per-vendor pricing

An `agent_class` taxonomy lets metrics, audit logs, and ledger payloads attribute revenue per vendor. The agent class is resolved at request time via three signals (in order of confidence):

1. Verified Web Bot Auth `keyid` matches an `expected_keyids` entry. Highest confidence.
2. Forward-confirmed reverse-DNS suffix matches an `expected_reverse_dns_suffixes` entry. Strong confidence.
3. User-Agent regex match. Advisory unless the policy explicitly trusts UAs.

Three reserved sentinels round out the resolver:

- `human` is emitted when no automated-agent signal is present.
- `unknown` is the fall-through bucket for an automated UA without a registry match.
- `anonymous` is emitted for anonymous Web Bot Auth requests with no known `keyid`.

Operators see all three values in metrics and dashboards; alerting on a sustained climb in `unknown` is the normal way to spot a new crawler that needs a registry entry.

### Per-vendor pricing example

```yaml
agent_classes:
  - id: openai-gptbot
    vendor: OpenAI
    purpose: training
    expected_user_agent_pattern: "(?i)\\bGPTBot/\\d"
    expected_reverse_dns_suffixes: [".gptbot.openai.com"]
  - id: anthropic-claudebot
    vendor: Anthropic
    purpose: training
    expected_user_agent_pattern: "(?i)\\bClaudeBot/\\d"
  - id: commoncrawl-ccbot
    vendor: Common Crawl
    purpose: archival
    expected_user_agent_pattern: "(?i)\\bCCBot/\\d"

policies:
  - type: ai_crawl_control
    currency: USD
    tiers:
      # Training crawlers pay full price.
- route_pattern: /articles/* agent_id: openai-gptbot price: { amount_micros: 2000, currency: USD } - route_pattern: /articles/* agent_id: anthropic-claudebot price: { amount_micros: 2000, currency: USD } # Archival crawlers get a discount. - route_pattern: /articles/* agent_id: commoncrawl-ccbot price: { amount_micros: 500, currency: USD } # Sentinel buckets price differently for diagnostics. - route_pattern: /articles/* agent_id: anonymous price: { amount_micros: 1000, currency: USD } - route_pattern: /articles/* agent_id: unknown price: { amount_micros: 1500, currency: USD } ``` `agent_id` on a tier matches against the resolver's verdict. The first tier whose route pattern AND agent id both match wins. A tier without `agent_id` matches every agent. The eight default agent classes (`openai-gptbot`, `openai-chatgpt-user`, `anthropic-claudebot`, `perplexity-perplexitybot`, `google-googlebot`, `google-extended`, `microsoft-bingbot`, `duckduckgo-duckduckbot`, `apple-applebot`, `commoncrawl-ccbot`) ship embedded in the binary. Operators extend or override entries inline in `sb.yml`. ## Observability Every redeem fires a metric and a structured-log line. The label set: | Label | Source | Cardinality cap | |---|---|---| | `agent_id` | Agent-class resolver. Bounded to registry plus `human`, `unknown`, `anonymous` sentinels. | 200 | | `agent_class` | Closed enum from the taxonomy. | 8 | | `agent_vendor` | Free-form vendor name from the taxonomy. | 20 | | `payment_rail` | Closed enum: `none`, `x402`, `mpp_card`, `mpp_stablecoin`, `stripe_fiat`, `lightning`. | 6 | | `content_shape` | Closed enum: `html`, `markdown`, `json`, `pdf`, `other`. | 5 | Cardinality budgets are enforced by `sbproxy-observe::cardinality::CardinalityLimiter`; over-cap label values demote to `__other__` and increment `sbproxy_label_cardinality_overflow_total`. 
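
The demotion rule can be pictured with a small Python sketch. The class and method names below are hypothetical and do not mirror the real `CardinalityLimiter` API; the sketch only assumes that each label has a fixed cap, that already-seen values keep passing through, and that any new value beyond the cap is rewritten to `__other__` while an overflow counter increments.

```python
class LabelCardinalityLimiter:
    """Toy per-label cardinality cap (illustrative, not the sbproxy type)."""

    def __init__(self, caps):
        self.caps = dict(caps)                      # label name -> max distinct values
        self.seen = {name: set() for name in caps}  # distinct values admitted so far
        self.overflow_total = 0                     # stands in for the overflow counter

    def admit(self, label, value):
        """Return the value to emit on the metric for this label."""
        seen = self.seen[label]
        if value in seen:
            return value                # known value: always allowed
        if len(seen) < self.caps[label]:
            seen.add(value)             # still under the cap: admit and remember
            return value
        self.overflow_total += 1        # over the cap: demote and count
        return "__other__"
```

With a cap of 2 on `content_shape`, the first two distinct shapes pass through and a third is demoted, while earlier values stay intact.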
### Metrics

| Metric | Type | Notes |
|---|---|---|
| `sbproxy_ledger_redeem_total{result, agent_id, agent_vendor, payment_rail}` | counter | Per-redeem outcome. `result` is one of `success`, `denied`, `error`. |
| `sbproxy_ledger_redeem_duration_seconds_bucket` | histogram | Tail-latency of the ledger round-trip. Carries trace exemplars. |
| `sbproxy_ledger_circuit_breaker_state{endpoint}` | gauge | 0 closed, 1 half-open, 2 open. |
| `sbproxy_ledger_circuit_breaker_transitions_total{endpoint, from, to}` | counter | Breaker flap counter. |
| `sbproxy_requests_total{agent_id, agent_class, agent_vendor, payment_rail, content_shape}` | counter | Per-request outcome. |

The per-agent dashboard (`deploy/dashboards/per-agent.json`) groups every panel by `agent_class` plus `agent_vendor`, so operators see one row per vendor and one row each for the sentinels. The audit-log dashboard (`deploy/dashboards/audit-log.json`) shows admin actions on `ai_crawl_control` tier edits.

### Tracing

The HTTP ledger client emits one outbound span per attempt, named `sbproxy.ledger.redeem`. The span carries `sbproxy.ledger.idempotency_key` so operators correlating across the proxy and the ledger can grep both sides for the same key. W3C TraceContext propagates on the outbound request; if the ledger emits OTel spans, the trace stitches end-to-end without manual correlation.

Exemplars on `sbproxy_ledger_redeem_duration_seconds_bucket` let Grafana jump from "this latency outlier" straight to the matching trace in Tempo.

## Limitations

- Detection is User-Agent based by default. Crawlers that lie about their UA bypass the check unless reverse-DNS or Web Bot Auth signals catch them; layer this with bot-detection or WAF policies for defence in depth.
- The OSS in-memory ledger is single-process. Multi-replica deployments without an HTTP ledger need sticky session affinity to one replica.
- `content_shape` is advisory. The field flows through metrics and the redeem payload but is not yet used as a tier filter.
- Per-agent pricing requires the agent-class resolver to be enabled; the resolver runs unconditionally by default, but operators who explicitly disable it fall back to UA-only matching and lose the per-vendor distinction.

## See also

- [configuration.md](configuration.md#ai_crawl_control) - schema reference.
- [ai-gateway.md](ai-gateway.md) - how this policy interacts with `ai_proxy` upstreams.
- [observability.md](observability.md) - metrics, logs, traces, dashboards.
- `examples/ai-crawl-control/` - runnable example.

================================================================
# docs/ai-gateway.md
================================================================

## SBproxy AI gateway guide

*Last modified: 2026-06-06*

SBproxy includes an AI gateway that sits between your application and LLM providers. You get one API endpoint with automatic failover, cost tracking, rate limits, and programmable routing across OpenAI, Anthropic, and other providers.

The proxy ships with 66 native providers behind one OpenAI-compatible API, including a native Anthropic translator. You bring your own provider keys and the model name passes straight through, so you reach 200+ models without waiting on us to add them.

## Provider setup

Configure one or more providers under the `action` block. Each provider needs a name, API key, and model list:

```yaml
origins:
  "ai.example.com":
    action:
      type: ai_proxy
      providers:
        - name: openai
          api_key: ${OPENAI_API_KEY}
          models: [gpt-4o, gpt-4o-mini, gpt-4-turbo]
        - name: anthropic
          api_key: ${ANTHROPIC_API_KEY}
          models: [claude-sonnet-4-20250514, claude-3-5-haiku-20241022]
      default_model: gpt-4o-mini
      routing:
        strategy: round_robin
```

API keys support environment variable interpolation with `${VAR_NAME}` syntax. Never put raw keys in config files.

### Native providers

66 native providers ship in-tree alongside a native Anthropic translator. You bring your own key per provider and the `model` field passes straight through, so the gateway reaches 200+ models (and any model a provider ships next) without enumerating them.

Direct adapters include `openai`, `anthropic`, `gemini`, `azure`, `bedrock`, `cohere`, `mistral`, `groq`, `deepseek`, `together`, `fireworks`, `cerebras`, `sambanova`, `nvidia`, `vertex`, `databricks`, `huggingface`, `vllm`, and `openrouter`. Any model a listed provider serves works without extra config. For a self-hosted or proprietary endpoint, point `vllm` or any provider at it with a custom `base_url`. `openrouter` is available as one of the providers when you want many vendors behind a single key.

See `providers.md` for the full per-provider table.

## Routing strategies

The `routing.strategy` field controls how the proxy picks a provider for each request.

### round_robin

Spreads requests evenly across healthy providers. A reasonable default.

```yaml
routing:
  strategy: round_robin
```

### weighted

Assigns a weight to each provider. Higher weight means more traffic.

```yaml
routing:
  strategy: weighted
```

### fallback_chain

Tries providers in priority order. When the selected provider fails or returns 5xx, the router moves to the next provider.

```yaml
routing:
  strategy: fallback_chain
```

### cost_optimized

Picks the cheapest provider that is not already loaded. The router scores each provider as `in_flight_requests * 1000 + weight` and routes to the lowest score. Set a lower `weight` on cheaper providers so they win ties when utilization is similar.

```yaml
routing:
  strategy: cost_optimized
```

### lowest_latency

Routes to the provider with the lowest observed latency based on recent request history.

```yaml
routing:
  strategy: lowest_latency
```

### least_connections

Routes to the provider with the fewest in-flight requests.

```yaml
routing:
  strategy: least_connections
```

### sticky

Pins a user or session to the same provider. Falls back to round_robin for the initial pick.
```yaml
routing:
  strategy: sticky
```

### random

Picks a provider uniformly at random. Useful for spreading load when no other signal applies.

```yaml
routing:
  strategy: random
```

### token_rate

Routes to the provider with the most remaining token-per-minute capacity. Pair with per-provider token limits so the router can score headroom.

```yaml
routing:
  strategy: token_rate
```

### race

Fans the request out to every eligible provider in parallel, returns the first 2xx, cancels the in-flight losers. Optimizes p99 latency at the cost of N times the API spend per request. Pair with `resilience` so persistently slow providers fall out of the eligible set.

```yaml
routing:
  strategy: race
```

See [examples/ai-race](../examples/ai-race/sb.yml).

### least_token_usage

Routes to the provider with the lowest absolute observed token throughput in the current minute, regardless of any configured limit. Unlike `token_rate`, which scores remaining headroom against a declared per-provider TPM cap, this scores raw observed throughput, so it suits self-hosted vLLM or SGLang pools that do not pre-declare a token cap. Untried providers sort lowest and are explored first.

```yaml
routing:
  strategy: least_token_usage
```

### prefix_affinity

Hashes a stable prefix of the request body to an enabled provider so requests that share a prompt prefix land on the same upstream and reuse its KV cache (vLLM, SGLang). The hash is deterministic and stable across reloads as long as the provider list does not reorder. Falls back to round_robin when no prefix can be extracted.

```yaml
routing:
  strategy: prefix_affinity
```

### peak_ewma

Power-of-two-choices over observed latency: sample two eligible providers and route to the one with the lower recently observed latency. Cuts tail latency under skewed load versus always picking the single lowest-latency provider, which herds traffic. An untried provider is explored first.
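
A minimal Python sketch of the power-of-two-choices pick described above. The function name and the `latency_ms` mapping are hypothetical, and how the router smooths observed latency is not specified here, so the sketch takes one number per provider as given, with `None` marking an untried provider.

```python
import random


def pick_provider(latency_ms, rng=random):
    """Pick a provider by power-of-two-choices over observed latency.

    latency_ms maps provider name -> recent latency in ms, or None if the
    provider has not been tried yet (illustrative shape, not sbproxy's).
    """
    if not latency_ms:
        raise ValueError("no eligible providers")
    # Untried providers are explored before any latency comparison.
    untried = [name for name, seen in latency_ms.items() if seen is None]
    if untried:
        return untried[0]
    candidates = list(latency_ms)
    if len(candidates) == 1:
        return candidates[0]
    # Sample two distinct providers and keep the faster of the pair.
    a, b = rng.sample(candidates, 2)
    return a if latency_ms[a] <= latency_ms[b] else b
```

Because only two candidates are compared per request, the fastest provider wins most often without receiving every request, which is what avoids the herding effect.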
```yaml
routing:
  strategy: peak_ewma
```

### cascade

Tries a sequence of `(provider, model)` tiers from cheapest to most expensive. Each tier's response is graded against its `quality_threshold`; a response that is below threshold, empty, or refused retries on the next tier. `max_total_cost` (micro-USD) is an optional cumulative budget cap. Streaming requests dispatch only to the first tier.

```yaml
routing:
  strategy: cascade
  max_total_cost: 100000
  tiers:
    - provider_id: openai
      model: gpt-4o-mini
      quality_threshold: 0.7
    - provider_id: openai
      model: gpt-4o
      quality_threshold: 0.85
```

See [examples/ai-cascade-routing](../examples/ai-cascade-routing/sb.yml).

### cost_quality

Scores each prompt's difficulty and routes simple prompts to a cheap model and hard prompts to a frontier model, on a single `cost_threshold` dial (`0.0` sends almost everything to the frontier, `1.0` sends almost everything to the cheap model).

```yaml
routing:
  strategy: cost_quality
  cheap_provider: openai-mini
  frontier_provider: openai
  cost_threshold: 0.5
```

## Resilience

Per-provider circuit breaker, outlier detection, and active health probes layered on top of the routing strategy. Each signal independently ejects a provider; when every provider is ejected, the router falls back to the unfiltered enabled list rather than refusing the request.

```yaml
resilience:
  circuit_breaker:
    failure_threshold: 5
    success_threshold: 2
    open_duration_secs: 30
  outlier_detection:
    threshold: 0.5
    window_secs: 60
    min_requests: 5
    ejection_duration_secs: 30
  health_check:
    path: /models
    interval_secs: 30
    timeout_ms: 5000
    unhealthy_threshold: 3
    healthy_threshold: 2
```

See [examples/ai-resilience](../examples/ai-resilience/sb.yml). Field reference in [configuration.md#resilience-resilience](configuration.md#resilience-resilience).

## Shadow eval

Mirror each request to a second provider concurrently. The primary's response is what the client sees; the shadow body is drained and metrics are emitted at `target=sbproxy_ai_shadow` (status, latency, prompt/completion tokens, finish_reason). Useful for prompt regression checks before swapping a primary model.

```yaml
shadow:
  provider: anthropic
  sample_rate: 0.1
  timeout_ms: 30000
```

See [examples/ai-shadow](../examples/ai-shadow/sb.yml).

## Proxy-native AI patterns

SBproxy is a proxy first, so AI traffic composes with everything else the proxy offers: CEL policies, forward rules, regex guardrails, request modifiers. Patterns that are awkward or impossible to express in a pure AI gateway library:

| Pattern | Mechanism | Example |
|---------|-----------|---------|
| Tenant access control before any AI call | `policies` (CEL expression) | [93-ai-cel-tenant-gate](../examples/ai-cel-tenant-gate/sb.yml) |
| Mixed AI + non-AI on one hostname (health probes, docs, model catalog) | `forward_rules` with inline child origins | [94-ai-mixed-traffic](../examples/ai-mixed-traffic/sb.yml) |
| Custom DLP beyond built-in PII (codenames, ticket IDs, internal hostnames) | `guardrails.input` with `regex` patterns | [95-ai-regex-dlp](../examples/ai-regex-dlp/sb.yml) |
| Topic enforcement (allow-list of approved keywords) | `regex` guardrail with `action: allow` | [95-ai-regex-dlp](../examples/ai-regex-dlp/sb.yml) |

CEL policies and request modifiers run before the AI handler dispatches, so a rejection costs no provider tokens. Forward rules dispatch by path, which means health checks and probe traffic can stay on the same hostname without billing a model. Regex guardrails inspect the parsed prompt body and slot in next to PII, injection, jailbreak, and schema guardrails.

## Native format translation

Clients always speak the OpenAI chat completions shape; sbproxy rewrites the body, path, and response back to OpenAI shape when the upstream provider speaks a different protocol.

| Provider format | Direction | Status |
|-----------------|-----------|--------|
| OpenAI | pass-through | always |
| Anthropic Messages API | bidirectional, non-streaming | shipped |
| Anthropic SSE events | streaming | not yet translated, passes through native |
| Google Gemini | bidirectional | not yet implemented |
| AWS Bedrock | bidirectional | not yet implemented |

For Anthropic, the request hoists `system` role messages to the top-level `system` field, defaults `max_tokens` when missing, strips OpenAI-only knobs (`logit_bias`, `n`, `presence_penalty`, `frequency_penalty`, `response_format`, `seed`, `user`), and rewrites the path from `/v1/chat/completions` to `/v1/messages`. The response converts text and tool_use blocks back into the OpenAI `choices[].message.content` and `tool_calls` shape, maps `stop_reason` to `finish_reason`, and renames `usage.input_tokens` / `output_tokens` to `prompt_tokens` / `completion_tokens`.

See [examples/ai-claude](../examples/ai-claude/sb.yml) and [providers.md](providers.md).

## Rate limits

Apply rate limits per client or globally to control costs and prevent abuse:

```yaml
origins:
  "ai.example.com":
    action:
      type: ai_proxy
      providers:
        - name: openai
          api_key: ${OPENAI_API_KEY}
          models: [gpt-4o-mini]
      default_model: gpt-4o-mini
      routing:
        strategy: round_robin
    policies:
      - type: rate_limiting
        requests_per_minute: 100
```

Clients exceeding the limit receive a `429 Too Many Requests` response with a `Retry-After` header.

### Per-surface rate limits

Per-model and per-tenant rate limits cap each user, key, or model independently. The AI gateway also supports per-surface caps that apply to a classified API surface (chat completions, assistants, image generation, audio speech, ...) so expensive paths can be throttled without affecting cheap ones.

```yaml
origins:
  "ai.example.com":
    action:
      type: ai_proxy
      providers:
        - name: openai
          api_key: ${OPENAI_API_KEY}
      per_surface_rate_limits:
        image_generation:
          requests_per_minute: 30
        audio_speech:
          requests_per_minute: 60
        chat_completions:
          requests_per_minute: 600
```

Keys are the `AiSurface` labels emitted on metrics (`chat_completions`, `models`, `embeddings`, `assistants`, `threads`, `batches`, `fine_tuning`, `files`, `realtime`, `image_generation`, `image_edits`, `image_variations`, `audio_transcription`, `audio_speech`, `moderations`, `reranking`). Surfaces without an entry are uncapped. When the cap fires, the proxy returns 429 before any upstream call. The sliding window is one minute, shared across all configured origins (state is process-global). Audio-seconds-per-hour caps for realtime sessions are reserved for the realtime dispatch phase.

## Guardrails

The proxy supports nine guardrail types: `pii`, `injection`, `jailbreak`, `toxicity`, `content_safety`, `schema`, `regex`, `context_poisoning`, and `agent_alignment`. Guardrails run on input (before the provider call) or output (after), and they can block, flag, or rewrite content. See the CEL guardrails section below for inline CEL conditions, and `features.md` for the higher-level configuration of each guardrail type.

Input guardrails apply to whichever body field the surface carries user text in:

| Surface | Field guarded |
|---|---|
| `chat_completions`, `assistants`, `threads` | `body["messages"][].content` |
| `image_generation`, `image_edits`, `image_variations` | `body["prompt"]` |
| `audio_speech` | `body["input"]` |
| `reranking` | `body["query"]` |
| `moderations` | `body["input"]` |

A single guardrail block on the AI handler config covers every supported surface; the proxy picks the right field automatically based on the classified surface.
Multipart-bodied surfaces (image edits, image variations, audio transcription) bypass the input-guardrail check today because their bodies are forwarded byte-transparently; output-side scanning for those surfaces is reserved for a follow-up.

### Streaming policy

A guardrail is *streaming-safe* when its block decision for a chunk is final once that chunk has been evaluated, so later chunks cannot change it. The proxy classifies the built-in guardrails as follows:

| Guardrail | Streaming-safe | Reason |
|---|---|---|
| `regex` | yes | per-chunk regex match is stable |
| `pii` | yes | PII patterns match per-chunk |
| `schema` | yes | JSON schema validation is decided on the parsed value |
| `context_poisoning` | yes | rule matches are per-message |
| `injection` | no | multi-token context windows; partial windows produce false negatives |
| `toxicity` | no | full-text classifier; partial-window scores are misleading |
| `jailbreak` | no | multi-pattern + multi-token detector |
| `content_safety` | no | full-text classifier (self-harm, violence, etc.) |
| `agent_alignment` | no | runs on the input body only (it inspects assistant tool_calls); streaming output is not in scope |

On the buffered (non-streaming) path the proxy runs every configured output guardrail against the full response. On the streaming output path the proxy runs only the streaming-safe guardrails on each chunk; non-safe guardrails are skipped because evaluating them against a partial window produces both false positives (tripping on benign mid-stream substrings) and false negatives (missing late-stream signal). Input guardrails always run against the full request regardless of `stream`.

Operators who want a non-safe guardrail to apply to streaming responses anyway should accept the partial-window risk explicitly and run a second buffered pass once the stream closes; the per-entry `streaming_safe` override for that case is planned for a follow-up.
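
The selection rule can be summarised in a few lines of Python. This is a sketch rather than the proxy's code: the `STREAMING_SAFE` mapping restates the classification table, and `output_guardrails_to_run` is a hypothetical helper name.

```python
# Classification copied from the table above.
STREAMING_SAFE = {
    "regex": True,
    "pii": True,
    "schema": True,
    "context_poisoning": True,
    "injection": False,
    "toxicity": False,
    "jailbreak": False,
    "content_safety": False,
    "agent_alignment": False,
}


def output_guardrails_to_run(configured, streaming):
    """Return the output guardrails that apply on the given response path."""
    if not streaming:
        # Buffered path: every configured guardrail sees the full response.
        return list(configured)
    # Streaming path: only streaming-safe guardrails run on each chunk;
    # unrecognised types are treated as non-safe and skipped.
    return [name for name in configured if STREAMING_SAFE.get(name, False)]
```

For a configuration of `pii`, `toxicity`, and `regex`, a streaming response is checked only by `pii` and `regex`, while a buffered response is checked by all three.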

### Context-poisoning guardrail

The `context_poisoning` input guardrail flags untrusted retrieval content that tries to manipulate the model before a downstream tool call. This is the indirect prompt injection vector from Greshake et al. (2023): a RAG pipeline pulls a poisoned page into the model's context, and the model then issues a tool call influenced by that content.

The check runs on the full input, including any `role: tool` or `role: function` messages that the AI gateway treats as retrieval content. Findings carry a stable `rule_id` and a confidence weight; the `min_confidence` setting filters out low-weight rules.

```yaml
guardrails:
  input:
    - type: context_poisoning
      enabled: true
      action: deny            # log | score | deny (default deny)
      min_confidence: 0.5
      rules:                  # optional allowlist; omit for all rules
        - cp_instruction_ignore_previous
        - cp_tool_call_scaffold
        - cp_encoded_instruction
        - cp_conflicting_directive
```

The rule catalogue covers four families:

| Family | Sample rule IDs | Detects |
|---|---|---|
| Instruction-like patterns | `cp_instruction_ignore_previous`, `cp_instruction_you_are_now`, `cp_instruction_system_prompt_leak`, `cp_suspicious_url` | "ignore previous instructions" style payloads, role-swap framings, exfiltration URL shapes |
| Tool-call hints | `cp_tool_call_scaffold`, `cp_tool_call_json_shape` | Literal ``, `function_call:`, or JSON tool invocations inside passive content |
| Encoded instructions | `cp_encoded_instruction` | Base64 and hex blobs that decode to instruction-like text |
| Conflicting directives | `cp_conflicting_directive`, `cp_instruction_imperative_regex` | Imperative second-person language in `role: tool` or `role: function` content |

Every hit emits `sbproxy_ai_context_poisoning_findings_total{rule_id, action}`. When `action: deny`, the request is also counted in `sbproxy_ai_context_poisoning_blocked_total` and the proxy returns a 4xx before any upstream call. `action: log` and `action: score` keep the request flowing; they differ only in the metric label so dashboards can separate observability volume from scoring volume.

See `examples/ai-context-poisoning/` for a complete sample configuration and curl commands.

### Agent-alignment guardrail

The `agent_alignment` input guardrail audits the assistant's `tool_calls` array against operator-declared rules: an allow list of tools the agent is permitted to invoke, an explicit deny list that always trips even when allowed elsewhere, a forbidden-substring scan over the tool arguments, and a per-turn budget on the number of tool calls. The check is the LlamaFirewall (arXiv:2505.03574) "Agent Alignment Check" use case rendered as a deterministic ruleset so the per-request cost is bounded; an LLM-judge advisory variant rides a follow-up and slots into the same configuration.

Unlike the other guardrails this one runs against the raw request body so it can read the OpenAI / Anthropic / MCP tool-call shapes; the flat-text view that backs `pii` / `injection` / etc. strips `tool_calls` and would silently miss the goal-divergence cases.

```yaml
guardrails:
  input:
    - type: agent_alignment
      enabled: true
      mode: flag              # flag (default, observability only) | block
      allowed_tools: [search, fetch]
      denied_tools: [delete_account]
      forbidden_arg_substrings:
        - "/etc/passwd"
        - "AKIA"              # leaked AWS-key shapes
      max_tool_calls_per_turn: 4
```

`mode: flag` records every violation as a log line + access-log entry but lets the request through; once the operator has tuned the rule lists they flip to `mode: block` so the dispatch loop short-circuits to a 400 on the next violation.

Tool calls in any of three shapes are recognised: OpenAI (`tool_calls[*].function.name` + `function.arguments`), Anthropic (`tool_calls[*].name` + `input`), and MCP (`tool_calls[*].tool` or `tool_calls[*].name` + `arguments`). The forbidden-substring scan is case-insensitive against the JSON encoding of whichever argument field is present.

See `examples/ai-agent-alignment/` for a runnable configuration that exercises every rule.

## Lua hooks

Use Lua scripts for more complex routing logic. Lua hooks run in a sandbox with access to request context variables.

Example: route coding questions to Anthropic based on the request path:

```yaml
origins:
  "ai.example.com":
    action:
      type: ai_proxy
      providers:
        - name: openai
          api_key: ${OPENAI_API_KEY}
          models: [gpt-4o-mini]
        - name: anthropic
          api_key: ${ANTHROPIC_API_KEY}
          models: [claude-sonnet-4-20250514]
      default_model: gpt-4o-mini
      routing:
        strategy: round_robin
    request_modifiers:
      lua:
        script: |
          local path = request.path
          if string.find(path, "/code") then
            return { add_headers = { ["X-Preferred-Provider"] = "anthropic" } }
          end
          return {}
```

## CEL guardrails

Block or modify AI requests with CEL expressions:

```yaml
origins:
  "ai.example.com":
    action:
      type: ai_proxy
      providers:
        - name: openai
          api_key: ${OPENAI_API_KEY}
          models: [gpt-4o-mini]
      default_model: gpt-4o-mini
      routing:
        strategy: round_robin
    policies:
      - type: rate_limiting
        requests_per_minute: 100
    request_modifiers:
      cel:
        - expression: >
            request.headers['x-department'] == ''
              ? {"set_headers": {"X-Block": "true"}}
              : {}
```

## Budgets

Set token or dollar caps that apply across a workspace, a single virtual key, an end user, a model, an origin, or a metadata tag. The `budget` block sits under `action` and is parsed by `BudgetConfig` in `crates/sbproxy-ai/src/budget.rs`.
```yaml
action:
  type: ai_proxy
  budget:
    on_exceed: downgrade
    limits:
      - scope: workspace
        max_cost_usd: 500
        period: monthly
      - scope: api_key
        max_tokens: 1000000
        period: daily
        downgrade_to: gpt-4o-mini
      - scope: user
        max_cost_usd: 5
        period: daily
      - scope: model
        max_tokens: 200000
        period: daily
      - scope: origin
        max_cost_usd: 50
        period: daily
      - scope: tag
        max_cost_usd: 25
        period: monthly
```

### `budget` fields

| Field | Type | Default | Notes |
|-------|------|---------|-------|
| `limits` | list | `[]` | One or more `BudgetLimit` entries. Each is checked on every request. |
| `on_exceed` | enum | `block` | One of `block`, `log`, `downgrade`. Applies to whichever limit fires. |

### `BudgetLimit` fields

| Field | Type | Default | Notes |
|-------|------|---------|-------|
| `scope` | enum | required | One of `workspace`, `api_key`, `user`, `model`, `origin`, `tag`. |
| `max_tokens` | u64 | unset | Total prompt + completion tokens allowed for the scope. |
| `max_cost_usd` | f64 | unset | Total cost ceiling in USD across all requests in the scope. |
| `period` | string | unset | One of `daily`, `weekly`, `monthly`, `total`. Window over which usage accumulates. |
| `downgrade_to` | string | unset | Model name routed to when this limit fires and `on_exceed` is `downgrade`. |

### Behaviour notes

- A limit fires the first time `usage >= max_tokens` or `usage >= max_cost_usd`. Limits are checked in declaration order and the first match wins.
- `on_exceed: log` records a warning and an `sbproxy_ai_budget_utilization_ratio` gauge update, then lets the request through.
- `on_exceed: downgrade` swaps the request's model to the firing limit's `downgrade_to` and proceeds. If `downgrade_to` is unset, the request is blocked.
- Setting only `max_tokens` and leaving `max_cost_usd` unset (or vice versa) is supported. A limit with neither field is a no-op.
- A hierarchical view (`org`, `team`, `project`, `user`, `model` keys with 80% warning band) is exposed to in-process callers via `HierarchicalBudget` in `hierarchical_budget.rs`. There is no top-level YAML knob for it today; it is wired by the runtime when the gateway tracks spend.

## Virtual API keys

Issue per-team or per-app keys that the gateway validates locally. Each key can restrict allowed providers and models, set its own request and token rates, carry its own budget ceiling, and tag requests for downstream attribution. The `virtual_keys` list sits under `action` and is parsed by `VirtualKeyConfig` in `crates/sbproxy-ai/src/identity.rs`.

```yaml
action:
  type: ai_proxy
  virtual_keys:
    - key: ${TEAM_A_KEY}
      name: team-a
      enabled: true
      allowed_providers: [openai, anthropic]
      allowed_models: [gpt-4o-mini, claude-3-5-haiku-20241022]
      blocked_models: [gpt-4-turbo]
      max_requests_per_minute: 60
      max_tokens_per_minute: 200000
      budget:
        max_tokens: 5000000
        max_cost_usd: 100
      tags: [team-a, beta]
```

### `virtual_keys[]` fields

| Field | Type | Default | Notes |
|-------|------|---------|-------|
| `key` | string | required | The token clients send. Treat it like a secret and inject via `${VAR}`. |
| `name` | string | unset | Human label used in logs and metrics. |
| `enabled` | bool | `true` | Disable a key without deleting the entry. |
| `allowed_providers` | list of string | `[]` | Empty list allows all configured providers. |
| `allowed_models` | list of string | `[]` | Empty list allows all models. Otherwise the request model must match one entry. |
| `blocked_models` | list of string | `[]` | Takes precedence over `allowed_models`. A blocked model is rejected even if it appears in the allow list. |
| `max_requests_per_minute` | u64 | unset | Per-key RPM cap. The 60-second window starts on the first request and resets after one minute of wall time. |
| `max_tokens_per_minute` | u64 | unset | Per-key TPM cap. Tokens are recorded after the response is read. |
| `budget` | object | unset | `KeyBudget` with `max_tokens` and `max_cost_usd`. Independent of the global `budget` block. |
| `tags` | list of string | `[]` | Free-form labels attached to every request the key authenticates. Surfaced in logs and emitted in the `sbproxy_ai_key_*` metric labels. |

Per-key usage shows up in the `sbproxy_ai_key_*` metrics.

## Caching

Three independent caches sit in front of providers. Each has its own runtime configuration in `crates/sbproxy-ai/src/`. Hit and miss counts land in `sbproxy_ai_cache_results_total`.

### Exact prompt cache

Hashes the request body and serves byte-for-byte hits. Implemented in `prompt_cache.rs`. The cache key is the SHA-256 of the canonicalised JSON `messages` array, so request key ordering does not affect lookups. The module also detects Anthropic's native `cache_control` blocks (top-level `system`, per-message, or per-content-part) and lets those pass through to the upstream provider.

The exact-match path is a runtime construct rather than an `action` field today. It is enabled implicitly when the gateway is built with a cache backing store. There are no YAML knobs for the exact prompt cache.

### Semantic cache

Stores responses keyed by the SHA-256 of the messages array with TTL and capacity bounds. Implemented in `semantic_cache.rs` as `SemanticCache`. The constructor takes `max_entries: usize` and `ttl_secs: u64`; entries are evicted with an insert-order LRU when the cache is full, and lazily expired on lookup.

| Field | Type | Default | Notes |
|-------|------|---------|-------|
| `max_entries` | usize | constructor arg | Hard cap on cached responses. The oldest insert is evicted on overflow. |
| `ttl_secs` | u64 | constructor arg | Seconds before an entry is treated as a miss and removed. |

The semantic cache is configured via per-origin `extensions.semantic_cache` rather than `action.semantic_cache`. Example:

```yaml
origins:
  ai.example.com:
    action:
      type: ai_proxy
      providers: [...]
    extensions:
      semantic_cache:
        enabled: true
        ttl_secs: 1200
        key_template: "{embedding_model}:{lsh_bucket}"
```

The `extensions` map is opaque to the OSS config parser; runtime components that recognise the key apply it.

### Idempotency middleware (IETF `Idempotency-Key` header draft)

Engages on `action: ai_proxy` origins when an `Idempotency-Key` header is present on a POST / PUT / PATCH request. The middleware sits ahead of the upstream provider call: on a cache hit the gateway replays the cached `(status, headers, body)` triple directly to the client with `x-sbproxy-idempotency: HIT` and never contacts the provider, so Stripe-style retries do not double-bill the upstream. On a body conflict the gateway returns 409 `ledger.idempotency_conflict`. On a miss the gateway forwards and records the post-translation OpenAI-shape bytes the client saw so retries replay byte-identical.

Per-origin caps (`max_request_body_bytes`, `max_response_body_bytes`, `max_concurrent_buffers`) bound memory and skip caching gracefully when a request exceeds them. Skip reasons stamp on the outgoing response as `x-sbproxy-idempotency: SKIPPED-...` so operators can spot graceful degradation in dashboards. Configuration is identical to general HTTP origins: see the `idempotency:` block reference under [`configuration.md`](configuration.md).

v1 limitations: multipart request bodies (audio transcription, image edit / variation, file upload) are not cached, and SSE streaming responses abandon the cache record above the response cap.

## Per-provider limits

The proxy reads rate limit headers off provider responses and pre-emptively throttles when remaining capacity falls under a configured fraction. Implemented in `provider_ratelimit.rs` as `ProviderRateLimitTracker`.
Recognised response headers (case-insensitive): - `x-ratelimit-remaining-requests`, `x-ratelimit-remaining-tokens` - `x-ratelimit-reset-requests`, `x-ratelimit-reset-tokens` (formats: `1s`, `500ms`, plain seconds) - `retry-after` (plain seconds) - `anthropic-ratelimit-requests-remaining`, `anthropic-ratelimit-tokens-remaining` - `anthropic-ratelimit-requests-reset` The tracker takes a single `throttle_threshold: f64` between 0.0 and 1.0. The implementation throttles when remaining requests fall to or below `floor(1000 * threshold)`, treating 1000 req/min as a baseline. Default: `0.1`, which throttles at 100 remaining requests or fewer. | Field | Type | Default | Notes | |-------|------|---------|-------| | `throttle_threshold` | f64 | `0.1` | Clamped to `[0.0, 1.0]`. Lower values delay throttling until the provider is closer to its hard limit. | Per-provider throttling is a runtime construct. There is no top-level YAML field; the tracker is instantiated alongside the provider pool and updated from every upstream response. For per-model rate limits configurable in YAML, use `model_rate_limits` on the `action` block. The struct is `ModelRateConfig` in `ratelimit.rs`: ```yaml action: type: ai_proxy model_rate_limits: gpt-4o: requests_per_minute: 200 tokens_per_minute: 400000 claude-sonnet-4-20250514: requests_per_minute: 100 tokens_per_minute: 200000 ``` | Field | Type | Default | Notes | |-------|------|---------|-------| | `requests_per_minute` | u64 | unset | Sliding one-minute window cap on requests for the model. | | `tokens_per_minute` | u64 | unset | Sliding one-minute window cap on tokens for the model. | ## Model aliases Map friendly names onto specific provider plus model pairs, with optional deprecation pointers. Implemented in `model_alias.rs` as `ModelAliasRegistry`, with each entry typed as `ModelAlias`. The registry is constructed by the runtime; entries deserialise from YAML or JSON when loaded. 
```yaml
model_aliases:
  - alias: fast
    provider: openai
    model_id: gpt-4o-mini
  - alias: smart
    provider: anthropic
    model_id: claude-sonnet-4-20250514
  - alias: claude-old
    provider: anthropic
    model_id: claude-3-opus-20240229
    deprecated: true
    replacement: smart
```

### `ModelAlias` fields

| Field | Type | Default | Notes |
|-------|------|---------|-------|
| `alias` | string | required | The friendly name clients send. |
| `provider` | string | required | Provider name to route to. |
| `model_id` | string | required | The model ID actually sent upstream. |
| `deprecated` | bool | `false` | When true, a warning is logged on every resolution. |
| `replacement` | string | unset | Suggested alias to migrate to. Surfaces in the deprecation log line. |

Resolution returns `None` for unknown names so the request falls back to literal model ID matching. Re-registering the same alias overwrites the previous entry.

The alias registry is wired by the runtime rather than read off the `action` block. Treat the YAML above as the canonical shape when serialising aliases for code paths that load them.

## Supported endpoints

Every inbound request to an `action: ai_proxy` origin is classified into an `AiSurface` by `classify_surface(method, path)` in `crates/sbproxy-ai/src/handler.rs`. The classifier accepts canonical OpenAI paths with optional `/v1` or `/api/v1` prefix and any trailing slash. The surface label appears on the per-surface metrics, on the request tracing span, and on every per-surface decision (rate limit, guardrail extractor, 501 gate).

Provider capability is the source of truth for which surfaces a configured provider can serve. The matrix lives in `crates/sbproxy-ai/src/api_routes.rs::provider_supports_surface`. When no configured provider supports the requested surface, the proxy returns **501 Not Implemented** before any upstream call. Universal surfaces (chat completions and models) bypass the gate.
Unknown surfaces fall through to the existing dispatch and 404 at the upstream.

| Surface label | Method(s) | Path(s) | Providers (today) |
|---|---|---|---|
| `chat_completions` | POST | `/v1/chat/completions` | All |
| `models` | GET | `/v1/models`, `/v1/models/{id}` | All |
| `embeddings` | POST | `/v1/embeddings` | OpenAI, Gemini, Cohere |
| `assistants` | POST, GET, DELETE | `/v1/assistants[/{id}[/files[/{file_id}]]]` | OpenAI |
| `threads` | POST, GET, DELETE | `/v1/threads[/{id}[/messages[/{id}] \| /runs[/{id}[/cancel]]]]`, `/v1/threads/runs` | OpenAI |
| `batches` | POST, GET | `/v1/batches[/{id}[/cancel]]` | OpenAI |
| `fine_tuning` | POST, GET | `/v1/fine_tuning/jobs[/{id}[/cancel \| /events]]` | OpenAI |
| `files` | POST, GET, DELETE | `/v1/files[/{id}[/content]]` | OpenAI |
| `realtime` | GET (WebSocket upgrade) | `/v1/realtime` | OpenAI |
| `image_generation` | POST | `/v1/images/generations` | OpenAI, Gemini |
| `image_edits` | POST (multipart) | `/v1/images/edits` | OpenAI, Gemini |
| `image_variations` | POST (multipart) | `/v1/images/variations` | OpenAI, Gemini |
| `audio_transcription` | POST (multipart) | `/v1/audio/transcriptions`, `/v1/audio/translations` | OpenAI, Gemini |
| `audio_speech` | POST | `/v1/audio/speech` | OpenAI, Gemini |
| `moderations` | POST | `/v1/moderations` | OpenAI |
| `reranking` | POST | `/v1/rerank`, `/v1/reranking` | Cohere |

### Response shape contract

"Supported" in the table above means the gateway accepts the surface and routes it. It does NOT mean the gateway normalises the response.
Per-surface translation behaviour:

| Surface | Response shape |
|---|---|
| `chat_completions` | normalised to / from the OpenAI shape on Anthropic and Google (gemini) formats; passthrough on OpenAI-compatible upstreams |
| `messages`, `responses` | native-format inbound shims that translate down to the same hub shape as chat completions |
| `models` | **passthrough only**: the gateway forwards the upstream's native model-list body unchanged. Clients calling `/v1/models` through a non-OpenAI provider see the upstream's shape, not the OpenAI `{"object": "list", "data": [...]}` envelope |
| everything else | passthrough on the providers listed in the table; clients see the upstream's native response shape |

The Models passthrough decision is deliberate. OpenAI returns `{"object": "list", "data": [{"id": "...", "owned_by": "..."}]}`; Anthropic returns `{"data": [{"id": "...", "display_name": "..."}], "has_more": false}`; Google's `models.list` returns `{"models": [{"name": "models/...", "displayName": "..."}]}`. A lossy normalisation would conflate these and mislead clients about per-model metadata. Callers that need a unified shape across providers should consume the proxy's own model registry instead of the passthrough.

### Method coverage

The gateway accepts any standard HTTP method for any supported surface. GET, POST, PUT, DELETE, PATCH, HEAD, and OPTIONS all dispatch through the same provider-selection and observability surface. Methods other than GET/POST forward via `AiClient::forward_with_method` and do not engage the chat-completions body-parse pipeline (no JSON parsing, no budget enforcement, no input guardrails).

Method-aware dispatch is what makes `DELETE /v1/assistants/{id}`, `POST /v1/threads/{id}/runs/{id}/cancel`, and the other non-POST verbs work end-to-end.

### Multipart bodies

Image edits, image variations, audio transcription, and audio translation send multipart request bodies.
The proxy detects multipart by inspecting the inbound `Content-Type` header; when it starts with `multipart/`, the body is forwarded byte-for-byte via `AiClient::forward_bytes` with the original Content-Type preserved. Provider format translation (Anthropic, etc.) does not run for multipart, since these surfaces are OpenAI-only.

### Per-surface configuration

Per-surface knobs live under `per_surface_rate_limits` (see [Per-surface rate limits](#per-surface-rate-limits)) and apply automatically based on the classified surface. Surfaces have no dedicated YAML config block beyond that; they share the top-level `providers`, `routing`, `virtual_keys`, `budget`, `model_rate_limits`, `max_concurrent`, and `guardrails` settings.

### Surfaces marked enterprise-only

`reranking` is gated to ship dispatch in the enterprise build. In the OSS build the surface is classified (so observability still tags requests with `surface = "reranking"`) and the 501 gate fires unless an enterprise license check passes. The same surface label and matrix entry exist in both builds.

## Context handling

Three modules handle prompts that approach or exceed a model's context window. They are layered: relay carries history across rotations, overflow decides what to do when the next request will not fit, and compress trims when the answer is to keep going with a smaller history.

### Context relay

`crates/sbproxy-ai/src/context_relay.rs` is a thread-safe map of session ID to message history. When the router rotates between providers or virtual keys mid-session, it pulls the prior message list out of the relay and replays it to the new provider so the conversation does not reset. Messages are kept as raw `serde_json::Value` so provider-specific shapes survive the round trip.

No YAML config: it is internal state used by the router.
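
As a rough illustration of the relay idea described above, the following Python sketch models a thread-safe map of session ID to message history. It is not the Rust code in `context_relay.rs`: the class and method names (`ContextRelaySketch`, `append`, `history`) are hypothetical, and plain Python dicts stand in for raw `serde_json::Value` messages.

```python
import copy
import threading


class ContextRelaySketch:
    """Hypothetical stand-in for the relay: session ID -> ordered message list."""

    def __init__(self):
        self._lock = threading.Lock()
        self._sessions = {}

    def append(self, session_id, message):
        # Store the message untouched so provider-specific shapes survive.
        with self._lock:
            self._sessions.setdefault(session_id, []).append(copy.deepcopy(message))

    def history(self, session_id):
        # Return a copy so callers cannot mutate relay state outside the lock.
        with self._lock:
            return copy.deepcopy(self._sessions.get(session_id, []))


relay = ContextRelaySketch()
relay.append("sess-1", {"role": "user", "content": "hello"})
relay.append("sess-1", {"role": "assistant", "content": [{"type": "text", "text": "hi"}]})

# On a provider or key rotation, the prior turns would be replayed from here.
replayed = relay.history("sess-1")
```

Keeping messages opaque, rather than parsing them into one normalised type, is what lets a structured `content` list like the second message round-trip unchanged.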
### Context overflow

`crates/sbproxy-ai/src/context_overflow.rs` ships a registry of context windows for the OpenAI, Anthropic, Gemini, Mistral, and Llama families and decides what to do when a request would overflow. Three actions are available:

- `Error`: return a 4xx to the client.
- `FallbackToLarger(model)`: resend to a larger-window model named in config.
- `Truncate`: drop oldest turns and retry, available through `check_overflow_with_truncate`.

The choice is driven by a `context_overflow` block on the AI handler:

```yaml
action:
  type: ai_proxy
  context_overflow:
    fallback_model: gpt-4o   # used when the current model overflows and gpt-4o has a larger window
    on_overflow: truncate    # error | fallback | truncate
```

If the requested model is not in the registry, overflow checks are skipped (no window to compare against) and the request is forwarded as-is.

### Context compress

`crates/sbproxy-ai/src/context_compress.rs` does cost-aware history trimming. `estimate_message_tokens` uses a four-characters-per-token approximation. `trim_to_budget` always keeps the leading system message, then walks remaining messages newest-to-oldest, including each one only if it fits in the remaining token budget, then restores chronological order before returning.

This module exposes pure functions; it is invoked by the routing strategy and overflow handler. There is no `context_compress:` YAML block.

## Streaming analytics

`crates/sbproxy-ai/src/streaming_analytics.rs` tracks per-stream timing for SSE responses. `StreamTracker` records start time, first-token instant, and last-token instant; from these it computes Time to First Token (`ttft_ms`), Tokens Per Second (`tps`), and average inter-token latency (`avg_itl_ms`). `StreamRegistry` is the global map of in-flight streams keyed by request ID.

These values feed the `sbproxy_ai_request_duration_seconds` histogram and request-scoped log records. The module has no YAML config; it is wired in whenever streaming responses are observed.
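
To make the three streaming metrics concrete, here is a minimal Python sketch of how they could be derived from a start time, a first-token instant, and a last-token instant. The class name `StreamTimingSketch` and its methods are hypothetical, and the exact intervals are assumptions: the source does not say whether `tps` is measured from stream start or from the first token, so this sketch measures it from stream start and assumes one token per recorded event.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StreamTimingSketch:
    """Hypothetical per-stream timing record; timestamps are seconds."""

    start: float
    first_token: Optional[float] = None
    last_token: Optional[float] = None
    tokens: int = 0

    def record_token(self, now: float) -> None:
        # Assumption: each call represents exactly one token.
        if self.first_token is None:
            self.first_token = now
        self.last_token = now
        self.tokens += 1

    def ttft_ms(self) -> Optional[float]:
        if self.first_token is None:
            return None
        return (self.first_token - self.start) * 1000.0

    def tps(self) -> Optional[float]:
        # Assumption: throughput is measured over start -> last token.
        if self.last_token is None or self.last_token <= self.start:
            return None
        return self.tokens / (self.last_token - self.start)

    def avg_itl_ms(self) -> Optional[float]:
        # N tokens have N - 1 gaps between them.
        if self.tokens < 2:
            return None
        return (self.last_token - self.first_token) * 1000.0 / (self.tokens - 1)


timing = StreamTimingSketch(start=10.0)
for t in (10.25, 10.5, 10.75, 11.0):
    timing.record_token(t)

print(timing.ttft_ms(), timing.tps(), timing.avg_itl_ms())
```

Passing timestamps in explicitly, instead of reading a clock inside the class, keeps the arithmetic easy to check by hand.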
## Structured output

`crates/sbproxy-ai/src/structured_output.rs` validates responses against a JSON Schema. The config struct sits on the AI handler:

```yaml
action:
  type: ai_proxy
  structured_output:
    schema:                  # JSON Schema the response must conform to
      type: object
      required: [name, age]
      properties:
        name: {type: string}
        age: {type: integer}
    retry_on_failure: true   # default: false
    max_retries: 2           # default: 1
```

When `retry_on_failure` is true, a failed validation triggers a retry with the schema injected into the system prompt via `build_schema_instruction`. `extract_json` strips ` ```json ` and ` ``` ` fences before parsing, so models that wrap output in markdown still validate.

Validation is structural: required-field presence and per-property type checks (`string`, `number`, `integer`, `boolean`, `array`, `object`, `null`). Full JSON Schema features such as `$ref` and `oneOf` are not implemented.

The validator and the schema-instruction builder are live functions; the wiring that calls them on every chat response is a runtime construct rather than a top-level YAML field. The YAML block above is the shape that ships when a runtime caller threads `StructuredOutputConfig` into the chat handler.

Source: `crates/sbproxy-ai/src/structured_output.rs`.

## OpenAI surface-area modules

The `sbproxy-ai` crate ships shape definitions and lightweight handlers for the OpenAI surface beyond chat completions: assistants, threads, batch jobs, image generation, audio, fine-tuning, realtime sessions, and structured output. The shapes are stable and round-trip through `serde_json`; the chat-path router (`crates/sbproxy-ai/src/handler.rs:parse_ai_path` and `crates/sbproxy-ai/src/api_routes.rs:parse_endpoint`) recognises a subset (chat, embeddings, models, rerank, moderations, image generation, audio transcription, audio speech) and falls back to `Unknown` for the rest.
The remaining shapes are present so plugin authors can build on top of them and so the action config surface is forward-compatible.

The subsections below describe what each module contributes today.

### `assistants`

Shape definitions for the OpenAI Assistants API. `AssistantHandler::route_request(path, method)` classifies a request into one of: `CreateAssistant`, `ListAssistants`, `GetAssistant(id)`, `CreateThread`, `CreateMessage(thread_id)`, `CreateRun(thread_id)`, `GetRun(thread_id, run_id)`, or `Unknown`. The optional `/v1` prefix is stripped before matching. `AssistantConfig { enabled: bool }` is the on/off shape.

```yaml
action:
  type: ai_proxy
  providers: [...]
  # Forward-compatible flag, recognised by the parser but not yet enforced.
  assistants:
    enabled: true
```

The router classifier is implemented; routing into the chat dispatcher is not yet wired in the OSS build. Use chat completions for assistant-style flows until the dispatcher lands.

Source: `crates/sbproxy-ai/src/assistants.rs:AssistantHandler`.

### `threads`

In-memory `ThreadStore` for OpenAI-style threads and their messages. Stores `Thread { id, created_at, metadata }` and ordered `ThreadMessage { id, thread_id, role, content, created_at }`. The store is thread-safe (mutex-backed) and used by the assistants handler for local session continuity.

There is no YAML field that selects a backing store today; the in-memory store is the only implementation.

Source: `crates/sbproxy-ai/src/threads.rs:ThreadStore`.

### `batch`

`BatchJob` shape (id, status, created_at, completed_at, total_requests, completed_requests, failed_requests, metadata) plus a `BatchStore` trait with one implementation, `MemoryBatchStore`. Status lifecycle: `pending`, `in_progress`, `completed`, `failed`, `cancelled`.

The store is wired by the runtime when a batch dispatcher is constructed; there is no top-level `batch:` YAML block.

Source: `crates/sbproxy-ai/src/batch.rs`.
### `image`

Request and response shapes for image generation, edit, and variation. `ImageGenerationRequest { prompt, model, size, n }` and `ImageGenerationResponse { images: Vec<ImageData> }`, where each `ImageData` carries either a `url` or a base-64 `b64_json` payload depending on the provider's `response_format`.

`/v1/images/generations` is routed by `api_routes.rs`; the per-call dispatch is built by the runtime. No dedicated YAML knobs.

Source: `crates/sbproxy-ai/src/image.rs`.

### `audio`

Request and response shapes for audio transcription and speech synthesis. `TranscriptionRequest { file_url, model, language }`, `TranscriptionResponse { text, duration }`, and `SpeechRequest { input, model, voice }`.

`/v1/audio/transcriptions` and `/v1/audio/speech` are recognised by `api_routes.rs`. No dedicated YAML knobs; the audio dispatcher reuses the top-level provider list and routing strategy.

Source: `crates/sbproxy-ai/src/audio.rs`.

### `finetune`

Fine-tuning API classifier. `FinetuneHandler::route_request(path, method)` classifies into `CreateJob`, `ListJobs`, `GetJob(id)`, `CancelJob(id)`, `ListEvents(id)`, or `Unknown`, with the optional `/v1` prefix stripped. `FinetuneConfig { enabled: bool }` is the on/off shape.

```yaml
action:
  type: ai_proxy
  providers: [...]
  # Forward-compatible flag, recognised by the parser but not yet enforced.
  finetune:
    enabled: true
```

Like `assistants`, the classifier is implemented; routing into the chat dispatcher is not yet wired in the OSS build.

Source: `crates/sbproxy-ai/src/finetune.rs:FinetuneHandler`.

### `realtime`

Shape definitions and config for OpenAI's Realtime websocket API. `RealtimeConfig { enabled, model }` defaults to `enabled: false` and `model: "gpt-4o-realtime-preview"`. `RealtimeSession { session_id, model, created_at, status }` and `RealtimeEvent { event_type, data }` round-trip through serde.
The `/v1/realtime` websocket path is recognised by the proxy but session bridging requires a runtime-level dispatcher; the config shape above is the YAML form that the dispatcher reads.

```yaml
action:
  type: ai_proxy
  providers: [...]
  realtime:
    enabled: true
    model: gpt-4o-realtime-preview
```

Source: `crates/sbproxy-ai/src/realtime.rs`.

### `structured_output`

Already covered above under [Structured output](#structured-output). Shape and validator are live (`extract_json`, `validate_response`, `build_schema_instruction`); the wiring that runs the validator on every chat response is a runtime construct rather than a top-level YAML field.

Source: `crates/sbproxy-ai/src/structured_output.rs`.

## Per-request attribution

The gateway records provider, model, token counts, and estimated cost for every AI request and exposes them through Prometheus metrics (see below). Direct response headers for these fields are not emitted today.

## Token usage metrics

The proxy exposes aggregate AI usage as Prometheus metrics. When `telemetry.bind_port` is configured, the following counters and gauges are available at `/metrics` under the `sbproxy_ai_*` namespace:

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `sbproxy_ai_requests_total` | Counter | `provider`, `model`, `status` | Total AI requests |
| `sbproxy_ai_surface_requests_total` | Counter | `surface`, `method` | Total AI requests partitioned by classified surface (chat completions, assistants, image generation, ...) and HTTP method |
| `sbproxy_ai_surface_request_duration_seconds` | Histogram | `surface`, `method` | Per-surface request latency. Buckets match `sbproxy_ai_request_duration_seconds` for side-by-side dashboards |
| `sbproxy_ai_tokens_total` | Counter | `provider`, `model`, `direction` | Tokens consumed (`direction` is `input` or `output`) |
| `sbproxy_ai_cost_dollars_total` | Counter | `provider`, `model` | Estimated cost in USD |
| `sbproxy_ai_request_duration_seconds` | Histogram | `provider`, `model` | End-to-end AI request latency |
| `sbproxy_ai_failovers_total` | Counter | `from_provider`, `to_provider`, `reason` | Provider failover events |
| `sbproxy_ai_guardrail_blocks_total` | Counter | `category` | Guardrail block events (pii, injection, jailbreak, etc.) |
| `sbproxy_ai_cache_results_total` | Counter | `provider`, `cache_type`, `result` | AI response cache results (`cache_type` is `exact` or `semantic`, `result` is `hit` or `miss`) |
| `sbproxy_ai_budget_utilization_ratio` | Gauge | `scope` | Current budget utilization as a 0 to 1 ratio |
| `sbproxy_ai_key_requests_total` | Counter | `virtual_key`, `provider`, `model` | Requests per virtual key |
| `sbproxy_ai_key_tokens_total` | Counter | `virtual_key`, `direction` | Tokens per virtual key |
| `sbproxy_ai_key_cost_dollars_total` | Counter | `virtual_key` | Cost in USD per virtual key |
| `sbproxy_ai_realtime_sessions_active` | Gauge | | Currently open OpenAI Realtime API WebSocket sessions |
| `sbproxy_ai_realtime_session_duration_seconds` | Histogram | `provider`, `close_reason` | Wall-clock duration of a Realtime WebSocket session, observed at close. `close_reason` is `client_closed` or `error` |
| `sbproxy_ai_realtime_audio_seconds_total` | Counter | `provider`, `direction` | Cumulative audio seconds forwarded over Realtime sessions. Frame-exact accounting requires terminate-and-relay (not on the OSS path); the OSS dispatcher uses session wall-clock as a duration proxy on close |
| `sbproxy_ai_realtime_frames_forwarded_total` | Counter | `provider`, `direction`, `kind` | Cumulative frames forwarded over Realtime sessions (`kind` is `text` or `audio`). Reserved for a future enterprise terminate-and-relay path |

Use these to build spending dashboards, set budget alerts, and track provider reliability without any application-level instrumentation.

## Dashboards

The metrics above can be wired into any Prometheus-compatible dashboard tool. A pre-built JSON for AI gateway health is on the roadmap; for now, point your existing Prometheus or Grafana setup at `/metrics` and chart the counters and histograms listed above.

## Streaming

The proxy supports streaming responses. When your client sends a streaming request (e.g. `"stream": true` in the OpenAI API), the proxy:

1. Validates the request (auth, rate limits, guardrails).
2. Picks a provider using the configured routing strategy.
3. Opens a streaming connection to the provider.
4. Forwards SSE chunks to the client as they arrive.
5. Reads token usage from the final chunk and records it to the metrics counters.

No special configuration is needed. Streaming works with all routing strategies and all providers.

### Usage extraction

Different providers report streaming token counts in different SSE shapes. The streaming relay scans every chunk through a pluggable parser and records the captured tokens against the configured budget scopes when the stream closes. Pick the parser explicitly with `usage_parser`, or leave it at the default `auto` and the proxy resolves it from the upstream URL host, response `Content-Type`, and an optional `X-Provider` response header.
| `usage_parser` | Wire format | Notes |
|---|---|---|
| `openai` | `data: {..., "usage": {...}}\n\n` terminal frame | OpenAI, Azure OpenAI, OpenAI-compatible relays |
| `anthropic` | `event: message_start` plus `event: message_delta` with `usage` | Max-of across both events; `input_tokens` from start, `output_tokens` from delta |
| `vertex` | `data: {..., "usageMetadata": {...}}` on every chunk | Vertex AI / Gemini; values grow monotonically |
| `bedrock` | `data: {"bytes": ""}` envelope | Decodes the envelope and delegates to the Anthropic parser for the inner stream |
| `cohere` | `data: {..., "event_type": "stream-end", ..., "billed_units": {...}}` | Reads `response.meta.billed_units` or `meta.billed_units` |
| `ollama` | NDJSON: `{..., "done": true, "prompt_eval_count": N, "eval_count": M}\n` | Line-delimited JSON instead of SSE |
| `generic` | Best-effort across all of the above | Default fallback when `auto` cannot match a known upstream |
| `auto` | Resolved at request time | See order below |
| `none` | Skip parsing | Disables streaming budget recording for this origin |

`auto` resolves in this order:

1. Response `X-Provider` header (operator-controlled).
2. Upstream URL host: `*.openai.com` plus `*.openai.azure.com` -> `openai`, `*.anthropic.com` -> `anthropic`, `*.googleapis.com` or any host containing `aiplatform` -> `vertex`, `bedrock-*` or `*.amazonaws.com` -> `bedrock`, `*.cohere.ai` or `*.cohere.com` -> `cohere`, `localhost:11434` or any host containing `ollama` -> `ollama`.
3. Response `Content-Type`: `application/x-ndjson` or `application/jsonl` -> `ollama`.
4. Fall back to `generic`.

Unknown values warn once and fall back to `generic` so a typo never silently disables budget recording.
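
The `auto` resolution order above can be sketched in a few lines of Python. This is an illustrative restatement of the documented rules, not the proxy's code: `resolve_usage_parser` and `_under` are hypothetical names, the wildcard patterns are approximated with suffix checks that also match the bare domain, and ignoring an unrecognised `X-Provider` value is an assumption.

```python
from urllib.parse import urlsplit

KNOWN_PARSERS = {"openai", "anthropic", "vertex", "bedrock", "cohere", "ollama", "generic"}


def _under(host, domain):
    """True when host is `domain` itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def resolve_usage_parser(upstream_url, content_type="", x_provider=None):
    # 1. Operator-controlled X-Provider response header.
    #    Assumption: a value outside the known set is ignored here.
    if x_provider and x_provider.strip().lower() in KNOWN_PARSERS:
        return x_provider.strip().lower()

    # 2. Upstream URL host.
    parts = urlsplit(upstream_url)
    host = (parts.hostname or "").lower()
    if _under(host, "openai.com") or _under(host, "openai.azure.com"):
        return "openai"
    if _under(host, "anthropic.com"):
        return "anthropic"
    if _under(host, "googleapis.com") or "aiplatform" in host:
        return "vertex"
    if host.startswith("bedrock-") or _under(host, "amazonaws.com"):
        return "bedrock"
    if _under(host, "cohere.ai") or _under(host, "cohere.com"):
        return "cohere"
    if (host == "localhost" and parts.port == 11434) or "ollama" in host:
        return "ollama"

    # 3. Response Content-Type (parameters such as charset are ignored).
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in ("application/x-ndjson", "application/jsonl"):
        return "ollama"

    # 4. Nothing matched.
    return "generic"


print(resolve_usage_parser("https://api.anthropic.com/v1/messages"))
```

Because the header check comes first, an operator can override a host-based guess, for example when an OpenAI-compatible relay fronts a different provider.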
```yaml
origins:
  "ai.example.com":
    action:
      type: ai_proxy
      usage_parser: anthropic   # or auto, openai, vertex, bedrock, cohere, ollama, generic, none
      providers:
        - name: anthropic
          api_key: ${ANTHROPIC_API_KEY}
          base_url: https://api.anthropic.com/v1
```

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="unused",
    default_headers={"Host": "ai.example.com"},
)

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Write a haiku about proxies."}],
    stream=True,
)

for chunk in stream:
    # Some chunks (for example a trailing usage frame) carry no choices.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```

## Realtime

The AI gateway routes OpenAI Realtime API WebSocket sessions through the same dispatch path as the rest of the surface set. A client opens `GET /v1/realtime` with `Upgrade: websocket` against the proxy, the gateway runs its standard pre-upgrade gating, picks an enabled provider that supports Realtime (today: OpenAI), and lets Pingora forward bytes between the client and the provider after the `101 Switching Protocols` handshake.

What runs before the upgrade:

- Surface classification stamps `ai.surface = "realtime"` on the request span and the access log.
- The 501 capability gate fires if no configured provider supports Realtime.
- The per-surface rate limit (`per_surface_rate_limits.realtime`) fires before the upgrade is attempted, returning 429 when the cap is hit.
- The active-sessions gauge `sbproxy_ai_realtime_sessions_active` ticks up.

What runs during the session:

- Pingora forwards WebSocket frames byte-transparently. The proxy does not inspect individual frames (per-frame guardrails are not on the OSS path; they would require terminate-and-relay, which is reserved for an enterprise build).

What runs at session close (the `logging` hook):

- The active-sessions gauge ticks down.
- `sbproxy_ai_realtime_session_duration_seconds` records the wall-clock session lifetime.
- An `AiBillingEvent` fires with `usage = AudioSeconds { seconds = wall_clock }` so operators see realtime usage on the standard billing event bus. Cost is reported as 0.0 in OSS until the realtime rate card lands in the pricing helper; downstream consumers can compute cost from the duration.

```yaml
origins:
  "ai.example.com":
    action:
      type: ai_proxy
      providers:
        - name: openai
          api_key: ${OPENAI_API_KEY}
          base_url: https://api.openai.com/v1
          models: [gpt-4o-realtime-preview]
      per_surface_rate_limits:
        realtime:
          requests_per_minute: 30
```

A client connects with the standard OpenAI Realtime URL, replacing the OpenAI host with the proxy host:

```python
import websocket  # websocket-client

ws = websocket.create_connection(
    "wss://ai.example.com/v1/realtime?model=gpt-4o-realtime-preview",
    header=[
        "Authorization: Bearer <api-key>",  # placeholder: substitute your key
        "OpenAI-Beta: realtime=v1",
    ],
)
```

The proxy enforces gating before the upgrade and emits a session-end billing event after close; per-frame inspection is reserved for an enterprise terminate-and-relay path that would land alongside a dedicated Pingora `Service` impl.

## Full example

An AI gateway with two providers, fallback routing, API key auth, and a rate limit:

```yaml
proxy:
  http_bind_port: 8080

origins:
  "ai.example.com":
    action:
      type: ai_proxy
      providers:
        - name: openai
          api_key: ${OPENAI_API_KEY}
          priority: 1
          models: [gpt-4o, gpt-4o-mini, gpt-4-turbo]
        - name: anthropic
          api_key: ${ANTHROPIC_API_KEY}
          priority: 2
          models: [claude-sonnet-4-20250514, claude-3-5-haiku-20241022]
      default_model: gpt-4o-mini
      routing:
        strategy: fallback_chain
    authentication:
      type: api_key
      api_keys:
        - ${AI_GATEWAY_KEY}
    policies:
      - type: rate_limiting
        requests_per_minute: 200
```

## Hot-reload behavior

A `SIGHUP`, an admin-API reload, or an in-place edit of `sb.yml` (when the file watcher is on) refreshes the AI gateway without restarting the proxy.
The provider catalog under `proxy.ai_providers_file`, the live `AiClient`, and the compiled handler chain are rebuilt and swapped atomically; in-flight requests continue against their existing snapshot until they finish, and subsequent requests pick up the new state. Adding a provider, rotating a `default_base_url`, or fixing a typo in `ai_providers.yml` no longer requires shedding connections.

The process-wide AI budget tracker is deliberately left alone on reload. Budget windows are wall-clock-relative (daily, monthly, custom), so the per-scope token and cost accumulators must outlive a config reload. Wiping the tracker would silently roll counters back to zero and let already-spent budget through a second time. To clear a budget intentionally, restart the process or call the per-scope reset path on the admin surface.

## See also

- [providers.md](providers.md) - full provider table and per-provider model lists.
- [scripting.md](scripting.md) - CEL and Lua reference, including AI selector and guardrail variables.
- [configuration.md](configuration.md) - general configuration model, origin schema, and the full `sb.yml` field reference.
- [features.md](features.md) - higher-level overview of features including guardrails.

================================================================
# docs/ai-lb-benchmark.md
================================================================

## AI router load-balancing benchmark

*Last modified: 2026-05-31*

The AI router supports several load-balancing strategies (round-robin, peak-EWMA, least-connections, least-token-usage, prefix-affinity, and others). This page compares them on a synthetic, skewed workload and publishes the P50 / P95 / P99 / P99.9 numbers an operator can compare against when picking a strategy.
## What the bench measures

The harness at `sbproxy-bench/harness/ai_lb_strategy/` drives a synthetic, skewed workload through the live `sbproxy_ai::routing::Router` for each declared strategy, then prints a P50 / P95 / P99 / P99.9 / max comparison table plus a Jain fairness index and (for `prefix_affinity`) a KV-cache hit rate.

The bench is in-process, not HTTP-driven. The variable under test is the LB algorithm; an HTTP backend would have to fake the KV-cache and provider-latency skews anyway, so the in-process driver lets the bench measure the router without confounds from the proxy substrate.

## The workload

Three orthogonal skews, each tunable via CLI:

| Skew | Default | Models the real-world case where ... |
| --- | --- | --- |
| Provider latency heterogeneity | one slow provider out of four at 5x base latency | A vLLM pool has one warm-but-overloaded worker |
| Prompt-prefix Zipf | s = 1.1 over 100 prefixes | Chat traffic where some system prompts repeat |
| Tenant token-burst Zipf | s = 1.0 over 10 tenants | A small fleet with one hot tenant emitting most tokens |

## Simulated latency model

```text
observed_ms = base_ms * provider_factor
            - kv_cache_bonus_ms   if prefix was seen on this provider in the last 64 requests
            + queue_term_ms       (in-flight count * 5ms)
            + lognormal noise     (mu=0, sigma=0.3)
```

The lognormal noise creates the heavy tail that makes P99 the right comparison metric. The KV-cache bonus is what lets `prefix_affinity` show its value in simulation; without it the strategy is indistinguishable from round-robin.

These assumptions are not validated against a real vLLM pool. A follow-up bench against a Docker vLLM fixture is tracked under the bench harness's README.

## Reproducing the run

```bash
cd sbproxy-bench/harness/ai_lb_strategy
SBPROXY_BENCH=1 cargo run --release -- --total-requests 50000
```

The `SBPROXY_BENCH=1` env-var gate is enforced in `main.rs` so an accidental local invocation cannot saturate a core.
CI does not run this; it is a lab-only artifact.

## What to expect

Under the default skewed workload:

- **`round_robin`** posts the worst P99 because it does not avoid the slow provider. Per-provider request distribution is uniform (Jain ~1.0) which looks fair but produces the tail.
- **`peak_ewma`** posts the best P99 of the latency-aware strategies. Two-of-N sampling avoids the herd-on-one-fast-provider pathology that `lowest_latency` falls into.
- **`prefix_affinity`** posts the best P99 when the Zipf parameter is at least ~1.0 (default 1.1). The KV-cache hit rate column shows why: the same prefix lands on the same provider often enough to reuse a warm cache. Lower the prefix-Zipf to 0.0 (uniform) and the strategy degenerates toward round-robin's number.
- **`least_token_usage`** posts a fairness Jain index above 0.95 on the tenant-skewed workload because it spreads the hot tenant's tokens evenly across providers.
- **`least_connections`** behaves similarly to `peak_ewma` here because the queue term in the latency model is what its in-flight signal tracks. In a real vLLM pool the queue term is more pronounced and the two diverge.

The README at `sbproxy-bench/harness/ai_lb_strategy/README.md` is the canonical reference for the flags and the model assumptions.

## Caveats

1. The KV-cache bonus and lognormal-noise sigma are unvalidated against production traffic. The doc calls them out so a reader can challenge them.
2. The bench writes to `Router::record_latency` with `Relaxed` atomic semantics. Two strategies (`lowest_latency`, `peak_ewma`) read the same field as ground truth. The most recent write wins; under the bench's single-threaded sample loop this is deterministic, but under multi-threaded production traffic the reads see slightly stale numbers.
3. `prefix_affinity` looks bad with uniform prompts.
   The default prefix-Zipf of 1.1 ships the strategy in its strong configuration; operators considering it should match against their own traffic shape before turning it on.
4. The bench does not measure cost. Strategies with cost in their name (`cost_optimized`, `cascade`) are not in the comparison table because P99 is the wrong axis for them.

## Related

- `crates/sbproxy-ai/src/routing.rs` is where the strategies live.
- `BENCHMARK.md` at the repo root covers workspace-level proxy overhead numbers; this page is the AI router-specific axis.
- The `sbproxy_ai_lb_decisions_total{strategy, provider}` metric emitted by the router lets you reproduce the per-provider distribution table on a live deployment.

================================================================
# docs/architecture.md
================================================================

## SBproxy architecture and deployment guide

*Last modified: 2026-06-08*

This document covers the internal architecture of SBproxy, the request lifecycle, the plugin system, the AI gateway, caching, events, and common deployment topologies.

---

## 1. Overview

SBproxy is a single static binary with no required external runtime dependencies. It is written in Rust and ships as a self-contained executable. There is no JVM, no Python interpreter, no Node.js runtime, and no shared library requirement beyond libc (or none at all when built with `musl` or `--target *-unknown-linux-musl`).

The proxy is built on Cloudflare's [Pingora](https://github.com/cloudflare/pingora) framework. Pingora supplies the tokio runtime, listener management, HTTP/1.1, HTTP/2 (HTTP/3 is currently disabled pending native Pingora HTTP/3), TLS termination, and a phase-based callback model for the request pipeline. SBproxy layers its host router, compiled origin pipeline, plugin registry, and hot-reload machinery on top of those primitives.

The plugin system is modeled on Caddy's module pattern.
Every extensible component type (action handlers, auth providers, policy evaluators, transforms, middleware) registers itself at compile time through the `inventory` crate. The proxy crate is the binary composition root; pulling a feature in or out is a matter of which workspace crates are linked into the final executable. Key properties: - Single binary. One file to copy, one process to manage. mimalloc is the global allocator, typically 5 to 10 percent faster than glibc's allocator under contention. - Zero-dependency startup. Runs without Redis, a database, or a sidecar. External integrations (Redis cache, webhook events, OTEL tracing) are opt-in and fail gracefully when unavailable. - Hot reload. Config changes are applied without restarting. The watcher detects file changes and atomically swaps the compiled origin map via `arc-swap`. In-flight requests finish on their snapshot; new requests pick up the new map immediately. - Embeddable. The `sbproxy-core` crate exposes a small `run` / `shutdown` API for use as a library inside another Rust binary. --- ## 2. Workspace layout ``` sbproxy/ crates/ sbproxy/ - Binary entry point. Wires modules and starts the server. sbproxy-core/ - Pingora server, host router, phase dispatch, hot reload, hook registry. sbproxy-config/ - YAML/JSON schema, type definitions, parsing, compilation (RawOrigin -> CompiledOrigin). sbproxy-plugin/ - Plugin trait definitions and `inventory` registry (PUBLIC API for third-party modules). 
sbproxy-modules/ - Built-in modules: action/ - proxy, loadbalancer, redirect, static, echo, mock, beacon, websocket, grpc, ai_proxy, mcp, noop, storage auth/ - api_key, basic_auth, bearer, jwt, digest, forward_auth, jwks policy/ - rate_limit, ip_filter, waf, ddos, csrf, security_headers, request_limit, assertion, sri, cel transform/- json, json_projection, html, markdown, template, lua, javascript, css, encoding, format_convert, normalize, payload_limit, replace_strings, html_to_markdown, sse_chunking, noop sbproxy-ai/ - AI gateway: 66 native providers, routing, guardrails, budget enforcement, key vault, memory store, MCP federation. sbproxy-extension/ - Scripting and extension runtimes: cel/ - cel-rust expression evaluation lua/ - mlua + Luau scripting wasm/ - wasmtime sandboxed plugins js/ - QuickJS via rquickjs mcp/ - Model Context Protocol server sbproxy-middleware/ - CORS, HSTS, compression (gzip/brotli/zstd), header modifiers, error pages, forward rules. sbproxy-cache/ - Response cache trait, memory backend, pluggable store interface, cache key partitioning. sbproxy-security/ - Cross-cutting security primitives: crypto helpers, host filter (bloom + HashMap lookup), client-IP extraction with trusted-proxy CIDRs, PII redactor, SSRF guard, plus optional headless-browser detection and bot/agent verification helpers. The WAF, DDoS, CSRF, and security_headers policies live in sbproxy-modules/src/policy/. sbproxy-tls/ - TLS termination via rustls 0.23 with the `ring` crypto provider, ACME auto-cert (Let's Encrypt), HTTP/3 listener wiring (currently disabled pending native Pingora HTTP/3), OCSP stapling. sbproxy-transport/ - Outbound transport: retry with exponential backoff, request coalescing, hedged requests, circuit breaker, upstream rate limiting. sbproxy-vault/ - Secret management. Encrypted local vault, rotation hooks, secret reference resolution. sbproxy-observe/ - tracing-based structured logging, Prometheus metrics, typed event bus. 
sbproxy-platform/ - Infrastructure primitives: KV store abstraction, DNS cache, messenger, health tracking, circuit breaker. sbproxy-httpkit/ - HTTP utilities: client IP extraction, host:port splitting, buffer pools, body limit readers. examples/ - Working sb.yml examples per feature docs/ - Documentation e2e/ - End-to-end test harness schemas/ - JSON schema for sb.yml ``` The dependency graph is enforced by the workspace structure. `sbproxy-plugin` is the public API surface and depends only on `sbproxy-config`. Built-in modules depend on `sbproxy-plugin`, never on `sbproxy-core`. Third-party plugins built against the published `sbproxy-plugin` crate are link-compatible with the binary. --- ## 3. Request pipeline Every inbound request passes through the following stages in order. A rejection at any stage short-circuits the rest and writes the error response immediately. The pipeline is implemented as a sequence of `ProxyHttp` callbacks; the per-request work happens inside those callbacks rather than in a separate dispatcher. ``` request_filter: 1. Trace context extract (W3C / B3) 2. ACME HTTP-01 challenge interception 3. /health and /metrics short-circuit 4. Hostname extraction and origin resolution (bloom + HashMap) 5. Force-SSL redirect 6. Allowed methods check 7. CORS preflight handling 8. Bot detection 9. Threat protection (JSON body checks) 10. Authentication 11. Policy enforcement (rate limit, IP filter, WAF, CSRF, DDoS, CEL, ...) 12. Response cache lookup 13. on_request callbacks 14. Forward rule matching 15. Non-proxy action dispatch (static, redirect, echo, mock, beacon, AI, ...) upstream_peer: Resolve upstream peer for proxy actions. upstream_request_filter: URL rewrite, query injection, method override, body replacement, request header modifiers, distributed tracing headers. 
response_filter: CORS, HSTS, security headers, response modifiers, forward rule echo, rate limit headers, Alt-Svc, CSRF cookie, session cookie, on_response callbacks, traceparent echo. response_body_filter: Response cache write on miss, transform pipeline, fallback body swap. logging: Metrics emission, access log, event publication. ``` Action types dispatched inside `request_filter` step 15 (or via `upstream_peer` for `proxy` actions): `proxy`, `load_balancer`, `ai_proxy`, `static`, `mock`, `redirect`, `echo`, `beacon`, `noop`, `websocket`, `grpc`. Built-in actions are enum variants; the compiler turns the dispatch site into a branch-predicted match. Third-party plugins use `Plugin(Box)` and pay one indirect call per request. --- ## 4. Plugin system All extensible component types use a single pattern: register at compile time via the `inventory` crate, keyed by the type string that appears in YAML configs. ### Registry traits (sbproxy-plugin) ```rust,no_run pub trait ActionHandler: Send + Sync + 'static { fn handler_type(&self) -> &'static str; fn handle( &self, req: &mut http::Request, ctx: &mut dyn std::any::Any, ) -> Pin> + Send + '_>>; } // Same shape for AuthProvider, PolicyEnforcer, TransformHandler, RequestEnricher. ``` Factory closures construct concrete handlers from a `serde_json::Value` config blob and return `Box`. The factory itself is the registration unit. ### Registration pattern ```rust,no_run inventory::submit! { PluginRegistration { kind: PluginKind::Policy, name: "rate_limit_custom", factory: |raw| { let cfg: MyConfig = serde_json::from_value(raw)?; Ok(Box::new(MyPolicy::new(cfg))) }, } } ``` `inventory::submit!` writes a static descriptor into a link-section that the binary enumerates at startup. There is no central wiring file. Adding a policy is: 1. Implement `PolicyEnforcer` for the new struct. 2. Drop the file in `sbproxy-modules/src/policy/`. 3. Add an `inventory::submit!` block. 4. Add `pub mod my_policy;` to the parent `mod.rs`. 
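The four steps above, and the lookup of factories by name, can be sketched without the `inventory` crate. This is a minimal std-only illustration, not the SBproxy implementation: a hand-written static slice stands in for the link-time registry, a `HashMap<String, String>` stands in for the `serde_json::Value` config blob, and the names `MaxRequests`, `build_max_requests`, `REGISTRY`, and `compile_policy` are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the `PolicyEnforcer` trait.
trait PolicyEnforcer {
    fn allow(&self, requests_seen: u32) -> bool;
}

/// Step 1: a concrete policy that implements the trait.
struct MaxRequests {
    limit: u32,
}

impl PolicyEnforcer for MaxRequests {
    fn allow(&self, requests_seen: u32) -> bool {
        requests_seen < self.limit
    }
}

/// A factory turns a raw config blob into a boxed handler.
/// (Assumption: a string map replaces the JSON value used in the real registry.)
type Factory = fn(&HashMap<String, String>) -> Result<Box<dyn PolicyEnforcer>, String>;

struct PluginRegistration {
    name: &'static str,
    factory: Factory,
}

fn build_max_requests(raw: &HashMap<String, String>) -> Result<Box<dyn PolicyEnforcer>, String> {
    let limit = raw
        .get("limit")
        .ok_or("missing `limit`")?
        .parse::<u32>()
        .map_err(|e| e.to_string())?;
    Ok(Box::new(MaxRequests { limit }))
}

/// Step 3 analogue: in the real code `inventory::submit!` collects these
/// descriptors at link time; here they are listed by hand.
static REGISTRY: &[PluginRegistration] = &[PluginRegistration {
    name: "max_requests",
    factory: build_max_requests,
}];

/// Looks a factory up by the type string that would appear in YAML.
fn compile_policy(
    name: &str,
    raw: &HashMap<String, String>,
) -> Result<Box<dyn PolicyEnforcer>, String> {
    let reg = REGISTRY
        .iter()
        .find(|r| r.name == name)
        .ok_or_else(|| format!("unknown policy type: {name}"))?;
    (reg.factory)(raw)
}

fn main() {
    let mut raw = HashMap::new();
    raw.insert("limit".to_string(), "3".to_string());

    let policy = compile_policy("max_requests", &raw).expect("factory should succeed");
    println!("allow at 2 requests: {}", policy.allow(2)); // true
    println!("allow at 3 requests: {}", policy.allow(3)); // false
    println!("unknown name rejected: {}", compile_policy("nope", &raw).is_err()); // true
}
```

The point of the shape is that the lookup happens once, at compile-config time; the request path only ever sees the boxed handler the factory returned.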
The compile_config step in `sbproxy-config` looks up factories by name from the inventory registry. Built-in modules are exposed as enum variants (`Policy::RateLimit(...)`, `Policy::Plugin(Box)`); the compiler prefers the enum variant when available for cache locality and branch prediction, falling back to dynamic dispatch for third-party names. ### Built-in vs plugin dispatch Built-in modules are enum variants. Match dispatch over enums is a single branch-predicted jump that the compiler typically inlines. Third-party plugins go through `Box` for dynamic dispatch. That costs one indirect call per phase but keeps the plugin ABI stable across compiler versions. ```rust,no_run enum Action { Proxy(ProxyAction), Static(StaticAction), Redirect(RedirectAction), LoadBalancer(LoadBalancerAction), AiProxy(AiProxyAction), // ... built-ins Plugin(Box), // third-party } ``` ### Thread safety `inventory` is populated at link time before `main` runs. All registry reads happen after that, against an immutable slice. There is no lock on the hot path: the compiled origin holds direct `Arc` pointers to the handler instances, so per-request dispatch is a pointer dereference followed by a virtual or static call. --- ## 5. Config architecture ### Pure types layer (sbproxy-config) The `sbproxy-config` crate contains type definitions, serde derives, and the compilation step. Its workspace dependencies are limited to `sbproxy-plugin`, `sbproxy-httpkit`, and `sbproxy-platform` (for the `KVStore` trait used by `l2_store`). It does not pull in Pingora, the module set, or any networking runtime. The serde tags in `sbproxy-config` are the canonical field names. When in doubt about a YAML field name, read the struct definition, not prose documentation. ### Config lifecycle ``` sb.yml (YAML file or API-delivered bytes) | v serde_yaml::from_str -> ConfigFile { proxy, origins, secrets, ... } | v validate_schema() - Reject unknown fields, type-check. 
| v resolve_secrets() - Expand ${secret.X} references via the vault. | v apply_inheritance() - Parent / child origin merge. | v compile_config() - For each origin: build CompiledOrigin { action, auths: SmallVec<[Auth; 2]>, policies: SmallVec<[Policy; 4]>, request_modifiers, response_modifiers, transforms, hooks, cache, error_pages, ... } | v build host_map: bloom filter + HashMap of hostname -> origin index | v Arc - Immutable snapshot. | v ArcSwap::store() - Atomic publish. Old readers continue against the previous snapshot. ``` ### Parent/child origin inheritance Origins can declare a `parent` field that references another origin by name. The child inherits all fields from the parent and can override any of them. This is resolved at parse time, not at request time. The resulting child config is fully materialized before compilation. ### Hot reload The config watcher (`sbproxy-core::reload`) uses the `notify` crate to detect file changes. On change it re-parses, re-resolves, and recompiles the config. The new `Arc` is published via `ArcSwap::store`. Requests that already loaded a snapshot continue with it; new requests pick up the new pointer on their next snapshot load. Old snapshots are dropped when their refcount hits zero, after all in-flight requests using them complete. There is no global lock and no quiescence period. --- ## 6. AI gateway architecture The `ai_proxy` action delegates entirely to the `sbproxy-ai` crate. It presents an OpenAI-compatible API surface and routes requests to any supported LLM provider. ``` Client (OpenAI-compatible request) | v +------------------+ | AI Handler | Validates request format. Extracts consumer identity. | | Checks per-key concurrency limits. +------------------+ | v +------------------+ | Guardrails | Pre-request evaluation. CEL/Lua selectors determine | (pre-request) | which guardrail rules apply. Rules may block, flag, | | or redact content before the request leaves the proxy. 
| | Built-in types: PII, prompt injection, toxicity, | | jailbreak, content safety, JSON schema, regex. +------------------+ | v +------------------+ | Router | Selects provider and model based on routing strategy. | | Strategies: round_robin, weighted, fallback_chain, | | random, lowest_latency, least_connections, | | cost_optimized, token_rate, sticky. | | Context window validation: token count checked against | | provider model limits. Oversized requests routed to a | | model with a larger context window or rejected. +------------------+ | v +------------------+ | Budget Enforcer | Hierarchical scopes (workspace, key, route). | | Action on exceed: log, downgrade to cheaper model, | | or hard-block with 402. +------------------+ | v +------------------+ | Provider | Translates normalized request to provider-specific | | wire format. Injects API key from vault. +------------------+ | v LLM API (OpenAI / Anthropic / Gemini / Bedrock / ...) | v +------------------+ | Response Handler | For streaming: SSE proxy with buffered guardrail | | evaluation on accumulated chunks. Token usage and | | cost updated atomically. Conversation memory written. | | For non-streaming: full response passed to post-request | | guardrails before returning to client. +------------------+ | v Client ``` ### Provider registry Providers register through the same `inventory` mechanism as actions. Each provider implements `sbproxy_ai::providers::Provider`. The provider list is also driven by `providers.yaml`, which maps provider names to their base URLs and supported models. Rust implementations handle request serialization and response normalization. 66 native providers ship in-tree alongside a native Anthropic translator. The `model` field passes straight through to the upstream, so the gateway reaches 200+ models without enumerating them. 
Direct adapters include OpenAI, Anthropic, Google Gemini, Azure OpenAI, AWS Bedrock, Cohere, Mistral, DeepSeek, xAI / Grok, Perplexity, Groq, Together AI, Fireworks AI, OpenRouter, Ollama, vLLM, AWS SageMaker, Databricks, Oracle Cloud GenAI, IBM Watsonx, plus three local-runtime adapters (Hugging Face TGI, LM Studio, llama.cpp).

### Routing strategies

| Strategy            | Behavior |
|---------------------|----------|
| `round_robin`       | Rotate through providers in order. |
| `weighted`          | Distribute proportional to provider weight. |
| `fallback_chain`    | Try providers in priority order, falling back on failure. |
| `random`            | Uniform random pick. |
| `lowest_latency`    | Provider with the lowest observed latency (microseconds, atomic counter). |
| `least_connections` | Provider with the fewest in-flight requests. |
| `cost_optimized`    | Lowest score of `connections * 1000 + weight`. Utilization dominates; weight breaks ties in favor of cheaper providers. |
| `token_rate`        | Provider with the most remaining tokens-per-minute headroom. |
| `sticky`            | Pin a session key to one provider. Falls back to round robin without a session key. |
| `race`              | Fan out to every healthy provider in parallel; first non-error response wins, the rest are cancelled. |

### Streaming

The SSE proxy reads chunks from the upstream provider and forwards them to the client immediately. For guardrail evaluation, the proxy keeps a rolling window of the last N tokens. When the stream completes, a final guardrail pass runs against the accumulated content. If a violation shows up mid-stream, the proxy injects a stop chunk and closes the stream.

### Streaming cache recorder hook

`StreamCacheRecorderHook` (in `sbproxy-core/src/hooks.rs`) is the OSS-side seam that lets an enterprise build record streaming AI responses for later replay.
It mirrors the shape of `SemanticLookupHook` and `StreamSafetyHook`: a trait, a per-session context type (`StreamCacheCtx`), and a unit slot on the `Hooks` bundle that defaults to `None`. The hook lives in OSS because the emit point is on the SSE forwarding hot path. Threading chunks across a crate boundary at runtime would be expensive; landing the trait in `sbproxy-core` keeps the per-chunk fan-out cheap and lets the enterprise impl plug in through `EnterpriseStartupHook::on_startup` exactly like every other slot. When the slot is wired, `relay_ai_stream` calls `start_session` once at stream start, forwards a copy of every chunk into the returned channel, and emits exactly one terminal `StreamCacheEvent::End { complete }`. The `complete` flag is true on a clean end-of-stream and false on every other terminal condition (client cancel, upstream error, mid-stream abort). A `StreamCacheGuard` RAII wrapper owns this terminal-event invariant: `guard.finish()` sends `complete: true`, and the guard's `Drop` impl sends `complete: false` if `finish` was never called. What stays out of OSS: caching policy decisions (deterministic tool calls only, image data by reference only), replay pacing (`as_fast_as_possible` vs `natural`), eviction, and persistence. The OSS proxy passes the AI handler's `semantic_cache.streaming` config block through verbatim as a `serde_json::Value` so the enterprise recorder reads whatever shape it expects without OSS validating those fields. The enterprise crate fills the slot from its `EnterpriseStartupHook::on_startup` implementation. ### MCP federation `sbproxy-extension::mcp` implements a Model Context Protocol server. Tools from upstream MCP endpoints can be federated and exposed as a single combined tool surface to clients. Tool calls are routed to the registered upstream by name, with optional auth injection. --- ## 7. Event system SBproxy uses two event mechanisms with different scopes and semantics. 
### Internal bus (sbproxy-observe::events)

High-throughput, in-process publish/subscribe. Components call `events::emit(SystemEvent { ... })`. Subscribers register for specific event type strings. Used for:

- Circuit breaker state transitions.
- Config hot-reload completion.
- Buffer overflow warnings.
- Rate limit threshold crossings.
- Workspace quota alerts.

Events carry a `workspace_id` field. Per-workspace bounded queues (backed by `sbproxy-platform::messenger` with a 10k-entry cap) prevent one active workspace from starving event delivery to others. The bus is implemented over tokio broadcast channels plus per-subscriber filter predicates.

### Public bus

The `EventBus` trait is exposed to external consumers via the embedding API. The default implementation is a no-op. Three built-in subscriber types ship with the binary:

- log subscriber: writes events as structured JSON via `tracing`.
- webhook subscriber: POSTs event payloads to a configurable HTTPS endpoint with HMAC signing.
- prometheus subscriber: increments labeled counters for each event type.

### Event filtering

Subscribers declare a filter predicate at registration time. The bus evaluates predicates before delivering the event, so filtered subscribers never receive irrelevant events. The filter is evaluated inline (no spawn per delivery in the common case).

---

## 8. Caching architecture

### Response cache

The response cache sits inside the request pipeline at two points: before the action handler (cache hit check) and after the action handler (cache write on miss). It is keyed by a signature derived from the request method, URL, selected request headers, and optionally the request body hash.

Configurable per origin:

- `ttl` - Time-to-live for cached entries.
- `stale_while_revalidate` - Serve stale content while a background refresh runs.
- `vary` - List of request headers to include in the cache key.
- `methods` - Which HTTP methods are eligible for caching (default: GET, HEAD).
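To make the cache-key description concrete, here is a minimal sketch of deriving a signature from the method, URL, the headers named in `vary`, and an optional body. It is an illustration under stated assumptions, not the SBproxy algorithm: the source does not say which hash or field order is used, so `DefaultHasher`, the ordering, and the names `CacheRequest` and `cache_signature` are all hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical view of the request fields that feed the cache key.
struct CacheRequest<'a> {
    method: &'a str,
    url: &'a str,
    headers: &'a [(&'a str, &'a str)],
    /// `None` means the body is not part of the key for this origin.
    body: Option<&'a [u8]>,
}

/// Builds a signature from method, URL, the `vary` headers, and the optional body.
/// Headers that are not listed in `vary` never influence the result.
fn cache_signature(req: &CacheRequest<'_>, vary: &[&str]) -> u64 {
    let mut hasher = DefaultHasher::new();
    req.method.hash(&mut hasher);
    req.url.hash(&mut hasher);
    for name in vary {
        // Header names are case-insensitive, so match and hash them lowercased.
        let value = req
            .headers
            .iter()
            .find(|(k, _)| k.eq_ignore_ascii_case(name))
            .map(|(_, v)| *v);
        name.to_ascii_lowercase().hash(&mut hasher);
        value.hash(&mut hasher);
    }
    req.body.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    let vary = ["Accept"];
    let html = CacheRequest {
        method: "GET",
        url: "/pricing",
        headers: &[("Accept", "text/html"), ("User-Agent", "curl")],
        body: None,
    };
    let html_other_agent = CacheRequest {
        method: "GET",
        url: "/pricing",
        headers: &[("accept", "text/html"), ("User-Agent", "firefox")],
        body: None,
    };
    let json = CacheRequest {
        method: "GET",
        url: "/pricing",
        headers: &[("Accept", "application/json"), ("User-Agent", "curl")],
        body: None,
    };

    println!(
        "same key across user agents: {}",
        cache_signature(&html, &vary) == cache_signature(&html_other_agent, &vary)
    );
    println!(
        "different key across Accept values: {}",
        cache_signature(&html, &vary) != cache_signature(&json, &vary)
    );
}
```

The sketch shows why `vary` matters: adding a header to the list splits one cache entry into one entry per distinct header value, which lowers the hit ratio in exchange for correctness.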
### Store backends

| Backend     | Use case |
|-------------|----------|
| `memory`    | Single-instance deployments. LRU eviction. No persistence. |
| `file`      | Survives restarts. Suitable for low-traffic origins with slow upstreams. |
| `memcached` | Distributed cache via memcached protocol. |
| `redis`     | Shared cache across multiple proxy instances. Requires Redis 6+. JSON serialization with TTL. Circuit breaker on Redis failures. |

The `Cacher` trait is the pluggable surface; new backends are added without touching the pipeline.

### Object cache

Separate from the response cache. Stores arbitrary objects (compiled CEL programs, parsed Lua scripts, provider capability metadata). Backed by the same store interface. TTL and LRU eviction policy are configured independently.

### Cache key partitioning

Keys are namespaced as `workspace_id:config_id:hostname:signature`. This prevents cross-tenant collisions when multiple origins share a backend store. A test-mode fallback omits the workspace and config prefix for isolation in unit tests.

---

## 9. Observability

The observability stack has three components: Prometheus metrics, OpenTelemetry tracing, and structured logging via `tracing`.

### Prometheus metrics

When `telemetry.bind_port` is configured, SBproxy runs a dedicated HTTP server that exposes a `/metrics` endpoint in Prometheus exposition format.

Metric names share a single `sbproxy_*` namespace. Core HTTP counters include `sbproxy_requests_total`, `sbproxy_request_duration_seconds`, `sbproxy_errors_total`, and `sbproxy_active_connections`. AI gateway metrics carry `sbproxy_ai_*`. Per-origin breakdowns use `sbproxy_origin_*` variants. Auth, policy, cache, and circuit breaker counters follow the same convention.

### Grafana dashboards

Two Grafana dashboards ship in `crates/sbproxy-observe/dashboards/`:

- `proxy-overview.json` - Request rates, latency, active connections, cache hit ratio, error breakdown.
- `mesh-overview.json` - Per-origin and per-edge topology view.
Pre-built Prometheus alert rules are not bundled today; build your own against the `sbproxy_*` metric names. ### Structured logging Logging uses the `tracing` crate. `release_max_level_info` is set at the workspace level, which compile-strips `debug!` and `trace!` calls from release builds entirely. On hot paths the macro arguments are eliminated rather than evaluated and filtered at runtime. ### Distributed tracing Distributed tracing extracts W3C Trace Context (`traceparent` / `tracestate`) and B3 single / multi-header formats, generates a child span ID for each upstream call, and echoes the propagation headers back to the downstream client. Full OTLP export to an external collector is wireframed in `sbproxy-observe::export::otlp_grpc` but not yet shipped; the runtime emits structured logs and Prometheus counters today. --- ## 10. Deployment topologies ### Single instance (simplest) ``` Internet | v [ sbproxy ] <-- single binary, one process | v [ Upstream services / APIs ] ``` One process, one config file. TLS handled by SBproxy via ACME (Let's Encrypt). Fine for internal tools, development environments, and low-traffic production services. ### Behind a load balancer (horizontal scaling) ``` Internet | v [ Load Balancer ] (e.g., AWS ALB, Nginx, HAProxy) | | v v [ sbproxy ] [ sbproxy ] (2+ instances, same sb.yml) | | v v [ Upstream services / APIs ] ``` For shared cache and session state, configure the `redis` store backend. All instances connect to the same Redis. TLS is terminated at the load balancer. ### Kubernetes with Ingress ``` Internet | v [ Ingress Controller ] (nginx, traefik, etc.) 
| v [ sbproxy Service ] (ClusterIP or NodePort) / | \ v v v [pod] [pod] [pod] (3+ replicas, Deployment) | v [ Upstream Services ] (other Deployments or external APIs) ``` Sample topology: ```yaml apiVersion: apps/v1 kind: Deployment metadata: name: sbproxy spec: replicas: 3 template: spec: containers: - name: sbproxy image: sbproxy:latest args: ["--config", "/config/sb.yml"] ports: - containerPort: 8080 readinessProbe: httpGet: path: /health port: 8080 livenessProbe: httpGet: path: /health port: 8080 volumeMounts: - name: config mountPath: /config volumes: - name: config configMap: name: sbproxy-config ``` Config is supplied via a ConfigMap. The hot-reload watcher detects the kubelet's atomic symlink swap when the ConfigMap updates. ### Docker Compose (dev and test) ``` Browser / curl | v [ sbproxy ] (port 8080) | +---> [ mock-api ] (local upstream for testing) | +---> [ redis ] (shared cache for multi-instance testing) ``` Sample `docker-compose.yml` fragment: ```yaml services: sbproxy: image: sbproxy:latest ports: - "8080:8080" volumes: - ./sb.yml:/config/sb.yml:ro command: ["--config", "/config/sb.yml"] depends_on: - redis redis: image: redis:7-alpine ports: - "6379:6379" ``` --- ## 11. Performance characteristics ### Compiled pipeline, not interpreted The biggest win in the request path is that auth chains, policy chains, modifier chains, and the action handler are compiled exactly once per origin and stored as inline collections of trait objects (or enum variants for built-ins). A request through a compiled pipeline is a slice iteration with no map lookups, no JSON re-parsing, and no config re-reads. ### Per-request allocation budget The goal is near-zero heap allocations on the hot path for a proxy-type request: - Per-request state lives in a `bumpalo` arena that resets after the response is written. Many small allocations become a single bump-pointer increment. 
- `bytes::Bytes` and `BytesMut` carry request and response bodies, avoiding copies as data moves through pipeline phases. - `compact_str::CompactString` keeps short strings (hostnames, IDs, header names) inline on the stack without heap allocation. - `smallvec::SmallVec<[T; N]>` keeps policies, transforms, and modifiers inline; most origins have 1 to 3 of each. - The compiled pipeline itself allocates nothing at call time. ### Connection pooling and HTTP/2 Pingora maintains a connection pool per upstream peer with tuned idle connection limits. HTTP/2 multiplexing is enabled for upstreams that negotiate it via ALPN. Connection reuse eliminates TCP and TLS setup cost for repeated requests to the same upstream. Pingora is production-tested at Cloudflare scale; SBproxy inherits its IO model directly. ### DNS cache `sbproxy-platform::dns` wraps the system resolver with an LRU cache. Cache entries are keyed by hostname and carry a configurable TTL (default: 30 seconds). Lookups are O(1). Eviction uses a doubly-linked list to maintain LRU order without O(n) scans. This matters most for AI proxy routes, which resolve provider hostnames on every request. ### Bloom filter for hostname pre-check The host router maintains an in-memory bloom filter over all configured hostnames. On each request, the filter is checked before any HashMap lookup. Requests for unconfigured hostnames (scanners, bots, misconfigurations) are rejected in sub-microsecond time without touching the HashMap. ### Sharded counters for hot state Subsystems that track per-consumer or per-origin state (rate limiters, AI session counters) shard their state across N buckets based on a hash of the key. Each shard uses `parking_lot::Mutex` or atomic counters. That cuts lock contention by a factor of N under concurrent load from many distinct keys. The rate limiter also has atomic-only fast paths when the bucket has clear capacity. 
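The sharding idea can be sketched in a few lines. This is a minimal illustration rather than the SBproxy code: it uses `std::sync::Mutex` instead of `parking_lot::Mutex` so it builds with the standard library alone, it omits the atomic-only fast path, and the `ShardedCounter` type and its methods are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::{Arc, Mutex};
use std::thread;

/// Per-key counters split across N independently locked shards.
struct ShardedCounter {
    shards: Vec<Mutex<HashMap<String, u64>>>,
}

impl ShardedCounter {
    fn new(shard_count: usize) -> Self {
        assert!(shard_count > 0, "need at least one shard");
        let shards = (0..shard_count)
            .map(|_| Mutex::new(HashMap::new()))
            .collect();
        Self { shards }
    }

    /// Picks the shard from a hash of the key, so one key always maps to one shard.
    fn shard_for(&self, key: &str) -> &Mutex<HashMap<String, u64>> {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        &self.shards[(hasher.finish() as usize) % self.shards.len()]
    }

    /// Increments the counter for `key` and returns the new value.
    /// Only the owning shard is locked; other shards stay uncontended.
    fn increment(&self, key: &str) -> u64 {
        let mut shard = self.shard_for(key).lock().unwrap();
        let slot = shard.entry(key.to_string()).or_insert(0);
        *slot += 1;
        *slot
    }

    fn get(&self, key: &str) -> u64 {
        self.shard_for(key).lock().unwrap().get(key).copied().unwrap_or(0)
    }
}

fn main() {
    let counter = Arc::new(ShardedCounter::new(8));
    let handles: Vec<_> = (0..4)
        .map(|t| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..1000 {
                    counter.increment("tenant-hot");
                }
                counter.increment(&format!("tenant-{t}"));
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    println!("tenant-hot = {}", counter.get("tenant-hot")); // 4000
    println!("tenant-2 = {}", counter.get("tenant-2")); // 1
}
```

Note the limit of the technique: sharding helps when load is spread over many distinct keys. A single hot key still serializes on one shard, which is where an atomic fast path earns its keep.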
### Lock-free config reads `arc-swap` provides atomic pointer swap with no locking on the read side. Every request loads the current `Arc` once, which is a single atomic read plus a refcount increment. Hot reload publishes a new pointer; in-flight requests continue against their existing snapshot until they complete and drop their `Arc`. ### Circuit breaker design Each upstream has a circuit breaker backed by atomic compare-and-swap operations. The open / half-open / closed state transition uses a single atomic int. Only one probe request is allowed through per recovery cycle. All other requests during the open state fail fast without acquiring any lock or making any network call. ### Compiler optimizations Release builds use `lto = "fat"`, `codegen-units = 1`, and `panic = "abort"`. mimalloc replaces the system allocator. `tracing`'s `release_max_level_info` feature compile-strips all debug and trace logging from the binary. ### Observed overhead Under typical workloads (no Lua, no CEL, no response transforms), the proxy adds well under 1 millisecond of overhead at p99 to end-to-end request latency. The dominant cost is the upstream network round-trip. Microbenchmarks for static and echo actions clear 100k requests per second on a single core; full-pipeline scenarios with auth, rate limiting, CORS, and HSTS sustain 80k or more. For benchmark methodology, scenario definitions, and how to reproduce these numbers, see [performance.md](performance.md). For feature-by-feature comparisons against other proxies and AI gateways, see [comparison.md](comparison.md). For the YAML schema reference, see [configuration.md](configuration.md). ================================================================ # docs/audit-log.md ================================================================ ## Audit log *Last modified: 2026-05-04* Every state-mutating endpoint in SBproxy emits one audit envelope. The envelope is typed and append-only. 
This guide covers what gets audited, the schema, the `target_kind` JSON discriminator note, and the structured-log audit sink that ships with the OSS distribution.

The OSS surface emits the envelope through the structured-log audit sink so every deployment gets an audit trail. Durable persistence (Postgres, S3, hash-chained verification) lives in the commercial distribution and is out of scope for this repo.

## What is audited

Audit emission is on **writes** by default. Every mutating handler emits one envelope per call: agent registration / approval / revocation, key rotation, registry edit, policy edit, login, logout.

Reads are audited only when:

1. The read targets the audit log itself (export, verify). The auditor must be auditable.
2. The read targets secret material (key-management endpoints, even when the response redacts the secret).
3. The read is a bulk-export endpoint.

Routine reads (list agents, get balance) are not audited; they live in the access log and the request-event stream. Adding read-audit to a routine endpoint requires an ADR amendment because the cardinality cost is high.

## Envelope schema

Every event is an `AdminAuditEvent`. Wire format is JSON; field order is significant only for canonical hashing.

| Field | Type | Required | Notes |
|---|---|---|---|
| `event_id` | ULID (string) | yes | Generated at emission. Lexicographically time-sortable. |
| `schema_version` | u16 | yes | Currently `0`. |
| `ts` | RFC 3339 UTC | yes | Wall-clock time at emission. |
| `tenant_id` | string | yes | `default` in OSS. |
| `subject` | tagged enum | yes | Who initiated the action. See subjects below. |
| `action` | enum | yes | What was done. Closed enum with an `Other(String)` escape hatch. |
| `target` | tagged enum | yes | What was acted on. See targets below. |
| `before` | JSON value | optional | Pre-mutation snapshot, redacted. `None` on pure-read operations. |
| `after` | JSON value | optional | Post-mutation snapshot, redacted. `None` on failed mutations. |
| `reason` | string | optional | Operator justification. Capped at 4 KiB; over-cap truncates with `...[truncated]`. Not redacted. |
| `result` | tagged enum | yes | Outcome: `Success`, `Failure { error_code, error_message }`, `Denied { reason }`. |
| `request_id` | ULID | yes | Correlation: the in-flight HTTP request. |
| `trace_id` | string (32 hex) | yes | Correlation: OTel trace id. Empty string when no trace context. |
| `span_id` | string (16 hex) | yes | Correlation: OTel span id. |
| `ip` | IpAddr | yes | Caller IP, post-trusted-proxy resolution. |
| `user_agent` | string | optional | Capped at 512 bytes. |
| `chain_position` | object | optional | Reserved for future hash-chained log support. Always `None` in OSS. |

### Subjects

```rust,no_run
pub enum AuditSubject {
    User { user_id: String, session_id: Option },
    Service { principal_id: String },
    Agent { agent_id: String, agent_class: Option },
    System { component: String },
}
```

`User` is a portal-authenticated human. `Service` is CI or internal automation. `Agent` is a registered agent acting on its own behalf. `System` is the subject of last resort and SHOULD be rare; config reload and scheduled jobs use it.

### Actions

Closed enum. Adding a new variant is an ADR amendment. The current set:

`Create`, `Update`, `Delete`, `Read`, `Approve`, `Revoke`, `RotateKey`, `Disable`, `Enable`, `Export`, `Import`, `Login`, `Logout`, `PolicyEdit`, `Other(String)`.

`Other(String)` is the escape hatch for variants not yet hoisted into the closed enum; persistent uses require an ADR amendment to add a proper variant.
### Targets

```rust,no_run
pub enum AuditTarget {
    Agent { agent_id: String },
    RegistryEntry { feed: String, entry_id: String },
    Key { kind: KeyKind, key_id: String },
    Policy { policy_path: String },
    Origin { hostname: String },
    User { user_id: String },
    Tenant { tenant_id: String },
    Config { path: String },
    AuditLog,
    Other { kind: String, id: String },
}
```

`KeyKind` is closed: `OutboundWebhook`, `RegistryFeed`, `Tls`, `Tenant`.

## JSON discriminator note: `target_kind`

`AuditTarget` serializes as an internally tagged enum whose tag field is named `target_kind`, **not** `kind`. The rename avoids a field collision: the `Other { kind, id }` variant carries its own `kind` field, and a tag that is also called `kind` would collide with it. The wire format looks like this:

```json
{"target_kind": "registry_entry", "feed": "agents", "entry_id": "openai-gptbot"}
{"target_kind": "other", "kind": "rate-limit", "id": "rl_us_east_1"}
```

Verifier CLIs and replay tooling MUST read the discriminator from `target_kind`. The trailing `kind` inside the `Other` variant is opaque payload.

## Append-only contract

The storage backend MUST reject updates and deletes. The contract is enforced at the trait level:

```rust,no_run
#[async_trait::async_trait]
pub trait Emitter: Send + Sync {
    async fn emit(&self, event: AdminAuditEvent) -> Result;
    async fn read_range(
        &self,
        from: chrono::DateTime,
        to: chrono::DateTime,
    ) -> Result, AuditError>;
    // No update(), no delete(). Compile-time enforcement.
}
```

A refactor that wants to mutate prior events would have to add a method to the trait, which is an ADR-amendment-level change.

PII deletion (GDPR Article 17, CCPA right-to-delete) is handled by tombstoning, not by mutating the audit log. A separate `audit_tombstones` table records the deletion request, and the verifier CLI redacts matching subjects on read.

## Adapters

### In-memory

Used for tests. Append to a `Vec`; no removal API.
```rust,no_run use sbproxy_audit::{InMemoryEmitter, AdminAuditEvent}; use std::sync::Arc; let emitter = Arc::new(InMemoryEmitter::default()); emitter.emit(event).await?; let range = emitter.read_range(from, to).await?; ``` ### Structured log The default OSS sink writes envelopes to the structured log stream so every deployment gets an audit trail. Pair it with whatever log shipper you already run. ## EmitterMiddleware A Tower / Axum `Layer` wraps every state-mutating handler. The middleware: 1. Captures envelope context up front (`request_id`, `trace_id`, `span_id`, caller IP, User-Agent, subject). 2. Runs the handler. 3. Pulls the `AuditDescriptor` the handler attached to the response extensions (action, target, before, after, optional reason). 4. Builds the envelope, applies the length caps, redacts `before` and `after` per the internal profile, and emits. ```rust,no_run use axum::Router; use sbproxy_audit::{AuditLayer, EmitterArc, InMemoryEmitter}; use std::sync::Arc; let emitter: EmitterArc = Arc::new(InMemoryEmitter::default()); let app: Router = Router::new() .route("/agents/:id/approve", axum::routing::post(approve_handler)) .layer(AuditLayer::new(emitter, "tenant_42")); ``` State-mutating handlers opt in by implementing `Auditable`: ```rust,no_run use sbproxy_audit::{ AuditAction, AuditDescriptor, AuditTarget, Auditable, }; impl Auditable for ApproveHandler { fn audit_action(&self) -> AuditAction { AuditAction::Approve } fn audit_target(&self, req: &axum::extract::Request) -> AuditTarget { AuditTarget::Agent { agent_id: extract_agent_id(req) } } fn audit_snapshot(&self, req: &axum::extract::Request) -> Option { Some(snapshot_agent_state(req)) } } ``` A clippy lint and a CI grep ensure every mutating handler is wrapped or wears an explicit `#[allow(audit_required)]` with a comment. ### Failure handling Audit emission failure does not fail the underlying request. 
The handler succeeds even if the audit append fails; the failure pages on `SLO-AUDIT-WRITE` so durable audit gets restored. The OSS sink logs and drops on emit failure. ## See also - [observability.md](observability.md) - audit metrics (`sbproxy_audit_emit_total`), the `SLO-AUDIT-WRITE` page tier, and the audit-log Grafana dashboard. ================================================================ # docs/auth-oidc.md ================================================================ ## OIDC Relying-Party login *Last modified: 2026-06-03* The `oidc` auth provider turns SBproxy into an OpenID Connect Relying Party. Unlike the `jwt` provider, which only validates a bearer JWT that the caller already holds, this provider drives the full authorization-code + PKCE login dance: it redirects an unauthenticated caller to the IdP, exchanges the returned code for an ID token, validates the token, and mints a sealed session cookie. Subsequent requests authenticate from the cookie until the session expires. This is the "put SSO in front of an app that has none" use case that operators reach for with oauth2-proxy, Pomerium, or Cloudflare Access. SBproxy ships it as a configuration auth provider; no separate sidecar needed. ## Quick start ```yaml origins: "app.example.com": action: type: proxy url: http://upstream-app:3000 auth: type: oidc authorization_endpoint: https://idp.example.com/authorize token_endpoint: https://idp.example.com/oauth/token jwks_uri: https://idp.example.com/.well-known/jwks.json issuer: https://idp.example.com/ client_id: sbproxy-app-example-com client_secret: vault://idp/client_secret cookie_secret: vault://oidc/cookie_secret scope: "openid email profile" ``` The minimum fields are the four IdP endpoints (`authorization_endpoint`, `token_endpoint`, `jwks_uri`, `issuer`), the OAuth `client_id` and `client_secret`, and a `cookie_secret` used to seal the session cookie. Everything else has a sensible default. 
A runnable example lives at [`examples/oidc/`](../examples/oidc/) with a mock IdP shape and the curl invocations to walk through. ## Flow 1. The browser requests a protected origin without a session cookie. 2. SBproxy mints a transaction cookie (sealed PKCE verifier + state + nonce, TTL `tx_ttl_secs`) and 302's the browser to `authorization_endpoint?response_type=code&client_id=...&code_challenge=...&state=...&nonce=...&scope=...&redirect_uri=https://app.example.com/oidc/callback`. 3. The IdP authenticates the user and 302's back to `https://app.example.com/oidc/callback?code=...&state=...`. 4. The `/oidc/callback` handler (a synthetic endpoint mounted by the OIDC provider, the same shape as MCP's well-known endpoints) unseals the transaction cookie, verifies the `state` matches, POSTs to `token_endpoint` with the `code` and the PKCE `code_verifier`, validates the returned ID token against `issuer` + `client_id` + `nonce`, mints a sealed session cookie (TTL `session_ttl_secs`), and 302's the browser back to the originally-requested URL. 5. Subsequent requests carry the session cookie; the proxy decrypts and the caller is treated as authenticated. All cookies use the `__Host-` prefix per RFC 6265bis (forces `Secure` + `Path=/` + no `Domain`), so the cookie-tossing attack against the session secret is closed. ## Configuration reference | Field | Type | Default | Description | |---|---|---|---| | `authorization_endpoint` | URL | (required) | IdP's authorization endpoint. | | `token_endpoint` | URL | (required) | IdP's token endpoint. The callback POSTs `code` + `code_verifier` here. | | `jwks_uri` | URL | (required) | IdP's JWKS endpoint. Fetched through the same `JwksCache` the `jwt` provider uses, so the keys are cached across origins. | | `issuer` | URL | (required) | Expected `iss` on the ID token. Pinned by config so a rogue token from a different IdP (even one signed by a key pulled from `jwks_uri`) is rejected. 
| | `client_id` | string | (required) | OAuth client ID. Sent on the auth redirect and matched against the ID token `aud`. | | `client_secret` | string | (required) | OAuth client secret. Sent over Basic on the token-endpoint POST. Supports `vault://` references. | | `cookie_secret` | string | (required) | 32+ byte secret used as the HKDF IKM for the session + transaction cookie keys. Supports `vault://`. Rotating this invalidates every outstanding session and tx cookie. | | `redirect_path` | path | `/oidc/callback` | Path the IdP redirects back to. Must be one of the URIs you registered with the IdP under `redirect_uris`. | | `logout_path` | path | `/oidc/logout` | Path that triggers RP-initiated logout. | | `end_session_endpoint` | URL | unset | IdP's `end_session_endpoint`. When set, `/oidc/logout` deletes the session cookie and 302's to the OP so the IdP terminates its own session too. When unset, `/oidc/logout` only deletes the cookie and 302's to `post_logout_redirect_default`. | | `userinfo_endpoint` | URL | unset | IdP's userinfo endpoint. When set, the callback handler calls userinfo after the token exchange and projects the resulting claims as trust headers on the request to the upstream. | | `post_logout_redirect_default` | path or URL | `/` | Where to send the browser after a logout completes if the caller did not supply (or did not allowlist) a `post_logout_redirect_uri`. | | `post_logout_redirect_allowlist` | list of URLs | `[]` | Permitted values for the `post_logout_redirect_uri` query parameter on `/oidc/logout`. Without this gate the endpoint becomes an open-redirect. Match is verbatim. | | `scope` | string | `openid` | Space-separated OIDC scope list. Minimum is `openid` (the scope that produces an ID token); add `email profile groups` etc. as needed. | | `session_ttl_secs` | integer | `3600` | Session cookie TTL in seconds. | | `tx_ttl_secs` | integer | `300` | Transaction cookie TTL in seconds. 
Should comfortably exceed the operator's expected time between auth redirect and callback redirect; a stale tx cookie aborts the login. | | `session_cookie_name` | string | `__Host-sbproxy_session` | Name of the session cookie. The `__Host-` prefix forces `Secure` + `Path=/` + no `Domain`. | | `tx_cookie_name` | string | `__Host-sbproxy_oidc_tx` | Name of the transaction cookie. | | `attrs` | block | `{}` | Provider-level attribution metadata stamped onto the resolved `Principal` on a successful OIDC session validation. Same shape as the other auth providers. | ## Trust-header injection (optional) When `userinfo_endpoint` is set, the callback handler: 1. Calls the userinfo endpoint with the access token from the token exchange. 2. Projects the returned claims through `userinfo::trust_headers_from_claims`. 3. Stashes the projection in the sealed session cookie. On every subsequent request, the request-time auth check replays the trust headers onto the upstream request. Downstream policies (for example the `object_authz` BOLA + BFLA policy) see the verified subject and groups without an additional round trip. The headers stamped are: | Header | Source claim | |---|---| | `X-Auth-Subject` | `sub` | | `X-Auth-Email` | `email` (when present and `email_verified` is `true`) | | `X-Auth-Name` | `name` (when present) | | `X-Auth-Groups` | `groups` (comma-joined when array-shaped) | Upstreams MUST be configured to trust these headers only from the proxy (e.g. via mTLS or a tight network boundary); the proxy strips inbound copies of these headers from the client before adding its own so a malicious client cannot inject identity. ## Logout Send the browser to `logout_path` (default `/oidc/logout`). The handler: 1. Deletes the session cookie. 2. If `end_session_endpoint` is set, 302's the browser to the IdP so the OP terminates its own session. 3. 
Otherwise, 302's the browser to `post_logout_redirect_default` (or, if the caller supplied a `post_logout_redirect_uri` query parameter that appears in `post_logout_redirect_allowlist`, honours that value verbatim).

The allowlist is the open-redirect gate. Without it, the endpoint would honour arbitrary `post_logout_redirect_uri` values and become an open redirect.

## Discovery

Today the IdP endpoints are explicit config fields. Fetching the OIDC discovery document at `/.well-known/openid-configuration` is planned as an optional follow-up: once an operator can point the provider at a discovery URL, the proxy will populate `authorization_endpoint`, `token_endpoint`, `jwks_uri`, and `end_session_endpoint` from the fetched document instead of from explicit config. Until that lands, populate the endpoints by hand from the IdP's discovery document.

## Session storage

Default is **stateless encrypted cookie**: the session claims travel in the cookie body, sealed with the per-origin cookie key. No proxy-side state, no Redis. The cookie size grows with the projected trust headers, so keep the trust-header projection narrow.

For long-lived sessions or for sessions that need server-side revocation, the `oidc::store` helpers offer a server-side session-store hook (KV-backed) that operators can wire under the existing `kv` storage. The default is stateless because the cookie shape covers the common case and avoids the operational cost of a session store.
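
The redirect-target choice described in the Logout section can be sketched as below. This is an illustrative sketch only: `post_logout_target` is a hypothetical helper, not part of the provider's API, and it covers only the case where `end_session_endpoint` is unset.

```rust
/// Hypothetical helper: picks where `/oidc/logout` sends the browser.
/// `requested` is the caller-supplied `post_logout_redirect_uri`, if any.
fn post_logout_target<'a>(
    requested: Option<&'a str>,
    allowlist: &[&str],
    default_target: &'a str,
) -> &'a str {
    match requested {
        // Verbatim comparison: no normalisation, no prefix or wildcard match.
        Some(uri) if allowlist.iter().any(|allowed| *allowed == uri) => uri,
        // Missing or non-allowlisted values fall back to the configured default.
        _ => default_target,
    }
}
```

Because the match is verbatim, a trailing slash or a different query string is enough to fall back to `post_logout_redirect_default`, which is the conservative outcome for an open-redirect gate.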
## Relationship to the other auth providers | Provider | Validates | Issues | Drives a login flow | |---|---|---|---| | `noop` | nothing | nothing | no | | `api_key`, `basic_auth`, `bearer`, `digest` | per-credential lookup | no | no | | `jwt` | bearer JWT (issuer / audience / signature) | no | no | | `forward_auth` | delegates to an external authorizer | no | no | | `oidc` (this provider) | session cookie + ID token | session cookie | **yes** | The `oidc` provider shares the JWKS cache with `jwt` so two origins backed by the same IdP do not duplicate key fetches. Operators that want to layer "validate a bearer JWT issued by a different system" on top of "log in via OIDC" can combine `oidc` here with `jwt` on a different origin in the same config; the providers are independent. ## What's not in this provider * **Discovery-document auto-population** of the four endpoint fields. Tracked as a follow-up; today the operator pastes the values from the IdP's published `.well-known/openid-configuration`. * **Refresh-token rotation.** The session TTL bounds the time between IdP round-trips. A follow-up adds rotating refresh tokens behind a server-side session store. * **DPoP-bound sessions.** The session cookie today is a sealed bearer; DPoP binding to a client-held key is a follow-up. * **MFA enforcement / step-up.** The provider honours whatever the IdP does on the auth side; in-proxy step-up is not in scope. ## See also - [Example: `examples/oidc/`](../examples/oidc/) - [`configuration.md`](configuration.md) for the auth-provider registry surface. ================================================================ # docs/build.md ================================================================ ## Build pipeline *Last modified: 2026-04-30* How the proxy container images are built, what stays warm between runs, and what the expected wall-clock numbers are. Companion to `docs/architecture.md` (request pipeline) and the workspace `CLAUDE.md` (pre-commit local loop). 
## Container image layout

Two Dockerfiles live at the repo root and share the same layered cargo-chef layout:

| File | Purpose | Consumer |
|---|---|---|
| `Dockerfile.cloudbuild` | Cloud Build / GCR amd64 image. | `gcloud builds submit`; bench loadtest stack. |
| `Dockerfile.ci` | Kind-based smoke-test image. | `make k8s-operator-smoke`. |

The layout has six stages (`Dockerfile.ci` omits stage 5):

1. **chef-base**: `rust:1.94-bookworm` plus the apt deps (`pkg-config`, `libclang-dev`, `build-essential`, `cmake`, `perl`) plus a pinned `cargo-chef@0.1.71`. Reused by every later Rust stage.
2. **planner**: copies the workspace, runs `cargo chef prepare`, emits `recipe.json`. The recipe captures every `Cargo.toml` and `Cargo.lock` digest in the workspace; nothing under `crates/*/src/` affects it.
3. **cacher**: `cargo chef cook --profile release-fast --bin sbproxy --recipe-path recipe.json`. Compiles every dependency from crates.io. This is the layer the warm-rebuild path reuses.
4. **builder**: copies `/src/target` from cacher, then the workspace source, then runs `cargo build --profile release-fast --bin sbproxy --locked`. The dep `target/` from the cacher stage is the entire reason this step does not have to recompile crates like `pingora`, `aws-lc-sys`, or `tokio` again.
5. **cert-gen** (cloudbuild only): self-signed loadtest cert. Production deploys mount real certs over `/etc/sbproxy/` at runtime.
6. **runtime**: `gcr.io/distroless/cc-debian12`. Carries the binary and (cloudbuild) the loadtest cert pair.

## Build-time numbers

Cold = empty BuildKit cache (`docker buildx prune -f` first). Warm = touch a file under `crates/sbproxy/src/` and rebuild without clearing the cache.
| Build | Before chef | After chef |
|---|---|---|
| Cold (Cloud Build amd64) | ~12 min | ~3-4 min |
| Warm (only first-party source changed) | ~12 min (no caching) | <90s |

The warm path's win comes from the `cacher` layer: as long as `recipe.json` is byte-identical to the previous build, Docker short-circuits stages 1-3 and only re-runs stages 4 + 6.

The Dockerfiles default to `CARGO_PROFILE=release-fast`, which inherits the production release settings but disables fat LTO and raises `codegen-units` for lower link time and memory. Pass `--build-arg CARGO_PROFILE=release` when you intentionally want the full production release profile inside these Dockerfiles.

The cold path's win comes from BuildKit `--mount=type=cache` on `/usr/local/cargo/{registry,git}`: even when the layer cache is cold (e.g. a fresh Cloud Build worker), the cargo registry tarballs are re-used across builds of the same Cloud Build trigger.

## BuildKit requirement

Both Dockerfiles use the cache-mount syntax (`RUN --mount=type=cache,...`). That syntax is BuildKit-only.

- Local: `export DOCKER_BUILDKIT=1` or use `docker buildx build`.
- Cloud Build: builders that consume these Dockerfiles must set `DOCKER_BUILDKIT=1` in the build step env, or use a `docker buildx build` invocation. Cloud Build's standard `gcr.io/cloud-builders/docker` step honors `DOCKER_BUILDKIT=1`.

If a build step ever drops back to the legacy builder, the build fails: the legacy builder rejects `RUN --mount` with an error stating that the option requires BuildKit.

## Validating a build

The fast smoke test, locally:

```bash
DOCKER_BUILDKIT=1 docker build \
  -f Dockerfile.cloudbuild \
  --target builder \
  -t sbproxy:builder-smoke .
```

The `--target builder` short-circuits before the runtime stage so the test does not pay for the cert-gen + distroless copy.

To validate the runtime image:

```bash
DOCKER_BUILDKIT=1 docker build -f Dockerfile.cloudbuild -t sbproxy:rt .
docker run --rm sbproxy:rt --version
```

## Warm-path verification

To prove the chef layer is doing its job, after a cold build, touch a file under `crates/sbproxy/src/`:

```bash
touch crates/sbproxy/src/main.rs
DOCKER_BUILDKIT=1 docker build -f Dockerfile.cloudbuild --target builder -t sbproxy:warm .
```

The output should show stages `chef-base`, `planner`, and `cacher` all `CACHED`, and only `builder` running. Wall-clock time on a modern amd64 worker should be under 90s.

## Troubleshooting

- **The cacher stage rebuilds every time.** Some change touched a `Cargo.toml` or `Cargo.lock` (added a dep, bumped a version, changed a feature flag). The recipe digest is keyed on those files; the cacher stage cooks fresh.
- **`cargo build` in the builder stage refuses to use the cooked artifacts.** Symptom: stage 4 takes ~12 min, ignoring the COPY from cacher. Most likely cause: `--locked` and a stale `Cargo.lock` in cacher's COPY. Re-run `cargo update` and rebuild.
- **OOM on Cloud Build.** Set `machineType` on the build step to `E2_HIGHCPU_8` or higher; the chef cacher stage holds the full `target/` of cooked deps in memory while linking.

================================================================
# docs/bulk-redirects.md
================================================================

## Bulk redirects

*Last modified: 2026-04-27*

The `redirect` action accepts a list of source-to-destination rows in addition to (or instead of) a single `url:`. Each origin owns its own list. The proxy compiles the rows once at config-load time into an O(1) lookup table keyed on the request path; runtime cost is one hash hit on the redirect dispatch path.

## Sources

| `bulk_list.type` | What it loads |
|------------------|---------------|
| `inline` | YAML rows embedded directly in the config under `rows:`. |
| `file` | A local file. CSV when the path ends in `.csv`, YAML otherwise. |
| `url` | An HTTPS URL fetched once at startup. CSV/YAML by URL extension or explicit `format:`. The proxy refuses HTTP because list contents drive 30x responses. |

```yaml
origins:
  "marketing.local":
    action:
      type: redirect
      status_code: 301
      preserve_query: true
      bulk_list:
        type: file
        path: /etc/sbproxy/marketing-redirects.csv
```

## Row shape

CSV columns: `from,to[,status]`. Lines starting with `#` and blank lines are ignored. A leading row whose first column is the literal `from` is treated as a header.

```csv
from,to,status
/old/about,/about,301
# status omitted: defaults to the action's status_code
/old/help,/help
/blog/2023,https://blog.example.com/2023,308
```

YAML or inline:

```yaml
bulk_list:
  type: inline
  rows:
    - from: /category/legacy
      to: /category/2024
      status: 308
    - from: /docs/v1
      to: https://docs.example.com/v2
      preserve_query: false   # override per row
```

## Lookup semantics

- Exact-match on the request path. Wildcards and prefix matching are not supported; use the existing `forward_rules` for those.
- A row's `status` and `preserve_query` default to the action's values when omitted; per-row overrides win when set.
- Unmapped paths fall through to the action's `url:`. When `url:` is empty, the proxy returns `404`.

## Per-origin isolation

Lists never cross origins. Two origins can declare lists with overlapping paths and no row leaks; each origin's compiled table is scoped to its hostname.

## Reload

The list reloads on the next config swap. There is no per-row hot reload; redeploy the config to pick up new rows. URL-backed lists re-fetch on each config compile.

## Performance

A 100k-row CSV compiles in well under a second on a warm cache and serves redirects in tens of nanoseconds per request (HashMap lookup on a `String` key). Cap the list length at the size your operators can audit.

## See also

- [configuration.md](configuration.md#redirect) - full action schema.
- `examples/bulk-redirects/` - runnable CSV + inline example.
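
As a rough illustration of the lookup semantics described in this document, the sketch below models one origin's compiled table. All type and function names here (`Row`, `RedirectAction`, `resolve`, `example_action`) are hypothetical, and two details are assumptions: how a preserved query string is appended to the destination, and that the fallback `url:` uses the action-level status.

```rust
use std::collections::HashMap;

/// One compiled row; `None` means "inherit the action-level value".
struct Row {
    to: String,
    status: Option<u16>,
    preserve_query: Option<bool>,
}

/// Action-level settings plus the per-origin table keyed on the exact path.
struct RedirectAction {
    status_code: u16,
    preserve_query: bool,
    fallback_url: Option<String>,
    table: HashMap<String, Row>,
}

impl RedirectAction {
    /// Returns `(status, Location)`; `(404, None)` when nothing matches.
    fn resolve(&self, path: &str, query: Option<&str>) -> (u16, Option<String>) {
        match self.table.get(path) {
            Some(row) => {
                // Per-row overrides win; otherwise the action defaults apply.
                let status = row.status.unwrap_or(self.status_code);
                let keep_query = row.preserve_query.unwrap_or(self.preserve_query);
                let location = match query {
                    // Assumption: the query is appended with a plain `?`.
                    Some(q) if keep_query => format!("{}?{}", row.to, q),
                    _ => row.to.clone(),
                };
                (status, Some(location))
            }
            // Unmapped paths fall through to `url:`, or 404 when it is empty.
            None => match &self.fallback_url {
                Some(url) => (self.status_code, Some(url.clone())),
                None => (404, None),
            },
        }
    }
}

/// Builds the inline example from the "Row shape" section.
fn example_action() -> RedirectAction {
    let mut table = HashMap::new();
    table.insert(
        "/category/legacy".to_string(),
        Row { to: "/category/2024".to_string(), status: Some(308), preserve_query: None },
    );
    table.insert(
        "/docs/v1".to_string(),
        Row { to: "https://docs.example.com/v2".to_string(), status: None, preserve_query: Some(false) },
    );
    RedirectAction { status_code: 301, preserve_query: true, fallback_url: None, table }
}
```

With this table, `/docs/v1/` (trailing slash) does not match `/docs/v1`, which is the exact-match behaviour the list above describes.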
================================================================ # docs/cache-reserve.md ================================================================ ## Cache Reserve *Last modified: 2026-04-27* Cache Reserve is a long-tail cold tier sitting under the per-origin response cache. Items evicted from the hot cache are admitted into the reserve subject to a sample rate and size threshold; on a hot miss the proxy consults the reserve before falling through to origin and promotes the entry back into the hot tier on hit. The OSS package ships three reserve backends out of the box (memory, filesystem, redis) plus the [`CacheReserveBackend`](#backend-trait) trait that enterprise builds extend with an S3 + KMS implementation. ## Configuration Cache Reserve is configured at the top level of `sb.yml`. It applies to every origin whose `response_cache.enabled` is true. ```yaml proxy: http_bind_port: 8080 cache_reserve: enabled: true backend: type: filesystem path: /var/lib/sbproxy/reserve sample_rate: 0.1 # mirror 10% of hot-cache writes min_ttl: 3600 # only items with TTL >= 1 hour are admitted max_size_bytes: 1048576 # skip entries above 1 MiB origins: "api.example.com": action: { type: proxy, url: "https://upstream.example.com" } response_cache: enabled: true ttl: 7200 cacheable_status: [200] ``` ### Backends | `type` | Required fields | Notes | |--------|-----------------|-------| | `memory` | none | In-process map. For tests and ephemeral single-replica setups; nothing survives a restart. | | `filesystem` | `path` | One body file plus a sidecar metadata JSON per key, fanned out by SHA-256 hash. Survives restarts. | | `redis` | `redis_url`, optional `key_prefix` | Connection pooling via `ConnectionManager`. Entries self-evict on the server side via `PEXPIREAT`. | Enterprise builds register additional types (e.g. `s3`) through the `CacheReserveBackend` trait. 
The OSS pipeline ignores unknown types with a warning so the enterprise startup hook can swap in its own implementation. ### Admission filter | Field | Default | Behaviour | |-------|---------|-----------| | `sample_rate` | `0.1` | Fraction of hot-cache writes mirrored into the reserve. Use a low rate when the reserve is on a paid object store. | | `min_ttl` | `3600` | Skip entries whose TTL is below this (seconds). Items that won't outlive a typical hot eviction window aren't worth carrying. | | `max_size_bytes` | `1048576` | Skip oversize objects. `0` disables the cap. | The filter runs before any reserve I/O happens so a misconfigured admission window doesn't show up as a reserve write spike. ## Request flow 1. Hot cache lookup runs first. 2. On a hot miss, the proxy consults the reserve. A reserve hit replays the body to the client with `x-sbproxy-cache: HIT-RESERVE` and promotes the entry back into the hot tier so subsequent reads stay hot. 3. On a hot miss + reserve miss, the request goes to origin as normal. 4. On the response path, every cacheable upstream reply lands in the hot tier; the reserve admits a sampled subset that passes the TTL and size filters. 5. When a hot entry's TTL is exhausted (and it's outside any SWR window), the entry is mirrored to the reserve before being deleted from the hot tier so the long-tail content gets a second life. 6. `POST` / `PUT` / `PATCH` / `DELETE` invalidations evict the no-Vary canonical reserve key alongside the hot-tier prefix sweep. Vary-based variants in the reserve must wait for natural expiry; the trait surface is intentionally narrow so backends like S3 don't need to scan keys. ## Backend trait The integration point for cold-tier backends is the async [`CacheReserveBackend`](../crates/sbproxy-cache/src/reserve/mod.rs) trait. Enterprise builds ship their own `impl CacheReserveBackend` (S3 + KMS, GCS, Azure Blob) without re-vendoring the OSS data plane. 
```rust,no_run use async_trait::async_trait; use bytes::Bytes; use std::time::SystemTime; use sbproxy_cache::{CacheReserveBackend, ReserveMetadata}; pub struct MyBackend { /* ... */ } #[async_trait] impl CacheReserveBackend for MyBackend { async fn put(&self, key: &str, value: Bytes, metadata: ReserveMetadata) -> anyhow::Result<()> { // ... Ok(()) } async fn get(&self, key: &str) -> anyhow::Result> { // ... Ok(None) } async fn delete(&self, key: &str) -> anyhow::Result<()> { // ... Ok(()) } async fn evict_expired(&self, before: SystemTime) -> anyhow::Result { // ... Ok(0) } } ``` The trait is small on purpose. Admission control, sampling, and metric emission live above the backend so a custom backend only has to answer "store this", "fetch this", and "drop this". Implementations should be `Send + Sync` so a single instance backs every origin in a multi-tenant proxy. `ReserveMetadata` carries the response shape needed to replay an entry verbatim: ```rust,no_run pub struct ReserveMetadata { pub created_at: SystemTime, pub expires_at: SystemTime, pub content_type: Option, pub vary_fingerprint: Option, pub size: u64, pub status: u16, } ``` Backends should treat metadata as opaque once written: every field is round-tripped exactly through `get`. ## Metrics The reserve emits four Prometheus counters via the standard `sbproxy_*` registry: | Metric | Description | |--------|-------------| | `sbproxy_cache_reserve_hits_total` | Reserve hits served after a hot-cache miss. | | `sbproxy_cache_reserve_misses_total` | Hot + reserve both empty. | | `sbproxy_cache_reserve_writes_total` | Entries written into the reserve. | | `sbproxy_cache_reserve_evictions_total` | Explicit reserve deletions (invalidate-on-mutation). | Each counter is labelled by `origin`. Watch the hits / (hits + misses) ratio to size the reserve appropriately and the writes counter to confirm the admission filter is actually limiting reserve I/O. 
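
To make the admission rules from the "Admission filter" section concrete, here is a minimal sketch of the decision. The `Admission` type and its `admits` method are hypothetical names, the order of the checks is an assumption, and the random sample is passed in by the caller (`sample_draw`, uniform in `[0, 1)`) so the logic stays deterministic and free of external crates.

```rust
/// Admission settings, using the documented defaults.
struct Admission {
    sample_rate: f64,
    min_ttl_secs: u64,
    max_size_bytes: u64,
}

impl Default for Admission {
    fn default() -> Self {
        Admission { sample_rate: 0.1, min_ttl_secs: 3600, max_size_bytes: 1_048_576 }
    }
}

impl Admission {
    /// Decides whether a hot-cache write is mirrored into the reserve.
    /// `sample_draw` is a caller-supplied uniform random number in [0, 1).
    fn admits(&self, ttl_secs: u64, size_bytes: u64, sample_draw: f64) -> bool {
        // Entries whose TTL is below `min_ttl` are skipped.
        if ttl_secs < self.min_ttl_secs {
            return false;
        }
        // `max_size_bytes: 0` disables the size cap.
        if self.max_size_bytes != 0 && size_bytes > self.max_size_bytes {
            return false;
        }
        // Only the sampled fraction of eligible writes is admitted.
        sample_draw < self.sample_rate
    }
}
```

Running the cheap TTL and size checks before the sample keeps rejected entries from ever reaching the backend, which matches the note that the filter runs before any reserve I/O.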
## When the reserve helps - **Long-tail content.** Pages that get one hit per hour drop out of an LRU primary quickly. The reserve keeps them around so the second hit still serves from cache instead of paying the origin round trip. - **Cold-start churn.** When the primary is evicted on restart, the reserve carries enough warm entries that the cache hit ratio recovers in seconds rather than minutes. - **Large payloads with high origin egress cost.** Object-store costs are usually dominated by per-request operations, not per-byte storage; a reserve trades a small storage bill for the egress fees you would otherwise pay every time the origin re-renders the same page. ## Failure semantics - A failed reserve `put` is logged at `warn` level and does not fail the request. The hot tier already accepted the entry. - A failed reserve `get` falls through to origin. The hot tier's value, when present, is returned before the reserve is consulted, so primary hits are unaffected by reserve outages. - A failed reserve construction (e.g. invalid Redis URL) is logged at warn and degrades to "no reserve" rather than failing the whole config load. Plain hot-cache behaviour resumes. ## Tuning | Workload | `sample_rate` | `min_ttl` | `max_size_bytes` | |----------|---------------|-----------|------------------| | HTML pages, JSON API responses | `0.25` | `3600` | `1048576` | | Image / asset edge cache | `0.1` | `86400` | `10485760` | | AI completion bodies | `0.05` | `600` | `524288` | Lower sample rates are appropriate for backends with per-request operation costs (S3, Redis Cluster); a filesystem reserve can afford `sample_rate: 1.0` because writes are local. ## Library composer The `crates/sbproxy-cache/src/reserve/composer.rs` module also exposes a synchronous `ReserveCacheStore` that wraps two `CacheStore` implementations into a hot/cold pair. 
It remains the in-process building block when both tiers are cheap (memory + filesystem) and a code-level integration is preferred over the YAML config block. See the doc comment on `ReserveCacheStore` for usage. ## See also - [configuration.md](configuration.md#response-cache) - response cache schema. - `crates/sbproxy-cache/src/reserve/mod.rs` - backend trait + OSS implementations. ================================================================ # docs/clickhouse-attribution.md ================================================================ ## ClickHouse attribution *Last modified: 2026-06-01* A canonical ClickHouse schema for the SBproxy access log, plus sample queries for the three reports an operator most often wants: monthly project cost, top users by token spend, and tag-level burndown against a budget. The schema mirrors the JSON shape emitted by the structured logger (`sbproxy-observe::access_log::AccessLogEntry`), so a Vector / Fluent Bit pipeline can ingest the proxy's stdout into ClickHouse without an intermediate transform. This guide assumes a recent ClickHouse (v24.3 or newer; `JSONEachRow` and `TIMESTAMP` semantics are unchanged across the LTS line). The schema uses `MergeTree` for the raw rows and `AggregatingMergeTree` for the materialised pre-aggregations. ## Why ClickHouse The access log carries one row per terminated request. A production proxy emits 10 to 100 million rows per day. Three properties matter for an attribution warehouse: 1. **Columnar reads.** Almost every attribution query reads three to five columns from a row that has 60+. Columnar beats row-oriented by 10-20x on this shape. 2. **Time-partitioned writes.** UUIDv7 `request_id` already encodes the ingest millisecond in its leading 48 bits, so `ORDER BY (toDate(timestamp), request_id)` keeps writes append-only and partitions land naturally without a separate `_date` derived column. 3. 
**Pre-aggregation.** `AggregatingMergeTree` collapses the 10M-row daily volume to a few thousand per-day-per-project rows, so the dashboards point at a table that fits in memory regardless of fleet size.

## Raw row table

The schema mirrors `AccessLogEntry`. Optional fields land as `Nullable(...)` so a row with no AI fields (a vanilla reverse-proxy hit) inserts without sentinels. Strings stay `LowCardinality(String)` for the columns whose distinct count is bounded; freeform fields use plain `String`.

```sql
CREATE TABLE access_log (
    -- Identity
    timestamp DateTime64(3, 'UTC'),
    request_id String,
    origin LowCardinality(String),
    method LowCardinality(String),
    path String,
    query Nullable(String),
    protocol LowCardinality(Nullable(String)),
    scheme LowCardinality(Nullable(String)),
    host Nullable(String),
    user_agent Nullable(String),
    referer Nullable(String),
    status UInt16,
    upstream_status Nullable(UInt16),
    latency_ms Float64,
    auth_ms Nullable(Float64),
    upstream_ttfb_ms Nullable(Float64),
    response_filter_ms Nullable(Float64),
    bytes_in UInt64,
    bytes_out UInt64,
    client_ip LowCardinality(String),

    -- Attribution
    workspace_id LowCardinality(String),
    auth_type LowCardinality(Nullable(String)),
    principal_kind LowCardinality(Nullable(String)),
    project LowCardinality(Nullable(String)),
    user LowCardinality(Nullable(String)),
    team LowCardinality(Nullable(String)),
    tags Array(LowCardinality(String)),
    metadata Map(LowCardinality(String), String),
    attribution Map(LowCardinality(String), String),

    -- AI gateway
    provider LowCardinality(Nullable(String)),
    model LowCardinality(Nullable(String)),
    prompt_name LowCardinality(Nullable(String)),
    prompt_version LowCardinality(Nullable(String)),
    tokens_in Nullable(UInt64),
    tokens_out Nullable(UInt64),
    ai_surface LowCardinality(Nullable(String)),

    -- Cache / cost
    cache_result LowCardinality(Nullable(String)),
    tier LowCardinality(Nullable(String)),
    shape LowCardinality(Nullable(String)),
    price Nullable(UInt64),
    currency LowCardinality(Nullable(String)),
    rail LowCardinality(Nullable(String)),
    cost_usd_micros Nullable(UInt64) MATERIALIZED if(
        price IS NOT NULL AND currency = 'USD',
        price,
        toNullable(0)
    ),

    -- Trace correlation
    trace_id Nullable(String),
    envelope_request_id Nullable(String),
    user_id Nullable(String),
    session_id Nullable(String),

    -- Captured headers (bounded by access-log capture caps)
    request_headers Map(LowCardinality(String), String),
    response_headers Map(LowCardinality(String), String),
    properties Map(LowCardinality(String), String)
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(timestamp)
-- `project` is Nullable; MergeTree rejects Nullable sorting-key columns
-- unless allow_nullable_key is enabled (see SETTINGS below).
ORDER BY (toDate(timestamp), workspace_id, project, request_id)
TTL toDate(timestamp) + INTERVAL 90 DAY
SETTINGS index_granularity = 8192, allow_nullable_key = 1;
```

The `TTL` is the recommended starting point for a SaaS deployment. Hot-data dashboards work off the last 30 days; the 90-day window covers month-end reconciliation. Compliance regimes that require longer retention (HIPAA, financial audit) should bump the TTL and budget the storage; ClickHouse compresses this schema to roughly 12-16 bytes per row in practice.

### `metadata` vs `attribution`

Two map columns carry per-request labels, from different sources:

* `attribution` is the resolved business attribution tag set: the credential's `attrs:` defaults (project, team) merged with the inbound `SB-Attr-*` headers (project, feature, okr, team, customer, environment, agent_type, risk_tier, trace_id). Per-request headers override the credential default. This is the **same tag set the Prometheus per-attribution metrics are labeled by** (`sbproxy_ai_tokens_attributed_total`, `sbproxy_ai_cost_dollars_attributed_total`), so a log query and a metric query answer "spend by feature/customer" identically. Pivot on any key with `attribution['feature']`, `attribution['customer']`, and so on.
* `metadata` is free-form key/values the operator pins on the credential's `attrs.metadata:`. Use it for dimensions outside the fixed attribution schema (cost_center is lifted in here for back-compat).

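
The override rule for `attribution` (credential defaults first, per-request `SB-Attr-*` headers on top) can be sketched in a few lines of Rust. This is an illustration under stated assumptions, not the proxy's implementation: the function name `resolve_attribution`, the header-to-key normalisation (case-insensitive `SB-Attr-` prefix stripped, `-` mapped to `_`), the choice to ignore empty header values, and all sample values are hypothetical.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: merge a credential's `attrs:` defaults with inbound
/// `SB-Attr-*` headers. A per-request header overrides the credential default.
fn resolve_attribution(
    credential_defaults: &BTreeMap<String, String>,
    request_headers: &[(&str, &str)],
) -> BTreeMap<String, String> {
    const PREFIX: &str = "sb-attr-";

    // Start from the credential defaults, then layer the headers on top.
    let mut resolved = credential_defaults.clone();
    for (name, value) in request_headers {
        let lower = name.to_ascii_lowercase();
        // Assumed normalisation: `SB-Attr-Agent-Type` becomes `agent_type`.
        if let Some(key) = lower.strip_prefix(PREFIX) {
            // Assumption: empty keys or values never override a default.
            if !key.is_empty() && !value.is_empty() {
                resolved.insert(key.replace('-', "_"), value.to_string());
            }
        }
    }
    resolved
}

fn main() {
    let mut defaults = BTreeMap::new();
    defaults.insert("project".to_string(), "search".to_string());
    defaults.insert("team".to_string(), "platform".to_string());

    let headers = [
        ("SB-Attr-Team", "growth"),          // overrides the credential default
        ("SB-Attr-Feature", "autocomplete"), // adds a new attribution key
        ("User-Agent", "curl/8.0"),          // not an attribution header, ignored
    ];

    for (key, value) in &resolve_attribution(&defaults, &headers) {
        println!("{key}={value}");
    }
}
```

With these sample inputs the resolved map holds `feature`, `project`, and `team`, with `team` taken from the header rather than the credential. Those are the keys a query would then read as `attribution['feature']` and so on.
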

To pivot spend by any attribution dimension, group on the map value:

```sql
SELECT
    attribution['feature'] AS feature,
    sum(cost_usd_micros) / 1e6 AS usd
FROM access_log
WHERE workspace_id = {workspace:String}
  AND timestamp >= toStartOfWeek(now())
  AND attribution['feature'] != ''
GROUP BY feature
ORDER BY usd DESC;
```

## Truncation policy for text fields

The proxy never persists raw prompt or completion text to the access log. The `prompt_name` and `prompt_version` columns identify the rendered prompt; the token counts (`tokens_in`, `tokens_out`) describe the volume. If an operator needs raw text for evals or audit, route those through a separate sink with redaction enabled and ingest into a parallel table:

```sql
CREATE TABLE prompt_audit (
    timestamp DateTime64(3, 'UTC'),
    request_id String,
    role LowCardinality(String),
    content_redacted String -- emitted by the reversible PII pass; placeholders only
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(timestamp)
ORDER BY (toDate(timestamp), request_id)
TTL toDate(timestamp) + INTERVAL 30 DAY;
```

Joining `prompt_audit` to `access_log` on `request_id` lets analysts trace a flagged response back to the redacted prompt without ever surfacing PII. The reversible-PII pass on the AI origin keeps the original out of every persisted artefact; only redaction placeholders ever land here. See the "Reversible PII redaction" section in `docs/observability.md` for the opt-in.

## Sample query 1: monthly project cost rollup

```sql
SELECT
    project,
    toStartOfMonth(timestamp) AS month,
    countIf(provider IS NOT NULL) AS ai_requests,
    sumIf(tokens_in, provider IS NOT NULL) AS input_tokens,
    sumIf(tokens_out, provider IS NOT NULL) AS output_tokens,
    sum(cost_usd_micros) / 1e6 AS usd_spend
FROM access_log
WHERE workspace_id = {workspace:String}
  AND timestamp >= now() - INTERVAL 6 MONTH
  AND project IS NOT NULL
GROUP BY project, month
ORDER BY month DESC, usd_spend DESC;
```

The query groups by month and project.
`cost_usd_micros` is the materialised column from the schema; rows without a settled price contribute zero. Pass the operator's `workspace_id` as a parameter so a SaaS deployment can serve the report to multiple tenants from one table without a per-tenant view.

## Sample query 2: top-10 users by token spend in the last 24h

```sql
SELECT
    user,
    project,
    sumIf(tokens_in, provider IS NOT NULL) AS input_tokens,
    sumIf(tokens_out, provider IS NOT NULL) AS output_tokens,
    (input_tokens + output_tokens) AS total_tokens,
    sum(cost_usd_micros) / 1e6 AS usd_spend
FROM access_log
WHERE workspace_id = {workspace:String}
  AND timestamp >= now() - INTERVAL 24 HOUR
  AND user IS NOT NULL
GROUP BY user, project
ORDER BY total_tokens DESC
LIMIT 10;
```

The `principal_kind` column lets a query filter to non-AI traffic when wanted; the example above implicitly leaves it untouched so virtual-key and bearer-token attribution merge into one report. To split:

```sql
WHERE ...
  AND principal_kind IN ('virtual_key', 'bearer')
```

## Sample query 3: tag-level burndown vs budget

The per-credential attribution metric `sbproxy_tokens_attributed_total{project, user, tag, direction}` rolls up at scrape time; the access-log query below mirrors it against per-credential budgets so dashboards can show "tag X has spent 7,200 of its 10,000 token allotment this week".
Tags are a first-class `tags` array column on every line (copied from the credential's `attrs.tags:`), so the query reads them directly rather than parsing them out of the free-form `metadata` map:

```sql
WITH (
    SELECT map(
        'cost_center:eng-001', 10000,
        'cost_center:ops-002', 5000,
        'okr:q3-latency', 50000
    )
) AS tag_budgets
SELECT
    arrayJoin(tags) AS tag,
    sumIf(tokens_in + tokens_out, provider IS NOT NULL) AS spent_tokens,
    tag_budgets[tag] AS budget_tokens,
    if(budget_tokens > 0, round(100.0 * spent_tokens / budget_tokens, 1), NULL) AS percent_used
FROM access_log
WHERE workspace_id = {workspace:String}
  AND timestamp >= toStartOfWeek(now())
  AND notEmpty(tags)
GROUP BY tag
HAVING budget_tokens > 0
ORDER BY percent_used DESC;
```

The query reads each line's `tags` array (populated from the credential's `attrs.tags:` list). To slice by team instead, group on the first-class `team` column the same way. Free-form `metadata` is still available for any key/value an operator declares on the credential. Replace the inline `tag_budgets` map with a join against an operator-maintained budget table for production use.

## Materialised view: per-day-per-project pre-aggregation

Dashboards that render six months of monthly rollups every 30 seconds do not need to scan the raw 1.8B-row table on every refresh.
A daily pre-aggregation collapses the volume to a few thousand rows per workspace:

```sql
CREATE TABLE access_log_daily_project (
    day Date,
    workspace_id LowCardinality(String),
    project LowCardinality(String),
    ai_requests AggregateFunction(count, UInt64),
    input_tokens AggregateFunction(sum, UInt64),
    output_tokens AggregateFunction(sum, UInt64),
    usd_spend_micros AggregateFunction(sum, UInt64)
)
ENGINE = AggregatingMergeTree
PARTITION BY toYYYYMM(day)
ORDER BY (day, workspace_id, project);

CREATE MATERIALIZED VIEW access_log_daily_project_mv
TO access_log_daily_project AS
SELECT
    toDate(timestamp) AS day,
    workspace_id,
    project,
    countState(toUInt64(1)) AS ai_requests,
    sumState(toUInt64(coalesce(tokens_in, 0))) AS input_tokens,
    sumState(toUInt64(coalesce(tokens_out, 0))) AS output_tokens,
    sumState(toUInt64(coalesce(cost_usd_micros, 0))) AS usd_spend_micros
FROM access_log
WHERE project IS NOT NULL
GROUP BY day, workspace_id, project;
```

Read it with `*Merge` finalisers:

```sql
SELECT
    project,
    toStartOfMonth(day) AS month,
    countMerge(ai_requests) AS ai_requests,
    sumMerge(input_tokens) AS input_tokens,
    sumMerge(output_tokens) AS output_tokens,
    sumMerge(usd_spend_micros) / 1e6 AS usd_spend
FROM access_log_daily_project
WHERE workspace_id = {workspace:String}
  AND day >= toDate(now()) - INTERVAL 6 MONTH
GROUP BY project, month
ORDER BY month DESC, usd_spend DESC;
```

The dashboard query reads `access_log_daily_project` instead of `access_log`. On a 100M-row-per-day fleet the pre-aggregated table holds ~3000 rows per month and answers a six-month rollup in single-digit milliseconds.

## Ingestion

Vector and Fluent Bit both speak ClickHouse's `JSONEachRow` format. A minimal Vector config that reads the proxy's stdout (or a sink configured under `proxy.observability.log.sinks` once dispatch lands) into the table above:

```toml
[sources.sbproxy_stdout]
type = "stdin"

[transforms.parse]
type = "remap"
inputs = ["sbproxy_stdout"]
source = '. = parse_json!(.message)'

[sinks.clickhouse]
type = "clickhouse"
inputs = ["parse"]
endpoint = "http://clickhouse:8123"
database = "sbproxy"
table = "access_log"
encoding.codec = "json"
```

For multi-tenant fleets where each tenant operates its own ClickHouse, the sink declares its own endpoint; the proxy's per-tenant sink config (planned alongside the credentials epic) routes each tenant's lines to the tenant's collector without the operator running a fan-out service.

## Related reading

* `docs/observability.md` for the proxy-side log schema, redaction layers, and reversible PII semantics.
* `docs/access-log.md` for the per-field reference and capture caps.
* `docs/ai-gateway.md` for the AI virtual key shape that populates `project`, `user`, `metadata`, and the per-credential token attribution.

================================================================
# docs/cloudflare-code-mode.md
================================================================

## Cloudflare Code Mode

*Last modified: 2026-05-15*

SBproxy can emit a typed TypeScript module covering every tool in the MCP federation registry. Agents written against the [Cloudflare Code Mode](https://blog.cloudflare.com/code-mode/) runtime can import the module and invoke each tool as an ordinary async function. Code Mode compresses a large tool catalog from many tool-call JSONs down to a single typed module, cutting the agent's token spend by roughly an order of magnitude on large surfaces.

## What it emits

The emitted module pairs each tool with an `Input` interface, an `Output` interface, and a member of a `codemode` namespace whose shape matches the `@cloudflare/codemode` runtime contract:

```ts
export interface SearchDocsInput {
  query: string;
  limit?: number;
}

export interface SearchDocsOutput {
  content?: Array<{ type: string; text?: string; mimeType?: string; [key: string]: unknown }>;
  isError?: boolean;
  [key: string]: unknown;
}

export const codemode = {
  /** Search the documentation. */
  search_docs: (input: SearchDocsInput): Promise<SearchDocsOutput> =>
    __codemode_call('search_docs', input as unknown),
} as const;

export default codemode;
```

A self-contained runtime stub is appended to the module so it is importable from any TypeScript environment that has `fetch`. The stub posts the typed input to the gateway and parses the JSON response. An `AGENT_GATEWAY_TOKEN` env var, when set, is forwarded as a bearer token; callers that need a custom auth scheme can install their own fetch via `setCodemodeFetch(...)`.

## Calling the emitter from Rust

The federation registry exposes a single method:

```rust,ignore
let federation: McpFederation = /* built at startup */;
let module_text: String = federation.codemode_ts("https://gw.example/.well-known/mcp");
```

The returned string is reproducible: tools are sorted lexicographically before emission, so an ETag derived from the body is stable as long as the registry does not change.

## JSON Schema support

The codegen covers the subset MCP tool schemas typically use:

- `type: object` with `properties` and `required` becomes a typed `interface`. `additionalProperties: false` removes the index signature; otherwise the interface allows extension fields.
- `type: string|number|integer|boolean|null` maps to the obvious TS primitive.
- `type: array` with `items` becomes an `Array` of the item type.
- `enum` over strings becomes a TS string-literal union.
- `oneOf` / `anyOf` becomes a union.
- Nested objects inline as structural types so the parent interface stays compact.
- Unrecognised shapes fall back to `unknown` rather than failing to emit. Operators who want a tighter type can post-process or ask the upstream MCP server to publish a tighter schema.

Property names that collide with TypeScript reserved words or contain non-identifier characters are emitted as string-quoted keys (`'class':`, `'with-dash':`).

## Streaming tools

Streaming MCP tools are out of scope for the initial emission. The runtime stub posts and waits for a JSON response.
A follow-up will emit `AsyncIterable`-typed signatures and add server-sent-event plumbing to the stub.

## HTTP endpoint

Serving the module over HTTP at a well-known URL is the natural next step. The current PR ships the emitter as a library function on the federation registry so any HTTP wiring layer can hand the bytes through to the client. A future ticket will land the `/.well-known/mcp/codemode.ts` route on the proxy itself, with caching, ETag, and workspace + RBAC filtering wired against the same predicates the existing agent-skills endpoint uses.

## References

- Code Mode: the better way to use MCP (Cloudflare blog): https://blog.cloudflare.com/code-mode/
- Code Mode SDK changelog v0.2.1: https://developers.cloudflare.com/changelog/post/2026-03-17-codemode-sdk-v021/
- Code Mode for MCP server portals (Cloudflare changelog): https://developers.cloudflare.com/changelog/post/2026-03-26-mcp-portal-code-mode/
- Cloudflare Agents docs: https://developers.cloudflare.com/agents/api-reference/codemode/

================================================================
# docs/comparison.md
================================================================

## How SBproxy compares

*Last modified: 2026-06-08*

SBproxy is a reverse proxy that doubles as an AI gateway. Most tools do one or the other; this page is honest about where SBproxy fits and where you should pick something else.

## The short version

| Tool | Type | AI Gateway | General Proxy | Single Binary | Scripting |
|------|------|-----------|---------------|---------------|-----------|
| **SBproxy** | Proxy + AI gateway | Yes (200+ models) | Yes | Yes (Rust) | CEL + Lua + WASM + JS |
| LiteLLM | AI gateway only | Yes (100+ providers) | No | No (Python) | No |
| Portkey | AI gateway (SaaS) | Yes | No | No (Node.js) | No |
| Helicone | AI observability | Proxy + observability | No | No (managed or self-host) | No |
| Kong | API gateway | Yes (plugin) | Yes | Yes (Lua/C) | Lua |
| Caddy | Reverse proxy | No | Yes | Yes | Modules |
| Traefik | Reverse proxy | No | Yes | Yes | Limited |
| Nginx | Reverse proxy | No | Yes | Yes (C) | Lua (OpenResty) |
| Pingora (raw) | Proxy framework | No (DIY) | Yes (DIY) | Library, not a binary | Rust code |
| Envoy | Service mesh proxy | No | Yes | Yes (C++) | WASM |

## When SBproxy is the right choice

SBproxy fits when you need a production reverse proxy *and* an AI gateway in the same traffic layer. Pick it when:

- **You run both kinds of traffic.** HTTP and LLM. Most teams glue Nginx or Traefik together with LiteLLM, Portkey, or a SaaS AI gateway. Two systems to configure, deploy, and monitor. SBproxy is one binary, one config, one place to put policies.
- **You care about overhead.** Sub-millisecond p99 on the proxy path. Idle RSS in single-digit megabytes. LiteLLM wants 4 CPU and 8 GB plus Python, PostgreSQL, and Redis. Managed gateways add a public network hop.
- **You want scripting that ships in the binary.** CEL for routing (compiled once, evaluates in microseconds), Lua for transforms, JavaScript via QuickJS, and sandboxed WebAssembly for plugins. No C modules to compile, no separate plugin daemon.
- **You need MCP federation.** SBproxy proxies and federates Model Context Protocol traffic alongside HTTP and AI. No other general-purpose proxy ships this.
- **You want to self-host without a database.** Single binary. No PostgreSQL.
  Redis is optional, only needed for distributed rate limiting and shared cache.

## When to pick something else

- **AI-only with maximum provider breadth.** LiteLLM has 100+ native providers and is simpler to set up if HTTP routing isn't part of your problem. Note: its current Business Source License restricts commercial self-hosting.
- **Managed AI gateway, zero ops.** Portkey Cloud or one of the SaaS-only AI gateways (OpenRouter, Cloudflare AI Gateway, Vercel AI Gateway) is worth a look. Those are not on this comparison page because they don't ship as a self-hostable proxy.
- **Pure reverse proxy.** Caddy and Traefik have larger communities and simpler config for the basics. Pingora is the framework underneath SBproxy if you'd rather hand-roll in Rust.

## Detailed comparisons

### vs LiteLLM

LiteLLM is the most popular open-source AI gateway. It supports 100+ LLM providers. SBproxy reaches 200+ models through 66 native providers behind one OpenAI-compatible API, including a native Anthropic translator. You bring your own key per provider and the model name passes straight through, so any model a provider serves works without per-model config. Point any provider at a custom `base_url` for self-hosted or proprietary endpoints.

| | SBproxy | LiteLLM |
|---|---------|---------|
| LLM providers | 200+ models (66 native providers, bring your own keys) | 100+ native |
| General HTTP proxy | Yes | No |
| Implementation | Compiled native binary | Python |
| Min resources | 1 CPU, 256 MB | 4 CPU, 8 GB |
| Database required | No | PostgreSQL |
| HTTP/3 | Planned | No |
| WebSocket proxy | Yes | No |
| gRPC proxy | Yes | No |
| MCP federation | Yes | No |
| Authentication | 7+ types (JWT, forward auth, digest, ...) | API key |
| Scripting | CEL + Lua + WASM + JS | No |
| Rate limiting | Built-in, distributed | Built-in |
| Response caching | Built-in (memory, file, memcached, redis) | 7 backends |
| Guardrails | 7 built-in types (PII, injection, ...) | External integrations |
| P99 proxy overhead | < 1 ms | 240-1200 ms |

Choose LiteLLM if you only need an AI gateway and want the broadest provider coverage out of the box. Choose SBproxy if you need a general proxy that also routes AI traffic, or you care about performance and resource efficiency.

### vs Portkey

Portkey is a managed AI gateway focused on observability and prompt management.

| | SBproxy | Portkey |
|---|---------|---------|
| Deployment | Self-hosted | SaaS (primary) |
| Open source | Full proxy (Apache 2.0) | Gateway component (MIT) |
| General HTTP proxy | Yes | No |
| Response caching | Built-in | Yes |
| Prompt management | No | Yes |
| Cost tracking | Yes (events + budget enforcement) | Yes (dashboard) |

Choose Portkey if you want a managed service with dashboards and prompt management and don't need a general proxy. Choose SBproxy if you want to self-host, need a general proxy, or want full control over your infrastructure.

### vs Helicone

Helicone focuses on AI observability, with a proxy in the path that captures requests for analytics.

| | SBproxy | Helicone |
|---|---------|---------|
| Primary focus | Proxy + AI gateway | Observability with a proxy in the path |
| General HTTP proxy | Yes | No |
| Self-host | Yes | Yes (managed primary) |
| Caching, guardrails, budgets | Built-in | Caching only |
| Custom transforms and scripting | Yes | No |

Choose Helicone if observability is your sole need. Choose SBproxy if you want gateway features (routing, fallbacks, budgets, guardrails, caching) plus observability, or also need a general proxy.

### vs Kong

Kong is a mature API gateway with a large plugin ecosystem. It added AI gateway capabilities via plugins in 2024.

| | SBproxy | Kong |
|---|---------|------|
| Primary focus | Proxy + AI gateway | API gateway |
| Implementation | Native binary on Pingora | Lua/C (OpenResty) |
| Database | Not required | PostgreSQL (or DB-less mode) |
| AI gateway | Native | Plugin-based |
| Plugin system | CEL + Lua + WASM + JS + registry | Lua plugins |
| HTTP/3 | Planned | No |
| Rate limiting | Built-in, distributed | Plugin |
| Authentication | 7+ built-in types | Plugin-based |
| MCP federation | Yes | No |
| gRPC proxy | Yes | Yes |

Choose Kong if you want a mature API gateway ecosystem with hundreds of community plugins. Choose SBproxy if you want native AI gateway features without plugins or a lighter deployment footprint.

### vs Caddy

Caddy is a Go reverse proxy known for automatic HTTPS.

| | SBproxy | Caddy |
|---|---------|-------|
| Automatic HTTPS | Yes (ACME via rustls + Let's Encrypt) | Yes (ACME) |
| AI gateway | Yes (200+ models) | No |
| Config format | YAML | Caddyfile or JSON |
| Rate limiting | Built-in, distributed | Community module |
| Scripting | CEL + Lua + WASM + JS | Modules |
| HTTP/3 | Planned | Yes |
| Compression | Gzip, Brotli, Zstd | Gzip, Brotli, Zstd |
| Circuit breaker | Built-in (3-state) | Latency-based |
| Health checks | Active + passive | Active + passive |
| Retries | Configurable with backoff | Configurable |
| PROXY protocol | Yes (v1/v2) | Yes (v1/v2) |
| Service discovery | DNS SRV, Consul | SRV, A/AAAA |
| Load balancing | 12 algorithms | 12+ algorithms |
| WAF | Built-in (OWASP, SQLi, XSS) | Community module |
| DDoS protection | Built-in | No |
| gRPC proxy | Yes | Yes |
| MCP federation | Yes | No |
| Authentication | 7+ built-in types | Community modules |
| Memory model | No garbage collector | Garbage collected |

Caddy and SBproxy overlap heavily on core proxy features. Caddy has a larger community, deeper static-file support, and simpler config for the simplest cases.
SBproxy adds AI gateway features, more scripting options, no GC pauses, and built-in distributed rate limiting and DDoS protection.

Choose Caddy if you want the simplest reverse proxy with automatic HTTPS and don't need AI features or scripting. Choose SBproxy if you need AI gateway capabilities, programmable scripting, predictable latency without GC pauses, or built-in rate limiting and caching.

### vs Traefik

Traefik is a cloud-native reverse proxy with automatic service discovery.

| | SBproxy | Traefik |
|---|---------|---------|
| Service discovery | Config-based + DNS | Docker, K8s, Consul |
| AI gateway | Yes | No |
| Middleware | CEL + Lua + WASM + JS + built-in | Declarative chain |
| HTTP/3 | Planned | Experimental |
| Rate limiting | Built-in, distributed | Traefik Hub only (paid) |
| MCP federation | Yes | No |
| Plugin system | CEL + Lua + WASM + JS | WASM/Yaegi |

Choose Traefik if you need automatic service discovery from Docker or Kubernetes labels. Choose SBproxy if you need AI gateway features, more flexible scripting, or built-in distributed rate limiting.

### vs Nginx

Nginx is the most widely deployed reverse proxy.

| | SBproxy | Nginx |
|---|---------|-------|
| Config reload | Hot reload (atomic in-process swap) | Worker process restart (graceful, but new process) |
| AI gateway | Yes | No |
| gRPC proxy | Yes | Yes |
| MCP federation | Yes | No |
| Scripting | CEL + Lua + WASM + JS | Lua (OpenResty) / C modules |
| HTTP/3 | Planned | Yes (newer builds) |
| Active health checks | Built-in | NGINX Plus only |
| Dynamic config | Feature flags | NGINX Plus only |
| Static file serving | Not supported (proxy focus) | Excellent |
| Memory model | No garbage collector | Native |

Nginx is hard to beat for static content and simple reverse proxying, and it's likely already in your stack.

Choose Nginx if you need maximum raw throughput for static content, simple reverse proxying, or you already have a mature Nginx footprint.
Choose SBproxy if you need AI gateway features, dynamic configuration via feature flags, or programmable routing without writing Lua or C modules.

### vs Pingora (raw framework)

Pingora is the Cloudflare-built proxy framework that SBproxy is built on. Using Pingora directly means writing your proxy logic in Rust against its `ProxyHttp` trait.

| | SBproxy | Pingora (direct) |
|---|---------|---------|
| Out-of-the-box config | YAML, hot reload | None, you write Rust |
| Auth, policies, transforms, AI | Built-in | DIY |
| Plugin ecosystem | CEL + Lua + WASM + JS + native | DIY in Rust |
| Operational tooling | Metrics, dashboards, events | DIY |

Choose Pingora directly if you have narrow custom requirements and a team comfortable maintaining a Rust codebase. Choose SBproxy if you want the Pingora performance envelope without writing and maintaining proxy infrastructure yourself.

### vs Envoy

Envoy is a high-performance L4/L7 proxy designed for service mesh deployments.

| | SBproxy | Envoy |
|---|---------|-------|
| Deployment model | Standalone binary | Sidecar or edge (needs control plane) |
| Configuration | YAML file | xDS API (usually via Istio) |
| AI gateway | Yes | No |
| gRPC proxy | Yes | Yes (native) |
| MCP federation | Yes | No |
| Rate limiting | Built-in | External gRPC service |
| Caching | Built-in | No |
| Authentication | 7+ built-in types | External service or filters |
| Extensibility | CEL + Lua + WASM + JS | WASM |

Choose Envoy if you're building a service mesh or need L4 TCP proxying with advanced traffic management. Choose SBproxy if you want a standalone proxy with built-in features (rate limiting, caching, AI gateway) that doesn't require a control plane.

## Summary

SBproxy is a full reverse proxy (like Nginx, Caddy, or Traefik) and an AI gateway (like LiteLLM or Portkey) in one binary, with MCP federation built in. Most teams run two separate systems today. SBproxy collapses them.

Next: the [manual](manual.md), [architecture](architecture.md), [performance](performance.md), or runnable [examples](../examples/).

================================================================
# docs/config-stability.md
================================================================

## Config stability tiers

*Last modified: 2026-06-08*

Stability guarantees for every field in `sb.yml`. Check a field's tier before relying on it in production.

---

## Stability tiers

### `stable`

A `stable` field is part of the committed public API of SBproxy.

- The field name, type, and default value will not change in a minor or patch release.
- Removing or renaming a `stable` field requires a major version bump (e.g. v1 -> v2) and a migration guide.
- Behavioral changes to a `stable` field require at least a minor version bump and a changelog entry.

### `beta`

A `beta` field is functional and tested but may still change.

- Available for production use. Monitor the changelog before upgrading.
- Renames or semantic changes may happen in a minor release with a deprecation notice.
- Beta fields are not silently removed. A one-release deprecation period applies.

### `alpha`

An `alpha` field is experimental.

- May be renamed, restructured, or removed in any release without prior notice.
- Do not depend on `alpha` fields in critical production paths.
- Feedback on alpha fields is welcome and influences their stabilization.

### `disabled`

A `disabled` field still parses but has no runtime effect today.

- The field is accepted by the config loader so existing configs keep loading.
- No code path acts on the value; setting it does nothing beyond an optional warning log.
- Currently applies to the `http3` block: HTTP/3 is temporarily disabled until native QUIC support lands in Pingora.

---

## Stabilization rules

1. A field moves from `alpha` to `beta` once its interface is reviewed, it has integration tests, and it has been in at least one release.
2. A field moves from `beta` to `stable` once it has been in production use by at least one internal deployment for one full release cycle without interface changes.
3. Stable fields are never silently removed. The process is: deprecate (add `x-deprecated` annotation in schema), warn in logs, remove in the next major version.

---

## Field stability reference

### Top-level fields

| Field | Type | Stability | Notes |
|---|---|---|---|
| `proxy` | object | **stable** | Server configuration block. |
| `origins` | object (map) | **stable** | Map of hostname to origin config. |

### `proxy` - ProxyServerConfig

| Field | Type | Default | Stability | Notes |
|---|---|---|---|---|
| `http_bind_port` | integer | 8080 | **stable** | Plain HTTP listener port. |
| `https_bind_port` | integer | - | **stable** | TLS listener port. Optional. |
| `tls_cert_file` | string | - | **stable** | Path to PEM cert for manual TLS. |
| `tls_key_file` | string | - | **stable** | Path to PEM key for manual TLS. |
| `acme` | object | - | **beta** | Automatic TLS via ACME. |
| `http3` | object | - | **disabled** | HTTP/3 (QUIC) listener. Currently inert. |

### `proxy.acme` - AcmeConfig

| Field | Type | Default | Stability | Notes |
|---|---|---|---|---|
| `enabled` | boolean | false | **beta** | Activates ACME. |
| `email` | string | "" | **beta** | Contact email for the ACME account. |
| `directory_url` | string | Let's Encrypt prod | **beta** | ACME directory endpoint URL. |
| `challenge_types` | array | `[tls-alpn-01, http-01]` | **beta** | Challenge method preference list. |
| `storage_backend` | string | `redb` | **beta** | Cert persistence backend. |
| `storage_path` | string | `/var/lib/sbproxy/certs` | **beta** | Filesystem path for cert storage. |
| `renew_before_days` | integer | 30 | **beta** | Days before expiry to renew. |

### `proxy.http3` - Http3Config

HTTP/3 is temporarily disabled until native QUIC support lands in Pingora.
These fields still parse, but no QUIC listener starts and setting `enabled: true` only logs a warning.

| Field | Type | Default | Stability | Notes |
|---|---|---|---|---|
| `enabled` | boolean | false | **disabled** | Enable QUIC listener. Currently inert; no listener starts. |
| `max_streams` | integer | 100 | **disabled** | Max concurrent QUIC streams per connection. Currently inert. |
| `idle_timeout_secs` | integer | 30 | **disabled** | QUIC idle timeout in seconds. Currently inert. |

### Origin Config (each entry under `origins:`)

| Field | Alias | Type | Default | Stability | Notes |
|---|---|---|---|---|---|
| `action` | - | object | required | **stable** | What the proxy does with requests. |
| `authentication` | `auth` | object | - | **stable** | Auth plugin config. |
| `policies` | - | array | `[]` | **stable** | Policy plugin list. |
| `transforms` | - | array | `[]` | **beta** | Body transform plugin list. |
| `request_modifiers` | - | array | `[]` | **stable** | Request modification steps. |
| `response_modifiers` | - | array | `[]` | **stable** | Response modification steps. |
| `cors` | - | object | - | **stable** | CORS policy. |
| `hsts` | - | object | - | **stable** | HSTS policy. |
| `compression` | - | object | - | **stable** | Response compression. |
| `session_config` | - | object | - | **beta** | Session cookie management. |
| `force_ssl` | - | boolean | false | **stable** | Redirect HTTP to HTTPS. |
| `allowed_methods` | - | array | `[]` (all) | **stable** | HTTP method allowlist. |
| `forward_rules` | - | array | `[]` | **beta** | Conditional routing rules. |
| `fallback_origin` | - | object | - | **beta** | Secondary origin on primary failure. |
| `response_cache` | - | object | - | **beta** | Response caching config. |
| `variables` | - | object | `{}` | **beta** | Named template variables. |
| `on_request` | - | array | `[]` | **alpha** | Request event hook plugins. |
| `on_response` | - | array | `[]` | **alpha** | Response event hook plugins. |
| `bot_detection` | - | object | - | **alpha** | Bot detection config. |
| `threat_protection` | - | object | - | **alpha** | Dynamic threat blocklist config. |
| `rate_limit_headers` | - | object | - | **beta** | Rate limit response header config. |
| `error_pages` | - | object | - | **beta** | Custom error page config. |
| `traffic_capture` | - | object | - | **alpha** | Request mirroring config. |
| `message_signatures` | - | object | - | **alpha** | HTTP message signing config. |

### CORS Config (`cors:`)

| Field | Alias | Type | Default | Stability |
|---|---|---|---|---|
| `allowed_origins` | `allow_origins` | array | `[]` | **stable** |
| `allowed_methods` | `allow_methods` | array | `[]` | **stable** |
| `allowed_headers` | `allow_headers` | array | `[]` | **stable** |
| `expose_headers` | - | array | `[]` | **stable** |
| `max_age` | - | integer | - | **stable** |
| `allow_credentials` | - | boolean | false | **stable** |
| `enable` | `enabled` | boolean | - | **stable** |

### HSTS Config (`hsts:`)

| Field | Type | Default | Stability |
|---|---|---|---|
| `max_age` | integer | 31536000 | **stable** |
| `include_subdomains` | boolean | false | **stable** |
| `preload` | boolean | false | **stable** |

### Compression Config (`compression:`)

| Field | Alias | Type | Default | Stability |
|---|---|---|---|---|
| `enabled` | `enable` | boolean | true | **stable** |
| `algorithms` | - | array | `[]` | **stable** |
| `min_size` | - | integer | 0 | **stable** |
| `level` | - | integer | - | **beta** |

### Session Config (`session_config:`)

| Field | Alias | Type | Default | Stability |
|---|---|---|---|---|
| `cookie_name` | - | string | - | **beta** |
| `max_age` | `cookie_max_age` | integer | - | **beta** |
| `http_only` | - | boolean | false | **beta** |
| `secure` | - | boolean | false | **beta** |
| `same_site` | `cookie_same_site` | string | - | **beta** |
| `allow_non_ssl` | - | boolean | false | **beta** |

### Request Modifier (`request_modifiers[]`)

| Field | Type | Stability | Notes |
|---|---|---|---|
| `headers` | object | **stable** | Header set/add/remove. |
| `url` | object | **stable** | Path rewrite. |
| `query` | object | **stable** | Query param set/add/remove. |
| `method` | string | **stable** | Override HTTP method. |
| `body` | object | **stable** | Body replacement. |
| `lua_script` | string | **beta** | Dynamic modification via Lua. |

### Response Modifier (`response_modifiers[]`)

| Field | Type | Stability | Notes |
|---|---|---|---|
| `headers` | object | **stable** | Header set/add/remove. |
| `status` | object | **stable** | Status code override. |
| `body` | object | **stable** | Body replacement. |
| `lua_script` | string | **beta** | Dynamic modification via Lua. |

### Header Modifiers

| Field | Alias | Type | Default | Stability |
|---|---|---|---|---|
| `set` | - | object | `{}` | **stable** |
| `add` | - | object | `{}` | **stable** |
| `remove` | `delete` | array | `[]` | **stable** |

### Path Replace (`url.path.replace`)

| Field | Type | Stability |
|---|---|---|
| `old` | string | **stable** |
| `new` | string | **stable** |

### Query Modifier

| Field | Alias | Type | Default | Stability |
|---|---|---|---|---|
| `set` | - | object | `{}` | **stable** |
| `add` | - | object | `{}` | **stable** |
| `remove` | `delete` | array | `[]` | **stable** |

### Body Modifier (request)

| Field | Type | Stability |
|---|---|---|
| `replace` | string | **stable** |
| `replace_json` | any | **stable** |

### Response Body Modifier

| Field | Type | Stability |
|---|---|---|
| `replace` | string | **stable** |
| `replace_json` | any | **stable** |

### Status Override

| Field | Type | Stability |
|---|---|---|
| `code` | integer | **stable** |
| `text` | string | **stable** |

================================================================
# docs/configuration.md
================================================================

## SBproxy Configuration Reference

*Last modified: 2026-06-08*

The complete configuration reference for SBproxy. Every option, every field, every action type is documented here with real-world examples you can copy-paste and run.

For AI-specific features in depth, see [ai-gateway.md](ai-gateway.md). For CEL, Lua, JavaScript, and WASM scripting, see [scripting.md](scripting.md). For the event system, see [events.md](events.md).

## Table of contents

1. [Overview](#overview)
2. [Top-level structure](#top-level-structure)
3. [Proxy settings](#proxy-settings)
4. [Origins](#origins)
5. [Actions](#actions)
6. [Authentication](#authentication)
7. [Policies](#policies)
8. [Transforms](#transforms)
9. [Request modifiers](#request-modifiers)
10. [Response modifiers](#response-modifiers)
11. [Response cache](#response-cache)
12. [Forward rules](#forward-rules)
13. [Fallback origin](#fallback-origin)
14. [Variables, vaults, and secrets](#variables-vaults-and-secrets)
15. [Session config](#session-config)
16. [Compression](#compression)
17. [HSTS](#hsts)
18. [Connection pool](#connection-pool)
19. [Bot detection](#bot-detection)
20. [Threat protection](#threat-protection)
21. [Error pages](#error-pages)
22. [Rate limit headers](#rate-limit-headers)
23. [Message signatures](#message-signatures)
24. [Traffic capture](#traffic-capture)
25. [Host header semantics](#host-header-semantics)
26. [Trusted proxies and forwarding headers](#trusted-proxies-and-forwarding-headers)
27. [Request mirror](#request-mirror)
28. [Upstream retries](#upstream-retries)
29. [Active health checks](#active-health-checks)
30. [Circuit breaker](#circuit-breaker)
31. [Outlier detection](#outlier-detection)
32. [Service discovery](#service-discovery)
33. [Correlation ID](#correlation-id)
34. [mTLS client authentication](#mtls-client-authentication)
35. [Webhook envelope and signing](#webhook-envelope-and-signing)
36. [Secrets](#secrets)
37. [Environment variables](#environment-variables)
38. [ACME / auto TLS](#acme--auto-tls)
39. [Redis integration](#redis-integration)
40. [Validation](#validation)

---

## Overview

SBproxy reads its configuration from a YAML file, typically named `sb.yml`. This file defines how the proxy listens for traffic, which hostnames it handles, and what it does with each request.

Load a config file. The path must be supplied explicitly; the binary does not auto-discover `sb.yml` in the current directory.

```bash
# Explicit path
sbproxy --config /etc/sbproxy/production.yml

# Same thing via the `serve` subcommand and the short flag
sbproxy serve -f /etc/sbproxy/production.yml

# Or via env var for containerised deployments
SB_CONFIG_FILE=/etc/sbproxy/production.yml sbproxy
```

Validate without starting:

```bash
sbproxy validate /etc/sbproxy/production.yml
# or
sbproxy --config /etc/sbproxy/production.yml --check
```

The config has two main sections: `proxy` (server-level settings) and `origins` (per-hostname routing and behavior). Optional shared-state blocks (`l2_cache_settings`, `messenger_settings`) live nested under `proxy`.

---

## JSON Schema (editor autocomplete + validation)

SBproxy ships a JSON Schema at `schemas/sb-config.schema.json`. Editor tooling that understands the `yaml-language-server` directive (VS Code with the YAML extension, IntelliJ / JetBrains, Helix) reads this schema and validates `sb.yml` field names + types in real time. A typo in a key surfaces as an editor error rather than as a runtime parse failure.
Opt in by adding a comment header at the top of your `sb.yml`:

```yaml
# yaml-language-server: $schema=https://raw.githubusercontent.com/soapbucket/sbproxy/main/schemas/sb-config.schema.json
proxy:
  http_bind_port: 8080
origins:
  "api.example.com":
    action: { type: proxy, url: http://127.0.0.1:9000 }
```

Every `examples/*/sb.yml` in this repo carries the header pointing at the local `schemas/` path so the examples are self-validating against the same schema operators consume.

The schema is **generated** from the Rust types in `crates/sbproxy-config/src/types.rs` so it cannot drift from the runtime. Regenerate locally with:

```bash
cargo run -p sbproxy-config --bin generate-schema > schemas/sb-config.schema.json
```

The CI gate `scripts/check-config-schema.sh` runs the generator and `diff`s against the committed file; a Rust type change that does not regenerate the schema is rejected at PR time. The generator is deterministic (the `preserve_order` feature on `schemars` keeps object property order stable), so the diff is byte-for-byte.

---

## Top-level structure

Complete YAML skeleton with every top-level key:

```yaml
# Server settings (ports, TLS, ACME, admin, secrets, shared state)
proxy:
  http_bind_port: 8080
  https_bind_port: 8443
  tls_cert_file: /etc/sbproxy/cert.pem
  tls_key_file: /etc/sbproxy/key.pem
  acme: { ... }
  http3: { ... }
  metrics: { ... }
  alerting: { ... }
  admin: { ... }
  secrets: { ... }

  # L2 cache (Redis) for distributed rate limiting and caching
  l2_cache_settings:
    driver: redis
    params:
      dsn: redis://localhost:6379/0

  # Messenger (Redis) for real-time config updates
  messenger_settings:
    driver: redis
    params:
      dsn: redis://localhost:6379

  # Opaque per-server extensions consumed by enterprise / third-party crates.
  extensions: { ... }

# Per-hostname origin configurations
origins:
  "api.example.com":
    action: { ... }
    authentication: { ... }
    policies: [ ... ]
    transforms: [ ... ]
    request_modifiers: [ ... ]
    response_modifiers: [ ... ]
    forward_rules: [ ... ]
    response_cache: { ... }
    variables: { ... }
    session: { ... }
    cors: { ... }
    compression: { ... }
    hsts: { ... }
    connection_pool: { ... }
    extensions: { ... }
```

`l2_cache_settings` and `messenger_settings` are nested under `proxy:` (the deserializer also accepts `l2_cache` as an alias for `l2_cache_settings`).

---

## Proxy settings

The `proxy` block configures server-level behavior: ports, TLS, ACME, the admin API, metrics, secrets, and the optional shared-state backends.

```yaml
proxy:
  http_bind_port: 8080
  https_bind_port: 8443
  tls_cert_file: /etc/sbproxy/cert.pem
  tls_key_file: /etc/sbproxy/key.pem
  acme:
    enabled: true
    email: admin@example.com
    storage_path: /var/lib/sbproxy/certs
  http3:
    enabled: false
  metrics:
    max_cardinality_per_label: 1000
    cardinality:
      hostname_cap: 200
  admin:
    enabled: false
    port: 9090
```

### Proxy fields

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `http_bind_port` | int | 8080 | HTTP listen port |
| `https_bind_port` | int | unset | Optional HTTPS listen port. Requires `tls_cert_file` + `tls_key_file` or an `acme` block. |
| `tls_cert_file` | string | | Path to PEM-encoded TLS certificate. Ignored when `acme` is configured. |
| `tls_key_file` | string | | Path to PEM-encoded TLS private key. |
| `acme` | object | | ACME (auto-TLS) block. Overrides manual cert/key when set. See [ACME / auto TLS](#acme--auto-tls). |
| `http3` | object | | HTTP/3 (QUIC) listener config. Currently inert; see [HTTP/3 fields](#http3-fields). |
| `metrics` | object | | Metrics tuning, including label cardinality limits. |
| `alerting` | object | | Alert notification channels. |
| `admin` | object | | Embedded read-only admin / stats API server. |
| `secrets` | object | | Secrets management backend. See [Secrets](#secrets). |
| `l2_cache_settings` | object | | Optional shared-state backend. Alias: `l2_cache`. |
| `messenger_settings` | object | | Optional shared message bus for inter-component eventing. |
| `trusted_proxies` | array of CIDR strings | `[]` | Source ranges whose inbound `X-Forwarded-For` / `X-Real-IP` / `Forwarded` headers are honoured. Connections from outside the list have those headers stripped on ingress so they cannot spoof identity. IPv6 CIDRs work. See [Trusted proxies and forwarding headers](#trusted-proxies-and-forwarding-headers). |
| `correlation_id` | object | enabled, `X-Request-Id`, echo on | Correlation-ID propagation policy. See [Correlation ID](#correlation-id). |
| `mtls` | object | unset | mTLS client-certificate verification on the HTTPS listener. See [mTLS client authentication](#mtls-client-authentication). |
| `http_client_timeouts` | object | (see below) | Tunable timeouts for the proxy's outbound HTTP helpers (forward-auth, callbacks, mirrors, SWR refreshes, bot-auth directory). See [HTTP client timeouts](#http-client-timeouts). |
| `extensions` | object | | Opaque map for enterprise / third-party top-level config blocks. OSS never parses these. |

### HTTP client timeouts

The proxy keeps a small set of pooled `reqwest::Client` instances for its outbound helper requests. Each one used to bake a hardcoded timeout into the binary; operators who wanted a longer forward-auth deadline or a shorter callback budget had to fork the binary. The `http_client_timeouts` block exposes those numbers as config keys. All fields default to the values the binary used before this block existed, so omitting it leaves behaviour unchanged.

```yaml
proxy:
  http_client_timeouts:
    forward_auth_client_secs: 30
    forward_auth_request_secs: 5
    bot_auth_directory_client_secs: 5
    swr_client_secs: 30
    callback_client_secs: 10
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `forward_auth_client_secs` | int | 30 | Outer client-level timeout for the shared forward-auth client. The per-provider `forward_auth.timeout` field still applies on top. |
| `forward_auth_request_secs` | int | 5 | Per-request fallback timeout for a forward-auth subrequest when the provider's own `timeout` field is unset. |
| `bot_auth_directory_client_secs` | int | 5 | Client-level timeout for the Web Bot Auth directory lookup client. |
| `swr_client_secs` | int | 30 | Client-level timeout for the stale-while-revalidate background refresh client. |
| `callback_client_secs` | int | 10 | Client-level timeout for the callback / webhook client used by fire-and-forget POSTs. |

### HTTP/3 fields

HTTP/3 is temporarily disabled until native QUIC support lands in Pingora. The `http3` block still parses, but no QUIC listener starts and setting `enabled: true` only logs a warning. The fields below are documented for forward compatibility; they have no runtime effect today.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `enabled` | bool | false | Enable the HTTP/3 (QUIC) listener. Currently inert; no listener starts. |
| `max_streams` | int | 100 | Maximum concurrent QUIC streams per connection. Currently inert. |
| `idle_timeout_secs` | int | 30 | Idle timeout for QUIC connections. Currently inert. |

### Admin fields

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `enabled` | bool | false | Enable the admin server |
| `port` | int | 9090 | Listen port |
| `username` | string | "admin" | HTTP Basic Auth username |
| `password` | string | "changeme" | HTTP Basic Auth password |
| `max_log_entries` | int | 1000 | Recent-request log buffer size |

When enabled, the admin server binds on `127.0.0.1:` only, gates every request behind HTTP Basic auth, and applies a 60-rps per-IP rate limit. Endpoints:

| Path | Description |
|------|-------------|
| `GET /api/health` | Liveness check returning `{"status":"ok"}`. |
| `GET /api/openapi.json` | Emitted OpenAPI 3.0 document for the running pipeline. |
| `GET /api/openapi.yaml` | Same document in YAML. |
| `POST /admin/reload` | Re-read the on-disk config file and hot-swap the pipeline. Single-flight; concurrent calls return 409. |
| `GET /admin/drift` | Compare the on-disk config file against the loaded baseline. See below. |

Unauthenticated requests get a 401 with a `WWW-Authenticate: Basic` header. Requests from outside `127.0.0.1` are dropped at the socket level.

#### `GET /admin/drift`

Returns whether the on-disk config file has diverged from what the running proxy has loaded, without triggering a reload. K8s operators and dashboards scrape this so they can flag a config that was edited on disk but not yet hot-reloaded.

Response shape (200 OK):

```json
{
  "config_path": "/etc/sbproxy/sb.yml",
  "loaded_revision": "a3f5b1d829c4",
  "loaded_content_hash": "8e1c5d4a9f7b",
  "on_disk_content_hash": "8e1c5d4a9f7b",
  "drift": false,
  "on_disk_size_bytes": 4321,
  "checked_at": "2026-05-06T15:42:00Z"
}
```

* `loaded_revision` is the 12-char origin-set identity hash from the running pipeline. Stable when only policies, transforms, or ports change; moves when origins or hostnames are added or removed.
* `loaded_content_hash` is the 12-char SHA-256 prefix of the raw YAML bytes captured at load time (startup or last successful `/admin/reload`).
* `on_disk_content_hash` is the same hash recomputed against the current file contents.
* `drift` is `true` iff the two content hashes differ.

Failure modes:

* `503` - the admin server has no on-disk config path (constructed without `with_config_path`, e.g. tests), or no content-hash baseline has been captured yet (no startup load and no successful reload).
* `500` - the on-disk file could not be read. The error message has the absolute path scrubbed so the response does not leak the operator's filesystem layout.
* `405` - any verb other than `GET`.

### Metrics fields

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `max_cardinality_per_label` | int | 1000 | Default cap on unique label values per metric. New values are collapsed to `__other__`. |
| `cardinality.hostname_cap` | int | 200 | Optional override for the `hostname` label budget. Useful for high-tenant-count deployments and deterministic overflow tests. |

### access_log

Top-level block (sibling of `proxy:` and `origins:`) that turns on structured-JSON access logging. Off by default. When enabled, every completed request emits one JSON line at info level via the `access_log` tracing target after status, method, and sampling filters apply. Secrets are redacted before the line is written. See [Access log](access-log.md) for the full record shape.

```yaml
access_log:
  enabled: true
  sample_rate: 1.0
  status_codes: []   # empty = log every status
  methods: []        # empty = log every method
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `enabled` | bool | `false` | Master switch. When false, no access-log lines are emitted. |
| `sample_rate` | float | `1.0` | Probability in `[0.0, 1.0]` that a matching request is logged. |
| `status_codes` | list | `[]` | HTTP status codes to log. Empty matches every status. |
| `methods` | list | `[]` | HTTP methods to log (case-insensitive). Empty matches every method. |

### Alerting fields

The `proxy.alerting` block defines notification channels that receive alert events from the runtime.

```yaml
proxy:
  alerting:
    channels:
      - type: webhook
        url: https://hooks.example.com/sbproxy
        headers:
          X-Auth: ${ALERT_TOKEN}
      - type: log
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `channels` | list | `[]` | Notification channels. |
| `channels[].type` | string | required | Channel type. Supported: `webhook`, `log`. |
| `channels[].url` | string | | Webhook URL. Required when `type` is `webhook`. |
| `channels[].headers` | map | `{}` | Extra HTTP headers added to webhook deliveries. |
| `channels[].secret` | string | | Optional shared secret. When set, the dispatcher signs the payload with HMAC-SHA256 and emits `X-Sbproxy-Signature: v1=`. Receivers verify with `.`. See [Webhook envelope and signing](#webhook-envelope-and-signing). |

Alert webhook deliveries also include the standard `X-Sbproxy-*` identity headers (`Event`, `Instance`, `Rule`, `Severity`, `Timestamp`) and a `User-Agent: sbproxy/`. The body is wrapped in an envelope:

```json
{
  "event": "alert",
  "proxy": { "instance_id": "...", "version": "..." },
  "alert": {
    "rule": "...",
    "severity": "...",
    "message": "...",
    "timestamp": "...",
    "labels": { ... }
  }
}
```

### l2_cache_settings

The `l2_cache_settings` block points the proxy at a shared key-value backend used for cluster-wide rate limit counters and (optionally) response cache entries. When unset, every replica keeps its own in-memory state. The deserializer also accepts `l2_cache:` as an alias.

The `driver` field selects the backend; `params` is a flat string map whose keys depend on the driver. Only the `redis` driver is implemented in the Rust proxy today.

```yaml
proxy:
  l2_cache_settings:
    driver: redis
    params:
      dsn: redis://redis.internal:6379/0
```

`params` keys for the `redis` driver:

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `dsn` | string | | Connection string. Accepts `redis://[user[:pass]@]host:port[/db]`, `rediss://...`, or a bare `host:port`. The database index in the path is parsed but ignored by the single-connection RESP client. |

Pool size and acquire timeout are not exposed via `params` and use built-in defaults (pool size 8, acquire timeout 5 seconds).

### messenger_settings

The `messenger_settings` block configures the message bus the proxy uses for inter-component events such as config updates and semantic-cache purges. When unset, the proxy runs without a bus, which is fine for single-replica deployments.

The `driver` field picks the implementation; `params` is a flat string map whose keys depend on the driver.
An unknown driver name causes startup to fail.

```yaml
proxy:
  messenger_settings:
    driver: redis
    params:
      dsn: redis://redis.internal:6379
```

Supported drivers and their `params` keys:

`memory` takes no `params`. It uses bounded in-process channels and only works for a single replica.

`redis`:

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `dsn` | string | `redis://127.0.0.1:6379` | Redis connection string. Same parsing rules as the L2 cache `dsn`. |

`sqs` (all required):

| Key | Type | Description |
|-----|------|-------------|
| `queue_url` | string | Full SQS queue URL. |
| `region` | string | AWS region the queue lives in. |
| `api_key` | string | AWS access key used to sign requests. |

`gcp_pubsub` (all required):

| Key | Type | Description |
|-----|------|-------------|
| `project` | string | GCP project ID that owns the topic. |
| `topic` | string | Pub/Sub topic name. |
| `subscription` | string | Pub/Sub subscription name. |
| `access_token` | string | OAuth2 access token used on requests. |

---

## Tenants

SBproxy is a multi-tenant gateway. A tenant scope groups an operator's tenant of record (a customer, a deployment slice, a regulatory boundary) so the same proxy binary can serve isolated configurations. Every origin resolves to exactly one tenant; downstream auth, policy, and vault resolution picks the tenant-scoped config block before falling back to proxy-level defaults.

For single-tenant deployments the synthetic `__default__` tenant is used implicitly; no operator action is required and existing configs see no behaviour change.
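As a rough illustration of the one-tenant-per-origin rule, the following Python sketch maps each origin to a tenant the way this section describes: an origin without `tenant_id` falls back to `__default__`, an undeclared tenant is a config error, and `__default__` itself cannot be declared. It is not SBproxy's Rust implementation; `resolve_tenants` and the plain-dict config shape are hypothetical names chosen for the example.

```python
DEFAULT_TENANT = "__default__"  # reserved synthetic tenant named in this section


def resolve_tenants(declared_tenants, origins):
    """Return {hostname: tenant_id} for every origin.

    Hypothetical helper that mirrors the documented rules; the real proxy
    performs this check while compiling the config.
    """
    declared = set(declared_tenants)
    if DEFAULT_TENANT in declared:
        raise ValueError(f"{DEFAULT_TENANT} is reserved and cannot be declared")

    resolved = {}
    for hostname, origin in origins.items():
        # An origin without tenant_id implicitly uses the synthetic default.
        tenant_id = origin.get("tenant_id", DEFAULT_TENANT)
        if tenant_id != DEFAULT_TENANT and tenant_id not in declared:
            # Surfacing the typo here corresponds to failing at startup.
            raise ValueError(
                f"origin {hostname!r} names undeclared tenant {tenant_id!r}"
            )
        resolved[hostname] = tenant_id
    return resolved


if __name__ == "__main__":
    tenants = ["acme-corp", "beta-corp"]
    origins = {
        "api.acme.example.com": {"tenant_id": "acme-corp"},
        "legacy.example.com": {},  # no tenant_id, resolves to __default__
    }
    print(resolve_tenants(tenants, origins))
```

Resolving every origin up front is what lets a mistyped `tenant_id` surface as a load-time error instead of a per-request failure.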
```yaml
proxy:
  tenants:
    - id: acme-corp
    - id: beta-corp

origins:
  api.acme.example.com:
    tenant_id: acme-corp
    action:
      type: ai_proxy
      url: https://api.openai.com
  api.beta.example.com:
    tenant_id: beta-corp
    action:
      type: ai_proxy
      url: https://api.anthropic.com
```

### Field schema

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `proxy.tenants[].id` | string | required | Stable identifier. Referenced from `origin.tenant_id` and stamped on every request the origin serves. Max 256 ASCII characters. The literal `__default__` is reserved and cannot be declared. |

### Resolution rules

- A request matches an origin by hostname. The origin's `tenant_id` (or `__default__`) becomes `RequestContext.tenant_id` for the rest of the request lifecycle.
- An origin that names an undeclared tenant fails config compile so an operator's typo surfaces at startup rather than at request time.
- An empty `proxy.tenants:` list is the same as omitting it; every origin resolves to `__default__`.

### Credentials at the tenant scope

Each tenant can declare its own `credentials:` block alongside the proxy default. Resolution at request time walks origin → tenant → proxy. The same credential `name:` re-declared at a more specific scope shadows the broader scope, so a tenant can override the proxy default key + budget without rewriting the rest.

See [Credentials block](#credentials-block) below and `docs/migration-credentials.md` for the worked migration from the legacy `virtual_keys:` shape.

---

## Origins

Each key under `origins` is a hostname. When a request arrives, SBproxy matches the `Host` header to an origin key and applies that origin's configuration. Every origin must have an `action` block.

```yaml
origins:
  "api.example.com":
    force_ssl: true
    allowed_methods: [GET, POST, PUT, DELETE]
    action:
      type: proxy
      url: https://backend.internal:8080
```

### Hostname matching

- Exact match: `"api.example.com"` matches only `api.example.com`.
- Wildcard match: `"*.example.com"` matches `api.example.com`, `www.example.com`, and so on. The wildcard must be the first character and only covers one subdomain level.
- Multiple origins: define as many as you need. Each has independent auth, policies, and routing.

### Origin fields

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `action` | object | required | What to do with the request (proxy, redirect, static, etc.). |
| `tenant_id` | string | `__default__` | Tenant this origin resolves to. Must match a `proxy.tenants[].id`; absent uses the synthetic `__default__` tenant. Stamped on the request context for auth / policy / vault resolution. See [Tenants](#tenants). |
| `authentication` | object | | Auth provider. Alias: `auth`. |
| `policies` | list | | Policy enforcers (rate limit, IP filter, WAF, etc.). |
| `transforms` | list | | Body transforms applied in order. |
| `request_modifiers` | list | | Header / URL / query / body / script edits before the action. |
| `response_modifiers` | list | | Header / status / body / script edits after the action. |
| `cors` | object | | CORS header injection. |
| `hsts` | object | | HSTS header injection. |
| `compression` | object | | Response compression. |
| `session` | object | | Session cookie settings. Alias: `session_config`. |
| `force_ssl` | bool | false | Redirect plain HTTP requests to HTTPS. |
| `allowed_methods` | list | empty (allow all) | Whitelist of HTTP methods. |
| `forward_rules` | list | | Path / header / IP rules that route to inline child origins. |
| `fallback_origin` | object | | Inline origin served when the primary upstream errors or returns a configured status. See [Fallback origin](#fallback-origin). |
| `response_cache` | object | | Per-origin response cache. |
| `variables` | map | | Static template variables. |
| `on_request` | list | | Webhook callbacks invoked when a request enters the origin. Each entry accepts `url`, `method` (default POST), `secret` (HMAC), `timeout` (seconds), `on_error`. Lua callbacks are also accepted. See [Webhook envelope and signing](#webhook-envelope-and-signing). |
| `on_response` | list | | Same shape as `on_request`; fired after the upstream response is observed. Payload includes `status` and `duration_ms`. |
| `mirror` | object | | Shadow traffic configuration. See [Request mirror](#request-mirror). |
| `bot_detection` | object | | Bot detection config. |
| `threat_protection` | object | | IP reputation / blocklist config. |
| `rate_limit_headers` | object | | `X-RateLimit-*` and `Retry-After` header configuration. |
| `error_pages` | list | | Custom error pages keyed by status code or class. |
| `problem_details` | object | | RFC 9457 `application/problem+json` default renderer. Composes with `error_pages`. |
| `traffic_capture` | object | | Traffic capture / mirroring. |
| `message_signatures` | object | | RFC 9421 HTTP message signatures. |
| `idempotency` | object | | Idempotency middleware. See [Idempotency](#idempotency). |
| `connection_pool` | object | | Per-origin connection pool tuning. |
| `extensions` | object | | Opaque map for enterprise / third-party origin-level blocks. |

### Origin architecture

Every origin config block supports the fields above as siblings. They sit at the same level as `action`, never inside it:

```yaml
origins:
  "api.example.com":
    action: { ... }               # Required
    authentication: { ... }       # Optional
    policies: [ ... ]             # Optional
    transforms: [ ... ]           # Optional
    request_modifiers: [ ... ]    # Optional
    response_modifiers: [ ... ]   # Optional
    forward_rules: [ ... ]        # Optional
    response_cache: { ... }       # Optional
    variables: { ... }            # Optional
    session: { ... }              # Optional
    cors: { ... }                 # Optional
    compression: { ... }          # Optional
    hsts: { ... }                 # Optional
    connection_pool: { ... }      # Optional
```

---

## Actions

The `action` block defines what the proxy does with a matched request.
The `type` field selects the handler.

### proxy

Forward requests to an upstream URL. The most common action type, and the right choice when SBproxy sits in front of an existing backend.

```yaml
origins:
  "api.example.com":
    action:
      type: proxy
      url: https://backend.internal:8080
      strip_base_path: false
      preserve_query: true
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `url` | string | required | Upstream URL to forward requests to |
| `strip_base_path` | bool | false | Strip the matched origin path before forwarding |
| `preserve_query` | bool | false | Forward the original query string to the upstream |
| `host_override` | string | unset | Override the upstream `Host` header. Default is the upstream URL's hostname (so vhost-routed services like Vercel, Cloudflare-fronted origins, S3, ALBs work without configuration). See [Host header semantics](#host-header-semantics). |
| `sni_override` | string | unset | Override the SNI server name sent during the upstream TLS handshake (and the cert verification target). Use when the cert's hostname differs from the URL host. See [Origin overrides](#origin-overrides). |
| `resolve_override` | string | unset | Pin the upstream connect address, bypassing DNS for the URL host. Accepts `ip`, `ip:port`, `[ipv6]:port`, or `host:port`. Equivalent to `curl --connect-to`. See [Origin overrides](#origin-overrides). |
| `service_discovery` | object | unset | DNS-based service discovery. Re-resolves the upstream hostname on a TTL. See [Service discovery](#service-discovery). |
| `disable_forwarded_host_header` | bool | false | Suppress the `X-Forwarded-Host` header that the proxy would otherwise set to the client's original `Host` whenever it rewrites the upstream `Host`. |
| `disable_forwarded_for_header` | bool | false | Suppress `X-Forwarded-For` (the client IP appended to the chain). |
| `disable_real_ip_header` | bool | false | Suppress `X-Real-IP`. |
| `disable_forwarded_proto_header` | bool | false | Suppress `X-Forwarded-Proto` (`http`/`https`). |
| `disable_forwarded_port_header` | bool | false | Suppress `X-Forwarded-Port` (the listener port). |
| `disable_forwarded_header` | bool | false | Suppress the RFC 7239 `Forwarded` header. |
| `disable_via_header` | bool | false | Suppress the `Via: 1.1 sbproxy` header. |
| `retry` | object | unset | Upstream retry policy. See [Upstream retries](#upstream-retries). |

The same `host_override` and `disable_*_header` flags are accepted on every URL-bearing action: `proxy`, `load_balancer` targets, `websocket`, `grpc` (via the `:authority` field), `graphql`, `a2a`, and `forward_auth`.

### static

Return a fixed response without proxying to any upstream. Good for health check endpoints, maintenance pages, and mock APIs.

```yaml
origins:
  "status.example.com":
    action:
      type: static
      status: 200
      content_type: application/json
      json_body:
        status: healthy
        version: "2.1.0"
        services:
          database: up
          cache: up
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `status` | int | 200 | HTTP status code (alias: `status_code`) |
| `content_type` | string | | Content-Type header |
| `body` | string | | Plain text or HTML body (alias: `text_body`) |
| `json_body` | object | | JSON body. Auto-sets Content-Type to application/json. Overrides `body`. |
| `headers` | map | | Additional response headers |

### redirect

Return an HTTP redirect. Common uses: domain migrations, HTTPS enforcement, URL shortening, large URL lookup tables.

```yaml
origins:
  "old.example.com":
    action:
      type: redirect
      url: https://new.example.com
      status: 302
      preserve_query: true
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `url` | string | required* | Redirect target URL. Required when `bulk_list` is unset. |
| `status` | int | 302 | HTTP status code (alias: `status_code`). |
| `preserve_query` | bool | false | Preserve original query string. |
| `bulk_list` | object | unset | Per-origin bulk redirect source. See [bulk-redirects.md](bulk-redirects.md). |

`bulk_list` accepts three source types: `inline` (rows embedded in YAML), `file` (CSV or YAML on disk; CSV detected by `.csv` suffix), and `url` (HTTPS document fetched at config-load). Per-row `status` and `preserve_query` overrides win when set; otherwise rows inherit the action's defaults. Unmapped paths fall through to the action's `url:` (or 404 when `url:` is empty).

```yaml
origins:
  "marketing.local":
    action:
      type: redirect
      status_code: 301
      preserve_query: true
      bulk_list:
        type: file
        path: /etc/sbproxy/marketing-redirects.csv
```

### echo

Return the incoming request as a JSON response. Handy for debugging proxy behavior, testing forward rules, and verifying that headers and auth are set up correctly. Echo takes no fields.

```yaml
origins:
  "debug.example.com":
    action:
      type: echo
```

### mock

Return a fixed JSON response for API mocking. Optionally injects an artificial delay so you can test slow-backend behavior.

```yaml
origins:
  "mock.example.com":
    action:
      type: mock
      status: 200
      body:
        ok: true
        message: "mocked"
      headers:
        X-Mock: "true"
      delay_ms: 250
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `status` | int | 200 | HTTP status code |
| `body` | object | `null` | JSON body returned to the client |
| `headers` | map | | Additional response headers |
| `delay_ms` | int | | Optional artificial delay in milliseconds |

### beacon

Return a 1x1 transparent GIF. Useful for tracking pixel endpoints. Beacon takes no fields.

```yaml
origins:
  "px.example.com":
    action:
      type: beacon
```

### load_balancer

Distribute traffic across multiple backend targets when you have several instances of a service.
```yaml origins: "api.example.com": action: type: load_balancer algorithm: round_robin targets: - url: https://backend-1.internal:8080 weight: 70 - url: https://backend-2.internal:8080 weight: 30 sticky: cookie_name: sb_sticky ttl: 3600 ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `targets` | list | required | Backend targets. | | `algorithm` | string \| object | `round_robin` | Routing algorithm (see below). | | `sticky` | object | | Sticky-session config: `cookie_name` (default `sb_sticky`), `ttl` seconds. | | `deployment_mode` | object | `{mode: normal}` | Deployment mode. See below. | | `outlier_detection` | object | unset | Passive ejection policy. See [Outlier detection](#outlier-detection). | Algorithms: | Algorithm | Description | |-----------|-------------| | `round_robin` | Cycle through active targets in order (default). | | `weighted_random` | Pick a target with probability proportional to its weight. | | `least_connections` | Route to the target with the fewest in-flight requests. | | `ip_hash` | Hash the client IP to a target (sticky by client). | | `uri_hash` | Hash the request URI to a target (sticky by path). | | `header_hash` | Hash a named request header. Configured as `algorithm: { header_hash: { header: X-User } }`. | | `cookie_hash` | Hash a named cookie. Configured as `algorithm: { cookie_hash: { cookie: sid } }`. | Target fields: | Field | Type | Default | Description | |-------|------|---------|-------------| | `url` | string | required | Backend URL. | | `weight` | int | 1 | Weight used by weighted algorithms. | | `backup` | bool | false | Reserved for fallback. Excluded from normal selection. | | `group` | string | | Deployment group label (`blue`, `green`, `canary`). | | `priority` | int | 5 | Routing priority (1 = highest, 10 = lowest). Read from `X-Priority` header when not set here. | | `zone` | string | | Availability zone or region label for locality-aware routing. 
| | `health_check` | object | | Active health-check probe config. See [Active health checks](#active-health-checks). | | `host_override` | string | unset | Override the upstream `Host` for this target. Default is the target URL's hostname. | | `disable_*_header` | bool | false | Same per-header opt-outs as on `proxy` actions; see [Forwarding headers](#trusted-proxies-and-forwarding-headers). | #### Blue-green deployments Route 100% of traffic to the named active group. Targets must have a `group` field set to `blue` or `green`. ```yaml action: type: load_balancer deployment_mode: mode: blue_green active: green targets: - url: https://blue.internal:8080 group: blue - url: https://green.internal:8080 group: green ``` #### Canary deployments Route a configurable percentage of requests to canary targets (group `canary`); remaining traffic goes to primary targets. ```yaml action: type: load_balancer deployment_mode: mode: canary weight: 10 # 10% to canary targets: - url: https://primary.internal:8080 - url: https://canary.internal:8080 group: canary ``` ### websocket Proxy WebSocket connections for real-time applications, chat systems, and streaming APIs. ```yaml origins: "ws.example.com": action: type: websocket url: wss://ws-backend.internal:8080 subprotocols: [graphql-ws, graphql-transport-ws] max_message_size: 5242880 ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `url` | string | required | Backend WebSocket URL (ws:// or wss://) | | `subprotocols` | list | | Supported WebSocket subprotocols | | `max_message_size` | int | 10485760 | Maximum message payload size in bytes (10 MB) | ### grpc Proxy gRPC traffic for microservice architectures. 
```yaml origins: "grpc.example.com": action: type: grpc url: grpcs://grpc-backend.internal:50051 tls: true authority: grpc-backend.internal timeout_secs: 30 ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `url` | string | required | Backend gRPC URL (`grpc://`, `grpcs://`, `http://`, `https://`) | | `tls` | bool | false | Force TLS regardless of URL scheme | | `authority` | string | | Override the HTTP/2 `:authority` pseudo-header | | `timeout_secs` | int | 30 | Request timeout in seconds | ### ai_proxy Route requests across LLM providers with automatic failover, cost tracking, and content-based routing. Supports 66 native providers behind one OpenAI-compatible API; the model name passes straight through, so any model a provider serves is reachable. For full details, see [ai-gateway.md](ai-gateway.md) and [providers.md](providers.md). ```yaml origins: "ai.example.com": action: type: ai_proxy providers: - name: openai api_key: ${OPENAI_API_KEY} models: [gpt-4o, gpt-4o-mini, gpt-4-turbo] default_model: gpt-4o-mini - name: anthropic api_key: ${ANTHROPIC_API_KEY} models: [claude-sonnet-4-20250514, claude-3-5-haiku-20241022] routing: fallback_chain allowed_models: [gpt-4o, gpt-4o-mini, claude-3-5-haiku-20241022] blocked_models: [] max_body_size: 4194304 ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `providers` | list | required | Configured upstream AI providers. | | `routing` | string \| object | `round_robin` | Routing strategy. Either a flat string or `{strategy: ..., ...}`. | | `allowed_models` | list | empty (allow all) | Allow-list of model names. | | `blocked_models` | list | | Block-list of model names. Takes precedence over allow-list. | | `max_body_size` | int | | Maximum request body size in bytes. | | `guardrails` | object | | Input/output guardrails pipeline. | | `budget` | object | | Budget enforcement configuration. 
| | `virtual_keys` | list | | Virtual API keys mapped to provider keys and scopes. | | `model_rate_limits` | map | | Per-model rate limit overrides keyed by model name. | | `per_surface_rate_limits` | map | | Per-surface rate limit overrides keyed by AI surface label (`chat_completions`, `assistants`, `image_generation`, ...). | | `max_concurrent` | map | | Maximum concurrent in-flight requests per provider. | | `resilience` | object | | Per-provider circuit breaker, outlier detection, and active health probes. | | `shadow` | object | | Side-by-side eval: mirror each request to a second provider and log metrics. | Routing strategies: `round_robin`, `weighted`, `fallback_chain`, `random`, `lowest_latency`, `least_connections`, `cost_optimized`, `token_rate`, `least_token_usage`, `prefix_affinity`, `peak_ewma`, `sticky`, `race`, `cascade`, `cost_quality`. See [ai-gateway.md](ai-gateway.md#routing-strategies) for each. `default_model` is a per-provider field, not an action-level field. Set it on each `providers[]` entry. #### AI provider fields (`providers[]`) | Field | Type | Default | Description | |-------|------|---------|-------------| | `name` | string | required | Unique provider name used to reference this entry. | | `provider_type` | string | inferred from `name` | Provider type (`openai`, `anthropic`, `google`, etc.). | | `api_key` | string | | API key used to authenticate with the upstream. | | `base_url` | string | provider default | Override the upstream base URL. Validated at config load: non-`http(s)` schemes and private/loopback targets are rejected as SSRF risks unless `allow_private_base_url` is set. | | `allow_private_base_url` | bool | `false` | Allow `base_url` to point at a loopback/private address (a local model server). The scheme check still applies. | | `models` | list | `[]` | Models served by this provider; empty defers to the provider catalog. | | `default_model` | string | | Model used when the request omits an explicit model. 
| | `model_map` | map | `{}` | Logical to upstream model name mapping. | | `weight` | int | 1 | Weight used by weighted routing strategies. | | `priority` | int | unset | Priority used by priority routing (lower runs first). | | `enabled` | bool | true | When false, this provider is skipped during routing. | | `max_retries` | int | unset | Maximum retries on transient upstream failures. | | `timeout_ms` | int | unset | Request timeout in milliseconds. | | `organization` | string | | Organization identifier for providers that scope keys per org. | | `api_version` | string | | API version header value (e.g. for Anthropic and Azure OpenAI). | #### Virtual keys (`virtual_keys[]`) Virtual API keys map a client-facing key to provider keys, model allow-lists, and per-key rate limits. ```yaml virtual_keys: - key: vk-prod-abc123 name: production-app allowed_models: [gpt-4o-mini, claude-3-5-haiku-20241022] blocked_models: [] allowed_providers: [openai, anthropic] max_tokens_per_minute: 10000 max_requests_per_minute: 60 budget: max_tokens: 1000000 max_cost_usd: 50.0 tags: [team-frontend] enabled: true ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `key` | string | required | The virtual key string clients send. | | `name` | string | | Human-readable label. | | `allowed_models` | list | `[]` | Models this key may use. Empty allows all. | | `blocked_models` | list | `[]` | Models this key is blocked from using. | | `allowed_providers` | list | `[]` | Providers this key may route to. Empty allows all. | | `max_tokens_per_minute` | int | unset | Per-key tokens-per-minute limit. | | `max_requests_per_minute` | int | unset | Per-key requests-per-minute limit. | | `budget` | object | | Per-key total budget (`max_tokens`, `max_cost_usd`). | | `tags` | list | `[]` | Free-form tags surfaced in metrics. | | `enabled` | bool | true | When false, the key is rejected. 
| #### Budget (`budget`) | Field | Type | Default | Description | |-------|------|---------|-------------| | `limits` | list | `[]` | Budget rules. See below. | | `on_exceed` | string | `block` | Action when a limit is hit: `block`, `log`, `downgrade`. | Each `limits[]` entry: | Field | Type | Default | Description | |-------|------|---------|-------------| | `scope` | string | required | `workspace`, `api_key`, `user`, `model`, `origin`, or `tag`. | | `max_tokens` | int | unset | Maximum tokens for this scope. | | `max_cost_usd` | float | unset | Maximum spend in USD for this scope. | | `period` | string | unset | Time window: `daily`, `monthly`, `total`. | | `downgrade_to` | string | | Model to swap to when `on_exceed: downgrade`. | #### Per-model rate limits (`model_rate_limits`) Keyed by model name; each entry has `requests_per_minute` and `tokens_per_minute`. ```yaml model_rate_limits: gpt-4o: requests_per_minute: 60 tokens_per_minute: 200000 ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `requests_per_minute` | int | unset | Requests-per-minute cap for this model. | | `tokens_per_minute` | int | unset | Tokens-per-minute cap for this model. | #### Per-surface rate limits (`per_surface_rate_limits`) Keyed by AI surface label. The labels are the same stable strings emitted on the `sbproxy_ai_surface_requests_total` metric: `chat_completions`, `models`, `embeddings`, `assistants`, `threads`, `batches`, `fine_tuning`, `files`, `realtime`, `image_generation`, `image_edits`, `image_variations`, `audio_transcription`, `audio_speech`, `moderations`, `reranking`. Surfaces without an entry are uncapped. When the cap is hit, the proxy returns 429 before any upstream call. 
```yaml
per_surface_rate_limits:
  image_generation:
    requests_per_minute: 30
  audio_speech:
    requests_per_minute: 60
  chat_completions:
    requests_per_minute: 600
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `requests_per_minute` | int | unset | Requests-per-minute cap for this surface. Sliding one-minute window, shared globally across the process. |

#### Guardrails (`guardrails`)

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `input` | list | `[]` | Guardrails evaluated against the incoming request body. |
| `output` | list | `[]` | Guardrails evaluated against the model output. |

Each entry is an object with a `type` field and type-specific config. Built-in types: `pii`, `secrets`, `injection` (alias `prompt_injection`), `toxicity`, `jailbreak`, `content_safety`, `schema`, `regex`, `regex_guard`. See [ai-gateway.md](ai-gateway.md) for per-guardrail fields.

See the [AI Gateway Guide](ai-gateway.md) for CEL selectors, Lua hooks, guardrails, context window validation, cost headers, and streaming behavior.

#### Resilience (`resilience`)

Three independent signals that eject misbehaving providers from the routing pool. Any signal alone is enough to skip a provider; when every provider is ejected, the router falls back to the unfiltered enabled list rather than returning no provider at all.
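The ejection rule above can be sketched in a few lines of Python. This is an illustration of the described behavior only, not SBproxy's code (the proxy is written in Rust); the function name, the provider dict shape, and the `ejected_by_signal` mapping are all hypothetical.

```python
def routable_providers(providers, ejected_by_signal):
    """Return the provider names eligible for routing.

    providers: list of dicts with a "name" key and an optional "enabled"
        key (hypothetical shape, mirroring the `providers[]` fields).
    ejected_by_signal: dict mapping a signal name (for example
        "circuit_breaker") to the set of provider names it currently ejects.
    """
    enabled = [p["name"] for p in providers if p.get("enabled", True)]
    # Any single signal is enough to skip a provider, so take the union.
    ejected = set().union(*ejected_by_signal.values())
    healthy = [name for name in enabled if name not in ejected]
    # When every provider is ejected, fall back to the unfiltered enabled
    # list instead of returning no provider at all.
    return healthy or enabled
```

Under these assumptions, a provider ejected by only one signal is skipped, while a pool in which every enabled provider is ejected is returned unfiltered.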
```yaml
resilience:
  circuit_breaker:
    failure_threshold: 5        # consecutive 5xx / transport errors before opening
    success_threshold: 2        # half-open successes before closing
    open_duration_secs: 30      # cooldown before half-open probe
  outlier_detection:
    threshold: 0.5              # eject when failure rate >= 50%
    window_secs: 60             # sliding window
    min_requests: 5             # minimum sample before ejecting
    ejection_duration_secs: 30
  health_check:
    path: /models               # GET endpoint probed on each provider
    interval_secs: 30
    timeout_ms: 5000
    unhealthy_threshold: 3
    healthy_threshold: 2
```

When `resilience` is set, retries fan across providers up to `min(providers.len(), 5)` attempts; ejected providers are skipped on the second and later attempts.

#### Shadow (`shadow`)

Mirrors each request to a second provider concurrently. The primary's response is what the client sees; the shadow body is drained and metrics are logged at `target: sbproxy_ai_shadow` (status, latency, prompt/completion tokens, finish_reason). Useful for prompt regression checks before swapping a primary model.

```yaml
shadow:
  provider: anthropic               # must also appear in `providers`
  model: claude-3-5-haiku-latest    # optional override; defaults to client's model
  sample_rate: 0.1                  # mirror 10% of traffic; 1.0 mirrors all
  timeout_ms: 30000
```

#### Race strategy (`routing.strategy: race`)

Fans the request out to every eligible provider in parallel; returns the first 2xx and cancels the in-flight losers. Failures still feed `resilience` so persistently slow providers eventually drop out of the eligible set. Use sparingly: until one provider wins, race multiplies your provider spend by N, the number of eligible providers.

```yaml
routing:
  strategy: race
providers:
  - name: openai
    api_key: ${OPENAI_API_KEY}
  - name: anthropic
    api_key: ${ANTHROPIC_API_KEY}
```

### graphql

Proxy GraphQL requests to an upstream HTTP endpoint with optional query depth limiting and introspection control.
```yaml origins: "graphql.example.com": action: type: graphql url: https://graphql-backend.internal/graphql max_depth: 10 allow_introspection: false validate_queries: true ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `url` | string | required | Backend GraphQL endpoint URL (`http://` or `https://`). | | `max_depth` | int | 0 | Maximum query nesting depth. `0` means unlimited. | | `allow_introspection` | bool | true | When false, introspection queries are rejected. | | `validate_queries` | bool | false | When true, validate incoming GraphQL queries. | ### storage Serve files from an object storage backend (S3, GCS, Azure Blob, or local filesystem). The OSS implementation currently returns a 501 placeholder; the action exists so configs validate and for future runtime support. ```yaml origins: "static.example.com": action: type: storage backend: s3 bucket: my-public-assets prefix: web/ index_file: index.html ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `backend` | string | required | One of `s3`, `gcs`, `azure`, `local`. | | `bucket` | string | | Bucket name. Required for `s3`, `gcs`, and `azure`. | | `prefix` | string | | Key prefix prepended to request paths. May not contain `..` segments or NUL bytes. | | `path` | string | | Local filesystem root. Required for `backend: local`. May not contain `..` segments or NUL bytes. | | `index_file` | string | | Index file served for directory requests (e.g. `index.html`). May not contain `..` segments or NUL bytes. | ### a2a Proxy requests to an Agent-to-Agent (A2A) endpoint that speaks the Google A2A protocol. The agent card metadata can be cached locally for discovery. 
```yaml origins: "agent.example.com": action: type: a2a url: https://agent-backend.internal/a2a agent_card: name: SearchAgent version: "1.0" capabilities: [text, tool-use] ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `url` | string | required | Upstream agent URL. | | `agent_card` | object | | Cached A2A agent card (free-form JSON). | --- ## Authentication The `authentication` block is a sibling of `action`, not nested inside it. It controls who can access the origin. SBproxy ships eight built-in auth providers: `api_key`, `basic_auth`, `bearer`, `jwt`, `digest`, `forward_auth`, `bot_auth`, and `noop`. `bot_auth` verifies cryptographically-signed AI agents per RFC 9421 + the IETF Web Bot Auth draft. Full reference: [web-bot-auth.md](web-bot-auth.md). Anything else falls through to the inventory-based auth plugin registry, so a linked third-party crate can register additional types (`oauth`, `oauth_introspection`, `oauth_client_credentials`, `ext_authz`, `biscuit`, `saml`, ...) without patching the OSS engine. Plugins register on the typed `AuthPluginRegistration` channel and surface through the standard `authentication.type` config field. ### api_key Authenticate requests with an API key. Keys are checked in the `X-Api-Key` header by default; an optional `query_param` lets clients pass keys via the URL. Typical fit: machine-to-machine API access. 
```yaml origins: "api.example.com": action: type: proxy url: https://backend.internal:8080 authentication: type: api_key api_keys: - ${API_KEY_1} - ${API_KEY_2} query_param: api_key ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `type` | string | required | Must be `api_key` | | `api_keys` | list | required | Accepted API keys | | `header_name` | string | `X-Api-Key` | Header carrying the API key | | `query_param` | string | | When set, keys can be supplied via the named URL query parameter | Test with: ```bash curl -H "Host: api.example.com" -H "X-Api-Key: your-key-here" http://localhost:8080/ ``` ### basic_auth HTTP Basic Authentication with username/password pairs. Fits simple internal services and admin panels. ```yaml origins: "admin.example.com": action: type: proxy url: https://admin-backend.internal:8080 authentication: type: basic_auth users: - username: admin password: ${ADMIN_PASSWORD} - username: readonly password: ${READONLY_PASSWORD} realm: "Admin Panel" ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `type` | string | required | Must be `basic_auth` | | `users` | list | required | Username/password pairs | | `realm` | string | | Optional realm shown in the `WWW-Authenticate` challenge | ### bearer Authenticate with Bearer tokens in the Authorization header. The default for token-based service auth. 
```yaml origins: "api.example.com": action: type: proxy url: https://backend.internal:8080 authentication: type: bearer tokens: - ${SERVICE_TOKEN_1} - ${SERVICE_TOKEN_2} ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `type` | string | required | Must be `bearer` | | `tokens` | list | required | Accepted bearer tokens (each entry is either the raw secret or `{secret, dpop_jkt, ...}`) | | `require_dpop` | bool | `false` | When `true`, every accepted token MUST come with a valid RFC 9449 DPoP proof whose `jkt` matches the token entry's `dpop_jkt` metadata. Tokens without `dpop_jkt` metadata fail closed. | #### Sender-constrained Bearer (RFC 9449) DPoP binds an opaque bearer token to a proof-of-possession key so a stolen token alone is not enough to access the resource. The operator stamps the JWK thumbprint of the expected key on each bearer entry; the proxy reads the `DPoP:` header on every request and verifies the proof against the stamped thumbprint. ```yaml authentication: type: bearer require_dpop: true tokens: - secret: ${SERVICE_TOKEN_1} dpop_jkt: "NzbLsXh8uDCcd-6MNwXF4W_7noWXFZAfHkxZsRGC9Xs" - secret: ${SERVICE_TOKEN_2} dpop_jkt: "8WGoq1lXk-3z7AIuS-XwSeUGzqQ3LtIMOvbf2bZj0Vk" ``` The `dpop_jkt` value is the RFC 7638 SHA-256 thumbprint of the client's DPoP signing key, base64url-no-pad. Deriving it once per client is a one-shot operator step (most identity systems publish it alongside the client's other registration data). ### jwt Validate JSON Web Tokens. Supports JWKS endpoints for key rotation and claims validation. Pick this for OAuth2/OIDC-protected APIs. 
```yaml
origins:
  "api.example.com":
    action:
      type: proxy
      url: https://backend.internal:8080
    authentication:
      type: jwt
      jwks_url: https://auth.example.com/.well-known/jwks.json
      issuer: https://auth.example.com
      audience: my-api
      algorithms: [RS256]
      required_claims:
        scope: api:read
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `type` | string | required | Must be `jwt` |
| `secret` | string | | HMAC signing secret (HS256/HS384/HS512) |
| `jwks_url` | string | | URL to fetch JWKS from (RS / ES / PS family) |
| `issuer` | string | | Required `iss` claim value |
| `audience` | string | | Required `aud` claim value |
| `algorithms` | list | inferred | Allowed signing algorithms. Defaults to HS256/HS384/HS512 with `secret`, RS256 with `jwks_url`. |
| `required_claims` | map | | Claims that must be present and equal to the configured value. |
| `require_dpop` | bool | `false` | When `true`, the JWT MUST come with a valid RFC 9449 DPoP proof whose `jkt` matches the token's `cnf.jkt` claim. Tokens without a `cnf.jkt` claim fail closed. |
| `require_mtls_bound` | bool | `false` | When `true`, the JWT's `cnf.x5t#S256` claim MUST match the SHA-256 thumbprint of the inbound TLS client cert (RFC 8705 mutual-TLS-bound tokens). |

The `algorithms` list must contain at least one entry; an empty list rejects all tokens. Bearer tokens must be supplied via `Authorization: Bearer <token>`.

#### Sender-constrained JWT (RFC 9449 + RFC 8705)

Both `require_dpop` and `require_mtls_bound` may be set together on the same provider; the request must satisfy BOTH constraints. The two constraints are independent:

* **DPoP** (RFC 9449) binds the token to a proof-of-possession key the client signs with on every request. The token's `cnf.jkt` claim is the SHA-256 thumbprint of that key; the proxy reads the `DPoP:` header and verifies.
* **mTLS-bound** (RFC 8705) binds the token to the SHA-256 thumbprint of the TLS client cert the resource server saw on the connection.
The token's `cnf.x5t#S256` claim carries the thumbprint; the proxy compares against the inbound client cert. ```yaml authentication: type: jwt jwks_url: https://auth.example.com/.well-known/jwks.json issuer: https://auth.example.com audience: my-api require_dpop: true require_mtls_bound: true ``` Both flags default to `false` so existing JWT configurations keep their unbound semantics. Turn them on per-route as the issuer starts minting `cnf.jkt` / `cnf.x5t#S256` tokens. ### digest HTTP Digest Authentication (RFC 7616). The right pick when a legacy system insists on digest auth. The stored `password` is the HA1 hash, `MD5(username:realm:password)`, not the plaintext password. ```yaml origins: "legacy.example.com": action: type: proxy url: https://legacy-backend.internal:8080 authentication: type: digest realm: "Legacy" users: - username: alice password: ${ALICE_HA1} ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `type` | string | required | Must be `digest`. | | `realm` | string | required | Realm string sent in the `WWW-Authenticate` challenge. | | `users` | list or map | required | Accepted users. Either a list of `{username, password}` objects, or a map of `username: ha1_hex`. | ### forward_auth Delegate authentication to an external service. SBproxy sends a subrequest to the auth service and uses the response status to allow or deny the original request. The right choice when auth logic lives in its own service. 
```yaml origins: "api.example.com": action: type: proxy url: https://backend.internal:8080 authentication: type: forward_auth url: https://auth.internal/verify method: GET timeout: 5000 headers_to_forward: [Authorization, Cookie] trust_headers: [X-User-ID, X-User-Email, X-User-Roles] success_status: 200 ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `type` | string | required | Must be `forward_auth` | | `url` | string | required | External auth service URL | | `method` | string | GET | HTTP method for the subrequest | | `timeout` | int | | Subrequest timeout in milliseconds | | `headers_to_forward` | list | | Headers to copy from the original request. Alias: `forward_headers`. | | `trust_headers` | list | | Headers from the auth response to inject into the upstream request | | `success_status` | int \| list | 200 | Status code(s) that mean "authenticated". A list is accepted, but only the first element is used. | ### noop The no-op auth provider accepts every request without checking credentials. Set this explicitly to mark an origin as unauthenticated, so the intent is obvious in the config. ```yaml authentication: type: noop ``` ### Per-credential metadata Every inbound auth provider accepts an optional metadata block on each credential entry. When a credential matches, its metadata travels onto the request principal and surfaces in the access log under `principal_kind`, in metrics labels, and in policy scripts that read `principal.attrs.*`. The metadata fields are: | Field | Type | Description | |-------|------|-------------| | `project` | string | Project the credential belongs to. Drives the `project` column on the access log and metric labels. | | `user` | string | User the credential represents or its owner. | | `team` | string | Team or cost-center grouping. | | `tags` | list of strings | Operator-supplied tags. Stamped on `principal.attrs.tags`. 
| | `metadata` | map of strings | Free-form metadata copied off the credential. Stored as a sorted map for deterministic log lines. | The block is optional on every provider; existing configs that use the bare-string shorthand (a list of plain secrets) continue to parse unchanged. Operators opt in per credential. #### Bearer The full-shape entry replaces a bare string. Mixed lists are allowed. ```yaml authentication: type: bearer tokens: - "shared-token-no-metadata" - secret: ${SERVICE_TOKEN_1} project: foundation team: platform tags: [internal] metadata: cost_center: eng-001 ``` #### API key ```yaml authentication: type: api_key header_name: X-Api-Key api_keys: - "bare-key" - secret: ${TEAM_FRONTEND_KEY} project: foundation team: frontend ``` #### Basic auth Metadata fields sit flat alongside `username` and `password` on each user entry. ```yaml authentication: type: basic_auth realm: "Admin Panel" users: - username: admin password: ${ADMIN_PASSWORD} project: foundation team: platform tags: [admin] ``` #### JWT The JWT provider takes a single nested `attrs:` block (rather than per-token metadata) because the secret material is the JWKS or shared secret, not a list of static tokens. The optional `roles_claim:` list names the claims to copy onto `principal.attrs.roles`; the first claim present wins. ```yaml authentication: type: jwt jwks_url: https://auth.example.com/.well-known/jwks.json issuer: https://auth.example.com audience: my-api attrs: project: foundation team: platform roles_claim: - roles - groups ``` #### OIDC Same nested `attrs:` shape as JWT. 
```yaml
authentication:
  type: oidc
  authorization_endpoint: https://idp.example.com/authorize
  token_endpoint: https://idp.example.com/oauth/token
  jwks_uri: https://idp.example.com/.well-known/jwks.json
  issuer: https://idp.example.com
  client_id: sbproxy
  client_secret: ${OIDC_CLIENT_SECRET}
  cookie_secret: ${OIDC_COOKIE_SECRET}
  attrs:
    project: foundation
    team: platform
```

The access log records the matched principal's source under the `principal_kind` column (`bearer`, `api_key`, `basic_auth`, `jwt`, `oidc`, `virtual_key`, `bot_auth`, `cap`, `forward_auth`, `plugin`, or `none` when no provider is configured). See [access-log.md](access-log.md) for the full column reference.

---

## Policies

Policies are evaluated before the action runs. They enforce rate limits, security rules, and access controls. The `policies` field is a sibling of `action` and is an array of policy objects.

SBproxy ships ten policy types: `rate_limiting`, `ip_filter`, `expression`, `waf`, `ddos`, `csrf`, `security_headers`, `request_limit`, `sri`, `assertion`.

### rate_limiting

Rate limit clients to prevent abuse and protect backend resources. Uses a token bucket by default (in-process) or a fixed-window counter (when an L2 Redis backend is configured).

```yaml
origins:
  "api.example.com":
    action:
      type: proxy
      url: https://backend.internal:8080
    policies:
      - type: rate_limiting
        requests_per_minute: 60
        burst: 10
        algorithm: token_bucket
        whitelist:
          - 10.0.0.0/8
```

Clients exceeding the limit receive `429 Too Many Requests` with a `Retry-After` header.
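For readers unfamiliar with the token-bucket algorithm named above, here is a minimal Python sketch. It is illustrative only and not SBproxy's implementation (the proxy is written in Rust). The class and method names are hypothetical, and it assumes the refill rate is `requests_per_minute / 60` tokens per second, that `burst` is the bucket capacity, and that the retry hint is the whole number of seconds until one token is available.

```python
import math


class TokenBucket:
    """Illustrative in-memory token bucket (hypothetical; not SBproxy code)."""

    def __init__(self, requests_per_minute, burst):
        self.rate = requests_per_minute / 60.0  # tokens refilled per second
        self.capacity = float(burst)            # maximum stored tokens
        self.tokens = float(burst)              # start with a full bucket
        self.last = 0.0                         # timestamp of the last refill

    def try_acquire(self, now):
        """Return (allowed, retry_after_seconds) for a request at time `now`."""
        elapsed = max(0.0, now - self.last)
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True, 0
        # Not enough tokens: report how long until one is available.
        return False, math.ceil((1.0 - self.tokens) / self.rate)
```

With `requests_per_minute: 60` and `burst: 10`, this sketch admits ten back-to-back requests, rejects the eleventh with a one-second retry hint, and admits one more request per second after that.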
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `type` | string | required | Must be `rate_limiting` |
| `requests_per_second` | float | | Per-second token refill rate |
| `requests_per_minute` | float | | Per-minute token refill rate (mutually exclusive with `requests_per_second`) |
| `burst` | int | derived from rate | Maximum burst capacity |
| `algorithm` | string | `token_bucket` | Algorithm hint: `token_bucket`, `fixed_window`. The runtime picks based on whether an L2 backend is attached. |
| `headers` | object | | `X-RateLimit-*` and `Retry-After` header configuration |
| `whitelist` | list | | IPs/CIDRs exempt from rate limiting |

Distributed rate limiting: a single-instance deployment tracks counters in memory. For multi-instance deployments, configure an L2 Redis cache so counters are shared across all proxy replicas:

```yaml
proxy:
  l2_cache_settings:
    driver: redis
    params:
      dsn: redis://redis.internal:6379/0
```

### ip_filter

Allow or block requests by client IP address or CIDR range. Useful for locking down internal services or blocking known bad actors.

```yaml
policies:
  - type: ip_filter
    whitelist:
      - 10.0.0.0/8
      - 192.168.1.0/24
      - 172.16.0.0/12
    blacklist:
      - 10.0.0.99/32
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `type` | string | required | Must be `ip_filter` |
| `whitelist` | list | | CIDR ranges that are explicitly permitted. Empty allows everything. |
| `blacklist` | list | | CIDR ranges that are explicitly denied. |

If `whitelist` is non-empty, the client IP must match at least one entry. If `blacklist` is non-empty, the client IP must not match any entry. Both lists may be used together.

### expression

CEL expression that evaluates to allow or deny a request. Pick this for custom access control logic that goes beyond simple IP or key checks.
```yaml
policies:
  - type: expression
    expression: 'request.headers["x-internal"] == "true"'
    deny_status: 403
    deny_message: "internal traffic only"
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `type` | string | required | Must be `expression` |
| `expression` | string | required | CEL expression returning a boolean. Alias: `cel_expr`. |
| `deny_status` | int | 403 | HTTP status code when denied. Alias: `status_code`. |
| `deny_message` | string | "forbidden by policy" | Body returned with the deny status code. |

Expression policies evaluate CEL only. For Lua-driven access control, use a request modifier with a `lua_script`.

### request_validator

Validate request bodies against a JSON Schema at the edge. Inbound payloads that fail validation are rejected with a configurable status (default 400) and a typed JSON error body, before they reach the upstream.

```yaml
policies:
  - type: request_validator
    content_types: [application/json]   # default
    status: 400                         # default
    error_content_type: application/json
    schema:
      type: object
      required: [name, age]
      properties:
        name: { type: string, minLength: 1 }
        age: { type: integer, minimum: 0 }
      additionalProperties: false
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `schema` | JSON | required | JSON Schema document. Compiled once at config-load. |
| `content_types` | array | `[application/json]` | Media types this policy applies to. Other types pass through untouched. Matched case-insensitively against the leading media type (parameters are ignored). |
| `status` | int | 400 | HTTP status returned on validation failure. |
| `error_body` | string | structured JSON | Optional rejection body. Default is `{"error":"...","detail":""}` with no echoed payload. |
| `error_content_type` | string | `application/json` | Content-Type for the rejection body. |

The proxy buffers the request body locally until validation completes, then either releases it as one chunk to the upstream or aborts with the configured rejection. Remote `$ref` resolution in schemas is disabled at the workspace level so a malicious schema cannot become an SSRF primitive. The rejection body never echoes the offending payload back to the caller, only the JSON path where validation failed.

See [example 81](../examples/request-validator/sb.yml).

### openapi_validation

Load an OpenAPI 3.0 document at startup and validate each request body against the matching operation's `requestBody` schema. Requests whose path + method are not described in the spec, or whose `Content-Type` has no schema, are passed through. Full reference: [openapi-validation.md](openapi-validation.md).

```yaml
policies:
  - type: openapi_validation
    mode: enforce          # or 'log'
    status: 422            # status returned on enforce-mode rejection
    spec:
      openapi: "3.0.3"
      info: {title: my-api, version: "1.0"}
      paths:
        "/users/{id}":
          post:
            requestBody:
              required: true
              content:
                application/json:
                  schema:
                    type: object
                    required: [name]
                    additionalProperties: false
                    properties:
                      name: {type: string, minLength: 1}
                      age: {type: integer, minimum: 0, maximum: 150}
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `spec` | object | required* | Inline OpenAPI document. *One of `spec` or `spec_file` is required. |
| `spec_file` | string | required* | Path to an OpenAPI document on disk (`.json` or `.yaml`). |
| `mode` | string | `enforce` | `enforce` rejects mismatched bodies; `log` warns and forwards. |
| `status` | int | 400 | Status returned in `enforce` mode on validation failure. |
| `error_body` | string | auto | Optional rejection body. Defaults to a JSON object naming the failing JSON pointer. |
| `error_content_type` | string | `application/json` | `Content-Type` for the rejection body. |

OpenAPI path templates compile to anchored regexes at startup; per-operation schemas compile once. The rejection body lists only the offending JSON pointer, not the value itself, to keep the surface area an attacker can probe small.

See [example 97](../examples/openapi-validation/sb.yml).

### concurrent_limit

Cap in-flight requests per key. Distinct from `rate_limiting`, which throttles RPS. Concurrent limits protect backends with low concurrency budgets: legacy SOAP services, DB-bound endpoints, GPU inference workers, anywhere slow requests pile up faster than they drain.

```yaml
policies:
  - type: concurrent_limit
    max: 50
    key: api_key            # or 'ip', or 'origin' (default)
    status: 503
    error_body: '{"error":"too many concurrent requests"}'
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `max` | int | required | Maximum concurrent requests per key. Must be `> 0`. |
| `key` | string | `origin` | Bucket strategy: `origin` (one global counter for the route), `ip` (per client IP), or `api_key` (per `X-Api-Key` or `Bearer` token). |
| `status` | int | 503 | HTTP status when the limit is exceeded. |
| `error_body` | string | unset | Optional response body for rejections. |

Each accepted request takes a permit; the permit is released when the request finishes (success, error, or client disconnect). Counters use a sharded `DashMap` so contention across keys is bounded.

See [example 82](../examples/concurrent-limit/sb.yml).

### ai_crawl_control

Pay Per Crawl: respond with `402 Payment Required` to AI crawlers that arrive without a valid `Crawler-Payment` token. Each token redeems once. Full reference: [ai-crawl-control.md](ai-crawl-control.md).
```yaml
policies:
  - type: ai_crawl_control
    price: 0.001
    currency: USD
    crawler_user_agents: [GPTBot, ChatGPT-User, ClaudeBot, anthropic-ai, Google-Extended, PerplexityBot, CCBot]
    valid_tokens:
      - tok_a89be2f1
      - tok_b7cf012e
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `price` | float | unset | Price emitted in the challenge body and the `price=` challenge parameter. |
| `currency` | string | `USD` | ISO-4217 code surfaced in the challenge. |
| `header` | string | `crawler-payment` | Header carrying the payment token. |
| `crawler_user_agents` | list | major AI crawler defaults | Case-insensitive substring matches against User-Agent. Empty list treats every GET/HEAD as a crawler. |
| `valid_tokens` | list | `[]` | Seeds the in-memory single-use ledger. Enterprise replaces this with an HTTP-callable ledger. |

Only `GET` and `HEAD` are subject to charging. `POST`/`PUT`/`PATCH`/`DELETE` bypass.

### exposed_credentials

Detect requests carrying a known-leaked password against a static exposure list. Tags the upstream request with `exposed-credential-check: leaked-password` (default) or rejects the request outright. Full reference and rollout guidance: [exposed-credentials.md](exposed-credentials.md).

```yaml
policies:
  - type: exposed_credentials
    action: tag            # or "block"
    passwords:             # plaintext, hashed at compile-time
      - password
      - password123
    sha1_hashes:           # uppercase or lowercase hex
      - 5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8
    sha1_file: /etc/sbproxy/leaked-sha1.txt   # one hash per line; `#` comments
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `provider` | string | `static` | OSS only ships `static`. Enterprise extends with `hibp` (k-anonymity range query). |
| `action` | string | `tag` | `tag` stamps the configured header on the upstream request. `block` returns `403`. |
| `header` | string | `exposed-credential-check` | Header name when `action: tag`. |
| `passwords` | list | `[]` | Plaintext passwords. Hashed at compile time; the source strings are not retained on the policy. |
| `sha1_hashes` | list | `[]` | Inline SHA-1 hex hashes. |
| `sha1_file` | string | unset | Path to a file with one SHA-1 hex hash per line. |

The policy refuses to compile when no list is supplied. SHA-1 uppercase hex matches the format HIBP returns from its range queries, so a downloaded list drops onto disk without preprocessing.

### page_shield

Stamps a Content Security Policy header on every proxied response and runs an intake endpoint at `/__sbproxy/csp-report` for browser-emitted violation reports. Reports are logged structured under the `sbproxy::page_shield` tracing target so logpush sinks (and the enterprise Connection Monitor, F3.20) pick them up.

```yaml
policies:
  - type: page_shield
    mode: report-only                    # or "enforce"
    directives:
      - "default-src 'self'"
      - "script-src 'self' https://cdn.example"
      - "img-src 'self' https: data:"
    report_path: /__sbproxy/csp-report   # default
    report_to_group: csp-endpoint        # optional; emits report-to too
    respect_upstream: false              # yield to an upstream-supplied CSP
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `mode` | string | `report-only` | `report-only` emits `Content-Security-Policy-Report-Only`. `enforce` emits `Content-Security-Policy`. |
| `directives` | list | required, non-empty | Each entry is a complete CSP directive (`default-src 'self'`). Joined with `; `. |
| `report_path` | string | `/__sbproxy/csp-report` | Override the intake path. Used in the auto-appended `report-uri` directive. |
| `report_to_group` | string | unset | When set, the policy also emits a `report-to` directive naming this group for the modern Reporting API. |
| `respect_upstream` | bool | `false` | When `true` and the upstream already emits a CSP header, the policy yields and does not write its own. |

The intake accepts up to 64 KiB per report via `POST /__sbproxy/csp-report` and returns `204 No Content`.
The header is applied to proxied responses; static / redirect / mock actions short-circuit before the response-header phase and bypass injection.

### dlp

Data Loss Prevention scan over the request URI and headers. Matches against the configured detector catalogue (or every default when `detectors: []`) and either tags the upstream request with a `dlp-detection` header (`action: tag`, default) or rejects with `403` (`action: block`).

```yaml
policies:
  - type: dlp
    action: tag                 # or "block"
    detectors: []               # empty = enable every default detector
    rules:                      # optional custom rules layered on top
      - name: internal_ticket
        pattern: '\bTICKET-\d{6}\b'
        replacement: '[REDACTED:TICKET]'
        anchor: 'TICKET-'
```

**Default detectors:** `email`, `us_ssn`, `credit_card`, `phone_us`, `ipv4`, `openai_key`, `anthropic_key`, `aws_access`, `github_token`, `slack_token`, `iban`.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `detectors` | list | `[]` (all defaults) | Detector names to enable. Unknown names fail at compile-time. |
| `action` | string | `tag` | `tag` stamps the configured header on the upstream request. `block` returns `403`. |
| `direction` | string | `request` | `request` is the only path enforced today; `response` and `both` are accepted for forward compatibility. |
| `header` | string | `dlp-detection` | Header name when `action: tag`. |
| `rules` | list | `[]` | Custom regex rules layered on top of the catalogue. Same shape as the `pii.rules` block on `ai_proxy` origins. |

The scan covers the request URI (path + query) and request headers; auth-class headers (`Authorization`, `Cookie`, `Set-Cookie`) are excluded so tokens carried by design don't self-flag. Body scanning is on the roadmap; the existing `pii:` block on `ai_proxy` origins handles request-body redaction with the same regex catalogue today.

### prompt_injection_v2

Successor to the v1 `prompt_injection` heuristic. The v2 policy splits detection from enforcement: a swappable detector returns a score in `[0.0, 1.0]` plus a categorical label, and the policy maps the score onto an action. The OSS build registers a heuristic detector by default (`detector: heuristic-v1`) so the policy works out of the box. Future builds register additional detectors (e.g. an ONNX classifier) without touching the policy core.

```yaml
policies:
  - type: prompt_injection_v2
    action: tag                 # tag (default) | block | log
    detector: heuristic-v1      # default; lookup is link-time
    threshold: 0.5              # fires when score >= threshold
    score_header: x-prompt-injection-score
    label_header: x-prompt-injection-label
    block_body: 'prompt injection detected'
    block_content_type: text/plain
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `detector` | string | `heuristic-v1` | Detector name. Resolved against the inventory registry; unknown names fail at compile time. |
| `threshold` | float | `0.5` | Score threshold in `[0.0, 1.0]`; the policy fires when `score >= threshold`. |
| `action` | string | `tag` | `tag` stamps the score / label headers on the upstream. `block` returns `403` with `block_body`. `log` writes a structured warn under `sbproxy::prompt_injection_v2`. |
| `score_header` | string | `x-prompt-injection-score` | Header carrying the numeric score (formatted as `"%.3f"`) on `action: tag`. |
| `label_header` | string | `x-prompt-injection-label` | Header carrying `clean` / `suspicious` / `injection` on `action: tag`. |
| `block_body` | string | `prompt injection detected` | Response body returned on `action: block`. |
| `block_content_type` | string | `text/plain` | Content-Type for the block body. |

The OSS scaffold scans the request URI + non-auth headers (`Authorization`, `Cookie`, `Set-Cookie` are excluded so tokens carried by design don't self-flag) at request-filter time. Tag mode stamps the score / label headers via the existing trust-headers channel before `upstream_request_filter` builds the upstream request; block mode rejects with `403` immediately. Body-aware detection (the prompt typically lives in the JSON body) is on the roadmap and lands with the ONNX classifier follow-up.

See [prompt-injection-v2.md](prompt-injection-v2.md) for the trait shape, the eval harness, and how to register a custom detector.

### waf

Web Application Firewall. Built-in patterns cover SQL injection, XSS, and path traversal. Custom rules can extend behavior.

```yaml
policies:
  - type: waf
    owasp_crs:
      enabled: true
    action_on_match: block
    test_mode: false
    fail_open: false
    custom_rules: []
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `type` | string | required | Must be `waf` |
| `owasp_crs` | object | | OWASP Core Rule Set configuration. |
| `action_on_match` | string | "block" | Action when a rule matches: `block`, `log`. |
| `test_mode` | bool | false | If true, log matches but do not block. |
| `fail_open` | bool | false | If true, allow requests through on WAF engine failure. |
| `custom_rules` | list | | Custom WAF rules (regex patterns or JS-defined matchers). |

### ddos

DDoS protection with per-IP rate tracking and temporary blocks.

```yaml
policies:
  - type: ddos
    requests_per_second: 100
    block_duration_secs: 300
    whitelist:
      - 10.0.0.0/8
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `type` | string | required | Must be `ddos` |
| `requests_per_second` | int | 100 | Per-IP threshold that triggers blocking. |
| `block_duration_secs` | int | 300 | Duration in seconds an IP stays blocked once the threshold trips. |
| `whitelist` | list | `[]` | CIDR ranges that bypass DDoS checks. |
| `detection` | object | | Go-compat nested form. When `detection.request_rate_threshold` is set, it overrides `requests_per_second`. |
| `mitigation` | object | | Go-compat nested form. When `mitigation.block_duration` is set as a Go duration string (`10s`, `5m`, `1h`), it overrides `block_duration_secs`. |

### csrf

Cross-Site Request Forgery protection for web applications that accept form submissions.

```yaml
policies:
  - type: csrf
    secret_key: ${CSRF_SECRET}
    cookie_name: csrf_token
    header_name: X-CSRF-Token
    methods: [POST, PUT, DELETE, PATCH]
    safe_methods: [GET, HEAD, OPTIONS]
    cookie_path: /
    cookie_same_site: Lax
    exempt_paths: [/api/webhooks, /api/health]
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `type` | string | required | Must be `csrf` |
| `secret_key` | string | required | HMAC key used to sign CSRF tokens. Alias: `secret`. |
| `header_name` | string | `X-CSRF-Token` | Header carrying the CSRF token |
| `cookie_name` | string | `csrf_token` | Cookie carrying the canonical CSRF token |
| `methods` | list | | Methods that require CSRF token validation. When empty, falls back to "anything not in `safe_methods`". |
| `safe_methods` | list | `[GET, HEAD, OPTIONS]` | Methods exempt from CSRF checking |
| `cookie_path` | string | | Cookie path |
| `cookie_same_site` | string | | SameSite attribute (`Strict`, `Lax`, `None`) |
| `exempt_paths` | list | | Paths exempt from CSRF checking |

### request_limit

Cap request body size, header count, header value size, URL length, and query string length. Any field left unset means that dimension is not checked.

```yaml
policies:
  - type: request_limit
    max_body_size: 1048576
    max_header_count: 50
    max_header_size: 8KB
    max_url_length: 2048
    max_query_string_length: 1024
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `max_body_size` | int | unset | Maximum request body size in bytes. |
| `max_header_count` | int | unset | Maximum number of request headers. Alias: `max_headers_count`. |
| `max_header_size` | int or string | unset | Maximum size of a single header value. Strings like `"4KB"` or `"1MB"` are accepted. |
| `max_url_length` | int | unset | Maximum URL length in characters. |
| `max_query_string_length` | int | unset | Maximum query string length in characters. |
| `max_request_size` | int or string | unset | Go-compat overall request size cap. Same string-or-number rules as `max_header_size`. |
| `size_limits` | object | | Go-compat nested form. When set, fields here are merged into the policy at load time. |

### security_headers

Inject security headers into every response to harden browser security.

```yaml
policies:
  - type: security_headers
    headers:
      - name: Strict-Transport-Security
        value: "max-age=31536000; includeSubDomains; preload"
      - name: X-Frame-Options
        value: DENY
      - name: X-Content-Type-Options
        value: nosniff
      - name: Referrer-Policy
        value: strict-origin-when-cross-origin
      - name: Permissions-Policy
        value: "camera=(), microphone=(), geolocation=()"
    # Optional: detailed CSP block for nonce / dynamic routes only.
    content_security_policy:
      policy: "default-src 'self'; script-src 'self' https://cdn.example.com"
      enable_nonce: false
      report_only: false
      report_uri: ""
```

`headers` is a list of `{name, value}` pairs for any response header (HSTS, Cross-Origin-*, COEP/COOP/CORP, Referrer-Policy, Permissions-Policy, and so on). The optional `content_security_policy` block is for advanced CSP behavior only: per-request nonce injection, report-only mode, per-route overrides. For a plain CSP without nonce or dynamic routes, add a `Content-Security-Policy` entry to `headers` directly.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `type` | string | required | Must be `security_headers`. |
| `headers` | list | `[]` | Canonical `{name, value}` pairs to inject. Takes precedence over the legacy flat fields below. |
| `content_security_policy` | string or object | | CSP. Either a plain policy string or an object (see below). |
| `x_frame_options` | string | | Legacy flat shortcut. Deprecated. |
| `x_content_type_options` | string | | Legacy flat shortcut. Deprecated. |
| `x_xss_protection` | string | | Legacy flat shortcut. Deprecated. |
| `referrer_policy` | string | | Legacy flat shortcut. Deprecated. |
| `permissions_policy` | string | | Legacy flat shortcut. Deprecated. |
| `strict_transport_security` | string | | Legacy flat HSTS shortcut. Deprecated. |

When `content_security_policy` is an object, it accepts:

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `policy` | string | `""` | The CSP policy string. |
| `enable_nonce` | bool | false | When true, generate a per-request nonce and inject it into `script-src` / `style-src` directives. |
| `report_only` | bool | false | When true, emit `Content-Security-Policy-Report-Only` instead of `Content-Security-Policy`. |
| `report_uri` | string | `""` | When set, appended to the policy as a `report-uri` directive. |
| `dynamic_routes` | map | `{}` | Per-route CSP overrides keyed by URL path. Exact key match wins, then longest matching prefix. |

### sri

Subresource Integrity validation. When `enforce` is true, sub-resource responses must include valid integrity hashes using one of the configured algorithms.

```yaml
policies:
  - type: sri
    enforce: true
    algorithms: [sha256, sha384, sha512]
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `type` | string | required | Must be `sri`. |
| `enforce` | bool | false | When true, missing or invalid integrity hashes cause the response to be rejected. |
| `algorithms` | list | `[]` | Accepted integrity hash algorithms (e.g. `sha256`, `sha384`, `sha512`). |

### assertion

CEL assertion policy. Evaluates a CEL expression and logs/flags when it returns false. Unlike `expression`, assertions do not block traffic; they are informational only.

```yaml
policies:
  - type: assertion
    expression: 'response.status_code < 500'
    name: "no-server-errors"
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `expression` | string | required | CEL expression evaluated for its truth value |
| `name` | string | "assertion" | Human-readable name attached to assertion log entries |

---

## Transforms

Transforms modify the response body before it reaches the client. They are specified as a list under `transforms` and run in order. Reach for transforms when you need to reshape API responses for different consumers.

SBproxy supports nineteen transform types: `json`, `json_projection`, `json_schema`, `template`, `replace_strings`, `normalize`, `encoding`, `format_convert`, `payload_limit`, `discard`, `sse_chunking`, `html`, `optimize_html`, `html_to_markdown`, `markdown`, `css`, `lua_json`, `javascript`, `js_json`, plus a `noop` for testing.

### json

Reshape JSON responses by setting or merging fields.
```yaml
origins:
  "api.example.com":
    action:
      type: proxy
      url: https://backend.internal:8080
    transforms:
      - type: json
        # Field-level edits handled by this transform.
```

For include/exclude projection, use `json_projection`:

```yaml
transforms:
  - type: json_projection
    projection:
      include: [id, name, email, role]
```

Or to remove sensitive fields:

```yaml
transforms:
  - type: json_projection
    projection:
      exclude: [password, ssn, internal_notes]
```

### html

Modify HTML responses by removing elements, injecting content at known positions, and rewriting attributes.

```yaml
transforms:
  - type: html
    remove_selectors: [script, "#banner"]
    inject:
      - position: head_end
        content: ''
      - position: body_start
        content: ''
      - position: body_end
        content: ''
    rewrite_attributes:
      - selector: img
        attribute: loading
        value: lazy
    format_options:
      strip_comments: true
      strip_space: true
      lowercase_tags: false
```

`position` accepts `head_end`, `body_start`, or `body_end`. Each `inject` entry is `{position, content}`.

### css

Modify CSS responses by injecting rules, removing rule blocks for specific selectors, and minifying.

```yaml
transforms:
  - type: css
    inject:
      - "body { background: #fafafa; }"
    remove_selectors: [".legacy-banner"]
    minify: true
```

### Common transform fields

Every entry in the `transforms:` list is wrapped with these pipeline-level fields, parsed by `TransformConfig`:

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `type` | string | required | Transform type discriminator (e.g. `json`, `template`). |
| `content_types` | list | `[]` | Content-Type substrings the transform applies to. Empty matches all. |
| `fail_on_error` | bool | false | When true, an error in this transform fails the whole response. |
| `max_body_size` | int | 10485760 | Maximum body size, in bytes, that this transform will buffer. Larger bodies skip the transform. |
| `disabled` | bool | false | When true, the transform is parsed but not applied. |

Type-specific fields are listed below.

### json (field manipulation)

Reshape JSON by setting, removing, and renaming fields.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `set` | map | `{}` | Fields to set or overwrite. Values may be any JSON. |
| `remove` | list | `[]` | Field names to delete. |
| `rename` | map | `{}` | `old_name -> new_name` mapping. Renames happen before `set`. |

### json_projection

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `fields` | list | required | Field names to keep (default) or drop (when `exclude` is true). Alias: `include`. |
| `exclude` | bool | false | When true, drop the listed fields instead of keeping them. |

### json_schema

Validate the response body against a JSON Schema document. Schemas are compiled at config-load time. Remote `$ref` resolution is disabled to prevent SSRF.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `schema` | object | required | The JSON Schema document. |

### template

Render the JSON body as input to a minijinja template.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `template` | string | required | Template source with `{{ variable }}` syntax. |

### replace_strings

Apply a list of literal or regex find-and-replace rules to the body.

```yaml
- type: replace_strings
  replacements:
    - find: "internal.example.com"
      replace: "public.example.com"
    - find: '\d{16}'
      replace: "[REDACTED]"
      regex: true
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `replacements` | list | required | Ordered list of replacement rules. |
| `replacements[].find` | string | required | Literal substring or regex pattern. |
| `replacements[].replace` | string | required | Replacement string. |
| `replacements[].regex` | bool | false | When true, treat `find` as a regex. |

### normalize

Whitespace and newline normalization.
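A minimal sketch of a `normalize` entry, assuming it sits in an origin's `transforms:` list like the other transform types. It turns on all three documented options; none of them is enabled by default.

```yaml
transforms:
  - type: normalize
    trim: true                 # strip leading and trailing whitespace
    collapse_whitespace: true  # runs of spaces and tabs become one space
    normalize_newlines: true   # \r\n becomes \n
```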
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `trim` | bool | false | Trim leading and trailing whitespace. |
| `collapse_whitespace` | bool | false | Collapse runs of spaces and tabs into a single space. |
| `normalize_newlines` | bool | false | Replace `\r\n` with `\n`. |

### encoding

Base64 or URL encode/decode the body.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `encoding` | string | required | One of `base64_encode`, `base64_decode`, `url_encode`, `url_decode`. |

### format_convert

Convert between JSON and YAML.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `from` | string | required | Source format: `json` or `yaml`. |
| `to` | string | required | Target format: `json` or `yaml`. |

### payload_limit

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `max_size` | int | required | Maximum allowed body size in bytes. |
| `truncate` | bool | false | When true, truncate to `max_size`. When false, error on oversize. |

### discard

Drop the response body entirely. Takes no fields.

```yaml
- type: discard
```

### sse_chunking

Format the body as Server-Sent Events with the configured prefix and double-newline delimiters.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `line_prefix` | string | `"data: "` | Prefix prepended to each non-empty line. |

### optimize_html

Minify HTML by removing comments and collapsing whitespace.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `remove_comments` | bool | true | Strip HTML comments. |
| `collapse_whitespace` | bool | true | Collapse runs of whitespace into a single space; the content of whitespace-sensitive elements is preserved. |
| `remove_optional_tags` | bool | false | Remove optional closing tags (experimental). |

### html_to_markdown

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `heading_style` | string | `"atx"` | Heading style: `atx` (uses `#`), `setext` (underline). |

### markdown

Convert Markdown to HTML using `pulldown-cmark`.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `smart_punctuation` | bool | false | Enable smart punctuation (curly quotes, dashes). |
| `tables` | bool | false | Enable GitHub-flavored tables. |
| `strikethrough` | bool | false | Enable `~~strikethrough~~`. |

### Scripting transforms

`lua_json` runs a Lua script against a parsed JSON body. `javascript` and `js_json` run JavaScript. Each is documented in [scripting.md](scripting.md). Replace any `type: lua` references in older configs with `type: lua_json`.

| Type | Field | Default | Description |
|------|-------|---------|-------------|
| `lua_json` | `script` | required | Lua source. The Go-format function name is `modify_json(data, ctx)`; legacy scripts may use a `body` global. Alias: `lua_script`. |
| `javascript` | `script` | required | JavaScript source. |
| `javascript` | `function_name` | `transform` | Entrypoint function name. Receives the body as a string. |
| `js_json` | `script` | required | JavaScript source. Alias: `js_script`. |
| `js_json` | `function_name` | `modify_json` | Entrypoint function name. Receives the parsed JSON body. |

---

## Request modifiers

Request modifiers run before the action and edit the request. Each entry is an object with one or more of `headers`, `url`, `query`, `method`, `body`, `lua_script`, or `js_script`. Multiple entries are applied in order.
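To illustrate ordering, here is a sketch with two separate entries. The header name and query parameter are placeholders, and it assumes the second entry sees the request as the first entry left it.

```yaml
request_modifiers:
  # Entry 1: applied first
  - headers:
      set:
        X-Source: sbproxy
  # Entry 2: applied after entry 1
  - query:
      remove:
        - debug
```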
### Header / URL / query / method / body

```yaml
origins:
  "api.example.com":
    action:
      type: proxy
      url: https://backend.internal:8080
    request_modifiers:
      - headers:
          set:
            X-Source: sbproxy
          add:
            X-Trace-Id: "{{ request.headers.x_request_id }}"
          remove:
            - X-Internal-Token
        url:
          path:
            replace:
              old: /old/
              new: /new/
        query:
          set:
            tenant: prod
          add:
            extra: "1"
          remove:
            - debug
        method: POST
        body:
          replace_json:
            injected: true
            source: proxy
```

| Field | Type | Description |
|-------|------|-------------|
| `headers.set` | map | Replace headers (overwrites existing) |
| `headers.add` | map | Append headers (preserves existing) |
| `headers.remove` | list | Remove headers (alias: `delete`) |
| `url.path.replace.old` | string | Substring to find in the request path |
| `url.path.replace.new` | string | Replacement string |
| `query.set` | map | Replace query parameters |
| `query.add` | map | Append query parameters |
| `query.remove` | list | Remove query parameters (alias: `delete`) |
| `method` | string | Override the HTTP method |
| `body.replace` | string | Replace the body with this string |
| `body.replace_json` | object | Replace the body with this JSON value |

### Scripted request modifiers

Each modifier entry can supply a `lua_script` or `js_script` instead of (or in addition to) the structured fields above. Scripts run with full access to the request context. See [scripting.md](scripting.md) for the script API.

```yaml
request_modifiers:
  - lua_script: |
      local access_level = "guest"
      if ip.in_cidr(request_ip, "10.0.1.0/24") then
        access_level = "admin"
      end
      request.headers["X-Access-Level"] = access_level
      return request
```

```yaml
request_modifiers:
  - js_script: |
      function modify_request(req, ctx) {
        req.headers["X-Injected"] = "from-js";
        return req;
      }
```

---

## Response modifiers

Response modifiers run after the action and edit the response. Each entry is an object with one or more of `headers`, `status`, `body`, `lua_script`, or `js_script`.
Multiple entries are applied in order.

```yaml
origins:
  "api.example.com":
    action:
      type: proxy
      url: https://backend.internal:8080
    response_modifiers:
      - headers:
          set:
            X-Content-Type-Options: nosniff
            X-Frame-Options: DENY
          remove:
            - Server
            - X-Powered-By
        status:
          code: 200
          text: OK
        body:
          replace: '{"ok": true}'
```

| Field | Type | Description |
|-------|------|-------------|
| `headers.set` | map | Replace headers |
| `headers.add` | map | Append headers |
| `headers.remove` | list | Remove headers (alias: `delete`) |
| `status.code` | int | Override the response status code |
| `status.text` | string | Optional reason phrase (informational only; not sent in HTTP/2) |
| `body.replace` | string | Replace the response body with this string |
| `body.replace_json` | object | Replace the response body with this JSON value |

For JSON-field-level edits (set fields, delete fields, etc.), use the `json` transform rather than a response modifier.

### Scripted response modifiers

```yaml
response_modifiers:
  - lua_script: |
      if location.country_code ~= "US" and location.country_code ~= "CA" then
        response.status_code = 451
        response.body = '{"error": "Content not available in your region"}'
      end
      return response
```

```yaml
response_modifiers:
  - js_script: |
      function modify_response(res, ctx) {
        res.headers["X-Injected"] = "from-js";
        return res;
      }
```

---

## Response cache

Cache responses at the origin level to reduce backend load and improve response times for cacheable content. The `response_cache` block is a sibling of `action`.

```yaml
origins:
  "api.example.com":
    action:
      type: proxy
      url: https://backend.internal:8080
    response_cache:
      enabled: true
      ttl_secs: 300
      cacheable_methods: [GET, HEAD]
      cacheable_status: [200, 301]
      max_size: 10000
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `enabled` | bool | false | Enable response caching |
| `ttl_secs` | duration | 300 | Cache entry TTL. Accepts integers (`60`) or humanized strings (`60s`, `5m`, `2h30m`). Alias: `ttl`. |
| `cacheable_methods` | list | `[GET]` | HTTP methods eligible for caching. Alias: `methods`. |
| `cacheable_status` | list | `[200]` | Status codes eligible for caching. Alias: `status_codes`. |
| `max_size` | int | 10000 | Upper bound on the in-memory cache size in entries. Ignored when an L2 Redis backend is attached. |

When `proxy.l2_cache_settings` is configured with `driver: redis`, response cache entries are stored in the shared backend; the in-memory `max_size` becomes irrelevant.

---

## Forward rules

Forward rules route specific requests to different origins based on path, header, or other conditions. They are evaluated in order; the first match wins. Common uses: path-based microservice routing and version routing.

Forward rules are deserialized lazily; required fields are enforced when the rule is exercised, not at config-load time.

```yaml
origins:
  "api.example.com":
    action:
      type: proxy
      url: https://default-backend.internal:8080
    forward_rules:
      # Route /api/v2/* to the v2 backend
      - rules:
          - path:
              prefix: /api/v2/
        origin:
          id: v2-backend
          hostname: v2-backend
          workspace_id: example
          version: "2.0.0"
          action:
            type: proxy
            url: https://v2-backend.internal:8080
      # Route /health to a static response
      - rules:
          - path:
              exact: /health
        origin:
          id: health
          hostname: health
          workspace_id: example
          version: "1.0.0"
          action:
            type: static
            status: 200
            content_type: application/json
            json_body:
              status: healthy
      # Route mobile users to mobile backend
      - rules:
          - user_agent:
              os_families: [iOS, Android]
        origin:
          id: mobile-backend
          hostname: mobile-backend
          workspace_id: example
          version: "1.0.0"
          action:
            type: proxy
            url: https://mobile-backend.internal:8080
```

### Rule matching

Each forward rule has a `rules` array where each entry is a path matcher.
The OSS deserializer accepts these forms only: | Field | Type | Description | |-------|------|-------------| | `path.prefix` | string | Path starts with this value. | | `path.exact` | string | Path matches this value exactly. If both `prefix` and `exact` are set on the same matcher, `prefix` wins. | | `match` | string | Shorthand. Equivalent to `path: { prefix: }`. | When a rule has multiple matcher entries, the rule fires when any one of them matches. Other Go-era fields (`methods`, `headers`, `query`, `ip`, `location`, `user_agent`, `content_types`, `protocol`) are not parsed by the Rust runtime today and are ignored if present. ### Forward rule fields The forward rule itself wraps the matcher list and the inline child origin to dispatch to. | Field | Type | Default | Description | |-------|------|---------|-------------| | `rules` | list | `[]` | Matcher entries. The rule fires when any one matches. | | `origin` | object | required | Inline child origin. See below. | The `origin` object is a full child origin config plus identifying metadata: | Field | Type | Default | Description | |-------|------|---------|-------------| | `id` | string | | Identifier surfaced in metrics and logs. | | `hostname` | string | | Informational hostname tag. The parent origin's hostname is what routed the request. | | `workspace_id` | string | | Workspace identifier. | | `version` | string | | Version label. | | `action` | object | required | Action executed when the rule fires. Same schema as a top-level `action`. | | `request_modifiers` | list | `[]` | Request modifiers applied before the action runs. | ### Inline origins Forward rules embed full origin configurations via the `origin` field. Each inline origin can have its own action, authentication, policies, and transforms, exactly like a top-level origin. 
```yaml forward_rules: - rules: - path: prefix: /admin/ origin: id: admin hostname: admin workspace_id: example version: "1.0.0" action: type: proxy url: https://admin-backend.internal:8080 authentication: type: basic_auth users: - username: admin password: ${ADMIN_PASSWORD} policies: - type: rate_limiting requests_per_minute: 30 ``` --- ## Fallback origin When the primary action errors or the upstream returns a configured status code, the proxy can swap in a backup origin. The fallback runs the action you'd normally write at the top level (static, redirect, mock, proxy, anything), so you can serve a cached body, redirect to a status page, or route to a degraded backend. ```yaml origins: "api.local": action: type: proxy url: https://primary-backend:8080 fallback_origin: on_error: true on_status: [502, 503, 504] add_debug_header: true origin: id: degraded-stub action: type: static status: 200 content_type: application/json json_body: status: degraded message: primary upstream temporarily unavailable retry_after_secs: 30 ``` ### Trigger fields | Field | Type | Default | Description | |-------|------|---------|-------------| | `on_error` | bool | false | Trigger the fallback on transport-level upstream failures (DNS, connect, TLS, timeout). | | `on_status` | list[int] | `[]` | Trigger the fallback when the upstream responds with one of these status codes. Pair with `on_error` for full coverage. | | `add_debug_header` | bool | false | When true, the proxy sets `X-Fallback-Trigger` on the response so callers can tell the fallback path served the request. | | `origin` | object | required | Inline origin spec used to serve the request when a trigger fires. Must contain an `action` block; `id`, `hostname`, `workspace_id`, and `version` are accepted as optional metadata. | ### Inline origin The `origin:` field carries the same action types as a top-level origin (proxy, static, redirect, mock, echo, beacon, noop, ai_proxy, load_balancer, websocket, grpc). 
Authentication, policies, and transforms are not applied to the fallback path; only the action runs. If you need richer behaviour from the fallback, point its action at another origin via `proxy` and let the host router apply that origin's full chain. --- ## Variables, vaults, and secrets ### Variables User-defined key-value pairs available in template context as `{{ variables.name }}`. Any JSON type works, including nested objects. ```yaml origins: "api.example.com": variables: api_version: v2 base_url: https://api.example.com feature_flags: new_ui: true beta_api: false action: type: proxy url: "{{ variables.base_url }}/{{ variables.api_version }}" ``` ### Secret references Secrets are resolved through the top-level `proxy.secrets` block (see [Secrets](#secrets)). Once resolved, secrets are available in templates as `{{ secrets.name }}`. ```yaml proxy: secrets: backend: hashicorp hashicorp: addr: https://vault.example.com:8200 map: database_url: secret/data/prod/db_url stripe_key: secret/data/prod/stripe_key origins: "api.example.com": action: type: proxy url: "{{ secrets.database_url }}" ``` ### Template scopes Templates have access to these scopes: | Scope | Description | Example | |-------|-------------|---------| | `request` | Current HTTP request | `{{ request.headers.x_api_key }}` | | `variables` | User-defined variables | `{{ variables.api_version }}` | | `secrets` | Loaded secrets | `{{ secrets.api_token }}` | | `config` | Config metadata | `{{ config.hostname }}` | | `session` | Session data | `{{ session.auth.email }}` | | `env` | Config identity fields | `{{ env.workspace_id }}` | | `server` | Server-level vars | `{{ server.var_name }}` | --- ## Session config Configure session behavior for an origin. Sessions are stored in encrypted cookies. 
```yaml origins: "app.example.com": session: cookie_name: sb_session max_age: 3600 same_site: Strict http_only: true secure: true allow_non_ssl: false ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `cookie_name` | string | | Session cookie name | | `max_age` | int | | Cookie lifetime in seconds. Alias: `cookie_max_age`. | | `http_only` | bool | false | Set the `HttpOnly` cookie attribute | | `secure` | bool | false | Set the `Secure` cookie attribute (HTTPS only) | | `same_site` | string | | SameSite attribute (`Strict`, `Lax`, `None`). Alias: `cookie_same_site`. | | `allow_non_ssl` | bool | false | Allow sessions over plain HTTP | Sessions disable themselves implicitly when the block is omitted. --- ## Compression Configure response compression on a per-origin basis. ```yaml origins: "api.example.com": compression: enabled: true algorithms: [br, gzip] min_size: 512 ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `enabled` | bool | true | Master switch. Alias: `enable`. | | `algorithms` | list | | Allowed algorithms in priority order (e.g. `["br", "gzip"]`) | | `min_size` | int | 0 | Minimum response size in bytes before compression is applied | | `level` | int | | Go-compat compression level. Not used by the Rust runtime. | --- ## HSTS Inject the `Strict-Transport-Security` header on responses. ```yaml origins: "secure.example.com": hsts: max_age: 31536000 include_subdomains: true preload: true ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `max_age` | int | 31536000 | `max-age` directive in seconds | | `include_subdomains` | bool | false | Emit the `includeSubDomains` directive | | `preload` | bool | false | Emit the `preload` directive | --- ## Connection pool Per-origin connection pool tuning. When unset, falls back to proxy-wide defaults. 
```yaml origins: "api.example.com": connection_pool: max_connections: 128 idle_timeout_secs: 90 max_lifetime_secs: 300 ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `max_connections` | int | 128 | Maximum concurrent connections to the upstream | | `idle_timeout_secs` | int | 90 | Maximum idle time before a connection is closed | | `max_lifetime_secs` | int | 300 | Maximum total lifetime of a connection | --- ## Bot detection Bot detection blocks requests based on `User-Agent` substring matches. The deny list rejects user agents that contain any of the listed substrings (case-insensitive). The allow list exempts user agents from the deny check, so trusted crawlers can pass through even when their substring is otherwise denied. ```yaml origins: "api.example.com": bot_detection: enabled: true mode: block deny_list: - badbot - scrapy - python-requests allow_list: - Googlebot - bingbot ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `enabled` | bool | false | Master switch. When false, every request is admitted. | | `mode` | string | | Mode hint (`block`, `log`). Currently informational; the runtime always blocks denied agents. | | `deny_list` | list | `[]` | User-Agent substrings (case-insensitive) that are blocked with 403. | | `allow_list` | list | `[]` | User-Agent substrings (case-insensitive) that bypass the deny check. Evaluated before the deny list. | --- ## Threat protection Threat protection guards against pathological JSON request bodies. When the request `Content-Type` is `application/json`, the proxy parses the body and checks it against limits on nesting depth, key count, string length, array size, and total body size. A request that exceeds any limit is rejected before it reaches the upstream. 
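The checks described above can be modelled in a few lines. This is a minimal Python sketch, not the proxy's Rust implementation: the function name `check_json_limits`, the returned reason strings, the depth convention (the top-level value counts as depth 1), and the handling of unparseable bodies are assumptions made for illustration. The parameter names mirror the `json.*` limits of this section.

```python
import json
from typing import Optional


def check_json_limits(
    raw: bytes,
    max_depth: Optional[int] = None,
    max_keys: Optional[int] = None,
    max_string_length: Optional[int] = None,
    max_array_size: Optional[int] = None,
    max_total_size: Optional[int] = None,
) -> Optional[str]:
    """Return the name of the first limit exceeded, or None if the body passes."""
    # Total size is checked on the raw bytes, before any parsing work.
    if max_total_size is not None and len(raw) > max_total_size:
        return "max_total_size"
    try:
        doc = json.loads(raw)
    except (ValueError, RecursionError):
        return "unparseable"  # assumption: the real proxy's behaviour may differ

    # Iterative walk over the parsed value, tracking depth per node.
    stack = [(doc, 1)]
    while stack:
        node, depth = stack.pop()
        if isinstance(node, str):
            if max_string_length is not None and len(node) > max_string_length:
                return "max_string_length"
            continue
        if not isinstance(node, (dict, list)):
            continue  # numbers, booleans, null
        if max_depth is not None and depth > max_depth:
            return "max_depth"
        if isinstance(node, dict):
            if max_keys is not None and len(node) > max_keys:
                return "max_keys"
            children = list(node.values())
        else:
            if max_array_size is not None and len(node) > max_array_size:
                return "max_array_size"
            children = node
        stack.extend((child, depth + 1) for child in children)
    return None
```

A `None` limit means "unlimited", matching the documented defaults. A caller would reject the request whenever the function returns a reason string.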
```yaml origins: "api.example.com": threat_protection: enabled: true json: max_depth: 32 max_keys: 1000 max_string_length: 65536 max_array_size: 10000 max_total_size: 1048576 ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `enabled` | bool | false | Master switch for threat checks on this origin. | | `json` | object | | JSON-specific limits applied when the body is `application/json`. Omitting this block disables JSON checks even when `enabled` is true. | | `json.max_depth` | int | unlimited | Maximum nesting depth across objects and arrays. | | `json.max_keys` | int | unlimited | Maximum number of keys in any single object. | | `json.max_string_length` | int | unlimited | Maximum length of any single string value. | | `json.max_array_size` | int | unlimited | Maximum length of any single array. | | `json.max_total_size` | int | unlimited | Maximum total body size in bytes, checked before parsing. | --- ## Error pages Error pages let you replace upstream error responses with operator-defined bodies. Each entry declares the status codes it covers, the `Content-Type` it produces, and the response body. When more than one entry matches the status code, the proxy performs `Accept` header content negotiation across the candidates and picks the highest-quality match. With no concrete preference it prefers `application/json`, then `text/html`, then the first candidate. The block is a list at the origin level. Each entry's `status` field accepts a single integer or a list of integers. When `template` is true, the body is rendered with `{{ status_code }}` and `{{ request.path }}` substituted at request time. ```yaml origins: "api.example.com": error_pages: - status: [502, 503, 504] content_type: text/html; charset=utf-8 template: true body: |

Service unavailable

Status {{ status_code }} on {{ request.path }}.

- status: [502, 503, 504] content_type: application/json template: true body: '{"error":"upstream_unavailable","status":{{ status_code }},"path":"{{ request.path }}"}' - status: 404 content_type: application/json body: '{"error":"not_found"}' ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `status` | int or list | | Status code or list of status codes this entry covers. Required for the entry to match. | | `content_type` | string | `application/json` | `Content-Type` header sent with the response. | | `body` | string | `""` | Response body. May contain template placeholders when `template` is true. | | `template` | bool | false | When true, substitute `{{ status_code }}` and `{{ request.path }}` in the body. Both spaced and unspaced forms are accepted. | --- ## Problem details (RFC 9457) The `problem_details` block opts the origin into RFC 9457 `application/problem+json` responses for proxy-generated errors that are not matched by an `error_pages` entry. The two blocks compose: per-status custom pages still win when authored; `problem_details` catches everything else with a structured body. ```yaml origins: "api.example.com": error_pages: - status: 401 content_type: application/json body: '{"error":"unauthorized","hint":"set X-Api-Key"}' problem_details: enabled: true type_base_uri: "https://api.example.com/errors" include_detail: true ``` A proxy-generated 403 on this origin (no `error_pages` entry) renders as: ```json { "type": "https://api.example.com/errors/403", "title": "Forbidden", "status": 403, "detail": "policy denied", "instance": "/restricted" } ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `enabled` | bool | false | When true, render unmatched proxy-generated errors as `application/problem+json`. | | `type_base_uri` | string | | Base URI for the `type` field; the status code is appended (e.g. `https://api.example.com/errors/503`). 
When unset the renderer emits the RFC 9457 default `about:blank`. |
| `include_detail` | bool | true | When false, the `detail` field is suppressed so operators can avoid leaking internal error text. |

The renderer fires from the same proxy-generated error path that `error_pages` participates in (authentication denials, policy denials, default 404). Status codes returned by the upstream are not rewritten; the renderer only handles errors the proxy itself generates.

See [`examples/problem-details/`](https://github.com/soapbucket/sbproxy/tree/main/examples/problem-details). Spec: RFC 9457.

The renderer covers both error sources, since in each case the proxy, not the upstream, produces the response:

- **Proxy-generated errors** (authentication denials, policy denials, the default 404 for unknown origins) when no matching `error_pages` entry exists.
- **Upstream failures** (connect refused, connect timeout, TLS handshake errors, mid-stream connection loss) routed through Pingora's `fail_to_proxy` path. The `detail` field carries the RFC 9209 error token (`connection_refused`, `connection_timeout`, `tls_protocol_error`, `connection_terminated`, `http_request_error`) so downstream tooling can break down by failure mode without scraping the body.

---

## Idempotency

The `idempotency:` block opts the origin into cached retries in the style of the IETF `Idempotency-Key` header draft. The middleware reads the `Idempotency-Key` request header, hashes the request body, and:

- **First call** under a given key: forwards the request upstream and caches the response under `(workspace, key)` together with the body hash.
- **Replay** with the same key + same body: returns the cached response with `x-sbproxy-idempotency: HIT`. The upstream is not contacted.
- **Conflict** (same key, different body): returns 409 with the `ledger.idempotency_conflict` JSON body.

The middleware runs ahead of policy enforcement so a cached replay does not consume a rate-limit slot.
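The three outcomes above (first call, replay, conflict) can be sketched as a small in-memory model. This is illustrative Python under stated assumptions, not the middleware's code: the class name `IdempotencyCache`, the `(status, body)` response tuple, the `MISS` and `CONFLICT` labels, and the exact shape of the 409 body are hypothetical. Only the `HIT` label, the 409 status, and the `(workspace, key)` cache key come from the text. TTL expiry and size caps are left out.

```python
import hashlib
from typing import Callable, Dict, Tuple

Response = Tuple[int, str]  # (status, body); a stand-in for a full HTTP response


class IdempotencyCache:
    def __init__(self) -> None:
        # (workspace, key) -> (SHA-256 of the request body, cached response)
        self._entries: Dict[Tuple[str, str], Tuple[str, Response]] = {}

    def handle(
        self,
        workspace: str,
        key: str,
        body: bytes,
        forward: Callable[[bytes], Response],
    ) -> Tuple[Response, str]:
        body_hash = hashlib.sha256(body).hexdigest()
        entry = self._entries.get((workspace, key))
        if entry is None:
            # First call: contact the upstream, then remember the outcome.
            response = forward(body)
            self._entries[(workspace, key)] = (body_hash, response)
            return response, "MISS"
        cached_hash, cached_response = entry
        if cached_hash == body_hash:
            # Replay: same key and same body, so the upstream is not contacted.
            return cached_response, "HIT"
        # Conflict: same key, different body. The body shape is a placeholder.
        return (409, '{"error": "ledger.idempotency_conflict"}'), "CONFLICT"
```

Because the lookup happens before `forward` is called, a replay never reaches the upstream, which is the property that lets a cached retry skip rate-limit accounting.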
```yaml
origins:
  "api.example.com":
    idempotency:
      enabled: true
      header_name: Idempotency-Key   # default
      ttl_secs: 86400                # default (24 h)
      methods: [POST, PUT, PATCH]    # default
      backend: memory                # or `redis`
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `enabled` | bool | false | When true, the middleware engages on this origin. |
| `header_name` | string | `Idempotency-Key` | Request header carrying the key. |
| `ttl_secs` | int | 86400 | Cache entry TTL in seconds. |
| `methods` | list | `[POST, PUT, PATCH]` | HTTP methods that engage the middleware. Other methods pass through. |
| `backend` | enum | `memory` | `memory` (per-origin, per-replica) or `redis` (binds to `proxy.l2_store` for cluster-wide replay). |
| `max_request_body_bytes` | int | 1048576 (1 MiB) | Per-request cap on buffered body bytes. Bodies larger than this skip the cache; response carries `x-sbproxy-idempotency: SKIPPED-OVERSIZE-REQUEST`. |
| `max_response_body_bytes` | int | 1048576 (1 MiB) | Per-response cap on cached body bytes. Responses larger than this stream through uncached. |
| `max_concurrent_buffers` | int | 256 | Per-origin cap on concurrent buffered requests. When the pool is exhausted, new requests skip the cache; response carries `x-sbproxy-idempotency: SKIPPED-POOL-FULL`. Worst-case memory per origin is roughly `max_concurrent_buffers * max_request_body_bytes`. |

The `memory` backend is per-origin and per-replica: suitable for single-instance deployments and clusters with sticky routing. The `redis` backend binds at config-compile time to the cluster L2 store configured under `proxy.l2_store`; an origin asking for `redis` without that block surfaces a clear config-load error rather than silently downgrading.

See [`examples/idempotency/`](https://github.com/soapbucket/sbproxy/tree/main/examples/idempotency).

> **AI gateway note.** The AI proxy path (`action: ai_proxy`) does not
> currently engage this middleware.
The AI gateway has its own > request-flow model and response capture is more involved for > streaming completions. Track the follow-up in > `docs/missing.md`. --- ## Rate limit headers The `rate_limit_headers` field at the origin level is reserved for future expansion and is not consumed by the open-source binary. To control `X-RateLimit-*` and `Retry-After` emission today, configure the `headers` block on the rate-limiting policy itself. ```yaml origins: "api.example.com": policies: - type: rate_limiting requests_per_minute: 600 headers: enabled: true include_retry_after: true ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `headers.enabled` | bool | false | When true, emit `X-RateLimit-Limit`, `X-RateLimit-Remaining`, and `X-RateLimit-Reset` on responses. | | `headers.include_retry_after` | bool | false | When true, emit `Retry-After` on 429 responses. | The origin-level `rate_limit_headers` block is accepted for forward compatibility but ignored by the OSS runtime. --- ## Idempotency-Key middleware `crates/sbproxy-middleware/src/idempotency.rs` ships an `Idempotency-Key` middleware that implements the cached-retry vs conflict semantics from Wave 3 / R3.2: 1. Request carries `Idempotency-Key`, cache miss: process the request, capture the response, persist `(workspace_id, key, body_hash, response, expires_at)` after the response is final. Default TTL 24 h. 2. Cache hit, body hash matches: return the cached response. The rate-limit middleware does not consume a slot. 3. Cache hit, body hash differs: return 409 `ledger.idempotency_conflict`. The rate-limit middleware does consume a slot per the A3.4 DoS rule. 4. No `Idempotency-Key` header: pass through. The middleware is library-level today and is consumed by the AI gateway path through `sbproxy-ai::idempotency`. 
There is no top-level `idempotency:` block on the origin schema yet; the AI handler enables the behaviour for AI traffic and the SHA-256 body-hash + cached-response shape (`CachedResponse`) is reused across both surfaces. Cache backends are `InMemoryIdempotencyCache` (tests, single instance) and `KvIdempotencyCache` (any `sbproxy_platform::storage::KVStore` implementation, including the Redis backend). ## Message signatures The `message_signatures` block declares the schema for RFC 9421 HTTP Message Signatures. The configuration type is defined in `sbproxy-middleware`, but the signing and verification path is not wired into the OSS request pipeline yet. The block parses cleanly so configs that target a future release validate today. ```yaml origins: "api.example.com": message_signatures: algorithm: hmac-sha256 key_id: proxy-key-1 covered_components: - "@method" - "@target-uri" - content-digest ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `algorithm` | string | | Signature algorithm identifier. Required. Examples: `hmac-sha256`, `ed25519`. | | `key_id` | string | | Key identifier emitted in the `Signature-Input` header. Required. | | `covered_components` | list | `[]` | HTTP message components covered by the signature, e.g. `@method`, `@target-uri`, `content-digest`. | --- ## Traffic capture The `traffic_capture` block is reserved for request mirroring and capture configuration. There is no consumer for it in the open-source binary. The field is accepted on the origin so configs that target a future release or an external capture hook validate without errors. Set the block only when an out-of-tree component reads it. For shadow traffic that is wired into the OSS request path, use [`mirror`](#request-mirror) instead. --- ## Host header semantics When the proxy forwards a request to an upstream, it controls the upstream `Host` header explicitly: 1. The default is the upstream URL's hostname. 
So `url: https://api.upstream.com:8443` causes the upstream to see `Host: api.upstream.com:8443`. This works correctly with vhost-routed services like Vercel, Cloudflare-fronted origins, S3 website endpoints, and AWS ALBs out of the box. 2. If the action sets `host_override: `, that value wins. 3. If a request modifier sets `Host`, the modifier takes precedence over both above (it runs after the proxy's default). Whenever the proxy rewrites `Host` (i.e. the upstream value differs from what the client sent), it also sets `X-Forwarded-Host: ` so the upstream can still observe the public name. Suppress that breadcrumb with `disable_forwarded_host_header: true`. The same `host_override` field is accepted on every URL-bearing action: `proxy`, each `load_balancer` target, `websocket`, `graphql`, `a2a`, `forward_auth`, and AI provider entries. `grpc` exposes the equivalent control as `authority`, matching the HTTP/2 spec name. --- ## Origin overrides Three knobs control how the proxy reaches the upstream, all independent so they compose: | Field | What it changes | curl analogue | |-------|-----------------|---------------| | `host_override` | Upstream `Host` HTTP header | `--header "Host: ..."` | | `sni_override` | TLS SNI server name (and cert verification target) | `--resolve` (TLS leg) | | `resolve_override` | Connect address (skips DNS for the URL host) | `--connect-to` | Common patterns: **Front a SaaS where the cert hostname differs from the URL host.** ```yaml action: type: proxy url: https://api.tenant.example.com sni_override: cdn.provider.net # cert is for *.provider.net host_override: api.tenant.example.com # upstream still expects the tenant hostname ``` **Pin a region without polluting the system resolver.** ```yaml action: type: proxy url: https://api.example.com resolve_override: 203.0.113.7:443 # eu-west-1 anycast ``` **Stage a cutover by pointing at a candidate IP.** ```yaml action: type: proxy url: https://api.example.com resolve_override: 
"[2001:db8::1]:8443" ``` `resolve_override` accepts `ip`, `ip:port`, `[ipv6]:port`, or `host:port`. When the port is omitted, the URL's port is used. The proxy still sends the URL's hostname in the request line; only the connect address changes. --- ## Trusted proxies and forwarding headers When SBproxy is itself behind another load balancer or CDN (Cloudflare, AWS ALB, Fly.io, internal LB), the immediate TCP peer is that LB, not the real client. To recover the real client identity safely, configure `proxy.trusted_proxies` with the source ranges of those upstream hops: ```yaml proxy: trusted_proxies: - 10.0.0.0/8 - 2001:db8::/32 # IPv6 supported ``` Behaviour: - If the immediate TCP peer falls inside any trusted CIDR, the proxy parses the inbound `X-Forwarded-For` chain and uses the leftmost untrusted hop as the real client IP. This becomes `ctx.client_ip` for the rest of the request: rate limits, IP filters, audit logs. - If the immediate TCP peer is **not** trusted, every inbound forwarding header is stripped on ingress. A direct client cannot spoof its source identity by setting `X-Forwarded-For: 1.2.3.4`. The proxy then sets the standard forwarding headers on every upstream request: | Header | Set to | Opt-out flag | |---|---|---| | `X-Forwarded-Host` | client's original `Host` (when proxy rewrites `Host`) | `disable_forwarded_host_header` | | `X-Forwarded-For` | client IP appended to existing chain | `disable_forwarded_for_header` | | `X-Real-IP` | the immediate client IP | `disable_real_ip_header` | | `X-Forwarded-Proto` | `https` if the listener was TLS, else `http` | `disable_forwarded_proto_header` | | `X-Forwarded-Port` | the listener port | `disable_forwarded_port_header` | | `Forwarded` (RFC 7239) | `for=; proto=; host=; by=` (IPv6 bracketed per RFC) | `disable_forwarded_header` | | `Via` | appended `1.1 sbproxy` | `disable_via_header` | All flags live on the action (or per-target on a load balancer). Default is enabled (no flag set). 
See [example 73](../examples/trusted-proxies/sb.yml) and [example 74](../examples/forwarding-headers/sb.yml). --- ## Request mirror Send a fire-and-forget copy of every matched request to a shadow upstream. The mirror response is read and discarded; the client only ever sees the primary's response. Useful for safe rollouts of new backends, replay-style testing, and capturing production traffic patterns without affecting end-users. ```yaml origins: "api.example.com": action: type: proxy url: https://primary.internal:8080 mirror: url: https://shadow.internal:8080 sample_rate: 0.1 # mirror ~10% of requests; default 1.0 timeout_ms: 5000 # mirror request timeout; default 5000 ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `url` | string | required | Mirror upstream URL. IPv6 hosts must be bracketed (`http://[2001:db8::1]:8080`). | | `sample_rate` | float | `1.0` | Probability in `[0.0, 1.0]` that a given request is mirrored. | | `timeout_ms` | int | `5000` | Per-mirror request timeout. Independent of the primary upstream timeout. | | `mirror_body` | bool | `false` | Tee the inbound request body into the mirror request. Off by default, mirror sees only method, path, query, and headers (sufficient for read endpoints; safe for any case where shadow-replaying writes is unsafe). Set `true` to shadow-replay POST/PUT/PATCH endpoints during migrations. | | `max_body_bytes` | int | `1048576` | Body size cap (bytes). Bodies larger than this fire the mirror without a body so a single large upload can't blow up proxy memory. Defaults to 1 MiB. | Mirror requests carry `X-Sbproxy-Mirror: 1` and the original `X-Sbproxy-Request-Id` so the shadow upstream can distinguish them from real traffic. Method, path/query, and headers are mirrored; body teeing is not yet supported (sufficient for read endpoints; POST bodies are not replayed in this cut). Hop-by-hop headers and `Host` are not forwarded, `reqwest` rebuilds `Host` from the mirror URL. 
See [example 75](../examples/request-mirror/sb.yml). --- ## Upstream retries When an upstream connection fails (TCP refused, DNS failure, TLS handshake error, or connect timeout), the proxy can retry the request automatically. ```yaml origins: "api.example.com": action: type: proxy url: http://backend.internal:8080 retry: max_attempts: 3 retry_on: - connect_error - timeout backoff_ms: 100 ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `max_attempts` | int | `1` | Total request attempts including the original. `1` disables retries. | | `retry_on` | array | `[connect_error, timeout]` | Retry conditions. Currently honoured: `connect_error`, `timeout`. Status-code retries (`502`, `503`, ...) are accepted but not yet wired in this cut because they require buffering the upstream response. | | `backoff_ms` | int | `100` | Base backoff before the next attempt. Doubles on each retry, capped at 5000ms. | `retry` is accepted on both `proxy` and `load_balancer` actions. For `load_balancer`, a failed target is reported to the outlier detector and circuit breaker so the next retry attempt selects a different healthy peer rather than retrying the same dead target. See [example 76](../examples/upstream-retries/sb.yml). --- ## Active health checks Configure background probes per `load_balancer` target. The proxy GETs the probe URL on a fixed interval and tracks consecutive success / failure counts. Targets that fail the threshold are excluded from `select_target` until they recover. Probe results also feed the outlier detector when one is configured, so passive and active signals share state. 
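The consecutive-count bookkeeping described above can be sketched as a small state holder. This is an illustrative Python model, not the proxy's Rust code: the class name `ProbeState` is hypothetical, the two parameters mirror the `unhealthy_threshold` and `healthy_threshold` fields of `health_check`, and starting in the healthy state is an assumption.

```python
class ProbeState:
    """Tracks consecutive probe results for one load-balancer target."""

    def __init__(self, unhealthy_threshold: int = 3, healthy_threshold: int = 2):
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy_threshold = healthy_threshold
        self.healthy = True  # assumption: targets start out healthy
        self._successes = 0
        self._failures = 0

    def record(self, ok: bool) -> bool:
        """Record one probe result and return the target's health afterwards."""
        if ok:
            # A success breaks any failure streak.
            self._failures = 0
            self._successes += 1
            if not self.healthy and self._successes >= self.healthy_threshold:
                self.healthy = True
        else:
            # A failure breaks any success streak.
            self._successes = 0
            self._failures += 1
            if self.healthy and self._failures >= self.unhealthy_threshold:
                self.healthy = False
        return self.healthy
```

Resetting the opposite counter on every probe is what makes the thresholds consecutive: two failures, one success, and one more failure leave the target healthy under the default thresholds.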
```yaml action: type: load_balancer targets: - url: http://backend-1.internal:8080 health_check: path: /healthz interval_secs: 10 # probe period in seconds timeout_ms: 2000 unhealthy_threshold: 3 healthy_threshold: 2 - url: http://[2001:db8::1]:8080 health_check: path: /healthz ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `path` | string | `/healthz` | Path to probe. Must start with `/`. | | `interval_secs` | int | `10` | Probe period in seconds (alias: `period_secs`). | | `timeout_ms` | int | `2000` | Per-probe timeout. | | `unhealthy_threshold` | int | `3` | Consecutive failures required to mark unhealthy. | | `healthy_threshold` | int | `2` | Consecutive successes required to recover. | IPv6 targets are supported: the URL builder preserves bracketing. See [example 77](../examples/active-health-checks/sb.yml). --- ## Circuit breaker A formal Closed → Open → HalfOpen → Closed state machine attached to each `load_balancer` target. On `failure_threshold` consecutive failures (5xx response, connect error, timeout) the breaker trips Open; every subsequent request to that target is excluded from `select_target` and routed to a healthy peer instead. After `open_duration_secs`, the breaker enters HalfOpen and admits probe requests; on `success_threshold` consecutive successes it closes again, otherwise it re-opens. ```yaml action: type: load_balancer circuit_breaker: failure_threshold: 5 # trip after 5 consecutive failures success_threshold: 2 # close after 2 consecutive HalfOpen successes open_duration_secs: 30 # stay Open for 30s before trying probes targets: - url: http://backend-1.internal:8080 - url: http://backend-2.internal:8080 ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `failure_threshold` | int | `5` | Consecutive failures before tripping Open. | | `success_threshold` | int | `2` | Consecutive successes in HalfOpen to return to Closed. 
| | `open_duration_secs` | int | `30` | How long the breaker stays Open before admitting probes. | The breaker is **complementary to** [outlier detection](#outlier-detection): | Signal | Trigger | |---|---| | Circuit breaker | `N` failures in a row, immediate isolation | | Outlier detection | Failure *rate* over a sliding window | Either signal independently ejects a target from `select_target`. Configure both for robust resilience: outlier detection catches "this target is bad in aggregate," the breaker catches "this target is hard down right now." When every target is tripped, the LB falls back to the unfiltered list rather than 502'ing the client. See [example 84](../examples/circuit-breaker/sb.yml). --- ## Outlier detection Track each `load_balancer` target's success/failure rate over a sliding window and eject targets whose error rate crosses the threshold. Failures are recorded from upstream 5xx responses and from connect errors; recovery happens automatically after the cooldown. ```yaml action: type: load_balancer outlier_detection: threshold: 0.5 # 50% error rate window_secs: 60 # sliding window length min_requests: 5 # minimum requests in window before ejection ejection_duration_secs: 30 # cooldown before re-admission targets: - url: http://backend-1.internal:8080 - url: http://backend-2.internal:8080 ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `threshold` | float | `0.5` | Failure rate at which to eject (0.0–1.0). | | `window_secs` | int | `60` | Sliding window length in seconds. | | `min_requests` | int | `5` | Minimum requests in the window before ejection is considered. | | `ejection_duration_secs` | int | `30` | How long to keep an ejected target out of rotation. | When all active targets are ejected, the proxy falls back to the unfiltered list rather than 502'ing the client (better to send to a flaky peer than to fail closed). See [example 78](../examples/outlier-detection/sb.yml). 
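The sliding-window ejection rule and the all-ejected fallback can be sketched together. This is a minimal Python model under stated assumptions, not the proxy's implementation: the names `OutlierDetector` and `eligible_targets` are hypothetical, the comparison against `threshold` is taken to be inclusive, and the window is cleared on ejection so a target returns with a clean history. Timestamps are passed in explicitly to keep the example deterministic.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class OutlierDetector:
    def __init__(
        self,
        threshold: float = 0.5,
        window_secs: float = 60,
        min_requests: int = 5,
        ejection_duration_secs: float = 30,
    ) -> None:
        self.threshold = threshold
        self.window_secs = window_secs
        self.min_requests = min_requests
        self.ejection_duration_secs = ejection_duration_secs
        self._events: Deque[Tuple[float, bool]] = deque()  # (timestamp, failed)
        self._ejected_until = 0.0

    def record(self, failed: bool, now: float) -> None:
        self._events.append((now, failed))
        # Drop events that have slid out of the window.
        cutoff = now - self.window_secs
        while self._events and self._events[0][0] <= cutoff:
            self._events.popleft()
        total = len(self._events)
        if total < self.min_requests:
            return  # not enough data to judge the target
        failures = sum(1 for _, was_failure in self._events if was_failure)
        if failures / total >= self.threshold:
            self._ejected_until = now + self.ejection_duration_secs
            self._events.clear()  # assumption: start fresh after the cooldown

    def is_ejected(self, now: float) -> bool:
        return now < self._ejected_until


def eligible_targets(
    targets: List[str], detectors: Dict[str, OutlierDetector], now: float
) -> List[str]:
    """Filter out ejected targets, falling back to the full list if none remain."""
    healthy = [t for t in targets if not detectors[t].is_ejected(now)]
    return healthy or list(targets)
```

The `healthy or list(targets)` fallback is the "better a flaky peer than a failed request" behaviour: when every target is ejected, selection proceeds over the unfiltered list.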

---

## Service discovery

Without service discovery, the proxy resolves an upstream hostname once when a connection is established, and the connection pool reuses that connection (and that IP) for as long as the connection lives. When the upstream's IP set changes (K8s `Service` endpoints rotate, ECS Cloud Map adds a new task, the backend behind a `Headless` service scales horizontally), the proxy keeps using the stale IP until the connection eventually closes.

`service_discovery` on a `proxy` action makes the proxy re-resolve the hostname every `refresh_secs` and rotate the chosen upstream IP across the current A/AAAA record set.

```yaml
origins:
  "api.example.com":
    action:
      type: proxy
      url: https://backend.namespace.svc.cluster.local:8080
      service_discovery:
        enabled: true
        refresh_secs: 30   # default
        ipv6: true         # default; drop to false to skip AAAA
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `enabled` | bool | `true` | Master switch. The presence of the block usually means "I want it on"; set `false` to keep the config without enabling. |
| `refresh_secs` | int | `30` | How often to re-resolve. Setting this below the upstream record's actual TTL has no effect (the system resolver applies its own caching), but the proxy will at least notice changes within `refresh_secs` of the upstream-side update. |
| `ipv6` | bool | `true` | Whether AAAA records contribute to the rotation set. |

The hostname stays as the SNI / `Host` header so TLS verification continues to match the certificate that was issued for the hostname. IPv6 resolved addresses are wrapped in brackets (`[2001:db8::1]:port`) when handed to Pingora. Round-robin selection within the resolved set spreads load across all current IPs.

When DNS resolution fails (network glitch, hostname temporarily NXDOMAIN), the proxy falls back to letting Pingora's connect-time resolver handle the lookup.

See [example 83](../examples/service-discovery/sb.yml).
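
The rotation and IPv6 bracketing described above can be modelled with the Python standard library. This is a sketch under stated assumptions, not the proxy's resolver: the function names `resolve`, `format_addr`, and `rotation` are hypothetical, and the periodic refresh every `refresh_secs` is left to the caller.

```python
import ipaddress
import itertools
import socket


def resolve(hostname, port, ipv6=True):
    """Return the de-duplicated A/AAAA address set for a hostname."""
    family = socket.AF_UNSPEC if ipv6 else socket.AF_INET
    infos = socket.getaddrinfo(hostname, port, family=family,
                               type=socket.SOCK_STREAM)
    ips = []
    for info in infos:
        ip = info[4][0]  # sockaddr tuple: first element is the address
        if ip not in ips:
            ips.append(ip)
    return ips


def format_addr(ip, port):
    """Wrap IPv6 literals in brackets so the port separator is unambiguous."""
    if ipaddress.ip_address(ip).version == 6:
        return f"[{ip}]:{port}"
    return f"{ip}:{port}"


def rotation(ips, port):
    """Round-robin iterator over the current address set."""
    return itertools.cycle([format_addr(ip, port) for ip in ips])
```

A caller would rebuild the iterator from `resolve(...)` on each refresh and keep sending the original hostname as SNI and `Host`, so only the connect address changes.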

---

## Correlation ID

The proxy mints a per-request correlation identifier early in the request lifecycle. With the default policy:

1. If the inbound request carries `X-Request-Id`, its value becomes the request's correlation ID. Upstream callers (a frontend, an API client, another proxy) get to thread their traces through ours.
2. Otherwise the proxy generates a fresh UUID v4 (32 hex chars).
3. The chosen value is set on the upstream request under the same header name so the upstream sees the same ID the proxy logged.
4. The chosen value is echoed back to the client on the response, so the client can hand it to support to find the matching server logs.

```yaml
proxy:
  correlation_id:
    enabled: true            # default
    header: X-Request-Id     # default; rename for shops that use X-Correlation-Id
    echo_response: true      # default; set false to omit the response header
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `enabled` | bool | `true` | Master switch. |
| `header` | string | `X-Request-Id` | Header name read on ingress, set on the upstream, and echoed on the response. |
| `echo_response` | bool | `true` | Whether to set the header on the downstream response. |

The same value is exposed as `ctx.request_id` to every other component: webhook envelopes (`X-Sbproxy-Request-Id`), access logs, alert webhooks, and the AI gateway's per-call records. Set `enabled: false` to opt out entirely.

Inbound values longer than 256 characters are ignored, as are empty or whitespace-only values; in both cases the proxy generates a fresh ID.

See [example 80](../examples/correlation-id/sb.yml).

---

## mTLS client authentication

When set, the HTTPS listener requires (or optionally accepts) a client TLS certificate signed by the configured CA bundle. Verification happens during the TLS handshake, so clients without a valid cert are rejected before `request_filter` ever runs.
```yaml proxy: http_bind_port: 8080 https_bind_port: 8443 tls_cert_file: /etc/ssl/sbproxy/server.pem tls_key_file: /etc/ssl/sbproxy/server.key mtls: client_ca_file: /etc/ssl/sbproxy/clients-ca.pem require: true # default; set false to allow anonymous TLS clients ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `client_ca_file` | string | required | PEM-encoded CA bundle used to verify client certs. May contain multiple `BEGIN CERTIFICATE` blocks; each becomes a trust anchor. | | `require` | bool | `true` | When `true`, the handshake fails if the client does not present a certificate. When `false`, anonymous clients are admitted and the upstream sees no `X-Client-Cert-*` headers (so it can choose its own policy). | After a successful handshake, the proxy strips any inbound `X-Client-Cert-*` headers (so a non-TLS client cannot forge them) and sets the verified cert metadata for the upstream: | Header | Value | |---|---| | `X-Client-Cert-Verified` | `1` | | `X-Client-Cert-CN` | Subject Common Name, when present | | `X-Client-Cert-SAN` | Comma-separated `DNS:`/`URI:`/`email:`/`IP:` SANs | | `X-Client-Cert-Organization` | Subject's `O` field, when present | | `X-Client-Cert-Serial` | hex serial number | | `X-Client-Cert-Fingerprint` | hex SHA-256 of the cert | CN and SAN are extracted by a wrapping `ClientCertVerifier` that captures them at handshake time and indexes by SHA-256 of the cert DER (which matches Pingora's internal `cert_digest`). Chain validation is unchanged. The cache is bounded so a churning client population does not grow it without bound. See [example 85](../examples/mtls-client-auth/sb.yml). --- ## Webhook envelope and signing Every webhook the proxy fires (`on_request`, `on_response`, alerting channels) carries a standard identifying envelope and optional HMAC-SHA256 signature. 
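
On the receiving side, the optional signature can be checked with standard-library Python. This is a minimal sketch, not a reference implementation. It follows the Signing subsection of this section in treating the signed material as the `X-Sbproxy-Timestamp` value, a literal dot, and the raw request body, and it additionally assumes that the value after `v1=` is lowercase hex and that header names arrive with the casing shown. The name `verify_webhook` and the 300-second window are illustrative choices.

```python
import hashlib
import hmac
import time

MAX_AGE_SECS = 300  # roughly five minutes, for replay defence


def verify_webhook(secret, headers, raw_body, now=None):
    """Return True when the signature matches and the timestamp is fresh.

    secret and raw_body are bytes; headers is a plain dict of str to str.
    """
    timestamp = headers.get("X-Sbproxy-Timestamp", "")
    signature = headers.get("X-Sbproxy-Signature", "")
    if not (timestamp.isascii() and timestamp.isdigit()):
        return False
    if not signature.startswith("v1="):
        return False
    now = time.time() if now is None else now
    if now - int(timestamp) > MAX_AGE_SECS:
        return False  # too old: possible replay
    signed = timestamp.encode() + b"." + raw_body
    # Assumption: the digest is hex-encoded after the "v1=" prefix.
    expected = hmac.new(secret, signed, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), signature[3:].encode())
```

Hash the body bytes exactly as received; re-serialising parsed JSON before hashing would change the bytes and break the comparison.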
### Envelope

```json
{
  "event": "on_request",
  "proxy": {
    "instance_id": "sbproxy-host-7c4d8b9a",
    "version": "0.1.0",
    "config_revision": "a7b3f9c11d80"
  },
  "request": {
    "id": "01j9x4af1k73c5dvkk1xvb6f9w",
    "received_at": "2026-04-25T07:32:00Z"
  },
  "origin": { "name": "api.example.com" },
  "method": "GET",
  "path": "/api/users",
  "host": "api.example.com",
  "client_ip": "203.0.113.7",
  "headers": { "...": "..." }
}
```

`on_response` payloads include the same `proxy.*` and `request.id` fields, plus `status` and `duration_ms`, so receivers can correlate the request/response pair.

### Headers on the webhook request

| Header | Value |
|---|---|
| `User-Agent` | `sbproxy/` |
| `X-Sbproxy-Event` | `on_request`, `on_response`, or `alert` |
| `X-Sbproxy-Instance` | per-process instance identifier |
| `X-Sbproxy-Request-Id` | matches `request.id` in the envelope |
| `X-Sbproxy-Config-Revision` | short hex hash of the loaded config |
| `X-Sbproxy-Timestamp` | unix seconds at send time |
| `X-Sbproxy-Signature` | `v1=` (only when `secret` is configured) |

### Signing

Set a `secret` on the callback to enable HMAC-SHA256:

```yaml
on_request:
  - url: https://hooks.example.com/sbproxy
    method: POST
    secret: shared-webhook-secret
    timeout: 5
```

The signed material is the `X-Sbproxy-Timestamp` value, a literal `.`, and the raw request body, concatenated in that order. Receivers should:

1. Read `X-Sbproxy-Timestamp` and reject anything older than ~5 minutes (replay defence).
2. Compute `HMAC-SHA256(secret, timestamp + "." + raw_body)`.
3. Compare to `X-Sbproxy-Signature` (`v1=`) using a constant-time comparison.

The same `secret` field is accepted on alert webhook channels (`proxy.alerting.channels[]`).

See [example 79](../examples/webhook-signing/sb.yml).

---

## Secrets

The top-level `proxy.secrets` block configures how `secret:` references are resolved at config-load time and how rotation is handled.
```yaml proxy: secrets: backend: hashicorp hashicorp: addr: https://vault.example.com:8200 token: ${VAULT_TOKEN} mount: secret map: openai_key: secret/data/prod/openai_key db_password: secret/data/prod/db_password rotation: grace_period_secs: 300 re_resolve_interval_secs: 60 fallback: cache ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `backend` | string | `env` | Backend used to resolve secrets. Supported: `env`, `local`, `hashicorp`. | | `hashicorp.addr` | string | | Vault server address (required when `backend = hashicorp`) | | `hashicorp.token` | string | from `VAULT_TOKEN` env var | Vault token | | `hashicorp.mount` | string | `secret` | KV secrets engine mount path | | `map` | map | | Logical-name to vault-path mapping | | `rotation.grace_period_secs` | int | 300 | Seconds the previous secret value remains valid after rotation | | `rotation.re_resolve_interval_secs` | int | 60 | How often to re-fetch secrets from the backend | | `fallback` | string | `cache` | Strategy when the backend is unavailable. Supported: `cache`, `reject`, `env`. | The `extensions` map at both the proxy and the origin level holds opaque blocks consumed by enterprise / third-party crates. OSS does not parse them. ### `vault://` reference URI In addition to `${ENV}`, `file:`, and `secret:`, secret-bearing fields accept a unified `vault://` reference URI that names a backend + path + optional sub-field. The parser ships in `sbproxy-vault`; the resolver will dispatch into the configured backend once the per-backend implementations land. #### Grammar ``` vault:///[?version=][&key=] ``` * `` is the registered backend name (operator-chosen identifier under `proxy.vault:`, `tenants[].vault:`, or `origins[].vault:` once those scopes ship). * `` is the backend-specific path inside the vault. The parser carries it verbatim; each backend validates its own shape at resolve time. 
* `version=` pins a secret version where the backend supports versioning (HashiCorp KVv2, AWS Secrets Manager). Ignored by versionless backends. * `key=` extracts a sub-field from a JSON secret payload. When omitted the entire payload is returned. * Additional query parameters carry through to the backend as opaque hints; the parser does not interpret them. #### Examples ```yaml authentication: type: bearer tokens: - vault://hashi/secret/data/openai-prod?key=api_key - vault://aws/prod/openai-keys?version=3&key=api_key - vault://k8s/default/sbproxy-secrets/openai-key - vault://file/etc/sbproxy/secrets/openai - vault://env/OPENAI_API_KEY - vault://sqlite/credentials/openai?version=3&key=current ``` #### Backward compatibility Existing `${ENV}`, `file:/path/to/secret`, and `secret:` shapes keep working unchanged. The resolver tries each parser in turn: a string that does not start with `vault://` falls through to the legacy resolvers exactly as before. #### Multi-tenant resolution The URI itself is tenant-agnostic. The `` segment names a backend block; the block is configured per-scope at `proxy.vault`, `tenants[].vault`, or `origins[].vault`. Resolution order at request time is origin scope, then tenant scope, then proxy scope; the first scope that declares the named backend serves the reference. 
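
The scope walk described above amounts to a first-match search across three lists. The sketch below models it in Python under stated assumptions: the function name `resolve_backend` and the dict-based scope shape are hypothetical, and the only part of the URI it inspects is the backend name in the authority position.

```python
from urllib.parse import urlsplit


def resolve_backend(uri, origin_vaults, tenant_vaults, proxy_vaults):
    """Return (scope, block) for the first scope that declares the backend.

    Each *_vaults argument is a list of dicts with at least a "name" key.
    """
    parts = urlsplit(uri)
    if parts.scheme != "vault" or not parts.netloc:
        raise ValueError(f"not a vault:// reference: {uri!r}")
    backend = parts.netloc  # e.g. "hashi" in vault://hashi/secret/data/...
    scopes = (("origin", origin_vaults),
              ("tenant", tenant_vaults),
              ("proxy", proxy_vaults))
    # Resolution order: origin scope, then tenant scope, then proxy scope.
    for scope_name, blocks in scopes:
        for block in blocks:
            if block["name"] == backend:
                return scope_name, block
    raise LookupError(f"no scope declares backend {backend!r}")
```

A tenant that does not redeclare the backend falls through to the proxy-scope block, which is the inheritance behaviour the prose describes.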
```yaml proxy: vault: - name: hashi type: hashicorp addr: https://vault.shared.example/v1 token: vault://env/VAULT_TOKEN_SHARED tenants: - id: acme-corp vault: - name: hashi # same name, different Vault instance type: hashicorp addr: https://vault.acme.example/v1 token: vault://env/VAULT_TOKEN_ACME - id: beta-corp vault: - name: hashi type: hashicorp addr: https://vault.beta.example/v1 token: vault://env/VAULT_TOKEN_BETA origins: api.acme.example.com: tenant_id: acme-corp action: type: ai_proxy providers: - name: openai api_key: vault://hashi/secret/data/openai-prod?key=api_key ``` The `vault://hashi/secret/data/openai-prod` reference in the origin above resolves against acme-corp's hashi block (Vault at `vault.acme.example`). A tenant that does not redeclare a named backend transparently inherits the proxy default, so single-tenant configs need no changes. The request's `tenant_id` (stamped by the routing layer) is the resolution context, not part of the URI. Tenant and origin vault scopes land alongside the credentials epic; today's vault block is proxy-scope only. --- ## Environment variables Reference environment variables anywhere in the config with `${VAR_NAME}` syntax to keep secrets out of config files. ```yaml origins: "api.example.com": action: type: proxy url: ${BACKEND_URL} authentication: type: api_key api_keys: - ${API_KEY} ``` Environment variables are resolved at config load time. An unset variable leaves the literal `${VAR_NAME}` string in place rather than failing the load. Common pattern: load variables from `.env` with your shell or Docker: ```bash export BACKEND_URL=https://backend.internal:8080 export API_KEY=my-secret-key sbproxy serve -f sb.yml ``` --- ## ACME / auto TLS SBproxy can automatically provision and renew TLS certificates using the ACME protocol (Let's Encrypt or any ACME-compatible CA). 
### Production setup (Let's Encrypt) ```yaml proxy: http_bind_port: 80 https_bind_port: 443 acme: enabled: true email: admin@example.com storage_path: /var/lib/sbproxy/certs origins: "api.example.com": action: type: proxy url: https://backend.internal:8080 force_ssl: true ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `enabled` | bool | false | Master switch for ACME-managed TLS | | `email` | string | | Account contact email registered with the ACME directory | | `directory_url` | string | Let's Encrypt production | ACME directory URL | | `challenge_types` | list | `[tls-alpn-01, http-01]` | Allowed challenge types in priority order | | `storage_backend` | string | `redb` | Backing store for issued certificates (`redb`, `sqlite`) | | `storage_path` | string | `/var/lib/sbproxy/certs` | Filesystem path for the certificate store | | `renew_before_days` | int | 30 | Days before expiry to attempt renewal | ### Local development (Pebble) Pebble is a test ACME server suitable for local development. Point `directory_url` at it: ```yaml proxy: http_bind_port: 8080 https_bind_port: 8443 acme: enabled: true email: test@example.com directory_url: https://pebble:14000/dir storage_path: /tmp/certs ``` --- ## Redis integration Redis has two roles in SBproxy: distributed caching (L2 cache) and real-time messaging (config sync, cache invalidation). Both blocks are nested under `proxy:`. ### L2 cache (distributed rate limiting and caching) ```yaml proxy: l2_cache_settings: driver: redis params: dsn: redis://redis.internal:6379/0 ``` When configured, rate limit counters are shared across all proxy instances. Response cache entries can also be stored in Redis for shared caching. The deserializer also accepts `l2_cache:` as a canonical alias. 
### Messenger (real-time config updates) ```yaml proxy: messenger_settings: driver: redis params: dsn: redis://redis.internal:6379 ``` When configured, config changes pushed via the API propagate to all proxy instances in real time over Redis Streams. The Redis driver expects `params.dsn`. SQS uses `queue_url`, `region`, `api_key`. GCP Pub/Sub uses `project`, `topic`, `subscription`, `access_token`. The `memory` driver takes no params and is single-replica only. ### Full Redis setup ```yaml proxy: http_bind_port: 8080 https_bind_port: 8443 l2_cache_settings: driver: redis params: dsn: redis://redis.internal:6379/0 messenger_settings: driver: redis params: dsn: redis://redis.internal:6379 origins: "api.example.com": action: type: proxy url: https://backend.internal:8080 policies: - type: rate_limiting requests_per_minute: 100 response_cache: enabled: true ttl_secs: 300 ``` --- ## Validation Check the configuration for errors without starting the proxy: ```bash sbproxy validate /etc/sbproxy/sb.yml ## or, equivalently, on a running --config invocation sbproxy --config /etc/sbproxy/sb.yml --check ``` This catches: - YAML syntax errors - Missing required top-level fields - Unknown action / policy / transform types Validate every config change before deploying to production. Metrics are exposed via the embedded admin server: set `proxy.admin.enabled: true`, `proxy.admin.port: 9090`, and tune `proxy.metrics.max_cardinality_per_label` for high-traffic deployments. For production deployments, the planned `sbproxy plan` and `sbproxy apply` subcommands give a Terraform-style diff-and-confirm path on top of `validate`. The audit and design for those subcommands lives in [adr-config-plan-apply.md](adr-config-plan-apply.md); they are not implemented in this release. 
--- ## CORS Configure Cross-Origin Resource Sharing as a top-level origin field: ```yaml origins: "api.example.com": action: type: proxy url: https://backend.internal:8080 cors: enable: true allow_origins: ["https://app.example.com", "https://admin.example.com"] allow_methods: [GET, POST, PUT, DELETE, OPTIONS] allow_headers: [Content-Type, Authorization, X-Requested-With] expose_headers: [X-Request-ID, X-RateLimit-Remaining] max_age: 3600 allow_credentials: true ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `enable` | bool | false | Enable CORS header injection. Alias: `enabled`. | | `allow_origins` | list | | Allowed origins (use `["*"]` for any). Alias: `allowed_origins`. | | `allow_methods` | list | standard methods | Allowed HTTP methods. Alias: `allowed_methods`. | | `allow_headers` | list | standard headers | Allowed request headers. Alias: `allowed_headers`. | | `expose_headers` | list | | Headers exposed to the browser | | `max_age` | int | | Preflight cache duration in seconds | | `allow_credentials` | bool | false | Allow credentials (cookies, auth headers) | --- ## Quick reference: config field locations A common mistake is nesting fields inside `action` when they should be siblings. The correct layout: ```yaml origins: "api.example.com": # These are ALL at the same level (siblings of action): action: { ... } authentication: { ... } policies: [ ... ] transforms: [ ... ] request_modifiers: [ ... ] response_modifiers: [ ... ] forward_rules: [ ... ] response_cache: { ... } variables: { ... } session: { ... } cors: { ... } compression: { ... } hsts: { ... } connection_pool: { ... } mirror: { ... } # shadow traffic; sibling of action on_request: [ ... ] # webhook callbacks on_response: [ ... ] extensions: { ... } ``` None of these belong inside the `action` block. The `action` block only contains action-specific fields (type, url, targets, providers, etc.). 
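
The nesting mistake described above is easy to catch before deploy. The following Python sketch is a hypothetical lint, not part of `sbproxy validate`: it takes an already-parsed config (a plain dict) and reports any sibling-level field found inside an `action` block. The field set is copied from the layout above; the function name `misplaced_fields` is an assumption.

```python
# Fields that belong next to `action`, per the layout above.
SIBLING_FIELDS = {
    "authentication", "policies", "transforms", "request_modifiers",
    "response_modifiers", "forward_rules", "response_cache", "variables",
    "session", "cors", "compression", "hsts", "connection_pool", "mirror",
    "on_request", "on_response", "extensions",
}


def misplaced_fields(config):
    """Return (hostname, field) pairs for sibling fields nested in `action`."""
    findings = []
    for hostname, origin in config.get("origins", {}).items():
        action = origin.get("action") or {}
        for key in sorted(SIBLING_FIELDS & set(action)):
            findings.append((hostname, key))
    return findings
```

An empty result means no known sibling field was nested under `action`; it says nothing about the rest of the schema.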
A handful of fields *do* live inside an action because they govern how the proxy talks to that specific upstream: ```yaml action: type: proxy url: https://upstream.example/api host_override: api.upstream.example # rewrite the upstream Host disable_via_header: true # any of the disable_*_header flags retry: { ... } # connect-error retry policy ``` `load_balancer` actions accept an `outlier_detection` block at the action level and per-target `health_check`, `host_override`, and `disable_*_header` flags inside each target. ## Environment variable templating in header modifiers Request and response header modifiers may reference environment variables using the `{{env.NAME}}` template form. To prevent multi-tenant exfiltration of process secrets, env expansion is gated by an explicit allowlist on `TemplateContext::allowed_env_vars`. This change is tracked under OPENSOURCE.md H4. - The default allowlist is empty. With the default, every `{{env.X}}` template resolves to the empty string and a `tracing::warn!` is logged. This includes well-known secret names like `AWS_SECRET_ACCESS_KEY`, `GITHUB_TOKEN`, and any custom `_TOKEN` / `_KEY` env vars set on the proxy process. - Operators opt in per-installation by adding env var names to `TemplateContext::allowed_env_vars` when populating the per-request template context. Names are matched literally; case matters. - Allowlisted env vars that are unset at the OS level resolve to the literal `{{env.X}}` string so misconfiguration shows up as obviously broken header values rather than silently empty ones. Example header modifier and the matching allowlist a deployment would use: ```yaml request_modifiers: - headers: set: X-Build-Id: "{{env.SBPROXY_BUILD_ID}}" X-Region: "{{env.SBPROXY_REGION}}" ``` ```rust,no_run // Inside the proxy runtime that builds TemplateContext per request. 
let mut tmpl = sbproxy_middleware::modifiers::TemplateContext::new(); tmpl.allowed_env_vars.push("SBPROXY_BUILD_ID".to_string()); tmpl.allowed_env_vars.push("SBPROXY_REGION".to_string()); ``` A header value of `{{env.AWS_SECRET_ACCESS_KEY}}` will not resolve unless `AWS_SECRET_ACCESS_KEY` is added to that allowlist. There is no global "allow all env vars" switch. ================================================================ # docs/content-digest.md ================================================================ ## content_digest policy *Last modified: 2026-05-31* The `content_digest` policy verifies an inbound request body against the digest the client advertises in the `Content-Digest:` header (RFC 9530). On mismatch, malformed header, or unsupported algorithm, the proxy rejects the request before forwarding. The intended audience is integrity-critical inboxes: webhook receivers, agent endpoints, payment callbacks, audit-ingest paths. The policy honours `Content-Digest:` first and falls back to `Repr-Digest:` if `Content-Digest:` is absent. RFC 9530 §2 makes the two interchangeable for inbound traffic that does not decode `Content-Encoding`. SHA-256 and SHA-512 are supported; unknown algorithms fall through to the configured failure mode. Verification runs in `request_body_filter` once the body is fully buffered. The pairing enforcer sets `ctx.validate_request_body = true` so the proxy buffers the body for hashing; bypass it on routes that do not need this check. ## Config ```yaml origins: "webhook.example.com": upstream: https://api.internal policies: - type: content_digest # What to do when the client did not send any digest header. # `require` (default): reject. `skip`: pass through unverified # (useful when the origin mixes integrity-required and # integrity-optional traffic on the same hostname). on_missing: require # HTTP status returned on every failure path (missing when # required, mismatch, malformed, unsupported algorithm). 
        reject_status: 400
```

## Failure modes

| Condition | Behaviour |
|---|---|
| Header present, digest matches | Pass; sets `ctx.content_digest_verified = true` |
| Header present, digest mismatch | Reject with `reject_status` |
| Header present, algorithm not in {sha-256, sha-512} | Reject with `reject_status` |
| Header present, parse error | Reject with `reject_status` |
| Header absent, `on_missing: require` | Reject with `reject_status` |
| Header absent, `on_missing: skip` | Pass through unverified |

## Why the verified flag matters

`ctx.content_digest_verified = true` propagates the verification result to downstream phases. HTTP Message Signatures audit can attest that the body matches the signed digest component without re-hashing, and billing surfaces that quote by body size get an integrity guarantee for free. The flag is consumed inside the proxy; it does not leak to clients.

## Out of scope

RFC 9530 §6.4 trailer-section digests are not supported because Pingora 0.8's `ProxyHttp` trait does not expose a `request_trailer_filter` hook. Clients that send the digest in the trailer section are treated as if the header is absent, so `on_missing: require` rejects them (the safer default).

## See also

* [features.md](./features.md) - tour with policy examples.
* [examples/content-digest/](../examples/content-digest/) - runnable webhook receiver fixture.
* [configuration.md](./configuration.md) - the full schema.

================================================================
# docs/content-for-agents.md
================================================================

## Content for agents

*Last modified: 2026-05-08*

This guide is the operator-facing companion to the content-shaping pillar. If you have SBproxy running and you have already read [configuration.md](configuration.md) and [ai-crawl-control.md](ai-crawl-control.md), this is the next document.
It covers how the proxy negotiates a content shape with an agent, how the body is transformed into that shape, what license posture the proxy advertises in four well-known documents, and how operators stamp the per-route editorial signal that ties everything together. The reader is a publisher or platform engineer who wants to turn on agent-aware content delivery. The audience is not Rust developers; the focus is configuration, wire shapes, and the operational guarantees you get for them. ## What ships The content-shaping surface area: - **Two-pass `Accept` resolution.** A pricing pass and a transformation pass. Agents declare a shape preference via `Accept`; the proxy matches a tier on the pricing pass and runs a body transform on the transformation pass. The two passes can diverge by design under q-value tie-breaks. - **JSON envelope.** A structured response shape for `Accept: application/json`. Wraps the page's Markdown body with title, URL, license URN, citation flag, token estimate, and pass-through schema.org JSON-LD. Versioned via the `Content-Type` profile parameter. - **`Content-Signal` response header.** A per-route editorial signal in a closed value set: `ai-train`, `ai-input`, `search`. Stamped on 200 responses; consumed by RSL projections, TDMRep projections, and the JSON envelope. - **`x-markdown-tokens` response header.** Approximate token count of the Markdown body, computed once per response and stamped on Markdown and JSON envelope responses. Same value the JSON envelope's `token_estimate` field carries. - **Citation block transform.** Prepends a source / license / fetched-at line to Markdown bodies when the matched tier asserts `citation_required`. - **Boilerplate stripping.** Drops navigation, footer, aside, and comment-section nodes before the HTML-to-Markdown transform runs. Cuts token counts on typical news / blog pages by 30 to 60 percent without losing article content. 
- **Four projection documents.** `robots.txt`, `llms.txt` (and `llms-full.txt`), `/licenses.xml`, and `/.well-known/tdmrep.json`. Each is generated from the operator's compiled `ai_crawl_control` policy, regenerated atomically on every config reload, and served from the same hostname as the rest of the origin. - **aipref signal parsing.** The inbound `aipref:` request header is parsed into a typed signal and surfaced to the scripting layer (CEL / Lua / JavaScript / WASM). Default-permissive when the header is absent or malformed. ## Concept map ``` +---------+ 1: GET /article +-----------+ | agent |---------------------------------------->| sbproxy | +---------+ Accept: text/markdown | | | +-----+-----+ | | | | Pass 1: pricing shape | | (declaration order, q-values stripped) | | | v | +-----+-----+ | | response | | | pipeline | | +-----+-----+ | | | | Pass 2: transformation shape | | (q-value-aware; selects body transform) | v +-----------------------------+-----------------------------+ | | | v v v boilerplate markup json_envelope (strip nav, (HTML to (wrap Markdown + footer, aside, Markdown) title + license + comment-section) tokens + JSON-LD) | | | +--------------+--------------+--------------+--------------+ | | v v citation_block response headers (prepends source Content-Signal: ai-train / license line x-markdown-tokens: 1420 when required) Content-Type: application/json; profile="https://sbproxy.dev/ schema/json-envelope/v1" Projection routes (served from the same hostname): /robots.txt -> robots projection /llms.txt -> llms.txt projection /llms-full.txt -> llms-full.txt projection /licenses.xml -> RSL 1.0 projection /.well-known/tdmrep.json -> W3C TDMRep projection ``` Caption: the same request produces three things. A 402 challenge that prices the request against the pricing-pass shape. A response body transformed into the transformation-pass shape. 
A set of four well-known documents that advertise the same license and pricing posture in machine-readable form, served at canonical URLs so cooperative crawlers can discover them without a 402 round-trip. ## Configuring content negotiation The two-pass shape resolution is automatic for any origin that has an `ai_crawl_control` policy. The compiler synthesises an `auto_content_negotiate` action at the head of the response pipeline so neither the operator's `action:` nor `transforms:` block has to mention shape resolution explicitly. ### Auto-prepended action When an origin declares `ai_crawl_control` with no explicit `content_negotiate` action, the compiler prepends one: ```yaml origins: "blog.example.com": action: type: proxy url: https://test.sbproxy.dev policies: - type: ai_crawl_control price: 0.001 currency: USD content_signal: ai-train tiers: - route_pattern: /articles/* content_shape: markdown price: amount_micros: 1000 currency: USD citation_required: true - route_pattern: /articles/* content_shape: html price: amount_micros: 500 currency: USD ``` There is no `content_negotiate` action in the YAML. The compiler synthesises one with `default_content_shape: html`. An incoming `Accept: text/markdown` request is resolved as Markdown on both passes; an incoming `Accept: */*` falls back to HTML; an incoming `Accept: text/html;q=1.0, text/markdown;q=0.9` is priced as HTML (declaration order) and transformed as HTML (q-value winner). ### Override with an explicit action When the operator wants control over the wildcard default, declare a `content_negotiate` action explicitly. The compiler skips the synthesis step in that case. ```yaml origins: "docs.example.com": action: type: content_negotiate default_content_shape: markdown policies: - type: ai_crawl_control price: 0.001 currency: USD ``` With `default_content_shape: markdown`, an `Accept: */*` request resolves to Markdown for both pricing and transformation. 
An agent that sends no `Accept` header at all gets the Markdown projection.

The valid values for `default_content_shape` are `html`, `markdown`, `json`, `pdf`. When the field is omitted, the default is `html`.

### Q-value tie-break

Pass 2 is q-value-aware. When two recognised media types tie at the same q-value, the proxy resolves them in canonical preference order: `markdown` beats `json` beats `html` beats `pdf`. This is fixed by the proxy and not configurable, because the canonical order is a transformation-capability constraint, not a pricing decision.

The pricing pass remains declaration-order first-match. Operators express pricing intent through the order of tiers in the `ai_crawl_control` policy; agents express transformation preference through q-values. The two surfaces are deliberately independent.

### Worked examples

```bash
## Markdown shape, Markdown tier, Markdown response.
curl -i -H 'Host: blog.example.com' \
     -H 'User-Agent: GPTBot/1.0' \
     -H 'Accept: text/markdown' \
     -H 'crawler-payment: tok_a89be2f1' \
     http://localhost:8080/articles/foo
```

Expected: `200 OK`, `Content-Type: text/markdown`, body in Markdown, `Content-Signal: ai-train`, `x-markdown-tokens: `.

```bash
## Markdown pricing, Markdown rendering (q-value tie-break).
curl -i -H 'Host: blog.example.com' \
     -H 'User-Agent: GPTBot/1.0' \
     -H 'Accept: text/markdown;q=0.9, text/html;q=0.9' \
     -H 'crawler-payment: tok_a89be2f1' \
     http://localhost:8080/articles/foo
```

Expected: priced against the Markdown tier (declaration order picks `text/markdown` first), and the response body is also Markdown because the q-value tie-break in Pass 2 prefers Markdown over HTML.

```bash
## JSON envelope shape.
curl -i -H 'Host: blog.example.com' \
     -H 'User-Agent: GPTBot/1.0' \
     -H 'Accept: application/json' \
     -H 'crawler-payment: tok_a89be2f1' \
     http://localhost:8080/articles/foo
```

Expected: `200 OK`, `Content-Type: application/json; profile="https://sbproxy.dev/schema/json-envelope/v1"`, body is the JSON envelope (see "JSON envelope shape" below).
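
The two passes can be modelled in a short Python function, which makes it easy to predict how a given `Accept` header will be priced and rendered. This is a sketch of the documented rules, not the proxy's parser: the media-type table (in particular `application/pdf`), the treatment of `*/*` as the default shape, and the decision to ignore `q=0` entries in Pass 2 are assumptions, and `resolve_shapes` is a hypothetical name.

```python
CANONICAL_ORDER = ["markdown", "json", "html", "pdf"]  # Pass 2 tie-break order
MEDIA_TO_SHAPE = {
    "text/markdown": "markdown",
    "application/json": "json",
    "text/html": "html",
    "application/pdf": "pdf",  # assumption: media type for the pdf shape
}


def parse_accept(header):
    """Split an Accept header into (media_type, q) pairs in declaration order."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        entries.append((media, q))
    return entries


def resolve_shapes(accept, default_shape="html"):
    """Return (pricing_shape, transformation_shape) for an Accept header."""
    shapes = []
    for media, q in parse_accept(accept or ""):
        shape = default_shape if media == "*/*" else MEDIA_TO_SHAPE.get(media)
        if shape is not None:
            shapes.append((shape, q))
    if not shapes:
        return default_shape, default_shape
    # Pass 1: declaration order, q-values ignored.
    pricing = shapes[0][0]
    # Pass 2: highest q wins; ties fall back to the canonical order.
    ranked = [s for s in shapes if s[1] > 0]
    if not ranked:
        return pricing, default_shape
    best = min(ranked, key=lambda s: (-s[1], CANONICAL_ORDER.index(s[0])))
    return pricing, best[0]
```

Under this model, `text/markdown;q=0.9, text/html;q=0.9` resolves to Markdown on both passes, while reversing the order to `text/html;q=0.9, text/markdown;q=0.9` is the case where the passes diverge: HTML pricing, Markdown rendering.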
## The four projections The proxy serves four well-known documents on every hostname that has an `ai_crawl_control` policy. They are not static files; they are projections of the operator's compiled config. Each one regenerates atomically on every config reload, served from an in-memory cache that the data plane reads with a single atomic load. There is no separate sync process and no separate config store. ### `robots.txt` Served at `/robots.txt`. Format follows IETF draft-koster-rep-ai (the AI-extended robots.txt). ```text ## Generated by SBproxy. Do not edit. ## Config version: 0xa3f9d2c1 User-agent: GPTBot Disallow: /premium/* Crawl-delay: 1 ## SBproxy-AI-Extension: pay-per-crawl price=0.005 currency=USD shape=html User-agent: * Disallow: ``` One `User-agent:` stanza per agent class with at least one priced tier. The `# SBproxy-AI-Extension:` comment lines carry pricing metadata for cooperative crawlers; the prefix is intentionally non-standard pending IETF standardisation. Agent classes resolved from `tiers[].agent_id` selectors; `*` is the wildcard. ### `llms.txt` and `llms-full.txt` Served at `/llms.txt` (concise) and `/llms-full.txt` (full). Format follows the Anthropic / Mistral convention: a metadata block followed by a Markdown site description. ```text ## sitename: blog.example.com ## version: 0xa3f9d2c1 ## payment: pay-per-request ## shapes: html, markdown, json ## Pay-per-crawl content This site is monetized via SBproxy. Cooperative crawlers can read the license terms at /licenses.xml and the rights reservation at /.well-known/tdmrep.json. ``` `llms-full.txt` adds a Markdown listing of every priced route. Both bodies regenerate at config reload time. ### `/licenses.xml` Served at `/licenses.xml`. RSL 1.0 format. The root element is ``; one `` element wraps the `` body. ```xml ai-train ``` The `` value is the canonical "every URL on this origin" glob (`https:///*`); the wire format follows the prose spec at https://rslstandard.org/rsl. 
The URN format is `urn:rsl:1.0::`. The same URN appears in the `license` field of the JSON envelope, so an agent that consumes both the envelope and the `/licenses.xml` document sees a consistent identifier. The `Content-Signal` to `` mapping is documented in detail in [rsl.md](rsl.md).

### `/.well-known/tdmrep.json`

Served at `/.well-known/tdmrep.json`. W3C TDMRep CG-FINAL format: a bare JSON array at the document root, no envelope object. One entry per priced route. Each entry is an object with three keys: `location` (URL the policy applies to), `tdm-reservation` (`1` reserves rights, `0` waives them), and `tdm-policy` (URL of the policy document the agent can fetch to negotiate access).

```json
[
  {
    "location": "/articles/*",
    "tdm-reservation": 1,
    "tdm-policy": "https://blog.example.com/licenses.xml"
  }
]
```

When the origin asserts a recognised `Content-Signal` (`ai-train`, `ai-input`, or `search`), each priced route in the policy emits an entry with `tdm-reservation: 1` and a `tdm-policy` pointing at the companion `/licenses.xml` document on the same origin. When the signal is absent, the array is empty (the response middleware instead stamps a `TDM-Reservation: 1` header on every response, so the right is reserved at the header layer rather than asserted in the body).

The wire format follows the prose spec at https://www.w3.org/community/reports/tdmrep/CG-FINAL-tdmrep-20240510/. The W3C TDMRep CG-FINAL is prose-only; there is no canonical JSON Schema published upstream.

### Refresh-on-config-reload semantics

The four projections live in a single `Arc` cache, swapped atomically on every config reload via `ArcSwap::store`. Readers pay one atomic load per request; writers pay one store per reload. There is no locking on the data path.

The reload path computes a config version hash, passes it to the projection engine, and stamps it on every regenerated document.
The hot path checks the version against the live pipeline before serving so a stale cache hit is impossible in steady state.

Every projection regeneration emits one `AdminAuditEvent` per (hostname, projection kind) pair with `action: PolicyProjectionRefresh`, `target_kind: "PolicyProjection"`, and an `after.doc_hash` SHA-256 of the body. An operator with 10 origins sees 40 audit events per reload. The hash lets external auditors verify that the served document matches what was recorded at reload time.

### Operator preview via CLI

Operators preview a projection before pushing config with the `sbproxy projections render` CLI subcommand. The CLI compiles the YAML the same way the proxy boot path does, runs the projection engine on the compiled output, and writes the document to stdout.

```bash
sbproxy projections render --kind robots --config ./sb.yml
sbproxy projections render --kind llms --config ./sb.yml
sbproxy projections render --kind licenses --config ./sb.yml
sbproxy projections render --kind tdmrep --config ./sb.yml
```

The output is byte-for-byte identical to the document the proxy would serve for the same config. Use it in CI to gate config changes on the projection content.

## Per-tier `content_signal` config

`Content-Signal` is a per-route editorial declaration. Operators set it at the origin level (one value for the whole hostname) or at the tier level (overriding the origin value for matching routes).

```yaml
origins:
  "blog.example.com":
    action:
      type: proxy
      url: https://test.sbproxy.dev
    policies:
      - type: ai_crawl_control
        content_signal: ai-train # origin-level default
        tiers:
          - route_pattern: /premium/*
            content_signal: ai-input # override: premium content licensed for inference, not training
            price:
              amount_micros: 5000
              currency: USD
          - route_pattern: /articles/*
            price:
              amount_micros: 1000
              currency: USD
```

The valid values are `ai-train`, `ai-input`, `search`.
The set is closed; an unknown value rejects the config at load time with an error referencing this guide. The matched tier's value (or the origin default when no tier matches) is stamped on `Content-Signal:` on every 200 response. A missing value means the response carries no header; existing crawlers see no change.

The `Content-Signal` header is a cooperative signal for standards-compliant crawlers and a mandatory field in the `` element of `/licenses.xml`. It is not security-critical; a motivated crawler can ignore it. The fact that it is asserted on the wire is what makes it actionable downstream: the JSON envelope's `license` URN and the `/licenses.xml` body together carry the operator's binding declaration of license terms.

## JSON envelope shape

When the agent sends `Accept: application/json` and the matched tier resolves to `Json` shape, the proxy wraps the page's Markdown body in a structured envelope.

```json
{
  "schema_version": "1",
  "title": "Article Title",
  "url": "https://blog.example.com/articles/foo",
  "license": "urn:rsl:1.0:blog.example.com:0xa3f9d2c1",
  "content_md": "# Article Title\n\nBody in Markdown...",
  "fetched_at": "2026-05-01T12:00:00Z",
  "citation_required": true,
  "schema_org": { "@context": "https://schema.org", "@type": "Article" },
  "token_estimate": 1420
}
```

| Field | Type | Notes |
|---|---|---|
| `schema_version` | string | Currently `"1"`. String, not integer, for forward-compat. |
| `title` | string | Page title. Empty string when no title is determinable. |
| `url` | string | Canonical URL. Falls back to the request URL when the upstream sends no `Content-Location`. |
| `license` | string | RSL URN from `/licenses.xml` for this origin, or `"all-rights-reserved"` when no RSL policy is configured. Never empty. |
| `content_md` | string | Markdown body. Same content as the `text/markdown` response for the same request. |
| `fetched_at` | string | RFC 3339 timestamp at which the proxy fetched the upstream response. UTC, millisecond precision. |
| `citation_required` | bool | `true` when the matched tier sets `citation_required: true`. |
| `schema_org` | object | Pass-through of the page's first JSON-LD block. `null` or absent when the page has none. |
| `token_estimate` | integer | Approximate token count of `content_md`. Identical to the `x-markdown-tokens` response header value. |

The response is served with:

```
Content-Type: application/json; profile="https://sbproxy.dev/schema/json-envelope/v1"
```

The `profile` parameter follows RFC 6906. The URL is a stable documentation anchor; agents can branch on it to handle multiple schema versions during a dual-emit window. The profile URL is independent of the `schema_version` field. The two will track together in practice, but they are separate because `schema_version` is in the body (for parsers that read the body before headers) and `profile` is in the header (for parsers that decide before parsing).

### Versioning and dual-emit

`schema_version` is a string for forward-compat with potential `"1.1"` soft additions. Adding an optional field is non-breaking and does not bump the version. Removing a field, renaming a field, or changing a field's type is breaking and bumps to `"2"`.

A v2 ships with a dual-emit window: the proxy emits either a v1 or a v2 envelope, depending on the agent's `Accept` profile parameter. An agent that sends `Accept: application/json; profile="https://sbproxy.dev/schema/json-envelope/v1"` receives v1; an agent that sends the v2 profile URL receives v2. After the deprecation window, the v1 profile gets `406 Not Acceptable` with an upgrade prompt.

### PII redaction

The redaction middleware (in `sbproxy-security::pii`) runs over the entire serialised envelope body. The `content_md` field is the primary redaction target; `title`, `url`, `license`, and the metadata fields are proxy-generated and not subject to content redaction.
`schema_org` is upstream pass-through and is redacted along with `content_md` because the operator's PII policy may not be aware of every field the upstream embeds. This is fail-safe over precision. A future revision can add a per-origin `pii_exclude_fields` config to exempt specific JSON paths from redaction.

## Transforms

Four response-body transforms are added to the response pipeline in this order:

1. **`boilerplate`**: drops `