Overview
Circuit Breaker in Bifrost Enterprise automatically reroutes LLM requests to a fallback provider when a primary provider endpoint shows signs of degradation. Instead of letting throttled or degraded requests fail, Bifrost detects the signal in the response headers, opens the circuit, and transparently redirects subsequent requests to your configured fallback until the cooldown window expires and the primary is retried.
Key Features
| Feature | Description |
|---|---|
| Header-based signals | Trips on HTTP response headers returned by the provider — no latency heuristics or error-rate windows to tune |
| Per-model failover | Each policy targets a specific provider + model combination; other traffic is unaffected |
| Per-key sub-circuits | Optionally track state per API key so a single degraded key doesn’t block healthy ones |
| AND / OR operators | Open the circuit when any signal matches (OR) or only when all match simultaneously (AND) |
| Dynamic cooldown | Read cooldown duration directly from a response header (e.g. retry-after-ms) or fall back to a configured static duration |
How It Works
Every request that matches a circuit breaker policy passes through two hooks:
- Pre-request hook: checks whether the circuit is open. If it is, the request is immediately rerouted to the fallback provider and model, and the original target is not contacted.
- Post-response hook: evaluates the response headers from the primary target against the policy’s condition. If the condition matches, the circuit opens for the configured cooldown duration.
Configuration
Web UI
Navigate to Circuit Breaker in the Bifrost dashboard to create and manage policies.
config.json
Add a circuit_breaker_config block at the root of your config.json.
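As a rough illustration of what a single policy carries, the object below uses only the properties marked as required in the Policy Properties table. This section does not show how policies are nested inside circuit_breaker_config, so only the policy object itself is sketched. The name value, the gpt-4o model names, and the choice of retry-after-ms as the tripping header are illustrative assumptions, not values taken from a real deployment.

```json
{
  "name": "openai-gpt-4o-failover",
  "primary_provider": "openai",
  "primary_model": "gpt-4o",
  "fallback_provider": "azure",
  "fallback_model": "gpt-4o",
  "condition": {
    "signals": [
      { "source": "response_header", "header_name": "retry-after-ms" }
    ]
  }
}
```

Every optional property is omitted here, so the documented defaults would apply: enabled is true, operator is OR, and default_cooldown is 30s.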
Policy Properties
| Property | Type | Required | Default | Description |
|---|---|---|---|---|
| name | string | Yes | — | Unique name for this policy |
| enabled | boolean | No | true | When false, the policy is registered but all hooks skip it |
| primary_provider | string | Yes | — | Provider to monitor (e.g. azure, openai) |
| primary_model | string | Yes | — | Model name as it appears in requests (e.g. gpt-4o-ptu) |
| primary_key_ids | string[] | No | [] | API key UUIDs to track individually. See Key-Level Sub-Circuits |
| fallback_provider | string | Yes | — | Provider to route to when the circuit is open |
| fallback_model | string | Yes | — | Model to request from the fallback provider |
| condition | object | Yes | — | Signal condition that opens the circuit. See Signals |
| default_cooldown | string | No | 30s | How long to keep the circuit open. Accepts a Go duration string: 30s, 5m, 1h. See Cooldown |
| cooldown_header | string | No | — | Response header name to read the cooldown duration from (in milliseconds). Falls back to default_cooldown when absent or unparsable. See Cooldown |
Condition Properties
| Property | Type | Required | Default | Description |
|---|---|---|---|---|
| operator | OR / AND | No | OR | How multiple signals are combined. OR opens the circuit when any signal matches; AND requires all signals to match simultaneously |
| signals | Signal[] | Yes | — | List of response signals to evaluate. At least one required |
Signal Properties
| Property | Type | Required | Description |
|---|---|---|---|
| source | response_header | Yes | What part of the HTTP response to inspect. Currently only response_header is supported |
| header_name | string | Yes | HTTP response header name to inspect (case-insensitive) |
| header_value | string | No | Trips when the header value exactly equals this string (case-insensitive). Mutually exclusive with header_contains |
| header_contains | string | No | Trips when the header value contains this substring (case-insensitive). Mutually exclusive with header_value. If neither is set, the signal trips whenever the header is present |
Signals
Signals define what Bifrost watches for in the provider’s HTTP response. Each signal inspects a single response header using one of three match modes:
| Match Mode | Config | Trips when… |
|---|---|---|
| Exists | Only header_name set | The header is present in the response, regardless of value |
| Equals | header_name + header_value | The header value exactly matches (case-insensitive) |
| Contains | header_name + header_contains | The header value contains the substring (case-insensitive) |
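To make the three match modes concrete, here is a sketch of a condition object with one signal per mode, built only from the property names in the Condition and Signal tables. The x-example-status header and its degraded value are hypothetical, invented purely to show the Contains mode; the other two header names are borrowed from elsewhere on this page.

```json
{
  "operator": "OR",
  "signals": [
    { "source": "response_header", "header_name": "retry-after-ms" },
    {
      "source": "response_header",
      "header_name": "X-Ms-Is-Spilled-Over",
      "header_value": "true"
    },
    {
      "source": "response_header",
      "header_name": "x-example-status",
      "header_contains": "degraded"
    }
  ]
}
```

With operator set to OR, any one of these signals would open the circuit. Changing it to AND would require all three to match in the same response.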
Key-Level Sub-Circuits
By default, a policy uses a single shared circuit for all API keys serving the configured primary provider and model. If one key is degraded, the circuit opens and all requests to that provider+model route to the fallback, even requests that could have been served by a healthy key. Set primary_key_ids to a list of key UUIDs to enable per-key tracking.
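A per-key setup might look like the fragment below. Only the properties relevant to key tracking are shown; a real policy would also need the other required properties (name, fallback_provider, fallback_model, condition). The two UUIDs are placeholders, not real key IDs.

```json
{
  "primary_provider": "azure",
  "primary_model": "gpt-4o-ptu",
  "primary_key_ids": [
    "00000000-0000-0000-0000-000000000001",
    "00000000-0000-0000-0000-000000000002"
  ]
}
```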
Cooldown
When the circuit opens, Bifrost blocks the primary provider for a cooldown duration before probing it again.
Static cooldown
Set default_cooldown to a Go duration string. The circuit stays open for exactly this duration. Valid duration units are ns, us, ms, s, m, and h.
Header-driven cooldown
Some providers return a header telling clients how long to back off. Set cooldown_header to read that value (expected in milliseconds).
If retry-after-ms is present and parsable, Bifrost uses its value as the cooldown. If the header is absent or cannot be parsed, default_cooldown is used as the fallback.
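The two cooldown settings can be combined on one policy. The fragment below is a sketch showing only the cooldown-related properties from the Policy Properties table; the remaining required policy properties are omitted for brevity.

```json
{
  "default_cooldown": "30s",
  "cooldown_header": "retry-after-ms"
}
```

Under this configuration, and going by the millisecond convention described above, a tripping response that carries retry-after-ms: 1500 would keep the circuit open for 1.5 seconds, while a tripping response without that header would fall back to the static 30 seconds.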
Example: Azure PTU → PAYG Spillover
Azure OpenAI Provisioned Throughput Units (PTU) offer predictable latency at fixed capacity. When PTU capacity is exhausted, Azure signals spillover via a response header. This policy detects that signal and routes subsequent requests to a Pay-As-You-Go deployment until the PTU recovers.
- Requests arrive targeting gpt-4o-ptu on Azure.
- When PTU capacity is exhausted, Azure returns X-Ms-Is-Spilled-Over: true in the response.
- Bifrost detects the header and opens the circuit for 30 seconds.
- All subsequent requests within the cooldown window are transparently rerouted to gpt-4o-paygo, with no changes required in your application.
- After 30 seconds, Bifrost probes the PTU deployment again. If spillover is no longer signalled, the circuit closes and PTU traffic resumes.
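A policy matching this walkthrough could be sketched as follows, using the property names from the tables above. The name value is a hypothetical label, and setting fallback_provider to azure assumes the Pay-As-You-Go deployment is also served through Azure; adjust both to your setup. How the policy is nested inside circuit_breaker_config is not covered in this section, so only the policy object is shown.

```json
{
  "name": "azure-ptu-spillover",
  "enabled": true,
  "primary_provider": "azure",
  "primary_model": "gpt-4o-ptu",
  "fallback_provider": "azure",
  "fallback_model": "gpt-4o-paygo",
  "condition": {
    "operator": "OR",
    "signals": [
      {
        "source": "response_header",
        "header_name": "X-Ms-Is-Spilled-Over",
        "header_value": "true"
      }
    ]
  },
  "default_cooldown": "30s"
}
```

Because header names and values are matched case-insensitively, the signal would trip regardless of how the provider capitalizes the header or the value true.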

