Governance - Bifrost

Governance lets you control who can call which providers, how much they can spend, how fast they can go, and how traffic is routed. Everything is declared under bifrost.governance in your values file and seeded into the database at startup.

The governance plugin must also be enabled for enforcement to take effect:

bifrost:
  plugins:
    governance:
      enabled: true

See the Plugins page for plugin configuration details.

Admin Authentication

Protect the Bifrost dashboard and management API with username/password auth.

kubectl create secret generic bifrost-admin-credentials \
  --from-literal=username='admin' \
  --from-literal=password='your-secure-admin-password'

bifrost:
  governance:
    authConfig:
      isEnabled: true
      disableAuthOnInference: false   # keep auth on inference routes
      existingSecret: "bifrost-admin-credentials"
      usernameKey: "username"
      passwordKey: "password"

helm upgrade bifrost bifrost/bifrost --reuse-values -f governance-auth-values.yaml

Budgets

Spending caps that reset on a configurable period. Budgets are referenced by ID from virtual keys, teams, customers, or providers.

Reset duration	Syntax
30 seconds	`"30s"`
5 minutes	`"5m"`
1 hour	`"1h"`
1 day	`"1d"`
1 week	`"1w"`
1 month	`"1M"`
1 year	`"1Y"`

bifrost:
  governance:
    budgets:
      - id: "budget-dev"
        max_limit: 50          # $50 per month
        reset_duration: "1M"

      - id: "budget-production"
        max_limit: 500         # $500 per month
        reset_duration: "1M"

      - id: "budget-testing"
        max_limit: 10          # $10 per day
        reset_duration: "1d"

      - id: "budget-enterprise"
        max_limit: 5000        # $5000 per month
        reset_duration: "1M"

Rate Limits

Token and request-count caps per time window. Referenced by ID from virtual keys, teams, customers, or providers.

bifrost:
  governance:
    rateLimits:
      - id: "rate-limit-standard"
        token_max_limit: 100000       # 100K tokens per hour
        token_reset_duration: "1h"
        request_max_limit: 1000       # 1000 requests per hour
        request_reset_duration: "1h"

      - id: "rate-limit-high"
        token_max_limit: 500000       # 500K tokens per hour
        token_reset_duration: "1h"
        request_max_limit: 5000
        request_reset_duration: "1h"

      - id: "rate-limit-burst"
        token_max_limit: 50000        # 50K tokens per minute (burst)
        token_reset_duration: "1m"
        request_max_limit: 500
        request_reset_duration: "1m"

      - id: "rate-limit-testing"
        token_max_limit: 10000
        token_reset_duration: "1h"
        request_max_limit: 100
        request_reset_duration: "1h"

Customers & Teams

Optional organizational hierarchy. Virtual keys can be assigned to customers or teams, inheriting their budgets and rate limits.

bifrost:
  governance:
    customers:
      - id: "customer-acme"
        name: "Acme Corp"
        budget_id: "budget-production"
        rate_limit_id: "rate-limit-high"

      - id: "customer-startup"
        name: "Startup Inc"
        budget_id: "budget-dev"
        rate_limit_id: "rate-limit-standard"

    teams:
      - id: "team-platform"
        name: "Platform Team"
        customer_id: "customer-acme"
        budget_id: "budget-enterprise"
        rate_limit_id: "rate-limit-high"

      - id: "team-ml"
        name: "ML Team"
        customer_id: "customer-acme"
        budget_id: "budget-production"
        rate_limit_id: "rate-limit-standard"

Virtual Keys

Virtual keys are the primary access tokens issued to callers. They scope which providers, models, and underlying API keys are accessible.

bifrost:
  governance:
    virtualKeys:
      # 1. Unrestricted dev key - access to every provider
      - id: "vk-dev-all"
        name: "Dev: all providers"
        value: "vk-dev-all-secret-token"
        is_active: true
        budget_id: "budget-dev"
        rate_limit_id: "rate-limit-standard"
        # No provider_configs → all providers allowed

      # 2. OpenAI only - restricted to two models
      - id: "vk-openai-prod"
        name: "OpenAI Production"
        value: "vk-openai-prod-secret-token"
        is_active: true
        budget_id: "budget-production"
        rate_limit_id: "rate-limit-high"
        provider_configs:
          - provider: "openai"
            weight: 1
            allowed_models: ["gpt-4o", "gpt-4o-mini"]

      # 3. Multi-provider with weighted routing
      - id: "vk-multi"
        name: "Multi-provider weighted"
        value: "vk-multi-secret-token"
        is_active: true
        budget_id: "budget-production"
        rate_limit_id: "rate-limit-high"
        provider_configs:
          - provider: "openai"
            weight: 2         # 50%
            allowed_models: ["*"]
          - provider: "anthropic"
            weight: 1         # 25%
            allowed_models: ["*"]
          - provider: "groq"
            weight: 1         # 25%
            allowed_models: ["*"]

      # 4. Team-scoped key
      - id: "vk-platform-team"
        name: "Platform Team Key"
        value: "vk-platform-team-token"
        is_active: true
        team_id: "team-platform"       # inherits team budget/rate-limit
        provider_configs:
          - provider: "openai"
            weight: 1
            allowed_models: ["*"]
            key_ids: ["openai-primary"]  # pin to specific configured key by name

      # 5. Restricted testing key
      - id: "vk-testing"
        name: "Testing (gpt-4o-mini only)"
        value: "vk-testing-token"
        is_active: true
        budget_id: "budget-testing"
        rate_limit_id: "rate-limit-testing"
        provider_configs:
          - provider: "openai"
            weight: 1
            allowed_models: ["gpt-4o-mini"]

      # 6. Batch API key
      - id: "vk-batch"
        name: "Batch API workloads"
        value: "vk-batch-token"
        is_active: true
        budget_id: "budget-production"
        rate_limit_id: "rate-limit-burst"
        provider_configs:
          - provider: "openai"
            weight: 1
            allowed_models: ["*"]
            key_ids: ["openai-batch"]    # only the batch-flagged key

provider_configs[].key_ids and provider_configs[].keys are both supported in Helm values. Prefer key_ids for parity with config.json (key_ids should contain provider key names). Use a virtual key in API calls:

curl http://localhost:8080/v1/chat/completions \
  -H "x-bf-vk: vk-openai-prod-secret-token" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello"}]}'

Model Limits

Apply budgets and rate limits at the model level. Each entry is keyed on model_name (use "*" for all models), an optional provider, and a scope that determines who the limit applies to.

Field	Default	Description
`id`	—	Unique identifier
`model_name`	—	Model name, or `"*"` to match all models
`provider`	(all)	Provider name; omit to cover all providers
`scope`	`"global"`	`"global"` (all traffic) or `"virtual_key"` (one VK)
`scope_id`	—	Required when `scope` is `"virtual_key"` — the virtual key ID
`budget_id`	—	References a `governance.budgets` entry
`rate_limit_id`	—	References a `governance.rate_limits` entry

bifrost:
  governance:
    modelConfigs:
      # Global cap on a specific model across all traffic
      - id: "mc-gpt4o-global"
        model_name: "gpt-4o"
        provider: "openai"
        budget_id: "budget-production"
        rate_limit_id: "rate-limit-high"

      # Global provider-level budget (all models, all traffic, openai only)
      - id: "mc-openai-provider"
        model_name: "*"
        provider: "openai"
        budget_id: "budget-production"

      # VK-scoped top-level budget (all models, all providers, one VK)
      - id: "mc-vk-dev-toplevel"
        model_name: "*"
        scope: "virtual_key"
        scope_id: "vk-dev-all"
        budget_id: "budget-dev"

      # VK-scoped per-provider budget (all models, anthropic only, one VK)
      - id: "mc-vk-dev-anthropic"
        model_name: "*"
        provider: "anthropic"
        scope: "virtual_key"
        scope_id: "vk-dev-all"
        budget_id: "budget-testing"
        rate_limit_id: "rate-limit-standard"

Provider Governance

Apply budgets and rate limits at the provider level:

bifrost:
  governance:
    providers:
      - name: "openai"
        budget_id: "budget-production"
        rate_limit_id: "rate-limit-high"
        send_back_raw_request: false
        send_back_raw_response: false

      - name: "anthropic"
        budget_id: "budget-production"
        rate_limit_id: "rate-limit-standard"

Routing Rules

CEL-expression-based routing rules redirect requests to different providers or models based on request attributes.

Field	Description
`cel_expression`	CEL expression evaluated against the request; if `true`, rule fires
`targets`	Provider/model targets with weights
`fallbacks`	Providers to try if all targets fail
`scope`	`global`, `team`, `customer`, or `virtual_key`
`scope_id`	Required for non-global scopes
`priority`	Lower number = evaluated first

bifrost:
  governance:
    routingRules:
      # Route all GPT requests to Azure
      - id: "route-gpt-to-azure"
        name: "GPT → Azure"
        description: "Route all GPT model requests to Azure OpenAI"
        enabled: true
        cel_expression: "model.startsWith('gpt-')"
        targets:
          - provider: "azure"
            model: ""        # empty = use original model name
            weight: 1.0
        fallbacks: ["openai"]
        scope: "global"
        priority: 0

      # Route heavy models to a slower but cheaper provider
      - id: "route-heavy-to-groq"
        name: "Large context → Groq"
        enabled: true
        cel_expression: "model == 'gpt-4o' && request_body.max_tokens > 4000"
        targets:
          - provider: "groq"
            model: "llama-3.3-70b-versatile"
            weight: 1.0
        fallbacks: ["openai"]
        scope: "global"
        priority: 1

      # Team-scoped rule
      - id: "route-ml-team-bedrock"
        name: "ML Team → Bedrock"
        enabled: true
        cel_expression: "true"    # match all requests for this scope
        targets:
          - provider: "bedrock"
            model: ""
            weight: 1.0
        fallbacks: ["openai"]
        scope: "team"
        scope_id: "team-ml"
        priority: 0

Complexity Router Configuration

If you use complexity_tier in routing rules, you can seed the analyzer thresholds and keyword lists from Helm. The chart renders this block to governance.complexity_analyzer_config in config.json. Omit this block, or leave complexityAnalyzerConfig: null, to use the built-in defaults.

bifrost:
  governance:
    complexityAnalyzerConfig:
      tier_boundaries:
        simple_medium: 0.15
        medium_complex: 0.35
        complex_reasoning: 0.60
      keywords:
        code_keywords: ["function", "class", "api", "debug", "deploy"]
        reasoning_keywords: ["step by step", "explain why", "tradeoffs", "root cause analysis"]
        technical_keywords: ["architecture", "kubernetes", "latency", "authentication"]
        simple_keywords: ["hello", "hi", "thanks", "what is", "define"]

In the default split mode, runtime UI and API edits are preserved while the matching Helm-rendered section is unchanged. When Helm changes a section, tier boundaries are replaced from the rendered config.json, while keyword lists are merged additively with stored runtime keywords (union with duplicates removed). Use bifrost.sourceOfTruth: config.json only when Helm should replace stored governance state. See Source of Truth & Reconciliation for the full startup rules.

Full Example

# governance-full-values.yaml
image:
  tag: "v1.4.11"

bifrost:
  encryptionKeySecret:
    name: "bifrost-encryption"
    key: "encryption-key"

  plugins:
    governance:
      enabled: true
      config:
        is_vk_mandatory: true

  governance:
    authConfig:
      isEnabled: true
      existingSecret: "bifrost-admin-credentials"
      usernameKey: "username"
      passwordKey: "password"

    budgets:
      - id: "budget-production"
        max_limit: 500
        reset_duration: "1M"
      - id: "budget-dev"
        max_limit: 50
        reset_duration: "1M"

    rateLimits:
      - id: "rate-limit-standard"
        token_max_limit: 100000
        token_reset_duration: "1h"
        request_max_limit: 1000
        request_reset_duration: "1h"

    virtualKeys:
      - id: "vk-production"
        name: "Production"
        value: "vk-prod-secret-token"
        is_active: true
        budget_id: "budget-production"
        rate_limit_id: "rate-limit-standard"
        provider_configs:
          - provider: "openai"
            weight: 1
            allowed_models: ["gpt-4o", "gpt-4o-mini"]

kubectl create secret generic bifrost-encryption \
  --from-literal=encryption-key='your-32-byte-key'

kubectl create secret generic bifrost-admin-credentials \
  --from-literal=username='admin' \
  --from-literal=password='secure-admin-password'

helm install bifrost bifrost/bifrost -f governance-full-values.yaml

Access Profiles (Enterprise)

You can seed enterprise access_profiles directly from Helm values. The chart renders bifrost.accessProfiles into top-level access_profiles in config.json.

bifrost:
  accessProfiles:
    - name: "platform-default"
      description: "Default profile for platform users"
      is_active: true
      tags: ["platform", "default"]
      provider_configs:
        - provider_name: "openai"
          all_models_allowed: false
          allowed_models: ["gpt-4o", "gpt-4o-mini"]
      mcp_servers:
        - mcp_server_id: "github"
      mcp_tool_overrides:
        - mcp_client_id: "github"
          tool_name: "create_pull_request"
          action: "include"

​Admin Authentication

​Budgets

​Rate Limits

​Customers & Teams

​Virtual Keys

​Model Limits

​Provider Governance

​Routing Rules

​Complexity Router Configuration

​Full Example

​Access Profiles (Enterprise)

Admin Authentication

Budgets

Rate Limits

Customers & Teams

Virtual Keys

Model Limits

Provider Governance

Routing Rules

Complexity Router Configuration

Full Example

Access Profiles (Enterprise)