# Provider Routing

> Understand how Bifrost routes requests across AI providers using governance rules and adaptive load balancing.

## Overview

Bifrost offers two methods for routing requests across AI providers, each suited to different use cases:

1. **Governance-based Routing**: Explicit, user-defined routing rules configured via Virtual Keys
2. **Adaptive Load Balancing**: Automatic, performance-based routing powered by real-time metrics (Enterprise feature)

When both methods are available, **governance takes precedence** because users have explicitly defined their routing preferences through provider configurations on Virtual Keys.

<Info>
  **When to use which method:**

  * Use **Governance** when you need explicit control, compliance requirements, or specific cost optimization strategies
  * Use **Adaptive Load Balancing** for automatic performance optimization and minimal configuration overhead
</Info>

***

## The Model Catalog

The Model Catalog is Bifrost's central registry that tracks which models are available from which providers. It powers both governance-based routing and adaptive load balancing by maintaining an up-to-date mapping of models to providers.

<Info>
  **Architecture Documentation**: For detailed technical documentation on the
  Model Catalog implementation, including API reference, thread safety, and
  advanced usage patterns, see [Model Catalog
  Architecture](/architecture/framework/model-catalog).
</Info>

### Data Sources

The Model Catalog combines two data sources to maintain a comprehensive and up-to-date model registry:

1. **Pricing Data** (Primary source)
   * Downloaded from a remote URL (configurable, defaults to `https://getbifrost.ai/datasheet`)
   * Contains model names, pricing tiers, and provider mappings
   * Synced to database on startup and refreshed periodically (default: every 24 hours)
   * Used for cost calculation and initial model-to-provider mapping
   * **Stored as**: In-memory map `pricingData[model|provider|mode]` for O(1) lookups

2. **Provider List Models API** (Secondary source)
   * Calls each provider's `/v1/models` endpoint during startup
   * Enriches the catalog with provider-specific models and aliases
   * Re-fetched when providers are added/updated via API or dashboard
   * Adds models that may not be in pricing data yet (e.g., newly released models)
   * **Stored as**: In-memory map `modelPool[provider][]models`

<Info>
  **Why two sources?** Pricing data provides comprehensive model coverage with
  cost information, while the List Models API ensures you can use newly released
  models immediately without waiting for pricing data updates.
</Info>
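
As a rough illustration of the two stores described above, the sketch below models them as plain Go maps. The `Catalog` and `PricingEntry` types, their fields, and the numbers are hypothetical; they only mirror the `pricingData[model|provider|mode]` and `modelPool[provider][]models` shapes and are not Bifrost's actual types.

```go
package main

import "fmt"

// PricingEntry is a hypothetical stand-in for one pricing row.
type PricingEntry struct {
	InputCost  float64
	OutputCost float64
}

// Catalog mirrors the two in-memory maps described in the text.
type Catalog struct {
	pricingData map[string]PricingEntry // key: "model|provider|mode"
	modelPool   map[string][]string     // key: provider, value: model names
}

// pricingKey builds the composite lookup key.
func pricingKey(model, provider, mode string) string {
	return model + "|" + provider + "|" + mode
}

func NewCatalog() *Catalog {
	return &Catalog{
		pricingData: make(map[string]PricingEntry),
		modelPool:   make(map[string][]string),
	}
}

// addModel appends model to the provider's pool unless it is already there.
func (c *Catalog) addModel(provider, model string) {
	for _, m := range c.modelPool[provider] {
		if m == model {
			return
		}
	}
	c.modelPool[provider] = append(c.modelPool[provider], model)
}

// AddPricing stores a pricing row and seeds the model pool from it.
func (c *Catalog) AddPricing(model, provider, mode string, p PricingEntry) {
	c.pricingData[pricingKey(model, provider, mode)] = p
	c.addModel(provider, model)
}

// AddListedModels merges models returned by a provider's list models call.
func (c *Catalog) AddListedModels(provider string, models []string) {
	for _, m := range models {
		c.addModel(provider, m)
	}
}

func main() {
	c := NewCatalog()
	// Placeholder numbers, not real prices.
	c.AddPricing("gpt-4o", "openai", "chat", PricingEntry{InputCost: 1, OutputCost: 2})
	c.AddListedModels("openai", []string{"gpt-4o", "gpt-4o-mini"})
	fmt.Println(c.modelPool["openai"])
}
```

Seeding the pool from pricing rows first and merging list models results second matches the primary/secondary ordering of the two sources.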

### How Model Availability is Determined

Bifrost determines whether a model is available for a provider through the following lookups:

<AccordionGroup>
  <Accordion title="GetModelsForProvider(provider)">
    **Purpose**: Find all models available for a specific provider

    **Lookup Process**:

    1. Check `modelPool[provider]` for direct matches
    2. Return all models in that provider's slice

    **Example**:

    ```go theme={null}
    models := GetModelsForProvider("openai")
    // Returns: ["gpt-4o", "gpt-4o-mini", "gpt-4-turbo", "gpt-3.5-turbo", ...]
    ```

    **Used by**:

    * Routing Methods to validate `allowed_models`
    * Dashboard model selector dropdowns
    * API responses for `/v1/models?provider=openai`
  </Accordion>

  <Accordion title="GetProvidersForModel(model)">
    **Purpose**: Find all providers that support a specific model

    **Lookup Process**:

    1. **Direct lookup**: Check each provider's model list in `modelPool`
    2. **Cross-provider resolution**: Apply special handling for proxy providers

    **Special Cross-Provider Rules**:

    <Steps>
      <Step title="OpenRouter Format">
        If the model is not found directly, check whether a `provider/model` entry for it exists in the OpenRouter model list

        ```go theme={null}
        // Request: claude-3-5-sonnet
        // Checks: openrouter models for "anthropic/claude-3-5-sonnet"
        // Result: Adds "openrouter" to providers list
        ```
      </Step>

      <Step title="Vertex Format">
        If the model is not found directly, check whether a `provider/model` entry for it exists in the Vertex model list

        ```go theme={null}
        // Request: claude-3-5-sonnet
        // Checks: vertex models for "anthropic/claude-3-5-sonnet"
        // Result: Adds "vertex" to providers list
        ```
      </Step>

      <Step title="Groq OpenAI Compatibility">
        For GPT models, check if `openai/model` exists in Groq

        ```go theme={null}
        // Request: gpt-3.5-turbo
        // Checks: groq models for "openai/gpt-3.5-turbo"
        // Result: Adds "groq" to providers list
        ```
      </Step>

      <Step title="Bedrock Claude Models">
        For Claude models, check Bedrock with flexible matching

        ```go theme={null}
        // Request: claude-3-5-sonnet
        // Checks: bedrock models containing "claude-3-5-sonnet"
        // Matches: "anthropic.claude-3-5-sonnet-20240620-v1:0"
        // Result: Adds "bedrock" to providers list
        ```
      </Step>
    </Steps>

    **Example**:

    ```go theme={null}
    providers := GetProvidersForModel("claude-3-5-sonnet")
    // Returns: ["anthropic", "openrouter", "vertex", "bedrock"]
    // Proxy providers are included even though the request was the bare name "claude-3-5-sonnet".
    ```

    **Used by**:

    * Load balancing to find candidate providers
    * Fallback generation
    * Model validation in requests
  </Accordion>

  <Accordion title="Pricing Lookup with Fallbacks">
    **Purpose**: Get pricing data for cost calculation and model validation

    **Lookup Key**: `model|provider|mode` (e.g., `gpt-4o|openai|chat`)

    **Fallback Chain**:

    1. **Primary lookup**: `model|provider|mode`, where `mode` is the request type
    2. **Gemini → Vertex**: If no Gemini entry is found, retry with Vertex and the same model
    3. **Vertex format stripping**: For `provider/model` names, strip the prefix and retry
    4. **Bedrock prefix handling**: For Claude models, retry with the `anthropic.` prefix
    5. **Responses → Chat**: If no Responses-mode entry is found, retry in Chat mode

    **Example Flow**:

    ```go theme={null}
    // Request: claude-3-5-sonnet on Gemini (Responses API)

    // 1. Try: claude-3-5-sonnet|gemini|responses → Not found
    // 2. Try: claude-3-5-sonnet|vertex|responses → Not found
    // 3. Try: claude-3-5-sonnet|vertex|chat → ✅ Found!

    // Pricing returned from vertex/chat mode
    ```

    **Used by**:

    * Cost calculation for billing
    * Model validation during routing
    * Budget enforcement
  </Accordion>
</AccordionGroup>
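
The pricing fallback chain from the last accordion can be sketched as a list of candidate keys tried in order. This is a minimal sketch, not Bifrost's implementation: the `lookupPricing` function, the `float64` price type, and the exact order in which provider and mode fallbacks are combined are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// candidate is one (model, provider) pair to try.
type candidate struct{ model, provider string }

// lookupPricing returns the price and the key that matched, walking the
// fallbacks described in the text. The ordering is an assumption.
func lookupPricing(pricing map[string]float64, model, provider, mode string) (float64, string, bool) {
	cands := []candidate{{model, provider}} // primary lookup
	switch provider {
	case "gemini":
		// Gemini → Vertex with the same model.
		cands = append(cands, candidate{model, "vertex"})
	case "vertex":
		// Strip a "provider/" prefix and retry.
		if i := strings.Index(model, "/"); i >= 0 {
			cands = append(cands, candidate{model[i+1:], "vertex"})
		}
	case "bedrock":
		// Claude models: retry with the "anthropic." prefix.
		if strings.Contains(model, "claude") && !strings.HasPrefix(model, "anthropic.") {
			cands = append(cands, candidate{"anthropic." + model, "bedrock"})
		}
	}

	modes := []string{mode}
	if mode == "responses" {
		modes = append(modes, "chat") // Responses → Chat
	}

	for _, m := range modes {
		for _, c := range cands {
			key := c.model + "|" + c.provider + "|" + m
			if price, ok := pricing[key]; ok {
				return price, key, true
			}
		}
	}
	return 0, "", false
}

func main() {
	// Placeholder price; only the vertex/chat entry exists.
	pricing := map[string]float64{"claude-3-5-sonnet|vertex|chat": 3.0}
	_, key, ok := lookupPricing(pricing, "claude-3-5-sonnet", "gemini", "responses")
	fmt.Println(key, ok) // claude-3-5-sonnet|vertex|chat true
}
```

Note that this sketch also tries `gemini|chat` before `vertex|chat`, one attempt more than the example flow shows; the result is the same.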

### Syncing Behavior

<AccordionGroup>
  <Accordion title="Initial Sync (Startup)">
    When Bifrost starts, it performs a complete model catalog initialization:

    **Step-by-step process** (from `server.go:Bootstrap()`):

    <Steps>
      <Step title="Load Pricing Data">
        ```go theme={null}
        // 1. Download from URL
        pricingData := loadPricingFromURL(ctx)

        // 2. Store in database (if configStore available)
        configStore.CreateModelPrices(ctx, pricingData)

        // 3. Load into memory cache
        mc.pricingData = map[string]TableModelPricing{...}
        ```
      </Step>

      <Step title="Populate Initial Model Pool">
        ```go theme={null}
        // Build modelPool from pricing data
        mc.populateModelPoolFromPricingData()
        // Result: modelPool[provider] = [models from pricing]
        ```
      </Step>

      <Step title="Fetch Dynamic Models">
        ```go theme={null}
        // Call ListAllModels for all configured providers
        modelData, err := client.ListAllModels(ctx, nil)

        // Add results to model pool
        mc.AddModelDataToPool(modelData)
        // Result: modelPool enriched with provider-specific models
        ```
      </Step>

      <Step title="Handle Failures Gracefully">
        If the list models API call fails for a provider:

        ```json theme={null}
        {"level":"warn","message":"failed to list models for provider ollama: connection refused"}
        ```

        * Logged as warning, **does not stop startup**
        * Provider remains usable with models from pricing data
        * Can be manually refreshed later via API
      </Step>
    </Steps>

    **Result**: Bifrost is ready with a comprehensive model catalog combining both sources.
  </Accordion>

  <Accordion title="Ongoing Sync (Background)">
    While Bifrost is running, the catalog stays up-to-date through background workers:

    **Pricing Data Sync**:

    * Background worker runs every **1 hour** (ticker interval)
    * Checks if **24 hours** have elapsed since last sync (configurable)
    * If yes, downloads fresh pricing data and updates database + memory cache
    * Timer resets after successful sync

    **List Models API Sync**:
    Triggered by these events:

    1. **Provider Added**: When a new provider is configured
       ```bash theme={null}
       POST /api/v1/providers
       # Automatically calls ListModels for the new provider
       ```

    2. **Provider Updated**: When provider config changes (keys, endpoints, etc.)
       ```bash theme={null}
       PUT /api/v1/providers/{provider}
       # Refetches models to detect changes
       ```

    3. **Manual Refresh**: Via API endpoint
       ```bash theme={null}
       POST /api/v1/providers/{provider}/models/refetch
       # Explicitly refetches models for a provider
       ```

    4. **Manual Delete + Refetch**: Clear and reload models
       ```bash theme={null}
       DELETE /api/v1/providers/{provider}/models
       POST /api/v1/providers/{provider}/models/refetch
       # Useful when models are out of sync
       ```

    **Failure Handling**:

    * Pricing URL fails but database has data → Use cached database records
    * Pricing URL fails and no database data → Error logged, existing memory cache retained
    * List models API fails → Log warning, retain existing model pool entries
  </Accordion>

  <Accordion title="Fallback Strategy">
    Bifrost's multi-layered approach ensures high availability:

    **Layer 1: Pricing Data Persistence**

    ```
    URL fails → Database → Memory cache → Continue operation
    ```

    **Layer 2: Model Pool Redundancy**

    ```
    ListModels fails → Pricing data models → Continue with reduced catalog
    ```

    **Layer 3: Runtime Validation**

    ```
    Model not in catalog → Special cross-provider rules → May still work
    ```

    **Example Scenario**:

    ```
    Situation:
    - Pricing URL is down
    - OpenAI ListModels API is down
    - User requests gpt-4o on OpenAI

    Bifrost's Response:
    1. ✅ Pricing data available from database (last sync 12h ago)
    2. ✅ Model pool has gpt-4o from previous ListModels call
    3. ✅ Request proceeds normally
    4. 📊 Cost calculated from cached pricing data
    ```

    This design keeps requests from failing because of sync issues, as long as at least one data source (the database, the memory cache, or a previous list models result) is still available.
  </Accordion>
</AccordionGroup>
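
The pricing sync schedule described above (an hourly check against a 24-hour interval) can be sketched as follows. The names `syncDue` and `simulateTicks` are hypothetical, and the loop simulates ticks instead of running a real `time.Ticker` so the example finishes immediately.

```go
package main

import (
	"fmt"
	"time"
)

const (
	checkEvery   = time.Hour      // ticker interval from the text
	syncInterval = 24 * time.Hour // default refresh interval from the text
)

// syncDue reports whether enough time has passed since the last sync.
func syncDue(lastSync, now time.Time, interval time.Duration) bool {
	return now.Sub(lastSync) >= interval
}

// simulateTicks walks forward one hourly tick at a time and returns the
// tick numbers on which a sync would run.
func simulateTicks(ticks int) []int {
	start := time.Date(2025, 1, 1, 0, 0, 0, 0, time.UTC)
	last := start
	var fired []int
	for i := 1; i <= ticks; i++ {
		now := start.Add(time.Duration(i) * checkEvery)
		if syncDue(last, now, syncInterval) {
			fired = append(fired, i)
			last = now // timer resets after a successful sync
		}
	}
	return fired
}

func main() {
	fmt.Println(simulateTicks(50)) // [24 48]
}
```

Checking hourly rather than sleeping for the full interval means a changed interval or a failed attempt is picked up within an hour.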

### Allowed Models Behavior with Examples

The `allowed_models` field in a provider config controls which models can be used with that provider. Governance routing validates each request's model against it.

<Tabs>
  <Tab title="Wildcard allowed_models (Use Catalog)">
    **Configuration**:

    ```json theme={null}
    {
      "provider_configs": [
        {
          "provider": "openai",
          "allowed_models": ["*"],
          "weight": 1.0
        }
      ]
    }
    ```

    **Behavior**:

    * Bifrost calls `GetModelsForProvider("openai")`
    * Returns all models in `modelPool["openai"]`
    * Request validated against catalog

    **Examples**:

    ```bash theme={null}
    # ✅ Allowed (in catalog)
    curl -H "x-bf-vk: vk-123" -d '{"model": "gpt-4o"}'

    # ✅ Allowed (in catalog)
    curl -H "x-bf-vk: vk-123" -d '{"model": "gpt-3.5-turbo"}'

    # ❌ Rejected (not in OpenAI catalog)
    curl -H "x-bf-vk: vk-123" -d '{"model": "claude-3-5-sonnet"}'
    ```

    **Use Cases**:

    * Default behavior for most deployments
    * Automatically stays up-to-date with provider's model offerings
    * No manual model list maintenance required

    <Warning>
      Using `"allowed_models": []` (empty array) means **deny all models** - no
      requests will be served. Use `["*"]` to allow all models via the catalog.
    </Warning>
  </Tab>

  <Tab title="Explicit allowed_models (Strict Control)">
    **Configuration** (OpenAI restricted to two models, Anthropic pinned to one dated version):

    ```json theme={null}
    {
      "provider_configs": [
        {
          "provider": "openai",
          "allowed_models": ["gpt-4o", "gpt-4o-mini"],
          "weight": 1.0
        },
        {
          "provider": "anthropic",
          "allowed_models": ["claude-3-5-sonnet-20241022"],
          "weight": 1.0
        }
      ]
    }
    ```

    **Behavior**:

    * Bifrost validates request model against explicit list
    * Catalog is **ignored** for this provider
    * Supports both direct matches and provider-prefixed entries
    * Case-sensitive matching

    **Examples**:

    ```bash theme={null}
    # ✅ Allowed (in explicit list)
    curl -H "x-bf-vk: vk-123" -d '{"model": "gpt-4o"}'

    # ❌ Rejected (not in explicit list)
    curl -H "x-bf-vk: vk-123" -d '{"model": "gpt-4-turbo"}'
    # Even though gpt-4-turbo is in the OpenAI catalog!

    # ✅ Allowed (exact match for Anthropic)
    curl -H "x-bf-vk: vk-123" -d '{"model": "claude-3-5-sonnet-20241022"}'

    # ❌ Rejected (version mismatch)
    curl -H "x-bf-vk: vk-123" -d '{"model": "claude-3-5-sonnet-20240620"}'
    ```

    **Provider-Prefixed Entries**:

    You can also use provider-prefixed model names in `allowed_models`. Bifrost will strip the prefix and match against the requested model:

    ```json theme={null}
    {
      "provider_configs": [
        {
          "provider": "openrouter",
          "allowed_models": ["openai/gpt-4o", "anthropic/claude-3-5-sonnet"],
          "weight": 1.0
        }
      ]
    }
    ```

    **How it works**:

    ```bash theme={null}
    # Request without prefix
    curl -H "x-bf-vk: vk-123" -d '{"model": "gpt-4o"}'

    # 1. Checks: "openai/gpt-4o" in allowed_models
    # 2. Strips prefix: "openai/gpt-4o" → "gpt-4o"
    # 3. Compares: "gpt-4o" == "gpt-4o" ✅
    # 4. Result: Allowed and routed to OpenRouter
    ```

    This is particularly useful for proxy providers (OpenRouter, Vertex) where you want to explicitly control which upstream models are accessible.

    **Use Cases**:

    * Compliance requirements (only approved models)
    * Cost control (restrict to cheaper models)
    * Version pinning (prevent automatic updates)
    * Testing specific model versions
    * **Explicit cross-provider routing** (e.g., only allow OpenAI models via OpenRouter)
  </Tab>

  <Tab title="Aliases (Key-Level)">
    **Key Concept**: Aliases are **key-level** mappings that allow user-friendly model names to map to provider-specific identifiers.

    **How Aliases Work**:

    * Defined at the **Key level**, not Virtual Key level
    * Structure: `aliases: {"user-facing-name": "provider-specific-id"}`
    * **Alias key** (left side): User-facing model name used in requests
    * **Provider ID** (right side): Provider-specific identifier sent to the API

    **Azure OpenAI Example**:

    Provider configuration with alias mapping:

    ```json theme={null}
    {
      "providers": {
        "azure": {
          "keys": [
            {
              "name": "azure-prod-key",
              "value": "your-api-key",
              "aliases": {
                "gpt-4o": "my-prod-gpt4o-deployment",
                "gpt-4o-mini": "my-mini-deployment"
              },
              "azure_key_config": {
                "endpoint": "https://your-resource.openai.azure.com"
              }
            }
          ]
        }
      }
    }
    ```

    **What Happens**:

    1. **Allowed models derived from aliases**: `["gpt-4o", "gpt-4o-mini"]`
    2. **User requests with alias**: `{"model": "gpt-4o"}`
    3. **Bifrost validates**: `gpt-4o` is in derived allowed models ✅
    4. **Bifrost resolves alias**: `gpt-4o` → `my-prod-gpt4o-deployment`
    5. **Sent to Azure**: Uses `my-prod-gpt4o-deployment` as the deployment name
    6. **Pricing lookup**: If pricing for resolved ID not found, falls back to alias `gpt-4o`

    **Bedrock Example with Inference Profiles**:

    ```json theme={null}
    {
      "providers": {
        "bedrock": {
          "keys": [
            {
              "name": "bedrock-key",
              "aliases": {
                "claude-sonnet": "us.anthropic.claude-3-5-sonnet-20241022-v2:0",
                "claude-opus": "us.anthropic.claude-3-opus-20240229-v1:0"
              },
              "bedrock_key_config": {
                "access_key": "your-access-key",
                "secret_key": "your-secret-key",
                "region": "us-east-1"
              }
            }
          ]
        }
      }
    }
    ```

    **What Happens**:

    1. **Allowed models**: `["claude-sonnet", "claude-opus"]` (from alias keys)
    2. **User requests**: `{"model": "claude-sonnet"}`
    3. **Bifrost validates**: `claude-sonnet` in allowed models ✅
    4. **Resolves alias**: `claude-sonnet` → `us.anthropic.claude-3-5-sonnet-20241022-v2:0`
    5. **Sent to Bedrock**: The full inference profile ID is used in the API call

    **Priority of Model Restrictions**:

    When determining allowed models for a key:

    ```
    1. If key.models is NOT empty → Use key.models
    2. Else if aliases exist → Use alias keys
    3. Else → All models allowed (use Model Catalog)
    ```

    **Example with Both**:

    ```json theme={null}
    {
      "keys": [
        {
          "models": ["gpt-4o", "gpt-3.5-turbo"],
          "aliases": {
            "gpt-4o": "my-deployment",
            "gpt-4-turbo": "another-deployment"
          },
          "azure_key_config": {
            "endpoint": "https://your-resource.openai.azure.com"
          }
        }
      ]
    }
    ```

    Result: Only `["gpt-4o", "gpt-3.5-turbo"]` are allowed, because the explicit `models` field takes priority over alias keys. The `gpt-4-turbo` alias is defined but not accessible.

    **Vertex Example** (similar pattern):

    ```json theme={null}
    {
      "keys": [
        {
          "aliases": {
            "claude-3-5-sonnet": "anthropic/claude-3-5-sonnet@20241022",
            "gemini-pro": "google/gemini-1.5-pro"
          },
          "vertex_key_config": {
            "project_id": "my-project",
            "region": "us-central1"
          }
        }
      ]
    }
    ```

    **Use Cases for Aliases**:

    * **Azure**: Map generic model names to specific deployment names in your Azure resource
    * **Bedrock**: Use short aliases for long inference profile IDs or ARNs
    * **Vertex**: Map to specific model versions or regional endpoints
    * **Multi-environment**: Different aliases per key (dev/staging/prod)

    **Key Insight**:

    ```
    User Request: {"model": "gpt-4o"}
                  ↓
    Validation: Check if "gpt-4o" in allowed models (derived from aliases)
                  ↓
    Mapping: aliases["gpt-4o"] → "my-prod-gpt4o-deployment"
                  ↓
    API Call: Uses "my-prod-gpt4o-deployment" as deployment ID
                  ↓
    Pricing: Falls back to "gpt-4o" if resolved ID not in pricing data
    ```

    This allows user-friendly model names in requests while supporting provider-specific identifier patterns at the key level.
  </Tab>

  <Tab title="Cross-Provider Model Routing">
    **Configuration**:

    ```json theme={null}
    {
      "provider_configs": [
        {
          "provider": "openai",
          "allowed_models": ["gpt-4o"],
          "weight": 0.5
        },
        {
          "provider": "azure",
          "allowed_models": ["gpt-4o"],
          "weight": 0.5
        }
      ]
    }
    ```

    **Request**:

    ```bash theme={null}
    curl -H "x-bf-vk: vk-123" \
         -d '{"model": "gpt-4o"}'
    ```

    **Routing Behavior**:

    1. **Model validation**: Both providers have `gpt-4o` in `allowed_models` ✅
    2. **Weighted selection**: 50% chance each
    3. **Provider selected**: For example, Azure
    4. **Model transformation**: `gpt-4o` → `azure/gpt-4o`
    5. **Fallbacks**: `["openai/gpt-4o"]` (remaining providers)

    **Special Cross-Provider Scenarios**:

    <Steps>
      <Step title="OpenRouter as Universal Proxy">
        ```json theme={null}
        {
          "provider_configs": [
            {
              "provider": "openrouter",
              "allowed_models": ["*"]
            }
          ]
        }
        ```

        Request `claude-3-5-sonnet`:

        * Bifrost checks: `GetModelsForProvider("openrouter")`
        * Finds: `anthropic/claude-3-5-sonnet` in OpenRouter catalog
        * ✅ Allowed, routes to OpenRouter
      </Step>

      <Step title="Weighted Routing via Proxy Provider">
        **Use Case**: Route 99% of OpenAI traffic through OpenRouter for cost savings and keep 1% direct to OpenAI, which also serves as the fallback. The OpenRouter entry uses the provider-prefixed form `openai/gpt-4o`.

        ```json theme={null}
        {
          "provider_configs": [
            {
              "provider": "openai",
              "allowed_models": ["gpt-4o"],
              "weight": 0.01
            },
            {
              "provider": "openrouter",
              "allowed_models": ["openai/gpt-4o"],
              "weight": 0.99
            }
          ]
        }
        ```

        Request `gpt-4o`:

        * **OpenAI check**: `"gpt-4o"` in `["gpt-4o"]` → ✅ Allowed
        * **OpenRouter check**: Strips prefix from `"openai/gpt-4o"` → matches `"gpt-4o"` → ✅ Allowed
        * **Weighted selection**: 99% chance → OpenRouter selected
        * **Final model**: `openrouter/gpt-4o`
        * **Fallbacks**: `["openai/gpt-4o"]` (1% provider as fallback)

        **Why this works**: Bifrost supports provider-prefixed entries in `allowed_models`, so `"openai/gpt-4o"` matches requests for `"gpt-4o"`.
      </Step>

      <Step title="Vertex as Multi-Provider Gateway">
        ```json theme={null}
        {
          "provider_configs": [
            {
              "provider": "vertex",
              "allowed_models": ["claude-3-5-sonnet", "gemini-1.5-pro"]
            }
          ]
        }
        ```

        Request `claude-3-5-sonnet`:

        * Model catalog lookup: `GetProvidersForModel("claude-3-5-sonnet")`
        * Finds: `["anthropic", "vertex", "bedrock"]`
        * Validation: `claude-3-5-sonnet` in `allowed_models` ✅
        * Sends to Vertex as: `anthropic/claude-3-5-sonnet`
      </Step>

      <Step title="Groq OpenAI Compatibility">
        ```json theme={null}
        {
          "provider_configs": [
            {
              "provider": "groq",
              "allowed_models": ["gpt-3.5-turbo"]
            }
          ]
        }
        ```

        Request `gpt-3.5-turbo`:

        * Special handling: Checks Groq catalog for `openai/gpt-3.5-turbo`
        * ✅ Found, validation passes
        * Sends to Groq as: `openai/gpt-3.5-turbo`
      </Step>
    </Steps>
  </Tab>
</Tabs>
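
The key-level priority rules from the Aliases tab (explicit `models`, then alias keys, then the catalog) can be sketched as below. The `Key` struct and the `allowedModels` and `resolve` functions are hypothetical names for illustration. When no restriction applies, this sketch simply accepts the model and leaves catalog validation out.

```go
package main

import (
	"fmt"
	"sort"
)

// Key is a hypothetical, trimmed-down provider key.
type Key struct {
	Models  []string
	Aliases map[string]string // user-facing name -> provider-specific ID
}

// allowedModels returns the key's restriction list and whether one applies.
func allowedModels(k Key) ([]string, bool) {
	if len(k.Models) > 0 {
		return k.Models, true // 1. explicit models win
	}
	if len(k.Aliases) > 0 {
		names := make([]string, 0, len(k.Aliases))
		for name := range k.Aliases {
			names = append(names, name)
		}
		sort.Strings(names) // stable order for printing
		return names, true  // 2. alias keys
	}
	return nil, false // 3. no restriction at key level
}

// resolve validates the requested model and maps it through the aliases.
func resolve(k Key, requested string) (string, bool) {
	if models, restricted := allowedModels(k); restricted {
		found := false
		for _, m := range models {
			if m == requested {
				found = true
				break
			}
		}
		if !found {
			return "", false
		}
	}
	if id, ok := k.Aliases[requested]; ok {
		return id, true
	}
	return requested, true
}

func main() {
	k := Key{
		Models:  []string{"gpt-4o", "gpt-3.5-turbo"},
		Aliases: map[string]string{"gpt-4o": "my-deployment", "gpt-4-turbo": "another-deployment"},
	}
	fmt.Println(resolve(k, "gpt-4o"))      // my-deployment true
	fmt.Println(resolve(k, "gpt-4-turbo")) // blocked: not in models
}
```

Validation runs against the user-facing name before alias resolution, which is why `gpt-4-turbo` is rejected even though an alias for it exists.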

### How It's Used in Routing

<Tabs>
  <Tab title="Governance Routing">
    When a Virtual Key has `provider_configs`, governance uses the model catalog for validation:

    **Wildcard `allowed_models` Example**:

    ```json theme={null}
    {
      "provider_configs": [
        {
          "provider": "openai",
          "allowed_models": ["*"],
          "weight": 0.5
        }
      ]
    }
    ```

    **Request Flow**:

    ```bash theme={null}
    curl -H "x-bf-vk: vk-123" -d '{"model": "gpt-4o"}'

    # 1. Governance checks: Is "gpt-4o" in GetModelsForProvider("openai")?
    # 2. Catalog lookup: modelPool["openai"] contains "gpt-4o" ✅
    # 3. Validation passes, provider selected
    # 4. Model becomes: "openai/gpt-4o"
    ```

    **Rejection Example**:

    ```bash theme={null}
    curl -H "x-bf-vk: vk-123" -d '{"model": "claude-3-5-sonnet"}'

    # 1. Governance checks: Is "claude-3-5-sonnet" in GetModelsForProvider("openai")?
    # 2. Catalog lookup: modelPool["openai"] does NOT contain "claude-3-5-sonnet" ❌
    # 3. Validation fails, request rejected
    # 4. Error: "model not allowed for any configured provider"
    ```
  </Tab>

  <Tab title="Load Balancing">
    When load balancing selects providers, it queries the catalog to find candidates:

    **Request Flow**:

    ```bash theme={null}
    curl -X POST http://localhost:8080/v1/chat/completions \
      -d '{"model": "gpt-4o", "messages": [...]}'

    # 1. Load balancer: GetProvidersForModel("gpt-4o")
    # 2. Catalog returns: ["openai", "azure", "groq"]
    # 3. Filter by configured providers: ["openai", "azure"]  (groq not configured)
    # 4. Performance scoring: openai=0.95, azure=0.87
    # 5. Select: openai (highest score)
    # 6. Model becomes: "openai/gpt-4o"
    # 7. Fallbacks: ["azure/gpt-4o"]
    ```

    **Cross-Provider Discovery**:

    ```bash theme={null}
    curl -X POST http://localhost:8080/v1/chat/completions \
      -d '{"model": "claude-3-5-sonnet", "messages": [...]}'

    # 1. Load balancer: GetProvidersForModel("claude-3-5-sonnet")
    # 2. Catalog checks:
    #    - Direct: ["anthropic"] ✅
    #    - OpenRouter: Has "anthropic/claude-3-5-sonnet" ✅
    #    - Vertex: Has "anthropic/claude-3-5-sonnet" ✅
    #    - Bedrock: Has "anthropic.claude-3-5-sonnet-..." ✅
    # 3. Catalog returns: ["anthropic", "openrouter", "vertex", "bedrock"]
    # 4. Performance scoring across all four
    # 5. Best performer selected
    ```

    This is how Bifrost discovers every provider that can serve a model **without manual configuration**.
  </Tab>
</Tabs>
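
The load balancing flow in the tab above (filter by configured providers, pick the highest score, keep the rest as fallbacks) can be sketched as below. The `rankProviders` function is hypothetical, and the scores are plain inputs here; how Bifrost actually computes them is not covered in this section.

```go
package main

import (
	"fmt"
	"sort"
)

// rankProviders keeps only configured candidates, orders them by score
// (highest first), and returns the primary target plus fallbacks.
func rankProviders(model string, candidates []string, configured map[string]bool, scores map[string]float64) (string, []string) {
	kept := make([]string, 0, len(candidates))
	for _, p := range candidates {
		if configured[p] {
			kept = append(kept, p)
		}
	}
	if len(kept) == 0 {
		return "", nil
	}
	sort.SliceStable(kept, func(i, j int) bool {
		return scores[kept[i]] > scores[kept[j]]
	})
	fallbacks := make([]string, 0, len(kept)-1)
	for _, p := range kept[1:] {
		fallbacks = append(fallbacks, p+"/"+model)
	}
	return kept[0] + "/" + model, fallbacks
}

func main() {
	primary, fallbacks := rankProviders(
		"gpt-4o",
		[]string{"openai", "azure", "groq"},             // from the catalog
		map[string]bool{"openai": true, "azure": true}, // groq not configured
		map[string]float64{"openai": 0.95, "azure": 0.87},
	)
	fmt.Println(primary, fallbacks) // openai/gpt-4o [azure/gpt-4o]
}
```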

<Note>
  **Model Catalog is essential for cross-provider routing**. Without it, Bifrost
  wouldn't know that `gpt-4o` is available from OpenAI, Azure, and Groq, or that
  `claude-3-5-sonnet` can be routed through Anthropic, Vertex, Bedrock, and
  OpenRouter. This knowledge powers both governance validation and load
  balancing provider discovery.
</Note>
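
The cross-provider discovery that this note refers to can be sketched as a direct lookup followed by the special rules for proxy providers. This is a simplified sketch: `providersForModel` and its helpers are hypothetical, and the explicit `order` slice is an assumption added because Go map iteration order is not deterministic.

```go
package main

import (
	"fmt"
	"strings"
)

func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// hasPrefixed reports whether any entry has the form "<provider>/<model>".
func hasPrefixed(list []string, model string) bool {
	for _, v := range list {
		if strings.HasSuffix(v, "/"+model) {
			return true
		}
	}
	return false
}

// anyContains reports whether any entry contains model as a substring.
func anyContains(list []string, model string) bool {
	for _, v := range list {
		if strings.Contains(v, model) {
			return true
		}
	}
	return false
}

// providersForModel applies the direct lookup, then the special rules.
func providersForModel(pool map[string][]string, order []string, model string) []string {
	var out []string
	for _, p := range order {
		models := pool[p]
		match := contains(models, model) // direct lookup
		if !match {
			switch p {
			case "openrouter", "vertex":
				match = hasPrefixed(models, model)
			case "groq":
				match = strings.HasPrefix(model, "gpt-") && contains(models, "openai/"+model)
			case "bedrock":
				match = strings.Contains(model, "claude") && anyContains(models, model)
			}
		}
		if match {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	pool := map[string][]string{
		"anthropic":  {"claude-3-5-sonnet"},
		"openai":     {"gpt-4o"},
		"openrouter": {"anthropic/claude-3-5-sonnet"},
		"vertex":     {"anthropic/claude-3-5-sonnet"},
		"bedrock":    {"anthropic.claude-3-5-sonnet-20240620-v1:0"},
	}
	order := []string{"anthropic", "openai", "openrouter", "vertex", "groq", "bedrock"}
	fmt.Println(providersForModel(pool, order, "claude-3-5-sonnet"))
	// [anthropic openrouter vertex bedrock]
}
```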

***

## Default Provider Resolution

<Info>
  Default provider resolution via model catalog is available in **Bifrost
  v1.5.0-prerelease7 and above**.
</Info>

When a request includes a bare model name without a `provider/` prefix (e.g., `"model": "gpt-4o"` instead of `"model": "openai/gpt-4o"`), Bifrost automatically resolves the provider using the Model Catalog. Note that this default behavior is applied **after all other routing engines** have run.

### How It Works

1. **Request arrives** without a provider prefix (e.g., `"model": "gpt-4o"`)
2. **Catalog lookup**: Bifrost calls `GetProvidersForModel("gpt-4o")` to find all providers that support the model
3. **Provider selected**: The first provider in the catalog's list is used (e.g., `openai`)
4. **Request continues**: The resolved `provider/model` string is used for load balancing and fallback handling

This is logged as the **`model-catalog`** routing engine in telemetry and routing logs, with a message like:

```
No provider specified for model gpt-4o, found 3 options in model catalog:
[openai, azure, groq], selecting first: openai
```

### Example

```bash theme={null}
# These two requests are equivalent when the model catalog
# maps gpt-4o → openai as the first provider:
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello!"}]}'

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-4o", "messages": [{"role": "user", "content": "Hello!"}]}'
```

<Note>
  If the model catalog is not available or the model is not found in any
  provider, the request returns an error asking for the `provider/model` format.
  For deterministic provider selection, always use the explicit `provider/model`
  prefix.
</Note>
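
A minimal sketch of this resolution step, assuming a `lookup` callback that plays the role of `GetProvidersForModel`. The `resolveDefault` function, the error text, and the rule that any `/` marks an existing provider prefix are simplifying assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errNoProvider is a hypothetical error; the real message may differ.
var errNoProvider = errors.New("no provider found for model; use the provider/model format")

// resolveDefault adds a provider prefix to a bare model name by taking the
// first provider that the catalog lookup returns.
func resolveDefault(model string, lookup func(string) []string) (string, error) {
	if strings.Contains(model, "/") {
		return model, nil // already in provider/model form
	}
	providers := lookup(model)
	if len(providers) == 0 {
		return "", errNoProvider
	}
	return providers[0] + "/" + model, nil
}

func main() {
	lookup := func(model string) []string {
		if model == "gpt-4o" {
			return []string{"openai", "azure", "groq"}
		}
		return nil
	}
	fmt.Println(resolveDefault("gpt-4o", lookup))        // openai/gpt-4o <nil>
	fmt.Println(resolveDefault("openai/gpt-4o", lookup)) // openai/gpt-4o <nil>
	fmt.Println(resolveDefault("unknown-model", lookup))
}
```

Because the result depends on the order of the catalog's list, the explicit `provider/model` form remains the only deterministic option.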

***

## Governance-based Routing

Governance-based routing allows you to explicitly define which providers and models should handle requests for a specific Virtual Key. This method provides precise control over routing decisions.

### How It Works

When a Virtual Key has `provider_configs` defined:

1. **Request arrives** with a Virtual Key (e.g., `x-bf-vk: vk-prod-main`)
2. **Model validation**: Bifrost checks if the requested model is allowed for any configured provider
3. **Provider filtering**: Providers are filtered based on:
   * Model availability in `allowed_models`
   * Budget limits (current usage vs max limit)
   * Rate limits (tokens/requests per time window)
4. **Weighted selection**: A provider is selected using weighted random distribution
5. **Provider prefix added**: Model string becomes `provider/model` (e.g., `openai/gpt-4o`)
6. **Fallbacks created**: Remaining providers sorted by weight (descending) are added as fallbacks
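
Steps 4 to 6 can be sketched as a weighted pick followed by a weight-ordered fallback list. This is an illustrative sketch: `ProviderConfig`, `pickWeighted`, and `route` are hypothetical names, and the input is assumed to be the provider list that already passed model, budget, and rate limit filtering.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// ProviderConfig is a hypothetical, trimmed-down provider config.
type ProviderConfig struct {
	Provider string
	Weight   float64
}

// pickWeighted maps r in [0, 1) onto the cumulative weights and returns
// the index of the selected provider. cfgs must not be empty.
func pickWeighted(cfgs []ProviderConfig, r float64) int {
	total := 0.0
	for _, c := range cfgs {
		total += c.Weight
	}
	target := r * total
	acc := 0.0
	for i, c := range cfgs {
		acc += c.Weight
		if target < acc {
			return i
		}
	}
	return len(cfgs) - 1 // guards against floating-point rounding
}

// route returns the selected provider/model and the fallbacks sorted by
// weight, highest first.
func route(model string, cfgs []ProviderConfig, r float64) (string, []string) {
	if len(cfgs) == 0 {
		return "", nil
	}
	i := pickWeighted(cfgs, r)
	rest := make([]ProviderConfig, 0, len(cfgs)-1)
	rest = append(rest, cfgs[:i]...)
	rest = append(rest, cfgs[i+1:]...)
	sort.SliceStable(rest, func(a, b int) bool { return rest[a].Weight > rest[b].Weight })

	fallbacks := make([]string, 0, len(rest))
	for _, c := range rest {
		fallbacks = append(fallbacks, c.Provider+"/"+model)
	}
	return cfgs[i].Provider + "/" + model, fallbacks
}

func main() {
	cfgs := []ProviderConfig{{"openai", 0.3}, {"azure", 0.7}}
	primary, fallbacks := route("gpt-4o", cfgs, rand.Float64())
	fmt.Println(primary, fallbacks)
}
```

Passing the random draw `r` in as a parameter keeps the selection logic deterministic and easy to test.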

### Configuration Example

```json theme={null}
{
  "provider_configs": [
    {
      "provider": "openai",
      "allowed_models": ["gpt-4o", "gpt-4o-mini"],
      "weight": 0.3,
      "budget": {
        "max_limit": 100.0,
        "current_usage": 45.0
      }
    },
    {
      "provider": "azure",
      "allowed_models": ["gpt-4o"],
      "weight": 0.7,
      "rate_limit": {
        "token_max_limit": 100000,
        "token_reset_duration": "1m"
      }
    }
  ]
}
```

### Request Flow

<Steps>
  <Step title="Request with Virtual Key">
    ```bash theme={null}
    curl -X POST http://localhost:8080/v1/chat/completions \
      -H "x-bf-vk: vk-prod-main" \
      -d '{"model": "gpt-4o", "messages": [...]}'
    ```
  </Step>

  <Step title="Governance Evaluation">
    * OpenAI: ✅ Has `gpt-4o` in `allowed_models`, budget OK, weight 0.3
    * Azure: ✅ Has `gpt-4o` in `allowed_models`, rate limit OK, weight 0.7
  </Step>

  <Step title="Weighted Selection">
    * 70% chance → Azure
    * 30% chance → OpenAI
  </Step>

  <Step title="Request Transformation">
    ```json theme={null}
    {
      "model": "azure/gpt-4o",
      "messages": [...],
      "fallbacks": ["openai/gpt-4o"]
    }
    ```
  </Step>
</Steps>

### Key Features

| Feature                   | Description                                                   |
| ------------------------- | ------------------------------------------------------------- |
| **Explicit Control**      | Define exactly which providers and models are accessible      |
| **Budget Enforcement**    | Automatically exclude providers exceeding budget limits       |
| **Rate Limit Protection** | Skip providers that have hit rate limits                      |
| **Weighted Distribution** | Control traffic distribution with custom weights              |
| **Automatic Fallbacks**   | If the selected provider fails, remaining providers are tried in descending weight order |

### Best Practices

<AccordionGroup>
  <Accordion title="Cost Optimization">
    Assign higher weights to cheaper providers for cost-sensitive workloads:

    ```json theme={null}
    {
      "provider_configs": [
        {"provider": "groq", "allowed_models": ["*"], "key_ids": ["*"], "weight": 0.7},
        {"provider": "openai", "allowed_models": ["*"], "key_ids": ["*"], "weight": 0.3}
      ]
    }
    ```
  </Accordion>

  <Accordion title="Environment Separation">
    Create different Virtual Keys for dev/staging/prod with different provider access:

    ```json theme={null}
    {
      "virtual_keys": [
        {
          "id": "vk-dev",
          "provider_configs": [{"provider": "ollama", "allowed_models": ["*"], "key_ids": ["*"]}]
        },
        {
          "id": "vk-prod",
          "provider_configs": [
            {"provider": "openai", "allowed_models": ["*"], "key_ids": ["*"]},
            {"provider": "azure", "allowed_models": ["*"], "key_ids": ["*"]}
          ]
        }
      ]
    }
    ```
  </Accordion>

  <Accordion title="Compliance & Data Residency">
    Restrict specific Virtual Keys to compliant providers:

    ```json theme={null}
    {
      "provider_configs": [
        {"provider": "azure", "allowed_models": ["gpt-4o"]},
        {"provider": "bedrock", "allowed_models": ["claude-3-sonnet-20240229"]}
      ]
    }
    ```
  </Accordion>
</AccordionGroup>

<Note>
  **`allowed_models: ["*"]`**: Allows all models supported by the provider, validated via the Model Catalog (populated from pricing data and the provider's list models API). See the [Model Catalog section](#the-model-catalog) above for how syncing works. For configuration instructions, see [Governance Routing](/features/governance/routing).

  **`allowed_models: []` (empty array)**: Denies **all** models, so no requests are served for this provider config. This deny-by-default behavior was introduced in v1.5.0.

  **Empty `provider_configs`**: When `provider_configs` is empty (no providers configured), **all providers are blocked** (deny-by-default). You must explicitly add provider configurations to allow traffic through a Virtual Key.
</Note>
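
The three rules in the note above can be read as a single access check. The sketch below is a hypothetical model of that logic, not Bifrost code: `is_allowed` and the `catalog` dict (provider name to set of model names) are stand-ins, and it assumes a config with no `allowed_models` field behaves like an empty array.

```python
def is_allowed(provider_configs, provider, model, catalog):
    """Return True if `model` may be served by `provider` under these configs.

    catalog: dict mapping provider -> set of model names. This is a
    hypothetical stand-in for the Model Catalog.
    """
    for cfg in provider_configs:  # empty list: loop never runs, so denied
        if cfg["provider"] != provider:
            continue
        # Assumption: a missing field is treated like an empty array.
        allowed = cfg.get("allowed_models", [])
        if "*" in allowed:
            # Wildcard: any model the catalog lists for this provider.
            return model in catalog.get(provider, set())
        return model in allowed  # [] denies everything
    return False  # provider not configured on this Virtual Key

catalog = {"openai": {"gpt-4o", "gpt-4o-mini"}}
print(is_allowed([], "openai", "gpt-4o", catalog))
print(is_allowed([{"provider": "openai", "allowed_models": ["*"]}], "openai", "gpt-4o", catalog))
```

The first call is denied because `provider_configs` is empty; the second is allowed because the wildcard resolves through the catalog.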

***

## Adaptive Load Balancing

<Info>
  **Enterprise Feature**: Adaptive Load Balancing is available in Bifrost
  Enterprise. [Contact us](https://www.getmaxim.ai/bifrost/enterprise) to enable
  it.
</Info>

Adaptive Load Balancing automatically optimizes routing based on real-time performance metrics. It operates at **two levels** to provide both macro-level provider selection and micro-level key optimization.

### Two-Level Architecture

<Card title="Why Two Levels?" icon="layer-group">
  Separating provider selection (direction) from key selection (route) enables:

  * **Provider-level optimization**: Choose the best provider for a model based on aggregate performance
  * **Key-level optimization**: Within that provider, choose the best API key based on individual key performance
  * **Resilience**: Even when the provider is already specified (by governance or the user), key-level load balancing still optimizes which API key to use
</Card>

```mermaid theme={null}
flowchart TB
    Request["Request: gpt-4o"]

    subgraph Level1["Level 1: Direction (Provider Selection)"]
        Cat["Model Catalog Lookup"]
        Providers["Candidate Providers:<br/>openai, azure, groq"]
        Filter["Filter by allowed_models<br/>and key availability"]
        Score["Score by performance:<br/>error rate, latency, utilization"]
        Select["Select: openai"]
    end

    subgraph Level2["Level 2: Route (Key Selection)"]
        Keys["Available OpenAI Keys:<br/>key-1, key-2, key-3"]
        KeyScore["Score each key:<br/>error rate, latency, TPM hits"]
        KeySelect["Select: key-2<br/>(best performing)"]
    end

    Request --> Cat --> Providers --> Filter --> Score --> Select
    Select --> Keys --> KeyScore --> KeySelect --> Response["Execute with<br/>openai/gpt-4o + key-2"]
```

### Level 1: Direction (Provider Selection)

**When it runs**: Only when the model string has **no** provider prefix (e.g., `gpt-4o`)

**How it works**:

1. **Model catalog lookup**: Find all configured providers that support the requested model
2. **Provider filtering**: Filter based on:
   * Allowed models from the key configuration
   * Key availability for the provider
3. **Performance scoring**: Calculate scores for each provider based on:
   * Error rates (50% weight)
   * Latency (20% weight, using MV-TACOS algorithm)
   * Utilization (5% weight)
   * Momentum bias (recovery acceleration)
4. **Smart selection**: Choose provider using weighted random with jitter and exploration
5. **Fallbacks created**: Remaining providers sorted by performance score (descending) are added as fallbacks

### Level 2: Route (Key Selection)

**When it runs**: **Always**, even when the provider is already specified (by governance, the user, or Level 1)

**How it works**:

1. **Get available keys**: Fetch all keys for the selected provider
2. **Filter by configuration**: Apply model restrictions from key configuration
3. **Performance scoring**: Calculate score for each key based on:
   * Error rates (recent failures)
   * Latency (response time)
   * TPM hits (rate limit violations)
   * Current state (Healthy, Degraded, Failed, Recovering)
4. **Weighted random selection**: Choose key with exploration (25% chance to probe recovering keys)
5. **Circuit breaker**: Skip keys with zero weight (TPM hits, repeated failures)
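
Steps 4 and 5 can be illustrated with a small selection routine. This is a rough sketch under stated assumptions, not the Enterprise implementation: the key dict shape (`id`, `weight`, `state`) and the `select_key` helper are hypothetical, and the exact way exploration interacts with weights is assumed.

```python
import random

EXPLORATION_RATE = 0.25  # from the text: 25% chance to probe recovering keys

def select_key(keys, rng=random):
    """keys: list of dicts with 'id', 'weight', 'state' (hypothetical shape)."""
    recovering = [k for k in keys if k["state"] == "Recovering"]
    # Exploration: occasionally send traffic to a recovering key to re-test it.
    if recovering and rng.random() < EXPLORATION_RATE:
        return rng.choice(recovering)["id"]
    # Circuit breaker: zero-weight keys are never candidates.
    live = [k for k in keys if k["weight"] > 0]
    if not live:
        return None
    return rng.choices(live, weights=[k["weight"] for k in live], k=1)[0]["id"]

keys = [
    {"id": "key-1", "weight": 0.95, "state": "Healthy"},
    {"id": "key-2", "weight": 0.60, "state": "Degraded"},
    {"id": "key-3", "weight": 0.0, "state": "Failed"},
]
print(select_key(keys))
```

With these inputs, `key-3` is never returned because its weight is zero and it is not in the `Recovering` state.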

### Scoring Algorithm

The load balancer computes a penalty score for each provider-model combination from its error, latency, and utilization penalties, minus a momentum bias that accelerates recovery:

$$
Score = (P_{error} \times 0.5) + (P_{latency} \times 0.2) + (P_{util} \times 0.05) - M_{momentum}
$$

<Tip>
  Lower penalties = Higher weights = More traffic. The system self-heals by
  quickly penalizing failing routes but enabling fast recovery once issues are
  resolved.
</Tip>
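
As a worked example of the formula above, the snippet below plugs in two sets of made-up penalty values. The input numbers are illustrative, and the assumption that each penalty is normalized to the 0 to 1 range is not confirmed by this page.

```python
def penalty_score(p_error, p_latency, p_util, momentum=0.0):
    """Weighted penalty from the formula above; lower means more traffic."""
    return p_error * 0.5 + p_latency * 0.2 + p_util * 0.05 - momentum

# Illustrative inputs only (assumed normalized to [0, 1]).
healthy = penalty_score(p_error=0.01, p_latency=0.2, p_util=0.4)
failing = penalty_score(p_error=0.60, p_latency=0.8, p_util=0.4)

print(round(healthy, 3))  # 0.005 + 0.04 + 0.02
print(round(failing, 3))  # 0.30 + 0.16 + 0.02
```

Because the error term carries the largest coefficient, a rising error rate moves the result far more than an equal change in latency or utilization.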

### Request Flow

<Steps>
  <Step title="Request without Provider Prefix">
    ```bash theme={null}
    curl -X POST http://localhost:8080/v1/chat/completions \
      -d '{"model": "gpt-4o", "messages": [...]}'
    ```
  </Step>

  <Step title="Model Catalog Lookup">
    Providers supporting `gpt-4o`: \[openai, azure, groq]
  </Step>

  <Step title="Performance Evaluation">
    * OpenAI: Score 0.92 (low latency, 99% success rate)
    * Azure: Score 0.85 (medium latency, 98% success rate)
    * Groq: Score 0.65 (high latency recently)
  </Step>

  <Step title="Provider Selection">
    OpenAI selected (highest score within jitter band)
  </Step>

  <Step title="Request Transformation">
    ```json theme={null}
    {
      "model": "openai/gpt-4o",
      "messages": [...],
      "fallbacks": ["azure/gpt-4o", "groq/gpt-4o"]
    }
    ```
  </Step>
</Steps>
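
The flow above can be condensed into one function. This sketch is deliberately simplified: it always picks the top-scoring provider instead of using weighted random selection with jitter and exploration, and `route_level1` and the score dict are hypothetical names. Scores are treated as higher-is-better, matching the numbers in the steps above.

```python
def route_level1(request, provider_scores):
    """Add a provider prefix and fallbacks to a request that has none.

    provider_scores: dict provider -> score, higher is better (hypothetical).
    """
    model = request["model"]
    if "/" in model:
        # Provider prefix already present: Level 1 leaves the request alone.
        return request
    ranked = sorted(provider_scores, key=provider_scores.get, reverse=True)
    primary, rest = ranked[0], ranked[1:]
    return {
        **request,
        "model": f"{primary}/{model}",
        # Remaining providers, best score first, become fallbacks.
        "fallbacks": [f"{p}/{model}" for p in rest],
    }

scores = {"openai": 0.92, "azure": 0.85, "groq": 0.65}
print(route_level1({"model": "gpt-4o"}, scores))
```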

### Key Features

| Feature                    | Description                                                         |
| -------------------------- | ------------------------------------------------------------------- |
| **Automatic Optimization** | No manual weight tuning required                                    |
| **Real-time Adaptation**   | Weights recomputed every 5 seconds based on live metrics            |
| **Circuit Breakers**       | Failing routes automatically removed from rotation                  |
| **Fast Recovery**          | 90% penalty reduction in 30 seconds after issues resolve            |
| **Health States**          | Routes transition between Healthy, Degraded, Failed, and Recovering |
| **Smart Exploration**      | 25% chance to probe potentially recovered routes                    |

### Dashboard Visibility

Monitor load balancing performance in real-time:

<Frame>
  <img src="https://mintcdn.com/bifrost/_ilYf7u7HJP58LQG/media/ui-load-balancing.png?fit=max&auto=format&n=_ilYf7u7HJP58LQG&q=85&s=ebca4e200b7528b855dd931d673dca5c" alt="Adaptive Load Balancing Dashboard" width="3492" height="2358" data-path="media/ui-load-balancing.png" />
</Frame>

The dashboard shows:

* Weight distribution across provider-model-key routes
* Performance metrics (error rates, latency, success rates)
* State transitions (Healthy → Degraded → Failed → Recovering)
* Actual vs expected traffic distribution

***

## How Governance and Load Balancing Interact

When both methods are available in your Bifrost deployment, they work together in a complementary way across two levels.

<Warning>
  **Key Insight**: Load balancing has **two levels**:

  * **Level 1 (Direction/Provider)**: Skipped when provider is already specified
  * **Level 2 (Route/Key)**: **Always runs**, even when provider is specified

  This means key-level optimization works regardless of how the provider was chosen!
</Warning>

### Execution Flow

```mermaid theme={null}
flowchart TD
    Start["Request: gpt-4o"]

    subgraph Governance["Governance Plugin (HTTPTransportIntercept)"]
        HasVK{"Has VK with<br/>provider_configs?"}
        GovRoute["Provider Selection:<br/>Weighted random"]
        AddPrefix["Add prefix:<br/>azure/gpt-4o"]
    end

    subgraph LB1["Load Balancer Level 1 (Middleware)"]
        PrefixCheck{"Has provider<br/>prefix?"}
        LBProvider["Provider Selection:<br/>Performance-based"]
        AddLBPrefix["Add prefix:<br/>openai/gpt-4o"]
    end

    subgraph LB2["Load Balancer Level 2 (Key Selector)"]
        GetKeys["Get available keys<br/>for selected provider"]
        ScoreKeys["Score keys by<br/>performance metrics"]
        SelectKey["Select best key"]
    end

    Start --> HasVK
    HasVK -->|Yes| GovRoute --> AddPrefix
    HasVK -->|No| PrefixCheck
    AddPrefix --> PrefixCheck
    PrefixCheck -->|Yes, skip Level 1| GetKeys
    PrefixCheck -->|No| LBProvider --> AddLBPrefix --> GetKeys
    GetKeys --> ScoreKeys --> SelectKey --> Execute["Execute request<br/>with selected provider + key"]
```

### Execution Order

1. **HTTPTransportIntercept** (Governance Plugin - Provider Level)
   * Runs first in the request pipeline
   * Checks if Virtual Key has `provider_configs`
   * If yes: adds provider prefix (e.g., `azure/gpt-4o`)
   * **Result**: Provider is selected by governance rules

2. **Middleware** (Load Balancing Plugin - Provider Level / Direction)
   * Runs after HTTPTransportIntercept
   * Checks if model string contains "/"
   * If yes: **skips provider selection** (already determined by governance or user)
   * If no: performs performance-based provider selection
   * **Result**: Provider prefix added if not already present

3. **KeySelector** (Load Balancing - Key Level / Route)
   * **Always runs** during request execution in Bifrost core
   * Gets all keys for the selected provider
   * Filters keys based on model restrictions
   * Scores each key by performance metrics
   * Selects best key using weighted random + exploration
   * **Result**: Optimal key selected within the provider

<Info>
  **Important**: Even when governance specifies `azure/gpt-4o`, load balancing
  **still optimizes which Azure key to use** based on performance metrics. This
  is the power of the two-level architecture!
</Info>
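
The execution order above boils down to two prefix checks followed by an unconditional key selection. The sketch below traces which stage sets the provider. All three callables are hypothetical stand-ins for the governance plugin, the load balancing middleware, and the key selector, and resolution paths outside this section are not modeled.

```python
def resolve_route(model, key_pick, governance_pick=None, lb_pick=None):
    """Return (model, key, trace) after running the three stages in order.

    governance_pick / lb_pick: callables model -> provider, or None when the
    stage is not configured. key_pick: callable provider -> key id.
    """
    trace = []
    # 1. Governance: only acts when no provider prefix is present.
    if "/" not in model and governance_pick is not None:
        model = f"{governance_pick(model)}/{model}"
        trace.append("governance")
    # 2. Load balancing Level 1: skipped once a prefix exists.
    if "/" not in model and lb_pick is not None:
        model = f"{lb_pick(model)}/{model}"
        trace.append("lb-level-1")
    if "/" not in model:
        raise ValueError("no provider resolved; other paths are not modeled here")
    # 3. Key selection always runs for the resolved provider.
    provider = model.split("/", 1)[0]
    trace.append("key-selector")
    return model, key_pick(provider), trace

print(resolve_route(
    "gpt-4o",
    key_pick=lambda p: f"{p}-key-1",
    governance_pick=lambda m: "azure",
    lb_pick=lambda m: "openai",
))
```

With both stages configured, governance wins and the trace contains `governance` and `key-selector` but not `lb-level-1`.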

### Example Scenarios

<Tabs>
  <Tab title="Governance Only">
    **Setup:**

    * Virtual Key has `provider_configs` defined
    * No adaptive load balancing enabled

    **Request:**

    ```bash theme={null}
    curl -X POST http://localhost:8080/v1/chat/completions \
      -H "x-bf-vk: vk-prod-main" \
      -d '{"model": "gpt-4o", "messages": [...]}'
    ```

    **Behavior:**

    1. **Governance** applies weighted provider routing → selects Azure (70% weight)
    2. Model becomes `azure/gpt-4o`
    3. **Standard key selection** (non-adaptive) chooses an Azure key based on static weights
    4. Request forwarded to Azure with selected key
  </Tab>

  <Tab title="Load Balancing Only">
    **Setup:**

    * **No Virtual Key**: do not send the `x-bf-vk` header
    * Adaptive load balancing enabled

    A Virtual Key with empty or missing `provider_configs` is **not** an LB-only setup: it **blocks all providers** (deny-by-default).

    **Request:**

    ```bash theme={null}
    curl -X POST http://localhost:8080/v1/chat/completions \
      -d '{"model": "gpt-4o", "messages": [...]}'
    ```

    **Behavior:**

    1. **Load Balancing Level 1** applies performance-based provider routing → selects OpenAI (best performing)
    2. Model becomes `openai/gpt-4o`
    3. **Load Balancing Level 2** selects best OpenAI key based on performance metrics (error rate, latency, TPM status)
    4. Request forwarded to OpenAI with optimal key
  </Tab>

  <Tab title="Both Available (Governance + Load Balancing)">
    **Setup:**

    * Virtual Key has `provider_configs` defined
    * Adaptive load balancing enabled
    * Azure has 3 keys: `azure-key-1`, `azure-key-2`, `azure-key-3`

    **Request:**

    ```bash theme={null}
    curl -X POST http://localhost:8080/v1/chat/completions \
      -H "x-bf-vk: vk-prod-main" \
      -d '{"model": "gpt-4o", "messages": [...]}'
    ```

    **Behavior:**

    1. **Governance** applies first (respects explicit user config) → selects Azure provider
    2. Model becomes `azure/gpt-4o`
    3. **Load Balancing Level 1** sees "/" and **skips provider selection** (already decided)
    4. **Load Balancing Level 2** still runs! Selects best Azure key based on performance:
       * `azure-key-1`: 99% success rate, 150ms avg latency → score 0.95
       * `azure-key-2`: 85% success rate, 200ms avg latency → score 0.60 (degraded)
       * `azure-key-3`: Hit TPM limit → score 0.0 (circuit broken)
       * **Selects `azure-key-1`** (highest score)
    5. Request forwarded to Azure with `azure-key-1`

    **Why?** Governance controls provider selection (explicit user intent), but load balancing still optimizes key selection (automatic performance optimization).
  </Tab>

  <Tab title="Manual Provider Selection">
    **Setup:**

    * Both governance and load balancing enabled
    * OpenAI has 2 keys available

    **Request:**

    ```bash theme={null}
    curl -X POST http://localhost:8080/v1/chat/completions \
      -d '{"model": "openai/gpt-4o", "messages": [...]}'
    ```

    **Behavior:**

    1. **Governance** sees "/" and **skips provider selection**
    2. **Load Balancing Level 1** sees "/" and **skips provider selection**
    3. **Load Balancing Level 2** still runs! Selects best OpenAI key based on current metrics
    4. Request forwarded to OpenAI with optimal key

    **Why?** User explicitly specified the provider, but key-level optimization still provides value by selecting the best-performing OpenAI key.
  </Tab>
</Tabs>

### Provider vs Key Selection Rules

| Scenario                          | Provider Selection                              | Key Selection                              |
| --------------------------------- | ----------------------------------------------- | ------------------------------------------ |
| VK with provider\_configs         | **Governance** (weighted random)                | **Standard** or **Adaptive** (if enabled)  |
| VK without provider\_configs + LB | **Blocked** (empty = no providers allowed)      | N/A                                        |
| No VK + LB                        | **Load Balancing Level 1** (performance)        | **Load Balancing Level 2** (performance)   |
| Model with provider prefix + LB   | **Skip** (already specified)                    | **Load Balancing Level 2** (performance) ✅ |
| No Load Balancing enabled         | **Governance** or **User** or **Model Catalog** | **Standard** (static weights)              |

<Note>
  **Critical Insight**:

  * **Provider selection** respects the hierarchy: Governance → Load Balancing Level 1 → User specification
  * **Key selection** runs independently and benefits from load balancing **even when provider is predetermined**

  This separation is what makes the two-level architecture so powerful!
</Note>

***

## Routing Rules (Dynamic Expression-Based Routing)

<Info>
  **Position in routing pipeline**: Routing Rules execute **before governance
  provider selection** and can override it. They are evaluated before adaptive
  load balancing, enabling dynamic provider/model overrides based on runtime
  conditions like headers, parameters, capacity metrics, and organizational
  hierarchy.
</Info>

### Overview

Routing Rules provide sophisticated, expression-based control over request routing using CEL expressions. Unlike governance routing (static weights), routing rules evaluate conditions dynamically at request time.

### When Routing Rules Execute

```mermaid theme={null}
flowchart TD
    Start["Request: model + provider"]

    subgraph Rules["1. Routing Rules Layer (Evaluated First)"]
        RuleMatch{"CEL Expression<br/>Matches?"}
        RuleDecision["Override:<br/>New provider/model/fallbacks"]
        NoMatch["No match:<br/>Continue to Governance"]
    end

    subgraph Gov["2. Governance Layer (if no routing rule matched)"]
        VKValidation["Virtual Key Validation"]
        GovRouting["Provider Governance Routing<br/>(weighted random)"]
    end

    subgraph LB["3. Load Balancing Layer"]
        LB1["Level 1: Provider Selection"]
        LB2["Level 2: Key Selection"]
    end

    Start --> RuleMatch
    RuleMatch -->|Yes| RuleDecision --> LB1
    RuleMatch -->|No| NoMatch --> VKValidation --> GovRouting --> LB1
    LB1 --> LB2 --> Execute["Execute with<br/>selected provider + key"]
```

### How It Works

1. **Routing rules evaluate first** in scope precedence order (VirtualKey → Team → Customer → Global)
2. **If a routing rule matches**: provider/model/fallbacks are overridden, governance provider\_configs are skipped
3. **If no routing rule matches**: governance provider selection runs (weighted random)
4. **Load balancing Level 1**: skipped if provider already determined (has "/" prefix)
5. **Load balancing Level 2** (key selection): always runs to select the best key within the determined provider

### Available CEL Variables

Routing rules access request context through CEL variables:

```cel theme={null}
// Request context
model                      // Requested model
provider                   // Current provider

// Headers and parameters (case-insensitive)
headers["x-tier"]          // Request header
params["region"]           // Query parameter

// Organization context
virtual_key_id             // VirtualKey ID
team_name                  // Team name
customer_id                // Customer ID

// Capacity metrics (0-100 percentage)
budget_used                // Budget usage %
tokens_used                // Token rate limit usage %
request                    // Request rate limit usage %
```
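
To make the variable shapes concrete, here is a hypothetical Python model of the evaluation context. It is not how Bifrost builds the context: `build_context` is an invented helper, lowercasing header and parameter names is one assumed reading of "case-insensitive", and deriving `budget_used` as spent divided by limit is an assumption based on the "0-100 percentage" comment.

```python
def build_context(model, provider, headers, params, spent, budget_limit):
    """Assemble a dict shaped like the CEL variables listed above."""
    return {
        "model": model,
        "provider": provider,
        # Assumption: names are lowercased so lookups are case-insensitive.
        "headers": {k.lower(): v for k, v in headers.items()},
        "params": {k.lower(): v for k, v in params.items()},
        # Capacity metrics are percentages in the 0-100 range.
        "budget_used": 100.0 * spent / budget_limit,
    }

ctx = build_context(
    "gpt-4o", "openai",
    headers={"X-Tier": "premium"},
    params={"region": "eu"},
    spent=86.0, budget_limit=100.0,
)
print(ctx["headers"]["x-tier"], ctx["budget_used"])
```

Under these assumptions, a condition such as `budget_used > 85` would be true for this request, since 86 of 100 budget units are spent.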

### Examples

#### Route based on user tier

```cel theme={null}
headers["x-tier"] == "premium"   // → openai/gpt-4o
```

#### Route to fallback when budget high

```cel theme={null}
budget_used > 85                 // → groq/llama-2 (cheaper)
```

#### Route by team

```cel theme={null}
team_name == "ml-research"       // → anthropic/claude-3-opus
```

#### Complex multi-condition routing

```cel theme={null}
headers["x-environment"] == "production" &&
tokens_used < 75 &&
team_name == "ai-platform"       // → openai/gpt-4o
```

### Scope Hierarchy

Rules are evaluated in organizational precedence order (first-match-wins):

```
1. VirtualKey scope (highest priority)
2. Team scope
3. Customer scope
4. Global scope (lowest priority)
```

Within each scope, rules are sorted by **priority** (ascending: 0 before 10).
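
The ordering described above (scope first, then ascending priority, first match wins) can be sketched as follows. Python predicates stand in for CEL expressions here, and the rule dict fields (`scope`, `priority`, `when`, `target`) are hypothetical names chosen for illustration.

```python
SCOPE_RANK = {"virtual_key": 0, "team": 1, "customer": 2, "global": 3}

def first_matching_rule(rules, ctx):
    """Return the target of the first rule that matches, or None."""
    ordered = sorted(rules, key=lambda r: (SCOPE_RANK[r["scope"]], r["priority"]))
    for rule in ordered:
        if rule["when"](ctx):
            return rule["target"]
    return None  # no match: governance provider selection would run next

rules = [
    {"scope": "global", "priority": 0,
     "when": lambda c: c["budget_used"] > 85, "target": "groq/llama-2"},
    {"scope": "team", "priority": 10,
     "when": lambda c: c["team_name"] == "ml-research", "target": "anthropic/claude-3-opus"},
    {"scope": "team", "priority": 0,
     "when": lambda c: c["headers"].get("x-tier") == "premium", "target": "openai/gpt-4o"},
]

ctx = {"budget_used": 90, "team_name": "ml-research", "headers": {"x-tier": "premium"}}
print(first_matching_rule(rules, ctx))
```

All three rules match this context, but the team-scope rule with priority 0 is evaluated first, so the result is `openai/gpt-4o`.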

### Key Features

| Feature                | Description                                                            |
| ---------------------- | ---------------------------------------------------------------------- |
| **CEL Expressions**    | Powerful, composable condition language with multiple operators        |
| **Scope Hierarchy**    | Rules at VirtualKey/Team/Customer/Global levels with proper precedence |
| **Dynamic Override**   | Override provider and/or model based on runtime conditions             |
| **Fallback Chains**    | Define multiple fallback providers for automatic failover              |
| **Priority Ordering**  | Lower priority numbers are evaluated first within the same scope       |
| **Capacity Awareness** | Access real-time budget and rate limit usage percentages               |

### Integration with Governance

Routing Rules execute **before** governance provider selection and can override it:

**If a routing rule matches**:

```
Routing Rules evaluate
                    ↓
Rule matches: budget_used > 85
                    ↓
Override: groq/llama-2 (cheaper provider)
                    ↓
Governance provider_configs SKIPPED
                    ↓
Load Balancing selects best key
```

**If no routing rule matches**:

```
Routing Rules evaluate
                    ↓
No matching rule
                    ↓
Governance decides: azure/gpt-4o (70% weight)
                    ↓
Load Balancing selects best key
```

**Key Insight**: Routing rules have higher precedence than governance provider\_configs. If a routing rule matches, governance provider\_configs are bypassed entirely.

### Integration with Load Balancing

Routing Rules run **before** load balancing:

```
Routing Rules decide: openai/gpt-4o
                    ↓
Load Balancing Level 1: Skipped (provider already determined)
                    ↓
Load Balancing Level 2: Selects best OpenAI key based on performance
```

Even when routing rules determine the provider, load balancing Level 2 still optimizes which API key to use within that provider.

### Use Cases

* **Tier-based routing**: Premium users → fast providers
* **Capacity failover**: High budget usage → cheaper providers
* **Team preferences**: Different teams → different providers
* **A/B testing**: Route subset of traffic to test models
* **Regional routing**: EU users → EU providers (data residency)
* **Complex logic**: Combine multiple conditions for sophisticated routing

### Dashboard & API

Routing rules can be configured through:

* **Dashboard**: Visual rule builder with CEL expression editor
* **API**: `POST /api/governance/routing-rules` and related endpoints
* **Scope**: Create rules at global, customer, team, or virtual key levels
* **Priority**: Order rules within scope with numeric priority

For complete documentation, see [Routing Rules Documentation](/providers/routing-rules).

***

## Choosing the Right Approach

1. **Use Governance When:**

   * ✅ **Compliance requirements**: Need to ensure data stays in specific regions or providers
   * ✅ **Cost optimization**: Want explicit control over traffic distribution to cheaper providers
   * ✅ **Budget enforcement**: Need hard limits on spending per provider
   * ✅ **Environment separation**: Different teams/apps need different provider access
   * ✅ **Rate limit management**: Need to respect provider-specific rate limits

2. **Use Routing Rules When:**

   * ✅ **Dynamic routing**: Route based on runtime request context (headers, parameters)
   * ✅ **Capacity-aware routing**: Switch to fallback when budget/rate limits high
   * ✅ **Organization-based routing**: Different rules for teams/customers
   * ✅ **A/B testing**: Route subset of traffic to test new models
   * ✅ **Complex conditions**: Multiple criteria (e.g., tier + capacity + team)

3. **Use Load Balancing When:**

   * ✅ **Performance optimization**: Want automatic routing to best-performing providers
   * ✅ **Minimal configuration**: Prefer hands-off operation with intelligent defaults
   * ✅ **Dynamic workloads**: Traffic patterns change frequently
   * ✅ **Automatic failover**: Need instant adaptation to provider issues
   * ✅ **Multi-provider redundancy**: Want seamless provider switching based on availability

4. **Use All Three Together:**

   * ✅ **Complete solution**: Governance provides base routing, routing rules add dynamic override, load balancing optimizes keys
   * ✅ **Maximum flexibility**: Different Virtual Keys use different strategies (governance vs routing rules vs load balancing)
   * ✅ **Enterprise deployments**: Complex organizations with multiple requirements per layer

***

## Additional Resources

<CardGroup cols={2}>
  <Card title="Governance Routing" icon="shield-check" href="/features/governance/routing">
    Configuration instructions for setting up governance routing via Virtual
    Keys (Web UI, API, config.json)
  </Card>

  <Card title="Routing Rules" icon="sliders" href="/providers/routing-rules">
    Dynamic, expression-based routing using CEL expressions for runtime
    conditions
  </Card>

  <Card title="Adaptive Load Balancing" icon="brain" href="/enterprise/adaptive-load-balancing">
    Technical implementation details: scoring algorithms, weight calculations,
    and performance characteristics
  </Card>

  <Card title="Virtual Keys" icon="key" href="/features/governance/virtual-keys">
    Learn how to create and configure Virtual Keys
  </Card>

  <Card title="Fallbacks" icon="arrow-rotate-right" href="/features/fallbacks">
    Understand how automatic fallbacks work across providers
  </Card>
</CardGroup>
