
# Provider Routing

> Understand how Bifrost routes requests across AI providers using governance rules and adaptive load balancing.

## Overview

Bifrost offers two methods for routing requests across AI providers, each suited to different use cases:

1. **Governance-based Routing**: Explicit, user-defined routing rules configured via Virtual Keys
2. **Adaptive Load Balancing**: Automatic, performance-based routing powered by real-time metrics (Enterprise feature)

When both methods are available, **governance takes precedence** because users have explicitly defined their routing preferences through provider configurations on Virtual Keys.

<Info>
  **When to use which method:**

  * Use **Governance** when you need explicit control, compliance requirements, or specific cost optimization strategies
  * Use **Adaptive Load Balancing** for automatic performance optimization and minimal configuration overhead
</Info>

***

## The Model Catalog

The Model Catalog is Bifrost's central registry that tracks which models are available from which providers. It powers both governance-based routing and adaptive load balancing by maintaining an up-to-date mapping of models to providers.

<Info>
  **Architecture Documentation**: For detailed technical documentation on the
  Model Catalog implementation, including API reference, thread safety, and
  advanced usage patterns, see [Model Catalog
  Architecture](/architecture/framework/model-catalog).
</Info>

### Data Sources

The Model Catalog combines two data sources to maintain a comprehensive and up-to-date model registry:

1. **Pricing Data** (Primary source)
   * Downloaded from a remote URL (configurable, defaults to `https://getbifrost.ai/datasheet`)
   * Contains model names, pricing tiers, and provider mappings
   * Synced to database on startup and refreshed periodically (default: every 24 hours)
   * Used for cost calculation and initial model-to-provider mapping
   * **Stored as**: In-memory map `pricingData[model|provider|mode]` for O(1) lookups

2. **Provider List Models API** (Secondary source)
   * Calls each provider's list-models endpoint (e.g., `/v1/models`) during startup
   * Enriches the catalog with provider-specific models and aliases
   * Re-fetched when providers are added/updated via API or dashboard
   * Adds models that may not be in pricing data yet (e.g., newly released models)
   * **Stored as**: In-memory map `modelPool[provider][]models`

<Info>
  **Why two sources?** Pricing data provides comprehensive model coverage with
  cost information, while the List Models API ensures you can use newly released
  models immediately without waiting for pricing data updates.
</Info>
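
The two in-memory structures can be modeled in Go roughly as follows. This is a minimal sketch. `ModelPricing`, `Catalog`, `pricingKey`, `addToPool` and `populateFromPricing` are hypothetical names and are not Bifrost's actual types. Real pricing records also carry more fields than the two shown here.

```go
package main

import "strings"

// ModelPricing is a hypothetical, trimmed-down pricing record.
type ModelPricing struct {
    InputCostPerToken  float64
    OutputCostPerToken float64
}

// pricingKey builds the composite "model|provider|mode" key used for O(1) lookups.
func pricingKey(model, provider, mode string) string {
    return strings.Join([]string{model, provider, mode}, "|")
}

// Catalog holds both data sources in memory (illustrative names).
type Catalog struct {
    pricingData map[string]ModelPricing // "model|provider|mode" -> pricing
    modelPool   map[string][]string     // provider -> model IDs
}

func NewCatalog() *Catalog {
    return &Catalog{
        pricingData: make(map[string]ModelPricing),
        modelPool:   make(map[string][]string),
    }
}

// addToPool appends a model to a provider's slice, skipping duplicates so a
// model reported by both sources appears only once.
func (c *Catalog) addToPool(provider, model string) {
    for _, m := range c.modelPool[provider] {
        if m == model {
            return
        }
    }
    c.modelPool[provider] = append(c.modelPool[provider], model)
}

// populateFromPricing seeds modelPool from the pricing keys (the primary source).
func (c *Catalog) populateFromPricing() {
    for key := range c.pricingData {
        parts := strings.Split(key, "|")
        if len(parts) == 3 {
            c.addToPool(parts[1], parts[0])
        }
    }
}
```

The pool is seeded from pricing data first. List Models results are then merged into it, and deduplicating on insert prevents repeated entries when a model appears in both sources.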

### How Model Availability is Determined

Bifrost answers model-availability questions with the lookups below. All of them are backed by the in-memory `modelPool` and `pricingData` maps:

<AccordionGroup>
  <Accordion title="GetModelsForProvider(provider)">
    **Purpose**: Find all models available for a specific provider

    **Lookup Process**:

    1. Check `modelPool[provider]` for direct matches
    2. Return all models in that provider's slice

    **Example**:

    ```go theme={null}
    models := GetModelsForProvider("openai")
    // Returns: ["gpt-4o", "gpt-4o-mini", "gpt-4-turbo", "gpt-3.5-turbo", ...]
    ```

    **Used by**:

    * Routing Methods to validate `allowed_models`
    * Dashboard model selector dropdowns
    * API responses for `/v1/models?provider=openai`
  </Accordion>

  <Accordion title="GetProvidersForModel(model)">
    **Purpose**: Find all providers that support a specific model

    **Lookup Process**:

    1. **Direct lookup**: Check each provider's model list in `modelPool`
    2. **Cross-provider resolution**: Apply special handling for proxy providers

    **Special Cross-Provider Rules**:

    <Steps>
      <Step title="OpenRouter Format">
        If model is not found directly, check if `provider/model` exists in OpenRouter

        ```go theme={null}
        // Request: claude-3-5-sonnet
        // Checks: openrouter models for "anthropic/claude-3-5-sonnet"
        // Result: Adds "openrouter" to providers list
        ```
      </Step>

      <Step title="Vertex Format">
        If model is not found directly, check if `provider/model` exists in Vertex

        ```go theme={null}
        // Request: claude-3-5-sonnet
        // Checks: vertex models for "anthropic/claude-3-5-sonnet"
        // Result: Adds "vertex" to providers list
        ```
      </Step>

      <Step title="Groq OpenAI Compatibility">
        For GPT models, check if `openai/model` exists in Groq

        ```go theme={null}
        // Request: gpt-3.5-turbo
        // Checks: groq models for "openai/gpt-3.5-turbo"
        // Result: Adds "groq" to providers list
        ```
      </Step>

      <Step title="Bedrock Claude Models">
        For Claude models, check Bedrock with flexible matching

        ```go theme={null}
        // Request: claude-3-5-sonnet
        // Checks: bedrock models containing "claude-3-5-sonnet"
        // Matches: "anthropic.claude-3-5-sonnet-20240620-v1:0"
        // Result: Adds "bedrock" to providers list
        ```
      </Step>
    </Steps>

    **Example**:

    ```go theme={null}
    providers := GetProvidersForModel("claude-3-5-sonnet")
    // Returns: ["anthropic", "vertex", "bedrock", "openrouter"]
    // Even though the request was just "claude-3-5-sonnet"!
    ```

    **Used by**:

    * Load balancing to find candidate providers
    * Fallback generation
    * Model validation in requests
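
    The cross-provider rules can be sketched in Go as follows. The sketch makes two assumptions:

    * `modelPool` is a plain `map[string][]string`.
    * A hypothetical `vendorPrefix` helper guesses the upstream vendor from the model name (`claude` → `anthropic`, `gpt` → `openai`, `gemini` → `google`).

    Bifrost's real matching may differ in detail and in ordering.

    ```go
    package main

    import "strings"

    // vendorPrefix guesses the upstream vendor of a bare model name.
    // Hypothetical helper; a real catalog may store this mapping explicitly.
    func vendorPrefix(model string) string {
        switch {
        case strings.HasPrefix(model, "claude"):
            return "anthropic"
        case strings.HasPrefix(model, "gpt"):
            return "openai"
        case strings.HasPrefix(model, "gemini"):
            return "google"
        }
        return ""
    }

    func contains(models []string, target string) bool {
        for _, m := range models {
            if m == target {
                return true
            }
        }
        return false
    }

    // getProvidersForModel returns every provider that can serve model:
    // direct matches first, then the proxy-provider rules.
    func getProvidersForModel(pool map[string][]string, model string) []string {
        var out []string
        seen := map[string]bool{}
        add := func(p string) {
            if !seen[p] {
                seen[p] = true
                out = append(out, p)
            }
        }
        // 1. Direct lookup in every provider's model list.
        for provider, models := range pool {
            if contains(models, model) {
                add(provider)
            }
        }
        vendor := vendorPrefix(model)
        if vendor == "" {
            return out
        }
        prefixed := vendor + "/" + model
        // 2. OpenRouter and Vertex list models as "vendor/model".
        for _, p := range []string{"openrouter", "vertex"} {
            if contains(pool[p], prefixed) {
                add(p)
            }
        }
        // 3. Groq exposes some OpenAI models as "openai/model".
        if vendor == "openai" && contains(pool["groq"], prefixed) {
            add("groq")
        }
        // 4. Bedrock IDs embed the Claude name, e.g. "anthropic.claude-...-v1:0".
        if vendor == "anthropic" {
            for _, m := range pool["bedrock"] {
                if strings.Contains(m, model) {
                    add("bedrock")
                    break
                }
            }
        }
        return out
    }
    ```

    The direct lookup ranges over a Go map, so this sketch does not guarantee the order of the returned providers. Callers that need a stable order should sort or rank the result. The Bedrock substring match is deliberately loose.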
  </Accordion>

  <Accordion title="Pricing Lookup with Fallbacks">
    **Purpose**: Get pricing data for cost calculation and model validation

    **Lookup Key**: `model|provider|mode` (e.g., `gpt-4o|openai|chat`)

    **Fallback Chain**:

    1. **Primary lookup**: `model|provider|mode`
    2. **Gemini → Vertex**: If Gemini not found, try Vertex with same model
    3. **Vertex format stripping**: For `provider/model`, strip prefix and retry
    4. **Bedrock prefix handling**: For Claude models, try with `anthropic.` prefix
    5. **Responses → Chat**: If Responses mode not found, try Chat mode

    **Example Flow**:

    ```go theme={null}
    // Request: claude-3-5-sonnet on Gemini (Responses API)

    // 1. Try: claude-3-5-sonnet|gemini|responses → Not found
    // 2. Try: claude-3-5-sonnet|vertex|responses → Not found
    // 3. Try: claude-3-5-sonnet|vertex|chat → ✅ Found!

    // Pricing returned from vertex/chat mode
    ```

    **Used by**:

    * Cost calculation for billing
    * Model validation during routing
    * Budget enforcement
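
    The fallback chain can be sketched in Go as follows. It is written to reproduce the example trace above: `gemini|responses`, then `vertex|responses`, then `vertex|chat`. The `Pricing` type and the `lookupPricing` function are hypothetical, and the real step ordering in Bifrost may differ.

    ```go
    package main

    import "strings"

    // Pricing is a hypothetical pricing record.
    type Pricing struct{ InputPerToken, OutputPerToken float64 }

    // lookupPricing walks the documented fallback chain (sketch only).
    func lookupPricing(table map[string]Pricing, model, provider, mode string) (Pricing, bool) {
        try := func(m, p string) (Pricing, bool) {
            v, ok := table[m+"|"+p+"|"+mode]
            return v, ok
        }
        // 1. Primary lookup.
        if v, ok := try(model, provider); ok {
            return v, true
        }
        // 2. Gemini models may be priced under Vertex.
        if provider == "gemini" {
            provider = "vertex"
            if v, ok := try(model, provider); ok {
                return v, true
            }
        }
        // 3. Vertex "vendor/model" IDs: strip the vendor prefix and retry.
        if provider == "vertex" {
            if _, bare, found := strings.Cut(model, "/"); found {
                model = bare
                if v, ok := try(model, provider); ok {
                    return v, true
                }
            }
        }
        // 4. Bedrock Claude IDs carry an "anthropic." prefix.
        if provider == "bedrock" && strings.HasPrefix(model, "claude") {
            if v, ok := try("anthropic."+model, provider); ok {
                return v, true
            }
        }
        // 5. Responses-mode pricing falls back to Chat-mode pricing.
        if mode == "responses" {
            return lookupPricing(table, model, provider, "chat")
        }
        return Pricing{}, false
    }
    ```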
  </Accordion>
</AccordionGroup>

### Syncing Behavior

<AccordionGroup>
  <Accordion title="Initial Sync (Startup)">
    When Bifrost starts, it performs a complete model catalog initialization:

    **Step-by-step process** (from `server.go:Bootstrap()`):

    <Steps>
      <Step title="Load Pricing Data">
        ```go theme={null}
        // 1. Download from URL
        pricingData := loadPricingFromURL(ctx)

        // 2. Store in database (if configStore available)
        configStore.CreateModelPrices(ctx, pricingData)

        // 3. Load into memory cache
        mc.pricingData = map[string]TableModelPricing{...}
        ```
      </Step>

      <Step title="Populate Initial Model Pool">
        ```go theme={null}
        // Build modelPool from pricing data
        mc.populateModelPoolFromPricingData()
        // Result: modelPool[provider] = [models from pricing]
        ```
      </Step>

      <Step title="Fetch Dynamic Models">
        ```go theme={null}
        // Call ListAllModels for all configured providers
        modelData, err := client.ListAllModels(ctx, nil)

        // Add results to model pool
        mc.AddModelDataToPool(modelData)
        // Result: modelPool enriched with provider-specific models
        ```
      </Step>

      <Step title="Handle Failures Gracefully">
        If the List Models API call fails for a provider:

        ```json theme={null}
        {"level":"warn","message":"failed to list models for provider ollama: connection refused"}
        ```

        * Logged as warning, **does not stop startup**
        * Provider remains usable with models from pricing data
        * Can be manually refreshed later via API
      </Step>
    </Steps>

    **Result**: Bifrost is ready with a comprehensive model catalog combining both sources.
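
    The graceful-failure behavior can be sketched as follows. The sketch assumes one hypothetical `ListModelsFunc` per provider and a pool already seeded from pricing data. The function names are illustrative and are not taken from Bifrost's `Bootstrap()` code.

    ```go
    package main

    import (
        "context"
        "log"
    )

    // ListModelsFunc stands in for one provider's list-models call (hypothetical).
    type ListModelsFunc func(ctx context.Context) ([]string, error)

    func containsModel(models []string, target string) bool {
        for _, m := range models {
            if m == target {
                return true
            }
        }
        return false
    }

    // bootstrapPool copies the pricing-derived pool, then enriches it with each
    // provider's live model list. A failing provider is logged and skipped, so
    // startup continues and that provider keeps its pricing-derived models.
    func bootstrapPool(ctx context.Context, fromPricing map[string][]string,
        listers map[string]ListModelsFunc) map[string][]string {
        pool := make(map[string][]string, len(fromPricing))
        for provider, models := range fromPricing {
            pool[provider] = append([]string(nil), models...)
        }
        for provider, list := range listers {
            models, err := list(ctx)
            if err != nil {
                log.Printf("warn: failed to list models for provider %s: %v", provider, err)
                continue
            }
            for _, m := range models {
                if !containsModel(pool[provider], m) {
                    pool[provider] = append(pool[provider], m)
                }
            }
        }
        return pool
    }
    ```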
  </Accordion>

  <Accordion title="Ongoing Sync (Background)">
    While Bifrost is running, the catalog stays up-to-date through background workers:

    **Pricing Data Sync**:

    * Background worker runs every **1 hour** (ticker interval)
    * Checks if **24 hours** have elapsed since last sync (configurable)
    * If yes, downloads fresh pricing data and updates database + memory cache
    * Timer resets after successful sync
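
    The hourly-tick, 24-hour-interval pattern can be sketched with `time.Ticker`. `shouldSync`, `runPricingSync` and the `refresh` callback are hypothetical names. The sketch assumes a failed refresh is retried on the next tick, which follows from resetting the timer only after a successful sync.

    ```go
    package main

    import (
        "context"
        "time"
    )

    // shouldSync reports whether at least `interval` has passed since lastSync.
    func shouldSync(lastSync, now time.Time, interval time.Duration) bool {
        return now.Sub(lastSync) >= interval
    }

    // runPricingSync wakes on every tick but refreshes only when the interval has
    // elapsed. lastSync advances only after a successful refresh, so a failed
    // download is attempted again on the next tick.
    func runPricingSync(ctx context.Context, tick, interval time.Duration,
        lastSync time.Time, refresh func(context.Context) error) {
        ticker := time.NewTicker(tick)
        defer ticker.Stop()
        for {
            select {
            case <-ctx.Done():
                return
            case now := <-ticker.C:
                if !shouldSync(lastSync, now, interval) {
                    continue
                }
                if err := refresh(ctx); err == nil {
                    lastSync = now
                }
            }
        }
    }
    ```

    A caller would typically start the loop in a goroutine, for example with `go runPricingSync(ctx, time.Hour, 24*time.Hour, lastSync, refresh)`.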

    **List Models API Sync**:
    Triggered by these events:

    1. **Provider Added**: When a new provider is configured
       ```bash theme={null}
       POST /api/v1/providers
       # Automatically calls ListModels for the new provider
       ```

    2. **Provider Updated**: When provider config changes (keys, endpoints, etc.)
       ```bash theme={null}
       PUT /api/v1/providers/{provider}
       # Refetches models to detect changes
       ```

    3. **Manual Refresh**: Via API endpoint
       ```bash theme={null}
       POST /api/v1/providers/{provider}/models/refetch
       # Explicitly refetches models for a provider
       ```

    4. **Manual Delete + Refetch**: Clear and reload models
       ```bash theme={null}
       DELETE /api/v1/providers/{provider}/models
       POST /api/v1/providers/{provider}/models/refetch
       # Useful when models are out of sync
       ```

    **Failure Handling**:

    * Pricing URL fails but database has data → Use cached database records
    * Pricing URL fails and no database data → Error logged, existing memory cache retained
    * List models API fails → Log warning, retain existing model pool entries
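
    The pricing failure-handling order can be sketched as a chain of loaders. `PriceTable`, `loadPricing` and the loader callbacks are hypothetical. For brevity, the sketch omits persisting a fresh download to the database.

    ```go
    package main

    import "errors"

    // PriceTable is a hypothetical pricing map keyed by "model|provider|mode".
    type PriceTable map[string]float64

    // loadPricing tries the remote URL, then the database copy, then the
    // in-memory cache. The cache is returned with an error so the caller can log
    // the failed refresh and keep serving.
    func loadPricing(fetchURL, loadDB func() (PriceTable, error), cached PriceTable) (PriceTable, error) {
        if t, err := fetchURL(); err == nil {
            return t, nil
        }
        if t, err := loadDB(); err == nil && len(t) > 0 {
            return t, nil
        }
        if cached != nil {
            return cached, errors.New("pricing refresh failed; keeping in-memory cache")
        }
        return nil, errors.New("pricing unavailable from URL, database, or cache")
    }
    ```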
  </Accordion>

  <Accordion title="Fallback Strategy">
    Bifrost's multi-layered approach ensures high availability:

    **Layer 1: Pricing Data Persistence**

    ```
    URL fails → Database → Memory cache → Continue operation
    ```

    **Layer 2: Model Pool Redundancy**

    ```
    ListModels fails → Pricing data models → Continue with reduced catalog
    ```

    **Layer 3: Runtime Validation**

    ```
    Model not in catalog → Special cross-provider rules → May still work
    ```

    **Example Scenario**:

    ```
    Situation:
    - Pricing URL is down
    - OpenAI ListModels API is down
    - User requests gpt-4o on OpenAI

    Bifrost's Response:
    1. ✅ Pricing data available from database (last sync 12h ago)
    2. ✅ Model pool has gpt-4o from previous ListModels call
    3. ✅ Request proceeds normally
    4. 📊 Cost calculated from cached pricing data
    ```

    This design ensures that **sync issues alone do not fail requests**, as long as at least one data source still holds the data a request needs.
  </Accordion>
</AccordionGroup>

### Allowed Models Behavior with Examples

The `allowed_models` field in provider configs controls which models can be used with that provider. Understanding its behavior is crucial for governance routing.

<Tabs>
  <Tab title="Wildcard allowed_models (Use Catalog)">
    **Configuration**:

    ```json theme={null}
    {
      "provider_configs": [
        {
          "provider": "openai",
          "allowed_models": ["*"],
          "weight": 1.0
        }
      ]
    }
    ```

    **Behavior**:

    * Bifrost calls `GetModelsForProvider("openai")`
    * Returns all models in `modelPool["openai"]`
    * Request validated against catalog

    **Examples**:

    ```bash theme={null}
    # ✅ Allowed (in catalog)
    curl -H "x-bf-vk: vk-123" -d '{"model": "gpt-4o"}'

    # ✅ Allowed (in catalog)
    curl -H "x-bf-vk: vk-123" -d '{"model": "gpt-3.5-turbo"}'

    # ❌ Rejected (not in OpenAI catalog)
    curl -H "x-bf-vk: vk-123" -d '{"model": "claude-3-5-sonnet"}'
    ```

    **Use Cases**:

    * Default behavior for most deployments
    * Automatically stays up-to-date with provider's model offerings
    * No manual model list maintenance required

    <Warning>
      Using `"allowed_models": []` (empty array) means **deny all models** - no
      requests will be served. Use `["*"]` to allow all models via the catalog.
    </Warning>
  </Tab>

  <Tab title="Explicit allowed_models (Strict Control)">
    **Configuration**:

    ```json theme={null}
    {
      "provider_configs": [
        {
          "provider": "openai",
          "allowed_models": ["gpt-4o", "gpt-4o-mini"], // Only these two
          "weight": 1.0
        },
        {
          "provider": "anthropic",
          "allowed_models": ["claude-3-5-sonnet-20241022"], // Specific version
          "weight": 1.0
        }
      ]
    }
    ```

    **Behavior**:

    * Bifrost validates request model against explicit list
    * Catalog is **ignored** for this provider
    * Supports both direct matches and provider-prefixed entries
    * Case-sensitive matching

    **Examples**:

    ```bash theme={null}
    # ✅ Allowed (in explicit list)
    curl -H "x-bf-vk: vk-123" -d '{"model": "gpt-4o"}'

    # ❌ Rejected (not in explicit list)
    curl -H "x-bf-vk: vk-123" -d '{"model": "gpt-4-turbo"}'
    # Even though gpt-4-turbo is in the OpenAI catalog!

    # ✅ Allowed (exact match for Anthropic)
    curl -H "x-bf-vk: vk-123" -d '{"model": "claude-3-5-sonnet-20241022"}'

    # ❌ Rejected (version mismatch)
    curl -H "x-bf-vk: vk-123" -d '{"model": "claude-3-5-sonnet-20240620"}'
    ```

    **Provider-Prefixed Entries**:

    You can also use provider-prefixed model names in `allowed_models`. Bifrost will strip the prefix and match against the requested model:

    ```json theme={null}
    {
      "provider_configs": [
        {
          "provider": "openrouter",
          "allowed_models": ["openai/gpt-4o", "anthropic/claude-3-5-sonnet"],
          "weight": 1.0
        }
      ]
    }
    ```

    **How it works**:

    ```bash theme={null}
    # Request without prefix
    curl -H "x-bf-vk: vk-123" -d '{"model": "gpt-4o"}'

    # 1. Checks: "openai/gpt-4o" in allowed_models
    # 2. Strips prefix: "openai/gpt-4o" → "gpt-4o"
    # 3. Compares: "gpt-4o" == "gpt-4o" ✅
    # 4. Result: Allowed and routed to OpenRouter
    ```

    This is particularly useful for proxy providers (OpenRouter, Vertex) where you want to explicitly control which upstream models are accessible.
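
    The `allowed_models` matching rules in these tabs can be sketched in Go. The sketch makes two assumptions:

    * The wildcard is exactly `["*"]`.
    * A prefixed entry is handled by dropping everything up to the first `/`.

    `isModelAllowed` and `matchesEntry` are hypothetical names and are not part of Bifrost's API.

    ```go
    package main

    import "strings"

    // matchesEntry accepts an exact match, or a "vendor/model" entry whose
    // part after the first "/" equals the requested model. Case-sensitive.
    func matchesEntry(entry, model string) bool {
        if entry == model {
            return true
        }
        _, bare, found := strings.Cut(entry, "/")
        return found && bare == model
    }

    // isModelAllowed applies the allowed_models rules:
    //   ["*"]  -> defer to the provider's Model Catalog entries
    //   []     -> deny everything
    //   [a, b] -> explicit list only; the catalog is ignored
    func isModelAllowed(allowed, catalog []string, model string) bool {
        list := allowed
        if len(allowed) == 1 && allowed[0] == "*" {
            list = catalog
        }
        for _, entry := range list {
            if matchesEntry(entry, model) {
                return true
            }
        }
        return false
    }
    ```

    In this sketch, the same prefix-aware match is applied to catalog entries under `["*"]`. That is why a wildcard OpenRouter config accepts `claude-3-5-sonnet` when the catalog lists `anthropic/claude-3-5-sonnet`.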

    **Use Cases**:

    * Compliance requirements (only approved models)
    * Cost control (restrict to cheaper models)
    * Version pinning (prevent automatic updates)
    * Testing specific model versions
    * **Explicit cross-provider routing** (e.g., only allow OpenAI models via OpenRouter)
  </Tab>

  <Tab title="Aliases (Key-Level)">
    **Key Concept**: Aliases are **key-level** mappings that allow user-friendly model names to map to provider-specific identifiers.

    **How Aliases Work**:

    * Defined at the **Key level**, not Virtual Key level
    * Structure: `aliases: {"user-facing-name": "provider-specific-id"}`
    * **Alias key** (left side): User-facing model name used in requests
    * **Provider ID** (right side): Provider-specific identifier sent to the API

    **Azure OpenAI Example**:

    Provider configuration with alias mapping:

    ```json theme={null}
    {
      "providers": {
        "azure": {
          "keys": [
            {
              "name": "azure-prod-key",
              "value": "your-api-key",
              "aliases": {
                "gpt-4o": "my-prod-gpt4o-deployment",
                "gpt-4o-mini": "my-mini-deployment"
              },
              "azure_key_config": {
                "endpoint": "https://your-resource.openai.azure.com"
              }
            }
          ]
        }
      }
    }
    ```

    **What Happens**:

    1. **Allowed models derived from aliases**: `["gpt-4o", "gpt-4o-mini"]`
    2. **User requests with alias**: `{"model": "gpt-4o"}`
    3. **Bifrost validates**: `gpt-4o` is in derived allowed models ✅
    4. **Bifrost resolves alias**: `gpt-4o` → `my-prod-gpt4o-deployment`
    5. **Sent to Azure**: Uses `my-prod-gpt4o-deployment` as the deployment name
    6. **Pricing lookup**: If pricing for resolved ID not found, falls back to alias `gpt-4o`

    **Bedrock Example with Inference Profiles**:

    ```json theme={null}
    {
      "providers": {
        "bedrock": {
          "keys": [
            {
              "name": "bedrock-key",
              "aliases": {
                "claude-sonnet": "us.anthropic.claude-3-5-sonnet-20241022-v2:0",
                "claude-opus": "us.anthropic.claude-3-opus-20240229-v1:0"
              },
              "bedrock_key_config": {
                "access_key": "your-access-key",
                "secret_key": "your-secret-key",
                "region": "us-east-1"
              }
            }
          ]
        }
      }
    }
    ```

    **What Happens**:

    1. **Allowed models**: `["claude-sonnet", "claude-opus"]` (from alias keys)
    2. **User requests**: `{"model": "claude-sonnet"}`
    3. **Bifrost validates**: `claude-sonnet` in allowed models ✅
    4. **Resolves alias**: `claude-sonnet` → `us.anthropic.claude-3-5-sonnet-20241022-v2:0`
    5. **Sent to Bedrock**: The full inference profile ID is used in the API call

    **Priority of Model Restrictions**:

    When determining allowed models for a key:

    ```
    1. If key.models is NOT empty → Use key.models
    2. Else if aliases exist → Use alias keys
    3. Else → All models allowed (use Model Catalog)
    ```

    **Example with Both**:

    ```json theme={null}
    {
      "keys": [
        {
          "models": ["gpt-4o", "gpt-3.5-turbo"], // Explicit restriction
          "aliases": {
            "gpt-4o": "my-deployment",
            "gpt-4-turbo": "another-deployment" // NOT accessible!
          },
          "azure_key_config": {
            "endpoint": "https://your-resource.openai.azure.com"
          }
        }
      ]
    }
    ```

    Result: Only `["gpt-4o", "gpt-3.5-turbo"]` allowed (models field takes priority)
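
    The priority order above can be sketched as follows. `Key` and `allowedModelsForKey` are hypothetical names that mirror the `models` and `aliases` fields. The boolean result signals whether the key restricts models at all.

    ```go
    package main

    import "sort"

    // Key is a trimmed-down, hypothetical view of a provider key.
    type Key struct {
        Models  []string
        Aliases map[string]string
    }

    // allowedModelsForKey applies the documented priority. It returns false when
    // the key imposes no restriction and the Model Catalog decides.
    func allowedModelsForKey(k Key) ([]string, bool) {
        if len(k.Models) > 0 {
            return k.Models, true // 1. explicit models win
        }
        if len(k.Aliases) > 0 {
            names := make([]string, 0, len(k.Aliases))
            for alias := range k.Aliases {
                names = append(names, alias) // 2. alias keys
            }
            sort.Strings(names)
            return names, true
        }
        return nil, false // 3. no restriction
    }
    ```

    Sorting the alias names only makes the output deterministic, because Go map iteration order is random.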

    **Vertex Example** (similar pattern):

    ```json theme={null}
    {
      "keys": [
        {
          "aliases": {
            "claude-3-5-sonnet": "anthropic/claude-3-5-sonnet@20241022",
            "gemini-pro": "google/gemini-1.5-pro"
          },
          "vertex_key_config": {
            "project_id": "my-project",
            "region": "us-central1"
          }
        }
      ]
    }
    ```

    **Use Cases for Aliases**:

    * **Azure**: Map generic model names to specific deployment names in your Azure resource
    * **Bedrock**: Use short aliases for long inference profile ARNs
    * **Vertex**: Map to specific model versions or regional endpoints
    * **Multi-environment**: Different aliases per key (dev/staging/prod)

    **Key Insight**:

    ```
    User Request: {"model": "gpt-4o"}
                  ↓
    Validation: Check if "gpt-4o" in allowed models (derived from aliases)
                  ↓
    Mapping: aliases["gpt-4o"] → "my-prod-gpt4o-deployment"
                  ↓
    API Call: Uses "my-prod-gpt4o-deployment" as deployment ID
                  ↓
    Pricing: Falls back to "gpt-4o" if resolved ID not in pricing data
    ```

    This allows user-friendly model names in requests while supporting provider-specific identifier patterns at the key level.
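
    The resolve-then-price flow can be sketched with two small Go functions. `resolveModel` and `pricingModel` are hypothetical names. Passing a name that has no alias through unchanged is an assumption.

    ```go
    package main

    // resolveModel maps a user-facing name to the provider-specific ID using the
    // key's aliases. Names without an alias are passed through unchanged.
    func resolveModel(aliases map[string]string, requested string) string {
        if id, ok := aliases[requested]; ok {
            return id
        }
        return requested
    }

    // pricingModel picks the name used for the cost lookup: the resolved ID if it
    // has pricing, otherwise the alias the user sent.
    func pricingModel(hasPricing func(string) bool, requested, resolved string) string {
        if hasPricing(resolved) {
            return resolved
        }
        return requested
    }
    ```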
  </Tab>

  <Tab title="Cross-Provider Model Routing">
    **Configuration**:

    ```json theme={null}
    {
      "provider_configs": [
        {
          "provider": "openai",
          "allowed_models": ["gpt-4o"],
          "weight": 0.5
        },
        {
          "provider": "azure",
          "allowed_models": ["gpt-4o"],
          "weight": 0.5
        }
      ]
    }
    ```

    **Request**:

    ```bash theme={null}
    curl -H "x-bf-vk: vk-123" \
         -d '{"model": "gpt-4o"}'
    ```

    **Routing Behavior**:

    1. **Model validation**: Both providers have `gpt-4o` in allowed\_models ✅
    2. **Weighted selection**: 50% chance each
    3. **Provider selected**: Let's say Azure
    4. **Model transformation**: `gpt-4o` → `azure/gpt-4o`
    5. **Fallbacks**: `["openai/gpt-4o"]` (remaining providers)

    **Special Cross-Provider Scenarios**:

    <Steps>
      <Step title="OpenRouter as Universal Proxy">
        ```json theme={null}
        {
          "provider_configs": [
            {
              "provider": "openrouter",
              "allowed_models": ["*"]
            }
          ]
        }
        ```

        Request `claude-3-5-sonnet`:

        * Bifrost checks: `GetModelsForProvider("openrouter")`
        * Finds: `anthropic/claude-3-5-sonnet` in OpenRouter catalog
        * ✅ Allowed, routes to OpenRouter
      </Step>

      <Step title="Weighted Routing via Proxy Provider">
        **Use Case**: Route 99% of OpenAI traffic through OpenRouter for cost savings, keep 1% direct for fallback

        ```json theme={null}
        {
          "provider_configs": [
            {
              "provider": "openai",
              "allowed_models": ["gpt-4o"],
              "weight": 0.01  // 1% direct to OpenAI
            },
            {
              "provider": "openrouter",
              "allowed_models": ["openai/gpt-4o"],  // Provider-prefixed
              "weight": 0.99  // 99% via OpenRouter
            }
          ]
        }
        ```

        Request `gpt-4o`:

        * **OpenAI check**: `"gpt-4o"` in `["gpt-4o"]` → ✅ Allowed
        * **OpenRouter check**: Strips prefix from `"openai/gpt-4o"` → matches `"gpt-4o"` → ✅ Allowed
        * **Weighted selection**: 99% chance → OpenRouter selected
        * **Final model**: `openrouter/gpt-4o`
        * **Fallbacks**: `["openai/gpt-4o"]` (1% provider as fallback)

        **Why this works**: Bifrost supports provider-prefixed entries in `allowed_models`, so `"openai/gpt-4o"` matches requests for `"gpt-4o"`.
      </Step>

      <Step title="Vertex as Multi-Provider Gateway">
        ```json theme={null}
        {
          "provider_configs": [
            {
              "provider": "vertex",
              "allowed_models": ["claude-3-5-sonnet", "gemini-1.5-pro"]
            }
          ]
        }
        ```

        Request `claude-3-5-sonnet`:

        * Model catalog lookup: `GetProvidersForModel("claude-3-5-sonnet")`
        * Finds: `["anthropic", "vertex", "bedrock"]`
        * Validation: `claude-3-5-sonnet` in allowed\_models ✅
        * Sends to Vertex as: `anthropic/claude-3-5-sonnet`
      </Step>

      <Step title="Groq OpenAI Compatibility">
        ```json theme={null}
        {
          "provider_configs": [
            {
              "provider": "groq",
              "allowed_models": ["gpt-3.5-turbo"]
            }
          ]
        }
        ```

        Request `gpt-3.5-turbo`:

        * Special handling: Checks Groq catalog for `openai/gpt-3.5-turbo`
        * ✅ Found, validation passes
        * Sends to Groq as: `openai/gpt-3.5-turbo`
      </Step>
    </Steps>
  </Tab>
</Tabs>

### How It's Used in Routing

<Tabs>
  <Tab title="Governance Routing">
    When a Virtual Key has `provider_configs`, governance uses the model catalog for validation:

    **Wildcard allowed\_models Example**:

    ```json theme={null}
    {
      "provider_configs": [
        {
          "provider": "openai",
          "allowed_models": ["*"],
          "weight": 0.5
        }
      ]
    }
    ```

    **Request Flow**:

    ```bash theme={null}
    curl -H "x-bf-vk: vk-123" -d '{"model": "gpt-4o"}'

    # 1. Governance checks: Is "gpt-4o" in GetModelsForProvider("openai")?
    # 2. Catalog lookup: modelPool["openai"] contains "gpt-4o" ✅
    # 3. Validation passes, provider selected
    # 4. Model becomes: "openai/gpt-4o"
    ```

    **Rejection Example**:

    ```bash theme={null}
    curl -H "x-bf-vk: vk-123" -d '{"model": "claude-3-5-sonnet"}'

    # 1. Governance checks: Is "claude-3-5-sonnet" in GetModelsForProvider("openai")?
    # 2. Catalog lookup: modelPool["openai"] does NOT contain "claude-3-5-sonnet" ❌
    # 3. Validation fails, request rejected
    # 4. Error: "model not allowed for any configured provider"
    ```
  </Tab>

  <Tab title="Load Balancing">
    When load balancing selects providers, it queries the catalog to find candidates:

    **Request Flow**:

    ```bash theme={null}
    curl -X POST http://localhost:8080/v1/chat/completions \
      -d '{"model": "gpt-4o", "messages": [...]}'

    # 1. Load balancer: GetProvidersForModel("gpt-4o")
    # 2. Catalog returns: ["openai", "azure", "groq"]
    # 3. Filter by configured providers: ["openai", "azure"]  (groq not configured)
    # 4. Performance scoring: openai=0.95, azure=0.87
    # 5. Select: openai (highest score)
    # 6. Model becomes: "openai/gpt-4o"
    # 7. Fallbacks: ["azure/gpt-4o"]
    ```

    **Cross-Provider Discovery**:

    ```bash theme={null}
    curl -d '{"model": "claude-3-5-sonnet"}'

    # 1. Load balancer: GetProvidersForModel("claude-3-5-sonnet")
    # 2. Catalog checks:
    #    - Direct: ["anthropic"] ✅
    #    - OpenRouter: Has "anthropic/claude-3-5-sonnet" ✅
    #    - Vertex: Has "anthropic/claude-3-5-sonnet" ✅
    #    - Bedrock: Has "anthropic.claude-3-5-sonnet-..." ✅
    # 3. Catalog returns: ["anthropic", "openrouter", "vertex", "bedrock"]
    # 4. Performance scoring across all four
    # 5. Best performer selected
    ```

    This is how Bifrost achieves **intelligent cross-provider routing** without manual configuration.
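
    The filter, score and select steps above can be sketched as follows. The scores are hypothetical inputs; this sketch does not show how adaptive load balancing computes them. Ordering the fallbacks by descending score is an assumption.

    ```go
    package main

    import "sort"

    // pickByScore keeps the catalog candidates that are configured, orders them
    // by score (highest first), and returns the winner plus the rest as fallbacks.
    func pickByScore(candidates []string, configured map[string]bool, scores map[string]float64) (string, []string) {
        var usable []string
        for _, p := range candidates {
            if configured[p] {
                usable = append(usable, p)
            }
        }
        if len(usable) == 0 {
            return "", nil
        }
        sort.SliceStable(usable, func(i, j int) bool { return scores[usable[i]] > scores[usable[j]] })
        return usable[0], usable[1:]
    }
    ```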
  </Tab>
</Tabs>

<Note>
  **Model Catalog is essential for cross-provider routing**. Without it, Bifrost
  wouldn't know that `gpt-4o` is available from OpenAI, Azure, and Groq, or that
  `claude-3-5-sonnet` can be routed through Anthropic, Vertex, Bedrock, and
  OpenRouter. This knowledge powers both governance validation and load
  balancing provider discovery.
</Note>

***

## Default Provider Resolution

<Info>
  Default provider resolution via model catalog is available in **Bifrost
  v1.5.0-prerelease7 and above**.
</Info>

When a request includes a bare model name without a `provider/` prefix (e.g., `"model": "gpt-4o"` instead of `"model": "openai/gpt-4o"`), Bifrost automatically resolves the provider using the Model Catalog. This default is applied **after all other routing engines** have run. The built-in `model-catalog-resolver` PreRequestHook plugin is registered as the last routing layer (order 9 within `builtin`). As a result, governance routing rules, VK load balancing and enterprise LB all get the first chance to choose a provider.

### How It Works

1. **Request arrives** without a provider prefix (e.g., `"model": "gpt-4o"`)
2. Governance, VK LB, and enterprise LB all run first; if any of them sets `req.Provider`, the resolver no-ops
3. **Catalog lookup** (if `req.Provider` is still empty): Bifrost calls `GetProvidersForModel("gpt-4o")` to find all providers that support the model
4. **Provider selected**: If the request came in via an integration route (OpenAI / Anthropic / GenAI / Bedrock / Cohere) and the catalog includes that integration's canonical provider in the candidate list, it is preferred. Otherwise the first candidate is selected.
5. **Request continues**: The resolved `provider/model` is used for the provider call, fallback handling, and Level 2 key selection.

This is logged as the **`model-catalog`** routing engine in telemetry and routing logs, with a message like:

```
No provider specified for model gpt-4o, found 3 options in model catalog:
[openai, azure, groq], selected: openai
```
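
The selection rule in the steps above can be sketched as a small Go function. `resolveProvider` is a hypothetical name. Its two inputs are:

* `current`: stands for `req.Provider` after the earlier engines have run.
* `integration`: the integration route's canonical provider, or empty when the request did not arrive through an integration route.

```go
package main

// resolveProvider sketches the model-catalog resolver. It no-ops when an
// earlier engine already chose a provider, prefers the integration's
// canonical provider when the catalog lists it, and otherwise takes the
// first candidate.
func resolveProvider(current, integration string, candidates []string) (string, bool) {
    if current != "" {
        return current, true // governance / VK LB / enterprise LB already decided
    }
    if len(candidates) == 0 {
        return "", false // caller must ask for the "provider/model" format
    }
    for _, p := range candidates {
        if integration != "" && p == integration {
            return p, true
        }
    }
    return candidates[0], true
}
```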

### Example

```bash theme={null}
# These two requests are equivalent when the model catalog
# maps gpt-4o → openai as the first provider:
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello!"}]}'

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-4o", "messages": [{"role": "user", "content": "Hello!"}]}'
```

<Note>
  If the model catalog is not available or the model is not found in any
  provider, the request returns an error asking for the `provider/model` format.
  For deterministic provider selection, always use the explicit `provider/model`
  prefix.
</Note>

### Routing allowlist enforcement

When a Virtual Key has `provider_configs`, governance publishes the VK's allowed-provider set to the request context (`BifrostContextKeyRoutingAllowedProviders`). The constraint is then enforced at **two levels**:

1. **Cooperative filtering (observability-first):** Enterprise LB and the model-catalog-resolver intersect their catalog candidates with the allowlist before picking a provider. This produces clean routing-engine logs explaining *why* a candidate was excluded ("filtered N catalog candidates by routing allowlist").

2. **Hard enforcement in core:** After all `PreRequestHook` plugins have run, the core validates the final `req.Provider` against the allowlist. If `req.Provider` isn't in the allowlist, the request fails with HTTP 400. Fallbacks that target non-allowed providers are silently filtered out.

**Why two levels:** cooperative filtering surfaces routing decisions in observability; core enforcement makes the constraint a *guarantee* that no plugin (or user-specified `provider/model` prefix) can bypass. A user request for `model: "anthropic/claude-3"` against a VK that allows only `[openai, azure]` is rejected by core enforcement even though the user provided an explicit prefix.

Custom routing plugins can set the same context key to constrain downstream routing for any reason — geo restrictions, A/B test cohorts, tier-based gating, etc. The semantics are **fail-closed**: setting `BifrostContextKeyRoutingAllowedProviders` to an empty slice means "no provider is permitted for this request" → HTTP 400.
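
The core enforcement step can be sketched as follows. The sketch uses Go's nil-versus-empty slice distinction:

* A nil allowlist means no constraint was published.
* A non-nil empty slice is fail-closed.

How Bifrost actually represents "unset" in the request context is an assumption, as are the `enforceAllowlist` name and the error value.

```go
package main

import (
    "errors"
    "strings"
)

// errProviderNotAllowed stands in for the HTTP 400 returned by the core.
var errProviderNotAllowed = errors.New("provider not permitted by routing allowlist")

// enforceAllowlist validates the final provider and silently drops fallbacks
// ("provider/model" strings) whose provider is not allowed.
func enforceAllowlist(allowlist []string, provider string, fallbacks []string) ([]string, error) {
    if allowlist == nil {
        return fallbacks, nil // no constraint published
    }
    allowed := make(map[string]bool, len(allowlist))
    for _, p := range allowlist {
        allowed[p] = true
    }
    if !allowed[provider] {
        return nil, errProviderNotAllowed // also rejects explicit user prefixes
    }
    var kept []string
    for _, f := range fallbacks {
        p, _, _ := strings.Cut(f, "/")
        if allowed[p] {
            kept = append(kept, f)
        }
    }
    return kept, nil
}
```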

***

## Governance-based Routing

Governance-based routing allows you to explicitly define which providers and models should handle requests for a specific Virtual Key. This method provides precise control over routing decisions.

### How It Works

When a Virtual Key has `provider_configs` defined:

1. **Request arrives** with a Virtual Key (e.g., `x-bf-vk: vk-prod-main`)
2. **Model validation**: Bifrost checks if the requested model is allowed for any configured provider
3. **Provider filtering**: Providers are filtered based on:
   * Model availability in `allowed_models`
   * Budget limits (current usage vs max limit)
   * Rate limits (tokens/requests per time window)
4. **Weighted selection**: A provider is selected using weighted random distribution
5. **Provider prefix added**: Model string becomes `provider/model` (e.g., `openai/gpt-4o`)
6. **Fallbacks created**: Remaining providers sorted by weight (descending) are added as fallbacks
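
Steps 3 to 6 can be sketched in Go as follows. The sketch makes three assumptions:

* `ProviderConfig` is a hypothetical, flattened view in which the allowed-model and rate-limit checks are already precomputed.
* A budget counts as exhausted once usage reaches `max_limit`.
* The random draw `r` is passed in, so the selection is deterministic under test.

```go
package main

import "sort"

// ProviderConfig is a hypothetical, flattened view of one VK provider config.
type ProviderConfig struct {
    Provider     string
    Weight       float64
    BudgetUsed   float64
    BudgetMax    float64 // 0 means no budget configured
    ModelAllowed bool    // assumed precomputed from allowed_models
    RateLimited  bool    // assumed precomputed from the rate-limit window
}

// eligible applies the provider filters: model, rate limit, and budget.
func eligible(c ProviderConfig) bool {
    if !c.ModelAllowed || c.RateLimited {
        return false
    }
    return c.BudgetMax == 0 || c.BudgetUsed < c.BudgetMax
}

// selectProvider picks one eligible provider by weighted random choice, with
// r drawn from [0, 1), and returns the others as fallbacks by weight (desc).
func selectProvider(configs []ProviderConfig, r float64) (string, []string) {
    var pool []ProviderConfig
    total := 0.0
    for _, c := range configs {
        if eligible(c) {
            pool = append(pool, c)
            total += c.Weight
        }
    }
    if len(pool) == 0 {
        return "", nil
    }
    chosen := len(pool) - 1 // default covers rounding near r=1 and all-zero weights
    acc := 0.0
    for i, c := range pool {
        acc += c.Weight
        if r*total < acc {
            chosen = i
            break
        }
    }
    var rest []ProviderConfig
    for i, c := range pool {
        if i != chosen {
            rest = append(rest, c)
        }
    }
    sort.SliceStable(rest, func(i, j int) bool { return rest[i].Weight > rest[j].Weight })
    fallbacks := make([]string, 0, len(rest))
    for _, c := range rest {
        fallbacks = append(fallbacks, c.Provider)
    }
    return pool[chosen].Provider, fallbacks
}
```

In a live request path, `r` would come from something like `rand.Float64()`.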

### Configuration Example

```json theme={null}
{
  "provider_configs": [
    {
      "provider": "openai",
      "allowed_models": ["gpt-4o", "gpt-4o-mini"],
      "weight": 0.3,
      "budget": {
        "max_limit": 100.0,
        "current_usage": 45.0
      }
    },
    {
      "provider": "azure",
      "allowed_models": ["gpt-4o"],
      "weight": 0.7,
      "rate_limit": {
        "token_max_limit": 100000,
        "token_reset_duration": "1m"
      }
    }
  ]
}
```

### Request Flow

<Steps>
  <Step title="Request with Virtual Key">
    ```bash theme={null}
    curl -X POST http://localhost:8080/v1/chat/completions \
      -H "x-bf-vk: vk-prod-main" \
      -d '{"model": "gpt-4o", "messages": [...]}'
    ```
  </Step>

  <Step title="Governance Evaluation">
    * OpenAI: ✅ Has `gpt-4o` in allowed\_models, budget OK, weight 0.3
    * Azure: ✅ Has `gpt-4o` in allowed\_models, rate limit OK, weight 0.7
  </Step>

  <Step title="Weighted Selection">
    * 70% chance → Azure
    * 30% chance → OpenAI
  </Step>

  <Step title="Request Transformation">
    ```json theme={null}
    {
      "model": "azure/gpt-4o",
      "messages": [...],
      "fallbacks": ["openai/gpt-4o"]
    }
    ```
  </Step>
</Steps>

### Key Features

| Feature                   | Description                                                   |
| ------------------------- | ------------------------------------------------------------- |
| **Explicit Control**      | Define exactly which providers and models are accessible      |
| **Budget Enforcement**    | Automatically exclude providers exceeding budget limits       |
| **Rate Limit Protection** | Skip providers that have hit rate limits                      |
| **Weighted Distribution** | Control traffic distribution with custom weights              |
| **Automatic Fallbacks**   | If the selected provider fails, the request retries on the remaining providers in descending weight order |

### Best Practices

<AccordionGroup>
  <Accordion title="Cost Optimization">
    Assign higher weights to cheaper providers for cost-sensitive workloads:

    ```json theme={null}
    {
      "provider_configs": [
        {"provider": "groq", "allowed_models": ["*"], "key_ids": ["*"], "weight": 0.7},
        {"provider": "openai", "allowed_models": ["*"], "key_ids": ["*"], "weight": 0.3}
      ]
    }
    ```
  </Accordion>

  <Accordion title="Environment Separation">
    Create different Virtual Keys for dev/staging/prod with different provider access:

    ```json theme={null}
    {
      "virtual_keys": [
        {
          "id": "vk-dev",
          "provider_configs": [{"provider": "ollama", "allowed_models": ["*"], "key_ids": ["*"]}]
        },
        {
          "id": "vk-prod",
          "provider_configs": [
            {"provider": "openai", "allowed_models": ["*"], "key_ids": ["*"]},
            {"provider": "azure", "allowed_models": ["*"], "key_ids": ["*"]}
          ]
        }
      ]
    }
    ```
  </Accordion>

  <Accordion title="Compliance & Data Residency">
    Restrict specific Virtual Keys to compliant providers:

    ```json theme={null}
    {
      "provider_configs": [
        {"provider": "azure", "allowed_models": ["gpt-4o"]},
        {"provider": "bedrock", "allowed_models": ["claude-3-sonnet-20240229"]}
      ]
    }
    ```
  </Accordion>
</AccordionGroup>

<Note>
  **`allowed_models: ["*"]`**: Allows all models supported by the provider, validated via the Model Catalog (populated from pricing data and the provider's list models API). See the [Model Catalog section](#the-model-catalog) above for how syncing works. For configuration instructions, see [Governance Routing](/features/governance/routing).

  **`allowed_models: []` (empty array)**: Denies **all** models; no requests will be served for this provider config. This deny-by-default behavior was introduced in v1.5.0.

  **Empty `provider_configs`**: When `provider_configs` is empty (no providers configured), **all providers are blocked** (deny-by-default). You must explicitly add provider configurations to allow traffic through a Virtual Key.
</Note>
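
The three deny-by-default rules above can be summarized in a short Go sketch. `catalog` is a stand-in for the Model Catalog and `modelAllowed` is a hypothetical helper. The point is the order of the checks, not Bifrost's actual code. Explicitly listed model names are accepted as-is here, which is an assumption.

```go theme={null}
package main

import "fmt"

// catalog maps provider -> set of models it supports (stand-in for the Model Catalog).
type catalog map[string]map[string]bool

type providerConfig struct {
	Provider      string
	AllowedModels []string
}

// modelAllowed reports whether a Virtual Key's provider_configs permit provider/model.
func modelAllowed(cfgs []providerConfig, cat catalog, provider, model string) bool {
	// Empty provider_configs: the loop body never runs, so everything is denied.
	for _, c := range cfgs {
		if c.Provider != provider {
			continue
		}
		// Empty allowed_models: nothing to match, so this config denies all models.
		for _, m := range c.AllowedModels {
			if m == "*" && cat[provider][model] {
				return true // the wildcard still requires the catalog to list the model
			}
			if m == model {
				return true
			}
		}
	}
	return false
}

func main() {
	cat := catalog{"openai": {"gpt-4o": true, "gpt-4o-mini": true}}
	wildcard := []providerConfig{{Provider: "openai", AllowedModels: []string{"*"}}}
	denyAll := []providerConfig{{Provider: "openai", AllowedModels: []string{}}}

	fmt.Println(modelAllowed(wildcard, cat, "openai", "gpt-4o"))  // true
	fmt.Println(modelAllowed(wildcard, cat, "openai", "made-up")) // false: not in catalog
	fmt.Println(modelAllowed(denyAll, cat, "openai", "gpt-4o"))   // false: empty allowed_models
	fmt.Println(modelAllowed(nil, cat, "openai", "gpt-4o"))       // false: empty provider_configs
}
```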

***

## Adaptive Load Balancing

<Info>
  **Enterprise Feature**: Adaptive Load Balancing is available in Bifrost
  Enterprise. [Contact us](https://www.getmaxim.ai/bifrost/enterprise) to enable
  it.
</Info>

Adaptive Load Balancing automatically optimizes routing based on real-time performance metrics. It operates at **two levels** to provide both macro-level provider selection and micro-level key optimization.

### Two-Level Architecture

<Card title="Why Two Levels?" icon="layer-group">
  Separating provider selection (direction) from key selection (route) enables:

  * **Provider-level optimization**: Choose the best provider for a model based on aggregate performance
  * **Key-level optimization**: Within that provider, choose the best API key based on individual key performance
  * **Resilience**: Even when the provider is specified (by governance or the user), key-level load balancing still optimizes which API key to use
</Card>

```mermaid theme={null}
flowchart TB
    Request["Request: gpt-4o"]

    subgraph Level1["Level 1: Direction (Provider Selection)"]
        Cat["Model Catalog Lookup"]
        Providers["Candidate Providers:<br/>openai, azure, groq"]
        Filter["Filter by allowed_models<br/>and key availability"]
        Score["Score by performance:<br/>error rate, latency"]
        Select["Select: openai"]
    end

    subgraph Level2["Level 2: Route (Key Selection)"]
        Keys["Available OpenAI Keys:<br/>key-1, key-2, key-3"]
        KeyScore["Score each key:<br/>error rate, latency, TPM hits"]
        KeySelect["Select: key-2<br/>(best performing)"]
    end

    Request --> Cat --> Providers --> Filter --> Score --> Select
    Select --> Keys --> KeyScore --> KeySelect --> Response["Execute with<br/>openai/gpt-4o + key-2"]
```

### Level 1: Direction (Provider Selection)

**When it runs**: Only when no provider has been selected yet, i.e. when the request has no `provider/` prefix and no earlier layer (governance, routing rules) has pinned one

**How it works**:

1. **Model catalog lookup**: Find all configured providers that support the requested model
2. **Provider filtering**: Filter based on:
   * Allowed models from the key configuration
   * Key availability for the provider
3. **Performance scoring**: Score each provider on its recent, realized performance for the model:
   * **Error rate** — the primary, time-decayed signal
   * **Token-aware latency** — secondary, comparing the provider both to its peers for the model and to its own recent baseline
4. **Smooth selection**: Concentrate traffic on the best-scoring providers while keeping a small exploration share for the rest, so a recovered provider keeps getting re-probed
5. **Fallbacks created**: Remaining healthy providers sorted by performance score (descending) are added as fallbacks
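
Here is a rough Go sketch of steps 3 to 5, with two stand-ins worth naming plainly. First, providers are ranked lexicographically (lower decayed error rate first, latency as the tiebreaker) rather than with Bifrost's token-aware, baseline-relative scoring. Second, "smooth selection" is approximated with a simple epsilon-greedy split: the top provider gets `1 - explore` of the traffic and the others share `explore` evenly. `providerStats`, `rank`, and `selectionShares` are hypothetical.

```go theme={null}
package main

import (
	"fmt"
	"sort"
)

// providerStats holds hypothetical recent metrics for one provider serving a model.
type providerStats struct {
	Name      string
	ErrorRate float64 // time-decayed error rate, 0..1 (primary signal)
	LatencyMs float64 // recent latency (secondary signal)
}

// rank orders providers best-first: lower error rate wins, latency breaks ties.
func rank(stats []providerStats) []string {
	sorted := append([]providerStats(nil), stats...)
	sort.SliceStable(sorted, func(i, j int) bool {
		if sorted[i].ErrorRate != sorted[j].ErrorRate {
			return sorted[i].ErrorRate < sorted[j].ErrorRate
		}
		return sorted[i].LatencyMs < sorted[j].LatencyMs
	})
	names := make([]string, len(sorted))
	for i, s := range sorted {
		names[i] = s.Name
	}
	return names
}

// selectionShares is an epsilon-greedy stand-in for smooth selection: the best
// provider gets 1-explore of traffic and the rest split explore evenly, so
// lower-ranked providers keep being re-probed.
func selectionShares(ranked []string, explore float64) map[string]float64 {
	shares := make(map[string]float64, len(ranked))
	if len(ranked) == 0 {
		return shares
	}
	if len(ranked) == 1 {
		shares[ranked[0]] = 1
		return shares
	}
	shares[ranked[0]] = 1 - explore
	for _, name := range ranked[1:] {
		shares[name] = explore / float64(len(ranked)-1)
	}
	return shares
}

func main() {
	stats := []providerStats{
		{Name: "openai", ErrorRate: 0.01, LatencyMs: 300},
		{Name: "azure", ErrorRate: 0.02, LatencyMs: 450},
		{Name: "groq", ErrorRate: 0.02, LatencyMs: 900},
	}
	order := rank(stats)
	fmt.Println("preferred:", order[0], "fallbacks:", order[1:])
	fmt.Println("shares:", selectionShares(order, 0.05))
}
```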

### Level 2: Route (Key Selection)

**When it runs**: **Always**, even when the provider is already specified (by governance, the user, or Level 1)

**How it works**:

1. **Get available keys**: Fetch all keys for the selected provider
2. **Filter by configuration**: Apply model restrictions from key configuration
3. **Performance scoring**: Calculate score for each key based on:
   * Error rates (recent failures)
   * Latency (response time)
   * TPM hits (rate limit violations)
   * Current state (Healthy, Degraded, Failed, Recovering)
4. **Smooth weighted selection**: Concentrate traffic on higher-weight keys, with a small dedicated probe budget reserved for recovering keys so they can prove recovery
5. **Circuit breaker**: Skip keys with zero weight (TPM hits, repeated failures)

### Scoring

Every 5 seconds the load balancer recomputes a weight for each route from its recent performance. Three signals drive the score, in priority order:

* **Error rate** — the primary, time-decayed signal
* **Token-aware latency** — secondary; a route is compared both to its peers and to its own recent baseline
* **Utilization** — a minor fair-share nudge that discourages overloading any single key

Which signals apply depends on the route's health: healthy routes are scored mainly on errors and latency, while routes that are actively recovering are scored on latency and recovery progress so they aren't held back by stale error history. Provider-level (Level 1) selection scores on error rate and latency only.

<Tip>
  Lower penalties = higher weights = more traffic. The system self-heals by
  quickly penalizing failing routes but decaying those penalties fast once issues
  resolve, so a recovered route returns to full traffic within seconds.
</Tip>
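
To put numbers on "decaying those penalties fast", assume the decay is exponential (this section does not state its shape). The "90% penalty reduction in 30 seconds" figure from the Key Features table then fixes the rate: the penalty is multiplied by 0.1 every 30 seconds, which works out to a half-life of about 9 seconds. The constants and functions below are illustrative, not Bifrost's tuned parameters.

```go theme={null}
package main

import (
	"fmt"
	"math"
	"time"
)

// Assumed from "90% penalty reduction in 30 seconds": after recoveryWindow,
// only residual (10%) of the original penalty remains.
const (
	recoveryWindow = 30 * time.Second
	residual       = 0.10
	recomputeEvery = 5 * time.Second // weights are recomputed on a ~5s cycle
)

// decayedPenalty assumes exponential decay: p(t) = p0 * residual^(t / recoveryWindow).
func decayedPenalty(p0 float64, elapsed time.Duration) float64 {
	return p0 * math.Pow(residual, elapsed.Seconds()/recoveryWindow.Seconds())
}

// halfLife is the time for the penalty to halve under the same assumption.
func halfLife() time.Duration {
	secs := recoveryWindow.Seconds() * math.Ln2 / math.Log(1/residual)
	return time.Duration(secs * float64(time.Second))
}

func main() {
	// Penalty remaining at each 5-second recompute tick after the issue resolves.
	for t := time.Duration(0); t <= recoveryWindow; t += recomputeEvery {
		fmt.Printf("t=%v penalty=%.3f\n", t, decayedPenalty(1.0, t))
	}
	fmt.Println("half-life:", halfLife().Round(time.Millisecond))
}
```

Under this assumption, each 5-second tick multiplies the remaining penalty by 0.1^(1/6) ≈ 0.68.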

### Request Flow

<Steps>
  <Step title="Request without Provider Prefix">
    ```bash theme={null}
    curl -X POST http://localhost:8080/v1/chat/completions \
      -d '{"model": "gpt-4o", "messages": [...]}'
    ```
  </Step>

  <Step title="Model Catalog Lookup">
    Providers supporting `gpt-4o`: \[openai, azure, groq]
  </Step>

  <Step title="Performance Evaluation">
    * OpenAI: Score 0.92 (low latency, 99% success rate)
    * Azure: Score 0.85 (medium latency, 98% success rate)
    * Groq: Score 0.65 (high latency recently)
  </Step>

  <Step title="Provider Selection">
    OpenAI selected (highest performance score; a small exploration share is still kept for the others)
  </Step>

  <Step title="Request Transformation">
    ```json theme={null}
    {
      "model": "openai/gpt-4o",
      "messages": [...],
      "fallbacks": ["azure/gpt-4o", "groq/gpt-4o"]
    }
    ```
  </Step>
</Steps>

### Key Features

| Feature                    | Description                                                                               |
| -------------------------- | ----------------------------------------------------------------------------------------- |
| **Automatic Optimization** | No manual weight tuning required                                                          |
| **Real-time Adaptation**   | Weights recomputed every 5 seconds based on live metrics                                  |
| **Circuit Breakers**       | Failing routes automatically removed from rotation                                        |
| **Fast Recovery**          | 90% penalty reduction in 30 seconds after issues resolve                                  |
| **Health States**          | Routes transition between Healthy, Degraded, Failed, and Recovering                       |
| **Smart Exploration**      | Keeps a small, dedicated share of traffic on recovering routes so they can prove recovery |

### Dashboard Visibility

Monitor load balancing performance in real-time:

<Frame>
  <img src="https://mintlify.s3.us-west-1.amazonaws.com/bifrost/media/ui-load-balancing.png" alt="Adaptive Load Balancing Dashboard" />
</Frame>

The dashboard shows:

* Weight distribution across provider-model-key routes
* Performance metrics (error rates, latency, success rates)
* State transitions (Healthy → Degraded → Failed → Recovering)
* Actual vs expected traffic distribution

<Note>
  **Scope & tuning**: Adaptive load balancing operates **per node** (each node routes on its own observed metrics; only rate-limit/TPM backoffs are shared across nodes, and only within a region) and adapts on a **\~5-second cycle** — distinct from per-request fallback, which is immediate. Both levels can be toggled independently, and the scoring parameters ship pre-tuned. See [Adaptive Load Balancing](/enterprise/adaptive-load-balancing) for configuration and limitations.
</Note>

***

## How Governance and Load Balancing Interact

When both methods are available in your Bifrost deployment, they work together in a complementary way across two levels.

<Warning>
  **Key Insight**: Load balancing has **two levels**:

  * **Level 1 (Direction/Provider)**: Skipped when the provider is already specified
  * **Level 2 (Route/Key)**: **Always runs**, even when the provider is specified

  This means key-level optimization works regardless of how the provider was chosen!
</Warning>

### Execution Flow

```mermaid theme={null}
flowchart TD
    Start["Request: gpt-4o"]

    subgraph PreReq["PreRequestHook Phase (once per request, registration order)"]
        HasVK{"Has VK with<br/>provider_configs?"}
        GovRoute["Governance:<br/>Routing rules + VK weighted random"]
        AddPrefix["Set req.Provider/Model:<br/>azure/gpt-4o"]
        PrefixCheck{"req.Provider<br/>already set?"}
        LBProvider["Enterprise LB:<br/>Performance-based selection"]
        AddLBPrefix["Set req.Provider/Model:<br/>openai/gpt-4o"]
        Resolver["model-catalog-resolver:<br/>Fill from catalog (last fallback)"]
    end

    subgraph LB2["Load Balancer Level 2 (Key Selector, in core)"]
        GetKeys["Get available keys<br/>for selected provider"]
        ScoreKeys["Score keys by<br/>performance metrics"]
        SelectKey["Select best key"]
    end

    Start --> HasVK
    HasVK -->|Yes| GovRoute --> AddPrefix --> PrefixCheck
    HasVK -->|No| PrefixCheck
    PrefixCheck -->|Yes, skip LB Level 1| Resolver
    PrefixCheck -->|No| LBProvider --> AddLBPrefix --> Resolver
    Resolver --> GetKeys
    GetKeys --> ScoreKeys --> SelectKey --> Execute["Execute request<br/>with selected provider + key"]
```

### Execution Order

All three routing layers (governance, enterprise LB Level 1, and model-catalog-resolver) run inside a single **PreRequestHook** phase that fires **once per top-level request**, before any provider call and before per-attempt hooks. Within that phase, plugins execute according to their placement and order:

1. **Governance Plugin** (PreRequestHook, builtin order 4)
   * Evaluates routing rules (CEL expressions, scope hierarchy)
   * If Virtual Key has `provider_configs`: performs weighted random provider selection
   * **Result**: `req.Provider`/`req.Model` set; `req.Fallbacks` populated

2. **Enterprise Load Balancer Level 1** (PreRequestHook, builtin)
   * Runs after governance
   * If `req.Provider` is already set (by governance or by an explicit `provider/model` prefix from the user): **skips provider selection**
   * If not: performs performance-based provider selection across catalog providers
   * **Result**: `req.Provider`/`req.Model` set if previously empty

3. **model-catalog-resolver** (PreRequestHook, builtin order 9 — final fallback)
   * Runs last
   * If `req.Provider` is still empty: looks up the model in the catalog and picks a provider (preferring the integration's canonical provider when the request came in via an integration route)
   * Emits a `model-catalog` routing-engine log entry
   * **Result**: Always leaves `req.Provider` populated when the catalog knows about the model

4. **Empty-provider validation** (core, after PreRequestHook)
   * If `req.Provider` is still empty: returns 400 to the caller with a clear error

5. **Load Balancer Level 2** (Key Selector — core, per provider attempt)
   * **Always runs** during request execution
   * Gets all keys for the selected provider, filters by model restrictions
   * Scores each key by performance metrics
   * Selects best key using weighted random + exploration
   * **Result**: Optimal key selected within the provider

<Info>
  **Important**: Even when governance specifies `azure/gpt-4o` in PreRequestHook,
  load balancing Level 2 **still optimizes which Azure key to use** based on
  performance metrics. The two-level architecture is preserved — only the
  *layer* where Level 1 runs has moved from a middleware to PreRequestHook.
</Info>

### Example Scenarios

<Tabs>
  <Tab title="Governance Only">
    **Setup:**

    * Virtual Key has `provider_configs` defined
    * No adaptive load balancing enabled

    **Request:**

    ```bash theme={null}
    curl -X POST http://localhost:8080/v1/chat/completions \
      -H "x-bf-vk: vk-prod-main" \
      -d '{"model": "gpt-4o", "messages": [...]}'
    ```

    **Behavior:**

    1. **Governance** applies weighted provider routing → selects Azure (chosen 70% of the time, per its 0.7 weight)
    2. Model becomes `azure/gpt-4o`
    3. **Standard key selection** (non-adaptive) chooses an Azure key based on static weights
    4. Request forwarded to Azure with selected key
  </Tab>

  <Tab title="Load Balancing Only">
    **Setup:**

    * **No Virtual Key** (do not send `x-bf-vk`) → this is the **Load Balancing–only** setup
    * **Virtual Key with empty / missing `provider_configs`** → **blocks all providers** (deny-by-default) and therefore is **NOT** an LB-only setup
    * Adaptive load balancing enabled

    **Request:**

    ```bash theme={null}
    curl -X POST http://localhost:8080/v1/chat/completions \
      -d '{"model": "gpt-4o", "messages": [...]}'
    ```

    **Behavior:**

    1. **Load Balancing Level 1** applies performance-based provider routing → selects OpenAI (best performing)
    2. Model becomes `openai/gpt-4o`
    3. **Load Balancing Level 2** selects best OpenAI key based on performance metrics (error rate, latency, TPM status)
    4. Request forwarded to OpenAI with optimal key
  </Tab>

  <Tab title="Both Available (Governance + Load Balancing)">
    **Setup:**

    * Virtual Key has `provider_configs` defined
    * Adaptive load balancing enabled
    * Azure has 3 keys: `azure-key-1`, `azure-key-2`, `azure-key-3`

    **Request:**

    ```bash theme={null}
    curl -X POST http://localhost:8080/v1/chat/completions \
      -H "x-bf-vk: vk-prod-main" \
      -d '{"model": "gpt-4o", "messages": [...]}'
    ```

    **Behavior:**

    1. **Governance** applies first (respects explicit user config) → selects Azure provider
    2. Model becomes `azure/gpt-4o`
    3. **Load Balancing Level 1** sees that `req.Provider` is already set and **skips provider selection** (already decided)
    4. **Load Balancing Level 2** still runs! Selects best Azure key based on performance:
       * `azure-key-1`: 99% success rate, 150ms avg latency → score 0.95
       * `azure-key-2`: 85% success rate, 200ms avg latency → score 0.60 (degraded)
       * `azure-key-3`: Hit TPM limit → score 0.0 (circuit broken)
       * **Selects `azure-key-1`** (highest score)
    5. Request forwarded to Azure with `azure-key-1`

    **Why?** Governance controls provider selection (explicit user intent), but load balancing still optimizes key selection (automatic performance optimization).
  </Tab>

  <Tab title="Manual Provider Selection">
    **Setup:**

    * Both governance and load balancing enabled
    * OpenAI has 2 keys available

    **Request:**

    ```bash theme={null}
    curl -X POST http://localhost:8080/v1/chat/completions \
      -d '{"model": "openai/gpt-4o", "messages": [...]}'
    ```

    **Behavior:**

    1. **Governance** sees the explicit `openai/` prefix and skips provider selection
    2. **Load Balancing Level 1** sees that `req.Provider` is already set and **skips provider selection**
    3. **Load Balancing Level 2** still runs! Selects best OpenAI key based on current metrics
    4. Request forwarded to OpenAI with optimal key

    **Why?** User explicitly specified the provider, but key-level optimization still provides value by selecting the best-performing OpenAI key.
  </Tab>
</Tabs>

### Provider vs Key Selection Rules

| Scenario                          | Provider Selection                              | Key Selection                              |
| --------------------------------- | ----------------------------------------------- | ------------------------------------------ |
| VK with provider\_configs         | **Governance** (weighted random)                | **Standard** or **Adaptive** (if enabled)  |
| VK without provider\_configs + LB | **Blocked** (empty = no providers allowed)      | N/A                                        |
| No VK + LB                        | **Load Balancing Level 1** (performance)        | **Load Balancing Level 2** (performance)   |
| Model with provider prefix + LB   | **Skip** (already specified)                    | **Load Balancing Level 2** (performance) ✅ |
| No Load Balancing enabled         | **Governance** or **User** or **Model Catalog** | **Standard** (static weights)              |

<Note>
  **Critical Insight**:

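  * **Provider selection** follows the PreRequestHook order: routing rules and governance `provider_configs` first, then Load Balancing Level 1, then the model-catalog-resolver as the final fallback. An explicit `provider/model` prefix from the user sets the provider up front, so Load Balancing Level 1 and the resolver skip
  * **Key selection** runs independently and benefits from load balancing **even when the provider is predetermined**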

  This separation is what makes the two-level architecture so powerful!
</Note>

***

## Routing Rules (Dynamic Expression-Based Routing)

<Info>
  **Position in routing pipeline**: Routing Rules execute **before governance
  provider selection** and can override it. They are evaluated before adaptive
  load balancing, enabling dynamic provider/model overrides based on runtime
  conditions like headers, parameters, capacity metrics, and organizational
  hierarchy.
</Info>

### Overview

Routing Rules provide sophisticated, expression-based control over request routing using CEL expressions. Unlike governance routing (static weights), routing rules evaluate conditions dynamically at request time.

### When Routing Rules Execute

```mermaid theme={null}
flowchart TD
    Start["Request: model + provider"]

    subgraph PreReq["PreRequestHook Phase (once per request)"]
        direction TB
        subgraph Gov["Governance Plugin"]
            RuleMatch{"CEL Routing Rule<br/>Matches?"}
            RuleDecision["Override:<br/>provider/model/fallbacks"]
            VKValidation["Virtual Key Validation"]
            GovRouting["VK Provider Selection<br/>(weighted random)"]
        end
        LB1["Enterprise LB Level 1:<br/>Provider Selection<br/>(skipped if provider already set)"]
        Resolver["model-catalog-resolver:<br/>Fill provider from catalog<br/>(final fallback)"]
    end

    LB2["LB Level 2: Key Selection<br/>(core, per attempt)"]

    Start --> RuleMatch
    RuleMatch -->|Yes| RuleDecision --> LB1
    RuleMatch -->|No| VKValidation --> GovRouting --> LB1
    LB1 --> Resolver --> LB2 --> Execute["Execute with<br/>selected provider + key"]
```

### How It Works

Steps 1 to 5 below execute inside the **PreRequestHook** phase in registration order, with routing rules evaluated first within the governance plugin's hook body, before VK provider selection. Steps 6 and 7 run in core after that phase:

1. **Routing rules evaluate first** in scope precedence order (VirtualKey → Team → Customer → Global)
2. **If a routing rule matches**: provider/model/fallbacks are overridden and the VK `provider_configs` weighted selection is skipped
3. **If no routing rule matches**: VK provider selection runs (weighted random)
4. **Enterprise LB Level 1**: skipped if `req.Provider` is already set; otherwise performs performance-based selection
5. **model-catalog-resolver**: last fallback — fills `req.Provider` from the catalog if no earlier plugin set it
6. **Empty-provider validation** (core): returns 400 if `req.Provider` is still empty after the phase
7. **Load balancing Level 2** (key selection, core, per attempt): always runs to select the best key within the determined provider

### Available CEL Variables

Routing rules access request context through CEL variables:

```cel theme={null}
// Request context
model                      // Requested model
provider                   // Current provider

// Headers and parameters (case-insensitive)
headers["x-tier"]          // Request header
params["region"]           // Query parameter

// Organization context
virtual_key_id             // VirtualKey ID
team_name                  // Team name
customer_id                // Customer ID

// Capacity metrics (0-100 percentage)
budget_used                // Budget usage %
tokens_used                // Token rate limit usage %
request                    // Request rate limit usage %
```
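
For illustration, here is a Go sketch that assembles these variables for one request into the kind of `map[string]any` activation a CEL runtime evaluates against (cel-go's `Program.Eval`, for example, accepts a `map[string]any`). It assumes case-insensitivity is achieved by lowercasing header and parameter names, and that each capacity metric is `used / limit * 100` clamped to 0-100. `requestContext` and `ruleVars` are hypothetical.

```go theme={null}
package main

import (
	"fmt"
	"strings"
)

// requestContext is a hypothetical bundle of what the gateway knows about a request.
type requestContext struct {
	Model, Provider                    string
	Headers, Params                    map[string]string
	VirtualKeyID, TeamName, CustomerID string
	BudgetUsed, BudgetMax              float64
	TokensUsed, TokensMax              float64
	RequestsUsed, RequestsMax          float64
}

// percent returns used/limit as 0-100, or 0 when no limit is configured.
func percent(used, limit float64) float64 {
	if limit <= 0 {
		return 0
	}
	p := used / limit * 100
	if p > 100 {
		p = 100
	}
	return p
}

// lowerKeys copies a map with lowercased keys so lookups are case-insensitive.
func lowerKeys(in map[string]string) map[string]string {
	out := make(map[string]string, len(in))
	for k, v := range in {
		out[strings.ToLower(k)] = v
	}
	return out
}

// ruleVars builds the variables listed above for one request.
func ruleVars(rc requestContext) map[string]any {
	return map[string]any{
		"model":          rc.Model,
		"provider":       rc.Provider,
		"headers":        lowerKeys(rc.Headers),
		"params":         lowerKeys(rc.Params),
		"virtual_key_id": rc.VirtualKeyID,
		"team_name":      rc.TeamName,
		"customer_id":    rc.CustomerID,
		"budget_used":    percent(rc.BudgetUsed, rc.BudgetMax),
		"tokens_used":    percent(rc.TokensUsed, rc.TokensMax),
		"request":        percent(rc.RequestsUsed, rc.RequestsMax),
	}
}

func main() {
	vars := ruleVars(requestContext{
		Model:      "gpt-4o",
		Headers:    map[string]string{"X-Tier": "premium"},
		TeamName:   "ml-research",
		BudgetUsed: 90, BudgetMax: 100,
	})
	// Hand-written Go stand-ins for CEL expressions such as those in the examples.
	premium := vars["headers"].(map[string]string)["x-tier"] == "premium"
	budgetHigh := vars["budget_used"].(float64) > 85
	fmt.Println(premium, budgetHigh) // true true
}
```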

### Examples

#### Route based on user tier

```cel theme={null}
headers["x-tier"] == "premium"   // → openai/gpt-4o
```

#### Route to fallback when budget high

```cel theme={null}
budget_used > 85                 // → groq/llama-2 (cheaper)
```

#### Route by team

```cel theme={null}
team_name == "ml-research"       // → anthropic/claude-3-opus
```

#### Complex multi-condition routing

```cel theme={null}
headers["x-environment"] == "production" &&
tokens_used < 75 &&
team_name == "ai-platform"       // → openai/gpt-4o
```

### Scope Hierarchy

Rules are evaluated in organizational precedence order (first-match-wins):

```
1. VirtualKey scope (highest priority)
2. Team scope
3. Customer scope
4. Global scope (lowest priority)
```

Within each scope, rules are sorted by **priority** (ascending: 0 before 10).
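
The first-match-wins ordering can be sketched in Go. Plain Go predicates stand in for compiled CEL expressions, and the sketch assumes the rule list has already been narrowed to rules attached to this request's Virtual Key, team, customer, or the global scope. `rule`, `scope`, and `firstMatch` are hypothetical.

```go theme={null}
package main

import (
	"fmt"
	"sort"
)

type scope int

// Lower value = evaluated earlier.
const (
	scopeVirtualKey scope = iota
	scopeTeam
	scopeCustomer
	scopeGlobal
)

// rule is a hypothetical routing rule; Match stands in for a compiled CEL expression.
type rule struct {
	Name     string
	Scope    scope
	Priority int
	Target   string // e.g. "groq/llama-2"
	Match    func(vars map[string]any) bool
}

// firstMatch orders rules by scope, then ascending priority, and returns the
// first rule whose condition holds.
func firstMatch(rules []rule, vars map[string]any) (rule, bool) {
	ordered := append([]rule(nil), rules...)
	sort.SliceStable(ordered, func(i, j int) bool {
		if ordered[i].Scope != ordered[j].Scope {
			return ordered[i].Scope < ordered[j].Scope
		}
		return ordered[i].Priority < ordered[j].Priority
	})
	for _, r := range ordered {
		if r.Match(vars) {
			return r, true
		}
	}
	return rule{}, false
}

func main() {
	rules := []rule{
		{Name: "global-budget", Scope: scopeGlobal, Priority: 0, Target: "groq/llama-2",
			Match: func(v map[string]any) bool { return v["budget_used"].(float64) > 85 }},
		{Name: "team-research", Scope: scopeTeam, Priority: 10, Target: "anthropic/claude-3-opus",
			Match: func(v map[string]any) bool { return v["team_name"] == "ml-research" }},
	}
	vars := map[string]any{"team_name": "ml-research", "budget_used": 90.0}
	if r, ok := firstMatch(rules, vars); ok {
		fmt.Println(r.Name, "->", r.Target) // team-research -> anthropic/claude-3-opus
	}
}
```

Both rules match here, but the team-scoped rule wins even though its priority number is higher, because scope is compared before priority.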

### Key Features

| Feature                | Description                                                            |
| ---------------------- | ---------------------------------------------------------------------- |
| **CEL Expressions**    | Powerful, composable condition language with multiple operators        |
| **Scope Hierarchy**    | Rules at VirtualKey/Team/Customer/Global levels with proper precedence |
| **Dynamic Override**   | Override provider and/or model based on runtime conditions             |
| **Fallback Chains**    | Define multiple fallback providers for automatic failover              |
| **Priority Ordering**  | Lower priority numbers are evaluated first within the same scope      |
| **Capacity Awareness** | Access real-time budget and rate limit usage percentages               |

<Info>
  For complexity-based routing driven by request content, see [Complexity Router](/features/governance/complexity-router). It adds a `complexity_tier` CEL variable that lets routing rules steer SIMPLE, MEDIUM, COMPLEX, and REASONING requests to different models.
</Info>

### Integration with Governance

Routing Rules execute **before** governance provider selection and can override it:

**If a routing rule matches**:

```
Routing Rules evaluate
                    ↓
Rule matches: budget_used > 85
                    ↓
Override: groq/llama-2 (cheaper provider)
                    ↓
Governance provider_configs SKIPPED
                    ↓
Load Balancing selects best key
```

**If no routing rule matches**:

```
Routing Rules evaluate
                    ↓
No matching rule
                    ↓
Governance decides: azure/gpt-4o (70% weight)
                    ↓
Load Balancing selects best key
```

**Key Insight**: Routing rules have higher precedence than governance provider\_configs. If a routing rule matches, governance provider\_configs are bypassed entirely.

### Integration with Load Balancing

Routing Rules run **before** load balancing:

```
Routing Rules decide: openai/gpt-4o
                    ↓
Load Balancing Level 1: Skipped (provider already determined)
                    ↓
Load Balancing Level 2: Selects best OpenAI key based on performance
```

Even when routing rules determine the provider, load balancing Level 2 still optimizes which API key to use within that provider.

### Use Cases

* **Tier-based routing**: Premium users → fast providers
* **Capacity failover**: High budget usage → cheaper providers
* **Team preferences**: Different teams → different providers
* **A/B testing**: Route subset of traffic to test models
* **Regional routing**: EU users → EU providers (data residency)
* **Complex logic**: Combine multiple conditions for sophisticated routing

### Dashboard & API

Routing rules can be configured through:

* **Dashboard**: Visual rule builder with CEL expression editor
* **API**: `POST /api/governance/routing-rules` and related endpoints
* **Scope**: Create rules at global, customer, team, or virtual key levels
* **Priority**: Order rules within scope with numeric priority

For complete documentation, see [Routing Rules Documentation](/providers/routing-rules).

***

## Choosing the Right Approach

1. **Use Governance When:**

   * ✅ **Compliance requirements**: Need to ensure data stays in specific regions or providers
   * ✅ **Cost optimization**: Want explicit control over traffic distribution to cheaper providers
   * ✅ **Budget enforcement**: Need hard limits on spending per provider
   * ✅ **Environment separation**: Different teams/apps need different provider access
   * ✅ **Rate limit management**: Need to respect provider-specific rate limits

2. **Use Routing Rules When:**

   * ✅ **Dynamic routing**: Route based on runtime request context (headers, parameters)
   * ✅ **Capacity-aware routing**: Switch to a fallback when budget or rate-limit usage is high
   * ✅ **Organization-based routing**: Apply different rules for teams/customers
   * ✅ **A/B testing**: Route a subset of traffic to test new models
   * ✅ **Complex conditions**: Combine multiple criteria (e.g., tier + capacity + team)

3. **Use Load Balancing When:**

   * ✅ **Performance optimization**: Want automatic routing to best-performing providers
   * ✅ **Minimal configuration**: Prefer hands-off operation with intelligent defaults
   * ✅ **Dynamic workloads**: Traffic patterns change frequently
   * ✅ **Automatic failover**: Need rapid adaptation to provider issues
   * ✅ **Multi-provider redundancy**: Want seamless provider switching based on availability

4. **Use All Three Together:**

   * ✅ **Complete solution**: Governance provides base routing, routing rules add dynamic overrides, and load balancing optimizes keys
   * ✅ **Maximum flexibility**: Different Virtual Keys use different strategies (governance vs routing rules vs load balancing)
   * ✅ **Enterprise deployments**: Complex organizations with multiple requirements per layer

***

## Additional Resources

<CardGroup cols={2}>
  <Card title="Governance Routing" icon="shield-check" href="/features/governance/routing">
    Configuration instructions for setting up governance routing via Virtual
    Keys (Web UI, API, config.json)
  </Card>

  <Card title="Routing Rules" icon="sliders" href="/providers/routing-rules">
    Dynamic, expression-based routing using CEL expressions for runtime
    conditions
  </Card>

  <Card title="Adaptive Load Balancing" icon="brain" href="/enterprise/adaptive-load-balancing">
    Technical implementation details: scoring algorithms, weight calculations,
    and performance characteristics
  </Card>

  <Card title="Virtual Keys" icon="key" href="/features/governance/virtual-keys">
    Learn how to create and configure Virtual Keys
  </Card>

  <Card title="Fallbacks" icon="arrow-rotate-right" href="/features/fallbacks">
    Understand how automatic fallbacks work across providers
  </Card>
</CardGroup>