Overview
Bifrost offers two powerful methods for routing requests across AI providers, each serving different use cases:
- Governance-based Routing: Explicit, user-defined routing rules configured via Virtual Keys
- Adaptive Load Balancing: Automatic, performance-based routing powered by real-time metrics (Enterprise feature)
When to use which method:
- Use Governance when you need explicit control, compliance requirements, or specific cost optimization strategies
- Use Adaptive Load Balancing for automatic performance optimization and minimal configuration overhead
The Model Catalog
The Model Catalog is Bifrost’s central registry that tracks which models are available from which providers. It powers both governance-based routing and adaptive load balancing by maintaining an up-to-date mapping of models to providers.
Data Sources
The Model Catalog combines two data sources:
Pricing Data (Primary source)
- Downloaded from a remote URL (configurable, defaults to Maxim’s pricing endpoint)
- Contains model names, pricing tiers, and provider mappings
- Synced to database on startup and refreshed every hour
- Used for cost calculation and initial model-to-provider mapping
Provider List Models API (Secondary source)
- Calls each provider’s /v1/models endpoint
- Enriches the catalog with provider-specific models and aliases
- Called on Bifrost startup and when providers are added/updated
- Adds models that may not be in pricing data yet
Syncing Behavior
Initial Sync (Startup)
When Bifrost starts:
- Pricing data is loaded from the remote URL
- If successful, the data is stored in the database (if a config store is available)
- The model pool is populated from pricing data
- The list models API is called for all configured providers, and the results are added to the model pool
- If a provider’s list models call fails, a warning is logged but startup continues; the provider can still be used with models from pricing data
Ongoing Sync (Background)
While Bifrost is running:
- Pricing data: A background worker checks every hour and syncs if the interval has elapsed
- List models API: Re-fetched when a provider is added/updated via the API or dashboard

Failure handling:
- If the pricing URL fails but the database has existing data → use the database
- If the pricing URL fails and there is no database data → error (startup fails)
- If the list models API fails → log a warning and continue with pricing data only
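The pricing-data fallback described above can be sketched roughly as follows. The function and store names are illustrative, not Bifrost’s actual API:

```python
def sync_pricing_data(fetch_remote, db_records):
    """Illustrative sketch of the pricing-sync fallback behavior."""
    try:
        # Download pricing data from the remote URL
        return fetch_remote()
    except Exception:
        if db_records:
            # Remote failed but the database has existing data: use the database
            return db_records
        # Remote failed and no database data: startup fails
        raise RuntimeError("pricing sync failed and no database data available")

# Usage: the remote fetch fails, but prior database records exist
cached = {"gpt-4o": {"provider": "openai"}}
def failing_fetch():
    raise ConnectionError("pricing URL unreachable")
assert sync_pricing_data(failing_fetch, cached) is cached
```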
Fallback Strategy
When syncing fails:
- Pricing data failure: Use existing database records (requires config store)
- List models failure: Rely on pricing data only
- Empty allowed_models: Use the model catalog to validate which models are supported
How It’s Used in Routing
When a Virtual Key has empty allowed_models, Bifrost checks the Model Catalog:
- Request for gpt-4o → ✅ Allowed (the catalog shows OpenAI supports this)
- Request for claude-3-sonnet → ❌ Rejected (the catalog shows OpenAI doesn’t support this)
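The catalog check above can be sketched as a simple lookup. The catalog structure and function names here are assumed for illustration, not Bifrost internals:

```python
# Hypothetical in-memory model catalog: model name -> providers that support it
MODEL_CATALOG = {
    "gpt-4o": {"openai", "azure"},
    "claude-3-sonnet": {"anthropic"},
}

def is_model_allowed(provider: str, model: str, allowed_models: list) -> bool:
    """If allowed_models is empty, fall back to the model catalog."""
    if allowed_models:
        return model in allowed_models
    return provider in MODEL_CATALOG.get(model, set())

# Virtual Key scoped to OpenAI, with empty allowed_models:
assert is_model_allowed("openai", "gpt-4o", []) is True           # catalog: OpenAI supports it
assert is_model_allowed("openai", "claude-3-sonnet", []) is False  # catalog: OpenAI doesn't
```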
The Model Catalog is essential for cross-provider routing. Without it, Bifrost wouldn’t know that gpt-4o is available from both OpenAI and Azure, limiting routing flexibility.
Governance-based Routing
Governance-based routing allows you to explicitly define which providers and models should handle requests for a specific Virtual Key. This method provides precise control over routing decisions.
How It Works
When a Virtual Key has provider_configs defined:
- Request arrives with a Virtual Key (e.g., x-bf-vk: vk-prod-main)
- Model validation: Bifrost checks if the requested model is allowed for any configured provider
- Provider filtering: Providers are filtered based on:
  - Model availability in allowed_models
  - Budget limits (current usage vs max limit)
  - Rate limits (tokens/requests per time window)
- Weighted selection: A provider is selected using weighted random distribution
- Provider prefix added: Model string becomes provider/model (e.g., openai/gpt-4o)
- Fallbacks created: Remaining providers, sorted by weight (descending), are added as fallbacks
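The weighted-selection and fallback steps above can be sketched as follows. This is an illustrative sketch, not Bifrost’s implementation:

```python
import random

def select_provider(providers: dict) -> tuple:
    """Weighted random provider selection plus weight-ordered fallbacks.

    providers maps provider name -> weight, e.g. {"azure": 0.7, "openai": 0.3}.
    """
    names = list(providers)
    weights = [providers[n] for n in names]
    # Weighted random distribution picks the primary provider
    primary = random.choices(names, weights=weights, k=1)[0]
    # Remaining providers, sorted by weight descending, become fallbacks
    fallbacks = sorted((n for n in names if n != primary),
                       key=lambda n: providers[n], reverse=True)
    return primary, fallbacks

primary, fallbacks = select_provider({"azure": 0.7, "openai": 0.3})
# primary is "azure" roughly 70% of the time; the other provider is the fallback
```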
Configuration Example
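The fragment below is an illustrative sketch of a Virtual Key with weighted provider_configs, matching the request flow that follows; field names are assumptions, not the verbatim Bifrost schema (see Governance Routing for the exact format):

```json
{
  "virtual_key": "vk-prod-main",
  "provider_configs": [
    {
      "provider": "azure",
      "weight": 0.7,
      "allowed_models": ["gpt-4o"]
    },
    {
      "provider": "openai",
      "weight": 0.3,
      "allowed_models": ["gpt-4o"]
    }
  ]
}
```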
Request Flow
1. Request with Virtual Key
2. Governance Evaluation
- OpenAI: ✅ Has gpt-4o in allowed_models, budget OK, weight 0.3
- Azure: ✅ Has gpt-4o in allowed_models, rate limit OK, weight 0.7
3. Weighted Selection
- 70% chance → Azure
- 30% chance → OpenAI
4. Request Transformation
Key Features
| Feature | Description |
|---|---|
| Explicit Control | Define exactly which providers and models are accessible |
| Budget Enforcement | Automatically exclude providers exceeding budget limits |
| Rate Limit Protection | Skip providers that have hit rate limits |
| Weighted Distribution | Control traffic distribution with custom weights |
| Automatic Fallbacks | Failed providers automatically retry with next highest weight |
Best Practices
Cost Optimization
Assign higher weights to cheaper providers for cost-sensitive workloads:
Environment Separation
Create different Virtual Keys for dev/staging/prod with different provider access:
Compliance & Data Residency
Restrict specific Virtual Keys to compliant providers:
Empty allowed_models: When left empty, Bifrost uses the Model Catalog (populated from pricing data and the provider’s list models API) to determine which models are supported. See the Model Catalog section above for how syncing works. For configuration instructions, see Governance Routing.
Adaptive Load Balancing
Enterprise Feature: Adaptive Load Balancing is available in Bifrost Enterprise. Contact us to enable it.
Two-Level Architecture
Why Two Levels?
Separating provider selection (direction) from key selection (route) enables:
- Provider-level optimization: Choose the best provider for a model based on aggregate performance
- Key-level optimization: Within that provider, choose the best API key based on individual key performance
- Resilience: Even when provider is specified (by governance or user), key-level load balancing still optimizes which API key to use
Level 1: Direction (Provider Selection)
When it runs: Only when the model string has no provider prefix (e.g., gpt-4o)
How it works:
- Model catalog lookup: Find all configured providers that support the requested model
- Provider filtering: Filter based on:
- Allowed models from keys configuration
- Keys availability for the provider
- Performance scoring: Calculate scores for each provider based on:
- Error rates (50% weight)
- Latency (20% weight, using MV-TACOS algorithm)
- Utilization (5% weight)
- Momentum bias (recovery acceleration)
- Smart selection: Choose provider using weighted random with jitter and exploration
- Fallbacks created: Remaining providers sorted by performance score (descending) are added as fallbacks
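Using the weightings listed above (error rates 50%, latency 20%, utilization 5%, plus a momentum bias), a provider score could be sketched as below. The normalization, latency budget, and momentum term are assumptions for illustration; the actual MV-TACOS latency scoring is more involved:

```python
def provider_score(error_rate: float, latency_ms: float, utilization: float,
                   momentum: float = 0.0, latency_budget_ms: float = 2000.0) -> float:
    """Illustrative provider score: higher is better, roughly in [0, 1]."""
    error_term = (1.0 - error_rate) * 0.50                                # 50% weight: success rate
    latency_term = max(0.0, 1.0 - latency_ms / latency_budget_ms) * 0.20  # 20% weight: latency
    utilization_term = (1.0 - utilization) * 0.05                         # 5% weight: spare capacity
    return error_term + latency_term + utilization_term + momentum        # momentum: recovery bias

# A fast, reliable provider outscores a slow, flaky one
healthy = provider_score(error_rate=0.01, latency_ms=400, utilization=0.3)
degraded = provider_score(error_rate=0.20, latency_ms=1800, utilization=0.9)
assert healthy > degraded
```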
Level 2: Route (Key Selection)
When it runs: Always, even when the provider is already specified (by governance, user, or Level 1)
How it works:
- Get available keys: Fetch all keys for the selected provider
- Filter by configuration: Apply model restrictions from key configuration
- Performance scoring: Calculate score for each key based on:
- Error rates (recent failures)
- Latency (response time)
- TPM hits (rate limit violations)
- Current state (Healthy, Degraded, Failed, Recovering)
- Weighted random selection: Choose key with exploration (25% chance to probe recovering keys)
- Circuit breaker: Skip keys with zero weight (TPM hits, repeated failures)
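The circuit breaker, weighted selection, and 25% exploration steps above can be sketched as follows; this is an illustrative outline, not Bifrost’s key selector:

```python
import random
from typing import Optional

def select_key(keys: dict, recovering: set, explore_prob: float = 0.25) -> Optional[str]:
    """Illustrative key selection: circuit breaker + weighted random + exploration.

    keys maps key id -> performance weight. Zero-weight keys (TPM hits,
    repeated failures) are skipped; with explore_prob, a recovering key
    is probed instead of the weighted pick.
    """
    candidates = {k: w for k, w in keys.items() if w > 0}  # circuit breaker
    if not candidates:
        return None
    probes = [k for k in recovering if k in candidates]
    if probes and random.random() < explore_prob:
        return random.choice(probes)                       # probe a recovering key
    names = list(candidates)
    return random.choices(names, weights=[candidates[k] for k in names], k=1)[0]
```

For example, a key set where one key has tripped the circuit breaker and another is recovering still yields a usable key most of the time, with occasional probes of the recovering key.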
Scoring Algorithm
The load balancer computes a performance score for each provider-model combination.
Request Flow
1. Request without Provider Prefix
2. Model Catalog Lookup
Providers supporting gpt-4o: [openai, azure, groq]
3. Performance Evaluation
- OpenAI: Score 0.92 (low latency, 99% success rate)
- Azure: Score 0.85 (medium latency, 98% success rate)
- Groq: Score 0.65 (high latency recently)
4. Provider Selection
OpenAI selected (highest score within jitter band)
5. Request Transformation
Key Features
| Feature | Description |
|---|---|
| Automatic Optimization | No manual weight tuning required |
| Real-time Adaptation | Weights recomputed every 5 seconds based on live metrics |
| Circuit Breakers | Failing routes automatically removed from rotation |
| Fast Recovery | 90% penalty reduction in 30 seconds after issues resolve |
| Health States | Routes transition between Healthy, Degraded, Failed, and Recovering |
| Smart Exploration | 25% chance to probe potentially recovered routes |
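The "90% penalty reduction in 30 seconds" recovery target could be modeled as an exponential decay calibrated to that point. The decay shape is an assumption for illustration; the docs only state the 90%-in-30-seconds target:

```python
def decayed_penalty(initial_penalty: float, elapsed_s: float) -> float:
    """Exponential decay calibrated so 90% of the penalty is gone after 30 s."""
    return initial_penalty * 0.1 ** (elapsed_s / 30.0)

assert abs(decayed_penalty(1.0, 30.0) - 0.1) < 1e-9  # 90% reduced after 30 s
assert decayed_penalty(1.0, 60.0) < 0.011            # ~99% reduced after 60 s
```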
Dashboard Visibility
Monitor load balancing performance in real-time:
- Weight distribution across provider-model-key routes
- Performance metrics (error rates, latency, success rates)
- State transitions (Healthy → Degraded → Failed → Recovering)
- Actual vs expected traffic distribution
How Governance and Load Balancing Interact
When both methods are available in your Bifrost deployment, they work together in a complementary way across two levels.
Execution Flow
Execution Order
1. HTTPTransportIntercept (Governance Plugin - Provider Level)
- Runs first in the request pipeline
- Checks if the Virtual Key has provider_configs
- If yes: adds the provider prefix (e.g., azure/gpt-4o)
- Result: Provider is selected by governance rules
2. Middleware (Load Balancing Plugin - Provider Level / Direction)
- Runs after HTTPTransportIntercept
- Checks if the model string contains "/"
- If yes: skips provider selection (already determined by governance or user)
- If no: performs performance-based provider selection
- Result: Provider prefix added if not already present
3. KeySelector (Load Balancing - Key Level / Route)
- Always runs during request execution in Bifrost core
- Gets all keys for the selected provider
- Filters keys based on model restrictions
- Scores each key by performance metrics
- Selects the best key using weighted random + exploration
- Result: Optimal key selected within the provider
Important: Even when governance specifies azure/gpt-4o, load balancing still optimizes which Azure key to use based on performance metrics. This is the power of the two-level architecture!
Example Scenarios
- Governance Only
- Load Balancing Only
- Both Available (Governance + Load Balancing)
- Manual Provider Selection
Setup:
- Virtual Key has provider_configs defined
- No adaptive load balancing enabled

Behavior:
- Governance applies weighted provider routing → selects Azure (70% weight)
- Model becomes azure/gpt-4o
- Standard key selection (non-adaptive) chooses an Azure key based on static weights
- Request forwarded to Azure with the selected key
Provider vs Key Selection Rules
| Scenario | Provider Selection | Key Selection |
|---|---|---|
| VK with provider_configs | Governance (weighted random) | Standard or Adaptive (if enabled) |
| VK without provider_configs + LB | Load Balancing Level 1 (performance) | Load Balancing Level 2 (performance) |
| No VK + LB | Load Balancing Level 1 (performance) | Load Balancing Level 2 (performance) |
| Model with provider prefix + LB | Skip (already specified) | Load Balancing Level 2 (performance) ✅ |
| No Load Balancing enabled | Governance or User or Model Catalog | Standard (static weights) |
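The table can be read as a small decision function. This is an illustrative restatement of the rules above, not Bifrost code:

```python
def routing_decision(vk_has_provider_configs: bool, model_has_prefix: bool,
                     lb_enabled: bool) -> tuple:
    """Return (provider_selection, key_selection) per the table above."""
    if model_has_prefix:
        provider = "skip (already specified)"
    elif vk_has_provider_configs:
        provider = "governance (weighted random)"
    elif lb_enabled:
        provider = "load balancing level 1 (performance)"
    else:
        provider = "model catalog / user"
    key = ("load balancing level 2 (performance)" if lb_enabled
           else "standard (static weights)")
    return provider, key

# VK with provider_configs, no load balancing enabled:
assert routing_decision(True, False, False) == (
    "governance (weighted random)", "standard (static weights)")
```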
Critical Insight:
- Provider selection respects the hierarchy: Governance → Load Balancing Level 1 → User specification
- Key selection runs independently and benefits from load balancing even when provider is predetermined
Choosing the Right Approach
Use Governance When:
- ✅ Compliance requirements: Need to ensure data stays in specific regions or providers
- ✅ Cost optimization: Want explicit control over traffic distribution to cheaper providers
- ✅ Budget enforcement: Need hard limits on spending per provider
- ✅ Environment separation: Different teams/apps need different provider access
- ✅ Rate limit management: Need to respect provider-specific rate limits

Use Load Balancing When:
- ✅ Performance optimization: Want automatic routing to best-performing providers
- ✅ Minimal configuration: Prefer hands-off operation with intelligent defaults
- ✅ Dynamic workloads: Traffic patterns change frequently
- ✅ Automatic failover: Need instant adaptation to provider issues
- ✅ Multi-provider redundancy: Want seamless provider switching based on availability

Use Both When:
- ✅ Hybrid requirements: Some Virtual Keys need governance, others can use load balancing
- ✅ Progressive rollout: Start with governance, gradually adopt load balancing
- ✅ Selective optimization: Governance for sensitive workloads, load balancing for others
Additional Resources
Governance Routing
Configuration instructions for setting up governance routing via Virtual Keys (Web UI, API, config.json)
Adaptive Load Balancing
Technical implementation details: scoring algorithms, weight calculations, and performance characteristics
Virtual Keys
Learn how to create and configure Virtual Keys
Fallbacks
Understand how automatic fallbacks work across providers

