Budget and Limits

Overview

Budgeting and rate limiting are a core feature of Bifrost’s governance system managed through Virtual Keys. Bifrost’s budget management system provides comprehensive cost control and financial governance for enterprise AI deployments. It operates through a hierarchical budget structure that enables granular cost management, usage tracking, and financial oversight across your entire organization. Core Hierarchy:

Customer (has independent budget)
    ↓ (one-to-many)
Team (has independent budget) 
    ↓ (one-to-many)
Virtual Key (has independent budget + rate limits)
    ↓ (one-to-many)
Provider Config (has independent budget + rate limits)

OR

Customer (has independent budget)
    ↓ (direct attachment)
Virtual Key (has independent budget + rate limits)
    ↓ (one-to-many)
Provider Config (has independent budget + rate limits)

OR

Virtual Key (standalone - has independent budget + rate limits)
    ↓ (one-to-many)
Provider Config (has independent budget + rate limits)

Key Capabilities:

Virtual Keys - Primary access control via x-bf-vk header (exclusive team OR customer attachment)
Budget Management - Independent budget limits at each hierarchy level with cumulative checking
Rate Limiting - Request and token-based throttling at both VK and provider config levels
Provider-Level Governance - Granular budgets and rate limits per AI provider within a virtual key
Model/Provider Filtering - Granular access control per virtual key
Usage Tracking - Real-time monitoring and audit trails
Audit Headers - Optional team and customer identification

Budget Management

Cost Calculation

Bifrost automatically calculates costs based on:

Provider Pricing - Real-time model pricing data
Token Usage - Input + output tokens from API responses
Request Type - Different pricing for chat, text, embedding, speech, transcription
Cache Status - Reduced costs for cached responses
Batch Operations - Volume discounts for batch requests

All cost calculation details are covered in Architecture > Framework > Model Catalog.

Budget Checking Flow

When a request is made with a virtual key, Bifrost checks all applicable budgets independently in the hierarchy. Each budget must have sufficient remaining balance for the request to proceed. Checking Sequence: For VK → Team → Customer:

✓ Provider Config Budget (if provider config has budget)
✓ VK Budget (if VK has budget)
✓ Team Budget (if VK's team has budget)  
✓ Customer Budget (if team's customer has budget)

For VK → Customer (direct):

✓ Provider Config Budget (if provider config has budget)
✓ VK Budget (if VK has budget)
✓ Customer Budget (if VK's customer has budget)

For Standalone VK:

1. ✓ Provider Config Budget (if provider config has budget)
2. ✓ VK Budget (if VK has budget)

Important Notes:

All applicable budgets must pass - any single budget failure blocks the request
Budgets are independent - each tracks its own usage and limits
Costs are deducted from all applicable budgets - same cost applied to each level
Rate limits checked at provider config and VK levels - teams and customers have no rate limits
Provider selection - providers that exceed their budget or rate limits are excluded from routing

Example:

- Provider config budget: $4/$5 remaining ✓
- VK budget: $9/$10 remaining ✓
- Team budget: $15/$20 remaining ✓  
- Customer budget: $45/$50 remaining ✓
- Result: Allowed (no budget is exceeded)

- After request: 
    - Request cost: $2 
    - Updated Provider=$6/$5, VK=$11/$10, Team=$17/$20, Customer=$47/$50
    - Then the next request will be blocked (both provider and VK budgets exceeded).

Rate Limiting

Rate limits protect your system from abuse and manage traffic by setting thresholds on request frequency and token usage over a specific time window. Rate limits can be configured at both the Virtual Key level and Provider Config level for granular control. Bifrost supports two types of rate limits that work in parallel:

Request Limits: Control the maximum number of API calls that can be made within a set duration (e.g., 100 requests per minute).
Token Limits: Control the maximum number of tokens (prompt + completion) that can be processed within a set duration (e.g., 50,000 tokens per hour).

Rate Limit Hierarchy

Rate limits are checked in hierarchical order:

1. ✓ Provider Config Rate Limits (if provider config has rate limits)
2. ✓ Virtual Key Rate Limits (if VK has rate limits)

For a request to be allowed, it must pass both the request limit and token limit checks at all applicable levels. If a provider config exceeds its rate limits, that provider is excluded from routing, but other providers within the same virtual key remain available.

Provider-Level Rate Limiting

Provider configs within a virtual key can have independent rate limits, enabling:

Per-Provider Throttling: Different rate limits for OpenAI vs Anthropic
Provider Isolation: Rate limit violations on one provider don’t affect others
Granular Control: Fine-tune limits based on provider capabilities and costs

Reset Durations

Budgets and rate limits support flexible reset durations: Format Examples:

1m - 1 minute
5m - 5 minutes
1h - 1 hour
1d - 1 day
1w - 1 week
1M - 1 month
1Y - 1 year

Common Patterns:

Rate Limits: 1m, 1h, 1d for request throttling
Budgets: 1d, 1w, 1M, 1Y for cost control

Calendar-aligned budgets

By default, a budget rolls: after reset_duration elapses since last_reset, usage resets. With calendar_aligned: true, the budget resets at the start of each calendar period in UTC instead (same instant for every customer of that configuration). Supported reset_duration suffixes: only day (d), week (w), month (M), and year (Y). Examples: 1d → midnight UTC each day; 1w → Monday 00:00 UTC each week; 1M → first day of each month; 1Y → January 1 each year. Sub-day durations (for example 1h, 30m) cannot use calendar alignment; the API rejects invalid combinations. Calendar alignment applies to budgets on customers, teams, virtual keys, and per–provider-config budgets. You can set it when creating a budget (calendar_aligned on create) or toggle it on update (calendar_aligned on the budget in PUT requests). Turning calendar alignment on for an existing budget resets current usage to zero and snaps last_reset to the current period start.

Configuration Guide

Configure provider-level budgets and rate limits using any of these methods:

Web UI
API
config.json

The Bifrost Web UI provides an intuitive interface for configuring provider-level governance through the Virtual Keys management page.

Creating Virtual Keys with Provider Configs

Navigate to Virtual Keys: Go to Virtual Keys page in the Bifrost dashboard
Create New Virtual Key: Click “Create Virtual Key” button
Configure Providers: In the “Provider Configurations” section:
- Add multiple providers with individual weights
- Set provider-specific budgets and rate limits
- Configure allowed models per provider

Provider Configuration Interface

Key Features:

Visual Provider Cards: Each provider displays as an expandable card
Budget Controls: Set spending limits with reset periods per provider
Rate Limit Controls: Configure token and request limits independently
Model Filtering: Specify allowed models for each provider
Weight Distribution: Visual indicators for load balancing weights
Real-time Validation: Immediate feedback on configuration errors

Monitoring Provider Usage

The info sheet for the virtual key provides real-time monitoring of:

Budget consumption per provider
Rate limit utilization (tokens and requests)
Provider availability status
Usage trends and forecasting

Use the Bifrost HTTP API to programmatically manage provider-level governance configurations.

Create Virtual Key with Provider Configs

curl -X POST "https://your-bifrost-instance.com/api/governance/virtual-keys" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "marketing-team-vk",
    "description": "Marketing team virtual key with provider-specific limits",
    "provider_configs": [
      {
        "provider": "openai",
        "weight": 0.7,
        "allowed_models": ["gpt-4", "gpt-3.5-turbo"],
        "budget": {
          "max_limit": 500.00,
          "reset_duration": "1M",
          "calendar_aligned": true
        },
        "rate_limit": {
          "token_max_limit": 1000000,
          "token_reset_duration": "1h",
          "request_max_limit": 1000,
          "request_reset_duration": "1h"
        }
      },
      {
        "provider": "anthropic",
        "weight": 0.3,
        "allowed_models": ["claude-3-opus", "claude-3-sonnet"],
        "budget": {
          "max_limit": 200.00,
          "reset_duration": "1M"
        },
        "rate_limit": {
          "token_max_limit": 500000,
          "token_reset_duration": "1h",
          "request_max_limit": 500,
          "request_reset_duration": "1h"
        }
      }
    ],
    "budget": {
      "max_limit": 1000.00,
      "reset_duration": "1M",
      "calendar_aligned": true
    },
    "is_active": true
  }'

Use calendar_aligned only with d / w / M / Y reset durations (see Calendar-aligned budgets).

Update Provider Configuration

curl -X PUT "https://your-bifrost-instance.com/api/governance/virtual-keys/{vk_id}" \
  -H "Content-Type: application/json" \
  -d '{
    "provider_configs": [
      {
        "id": 1,
        "provider": "openai",
        "weight": 0.8,
        "budget": {
          "max_limit": 600.00,
          "reset_duration": "1M"
        },
        "rate_limit": {
          "token_max_limit": 1200000,
          "token_reset_duration": "1h"
        }
      }
    ]
  }'

API Response Structure

{
  "message": "Virtual key created successfully",
  "virtual_key": {
    "id": "vk_123",
    "name": "marketing-team-vk",
    "value": "vk_abc123def456",
    "provider_configs": [
      {
        "id": 1,
        "provider": "openai",
        "weight": 0.7,
        "allowed_models": ["gpt-4", "gpt-3.5-turbo"],
        "budget": {
          "id": "budget_789",
          "max_limit": 500.00,
          "current_usage": 0.00,
          "reset_duration": "1M",
          "calendar_aligned": true,
          "last_reset": "2024-01-01T00:00:00Z"
        },
        "rate_limit": {
          "id": "rate_limit_456",
          "token_max_limit": 1000000,
          "token_current_usage": 0,
          "token_reset_duration": "1h",
          "token_last_reset": "2024-01-01T00:00:00Z",
          "request_max_limit": 1000,
          "request_current_usage": 0,
          "request_reset_duration": "1h",
          "request_last_reset": "2024-01-01T00:00:00Z"
        }
      }
    ]
  }
}

Field Descriptions

Field	Type	Description
`provider`	string	AI provider name (e.g., “openai”, “anthropic”)
`weight`	float	Load balancing weight (0.0-1.0)
`allowed_models`	array	Specific models allowed for this provider
`budget.max_limit`	float	Maximum spend in USD
`budget.reset_duration`	string	Reset period (e.g., “1h”, “1d”, “1M”)
`budget.calendar_aligned`	boolean	When true, resets at calendar boundaries in UTC (requires `d`/`w`/`M`/`Y` durations)
`rate_limit.token_max_limit`	integer	Maximum tokens per period
`rate_limit.request_max_limit`	integer	Maximum requests per period

Configure provider-level governance through Bifrost’s configuration file for declarative management.

Basic Configuration Structure

{
  "governance": {
    "virtual_keys": [
      {
        "id": "vk-dev-001",
        "name": "development-team-vk",
        "description": "Development team with multi-provider setup",
        "is_active": true,
        "rate_limit_id": "rl-vk-dev",
        "provider_configs": [
          {
            "id": 1,
            "provider": "openai",
            "weight": 0.6,
            "allowed_models": ["gpt-4", "gpt-3.5-turbo"],
            "rate_limit_id": "rl-pc-openai"
          },
          {
            "id": 2,
            "provider": "anthropic",
            "weight": 0.4,
            "allowed_models": ["claude-3-opus", "claude-3-sonnet"],
            "rate_limit_id": "rl-pc-anthropic"
          }
        ]
      }
    ],
    "budgets": [
      {
        "id": "budget-vk-dev",
        "virtual_key_id": "vk-dev-001",
        "max_limit": 2000.00,
        "reset_duration": "1M",
        "calendar_aligned": true
      },
      {
        "id": "budget-pc-openai",
        "provider_config_id": 1,
        "max_limit": 1000.00,
        "reset_duration": "1M"
      },
      {
        "id": "budget-pc-anthropic",
        "provider_config_id": 2,
        "max_limit": 500.00,
        "reset_duration": "1M"
      }
    ],
    "rate_limits": [
      {
        "id": "rl-vk-dev",
        "token_max_limit": 5000000,
        "token_reset_duration": "1h",
        "request_max_limit": 3000,
        "request_reset_duration": "1h"
      },
      {
        "id": "rl-pc-openai",
        "token_max_limit": 2000000,
        "token_reset_duration": "1h",
        "request_max_limit": 2000,
        "request_reset_duration": "1h"
      },
      {
        "id": "rl-pc-anthropic",
        "token_max_limit": 1000000,
        "token_reset_duration": "1h",
        "request_max_limit": 1000,
        "request_reset_duration": "1h"
      }
    ]
  }
}

Budgets and rate limits live as separate top-level arrays inside governance. Virtual keys and provider configs reference them by id (rate_limit_id) or are referenced back (virtual_key_id / provider_config_id on each budgets[] entry). Optional calendar_aligned on each budget matches the HTTP API and calendar-aligned behavior.

Advanced Configuration Examples

Cost-Optimized Setup

{
  "governance": {
    "virtual_keys": [
      {
        "id": "vk-cost-opt",
        "name": "cost-optimized-vk",
        "provider_configs": [
          {"id": 10, "provider": "openai-gpt-3.5", "weight": 0.8, "rate_limit_id": "rl-cheap"},
          {"id": 11, "provider": "openai-gpt-4",   "weight": 0.2, "rate_limit_id": "rl-premium"}
        ]
      }
    ],
    "budgets": [
      {"id": "b-cheap",   "provider_config_id": 10, "max_limit": 50.00,  "reset_duration": "1d"},
      {"id": "b-premium", "provider_config_id": 11, "max_limit": 200.00, "reset_duration": "1d"}
    ],
    "rate_limits": [
      {"id": "rl-cheap",   "request_max_limit": 1000, "request_reset_duration": "1h"},
      {"id": "rl-premium", "request_max_limit": 100,  "request_reset_duration": "1h"}
    ]
  }
}

High-Volume Production Setup

{
  "governance": {
    "virtual_keys": [
      {
        "id": "vk-prod-hv",
        "name": "production-high-volume-vk",
        "provider_configs": [
          {"id": 20, "provider": "openai",       "weight": 0.5, "rate_limit_id": "rl-openai"},
          {"id": 21, "provider": "anthropic",    "weight": 0.3, "rate_limit_id": "rl-anthropic"},
          {"id": 22, "provider": "azure-openai", "weight": 0.2, "rate_limit_id": "rl-azure"}
        ]
      }
    ],
    "budgets": [
      {"id": "b-openai",    "provider_config_id": 20, "max_limit": 5000.00, "reset_duration": "1M"},
      {"id": "b-anthropic", "provider_config_id": 21, "max_limit": 3000.00, "reset_duration": "1M"},
      {"id": "b-azure",     "provider_config_id": 22, "max_limit": 2000.00, "reset_duration": "1M"}
    ],
    "rate_limits": [
      {"id": "rl-openai",    "token_max_limit": 10000000, "token_reset_duration": "1h", "request_max_limit": 10000, "request_reset_duration": "1h"},
      {"id": "rl-anthropic", "token_max_limit": 6000000,  "token_reset_duration": "1h", "request_max_limit": 6000,  "request_reset_duration": "1h"},
      {"id": "rl-azure",     "token_max_limit": 4000000,  "token_reset_duration": "1h", "request_max_limit": 4000,  "request_reset_duration": "1h"}
    ]
  }
}

Validation Rules:

Budget limits must be positive numbers
Reset durations must be valid time formats
Rate limits must be positive integers
Provider names must match configured providers

Provider-Level Governance Examples

Example 1: Mixed Provider Budgets

A virtual key configured with multiple providers and different budget allocations:

{
  "governance": {
    "virtual_keys": [
      {
        "id": "vk-mkt",
        "name": "marketing-team-vk",
        "provider_configs": [
          {"id": 30, "provider": "openai",    "weight": 0.7},
          {"id": 31, "provider": "anthropic", "weight": 0.3}
        ]
      }
    ],
    "budgets": [
      {"id": "b-vk-mkt", "virtual_key_id": "vk-mkt",      "max_limit": 100, "reset_duration": "1M"},
      {"id": "b-openai", "provider_config_id": 30,        "max_limit": 50,  "reset_duration": "1M"},
      {"id": "b-anth",   "provider_config_id": 31,        "max_limit": 30,  "reset_duration": "1M"}
    ]
  }
}

Behavior:

OpenAI requests limited to 50 dollars/month at provider level + 100 dollars/month at VK level
Anthropic requests limited to 30 dollars/month at provider level + 100 dollars/month at VK level
If any provider’s budget is exhausted, all requests to that provider will be blocked

Example 2: Provider-Specific Rate Limits

Different rate limits based on provider capabilities:

{
  "governance": {
    "virtual_keys": [
      {
        "id": "vk-hv",
        "name": "high-volume-vk",
        "provider_configs": [
          {"id": 40, "provider": "openai",    "rate_limit_id": "rl-openai"},
          {"id": 41, "provider": "anthropic", "rate_limit_id": "rl-anthropic"}
        ]
      }
    ],
    "rate_limits": [
      {"id": "rl-openai",    "request_max_limit": 1000, "request_reset_duration": "1h", "token_max_limit": 1000000, "token_reset_duration": "1h"},
      {"id": "rl-anthropic", "request_max_limit": 500,  "request_reset_duration": "1h", "token_max_limit": 500000,  "token_reset_duration": "1h"}
    ]
  }
}

Behavior:

OpenAI: 1000 requests/hour, 1M tokens/hour
Anthropic: 500 requests/hour, 500K tokens/hour
If any provider’s rate limits are exceeded, all requests to that provider will be blocked

Example 3: Failover Strategy

Provider configurations with budget-based failover:

{
  "governance": {
    "virtual_keys": [
      {
        "id": "vk-cost",
        "name": "cost-optimized-vk",
        "provider_configs": [
          {"id": 50, "provider": "openai-cheap",   "weight": 1.0},
          {"id": 51, "provider": "openai-premium", "weight": 0.0, "rate_limit_id": "rl-premium"}
        ]
      }
    ],
    "budgets": [
      {"id": "b-cheap",   "provider_config_id": 50, "max_limit": 10, "reset_duration": "1d"},
      {"id": "b-premium", "provider_config_id": 51, "max_limit": 50, "reset_duration": "1d"}
    ],
    "rate_limits": [
      {"id": "rl-premium", "request_max_limit": 100, "request_reset_duration": "1h", "token_max_limit": 50000, "token_reset_duration": "1h"}
    ]
  }
}

Behavior:

Primary: Use cheap provider until $10 daily budget exhausted
Fallback: Automatically switch to premium provider when cheap option unavailable. To enable this, you should not send provider name in the request body, read Routing for more details.
Cost containment: Prevent unexpected overspend on premium resources and limit the number of requests to the premium provider

Key Benefits of Provider-Level Governance

Granular Control: Set specific spending limits and rate limits per AI provider
Automatic Fallback: Route to alternative providers when budgets or rate limits are exceeded
Cost Control: Track and control spending by provider for better financial oversight
Performance Testing: A/B testing across providers with controlled budgets
Multi-Provider Strategies: Primary/backup provider configurations
Cost-Tiered Access: Cheap providers for basic tasks, premium for complex workloads

Next Steps

Routing - Direct requests to specific AI models, providers, and keys using Virtual Keys.
MCP Tool Filtering - Manage MCP clients/tools for virtual keys.
Tracing - Audit trails and request tracking

Overview

Quick Start

Migration Guides

SDK Integrations

Providers & Guides

MCP Gateway

Custom plugins

Open Source Features

Budget and Limits

Overview

Budget Management

Cost Calculation

Budget Checking Flow

Rate Limiting

Rate Limit Hierarchy

Provider-Level Rate Limiting

Reset Durations

Calendar-aligned budgets

Configuration Guide

Creating Virtual Keys with Provider Configs

Provider Configuration Interface

Monitoring Provider Usage

Create Virtual Key with Provider Configs

Update Provider Configuration

API Response Structure

Field Descriptions

Basic Configuration Structure

Advanced Configuration Examples

Cost-Optimized Setup

High-Volume Production Setup

Provider-Level Governance Examples

Example 1: Mixed Provider Budgets

Example 2: Provider-Specific Rate Limits

Example 3: Failover Strategy

Key Benefits of Provider-Level Governance

Next Steps

Overview

Quick Start

Migration Guides

SDK Integrations

Providers & Guides

MCP Gateway

Custom plugins

Open Source Features

Documentation Index

​Overview

​Budget Management

​Cost Calculation

​Budget Checking Flow

​Rate Limiting

​Rate Limit Hierarchy

​Provider-Level Rate Limiting

​Reset Durations

​Calendar-aligned budgets

​Configuration Guide

​Creating Virtual Keys with Provider Configs

​Provider Configuration Interface

​Monitoring Provider Usage

​Create Virtual Key with Provider Configs

​Update Provider Configuration

​API Response Structure

​Field Descriptions

​Basic Configuration Structure

​Advanced Configuration Examples

​Cost-Optimized Setup

​High-Volume Production Setup

​Provider-Level Governance Examples

​Example 1: Mixed Provider Budgets

​Example 2: Provider-Specific Rate Limits

​Example 3: Failover Strategy

​Key Benefits of Provider-Level Governance

​Next Steps

Overview

Budget Management

Cost Calculation

Budget Checking Flow

Rate Limiting

Rate Limit Hierarchy

Provider-Level Rate Limiting

Reset Durations

Calendar-aligned budgets

Configuration Guide

Creating Virtual Keys with Provider Configs

Provider Configuration Interface

Monitoring Provider Usage

Create Virtual Key with Provider Configs

Update Provider Configuration

API Response Structure

Field Descriptions

Basic Configuration Structure

Advanced Configuration Examples

Cost-Optimized Setup

High-Volume Production Setup

Provider-Level Governance Examples

Example 1: Mixed Provider Budgets

Example 2: Provider-Specific Rate Limits

Example 3: Failover Strategy

Key Benefits of Provider-Level Governance

Next Steps