Telemetry

Overview

Bifrost provides built-in telemetry and monitoring capabilities through Prometheus metrics collection. The telemetry system tracks both HTTP-level performance metrics and upstream provider interactions, giving you complete visibility into your AI gateway’s performance and usage patterns.

Key Features:

Prometheus Integration - Native metrics collection at /metrics endpoint
Comprehensive Tracking - Success/error rates, token usage, costs, and cache performance
Custom Labels - Configurable dimensions for detailed analysis
Dynamic Headers - Runtime label injection via x-bf-dim-* headers
Cost Monitoring - Real-time tracking of AI provider costs in USD
Cache Analytics - Direct and semantic cache hit tracking
Async Collection - Zero-latency impact on request processing
Multi-Level Tracking - HTTP transport + upstream provider metrics

The telemetry plugin operates asynchronously to ensure metrics collection doesn’t impact request latency or connection performance.

Default Metrics

HTTP Transport Metrics

These metrics track all incoming HTTP requests to Bifrost:

Metric	Type	Description
`http_requests_total`	Counter	Total number of HTTP requests
`http_request_duration_seconds`	Histogram	Duration of HTTP requests
`http_request_size_bytes`	Histogram	Size of incoming HTTP requests
`http_response_size_bytes`	Histogram	Size of outgoing HTTP responses

Labels:

path: HTTP endpoint path
method: HTTP verb (e.g., GET, POST, PUT, DELETE)
status: HTTP status code
custom labels: Custom labels configured in the Bifrost configuration
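
To see how these labels can be used outside of Prometheus, the sketch below parses a small, invented sample of Prometheus text exposition output and totals `http_requests_total` by `status`. The sample values and the helper name `sum_by_label` are illustrative assumptions, not real Bifrost output or part of Bifrost.

```python
import re
from collections import defaultdict

# Hypothetical sample in Prometheus text exposition format (values invented).
SAMPLE = '''\
# TYPE http_requests_total counter
http_requests_total{path="/v1/chat/completions",method="POST",status="200"} 120
http_requests_total{path="/v1/chat/completions",method="POST",status="500"} 3
http_requests_total{path="/metrics",method="GET",status="200"} 40
'''

LINE_RE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def sum_by_label(text, metric, label):
    """Sum sample values of `metric`, grouped by the value of `label`."""
    totals = defaultdict(float)
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # skip HELP/TYPE comment lines
        match = LINE_RE.match(line.strip())
        if not match or match.group(1) != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group(2)))
        totals[labels.get(label, "")] += float(match.group(3))
    return dict(totals)


print(sum_by_label(SAMPLE, "http_requests_total", "status"))
```

In practice you would fetch the text from the /metrics endpoint instead of using an inline sample.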

Upstream Provider Metrics

These metrics track requests forwarded to AI providers:

Metric	Type	Description	Labels
`bifrost_upstream_requests_total`	Counter	Total requests forwarded to upstream providers	Base Labels, custom labels
`bifrost_success_requests_total`	Counter	Total successful requests to upstream providers	Base Labels, custom labels
`bifrost_error_requests_total`	Counter	Total failed requests to upstream providers	Base Labels, `status_code`, custom labels
`bifrost_upstream_latency_seconds`	Histogram	Latency of upstream provider requests	Base Labels, `is_success`, custom labels
`bifrost_input_tokens_total`	Counter	Total input tokens sent to upstream providers	Base Labels, custom labels
`bifrost_output_tokens_total`	Counter	Total output tokens received from upstream providers	Base Labels, custom labels
`bifrost_cache_hits_total`	Counter	Total cache hits by type (direct/semantic)	Base Labels, `cache_type`, custom labels
`bifrost_cost_total`	Counter	Total cost in USD for upstream provider requests	Base Labels, custom labels

Base Labels:

provider: AI provider name (e.g., openai, anthropic, azure)
model: Model name (e.g., gpt-4o-mini, claude-3-sonnet)
method: Request type (chat, text, embedding, speech, transcription)
virtual_key_id: Virtual key ID
virtual_key_name: Virtual key name
routing_engines_used: Comma-separated routing engines used (“routing-rule”, “governance”, “loadbalancing”, “model-catalog”)
routing_rule_id: Routing rule ID that matched the request
routing_rule_name: Routing rule name that matched the request
selected_key_id: ID of the key that successfully served the request (null on final errors)
selected_key_name: Name of the key that successfully served the request (null on final errors)
number_of_retries: Number of retries
fallback_index: Fallback index (0 for first attempt, 1 for second attempt, etc.)
custom labels: Custom labels configured in the Bifrost configuration
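
Base labels are most useful as filters in queries. As a minimal sketch, the hypothetical helper `selector` below builds a PromQL selector string from label values; the helper is not part of Bifrost, and the label values are the examples listed above.

```python
def selector(metric, **labels):
    """Build a PromQL instant-vector selector with exact-match label filters."""
    def escape(value):
        # PromQL string literals escape backslashes and double quotes.
        return str(value).replace("\\", "\\\\").replace('"', '\\"')

    matchers = ",".join(
        f'{name}="{escape(value)}"' for name, value in sorted(labels.items())
    )
    return f"{metric}{{{matchers}}}" if matchers else metric


print(selector("bifrost_upstream_requests_total", provider="openai", model="gpt-4o-mini"))
```

The resulting string can be pasted into any of the queries in the Monitoring Examples section to narrow them to one provider, model, or virtual key.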

Streaming Metrics

These metrics capture latency characteristics specific to streaming responses:

Metric	Type	Description	Labels
`bifrost_stream_first_token_latency_seconds`	Histogram	Time from request start to first streamed token	Base Labels
`bifrost_stream_inter_token_latency_seconds`	Histogram	Latency between subsequent streamed tokens	Base Labels
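
Histogram metrics are usually read as percentiles. In PromQL that is typically done with `histogram_quantile` over the `_bucket` series, assuming these metrics are exposed as conventional Prometheus histograms with `le` buckets. The sketch below illustrates the idea behind that estimate: find the bucket containing the target rank and interpolate linearly inside it. It is a simplified approximation, and the bucket bounds and counts are invented.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    `buckets` is a list of (upper_bound, cumulative_count) pairs sorted by
    upper bound and ending with (math.inf, total_count).
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Rank falls past the last finite bucket: report its bound.
                return lower_bound
            # Interpolate linearly inside the bucket that contains the rank.
            width = count - lower_count
            fraction = (rank - lower_count) / width if width else 0.0
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Hypothetical first-token latency buckets in seconds (counts are invented).
sample = [(0.25, 40), (0.5, 80), (1.0, 95), (math.inf, 100)]
print(estimate_quantile(0.5, sample))
print(estimate_quantile(0.9, sample))
```

Because the estimate is interpolated, its accuracy depends on how closely the bucket boundaries bracket the latencies you care about.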

Monitoring Examples

Success Rate Monitoring

Track the success rate of requests to different providers:

# Success rate (%) by provider
sum by (provider) (rate(bifrost_success_requests_total[5m])) /
sum by (provider) (rate(bifrost_upstream_requests_total[5m])) * 100
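
The arithmetic behind a success-rate query can be sketched in plain Python. Because both counters are measured over the same window, dividing their rates is the same as dividing their increases. The version below is deliberately simplified (it handles counter resets but does no extrapolation, unlike PromQL's `rate`), and the sample readings are invented.

```python
def counter_increase(samples):
    """Total increase across consecutive counter samples, tolerating resets."""
    increase = 0.0
    for previous, current in zip(samples, samples[1:]):
        # A drop means the counter restarted from zero, so count the new value.
        increase += current - previous if current >= previous else current
    return increase


def success_percentage(success_samples, total_samples):
    """Success rate in percent over the window covered by the samples."""
    total_increase = counter_increase(total_samples)
    if total_increase == 0:
        return None  # no traffic in the window, so the ratio is undefined
    return 100 * counter_increase(success_samples) / total_increase


# Hypothetical counter readings scraped at equal intervals (values invented).
total = [1000, 1040, 1100, 1200]
success = [990, 1028, 1085, 1180]
print(success_percentage(success, total))
```

Returning `None` for an idle window mirrors the fact that the ratio is undefined when the denominator is zero.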

Token Usage Analysis

Monitor token consumption across different models:

# Input tokens per minute by model
sum by (model) (increase(bifrost_input_tokens_total[1m]))

# Output tokens per minute by model
sum by (model) (increase(bifrost_output_tokens_total[1m]))

# Token efficiency (output/input ratio) by model
sum by (model) (rate(bifrost_output_tokens_total[5m])) /
sum by (model) (rate(bifrost_input_tokens_total[5m]))

Cost Tracking

Monitor spending across providers and models:

# Cost per second by provider
sum by (provider) (rate(bifrost_cost_total[1m]))

# Daily cost estimate
sum by (provider) (increase(bifrost_cost_total[1d]))

# Cost per request by provider and model
sum by (provider, model) (rate(bifrost_cost_total[5m])) /
sum by (provider, model) (rate(bifrost_upstream_requests_total[5m]))

Cache Performance

Track cache effectiveness:

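# Cache hit rate (%) by type
# cache_type exists only on the cache metric, so aggregate before dividing
sum by (cache_type) (rate(bifrost_cache_hits_total[5m])) /
scalar(sum(rate(bifrost_upstream_requests_total[5m]))) * 100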

# Direct vs semantic cache hits
sum by (cache_type) (rate(bifrost_cache_hits_total[5m]))

Error Rate Analysis

Monitor error patterns:

# Error rate (%) by provider
sum by (provider) (rate(bifrost_error_requests_total[5m])) /
sum by (provider) (rate(bifrost_upstream_requests_total[5m])) * 100

# Errors by model
sum by (model) (rate(bifrost_error_requests_total[5m]))

Configuration

Configure custom Prometheus labels to add dimensions for filtering and analysis:

Web UI

Navigate to Configuration
- Open Bifrost UI at http://localhost:8080
- Go to Config tab

Prometheus Labels

Custom Labels: team, environment, organization, project

API

# Update prometheus labels via API
curl -X PATCH http://localhost:8080/config \
  -H "Content-Type: application/json" \
  -d '{
    "client": {
      "prometheus_labels": ["team", "environment", "organization", "project"]
    }
  }'

config.json

{
  "client": {
    "prometheus_labels": ["team", "environment", "organization", "project"],
    "drop_excess_requests": false,
    "initial_pool_size": 300
  }
}
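
If you prefer to script the API update rather than use curl, the same PATCH request can be built with Python's standard library. This is a minimal sketch that assumes Bifrost is reachable at http://localhost:8080 as in the examples above; the helper name `build_label_update` is hypothetical, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_label_update(base_url, labels):
    """Build (but do not send) a PATCH /config request that sets prometheus_labels."""
    payload = {"client": {"prometheus_labels": list(labels)}}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/config",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )


request = build_label_update(
    "http://localhost:8080", ["team", "environment", "organization", "project"]
)
print(request.get_method(), request.full_url)
# To apply the change against a running instance:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```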

Dynamic Label Injection

Add custom label values at runtime using x-bf-dim-* headers:

# Add custom labels to specific requests
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-bf-dim-team: engineering" \
  -H "x-bf-dim-environment: production" \
  -H "x-bf-dim-organization: my-org" \
  -H "x-bf-dim-project: my-project" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Header Format:

Prefix: x-bf-dim-
Label name: Any string after the prefix, except reserved metric labels like path and method
Value: String value for the label

These runtime dimensions are also forwarded to the other observability backends. The same x-bf-dim-* values appear in internal logs, OpenTelemetry span attributes, and Maxim tags.

Legacy x-bf-prom-* headers still work for Prometheus-only behavior, but they are deprecated. When both prefixes provide the same label, x-bf-dim-* wins.
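
The precedence rule can be illustrated with a short sketch. This is not Bifrost's implementation: the function name is hypothetical, the reserved set contains only the two names mentioned above, and silently ignoring reserved names and lowercasing label names are assumptions.

```python
DIM_PREFIX = "x-bf-dim-"
LEGACY_PREFIX = "x-bf-prom-"
# Assumption: only the two reserved names mentioned in the text are listed here.
RESERVED_LABELS = {"path", "method"}


def resolve_dimensions(headers):
    """Map x-bf-dim-* and legacy x-bf-prom-* headers to label values."""
    dims, legacy = {}, {}
    for name, value in headers.items():
        lowered = name.lower()  # HTTP header names are case-insensitive
        for prefix, target in ((DIM_PREFIX, dims), (LEGACY_PREFIX, legacy)):
            if lowered.startswith(prefix):
                label = lowered[len(prefix):]
                if label and label not in RESERVED_LABELS:
                    target[label] = value
    # Values from x-bf-dim-* override legacy values for the same label.
    return {**legacy, **dims}


print(resolve_dimensions({
    "x-bf-dim-team": "engineering",
    "x-bf-prom-team": "legacy-team",
    "x-bf-prom-project": "my-project",
    "x-bf-dim-path": "ignored",
}))
```

Here `team` resolves to the x-bf-dim-* value, `project` falls back to the legacy header, and the reserved `path` name is dropped.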

Infrastructure Setup

Development & Testing

For local development and testing, use the provided Docker Compose setup:

# Navigate to telemetry plugin directory
cd plugins/telemetry

# Start Prometheus and Grafana
docker-compose up -d

# Access endpoints
# Prometheus: http://localhost:9090
# Grafana: http://localhost:3000 (admin/admin)
# Bifrost metrics: http://localhost:8080/metrics

Development Only: The provided Docker Compose setup is for testing purposes only. Do not use in production without proper security, scaling, and persistence configuration.

You can use the Prometheus scraping endpoint to create your own Grafana dashboards. Below are a few examples created using the Docker Compose setup.

Production Deployment

For production environments:

Deploy Prometheus with proper persistence, retention, and security
Configure scraping to target your Bifrost instances at /metrics
Set up Grafana with authentication and dashboards
Configure alerts based on your SLA requirements

Prometheus Scrape Configuration:

scrape_configs:
  - job_name: "bifrost-gateway"
    static_configs:
      - targets: ["bifrost-instance-1:8080", "bifrost-instance-2:8080"]
    scrape_interval: 30s
    metrics_path: /metrics
    # If Bifrost auth is enabled, add:
    # basic_auth:
    #   username: '<admin_username>'
    #   password: '<admin_password>'

If you have Bifrost authentication enabled (auth_config), you must include basic_auth in the scrape config with your admin_username and admin_password. See the Prometheus docs for details.
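
Before pointing Prometheus at an authenticated instance, it can help to confirm that the credentials work. The sketch below builds the kind of HTTP Basic auth request a scraper would send; the host name and credentials are placeholders, the helper name is hypothetical, and the request is only constructed, not sent.

```python
import base64
import urllib.request


def build_metrics_request(base_url, username=None, password=None):
    """Build a GET /metrics request, adding HTTP Basic auth when credentials are given."""
    request = urllib.request.Request(f"{base_url.rstrip('/')}/metrics")
    if username is not None and password is not None:
        token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
        request.add_header("Authorization", f"Basic {token}")
    return request


# Placeholder host and credentials; substitute your own values.
request = build_metrics_request("http://bifrost-instance-1:8080", "admin", "secret")
print(request.get_header("Authorization"))
# To check the endpoint for real:
# with urllib.request.urlopen(request, timeout=5) as response:
#     print(response.status)
```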

Production Alerting Examples

Configure alerts for critical scenarios using the new metrics:

High Error Rate Alert:

- alert: BifrostHighErrorRate
  expr: sum by (provider) (rate(bifrost_error_requests_total[5m])) / sum by (provider) (rate(bifrost_upstream_requests_total[5m])) > 0.05
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "High error rate detected for provider {{ $labels.provider }} ({{ $value | humanizePercentage }})"

High Cost Alert:

- alert: BifrostHighCosts
  expr: sum by (provider) (increase(bifrost_cost_total[1d])) > 100 # $100/day threshold
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: 'Daily cost for provider {{ $labels.provider }} exceeds $100 ({{ $value | printf "%.2f" }})'

Cache Performance Alert:

- alert: BifrostLowCacheHitRate
  expr: sum by (provider) (rate(bifrost_cache_hits_total[15m])) / sum by (provider) (rate(bifrost_upstream_requests_total[15m])) < 0.1
  for: 5m
  labels:
    severity: info
  annotations:
    summary: "Cache hit rate for provider {{ $labels.provider }} below 10% ({{ $value | humanizePercentage }})"
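
The `for` clause in each rule delays firing until the expression has stayed true for the whole duration, which filters out short spikes. The sketch below models that behavior as a small state machine; the 60-second evaluation interval and the sequence of results are assumptions chosen for illustration.

```python
def alert_states(condition_results, eval_interval_s, for_s):
    """Return 'inactive', 'pending', or 'firing' for each rule evaluation."""
    states = []
    active_since = None  # time at which the condition first became true
    for index, is_true in enumerate(condition_results):
        now = index * eval_interval_s
        if not is_true:
            active_since = None  # any false evaluation resets the timer
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        states.append("firing" if now - active_since >= for_s else "pending")
    return states


# Condition true from the second evaluation on, 60s interval, `for: 2m`.
print(alert_states([False, True, True, True, True, True], 60, 120))
```

A shorter `for` value notifies sooner but is more likely to fire on transient provider errors.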

Next Steps

Prometheus Documentation - Official Prometheus guides
Grafana Setup - Dashboard creation and management
Tracing - Request/response logging for detailed analysis
