Overview

Adaptive Load Balancing in Bifrost distributes traffic across provider keys and models according to real-time performance metrics. The system continuously monitors error rates, latency, and throughput, and adjusts each key's weight so that well-performing keys receive more traffic and degraded keys receive less.

Key Features

| Feature | Description |
| --- | --- |
| Dynamic Weight Adjustment | Automatically adjusts key weights based on performance metrics |
| Real-time Performance Monitoring | Tracks error rates, latency, and success rates per model-key combination |
| Cross-Node Synchronization | Gossip protocol ensures consistent weight information across all cluster nodes |
| Predictive Scaling | Anticipates traffic patterns and adjusts weights proactively |
| Circuit Breaker Integration | Temporarily removes poorly performing keys from rotation |
| Model-Level Optimization | Optimizes performance at both provider and individual model levels |

How Adaptive Load Balancing Works

Performance Metrics Collection

The system continuously collects performance data for each model-key combination:
{
  "provider": "openai",
  "model_key_id": "gpt-4-key-1",
  "metrics": {
    "avg_latency_ms": 1200,
    "error_rate": 0.01,
    "success_rate": 0.99,
    "requests_per_minute": 362,
    "tokens_processed": 87500,
    "current_weight": 0.8,
    "baseline_latency_ms": 980,
    "performance_score": 0.85
  }
}
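
As a rough illustration of how such a record could be derived, the sketch below aggregates per-request samples into the latency and rate fields shown above. The `Sample` type and the `summarize` function are hypothetical names used for illustration only; they are not part of Bifrost's API, and the real collector may compute these values differently.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One observed request (hypothetical shape, for illustration)."""
    latency_ms: float
    ok: bool


def summarize(samples: list[Sample]) -> dict:
    """Aggregate samples into avg_latency_ms, error_rate, and success_rate."""
    total = len(samples)
    if total == 0:
        # No traffic in the window: report neutral values.
        return {"avg_latency_ms": 0.0, "error_rate": 0.0, "success_rate": 1.0}
    errors = sum(1 for s in samples if not s.ok)
    error_rate = errors / total
    return {
        "avg_latency_ms": round(sum(s.latency_ms for s in samples) / total, 1),
        "error_rate": round(error_rate, 4),
        "success_rate": round(1 - error_rate, 4),
    }
```

With 99 successful requests and 1 failure, `summarize` reports an `error_rate` of 0.01 and a `success_rate` of 0.99, matching the relationship between those two fields in the sample record.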

Weight Adjustment Algorithm

The adaptive load balancer automatically adjusts weights based on real-time performance metrics:
  • High Error Rates: Reduces weight for keys with elevated error rates
  • Latency Spikes: Decreases weight when response times exceed baseline thresholds
  • Superior Performance: Increases weight for consistently high-performing keys
  • Gradual Adjustments: Makes incremental changes to prevent traffic oscillation
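
The four rules above can be sketched as a single function. This is a minimal illustration under stated assumptions, not Bifrost's actual algorithm: the function name `adjust_weight` is hypothetical, the default values are taken from the thresholds and configuration examples in this document, `max_change` is assumed to be an absolute cap on the change per cycle, and "superior performance" is simplified to "no threshold exceeded".

```python
def adjust_weight(
    weight: float,
    error_rate: float,
    latency_ms: float,
    baseline_ms: float,
    *,
    error_threshold: float = 0.025,  # assumed: 2.5% error-rate threshold
    latency_ratio: float = 1.5,      # assumed: 150% of baseline latency
    error_penalty: float = 0.7,
    latency_penalty: float = 0.8,
    bonus: float = 1.1,
    max_change: float = 0.3,         # assumed: absolute cap per cycle
    min_weight: float = 0.1,
    max_weight: float = 2.0,
) -> float:
    """Return the next weight for a key after one evaluation cycle."""
    factor = 1.0
    if error_rate > error_threshold:
        factor *= error_penalty      # high error rate: reduce weight
    if latency_ms > latency_ratio * baseline_ms:
        factor *= latency_penalty    # latency spike: reduce weight
    if factor == 1.0:
        factor = bonus               # simplification: healthy keys get a bonus
    target = weight * factor
    # Gradual adjustment: never move more than max_change in one cycle.
    step = max(-max_change, min(max_change, target - weight))
    return round(min(max_weight, max(min_weight, weight + step)), 3)
```

Capping the step size is what keeps weights from oscillating: a key that trips both penalties at once still moves by at most `max_change` in a single cycle.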

Real-Time Weight Synchronization

In clustered deployments, weight adjustments are synchronized across all nodes using the gossip protocol.

Weight Update Message Format

{
  "version": 1,
  "type": "weight_update",
  "node_id": "bifrost-node-b",
  "timestamp": "2024-01-15T10:30:15Z",
  "data": {
    "provider": "openai",
    "model_key_id": "gpt-4-key-2",
    "weight_change": {
      "from": 0.8,
      "to": 0.6,
      "reason": "high_error_rate",
      "threshold_exceeded": 0.025,
      "adjustment_factor": 0.75
    },
    "performance_metrics": {
      "avg_latency_ms": 1450,
      "baseline_latency_ms": 1100,
      "error_rate": 0.03,
      "success_rate": 0.97,
      "requests_count": 150,
      "performance_score": 0.72
    },
    "next_evaluation": "2024-01-15T10:31:15Z"
  }
}
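
To show how a receiving node might consume such a message, here is a minimal sketch that applies an update only if it is newer than what the node already holds. The last-writer-wins rule, the in-memory `state` dictionary, and the function names are assumptions made for illustration; the document does not specify how Bifrost resolves conflicting updates.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as 2024-01-15T10:30:15Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def apply_update(state: dict, message: dict) -> bool:
    """Apply a weight_update message to local state; return True if applied."""
    if message.get("type") != "weight_update":
        return False
    data = message["data"]
    key = (data["provider"], data["model_key_id"])
    ts = parse_ts(message["timestamp"])
    current = state.get(key)
    # Assumed conflict rule: ignore updates that are not strictly newer.
    if current is not None and current["updated_at"] >= ts:
        return False
    state[key] = {"weight": data["weight_change"]["to"], "updated_at": ts}
    return True
```

Because gossip can deliver the same message more than once, the timestamp check also makes the handler idempotent: a repeated delivery returns `False` and leaves the state untouched.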

Performance Monitoring & Alerting

Key Performance Indicators

The system tracks these critical metrics for each model-key combination:
| Metric | Threshold | Action |
| --- | --- | --- |
| Error Rate | > 2.5% | Reduce weight by 30% |
| Latency Spike | > 150% baseline | Reduce weight by 20% |
| Success Rate | < 95% | Circuit breaker activation |
| Response Time | > 5000ms | Temporary removal from pool |
| Throughput Drop | < 50% expected | Weight adjustment |
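
The thresholds in this table can be read as a set of independent checks. The sketch below encodes them directly; the function name, parameter names, and action labels are hypothetical, and the real system may combine or prioritize these checks differently.

```python
def evaluate(
    error_rate: float,
    success_rate: float,
    avg_latency_ms: float,
    baseline_latency_ms: float,
    response_time_ms: float,
    throughput: float,
    expected_throughput: float,
) -> list[str]:
    """Return the actions triggered by the KPI thresholds listed above."""
    actions = []
    if error_rate > 0.025:
        actions.append("reduce_weight_30pct")
    if avg_latency_ms > 1.5 * baseline_latency_ms:
        actions.append("reduce_weight_20pct")
    if success_rate < 0.95:
        actions.append("open_circuit_breaker")
    if response_time_ms > 5000:
        actions.append("remove_from_pool")
    if throughput < 0.5 * expected_throughput:
        actions.append("adjust_weight")
    return actions
```

For example, a key with a 3% error rate and a latency of 1450 ms against an 1100 ms baseline triggers only the error-rate action, because 1450 ms is still below 150% of the baseline (1650 ms).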

Automatic Performance Alerts

{
  "version": 1,
  "type": "performance_alert",
  "node_id": "bifrost-node-c",
  "timestamp": "2024-01-15T10:31:00Z",
  "data": {
    "alert_type": "latency_spike",
    "severity": "warning",
    "provider": "anthropic",
    "model_key_id": "claude-3-key-1",
    "current_metrics": {
      "avg_latency_ms": 2800,
      "baseline_latency_ms": 980,
      "spike_percentage": 185.7,
      "error_rate": 0.008,
      "current_weight": 1.0
    },
    "recommended_action": "reduce_weight",
    "suggested_new_weight": 0.7,
    "auto_applied": true
  }
}
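
The `spike_percentage` in this alert is consistent with the increase of the current latency over the baseline, expressed as a percentage. The short sketch below reproduces that arithmetic; the function names are illustrative, and the 1.5 default assumes the 150% baseline threshold described earlier.

```python
def spike_percentage(avg_latency_ms: float, baseline_latency_ms: float) -> float:
    """Percentage increase of the current latency over the baseline."""
    increase = avg_latency_ms - baseline_latency_ms
    return round(increase / baseline_latency_ms * 100, 1)


def exceeds_spike_threshold(
    avg_latency_ms: float, baseline_latency_ms: float, threshold: float = 1.5
) -> bool:
    """True when latency is above threshold x baseline (assumed 150%)."""
    return avg_latency_ms > threshold * baseline_latency_ms


print(spike_percentage(2800, 980))         # 185.7
print(exceeds_spike_threshold(2800, 980))  # True
```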

Configuration

Basic Adaptive Load Balancing Setup

{
  "adaptive_load_balancing": {
    "enabled": true,
    "algorithm": "adaptive_weighted",
    "evaluation_interval": "30s",
    "weight_adjustment": {
      "enabled": true,
      "max_change_per_cycle": 0.3,
      "min_weight": 0.1,
      "max_weight": 2.0
    },
    "performance_thresholds": {
      "error_rate_warning": 0.02,
      "error_rate_critical": 0.05,
      "latency_spike_threshold": 1.5,
      "circuit_breaker_threshold": 0.95
    }
  }
}
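
Before deploying a configuration like this, it can help to sanity-check the values against each other. The following sketch is one way to do that; the checks are assumptions about what a sensible configuration looks like, not rules enforced by Bifrost, and the duration format (`30s`, `5m`) is inferred from the examples in this document.

```python
import re

_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(text: str) -> int:
    """Convert a duration such as '30s' or '5m' to seconds (assumed format)."""
    match = re.fullmatch(r"(\d+)([smhd])", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _SECONDS[match.group(2)]


def validate(config: dict) -> list[str]:
    """Return a list of problems found in an adaptive_load_balancing config."""
    alb = config["adaptive_load_balancing"]
    adj = alb["weight_adjustment"]
    thr = alb["performance_thresholds"]
    problems = []
    if parse_duration(alb["evaluation_interval"]) <= 0:
        problems.append("evaluation_interval must be positive")
    if not 0 < adj["min_weight"] < adj["max_weight"]:
        problems.append("min_weight must be positive and below max_weight")
    if adj["max_change_per_cycle"] <= 0:
        problems.append("max_change_per_cycle must be positive")
    if not thr["error_rate_warning"] < thr["error_rate_critical"]:
        problems.append("error_rate_warning must be below error_rate_critical")
    if thr["latency_spike_threshold"] <= 1.0:
        problems.append("latency_spike_threshold must exceed 1.0 (the baseline)")
    if not 0 < thr["circuit_breaker_threshold"] <= 1:
        problems.append("circuit_breaker_threshold must be in (0, 1]")
    return problems
```

Running `validate` on the basic setup above returns an empty list.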

Advanced Configuration

{
  "adaptive_load_balancing": {
    "enabled": true,
    "algorithm": "adaptive_weighted",
    "evaluation_interval": "30s",
    "weight_adjustment": {
      "enabled": true,
      "strategy": "performance_based",
      "max_change_per_cycle": 0.3,
      "min_weight": 0.1,
      "max_weight": 2.0,
      "adjustment_factors": {
        "error_rate_penalty": 0.7,
        "latency_penalty": 0.8,
        "performance_bonus": 1.1
      }
    },
    "performance_thresholds": {
      "error_rate_warning": 0.02,
      "error_rate_critical": 0.05,
      "latency_spike_threshold": 1.5,
      "latency_critical_threshold": 2.0,
      "circuit_breaker_threshold": 0.95,
      "recovery_threshold": 0.98
    },
    "metrics_collection": {
      "window_size": "5m",
      "sample_rate": "1s",
      "baseline_calculation": "rolling_average_7d"
    },
    "predictive_scaling": {
      "enabled": true,
      "prediction_window": "15m",
      "confidence_threshold": 0.8,
      "proactive_adjustments": true
    }
  }
}
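
Having both a `circuit_breaker_threshold` and a higher `recovery_threshold` suggests hysteresis: a key is removed when its success rate drops below the first value and is only restored once it climbs back to the second. The sketch below illustrates that interpretation. It is an assumption about how the two settings interact, and the `CircuitBreaker` class is a hypothetical name, not Bifrost's implementation.

```python
class CircuitBreaker:
    """Open below one success rate, close only at a higher one."""

    def __init__(self, open_below: float = 0.95, close_at: float = 0.98) -> None:
        self.open_below = open_below  # circuit_breaker_threshold
        self.close_at = close_at      # recovery_threshold
        self.is_open = False

    def observe(self, success_rate: float) -> bool:
        """Update state from the latest success rate; return True if open."""
        if self.is_open:
            if success_rate >= self.close_at:
                self.is_open = False  # recovered: back into rotation
        elif success_rate < self.open_below:
            self.is_open = True       # degraded: out of rotation
        return self.is_open
```

The gap between the two thresholds prevents flapping: a key hovering around 95% success does not repeatedly leave and rejoin the pool.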

Provider-Specific Configuration

{
  "providers": [
    {
      "id": "openai",
      "keys": [
        {
          "key": "sk-...",
          "model_key_id": "gpt-4-key-1",
          "weight": 1.0,
          "adaptive_balancing": {
            "enabled": true,
            "baseline_latency_ms": 1100,
            "expected_error_rate": 0.01,
            "max_requests_per_minute": 500,
            "priority": "high"
          }
        },
        {
          "key": "sk-...",
          "model_key_id": "gpt-4-key-2",
          "weight": 0.8,
          "adaptive_balancing": {
            "enabled": true,
            "baseline_latency_ms": 1200,
            "expected_error_rate": 0.015,
            "max_requests_per_minute": 400,
            "priority": "medium"
          }
        }
      ]
    }
  ]
}
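
Given per-key weights like these, one common way to distribute requests is weighted random selection, where each key is chosen with probability proportional to its weight. The sketch below shows that technique using Python's standard library; the document does not state which selection method Bifrost uses, so treat `pick_key` as a hypothetical illustration.

```python
import random


def pick_key(keys: list[dict], rng=random) -> str:
    """Pick a model_key_id with probability proportional to its weight."""
    candidates = [k for k in keys if k["weight"] > 0]
    if not candidates:
        raise ValueError("no keys with positive weight")
    weights = [k["weight"] for k in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]["model_key_id"]


keys = [
    {"model_key_id": "gpt-4-key-1", "weight": 1.0},
    {"model_key_id": "gpt-4-key-2", "weight": 0.8},
]
print(pick_key(keys))  # either key; gpt-4-key-1 about 56% of the time
```

Passing a seeded `random.Random` instance as `rng` makes the selection reproducible, which is useful in tests.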

Traffic Distribution Examples

Before Adaptive Load Balancing

{
  "provider": "openai",
  "traffic_distribution": {
    "gpt-4-key-1": {
      "weight": 1.0,
      "traffic_percentage": 50.0,
      "avg_latency_ms": 1450,
      "error_rate": 0.03,
      "status": "degraded_performance"
    },
    "gpt-4-key-2": {
      "weight": 1.0,
      "traffic_percentage": 50.0,
      "avg_latency_ms": 1100,
      "error_rate": 0.01,
      "status": "healthy"
    }
  }
}

After Adaptive Load Balancing

{
  "provider": "openai",
  "traffic_distribution": {
    "gpt-4-key-1": {
      "weight": 0.6,
      "traffic_percentage": 35.3,
      "avg_latency_ms": 1450,
      "error_rate": 0.03,
      "status": "weight_reduced",
      "adjustment_reason": "high_error_rate_and_latency"
    },
    "gpt-4-key-2": {
      "weight": 1.1,
      "traffic_percentage": 64.7,
      "avg_latency_ms": 1100,
      "error_rate": 0.01,
      "status": "weight_increased",
      "adjustment_reason": "superior_performance"
    }
  },
  "overall_improvement": {
    "avg_latency_reduction": "12.3%",
    "error_rate_reduction": "23.1%",
    "throughput_increase": "8.7%"
  }
}
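
The `traffic_percentage` values in both examples are consistent with each key receiving a share of traffic proportional to its weight. The sketch below reproduces that arithmetic; `traffic_shares` is an illustrative helper, not a Bifrost function.

```python
def traffic_shares(weights: dict[str, float]) -> dict[str, float]:
    """Return each key's share of traffic, in percent, proportional to weight."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("total weight must be positive")
    return {key: round(100 * w / total, 1) for key, w in weights.items()}


print(traffic_shares({"gpt-4-key-1": 1.0, "gpt-4-key-2": 1.0}))
# {'gpt-4-key-1': 50.0, 'gpt-4-key-2': 50.0}
print(traffic_shares({"gpt-4-key-1": 0.6, "gpt-4-key-2": 1.1}))
# {'gpt-4-key-1': 35.3, 'gpt-4-key-2': 64.7}
```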

Monitoring Dashboard

Real-Time Performance View

Monitor adaptive load balancing effectiveness through these key metrics:
{
  "adaptive_load_balancing_metrics": {
    "last_evaluation": "2024-01-15T10:30:00Z",
    "next_evaluation": "2024-01-15T10:30:30Z",
    "total_adjustments_last_hour": 12,
    "performance_improvements": {
      "latency_improvement": "15.2%",
      "error_rate_reduction": "28.4%",
      "throughput_increase": "11.8%"
    },
    "provider_performance": {
      "openai": {
        "total_keys": 3,
        "healthy_keys": 2,
        "degraded_keys": 1,
        "avg_weight": 0.87,
        "traffic_distribution": {
          "gpt-4-key-1": {
            "weight": 0.6,
            "traffic_percentage": 28.5,
            "performance_score": 0.72,
            "trend": "declining"
          },
          "gpt-4-key-2": {
            "weight": 1.1,
            "traffic_percentage": 52.3,
            "performance_score": 0.94,
            "trend": "stable"
          },
          "gpt-4-key-3": {
            "weight": 0.9,
            "traffic_percentage": 19.2,
            "performance_score": 0.87,
            "trend": "improving"
          }
        }
      },
      "anthropic": {
        "total_keys": 2,
        "healthy_keys": 2,
        "degraded_keys": 0,
        "avg_weight": 1.05,
        "traffic_distribution": {
          "claude-3-key-1": {
            "weight": 1.0,
            "traffic_percentage": 48.2,
            "performance_score": 0.91,
            "trend": "stable"
          },
          "claude-3-key-2": {
            "weight": 1.1,
            "traffic_percentage": 51.8,
            "performance_score": 0.95,
            "trend": "improving"
          }
        }
      }
    }
  }
}