This guide covers performance tuning for Bifrost when running in Docker containers. Proper tuning ensures Bifrost can fully utilize container resources and achieve optimal throughput.
These optimizations apply to Docker, Docker Compose, Kubernetes, and any container runtime using cgroups for resource management.

Quick Start

For most production deployments, add these settings to your container:
services:
  bifrost:
    image: maximhq/bifrost:latest
    environment:
      - GOGC=200
      - GOMEMLIMIT=3600MiB  # 90% of 4GB memory limit
    ulimits:
      nofile:
        soft: 65536
        hard: 65536
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4G

Go Runtime Tuning

GOMAXPROCS (Automatic)

Bifrost automatically detects container CPU limits using automaxprocs. This sets GOMAXPROCS to match your container’s CPU quota from cgroups (v1 and v2). No configuration needed — this works automatically. You’ll see a log line at startup:
maxprocs: Updating GOMAXPROCS=4: determined from CPU quota
Without automaxprocs, Go would detect all host CPUs (e.g., 64 on an EC2 instance) even when the container is limited to 4 CPUs, causing excessive context switching and degraded performance.
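The quota that automaxprocs reads is just two numbers from the cgroup filesystem. As a sketch of the arithmetic, using a hypothetical cpu.max value for a 4-CPU quota (on cgroup v2 the real value is in /sys/fs/cgroup/cpu.max; cgroup v1 exposes the same numbers as cpu.cfs_quota_us and cpu.cfs_period_us):

```shell
# Hypothetical cgroup v2 value for a container limited to 4 CPUs.
# Read the real value inside your container with: cat /sys/fs/cgroup/cpu.max
cpu_max="400000 100000"
quota=${cpu_max% *}    # CPU time allowed per period, in microseconds (400000)
period=${cpu_max#* }   # length of each period, in microseconds (100000)
echo "GOMAXPROCS=$((quota / period))"   # → GOMAXPROCS=4
```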

GOGC (Garbage Collection)

GOGC controls garbage collection frequency. The default is 100 (GC triggers when heap grows 100% since last collection).
| Scenario | Recommended GOGC | Trade-off |
| --- | --- | --- |
| Memory constrained | 50-100 | More frequent GC, lower memory |
| High throughput, memory available | 200-400 | Less GC overhead, higher memory |
| Latency sensitive | 50-100 | More predictable latency |
environment:
  - GOGC=200
For high-throughput API gateways, GOGC=200 or GOGC=400 typically provides the best balance of throughput and memory usage.
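To see how a given GOGC value behaves under real traffic, you can enable the Go runtime's GC trace (GODEBUG=gctrace=1 is a standard Go runtime option that prints one summary line per collection to stderr). A sketch with plain docker run; flags other than the environment variables are illustrative:

```shell
# Run Bifrost with GC tracing enabled to observe collection frequency
# and pause times under load; each GC prints a summary line to stderr.
docker run --rm \
  -e GOGC=200 \
  -e GODEBUG=gctrace=1 \
  -p 8080:8080 \
  maximhq/bifrost:latest
```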

GOMEMLIMIT (Memory Limit)

GOMEMLIMIT sets a soft memory limit for the Go runtime. When approaching this limit, Go becomes more aggressive about garbage collection. Best practice: Set to ~90% of your container’s memory limit to leave headroom for non-heap memory (goroutine stacks, CGO, etc.).
| Container Memory | Recommended GOMEMLIMIT |
| --- | --- |
| 512 MB | 450MiB |
| 1 GB | 900MiB |
| 2 GB | 1800MiB |
| 4 GB | 3600MiB |
| 8 GB | 7200MiB |
environment:
  - GOMEMLIMIT=3600MiB
When using both GOGC and GOMEMLIMIT, Go GCs based on whichever trigger fires first. For high-throughput workloads, set GOGC=200 or higher and let GOMEMLIMIT be the primary constraint.
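The 90% rule is simple arithmetic (the table above treats 1 GB as 1000 MiB); for example, for a 4 GB container:

```shell
# 90% of a 4 GB (≈4000 MiB) container limit, leaving ~10% headroom
# for goroutine stacks, CGO allocations, and other non-heap memory.
container_mem_mib=4000
gomemlimit_mib=$((container_mem_mib * 90 / 100))
echo "GOMEMLIMIT=${gomemlimit_mib}MiB"   # → GOMEMLIMIT=3600MiB
```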

System Limits

File Descriptor Limits (ulimits)

Each HTTP connection requires a file descriptor. The default container limit (often 1024) is too low for high-concurrency workloads.
ulimits:
  nofile:
    soft: 65536
    hard: 65536
| Expected Concurrent Connections | Recommended nofile |
| --- | --- |
| < 1000 | 4096 |
| 1000-5000 | 16384 |
| 5000-10000 | 32768 |
| > 10000 | 65536+ |
If you see errors like too many open files or connections being refused under load, increase your nofile limit.
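The ulimits key shown above is Compose-specific. When starting the container with plain docker run, the equivalent is the --ulimit flag (the other flags here are illustrative; adjust the name, ports, and tag to your setup):

```shell
# Raise the file-descriptor limit for a container started without Compose.
docker run -d --name bifrost \
  --ulimit nofile=65536:65536 \
  -p 8080:8080 \
  maximhq/bifrost:latest
```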

Resource Limits

Set CPU and memory limits to match your expected workload:
deploy:
  resources:
    limits:
      cpus: '4'
      memory: 4G
    reservations:
      cpus: '2'
      memory: 2G
Sizing guidance:
| Expected RPS | Recommended CPUs | Recommended Memory |
| --- | --- | --- |
| 100-500 | 1-2 | 512MB-1GB |
| 500-2000 | 2-4 | 1-2GB |
| 2000-5000 | 4-8 | 2-4GB |
| 5000+ | 8+ | 4GB+ |

Docker Compose Examples

Development

services:
  bifrost:
    image: maximhq/bifrost:latest
    ports:
      - "8080:8080"
    volumes:
      - ./data:/app/data
    environment:
      - LOG_LEVEL=debug

Production (Single Node)

services:
  bifrost:
    image: maximhq/bifrost:latest
    ports:
      - "8080:8080"
    volumes:
      - bifrost-data:/app/data
    environment:
      - LOG_LEVEL=info
      - LOG_STYLE=json
      - GOGC=200
      - GOMEMLIMIT=3600MiB
    ulimits:
      nofile:
        soft: 65536
        hard: 65536
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4G
        reservations:
          cpus: '2'
          memory: 2G
    healthcheck:
      test: ["CMD", "wget", "--no-verbose", "--tries=1", "-O", "/dev/null", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
    restart: unless-stopped

volumes:
  bifrost-data:

Production (Multi-Node with PostgreSQL)

services:
  bifrost-1:
    image: maximhq/bifrost:latest
    ports:
      - "8081:8080"
    environment:
      - LOG_LEVEL=info
      - GOGC=200
      - GOMEMLIMIT=1800MiB
      - BIFROST_DB_TYPE=postgres
      - BIFROST_DB_DSN=postgres://user:pass@postgres:5432/bifrost?sslmode=disable
    ulimits:
      nofile:
        soft: 65536
        hard: 65536
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 2G
    depends_on:
      - postgres

  bifrost-2:
    image: maximhq/bifrost:latest
    ports:
      - "8082:8080"
    environment:
      - LOG_LEVEL=info
      - GOGC=200
      - GOMEMLIMIT=1800MiB
      - BIFROST_DB_TYPE=postgres
      - BIFROST_DB_DSN=postgres://user:pass@postgres:5432/bifrost?sslmode=disable
    ulimits:
      nofile:
        soft: 65536
        hard: 65536
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 2G
    depends_on:
      - postgres

  postgres:
    image: postgres:16-alpine
    environment:
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=pass
      - POSTGRES_DB=bifrost
    volumes:
      - postgres-data:/var/lib/postgresql/data

volumes:
  postgres-data:

Kubernetes Configuration

Basic Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: bifrost
spec:
  replicas: 3
  selector:
    matchLabels:
      app: bifrost
  template:
    metadata:
      labels:
        app: bifrost
    spec:
      containers:
        - name: bifrost
          image: maximhq/bifrost:latest
          ports:
            - containerPort: 8080
          env:
            - name: GOGC
              value: "200"
            - name: GOMEMLIMIT
              value: "3600MiB"
          resources:
            limits:
              cpu: "4"
              memory: "4Gi"
            requests:
              cpu: "2"
              memory: "2Gi"
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5

File Descriptor Limits in Kubernetes

File descriptor limits in Kubernetes are typically set at the node level. Options include:
  1. Node-level configuration (recommended): Set fs.file-max and ulimits in your node configuration
  2. Init container: Use an init container with elevated privileges to set limits
  3. Security context: Some clusters allow adding capabilities via the container's security context, for example:
securityContext:
  capabilities:
    add: ["SYS_RESOURCE"]
Check your current limits inside a container with: cat /proc/sys/fs/file-max and ulimit -n
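To confirm what a running pod actually gets, you can run the same checks through kubectl (assuming the Deployment above, named bifrost):

```shell
# Per-process file-descriptor limit inside the pod's container
kubectl exec deploy/bifrost -- sh -c 'ulimit -n'
# Node-wide maximum number of open files
kubectl exec deploy/bifrost -- cat /proc/sys/fs/file-max
```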

Bifrost Application Settings

Align Bifrost’s internal settings with your container resources:

Concurrency and Buffer Size

Configure per provider in config.json:
{
  "providers": {
    "openai": {
      "concurrency_and_buffer_size": {
        "concurrency": 1000,
        "buffer_size": 1500
      }
    }
  }
}
Formula:
  • concurrency = expected RPS per provider
  • buffer_size = 1.5 × concurrency
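Applied to the example above, the formula reproduces the configured values (using integer arithmetic for 1.5×):

```shell
# buffer_size = 1.5 × concurrency
concurrency=1000              # expected RPS for this provider
buffer_size=$((concurrency * 3 / 2))
echo "$buffer_size"           # → 1500
```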

Initial Pool Size

Configure globally in config.json:
{
  "client": {
    "initial_pool_size": 3000
  }
}
Formula: initial_pool_size = 1.5 × total expected RPS across all providers
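For instance, the example value of 3000 above corresponds to 2000 RPS total (the per-provider split is a hypothetical illustration):

```shell
# initial_pool_size = 1.5 × total expected RPS across all providers
total_rps=2000                        # e.g. 1000 RPS each for two providers
initial_pool_size=$((total_rps * 3 / 2))
echo "$initial_pool_size"             # → 3000
```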
See the Performance Tuning guide for detailed sizing recommendations.

Tuning Checklist

1. Set container resource limits: define CPU and memory limits based on expected workload. Start with 2 CPUs / 2GB for moderate loads.
2. Configure GOMEMLIMIT: set to 90% of the container memory limit (e.g., 1800MiB for a 2GB container).
3. Tune GOGC: start with GOGC=200 for throughput; reduce to 100 if memory pressure is high.
4. Set file descriptor limits: set the nofile ulimit to at least 2× your expected concurrent connections.
5. Align Bifrost settings: match concurrency and buffer_size to your container's CPU count and expected RPS.
6. Monitor and adjust: watch memory usage, GC pause times, and request latencies, and adjust settings based on observed behavior.

Troubleshooting

High Memory Usage

  • Reduce GOGC (e.g., from 200 to 100)
  • Ensure GOMEMLIMIT is set
  • Reduce buffer_size and initial_pool_size

High Latency Spikes

  • May indicate GC pauses; try reducing GOGC
  • Check if container is hitting CPU limits
  • Verify GOMAXPROCS matches container CPU quota (check startup logs)

Connection Errors Under Load

  • Increase nofile ulimit
  • Ensure buffer_size is large enough for traffic spikes
  • Check provider rate limits

Container OOM Killed

  • Reduce GOMEMLIMIT to 85% of container memory
  • Reduce GOGC to trigger more frequent GC
  • Reduce buffer_size and initial_pool_size