This guide covers performance tuning for Bifrost when running in Docker containers. Proper tuning ensures Bifrost can fully utilize container resources and achieve optimal throughput.
These optimizations apply to Docker, Docker Compose, Kubernetes, and any container runtime using cgroups for resource management.

Quick Start

For most production deployments, add these settings to your container:
services:
  bifrost:
    image: maximhq/bifrost:latest
    environment:
      - GOGC=200
      - GOMEMLIMIT=3600MiB  # 90% of 4GB memory limit
    ulimits:
      nofile:
        soft: 65536
        hard: 65536
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4G

Go Runtime Tuning

GOMAXPROCS (Automatic)

Bifrost automatically detects container CPU limits using automaxprocs. This sets GOMAXPROCS to match your container’s CPU quota from cgroups (v1 and v2). No configuration needed — this works automatically. You’ll see a log line at startup:
maxprocs: Updating GOMAXPROCS=4: determined from CPU quota
Without automaxprocs, Go would detect all host CPUs (e.g., 64 on an EC2 instance) even when the container is limited to 4 CPUs, causing excessive context switching and degraded performance.
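The quota that automaxprocs reads is just two numbers from the cgroup filesystem. As a sketch of the arithmetic, using a hypothetical cpu.max value for a 4-CPU quota (on cgroup v2 the real value is in /sys/fs/cgroup/cpu.max; cgroup v1 exposes the same numbers as cpu.cfs_quota_us and cpu.cfs_period_us):

```shell
# Hypothetical cgroup v2 value for a container limited to 4 CPUs.
# Read the real value inside your container with: cat /sys/fs/cgroup/cpu.max
cpu_max="400000 100000"
quota=${cpu_max% *}    # CPU time allowed per period, in microseconds (400000)
period=${cpu_max#* }   # length of each period, in microseconds (100000)
echo "GOMAXPROCS=$((quota / period))"   # → GOMAXPROCS=4
```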

GOGC (Garbage Collection)

GOGC controls garbage collection frequency. The default is 100 (GC triggers when heap grows 100% since last collection).
| Scenario | Recommended GOGC | Trade-off |
| --- | --- | --- |
| Memory constrained | 50-100 | More frequent GC, lower memory |
| High throughput, memory available | 200-400 | Less GC overhead, higher memory |
| Latency sensitive | 50-100 | More predictable latency |
environment:
  - GOGC=200
For high-throughput API gateways, GOGC=200 or GOGC=400 typically provides the best balance of throughput and memory usage.
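To see how a given GOGC value behaves under real traffic, you can enable the Go runtime's GC trace (GODEBUG=gctrace=1 is a standard Go runtime option that prints one summary line per collection to stderr). A sketch with plain docker run; flags other than the environment variables are illustrative:

```shell
# Run Bifrost with GC tracing enabled to observe collection frequency
# and pause times under load; each GC prints a summary line to stderr.
docker run --rm \
  -e GOGC=200 \
  -e GODEBUG=gctrace=1 \
  -p 8080:8080 \
  maximhq/bifrost:latest
```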

GOMEMLIMIT (Memory Limit)

GOMEMLIMIT sets a soft memory limit for the Go runtime. When approaching this limit, Go becomes more aggressive about garbage collection. Best practice: Set to ~90% of your container’s memory limit to leave headroom for non-heap memory (goroutine stacks, CGO, etc.).
| Container Memory | Recommended GOMEMLIMIT |
| --- | --- |
| 512 MB | 450MiB |
| 1 GB | 900MiB |
| 2 GB | 1800MiB |
| 4 GB | 3600MiB |
| 8 GB | 7200MiB |
environment:
  - GOMEMLIMIT=3600MiB
When using both GOGC and GOMEMLIMIT, Go GCs based on whichever trigger fires first. For high-throughput workloads, set GOGC=200 or higher and let GOMEMLIMIT be the primary constraint.
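The 90% rule is simple arithmetic (the table above treats 1 GB as 1000 MiB); for example, for a 4 GB container:

```shell
# 90% of a 4 GB (≈4000 MiB) container limit, leaving ~10% headroom
# for goroutine stacks, CGO allocations, and other non-heap memory.
container_mem_mib=4000
gomemlimit_mib=$((container_mem_mib * 90 / 100))
echo "GOMEMLIMIT=${gomemlimit_mib}MiB"   # → GOMEMLIMIT=3600MiB
```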

System Limits

File Descriptor Limits (ulimits)

Each HTTP connection requires a file descriptor. The default container limit (often 1024) is too low for high-concurrency workloads.
ulimits:
  nofile:
    soft: 65536
    hard: 65536
| Expected Concurrent Connections | Recommended nofile |
| --- | --- |
| < 1000 | 4096 |
| 1000-5000 | 16384 |
| 5000-10000 | 32768 |
| > 10000 | 65536+ |
If you see errors like too many open files or connections being refused under load, increase your nofile limit.
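The ulimits key shown above is Compose-specific. When starting the container with plain docker run, the equivalent is the --ulimit flag (the other flags here are illustrative; adjust the name, ports, and tag to your setup):

```shell
# Raise the file-descriptor limit for a container started without Compose.
docker run -d --name bifrost \
  --ulimit nofile=65536:65536 \
  -p 8080:8080 \
  maximhq/bifrost:latest
```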

Resource Limits

Set CPU and memory limits to match your expected workload:
deploy:
  resources:
    limits:
      cpus: '4'
      memory: 4G
    reservations:
      cpus: '2'
      memory: 2G
Sizing guidance:
| Expected RPS | Recommended CPUs | Recommended Memory |
| --- | --- | --- |
| 100-500 | 1-2 | 512MB-1GB |
| 500-2000 | 2-4 | 1-2GB |
| 2000-5000 | 4-8 | 2-4GB |
| 5000+ | 8+ | 4GB+ |

Docker Compose Examples

Development

services:
  bifrost:
    image: maximhq/bifrost:latest
    ports:
      - "8080:8080"
    volumes:
      - ./data:/app/data
    environment:
      - LOG_LEVEL=debug

Production (Single Node)

services:
  bifrost:
    image: maximhq/bifrost:latest
    ports:
      - "8080:8080"
    volumes:
      - bifrost-data:/app/data
    environment:
      - LOG_LEVEL=info
      - LOG_STYLE=json
      - GOGC=200
      - GOMEMLIMIT=3600MiB
    ulimits:
      nofile:
        soft: 65536
        hard: 65536
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4G
        reservations:
          cpus: '2'
          memory: 2G
    healthcheck:
      test: ["CMD", "wget", "--no-verbose", "--tries=1", "-O", "/dev/null", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
    restart: unless-stopped

volumes:
  bifrost-data:

Production (Multi-Node with PostgreSQL)

services:
  bifrost-1:
    image: maximhq/bifrost:latest
    ports:
      - "8081:8080"
    environment:
      - LOG_LEVEL=info
      - GOGC=200
      - GOMEMLIMIT=1800MiB
      - BIFROST_DB_TYPE=postgres
      - BIFROST_DB_DSN=postgres://user:pass@postgres:5432/bifrost?sslmode=disable
    ulimits:
      nofile:
        soft: 65536
        hard: 65536
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 2G
    depends_on:
      - postgres

  bifrost-2:
    image: maximhq/bifrost:latest
    ports:
      - "8082:8080"
    environment:
      - LOG_LEVEL=info
      - GOGC=200
      - GOMEMLIMIT=1800MiB
      - BIFROST_DB_TYPE=postgres
      - BIFROST_DB_DSN=postgres://user:pass@postgres:5432/bifrost?sslmode=disable
    ulimits:
      nofile:
        soft: 65536
        hard: 65536
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 2G
    depends_on:
      - postgres

  postgres:
    image: postgres:16-alpine
    environment:
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=pass
      - POSTGRES_DB=bifrost
    volumes:
      - postgres-data:/var/lib/postgresql/data

volumes:
  postgres-data:

Kubernetes Configuration

Basic Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: bifrost
spec:
  replicas: 3
  selector:
    matchLabels:
      app: bifrost
  template:
    metadata:
      labels:
        app: bifrost
    spec:
      containers:
        - name: bifrost
          image: maximhq/bifrost:latest
          ports:
            - containerPort: 8080
          env:
            - name: GOGC
              value: "200"
            - name: GOMEMLIMIT
              value: "3600MiB"
          resources:
            limits:
              cpu: "4"
              memory: "4Gi"
            requests:
              cpu: "2"
              memory: "2Gi"
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5

File Descriptor Limits in Kubernetes

File descriptor limits in Kubernetes are typically set at the node level. Options include:
  1. Node-level configuration (recommended): Set fs.file-max and ulimits in your node configuration
  2. Init container: Use an init container with elevated privileges to set limits
  3. Security context: Some clusters allow adding capabilities via the container's security context, for example:
securityContext:
  capabilities:
    add: ["SYS_RESOURCE"]
Check your current limits inside a container with: cat /proc/sys/fs/file-max and ulimit -n
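To confirm what a running pod actually gets, you can run the same checks through kubectl (assuming the Deployment above, named bifrost):

```shell
# Per-process file-descriptor limit inside the pod's container
kubectl exec deploy/bifrost -- sh -c 'ulimit -n'
# Node-wide maximum number of open files
kubectl exec deploy/bifrost -- cat /proc/sys/fs/file-max
```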

Bifrost Application Settings

Align Bifrost’s internal settings with your container resources:

Concurrency and Buffer Size

Configure per provider in config.json:
{
  "providers": {
    "openai": {
      "concurrency_and_buffer_size": {
        "concurrency": 1000,
        "buffer_size": 1500
      }
    }
  }
}
Formula:
  • concurrency = expected RPS per provider
  • buffer_size = 1.5 × concurrency
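Applied to the example above, the formula reproduces the configured values (using integer arithmetic for 1.5×):

```shell
# buffer_size = 1.5 × concurrency
concurrency=1000              # expected RPS for this provider
buffer_size=$((concurrency * 3 / 2))
echo "$buffer_size"           # → 1500
```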

Initial Pool Size

Configure globally in config.json:
{
  "client": {
    "initial_pool_size": 3000
  }
}
Formula: initial_pool_size = 1.5 × total expected RPS across all providers
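For instance, the example value of 3000 above corresponds to 2000 RPS total (the per-provider split is a hypothetical illustration):

```shell
# initial_pool_size = 1.5 × total expected RPS across all providers
total_rps=2000                        # e.g. 1000 RPS each for two providers
initial_pool_size=$((total_rps * 3 / 2))
echo "$initial_pool_size"             # → 3000
```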
See the Performance Tuning guide for detailed sizing recommendations.

Tuning Checklist

1. Set container resource limits: define CPU and memory limits based on expected workload. Start with 2 CPUs / 2GB for moderate loads.
2. Configure GOMEMLIMIT: set to 90% of the container memory limit (e.g., 1800MiB for a 2GB container).
3. Tune GOGC: start with GOGC=200 for throughput; reduce to 100 if memory pressure is high.
4. Set file descriptor limits: set the nofile ulimit to at least 2× your expected concurrent connections.
5. Align Bifrost settings: match concurrency and buffer_size to your container's CPU count and expected RPS.
6. Monitor and adjust: watch memory usage, GC pause times, and request latencies, and adjust settings based on observed behavior.

Troubleshooting

High Memory Usage

  • Reduce GOGC (e.g., from 200 to 100)
  • Ensure GOMEMLIMIT is set
  • Reduce buffer_size and initial_pool_size

High Latency Spikes

  • May indicate GC pauses; try reducing GOGC
  • Check if container is hitting CPU limits
  • Verify GOMAXPROCS matches container CPU quota (check startup logs)

Connection Errors Under Load

  • Increase nofile ulimit
  • Ensure buffer_size is large enough for traffic spikes
  • Check provider rate limits

Container OOM Killed

  • Reduce GOMEMLIMIT to 85% of container memory
  • Reduce GOGC to trigger more frequent GC
  • Reduce buffer_size and initial_pool_size