This page covers the most common problems encountered when deploying Bifrost with Helm, along with diagnostic commands and fixes.

Pod Not Starting

Quick diagnostics

# Show pod status
kubectl get pods -l app.kubernetes.io/name=bifrost

# Show pod events (most useful first step)
kubectl describe pod -l app.kubernetes.io/name=bifrost

# Show pod logs (use --previous if the pod has already crashed)
kubectl logs -l app.kubernetes.io/name=bifrost
kubectl logs -l app.kubernetes.io/name=bifrost --previous

Image pull errors (ErrImagePull / ImagePullBackOff)

# Check which image is being pulled
kubectl describe pod -l app.kubernetes.io/name=bifrost | grep "Image:"

# Verify imagePullSecrets are attached
kubectl get pod -l app.kubernetes.io/name=bifrost -o jsonpath='{.items[0].spec.imagePullSecrets}'

# Test secret manually
kubectl get secret <pull-secret-name> -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d | jq .
Common causes:
  • image.tag not set - the chart requires it; the pod will not start without it
  • Pull secret missing or expired (ECR tokens expire after 12 hours)
  • Incorrect image.repository for enterprise registry
# Fix: set the correct tag
helm upgrade bifrost bifrost/bifrost --reuse-values --set image.tag=v1.4.11
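
Because ECR tokens expire after 12 hours, a pull secret created once will eventually stop working. The sketch below shows one way to recreate it, assuming the AWS CLI is installed and authenticated. The function name and its three arguments (secret name, registry host, region) are hypothetical and not part of the chart.

```shell
# Hypothetical helper: recreate an ECR pull secret with a fresh token.
# Arguments: <secret-name> <registry-host> <aws-region>
refresh_ecr_pull_secret() {
  secret="$1"
  registry="$2"
  region="$3"
  # Render the secret client-side, then apply it so an existing secret is updated.
  kubectl create secret docker-registry "$secret" \
    --docker-server="$registry" \
    --docker-username=AWS \
    --docker-password="$(aws ecr get-login-password --region "$region")" \
    --dry-run=client -o yaml | kubectl apply -f -
}
```

For example, `refresh_ecr_pull_secret <pull-secret-name> <account-id>.dkr.ecr.<region>.amazonaws.com <region>` could be run from a scheduled job more often than every 12 hours.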

PVC not binding (Pending)

# Check PVC status
kubectl get pvc -l app.kubernetes.io/instance=bifrost

# Show binding events
kubectl describe pvc -l app.kubernetes.io/instance=bifrost
Common causes:
  • No Persistent Volume provisioner in the cluster
  • storageClass set to a class that doesn’t exist
  • ReadWriteOnce access mode with multiple replicas (SQLite PVCs are single-node)
# List available storage classes
kubectl get storageclass

# Fix: pin to a valid storage class
helm upgrade bifrost bifrost/bifrost \
  --reuse-values \
  --set storage.persistence.storageClass=standard

ConfigMap / Secret errors

# View the generated ConfigMap (contains rendered config.json)
kubectl get configmap bifrost-config -o yaml

# View secrets the pod depends on
kubectl get secret -l app.kubernetes.io/instance=bifrost

# Decode a specific secret value (replace "key" with the key name used in your secret; the chart default is "encryption-key")
kubectl get secret bifrost-encryption -o jsonpath='{.data.key}' | base64 -d

CrashLoopBackOff

# Get last log lines before the crash
kubectl logs -l app.kubernetes.io/name=bifrost --previous --tail=50

# Common causes shown in logs:
# "encryption key is not initialized" → no key provided; optional, but data will be stored in plaintext
# "failed to connect to database" → see Database section below
# "image.tag is required" → set image.tag in values
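
If the logs show the encryption key message and you do not want data stored in plaintext, a key has to be supplied through a secret. The sketch below generates a random value; the function name is hypothetical, and whether Bifrost accepts a 32-byte base64 string in exactly this form is an assumption to verify against the chart documentation.

```shell
# Hypothetical helper: print a random 32-byte key, base64-encoded (44 characters).
# The key format Bifrost expects is an assumption here; check the chart docs.
generate_key() {
  head -c 32 /dev/urandom | base64 | tr -d '\n'
}

# Usage (not run here): store the key under the default key name "encryption-key".
# kubectl create secret generic bifrost-encryption \
#   --from-literal=encryption-key="$(generate_key)"
```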

Database Connection Issues

Embedded PostgreSQL

# Check if the PostgreSQL pod is running
kubectl get pods -l app.kubernetes.io/name=bifrost-postgresql

# Connect directly to inspect the database
kubectl exec -it deployment/bifrost-postgresql -- psql -U bifrost -d bifrost

# Test connectivity from the Bifrost pod
kubectl exec -it deployment/bifrost -- nc -zv bifrost-postgresql 5432

# Check PostgreSQL logs
kubectl logs deployment/bifrost-postgresql --tail=50

External PostgreSQL

# Test connectivity from within the cluster
kubectl run pg-test --image=postgres:16-alpine --rm -it --restart=Never -- \
  psql "host=your-db-host dbname=bifrost user=bifrost sslmode=require"

# Verify the secret value is correct
kubectl get secret postgres-credentials -o jsonpath='{.data.password}' | base64 -d

# Check that the external host/port is reachable
kubectl exec -it deployment/bifrost -- nc -zv your-db-host 5432
Common causes:
  • sslMode: disable when the database requires SSL - set sslMode: require
  • Password in secret doesn’t match the database user
  • Network policy blocking pod → database traffic
  • Database not UTF8 encoded (see PostgreSQL UTF8 Requirement)
# Fix: update the secret and restart
kubectl create secret generic postgres-credentials \
  --from-literal=password='correct-password' \
  --dry-run=client -o yaml | kubectl apply -f -

kubectl rollout restart deployment/bifrost

Ingress Not Working

# Check ingress resource status
kubectl describe ingress bifrost

# Check if the ingress controller is running
kubectl get pods -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx

# View ingress controller logs for routing errors
kubectl logs -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx --tail=50

# Verify DNS resolves to the correct load balancer address (some providers, such as AWS, report .hostname instead of .ip)
nslookup bifrost.yourdomain.com
kubectl get ingress bifrost -o jsonpath='{.status.loadBalancer.ingress[0].ip}'

# Test without TLS first
curl -v http://bifrost.yourdomain.com/health
Common causes:
  • ingress.className not set or set to a class not installed in the cluster
  • TLS certificate not issued yet (cert-manager typically needs about a minute, and longer with DNS-01 challenges)
  • Service port mismatch - Bifrost listens on 8080 by default
# Check cert-manager certificate status
kubectl get certificate -l app.kubernetes.io/instance=bifrost
kubectl describe certificate bifrost-tls
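
Rather than polling the certificate by hand, `kubectl wait` can block until cert-manager marks it Ready. This is a minimal sketch: the function name and the 180s default timeout are illustrative assumptions, and the certificate name defaults to the `bifrost-tls` used above.

```shell
# Hypothetical helper: block until a cert-manager Certificate reports Ready.
# Arguments: [certificate-name] [timeout]
wait_for_cert() {
  cert="${1:-bifrost-tls}"
  timeout="${2:-180s}"
  kubectl wait --for=condition=Ready "certificate/${cert}" --timeout="$timeout"
}
```

If the wait times out, `kubectl describe certificate` usually shows which step is pending.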

Secret and Credential Issues

Provider API key not resolving

If Bifrost logs show env.OPENAI_API_KEY: not set or similar:
# Check the env var is present in the running pod
kubectl exec -it deployment/bifrost -- env | grep OPENAI

# Verify the providerSecrets secret exists with the right key
kubectl get secret provider-api-keys -o yaml

# Check the providerSecrets configuration rendered correctly
kubectl get configmap bifrost-config -o yaml | grep -A5 providers

Encryption key issues

# Verify the secret exists and contains the right key name
kubectl get secret bifrost-encryption -o yaml

# Check the exact key name matches encryptionKeySecret.key in values
# Default key name is "encryption-key" - if you used "key", set:
#   bifrost.encryptionKeySecret.key: "key"

High Memory Usage

# Check current resource usage
kubectl top pods -l app.kubernetes.io/name=bifrost

# Check if OOM kills are happening
kubectl describe pod -l app.kubernetes.io/name=bifrost | grep -A3 "OOMKilled\|Limits"

# View resource requests/limits on running pods
kubectl get pod -l app.kubernetes.io/name=bifrost \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].resources}{"\n"}{end}'
Increase resource limits:
helm upgrade bifrost bifrost/bifrost \
  --reuse-values \
  --set resources.limits.memory=4Gi \
  --set resources.requests.memory=1Gi
Tune Go runtime (see Docker Tuning):
env:
  - name: GOGC
    value: "200"          # run GC less often
  - name: GOMEMLIMIT
    value: "3500MiB"      # soft memory limit for the Go runtime, set below the container limit
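
The 3500MiB value is roughly 85% of the 4Gi (4096MiB) container limit. If you change the limit, a similar ratio can be derived with a small helper. The function name and the 85% ratio are assumptions inferred from this example, not chart settings.

```shell
# Hypothetical helper: derive a GOMEMLIMIT value from a container memory limit.
# The 85% ratio is an assumption inferred from the 3500MiB / 4Gi example.
gomemlimit_for() {
  limit_mib="$1"                        # container memory limit in MiB
  echo "$(( limit_mib * 85 / 100 ))MiB" # integer arithmetic, rounds down
}

gomemlimit_for 4096   # prints 3481MiB for a 4Gi limit
```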

High CPU Usage / Latency

# Check CPU usage
kubectl top pods -l app.kubernetes.io/name=bifrost

# Check if HPA is scaling correctly
kubectl get hpa bifrost
kubectl describe hpa bifrost
Common causes:
  • initialPoolSize too small - goroutines queuing up; increase to 500-1000
  • dropExcessRequests: false with a small pool - queue depth growing unboundedly
helm upgrade bifrost bifrost/bifrost \
  --reuse-values \
  --set bifrost.client.initialPoolSize=1000 \
  --set bifrost.client.dropExcessRequests=true

Autoscaling Issues

HPA not scaling

# Check HPA status and current metrics
kubectl describe hpa bifrost

# Verify metrics server is installed
kubectl top nodes
kubectl top pods

# Common fix: metrics server not installed
# Install with:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
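
A failing `kubectl top` is an indirect signal. The registration of the metrics API can also be checked directly, which distinguishes "not installed" from "installed but not yet available". A minimal sketch with a hypothetical function name:

```shell
# Hypothetical helper: succeed only if the metrics API is registered and Available.
metrics_api_available() {
  status="$(kubectl get apiservice v1beta1.metrics.k8s.io \
    -o jsonpath='{.status.conditions[?(@.type=="Available")].status}' 2>/dev/null)"
  [ "$status" = "True" ]
}
```

For example: `metrics_api_available && echo "metrics API ready" || echo "metrics API missing or unavailable"`.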

Pods scaling down too aggressively (drops active SSE streams)

The default scaleDown.stabilizationWindowSeconds: 300 and preStop sleep of 15 seconds should prevent this. If streams are still being cut, save overrides like the following as graceful-shutdown-values.yaml and apply them:
terminationGracePeriodSeconds: 120   # increase if streams run longer than 90s (120s minus the 30s preStop sleep below)

autoscaling:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 600  # wait 10 min before scaling down
      policies:
        - type: Pods
          value: 1
          periodSeconds: 300           # remove at most 1 pod per 5 min

lifecycle:
  preStop:
    exec:
      command: ["sh", "-c", "sleep 30"]  # give load balancer more time to drain
helm upgrade bifrost bifrost/bifrost --reuse-values -f graceful-shutdown-values.yaml
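
The preStop sleep counts against the termination grace period, so the time left for in-flight streams is the grace period minus the sleep. The arithmetic can be checked with a small helper; the function name is hypothetical.

```shell
# Hypothetical helper: seconds left for in-flight streams once the preStop sleep ends.
stream_budget() {
  grace="$1"     # terminationGracePeriodSeconds
  prestop="$2"   # preStop sleep in seconds
  echo "$(( grace - prestop ))"
}

stream_budget 120 15   # prints 105 (default 15s preStop sleep)
stream_budget 120 30   # prints 90  (30s preStop sleep)
```

If your longest stream exceeds the printed budget, raise terminationGracePeriodSeconds accordingly.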

SQLite / PVC Issues

StatefulSet migration (upgrading from chart < v2.0.0)

Older chart versions used a Deployment + manual PVC. v2.0.0 moved SQLite to a StatefulSet. If upgrading:
# 1. Scale down the old deployment
kubectl scale deployment bifrost --replicas=0

# 2. Note the existing PVC name
kubectl get pvc

# 3. Upgrade, pointing at the existing claim
helm upgrade bifrost bifrost/bifrost \
  --reuse-values \
  --set storage.persistence.existingClaim=<your-old-pvc-name> \
  --set image.tag=v1.4.11

Data lost after upgrade

# Check if PVCs still exist (they persist after helm uninstall)
kubectl get pvc -l app.kubernetes.io/instance=bifrost

# Re-attach by setting existingClaim
helm upgrade bifrost bifrost/bifrost \
  --reuse-values \
  --set storage.persistence.existingClaim=<pvc-name>

Cluster Mode Issues

Peers not discovering each other

# Check gossip port is reachable between pods
kubectl exec -it bifrost-0 -- nc -zv bifrost-1.bifrost-headless 7946

# View gossip-related log lines
kubectl logs -l app.kubernetes.io/name=bifrost --tail=100 | grep -i gossip

# Check the headless service exists
kubectl get svc bifrost-headless
For Kubernetes-based discovery, verify the service account has pod list permissions:
kubectl auth can-i list pods --as=system:serviceaccount:default:bifrost
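
The command above assumes the release runs in the default namespace with a service account named bifrost. For other namespaces, both the `-n` flag and the `--as` identity need to change together. A minimal sketch with a hypothetical function name:

```shell
# Hypothetical helper: check pod-list permission for a service account in a namespace.
# Arguments: [namespace] [service-account]
can_list_pods() {
  ns="${1:-default}"
  sa="${2:-bifrost}"
  kubectl auth can-i list pods -n "$ns" --as="system:serviceaccount:${ns}:${sa}"
}
```

For example, `can_list_pods <your-namespace>` prints yes or no for the bifrost service account in that namespace.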

Useful Diagnostic Commands

# Full state dump for a support ticket
kubectl get all -l app.kubernetes.io/instance=bifrost
kubectl describe pod -l app.kubernetes.io/name=bifrost > pod-describe.txt
kubectl logs -l app.kubernetes.io/name=bifrost --tail=200 > pod-logs.txt

# View the full rendered config.json
kubectl get configmap bifrost-config -o jsonpath='{.data.config\.json}' | jq .

# Check current Helm values (shows all overrides)
helm get values bifrost

# Check Helm release status
helm status bifrost

# View Helm release history
helm history bifrost
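
The commands above can be gathered into one directory for a support ticket. This is a sketch only: the function name, output directory, and file names are hypothetical. Review helm-values.txt before sharing it, since overrides may contain credentials.

```shell
# Hypothetical helper: write the diagnostics above into one directory.
# Argument: [output-directory]
collect_bifrost_diagnostics() {
  out="${1:-bifrost-diagnostics}"
  mkdir -p "$out"
  kubectl get all -l app.kubernetes.io/instance=bifrost > "$out/resources.txt" 2>&1
  kubectl describe pod -l app.kubernetes.io/name=bifrost > "$out/pod-describe.txt" 2>&1
  kubectl logs -l app.kubernetes.io/name=bifrost --tail=200 > "$out/pod-logs.txt" 2>&1
  helm get values bifrost > "$out/helm-values.txt" 2>&1
  helm history bifrost > "$out/helm-history.txt" 2>&1
  echo "$out"   # print the directory so callers can archive it
}
```

For example, `tar czf bifrost-diagnostics.tgz "$(collect_bifrost_diagnostics)"` produces a single archive to attach.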

Still Stuck?