Cluster Mode & HA

Cluster mode lets multiple Bifrost replicas share state (rate limits, budget counters, and governance data) across pods. When bifrost.cluster.enabled is false (the default), each replica operates independently and shares state only through the database.

Cluster mode requires PostgreSQL as the storage backend. SQLite is single-node only.

bifrost.cluster.* is an enterprise capability. OSS images accept these values but do not run cluster mode at runtime.

When to Use Cluster Mode

Scenario	Recommendation
Single replica	Not needed
Multiple replicas, shared DB only	Optional - DB provides eventual consistency
Multiple replicas with strict per-minute rate limiting	Enable cluster mode - in-memory counters are synced via gossip
Geographic multi-region	Enable cluster mode with DNS or Consul discovery
Serverless platforms without peer-to-peer networking (e.g. Cloud Run)	Use broker mode instead of gossip - see note below

The Helm chart deploys the default mesh clustering, which needs nodes to reach each other directly over gossip (10101) and gRPC (10102). On platforms that do not allow peer-to-peer connectivity - such as Google Cloud Run - use broker mode, where nodes only make an outbound connection to a central relay. See Enterprise Clustering → Broker Mode.

Basic Cluster Setup

# cluster-values.yaml
image:
  tag: "v1.4.11"

replicaCount: 3

storage:
  mode: postgres

postgresql:
  external:
    enabled: true
    host: "your-postgres-host.example.com"
    port: 5432
    user: bifrost
    database: bifrost
    sslMode: require
    existingSecret: "postgres-credentials"
    passwordKey: "password"

bifrost:
  encryptionKeySecret:
    name: "bifrost-encryption"
    key: "encryption-key"

  cluster:
    enabled: true
    gossip:
      port: 10101
      config:
        timeoutSeconds: 10
        successThreshold: 3
        failureThreshold: 3
    grpc:
      port: 10102 # default when the grpc block is omitted; override if needed

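# Spread replicas across nodes for true HA. This is a hard rule: it needs at
# least as many schedulable nodes as replicas, or the extra pods stay Pending.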
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app.kubernetes.io/name: bifrost
        topologyKey: kubernetes.io/hostname

# Conservative scale-down: avoid killing pods mid-stream
autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Pods
          value: 1
          periodSeconds: 120

# Give in-flight SSE streams time to drain
terminationGracePeriodSeconds: 90
lifecycle:
  preStop:
    exec:
      command: ["sh", "-c", "sleep 20"]
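
The timing values above interact in a way that is easy to misjudge. Kubernetes counts the preStop hook against terminationGracePeriodSeconds, and the Pods policy caps how fast the HPA removes replicas. The sketch below works through the arithmetic with the values from cluster-values.yaml; the variable names are illustrative only.

```shell
#!/bin/sh
# Values copied from cluster-values.yaml above.
grace_period=90        # terminationGracePeriodSeconds
prestop_sleep=20       # preStop "sleep 20"
scale_down_period=120  # scaleDown policy periodSeconds (1 pod per period)
max_replicas=10
min_replicas=3

# The preStop hook runs inside the grace period, so the container has the
# remainder to finish in-flight streams after it receives SIGTERM.
drain_budget=$((grace_period - prestop_sleep))

# At most one pod is removed per period, so the gap between the first and
# the last removal is at least (removals - 1) periods. The 300 s
# stabilization window elapses before the first removal.
removals=$((max_replicas - min_replicas))
min_scale_down=$(( (removals - 1) * scale_down_period ))

echo "drain budget after SIGTERM: ${drain_budget}s"
echo "first-to-last removal when scaling ${max_replicas} -> ${min_replicas}: at least ${min_scale_down}s"
```

If streams routinely run longer than the drain budget, raising terminationGracePeriodSeconds is the value to adjust; lengthening the preStop sleep alone shrinks the budget.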

For version 1.4.x, expose TCP port 10102 and UDP port 10101 for cluster discovery.

kubectl create secret generic postgres-credentials \
  --from-literal=password='your-postgres-password'

kubectl create secret generic bifrost-encryption \
  --from-literal=encryption-key='your-32-byte-encryption-key'

helm install bifrost bifrost/bifrost -f cluster-values.yaml
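
The encryption-key placeholder in the secret above should be replaced with a random value. A minimal sketch, assuming Bifrost accepts any 32-character string as the key; that format is an assumption, so check it against the Bifrost encryption documentation before relying on it.

```shell
#!/bin/sh
# Read 16 random bytes and hex-encode them, giving a 32-character key.
# Assumption: a 32-character ASCII string satisfies the "32-byte" requirement.
ENCRYPTION_KEY=$(head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n')

echo "generated key length: ${#ENCRYPTION_KEY}"

# To store it, reuse the command shown above with the generated value:
#   kubectl create secret generic bifrost-encryption \
#     --from-literal=encryption-key="$ENCRYPTION_KEY"
```

Keep a copy of the key in a secret manager, since data encrypted with it is presumably unreadable without it.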

Peer Discovery

Bifrost uses a gossip protocol (memberlist) for peer-to-peer state sync. Configure how peers find each other:

For consul, etcd, and udp discovery, set bifrost.cluster.discovery.serviceName so nodes register/discover under a stable service identity.

Kubernetes

Bifrost queries the Kubernetes API to find other Bifrost pods by label selector. No static peer list is needed, so this works with HPA.

bifrost:
  cluster:
    enabled: true
    discovery:
      enabled: true
      type: kubernetes
      k8sNamespace: "default"           # namespace where Bifrost runs
      k8sLabelSelector: "app.kubernetes.io/name=bifrost"
    gossip:
      port: 7946

The service account needs permission to list pods:

serviceAccount:
  create: true
  annotations: {}

# Create a namespaced Role and RoleBinding for pod discovery (apply once)
kubectl apply -f - <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: bifrost-pod-discovery
  namespace: default
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["list", "get", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: bifrost-pod-discovery
  namespace: default
subjects:
  - kind: ServiceAccount
    name: bifrost
    namespace: default
roleRef:
  kind: Role
  name: bifrost-pod-discovery
  apiGroup: rbac.authorization.k8s.io
EOF

helm install bifrost bifrost/bifrost -f cluster-k8s-discovery-values.yaml
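
To confirm that the Role took effect, kubectl auth can-i can impersonate the service account. This is a sketch: it assumes the chart names the service account bifrost in the default namespace, as in the RoleBinding above, and that your own user is allowed to impersonate service accounts. The function name is hypothetical.

```shell
#!/bin/sh
NAMESPACE="default"
SERVICE_ACCOUNT="bifrost"   # assumed to match the RoleBinding subject above
SUBJECT="system:serviceaccount:${NAMESPACE}:${SERVICE_ACCOUNT}"

# Prints "yes" or "no" for each verb granted by the bifrost-pod-discovery Role.
check_discovery_rbac() {
  for verb in list get watch; do
    printf '%s pods: ' "$verb"
    kubectl auth can-i "$verb" pods --as="$SUBJECT" -n "$NAMESPACE"
  done
}

# Run the check only where kubectl is installed.
if command -v kubectl >/dev/null 2>&1; then
  check_discovery_rbac
fi
```

Any "no" in the output suggests the Role or RoleBinding was applied to a different namespace or bound to a different service account name.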

DNS

Uses a headless service DNS name to resolve peer IPs. Works well with StatefulSets (predictable pod DNS names).

bifrost:
  cluster:
    enabled: true
    discovery:
      enabled: true
      type: dns
      dnsNames:
        - "bifrost-headless.default.svc.cluster.local"
    gossip:
      port: 7946

The chart automatically creates a headless service (bifrost-headless) when cluster mode is enabled with a StatefulSet. For Deployments, create it manually:

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: bifrost-headless
spec:
  clusterIP: None
  selector:
    app.kubernetes.io/name: bifrost
  ports:
    - name: gossip
      port: 7946
      protocol: TCP
EOF

helm install bifrost bifrost/bifrost -f cluster-dns-discovery-values.yaml
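
To check that the headless service returns one address per pod, a throwaway pod can resolve the name from inside the cluster. A sketch, assuming the busybox image can be pulled in your cluster and the service lives in the default namespace; the pod name dns-check and the function name are arbitrary.

```shell
#!/bin/sh
SERVICE="bifrost-headless"
NAMESPACE="default"
FQDN="${SERVICE}.${NAMESPACE}.svc.cluster.local"

# Resolve the headless service from inside the cluster. A healthy setup
# lists one address per ready Bifrost pod.
resolve_peers() {
  kubectl run dns-check --rm -i --restart=Never --image=busybox:1.36 \
    -- nslookup "$FQDN"
}

# Run the lookup only where kubectl is installed.
if command -v kubectl >/dev/null 2>&1; then
  resolve_peers
fi
```

If the lookup returns no addresses, compare the service selector with the pod labels and confirm the pods are passing readiness checks.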

Static peers

Enumerate peer addresses explicitly. Use this when discovery mechanisms are unavailable or you want deterministic membership.

bifrost:
  cluster:
    enabled: true
    peers:
      - "bifrost-0.bifrost-headless.default.svc.cluster.local:7946"
      - "bifrost-1.bifrost-headless.default.svc.cluster.local:7946"
      - "bifrost-2.bifrost-headless.default.svc.cluster.local:7946"
    gossip:
      port: 7946

Static peers require StatefulSet pod names to be stable. This approach doesn’t adapt to HPA-driven scaling - use Kubernetes or DNS discovery for dynamic replica counts.
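
Because the list has to match the replica count exactly, generating it is less error-prone than typing it. A small sketch that prints the peers entries for a given number of StatefulSet replicas; gen_peers is a hypothetical helper, and the service name, namespace, and port are taken from the example above.

```shell
#!/bin/sh
# Print one peer entry per StatefulSet ordinal, indented for bifrost.cluster.peers.
# Arguments: replica count, then an optional gossip port (default 7946).
gen_peers() {
  replicas=$1
  port=${2:-7946}
  i=0
  while [ "$i" -lt "$replicas" ]; do
    printf '      - "bifrost-%s.bifrost-headless.default.svc.cluster.local:%s"\n' "$i" "$port"
    i=$((i + 1))
  done
}

gen_peers 3
```

Paste the output under peers: and regenerate it whenever replicaCount changes.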

Consul

Nodes register and discover each other in Consul under the configured serviceName.

bifrost:
  cluster:
    enabled: true
    discovery:
      enabled: true
      type: consul
      serviceName: "bifrost-cluster"
      consulAddress: "consul.consul.svc.cluster.local:8500"
    gossip:
      port: 7946

helm install bifrost bifrost/bifrost -f cluster-consul-discovery-values.yaml

etcd

Nodes register and discover each other in etcd under the configured serviceName.

bifrost:
  cluster:
    enabled: true
    discovery:
      enabled: true
      type: etcd
      serviceName: "bifrost-cluster"
      etcdEndpoints:
        - "http://etcd-0.etcd.default.svc.cluster.local:2379"
        - "http://etcd-1.etcd.default.svc.cluster.local:2379"
        - "http://etcd-2.etcd.default.svc.cluster.local:2379"
    gossip:
      port: 7946

mDNS

Best for local development or bare-metal clusters where multicast is available.

bifrost:
  cluster:
    enabled: true
    discovery:
      enabled: true
      type: mdns
      mdnsService: "_bifrost._tcp"
    gossip:
      port: 7946

Allowed Address Space

Restrict gossip to one or more subnets, given as CIDR ranges (useful in multi-tenant clusters):

bifrost:
  cluster:
    discovery:
      enabled: true
      type: kubernetes
      k8sNamespace: "default"
      k8sLabelSelector: "app.kubernetes.io/name=bifrost"
      allowedAddressSpace:
        - "10.0.0.0/8"
        - "172.16.0.0/12"

Region-Aware Routing

Tag replicas with a region identifier for latency-aware routing:

bifrost:
  cluster:
    enabled: true
    region: "us-east-1"

Full HA Production Example

# ha-production-values.yaml
image:
  tag: "v1.4.11"

replicaCount: 3

resources:
  requests:
    cpu: 1000m
    memory: 1Gi
  limits:
    cpu: 4000m
    memory: 4Gi

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 15
  targetCPUUtilizationPercentage: 70
  targetMemoryUtilizationPercentage: 75
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Pods
          value: 1
          periodSeconds: 120
    scaleUp:
      stabilizationWindowSeconds: 30

terminationGracePeriodSeconds: 90
lifecycle:
  preStop:
    exec:
      command: ["sh", "-c", "sleep 20"]

ingress:
  enabled: true
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/proxy-body-size: "100m"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
  hosts:
    - host: bifrost.yourdomain.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: bifrost-tls
      hosts:
        - bifrost.yourdomain.com

storage:
  mode: postgres

postgresql:
  external:
    enabled: true
    host: "rds.us-east-1.amazonaws.com"
    port: 5432
    user: bifrost
    database: bifrost
    sslMode: require
    existingSecret: "postgres-credentials"
    passwordKey: "password"

bifrost:
  encryptionKeySecret:
    name: "bifrost-encryption"
    key: "encryption-key"

  client:
    initialPoolSize: 1000
    dropExcessRequests: true
    enableLogging: true
    enforceGovernanceHeader: true

  cluster:
    enabled: true
    region: "us-east-1"
    discovery:
      enabled: true
      type: kubernetes
      k8sNamespace: "default"
      k8sLabelSelector: "app.kubernetes.io/name=bifrost"
    gossip:
      port: 7946
      config:
        timeoutSeconds: 10
        successThreshold: 3
        failureThreshold: 3

  plugins:
    telemetry:
      enabled: true
      config:
        push_gateway:
          enabled: true
          push_gateway_url: "http://prometheus-pushgateway.monitoring.svc.cluster.local:9091"
          push_interval: 15
    logging:
      enabled: true
    governance:
      enabled: true
      config:
        is_vk_mandatory: true

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app.kubernetes.io/name: bifrost
        topologyKey: kubernetes.io/hostname

serviceAccount:
  create: true
  annotations: {}

# Prerequisites
kubectl create secret generic postgres-credentials \
  --from-literal=password='your-secure-postgres-password'

kubectl create secret generic bifrost-encryption \
  --from-literal=encryption-key='your-32-byte-encryption-key'

# RBAC for Kubernetes pod discovery
kubectl apply -f - <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: bifrost-pod-discovery
  namespace: default
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["list", "get", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: bifrost-pod-discovery
  namespace: default
subjects:
  - kind: ServiceAccount
    name: bifrost
    namespace: default
roleRef:
  kind: Role
  name: bifrost-pod-discovery
  apiGroup: rbac.authorization.k8s.io
EOF

# Install
helm install bifrost bifrost/bifrost -f ha-production-values.yaml

# Verify all peers have found each other (check logs)
kubectl logs -l app.kubernetes.io/name=bifrost --tail=50 | grep -i gossip

Verifying Cluster Health

# Check all pods are running
kubectl get pods -l app.kubernetes.io/name=bifrost

# Check the gossip port is reachable between pods
# (assumes StatefulSet pod names; nc -z probes TCP only)
kubectl exec -it bifrost-0 -- nc -zv bifrost-1.bifrost-headless 7946

# Check health endpoint
kubectl port-forward svc/bifrost 8080:8080 &
sleep 2  # give the port-forward a moment to start listening
curl http://localhost:8080/health

# View HPA status
kubectl get hpa bifrost

# Scale manually during maintenance
# (if autoscaling is enabled, the HPA will readjust this count)
kubectl scale deployment bifrost --replicas=5
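
The pod check above can be turned into a pass/fail test for scripts. A sketch, assuming three replicas are expected; count_running is a hypothetical helper, not part of Bifrost or kubectl.

```shell
#!/bin/sh
EXPECTED=3   # assumed replica count; match replicaCount or minReplicas

# Read "NAME STATUS" lines on stdin and print how many pods are Running.
count_running() {
  awk '$2 == "Running" { n++ } END { print n + 0 }'
}

# Query the cluster only where kubectl is installed.
if command -v kubectl >/dev/null 2>&1; then
  running=$(kubectl get pods -l app.kubernetes.io/name=bifrost --no-headers \
    -o custom-columns=NAME:.metadata.name,STATUS:.status.phase | count_running)
  if [ "$running" -ge "$EXPECTED" ]; then
    echo "ok: $running pods Running"
  else
    echo "only $running of $EXPECTED pods Running"
  fi
fi
```

Running is the pod phase and is weaker than Ready, so treat a passing result as a first check and still confirm the health endpoint.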
