Pod Not Starting
Quick diagnostics
Image pull errors (ErrImagePull / ImagePullBackOff)
- `image.tag` not set: the chart requires it; the pod will not start without it
- Pull secret missing or expired (ECR tokens expire after 12 hours)
- Incorrect `image.repository` for enterprise registry
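To tell which of these causes applies, it can help to look at the image reference and pull secrets the pod actually requested. The following is a minimal sketch, assuming `kubectl` is configured for the cluster; the function name `image_pull_diag` and its two arguments (namespace and pod name) are hypothetical and not part of the chart.

```shell
# Sketch: inspect why an image pull is failing.
# Arguments (hypothetical): $1 = namespace, $2 = pod name.
image_pull_diag() {
  ns="$1"; pod="$2"
  # Image reference the pod requested (repository and tag).
  kubectl -n "$ns" get pod "$pod" -o jsonpath='{.spec.containers[*].image}'
  echo
  # Names of the pull secrets the pod references, if any.
  kubectl -n "$ns" get pod "$pod" -o jsonpath='{.spec.imagePullSecrets[*].name}'
  echo
  # The Events section at the end carries the registry's error message.
  kubectl -n "$ns" describe pod "$pod"
}
```

Call it as `image_pull_diag <namespace> <pod-name>`. An empty tag in the first line of output points at `image.tag`; an authentication error in the events points at the pull secret.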
PVC not binding (Pending)
- No Persistent Volume provisioner in the cluster
- `storageClass` set to a class that doesn’t exist
- `ReadWriteOnce` access mode with multiple replicas (SQLite PVCs are single-node)
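The checks above can be sketched with standard `kubectl` commands. This assumes `kubectl` access to the cluster; the function name `pvc_diag` and its namespace argument are hypothetical.

```shell
# Sketch: find out why a PersistentVolumeClaim stays Pending.
# Argument (hypothetical): $1 = namespace of the release.
pvc_diag() {
  ns="$1"
  # Claim status, requested storage class, and access modes.
  kubectl -n "$ns" get pvc
  # Storage classes that actually exist in the cluster.
  kubectl get storageclass
  # Provisioning failures are reported as events on each claim.
  kubectl -n "$ns" describe pvc
}
```

If the storage class named on the claim is absent from the `get storageclass` output, or no class has a provisioner, the claim cannot bind.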
ConfigMap / Secret errors
CrashLoopBackOff
Database Connection Issues
Embedded PostgreSQL
External PostgreSQL
- `sslMode: disable` when the database requires SSL; set `sslMode: require`
- Password in secret doesn’t match the database user
- Network policy blocking pod → database traffic
- Database not UTF8 encoded (see PostgreSQL UTF8 Requirement)
Ingress Not Working
- `ingress.className` not set or set to a class not installed in the cluster
- TLS certificate not issued yet (cert-manager can take up to 60 seconds)
- Service port mismatch: Bifrost listens on `8080` by default
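Each of the three causes above can be checked from the command line. This is a minimal sketch, assuming `kubectl` access; the function name `ingress_diag` and its namespace argument are hypothetical, and the `certificate` query only works when the cert-manager CRDs are installed.

```shell
# Sketch: check ingress class, TLS certificate, and service port.
# Argument (hypothetical): $1 = namespace of the release.
ingress_diag() {
  ns="$1"
  # Ingress classes installed in the cluster; compare with ingress.className.
  kubectl get ingressclass
  # Ingress objects, their class, hosts, and assigned address.
  kubectl -n "$ns" get ingress
  # Backend service/port and recent controller events.
  kubectl -n "$ns" describe ingress
  # Certificate readiness (requires cert-manager; errors otherwise).
  kubectl -n "$ns" get certificate
  # Service ports, to compare against the 8080 default.
  kubectl -n "$ns" get svc
}
```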
Secret and Credential Issues
Provider API key not resolving
If Bifrost logs show `env.OPENAI_API_KEY: not set` or similar:
Encryption key issues
High Memory Usage
High CPU Usage / Latency
- `initialPoolSize` too small: goroutines queue up; increase to `500`–`1000`
- `dropExcessRequests: false` with a small pool: queue depth grows without bound
Autoscaling Issues
HPA not scaling
Pods scaling down too aggressively (drops active SSE streams)
The default `scaleDown.stabilizationWindowSeconds: 300` and `preStop` sleep of 15 seconds should prevent this. If streams are still being cut:
SQLite / PVC Issues
StatefulSet migration (upgrading from chart < v2.0.0)
Older chart versions used a Deployment + manual PVC. v2.0.0 moved SQLite to a StatefulSet. If upgrading:
Data lost after upgrade
Cluster Mode Issues
Peers not discovering each other
Useful Diagnostic Commands
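As a general starting point, the following sketch gathers the usual first-pass information with standard `kubectl` commands. It is an illustration and not the chart's official command list; the function name `bifrost_diag` and its arguments (namespace and pod name) are hypothetical.

```shell
# Sketch: first-pass diagnostics for a single pod.
# Arguments (hypothetical): $1 = namespace, $2 = pod name.
bifrost_diag() {
  ns="$1"; pod="$2"
  # Pod phase, restarts, and node placement.
  kubectl -n "$ns" get pods -o wide
  # Container state, probes, mounts, and events.
  kubectl -n "$ns" describe pod "$pod"
  # Recent logs from the running container.
  kubectl -n "$ns" logs "$pod" --tail=100
  # Logs from the previous container instance (useful for CrashLoopBackOff).
  kubectl -n "$ns" logs "$pod" --previous --tail=100
  # Namespace events, oldest first.
  kubectl -n "$ns" get events --sort-by=.lastTimestamp
  # Autoscaler targets and current replica counts.
  kubectl -n "$ns" get hpa
}
```

Call it as `bifrost_diag <namespace> <pod-name>`. The `--previous` call returns an error when the container has not restarted, which is harmless here.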
Still Stuck?
- GitHub Issues — search existing issues or open a new one
- Enterprise Support — for enterprise customers with SLA

