✨ Features
- WebSocket based responses API — Added WebSocket transport for responses API (OpenAI)
- Anthropic Passthrough — Added native Anthropic passthrough endpoint
- Prompt Repository — Added HTTP handlers for prompt management with RBAC (folders, prompts, versions, sessions)
- Streaming Request Decompression — Threshold-gated streaming decompression with pooled readers, replacing BodyUncompressed()
- Model Parameters API — Added model parameters table and API endpoint with in-memory caching
- Virtual Key Limit Resets — Added virtual key limit reset functionality
- Session Stickiness — Added session stickiness in key selection for consistent routing
- Pricing Engine Refactor — Unified cost calculation with quality-based image and video pricing
- Image Configuration — Added size/aspect ratio config for Gemini and size-to-resolution conversion for Replicate
- Large Payload Support — Added large payload awareness across transport hooks, plugins, and response streaming
- Raw Request/Response Storage — Allow storing raw request/response without returning them to clients (thanks @Vaibhav701161!)
- ChatReasoning Enabled Field — Added Enabled field to ChatReasoning struct (thanks @mango766!)
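The session stickiness feature above can be sketched as a deterministic hash of the session ID over the available key pool, so repeated requests from one session route through the same key. This is a minimal illustration; the function and parameter names (`pickKey`, `keys`) are hypothetical, not Bifrost's actual API.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// pickKey deterministically maps a session ID to one of the available
// keys, so every request from the same session reuses the same key.
func pickKey(sessionID string, keys []string) string {
	h := fnv.New32a()
	h.Write([]byte(sessionID))
	return keys[int(h.Sum32())%len(keys)]
}

func main() {
	keys := []string{"key-a", "key-b", "key-c"}
	// The same session always lands on the same key.
	fmt.Println(pickKey("session-42", keys) == pickKey("session-42", keys)) // true
}
```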
🐞 Fixed
- Deterministic Tool Schema — Fixed deterministic tool schema serialization for Anthropic prompt caching (thanks @Edward-Upton!)
- CORS Wildcard — Fixed CORS handling when the * wildcard origin is allowed
- TLS Termination — Allowed TLS termination inside the Bifrost server via config
- Bedrock toolChoice — Fixed toolChoice being silently dropped on Bedrock /converse and /converse-stream endpoints
- Count Tokens Passthrough — Fixed request body passthrough for count tokens endpoint for Anthropic and Vertex
- Chat Finish Reason — Mapped chat finish_reason to responses status and preserved terminal stream semantics
- Tool Call Indexes — Fixed streaming tool call indices for parallel tool calls in chat completions stream
- Video Pricing — Fixed video pricing calculation
- SQLite Migration — Prevented CASCADE deletion during routing targets migration
- Log Serialization — Reduced logstore serialization overhead and batched cost updates
- Log List Queries — Avoided loading raw_request/raw_response in log list queries (thanks @Vaibhav701161!)
- MCP Reconnection — Improved MCP client reconnection with exponential backoff and connection timeout
- Responses Input Messages — Set responses input messages in gen_ai.input.messages
- Helm Fixes — Fixed Helm chart and test issues
- feat: WebSocket and Realtime API support
- feat: Anthropic passthrough support
- feat: threshold-gated streaming request decompression with pooled readers
- feat: refactored model catalog pricing engine with unified cost calculation
- feat: quality-based image pricing and image size/aspect ratio for Gemini
- feat: size-to-resolution conversion for Replicate image models
- feat: session stickiness in key selection
- feat: add Enabled field to ChatReasoning struct (thanks @mango766!)
- feat: allow storing raw request/response without returning to clients (thanks @Vaibhav701161!)
- feat: RBAC for prompt repository
- fix: deterministic tool schema serialization for prompt caching (thanks @Edward-Upton!)
- fix: skip body building for large payload flow
- fix: TLS termination inside Bifrost server through config
- fix: map chat finish_reason to responses status and preserve terminal stream semantics
- fix: set responses input messages in gen_ai.input.messages
- fix: video pricing calculation
- fix: remove resolution parameter from image generation
- fix: MCP client reconnection with exponential backoff and connection timeout
- fix: record ttft in nanoseconds instead of milliseconds to avoid truncation to 0
- feat: add `routing_targets` table with 1:many relationship to `routing_rules`; migrates existing single-target rules to the new table with `weight=1`; drops legacy `provider` and `model` columns from `routing_rules`
- feat: add per-target `key_id` pinning support in `routing_targets`
- fix: avoid postgres cached-plan failures during provider hash backfill (thanks @dannyball710!)
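The 1:many routing schema above could be modeled roughly as follows. The field names are inferred from the column names in the changelog and may not match Bifrost's actual types:

```go
package main

import "fmt"

// RoutingTarget is a hypothetical shape for a row in routing_targets:
// each target carries its own provider/model pair, a weight, and an
// optional per-target key pin.
type RoutingTarget struct {
	RuleID   uint
	Provider string
	Model    string
	Weight   int   // migrated single-target rules get Weight = 1
	KeyID    *uint // optional key_id pinning
}

// RoutingRule now owns many targets, replacing the dropped
// provider and model columns.
type RoutingRule struct {
	ID      uint
	Targets []RoutingTarget
}

func main() {
	// A migrated single-target rule becomes one target with weight 1.
	rule := RoutingRule{ID: 1, Targets: []RoutingTarget{
		{RuleID: 1, Provider: "openai", Model: "gpt-4o", Weight: 1},
	}}
	fmt.Println(len(rule.Targets), rule.Targets[0].Weight) // 1 1
}
```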
- feat: prompt repository with folder, prompt, version, and session schemas and backend
- feat: model parameters table and API endpoint with in-memory caching
- feat: large payload awareness for plugins and logstore
- feat: large payload transport hooks and response streaming
- feat: chat token detail OTEL span attributes
- feat: hide deleted virtual keys from filter options
- feat: virtual key search/filtering and pagination
- feat: allow storing raw request/response without returning to clients (thanks @Vaibhav701161!)
- fix: MCP client reconnection with exponential backoff and connection timeout
- fix: prevent SQLite CASCADE deletion during routing targets migration
- fix: reduce logstore serialization overhead and batch cost updates
- fix: video pricing calculation
- fix: avoid loading raw_request/raw_response in log list queries (thanks @Vaibhav701161!)
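The in-memory caching mentioned for the model parameters API can be sketched as a read-through cache: serve from memory when present, otherwise load once and remember the result. The types and loader here are illustrative, not Bifrost's actual implementation:

```go
package main

import (
	"fmt"
	"sync"
)

// paramCache caches model parameters in memory, loading each model's
// entry at most once via the supplied load function.
type paramCache struct {
	mu   sync.RWMutex
	data map[string]map[string]any
	load func(model string) map[string]any
}

func (c *paramCache) Get(model string) map[string]any {
	c.mu.RLock()
	p, ok := c.data[model]
	c.mu.RUnlock()
	if ok {
		return p // cache hit
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	if p, ok := c.data[model]; ok {
		return p // another goroutine filled it meanwhile
	}
	p = c.load(model)
	c.data[model] = p
	return p
}

func main() {
	calls := 0
	c := &paramCache{
		data: map[string]map[string]any{},
		load: func(model string) map[string]any {
			calls++ // stands in for a database query
			return map[string]any{"max_tokens": 4096}
		},
	}
	c.Get("gpt-4o")
	c.Get("gpt-4o") // second call is served from memory
	fmt.Println(calls) // 1
}
```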
- chore: upgraded core to v1.4.8
- feat: pricing engine integration with unified cost calculation
- feat: large payload awareness
- chore: upgraded core to v1.4.8 and framework to v1.2.27
- feat: passthrough support for log capture
- feat: large payload awareness for logstore
- feat: async log write improvements
- fix: reduce logstore serialization overhead and batch cost updates
- fix: avoid loading raw_request/raw_response in log list queries (thanks @Vaibhav701161!)
- chore: upgraded core to v1.4.8 and framework to v1.2.27
- feat: WebSocket and Realtime API support
- chore: upgraded core to v1.4.8 and framework to v1.2.27
- fix: set responses input messages in gen_ai.input.messages
- chore: upgraded core to v1.4.8 and framework to v1.2.27
- feat: add Enabled field to ChatReasoning struct (thanks @mango766!)
- feat: large payload awareness
- chore: upgraded core to v1.4.8 and framework to v1.2.27
- feat: pricing engine integration
- chore: upgraded core to v1.4.8 and framework to v1.2.27

