Overview
Fireworks is an OpenAI-compatible provider in Bifrost with native support for:- Chat Completions via
/v1/chat/completions - Responses API via
/v1/responses - Text Completions via
/v1/completions - Embeddings via
/v1/embeddings - Streaming for chat, responses, and completions
- Tool calling for chat and responses
Supported Operations
| Operation | Non-Streaming | Streaming | Endpoint |
|---|---|---|---|
| Chat Completions | ✅ | ✅ | /v1/chat/completions |
| Responses API | ✅ | ✅ | /v1/responses |
| Text Completions | ✅ | ✅ | /v1/completions |
| Embeddings | ✅ | ❌ | /v1/embeddings |
| List Models | ✅ | - | /v1/models |
| Images | ❌ | ❌ | - |
| Speech / Transcription | ❌ | ❌ | - |
| Files | ❌ | ❌ | - |
| Batch | ❌ | ❌ | - |
| Count Tokens | ❌ | ❌ | - |
Fireworks Responses support is native in Bifrost. Requests are sent to Fireworks’
/v1/responses endpoint directly, so fields such as previous_response_id, max_tool_calls, and store are preserved.1. Chat Completions
Fireworks chat completions use the standard OpenAI-compatible wire format.Fireworks-specific handling
predictionis preserved and forwarded.- Bifrost maps
prompt_cache_keyto Fireworksprompt_cache_isolation_keyfor chat-completion cache isolation. - Assistant
reasoning_contentis preserved for Fireworks chat-completion models that support reasoning history.
Filtered Parameters
For Fireworks chat completions, Bifrost removes or rewrites a small set of OpenAI-specific fields before sending the request upstream:prompt_cache_keyis mapped to Fireworksprompt_cache_isolation_keyprompt_cache_retentionis removedverbosityis removedstoreis removedweb_search_optionsis removed
Example
2. Responses API
Fireworks Responses use the native Fireworks endpoint:previous_response_idmax_tool_callsstore- native responses streaming
Example
previous_response_id.
3. Text Completions
Fireworks text completions are sent to the native completions endpoint:Example
prompt_cache_key from extra_params and maps it to Fireworks prompt_cache_isolation_key.
4. Embeddings
Fireworks embeddings are sent to:Example
prompt_template, return_logits, and normalize. This page describes the standard embeddings flow currently covered by Bifrost.
5. Unsupported Features
The following operations are still unsupported by the Fireworks provider in Bifrost:| Feature | Status |
|---|---|
| Image generation / editing / variations | ❌ |
| Speech / TTS | ❌ |
| Transcription / STT | ❌ |
| Files | ❌ |
| Batch | ❌ |
| Count tokens | ❌ |
| Rerank | ❌ |
6. Caveats
Prompt Caching Semantics
Prompt Caching Semantics
For Fireworks chat completions, Bifrost maps
prompt_cache_key to Fireworks prompt_cache_isolation_key, which is the Fireworks body field for cache isolation. Fireworks also accepts the header form x-prompt-cache-isolation-key. For text completions, Bifrost extracts prompt_cache_key from extra_params and maps it to the same Fireworks body field. If you need Fireworks session-affinity behavior, pass user, configure x-session-affinity in provider extra headers, or send it through the HTTP gateway via x-bf-eh-x-session-affinity. Live cache-hit behavior remains model and deployment dependent.Reasoning History
Reasoning History
Bifrost preserves assistant
reasoning_content for Fireworks chat models that support reasoning history. Fireworks-specific reasoning controls such as reasoning_history are not given special typed handling in this provider page.
