Overview

Fireworks is an OpenAI-compatible provider in Bifrost with native support for:
  • Chat Completions via /v1/chat/completions
  • Responses API via /v1/responses
  • Text Completions via /v1/completions
  • Embeddings via /v1/embeddings
  • Streaming for chat, responses, and completions
  • Tool calling for chat and responses
Unless noted below, Fireworks follows the standard OpenAI-compatible request and response behavior described on the OpenAI provider page.

Supported Operations

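| Operation              | Non-Streaming | Streaming | Endpoint             |
|------------------------|---------------|-----------|----------------------|
| Chat Completions       | Yes           | Yes       | /v1/chat/completions |
| Responses API          | Yes           | Yes       | /v1/responses        |
| Text Completions       | Yes           | Yes       | /v1/completions      |
| Embeddings             | Yes           | No        | /v1/embeddings       |
| List Models            | Yes           | -         | /v1/models           |
| Images                 | No            | No        | -                    |
| Speech / Transcription | No            | No        | -                    |
| Files                  | No            | No        | -                    |
| Batch                  | No            | No        | -                    |
| Count Tokens           | No            | No        | -                    |
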
Fireworks Responses support is native in Bifrost. Requests are sent to Fireworks’ /v1/responses endpoint directly, so fields such as previous_response_id, max_tool_calls, and store are preserved.

1. Chat Completions

Fireworks chat completions use the standard OpenAI-compatible wire format.

Fireworks-specific handling

  • prediction is preserved and forwarded.
  • Bifrost maps prompt_cache_key to Fireworks prompt_cache_isolation_key for chat-completion cache isolation.
  • Assistant reasoning_content is preserved for Fireworks chat-completion models that support reasoning history.

Filtered Parameters

For Fireworks chat completions, Bifrost removes or rewrites a small set of OpenAI-specific fields before sending the request upstream:
  • prompt_cache_key is mapped to Fireworks prompt_cache_isolation_key
  • prompt_cache_retention is removed
  • verbosity is removed
  • store is removed
  • web_search_options is removed
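
To make the mapping above concrete, the following is a minimal sketch of a chat request that sets prompt_cache_key. The key value tenant-42 and the send_chat helper are hypothetical, and the gateway address is assumed to match the other examples on this page. Per the list above, Bifrost is expected to forward the key upstream as prompt_cache_isolation_key.

```shell
# Assumed local gateway address, matching the other examples on this page.
BIFROST_URL="${BIFROST_URL:-http://localhost:8080}"

# "tenant-42" is a hypothetical cache key used only for illustration.
# The client sends prompt_cache_key; the rename to
# prompt_cache_isolation_key happens inside Bifrost.
payload='{
  "model": "fireworks/accounts/fireworks/models/deepseek-v3p2",
  "prompt_cache_key": "tenant-42",
  "messages": [
    {"role": "user", "content": "Reply with exactly: fireworks ok"}
  ]
}'

# Hypothetical helper: POST a JSON body to the chat completions endpoint.
send_chat() {
  curl -sS -X POST "$BIFROST_URL/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d "$1"
}

# Usage (requires a running gateway): send_chat "$payload"
```

The client never sends the Fireworks field name directly in this sketch, which keeps the request portable across OpenAI-compatible providers.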

Example

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "fireworks/accounts/fireworks/models/deepseek-v3p2",
    "messages": [
      {"role": "user", "content": "Reply with exactly: fireworks ok"}
    ]
  }'
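
Streaming is listed as supported for chat completions. A minimal sketch of a streamed request follows, assuming the standard OpenAI-compatible "stream": true flag; the prompt text and the stream_chat helper are hypothetical.

```shell
# Assumed local gateway address, matching the other examples on this page.
BIFROST_URL="${BIFROST_URL:-http://localhost:8080}"

# Same request shape as the example above, with streaming switched on.
stream_payload='{
  "model": "fireworks/accounts/fireworks/models/deepseek-v3p2",
  "stream": true,
  "messages": [
    {"role": "user", "content": "Count from 1 to 5"}
  ]
}'

# Hypothetical helper. -N turns off curl output buffering so that
# chunks are printed as they arrive instead of at the end.
stream_chat() {
  curl -sS -N -X POST "$BIFROST_URL/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d "$1"
}

# Usage (requires a running gateway): stream_chat "$stream_payload"
```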

2. Responses API

Fireworks Responses use the native Fireworks endpoint:
/v1/responses
This preserves Responses-only fields and semantics, including:
  • previous_response_id
  • max_tool_calls
  • store
  • native responses streaming

Example

curl -X POST http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "fireworks/accounts/fireworks/models/deepseek-v3p2",
    "input": [
      {"role": "user", "content": "Reply with exactly: responses ok"}
    ],
    "max_tool_calls": 2
  }'
For continuation requests, Fireworks also supports previous_response_id.
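
A continuation request can be sketched as follows. This assumes the first response carries its identifier in a top-level "id" field that appears before any nested "id" fields, as in the OpenAI Responses format. The helper names extract_id, continuation_body, and continue_response are hypothetical.

```shell
# Assumed local gateway address, matching the other examples on this page.
BIFROST_URL="${BIFROST_URL:-http://localhost:8080}"
MODEL="fireworks/accounts/fireworks/models/deepseek-v3p2"

# Print the first "id" string value found in a JSON document.
# Pattern matching is a rough stand-in; a JSON-aware tool is more robust.
extract_id() {
  printf '%s\n' "$1" \
    | grep -o '"id"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | head -n 1 \
    | sed 's/.*"\([^"]*\)"$/\1/'
}

# Build the follow-up body. $1 = previous response id, $2 = user text.
# No JSON escaping is done, so keep $2 free of quotes and backslashes.
continuation_body() {
  printf '{"model": "%s", "previous_response_id": "%s", "input": [{"role": "user", "content": "%s"}]}' \
    "$MODEL" "$1" "$2"
}

# Send the follow-up request to the Responses endpoint.
continue_response() {
  curl -sS -X POST "$BIFROST_URL/v1/responses" \
    -H "Content-Type: application/json" \
    -d "$(continuation_body "$1" "$2")"
}

# Usage (requires a running gateway), where $first holds the JSON
# returned by an earlier /v1/responses call:
#   continue_response "$(extract_id "$first")" "Now reply in uppercase"
```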

3. Text Completions

Fireworks text completions are sent to the native completions endpoint:
/v1/completions

Example

curl -X POST http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "fireworks/accounts/fireworks/models/deepseek-v3p2",
    "prompt": "In fruits, A is for apple and B is for"
  }'
For Fireworks text completions, Bifrost extracts prompt_cache_key from extra_params and maps it to Fireworks prompt_cache_isolation_key.

4. Embeddings

Fireworks embeddings are sent to:
/v1/embeddings
Embedding-capable models may differ from the models used for chat and text completions, so use a model ID that supports embeddings.

Example

curl -X POST http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "fireworks/nomic-ai/nomic-embed-text-v1.5",
    "input": "embedding test"
  }'
Fireworks documents additional embedding-specific fields such as prompt_template, return_logits, and normalize. This page describes the standard embeddings flow currently covered by Bifrost.

5. Unsupported Features

The following operations are still unsupported by the Fireworks provider in Bifrost:
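| Feature                                 | Status        |
|-----------------------------------------|---------------|
| Image generation / editing / variations | Not supported |
| Speech / TTS                            | Not supported |
| Transcription / STT                     | Not supported |
| Files                                   | Not supported |
| Batch                                   | Not supported |
| Count tokens                            | Not supported |
| Rerank                                  | Not supported |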

6. Caveats

For Fireworks chat completions, Bifrost maps prompt_cache_key to Fireworks prompt_cache_isolation_key, which is the Fireworks body field for cache isolation. Fireworks also accepts the header form x-prompt-cache-isolation-key. For text completions, Bifrost extracts prompt_cache_key from extra_params and maps it to the same Fireworks body field.

If you need Fireworks session-affinity behavior, use one of the following:
  • pass user in the request
  • configure x-session-affinity in the provider extra headers
  • send the header through the HTTP gateway as x-bf-eh-x-session-affinity
Live cache-hit behavior remains model and deployment dependent.
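
The HTTP gateway option for session affinity can be sketched as a per-request header. The session value session-123 and the chat_with_affinity helper are hypothetical, and the gateway address is assumed to match the other examples on this page.

```shell
# Assumed local gateway address, matching the other examples on this page.
BIFROST_URL="${BIFROST_URL:-http://localhost:8080}"

# Hypothetical session identifier; reuse the same value for related requests.
SESSION_ID="session-123"

# Gateway header named in the caveat above.
affinity_header="x-bf-eh-x-session-affinity: $SESSION_ID"

# Hypothetical helper: POST a chat body with the affinity header attached.
chat_with_affinity() {
  curl -sS -X POST "$BIFROST_URL/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "$affinity_header" \
    -d "$1"
}

# Usage (requires a running gateway):
#   chat_with_affinity '{"model": "fireworks/accounts/fireworks/models/deepseek-v3p2", "messages": [{"role": "user", "content": "hi"}]}'
```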
Bifrost preserves assistant reasoning_content for Fireworks chat models that support reasoning history. Fireworks-specific reasoning controls such as reasoning_history do not receive special typed handling in this provider.