Overview

Fireworks is an OpenAI-compatible provider in Bifrost with native support for:

Chat Completions via /v1/chat/completions
Responses API via /v1/responses
Text Completions via /v1/completions
Embeddings via /v1/embeddings
Streaming for chat, responses, and completions
Tool calling for chat and responses

Unless noted below, Fireworks follows the standard OpenAI-compatible request and response behavior described on the OpenAI provider page.

Supported Operations

Operation	Non-Streaming	Streaming	Endpoint
Chat Completions	✅	✅	`/v1/chat/completions`
Responses API	✅	✅	`/v1/responses`
Text Completions	✅	✅	`/v1/completions`
Embeddings	✅	❌	`/v1/embeddings`
List Models	✅	-	`/v1/models`
Images	❌	❌	-
Speech / Transcription	❌	❌	-
Files	❌	❌	-
Batch	❌	❌	-
Count Tokens	❌	❌	-

Fireworks Responses support is native in Bifrost. Requests are sent to Fireworks’ /v1/responses endpoint directly, so fields such as previous_response_id, max_tool_calls, and store are preserved.

1. Chat Completions

Fireworks chat completions use the standard OpenAI-compatible wire format.

Fireworks-specific handling

prediction is preserved and forwarded.
Bifrost maps prompt_cache_key to Fireworks prompt_cache_isolation_key for chat-completion cache isolation.
Assistant reasoning_content is preserved for Fireworks chat-completion models that support reasoning history.

Filtered Parameters

For Fireworks chat completions, Bifrost removes or rewrites a small set of OpenAI-specific fields before sending the request upstream:

prompt_cache_key is mapped to Fireworks prompt_cache_isolation_key
prompt_cache_retention is removed
verbosity is removed
store is removed
web_search_options is removed
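
The rewrite rules above can be pictured as a small pure function. This is a minimal Python sketch, not Bifrost's actual implementation: the function name `rewrite_fireworks_chat_request` and the two constants are hypothetical, and only the field names come from the list above. Any handling of the `model` value is out of scope here.

```python
# Illustrative only: models the documented field handling, not Bifrost's code.
REMOVED_FIELDS = ("prompt_cache_retention", "verbosity", "store", "web_search_options")
RENAMED_FIELDS = {"prompt_cache_key": "prompt_cache_isolation_key"}


def rewrite_fireworks_chat_request(body: dict) -> dict:
    """Return a copy of an OpenAI-style chat body with the documented rewrites applied."""
    upstream = {}
    for key, value in body.items():
        if key in REMOVED_FIELDS:
            continue  # dropped before the request goes upstream
        # Renamed if listed in RENAMED_FIELDS, otherwise passed through unchanged.
        upstream[RENAMED_FIELDS.get(key, key)] = value
    return upstream


request = {
    "model": "fireworks/accounts/fireworks/models/deepseek-v3p2",
    "messages": [{"role": "user", "content": "Reply with exactly: fireworks ok"}],
    "prompt_cache_key": "tenant-42",  # hypothetical cache key
    "store": True,
    "verbosity": "low",
    "prediction": {"type": "content", "content": "fireworks ok"},
}
print(sorted(rewrite_fireworks_chat_request(request)))
```

In this sketch `prediction` passes through untouched, `store` and `verbosity` disappear, and the cache key reappears under the Fireworks field name.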

Example

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "fireworks/accounts/fireworks/models/deepseek-v3p2",
    "messages": [
      {"role": "user", "content": "Reply with exactly: fireworks ok"}
    ]
  }'

2. Responses API

Fireworks Responses use the native Fireworks endpoint:

/v1/responses

This preserves Responses-only fields and semantics, including:

previous_response_id
max_tool_calls
store
native responses streaming

Example

curl -X POST http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "fireworks/accounts/fireworks/models/deepseek-v3p2",
    "input": [
      {"role": "user", "content": "Reply with exactly: responses ok"}
    ],
    "max_tool_calls": 2
  }'

For continuation requests, Fireworks also supports previous_response_id.
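
A continuation can be sketched as a second request that carries the identifier of the first response. The Python below only builds the follow-up body and does not contact the gateway. The helper name `continuation_body` and the placeholder ID `resp_123` are hypothetical, and the sketch assumes the first response exposes its identifier in an `id` field, as in the OpenAI Responses format.

```python
import json

MODEL = "fireworks/accounts/fireworks/models/deepseek-v3p2"


def continuation_body(previous_response: dict, user_text: str) -> dict:
    """Build a follow-up Responses body that chains onto an earlier response."""
    return {
        "model": MODEL,
        "previous_response_id": previous_response["id"],  # assumes an `id` field
        "input": [{"role": "user", "content": user_text}],
    }


# Placeholder standing in for the JSON returned by the first /v1/responses call.
first_response = {"id": "resp_123"}
print(json.dumps(continuation_body(first_response, "Now reply in uppercase."), indent=2))
```

The resulting body would be sent with POST to /v1/responses in the same way as the example above.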

3. Text Completions

Fireworks text completions are sent to the native completions endpoint:

/v1/completions

Example

curl -X POST http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "fireworks/accounts/fireworks/models/deepseek-v3p2",
    "prompt": "In fruits, A is for apple and B is for"
  }'

For Fireworks text completions, Bifrost extracts prompt_cache_key from extra_params and maps it to Fireworks prompt_cache_isolation_key.
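
As a rough picture of that extraction, the Python sketch below lifts `prompt_cache_key` out of an `extra_params` mapping and emits it under the Fireworks field name. The function name and the flat dictionary shapes are assumptions for illustration, as is the pass-through of other extra parameters; Bifrost's internal request types are not shown on this page.

```python
def apply_text_completion_cache_key(params: dict, extra_params: dict) -> dict:
    """Merge parameters for upstream, renaming prompt_cache_key when present."""
    upstream = dict(params)
    remaining = dict(extra_params)  # copy so the caller's mapping is not mutated
    cache_key = remaining.pop("prompt_cache_key", None)
    if cache_key is not None:
        upstream["prompt_cache_isolation_key"] = cache_key
    upstream.update(remaining)  # assumption: other extra params pass through unchanged
    return upstream


body = apply_text_completion_cache_key(
    {"prompt": "In fruits, A is for apple and B is for"},
    {"prompt_cache_key": "tenant-42"},  # hypothetical cache key
)
print(body)
```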

4. Embeddings

Fireworks embeddings are sent to:

/v1/embeddings

Embedding-capable models may differ from the models used for chat and text completions, so use an embedding model ID with this endpoint.

Example

curl -X POST http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "fireworks/nomic-ai/nomic-embed-text-v1.5",
    "input": "embedding test"
  }'

Fireworks documents additional embedding-specific fields such as prompt_template, return_logits, and normalize. This page describes the standard embeddings flow currently covered by Bifrost.

5. Unsupported Features

The following operations are still unsupported by the Fireworks provider in Bifrost:

Feature	Status
Image generation / editing / variations	❌
Speech / TTS	❌
Transcription / STT	❌
Files	❌
Batch	❌
Count tokens	❌
Rerank	❌

6. Caveats

Prompt Caching Semantics

For Fireworks chat completions, Bifrost maps prompt_cache_key to prompt_cache_isolation_key, the Fireworks body field for cache isolation. Fireworks also accepts the header form x-prompt-cache-isolation-key. For text completions, Bifrost extracts prompt_cache_key from extra_params and maps it to the same body field.

If you need Fireworks session-affinity behavior, pass user, configure x-session-affinity in provider extra headers, or send the header through the HTTP gateway as x-bf-eh-x-session-affinity. Actual cache-hit behavior depends on the model and deployment.
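
To make the gateway header form concrete, here is a minimal Python sketch that builds (but does not send) a chat request carrying both a cache key and `x-bf-eh-x-session-affinity`. The gateway URL and model come from the examples on this page; the helper name `build_chat_request` and the values `session-abc` and `tenant-42` are hypothetical.

```python
import json
import urllib.request

GATEWAY_URL = "http://localhost:8080/v1/chat/completions"
MODEL = "fireworks/accounts/fireworks/models/deepseek-v3p2"


def build_chat_request(prompt: str, session_id: str, cache_key: str) -> urllib.request.Request:
    """Prepare a POST to the gateway with a cache key and a session-affinity header."""
    body = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "prompt_cache_key": cache_key,  # mapped upstream to prompt_cache_isolation_key
    }
    headers = {
        "Content-Type": "application/json",
        "x-bf-eh-x-session-affinity": session_id,  # gateway header form named above
    }
    return urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_chat_request("Reply with exactly: fireworks ok", "session-abc", "tenant-42")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it, which requires a running gateway with a configured Fireworks key.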

Reasoning History

Bifrost preserves assistant reasoning_content for Fireworks chat models that support reasoning history. Fireworks-specific reasoning controls such as reasoning_history do not receive special typed handling in this provider.
