Skip to main content

Overview

Cerebras is a fully OpenAI-compatible provider leveraging the complete set of OpenAI API features. Bifrost delegates all functionality to the OpenAI provider implementation with standard parameter filtering. Key characteristics:
  • Complete OpenAI compatibility - All chat, text, and streaming features supported
  • Full tool calling - Function definitions and parallel tool execution
  • Streaming support - Server-Sent Events with token usage tracking
  • Parameter preservation - Passes through all standard OpenAI parameters
  • Responses API - Full support with format conversion

Supported Operations

OperationNon-StreamingStreamingEndpoint
Chat Completions/v1/chat/completions
Responses API/v1/chat/completions
Text Completions/v1/completions
List Models-/v1/models
Embeddings-
Image Generation-
Speech (TTS)-
Transcriptions (STT)-
Files-
Batch-
Unsupported Operations (❌): Embeddings, Image Generation, Speech, Transcriptions, Files, and Batch are not supported by the upstream Cerebras API. These return UnsupportedOperationError.

1. Chat Completions

Request Parameters

Cerebras supports all standard OpenAI chat completion parameters. For full parameter reference and behavior, see OpenAI Chat Completions.

Filtered Parameters

Removed for Cerebras compatibility:
  • prompt_cache_key - Not supported
  • verbosity - Anthropic-specific
  • store - Not supported
  • service_tier - OpenAI-specific

Reasoning Parameter

Cerebras delegates to OpenAI via ToOpenAIChatRequest, so reasoning parameters are transformed: reasoning.effort values (e.g., minimallow) are mapped per the OpenAI-compatible providers convention, and reasoning.max_tokens is cleared/omitted (removed during conversion). Cerebras supports all standard OpenAI message types, tools, responses, and streaming formats. For details on message handling, tool conversion, responses, and streaming, refer to OpenAI Chat Completions.

2. Responses API

Bifrost converts Responses API format to Chat Completions internally, then converts response back:
BifrostResponsesRequest
  → ToChatRequest()
  → ChatCompletion
  → ToBifrostResponsesResponse()
Same parameter support as Chat Completions with response format differences (output items instead of message content).

3. Text Completions

Cerebras supports legacy text completion API:
ParameterMapping
promptSent as-is
max_tokensmax_tokens
temperaturetemperature
top_ptop_p
stopstop sequences
Response returns choices[].text with completion text.

4. Text Completions Streaming

Streaming text completions use same SSE format as chat streaming.

5. List Models

Lists available models from Cerebras with capabilities and context length information.

Unsupported Features

FeatureReason
EmbeddingNot offered by Cerebras API
Image GenerationNot offered by Cerebras API
Speech/TTSNot offered by Cerebras API
Transcription/STTNot offered by Cerebras API
Batch OperationsNot offered by Cerebras API
File ManagementNot offered by Cerebras API

Caveats

Severity: Low Behavior: User field > 64 characters is silently dropped Impact: Longer user identifiers are lost Code: SanitizeUserField enforces 64-char max