
Documentation Index

Fetch the complete documentation index at: https://docs.getbifrost.ai/llms.txt

Use this file to discover all available pages before exploring further.

Overview

OpenAI is the baseline schema for Bifrost. When using OpenAI directly, parameters are passed through with minimal conversion - mostly validation and filtering of OpenAI-specific features.

Supported Operations

| Operation | Non-Streaming | Streaming | Endpoint |
|---|---|---|---|
| Chat Completions | ✅ | ✅ | /v1/chat/completions |
| Responses API | ✅ | ✅ | /v1/responses |
| Text Completions | ✅ | ✅ | /v1/completions |
| Embeddings | ✅ | - | /v1/embeddings |
| Speech (TTS) | ✅ | ✅ | /v1/audio/speech |
| Transcriptions (STT) | ✅ | ✅ | /v1/audio/transcriptions |
| Image Generation | ✅ | ✅ | /v1/images/generations |
| Image Edit | ✅ | ✅ | /v1/images/edits |
| Image Variation | ✅ | - | /v1/images/variations |
| Files | ✅ | - | /v1/files |
| Batch | ✅ | - | /v1/batches |
| Video Generation | ✅ | - | /v1/videos |
| List Models | ✅ | - | /v1/models |

1. Chat Completions

Request Parameters
| Parameter | Type | Required | Notes |
|---|---|---|---|
| model | string | ✅ | Model identifier |
| messages | array | ✅ | ChatMessage array with roles (docs) |
| temperature | float | - | Sampling temperature (0-2) |
| top_p | float | - | Nucleus sampling parameter |
| stop | string/array | - | Stop sequences |
| max_completion_tokens | int | - | Max output tokens (minimum of 16 enforced) |
| frequency_penalty | float | - | Frequency penalty (-2 to 2) |
| presence_penalty | float | - | Presence penalty (-2 to 2) |
| logit_bias | object | - | Token logit adjustments |
| logprobs | bool | - | Include log probabilities |
| top_logprobs | int | - | Number of log probabilities per token |
| seed | int | - | Reproducibility seed |
| response_format | object | - | Output format (docs) |
| tools | array | - | Tool objects (docs) |
| tool_choice | string/object | - | "auto", "none", "required", or specific tool |
| parallel_tool_calls | bool | - | Allow multiple simultaneous tool calls |
| stream_options | object | - | Streaming options (docs) |
| reasoning | object | - | Reasoning parameters (Bifrost docs, OpenAI docs) |
| user | string | - | Truncated to 64 chars |
| metadata | object | - | Custom metadata |
| store | bool | - | Filtered for non-OpenAI routing |
| service_tier | string | - | Filtered for non-OpenAI routing |
| prompt_cache_key | string | - | Filtered for non-OpenAI routing |
| prediction | object | - | Predicted output for acceleration |
| audio | object | - | Audio output config |
| modalities | array | - | Response modalities (text, audio) |

  • Reasoning: OpenAI supports reasoning.effort (minimal, low, medium, high) and reasoning.max_tokens - both passed through directly. When routing to other providers, "minimal" effort is converted to "low" for compatibility. See Bifrost reasoning docs.
  • Messages: All message roles are supported: system, user, assistant, tool, developer (treated as system). Content types: text, images via URL (image_url), audio input (input_audio). Tool messages include a tool_call_id.
  • Tools: Standard OpenAI tool format with strict mode support. Tool choice: "auto", "none", "required", or specific tool by name.
  • Responses: Passed through in standard OpenAI format. Finish reasons: stop, length, tool_calls, content_filter. Usage includes token counts and optionally cached/reasoning token details.
  • Streaming: Server-Sent Events format with delta.content, delta.tool_calls, finish_reason, and usage (final chunk only, automatically included by Bifrost). stream_options: { include_usage: true } is set by default for all streaming calls.
  • Cache Control: cache_control fields are stripped from messages, their content blocks, and tools before sending.
  • Token Enforcement: max_completion_tokens is enforced to have a minimum of 16. Values below 16 are automatically set to 16.
  • Special handling: the user field is truncated to 64 characters; prompt_cache_key, store, and service_tier are filtered out when routing to non-OpenAI providers.
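The enforcement rules above can be sketched in Python (an illustrative sketch, not Bifrost's actual Go implementation; the function name and the openai_route flag are hypothetical):

```python
MIN_COMPLETION_TOKENS = 16
MAX_USER_LEN = 64

def enforce_chat_params(params, openai_route=True):
    """Apply the documented Chat Completions adjustments to a payload dict."""
    out = dict(params)
    # max_completion_tokens is enforced to a minimum of 16.
    if out.get("max_completion_tokens", MIN_COMPLETION_TOKENS) < MIN_COMPLETION_TOKENS:
        out["max_completion_tokens"] = MIN_COMPLETION_TOKENS
    # user is truncated to 64 characters.
    if len(out.get("user", "")) > MAX_USER_LEN:
        out["user"] = out["user"][:MAX_USER_LEN]
    # When routing to non-OpenAI providers, reasoning effort "minimal" becomes "low".
    reasoning = out.get("reasoning")
    if (not openai_route and isinstance(reasoning, dict)
            and reasoning.get("effort") == "minimal"):
        out["reasoning"] = {**reasoning, "effort": "low"}
    return out
```

A payload such as {"max_completion_tokens": 5, "user": 100-char-id} would come out with max_completion_tokens set to 16 and the user value cut to its first 64 characters.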

2. Responses API

The Responses API is OpenAI’s structured output API.

Request Parameters
| Parameter | Type | Required | Notes |
|---|---|---|---|
| model | string | ✅ | Model identifier |
| input | string/array | ✅ | Text or ContentBlock array (docs) |
| max_output_tokens | int | - | Maximum output length |
| background | bool | - | Run request in background mode |
| conversation | string | - | Conversation ID for continuing a conversation |
| include | array | - | Fields to include in response (e.g., "web_search_call.action.sources") |
| instructions | string | - | System instructions |
| max_tool_calls | int | - | Maximum number of tool calls |
| metadata | object | - | Custom metadata |
| parallel_tool_calls | bool | - | Allow multiple simultaneous tool calls |
| previous_response_id | string | - | ID of previous response to continue from |
| prompt_cache_key | string | - | Prompt caching key |
| reasoning | object | - | ResponsesParametersReasoning configuration (Bifrost docs) |
| safety_identifier | string | - | Safety identifier for content filtering |
| service_tier | string | - | Service tier for the request |
| stream_options | object | - | ResponsesStreamOptions configuration |
| store | bool | - | Store the response for later retrieval |
| temperature | float | - | Sampling temperature |
| text | object | - | ResponsesTextConfig for output formatting |
| top_logprobs | int | - | Number of log probabilities to return per token |
| top_p | float | - | Nucleus sampling parameter |
| tool_choice | string/object | - | ResponsesToolChoice strategy |
| tools | array | - | ResponsesTool objects (docs) |
| truncation | string | - | Truncation strategy (auto or off) |
| user | string | - | Truncated to 64 chars |

Special Message Handling (gpt-oss vs other models): OpenAI models handle reasoning differently depending on the model family:
  • Non-gpt-oss models (GPT-4o, o1, etc.): Send reasoning as summaries. Reasoning-only messages (with no summary and only content blocks) are filtered out since these models don’t support reasoning content blocks in the request format.
  • gpt-oss models: Send reasoning as content blocks. Reasoning summaries in the request are converted to content blocks since gpt-oss expects reasoning as structured blocks, not summaries.
This conversion ensures compatibility across different model architectures for the structured Responses API. See Bifrost reasoning docs for detailed reasoning handling.

Token & Parameter Enforcement:
  • max_output_tokens is enforced to have a minimum of 16. Values below 16 are automatically set to 16.
  • reasoning.max_tokens field is automatically removed from JSON output (OpenAI Responses API doesn’t accept it).
Other conversions:
  • Action types zoom and region are converted to screenshot
  • cache_control fields are stripped from messages and tools
  • Unsupported tool types are silently filtered (only these are supported: function, file_search, computer_use_preview, web_search, mcp, code_interpreter, image_generation, local_shell, custom, web_search_preview)
Response: Includes id, status (completed, incomplete, pending, error), output array with message content, and token usage.

Streaming: Server-Sent Events with types: response.created, response.in_progress, response.output_item.added, response.content_part.added, response.output_text.delta, response.function_call_arguments.delta, response.completed, response.incomplete. stream_options: { include_usage: true } is set by default for all streaming calls.
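The Responses API adjustments above (minimum token enforcement, dropping reasoning.max_tokens, and filtering unsupported tool types) can be sketched as follows (illustrative only; the function name is hypothetical and this is not Bifrost's actual Go code):

```python
SUPPORTED_TOOL_TYPES = {
    "function", "file_search", "computer_use_preview", "web_search", "mcp",
    "code_interpreter", "image_generation", "local_shell", "custom",
    "web_search_preview",
}

def prepare_responses_payload(payload):
    """Apply the documented Responses API adjustments to a payload dict."""
    out = dict(payload)
    # max_output_tokens is enforced to a minimum of 16.
    if "max_output_tokens" in out and out["max_output_tokens"] < 16:
        out["max_output_tokens"] = 16
    # The OpenAI Responses API doesn't accept reasoning.max_tokens; remove it.
    reasoning = out.get("reasoning")
    if isinstance(reasoning, dict) and "max_tokens" in reasoning:
        out["reasoning"] = {k: v for k, v in reasoning.items() if k != "max_tokens"}
    # Unsupported tool types are silently filtered.
    if "tools" in out:
        out["tools"] = [t for t in out["tools"] if t.get("type") in SUPPORTED_TOOL_TYPES]
    return out
```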

3. Text Completions (Legacy)

Text Completions is a legacy API. Use Chat Completions for new implementations.
Request Parameters
| Parameter | Type | Required | Notes |
|---|---|---|---|
| model | string | ✅ | Model identifier |
| prompt | string/array | ✅ | Completion prompt(s) |
| max_tokens | int | - | Maximum output tokens |
| temperature | float | - | Sampling temperature |
| top_p | float | - | Nucleus sampling |
| stop | string/array | - | Stop sequences |
| user | string | - | Truncated to 64 chars |

  • Array prompts generate multiple completions. Finish reasons: stop or length. Streaming uses SSE format. stream_options: { include_usage: true } is set by default for streaming calls.
  • user field is truncated to 64 characters or set to nil if it exceeds the limit.

4. Embeddings

Request Parameters
| Parameter | Type | Required | Notes |
|---|---|---|---|
| model | string | ✅ | Model identifier |
| input | string/array | ✅ | Text(s) to embed (docs) |
| encoding_format | string | - | float or base64 |
| dimensions | int | - | Output embedding dimensions |
| user | string | - | NOT truncated (unlike chat/text) |

  • No streaming support. Returns embedding array with usage counts.

5. Speech (Text-to-Speech)

Request Parameters
| Parameter | Type | Required | Notes |
|---|---|---|---|
| model | string | ✅ | tts-1 or tts-1-hd |
| input | string | ✅ | Text to convert to speech |
| voice | string | ✅ | alloy, echo, fable, onyx, nova, shimmer |
| response_format | string | - | mp3, opus, aac, flac, wav, pcm |
| speed | float | - | 0.25 to 4.0 (default 1.0) |

  • Returns raw binary audio. Streaming supported in SSE format (base64 chunks), but not all models support streaming. stream_options: { include_usage: true } is set by default for streaming calls.

6. Transcriptions (Speech-to-Text)

Requests use multipart/form-data, not JSON.
Request Parameters
| Parameter | Type | Required | Notes |
|---|---|---|---|
| file | binary | ✅ | Audio file (multipart form-data) |
| model | string | ✅ | whisper-1 |
| language | string | - | ISO-639-1 language code |
| prompt | string | - | Optional prompt for context |
| temperature | float | - | Sampling temperature |
| response_format | string | - | json, text, srt, vtt, verbose_json |

  • Supported audio formats: mp3, mp4, mpeg, mpga, m4a, wav, webm
  • Response: Includes text, task, language, duration, and optionally word-level timing. Streaming supported in SSE format. stream_options: { include_usage: true } is set by default for streaming calls.

7. Image Generation

Request Parameters
| Parameter | Type | Required | Notes |
|---|---|---|---|
| model | string | ✅ | Model identifier (e.g., dall-e-3) |
| prompt | string | ✅ | Text description of the image to generate |
| n | int | - | Number of images to generate (1-10) |
| size | string | - | Image size: "256x256", "512x512", "1024x1024", "1792x1024", "1024x1792", "1536x1024", "1024x1536", "auto" |
| quality | string | - | Image quality: "auto", "high", "medium", "low", "hd", "standard" |
| style | string | - | Image style: "natural", "vivid" |
| response_format | string | - | Response format: "url" or "b64_json" |
| background | string | - | Background: "transparent", "opaque", "auto" |
| output_format | string | - | Output format: "png", "webp", "jpeg" |
| output_compression | int | - | Compression level (0-100%) |
| partial_images | int | - | Number of partial images (0-3) |
| moderation | string | - | Moderation level: "low", "auto" |
| user | string | - | User identifier |

Request Conversion

OpenAI is the baseline schema for image generation. Parameters are passed through with minimal conversion:
  • Model & Prompt: bifrostReq.Model → req.Model, bifrostReq.Prompt → req.Prompt
  • Parameters: All fields from bifrostReq (ImageGenerationParameters) are embedded directly into the OpenAI request struct via struct embedding. No field mapping or transformation is performed.
  • Streaming: When streaming is requested, stream: true is set in the request body.
Response Conversion
  • Non-streaming: OpenAI responses are unmarshaled directly into BifrostImageGenerationResponse since Bifrost’s response schema is a superset of OpenAI’s format. All fields are passed through as-is.
  • Streaming: OpenAI streaming responses use Server-Sent Events (SSE) format with event types:
    • image_generation.partial_image: Intermediate image chunks with b64_json data
    • image_generation.completed: Final chunk for each image with usage information
    • error: Error events
    Each chunk includes:
    • type: Event type
    • sequence_number: Sequence number of the chunk
    • partial_image_index: Image index (0-N) for partial images
    • b64_json: Base64-encoded image data (pointer, may be nil)
    • usage: Token usage (only in completed events)
    • created_at, size, quality, background, output_format: Additional metadata
    Bifrost converts these to BifrostImageGenerationStreamResponse chunks with:
    • Per-image chunkIndex tracking for proper ordering within each image
    • Index field indicating which image (0-N) the chunk belongs to
    • PartialImageIndex set only for partial images (not completed events)
    • Usage information attached to completed chunks
    • Latency tracking per chunk
Endpoint: /v1/images/generations
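The streaming events above arrive as standard SSE event/data line pairs. A minimal parser might look like this (an illustrative sketch; a production parser would also handle multi-line data fields, comments, and reconnection):

```python
import json

def iter_sse_events(raw):
    """Yield (event_type, data_dict) pairs from a raw SSE stream string."""
    for block in raw.strip().split("\n\n"):
        event, data = None, None
        for line in block.splitlines():
            if line.startswith("event: "):
                event = line[len("event: "):]
            elif line.startswith("data: "):
                data = json.loads(line[len("data: "):])
        if event is not None:
            yield event, data
```

For an image stream, this would yield image_generation.partial_image events carrying b64_json chunks, followed by image_generation.completed events carrying usage.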

8. Image Edit

Requests use multipart/form-data, not JSON.
Request Parameters
| Parameter | Type | Required | Notes |
|---|---|---|---|
| model | string | ✅ | Model identifier |
| prompt | string | ✅ | Text description of the edit |
| image[] | binary | ✅ | Image file(s) to edit (multipart form-data, supports multiple images) |
| mask | binary | - | Mask image file (multipart form-data) |
| n | int | - | Number of images to generate (1-10) |
| size | string | - | Image size: "256x256", "512x512", "1024x1024", "1536x1024", "1024x1536", "auto" |
| quality | string | - | Image quality: "auto", "high", "medium", "low", "standard" |
| response_format | string | - | Response format: "url" or "b64_json" |
| background | string | - | Background: "transparent", "opaque", "auto" |
| input_fidelity | string | - | Input fidelity: "low", "high" |
| partial_images | int | - | Number of partial images (0-3) |
| output_format | string | - | Output format: "png", "webp", "jpeg" |
| output_compression | int | - | Compression level (0-100%) |
| user | string | - | User identifier |
| stream | bool | - | Enable streaming response |

Request Conversion
  • Model & Input: bifrostReq.Model → req.Model, bifrostReq.Input.Images → req.Input.Images, bifrostReq.Input.Prompt → req.Input.Prompt
  • Parameters: All fields from bifrostReq.Params (ImageEditParameters) are embedded directly into the OpenAI request struct via struct embedding. No field mapping or transformation is performed.
  • Multipart Form Data: The request is serialized as multipart/form-data:
    • Model & Prompt: Written as form fields (model, prompt)
    • Images: Each image in Input.Images is written as a separate image[] field with proper MIME type detection (image/jpeg, image/webp, image/png) and Content-Type headers
    • Mask: If present, written as a mask field with MIME type detection and appropriate filename (mask.png, mask.jpg, mask.webp)
    • Optional Parameters: All optional parameters (n, size, quality, response_format, background, input_fidelity, partial_images, output_format, output_compression, user) are written as form fields
    • Integer Conversion: Integer fields (n, partial_images, output_compression) are converted to strings using strconv.Itoa
    • Streaming: When streaming is requested, stream: "true" is written as a form field
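The form-field serialization described above can be sketched as follows (illustrative Python, not Bifrost's Go implementation; the binary image[] and mask parts are omitted, and the model name used in the test is hypothetical):

```python
def image_edit_form_fields(params):
    """Flatten an image-edit request into (name, value) multipart form fields."""
    fields = [("model", params["model"]), ("prompt", params["prompt"])]
    # Integer parameters are written as strings (the Go code uses strconv.Itoa).
    for key in ("n", "partial_images", "output_compression"):
        if key in params:
            fields.append((key, str(params[key])))
    # String parameters are written as-is.
    for key in ("size", "quality", "response_format", "background",
                "input_fidelity", "output_format", "user"):
        if key in params:
            fields.append((key, params[key]))
    # Streaming is requested via a literal "true" form field.
    if params.get("stream"):
        fields.append(("stream", "true"))
    return fields
```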
Response Conversion
  • Non-streaming: OpenAI responses are unmarshaled directly into BifrostImageGenerationResponse since Bifrost’s response schema is a superset of OpenAI’s format. All fields are passed through as-is.
  • Streaming: OpenAI streaming responses use Server-Sent Events (SSE) format with event types:
    • image_edit.partial_image: Intermediate image chunks with b64_json data
    • image_edit.completed: Final chunk for each image with usage information
    • error: Error events
    Each chunk includes:
    • type: Event type (image_edit.partial_image or image_edit.completed)
    • sequence_number: Sequence number of the chunk
    • partial_image_index: Image index (0-N) for partial images
    • b64_json: Base64-encoded image data (pointer, may be nil)
    • usage: Token usage (only in completed events)
    Bifrost converts these to BifrostImageGenerationStreamResponse chunks with:
    • Per-image chunkIndex tracking for proper ordering within each image
    • Index field indicating which image (0-N) the chunk belongs to
    • PartialImageIndex set only for partial images (not completed events)
    • Usage information attached to completed chunks
    • Latency tracking per chunk
    • Robust handling of interleaved chunks using incomplete image tracking
Endpoint: /v1/images/edits

9. Image Variation

Requests use multipart/form-data, not JSON.
Request Parameters
| Parameter | Type | Required | Notes |
|---|---|---|---|
| model | string | ✅ | Model identifier |
| image | binary | ✅ | Image file to create variations from (multipart form-data) |
| n | int | - | Number of images to generate (1-10) |
| size | string | - | Image size: "256x256", "512x512", "1024x1024", "1792x1024", "1024x1792", "1536x1024", "1024x1536", "auto" |
| response_format | string | - | Response format: "url" or "b64_json" |
| user | string | - | User identifier |

Request Conversion
  • Model & Input: bifrostReq.Model → req.Model, bifrostReq.Input.Image.Image → req.Input.Image.Image
  • Parameters: All fields from bifrostReq.Params (ImageVariationParameters) are embedded directly into the OpenAI request struct via struct embedding. No field mapping or transformation is performed.
  • Multipart Form Data: The request is serialized as multipart/form-data:
    • Model: Written as form field (model)
    • Image: The image is written as an image field with proper MIME type detection (image/jpeg, image/webp, image/png) and Content-Type headers. If MIME type cannot be detected, defaults to image/png
    • Optional Parameters: All optional parameters (n, size, response_format, user) are written as form fields
    • Integer Conversion: Integer field (n) is converted to string using strconv.Itoa
  • Multiple Images: Additional images beyond the first one (if present in ExtraParams["images"]) are stored in ExtraParams but only the first image is sent to OpenAI (OpenAI API only supports single image input)
Response Conversion
  • Non-streaming: OpenAI responses are unmarshaled directly into BifrostImageVariationResponse (which is a type alias for BifrostImageGenerationResponse). All fields are passed through as-is.
  • Streaming: Not supported for image variation requests.
Endpoint: /v1/images/variations

10. Files API

Upload

Request Parameters
| Parameter | Type | Required | Notes |
|---|---|---|---|
| file | binary | ✅ | File to upload (multipart form-data) |
| purpose | string | ✅ | batch, fine-tune, or assistants |
| filename | string | - | Custom filename (defaults to file.jsonl) |
Response: FileObject with id, bytes, created_at, filename, purpose, status (docs)

List Files

Query Parameters
| Parameter | Type | Required | Notes |
|---|---|---|---|
| purpose | string | - | Filter by purpose |
| limit | int | - | Results per page |
| after | string | - | Pagination cursor |
| order | string | - | asc or desc |
Cursor-based pagination with has_more flag.
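The cursor-based pagination can be walked with a small loop like this (an illustrative sketch; fetch_page is a hypothetical stand-in for the HTTP call to GET /v1/files):

```python
def list_all_files(fetch_page, purpose=None, limit=100):
    """Collect every file by following the cursor until has_more is false.

    fetch_page(purpose=..., limit=..., after=...) must return the parsed
    JSON page, e.g. {"data": [...], "has_more": bool}.
    """
    files, after = [], None
    while True:
        page = fetch_page(purpose=purpose, limit=limit, after=after)
        files.extend(page["data"])
        if not page.get("has_more"):
            return files
        # The cursor for the next page is the last file ID of this page.
        after = page["data"][-1]["id"]
```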

Retrieve / Delete / Content

Operations:
  • GET /v1/files/{file_id} - Retrieve file metadata
  • DELETE /v1/files/{file_id} - Delete file
  • GET /v1/files/{file_id}/content - Download file content

11. Batch API

Create Batch

Request Parameters
| Parameter | Type | Required | Notes |
|---|---|---|---|
| input_file_id | string | Conditional | File ID OR requests array (not both) |
| requests | array | Conditional | BatchRequestItem objects (converted to JSONL) |
| endpoint | string | ✅ | Target endpoint (e.g., /v1/chat/completions) |
| completion_window | string | - | 24h (default) |
| metadata | object | - | Custom metadata |
Response: BifrostBatchCreateResponse with id, endpoint, input_file_id, status, created_at, request_counts (docs).

Statuses: BatchStatus - validating, failed, in_progress, finalizing, completed, expired, cancelling, cancelled

List Batches

Query Parameters
| Parameter | Type | Required | Notes |
|---|---|---|---|
| limit | int | - | Results per page |
| after | string | - | Pagination cursor |

Retrieve / Cancel Batch

Operations:
  • GET /v1/batches/{batch_id} - Retrieve batch metadata and status
  • POST /v1/batches/{batch_id}/cancel - Cancel an in-progress batch

Get Results

  1. Batch must be completed (has output_file_id)
  2. Download output file via Files API
  3. Parse JSONL - each BatchResultItem: {id, custom_id, response: {status_code, body}}
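The downloaded output file is JSONL, one result per line; parsing it is straightforward (an illustrative sketch using the BatchResultItem shape described above):

```python
import json

def parse_batch_results(jsonl_text):
    """Parse a batch output file: one BatchResultItem JSON object per line."""
    items = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        item = json.loads(line)
        items.append({
            "id": item["id"],
            "custom_id": item["custom_id"],
            "status_code": item["response"]["status_code"],
            "body": item["response"]["body"],
        })
    return items
```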

12. List Models

GET /v1/models - Lists available models with metadata. Model IDs in Bifrost responses are prefixed with openai/ (e.g., openai/gpt-4o). Results are aggregated from all configured API keys. No request body or parameters required.

13. Video Generation

Generate (POST /v1/videos)

Request Parameters
| Parameter | Type | Required | Notes |
|---|---|---|---|
| model | string | ✅ | e.g., sora-2 |
| prompt | string | ✅ | Text description of the video |
| input_reference | string | - | Input image for image-to-video. Must be a base64 data URL (e.g., data:image/png;base64,...). Plain URLs are not accepted. |
| seconds | string | - | Duration in seconds |
| size | string | - | Resolution: 720x1280 (default), 1280x720, 1024x1792, 1792x1024 |
Response: BifrostVideoGenerationResponse - id, status, model, prompt, created_at

Job Statuses: queued → in_progress → completed / failed

Retrieve / Download / Delete / List / Remix

| Operation | Endpoint | Notes |
|---|---|---|
| Get status | GET /v1/videos/{id} | Poll until status: completed |
| Download | GET /v1/videos/{id}/content | Returns raw video bytes |
| Delete | DELETE /v1/videos/{id} | Removes video job |
| List jobs | GET /v1/videos | Query params: after, limit, order |
| Remix | POST /v1/videos/{id}/remix | Body: {"prompt": "..."} |
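A typical client polls the status endpoint until the job reaches a terminal state. A minimal sketch (get_status is a hypothetical stand-in for the GET /v1/videos/{id} call; a real client would also sleep between polls and handle API errors):

```python
TERMINAL_STATUSES = {"completed", "failed"}

def poll_video_job(get_status, max_polls=60):
    """Poll a video job until it leaves queued/in_progress, or give up."""
    for _ in range(max_polls):
        job = get_status()
        if job["status"] in TERMINAL_STATUSES:
            return job
    raise TimeoutError("video job did not reach a terminal status")
```

Once the returned job has status "completed", the video bytes can be fetched from GET /v1/videos/{id}/content.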

Common Error Codes

HTTP Status → Error Type mapping:
  • 400 - invalid_request_error
  • 401 - authentication_error
  • 403 - permission_error
  • 404 - not_found_error
  • 429 - rate_limit_error
  • 500 - api_error
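The mapping above as a lookup table (the fallback for unlisted status codes is an assumption, not documented here):

```python
ERROR_TYPES = {
    400: "invalid_request_error",
    401: "authentication_error",
    403: "permission_error",
    404: "not_found_error",
    429: "rate_limit_error",
    500: "api_error",
}

def error_type_for(status):
    """Map an HTTP status code to its error type; unknown codes fall back to api_error."""
    return ERROR_TYPES.get(status, "api_error")
```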