Overview

OpenAI is the baseline schema for Bifrost. When using OpenAI directly, parameters are passed through with minimal conversion - mostly validation and filtering of OpenAI-specific features.

Supported Operations

| Operation | Non-Streaming | Streaming | Endpoint |
|---|---|---|---|
| Chat Completions | ✅ | ✅ | /v1/chat/completions |
| Responses API | ✅ | ✅ | /v1/responses |
| Text Completions | ✅ | ✅ | /v1/completions |
| Embeddings | ✅ | - | /v1/embeddings |
| Speech (TTS) | ✅ | ✅ | /v1/audio/speech |
| Transcriptions (STT) | ✅ | ✅ | /v1/audio/transcriptions |
| Files | ✅ | - | /v1/files |
| Batch | ✅ | - | /v1/batches |
| List Models | ✅ | - | /v1/models |

1. Chat Completions

Request Parameters
| Parameter | Type | Required | Notes |
|---|---|---|---|
| model | string | ✅ | Model identifier |
| messages | array | ✅ | ChatMessage array with roles (docs) |
| temperature | float | | Sampling temperature (0-2) |
| top_p | float | | Nucleus sampling parameter |
| stop | string/array | | Stop sequences |
| max_completion_tokens | int | | Maximum output tokens (minimum 16 enforced) |
| frequency_penalty | float | | Frequency penalty (-2 to 2) |
| presence_penalty | float | | Presence penalty (-2 to 2) |
| logit_bias | object | | Token logit adjustments |
| logprobs | bool | | Include log probabilities |
| top_logprobs | int | | Number of log probabilities per token |
| seed | int | | Reproducibility seed |
| response_format | object | | Output format (docs) |
| tools | array | | Tool objects (docs) |
| tool_choice | string/object | | "auto", "none", "required", or specific tool |
| parallel_tool_calls | bool | | Allow multiple simultaneous tool calls |
| stream_options | object | | Streaming options (docs) |
| reasoning | object | | Reasoning parameters (Bifrost docs, OpenAI docs) |
| user | string | | Truncated to 64 chars |
| metadata | object | | Custom metadata |
| store | bool | | Filtered for non-OpenAI routing |
| service_tier | string | | Filtered for non-OpenAI routing |
| prompt_cache_key | string | | Filtered for non-OpenAI routing |
| prediction | object | | Predicted output for acceleration |
| audio | object | | Audio output config |
| modalities | array | | Response modalities (text, audio) |

  • Reasoning: OpenAI supports reasoning.effort (minimal, low, medium, high) and reasoning.max_tokens - both passed through directly. When routing to other providers, "minimal" effort is converted to "low" for compatibility. See Bifrost reasoning docs.
  • Messages: All message roles are supported: system, user, assistant, tool, developer (treated as system). Content types: text, images via URL (image_url), audio input (input_audio). Tool messages include a tool_call_id.
  • Tools: Standard OpenAI tool format with strict mode support. Tool choice: "auto", "none", "required", or specific tool by name.
  • Responses: Passed through in standard OpenAI format. Finish reasons: stop, length, tool_calls, content_filter. Usage includes token counts and optionally cached/reasoning token details.
  • Streaming: Server-Sent Events format with delta.content, delta.tool_calls, finish_reason, and usage (final chunk only, automatically included by Bifrost). stream_options: { include_usage: true } is set by default for all streaming calls.
  • Cache Control: cache_control fields are stripped from messages, their content blocks, and tools before sending.
  • Token Enforcement: max_completion_tokens is enforced to have a minimum of 16. Values below 16 are automatically set to 16.
  • Special handling: the user field is truncated to 64 characters; prompt_cache_key, store, and service_tier are filtered when routing to non-OpenAI providers.
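The enforcement rules above (token minimum, user truncation, cache_control stripping) can be sketched as a small pre-flight normalizer. This is an illustrative helper, not Bifrost's actual code; the function name and structure are assumptions:

```python
def normalize_chat_params(params: dict) -> dict:
    """Sketch of Bifrost's chat-completion parameter enforcement."""
    out = dict(params)

    # max_completion_tokens is enforced to a minimum of 16.
    if out.get("max_completion_tokens") is not None and out["max_completion_tokens"] < 16:
        out["max_completion_tokens"] = 16

    # The user field is truncated to 64 characters.
    if isinstance(out.get("user"), str):
        out["user"] = out["user"][:64]

    # cache_control fields are stripped from messages and their content blocks.
    for msg in out.get("messages", []):
        msg.pop("cache_control", None)
        content = msg.get("content")
        if isinstance(content, list):
            for block in content:
                block.pop("cache_control", None)
    return out
```

A payload run through this normalizer matches what the bullets above describe being sent upstream.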

2. Responses API

The Responses API is OpenAI’s structured output API.

Request Parameters

| Parameter | Type | Required | Notes |
|---|---|---|---|
| model | string | ✅ | Model identifier |
| input | string/array | ✅ | Text or ContentBlock array (docs) |
| max_output_tokens | int | | Maximum output length |
| background | bool | | Run request in background mode |
| conversation | string | | Conversation ID for continuing a conversation |
| include | array | | Fields to include in the response (e.g., "web_search_call.action.sources") |
| instructions | string | | System instructions |
| max_tool_calls | int | | Maximum number of tool calls |
| metadata | object | | Custom metadata |
| parallel_tool_calls | bool | | Allow multiple simultaneous tool calls |
| previous_response_id | string | | ID of previous response to continue from |
| prompt_cache_key | string | | Prompt caching key |
| reasoning | object | | ResponsesParametersReasoning configuration (Bifrost docs) |
| safety_identifier | string | | Safety identifier for content filtering |
| service_tier | string | | Service tier for the request |
| stream_options | object | | ResponsesStreamOptions configuration |
| store | bool | | Store the response for later retrieval |
| temperature | float | | Sampling temperature |
| text | object | | ResponsesTextConfig for output formatting |
| top_logprobs | int | | Number of log probabilities to return per token |
| top_p | float | | Nucleus sampling parameter |
| tool_choice | string/object | | ResponsesToolChoice strategy |
| tools | array | | ResponsesTool objects (docs) |
| truncation | string | | Truncation strategy (auto or off) |
| user | string | | Truncated to 64 chars |

Special Message Handling (gpt-oss vs other models): OpenAI models handle reasoning differently depending on the model family:
  • Non-gpt-oss models (GPT-4o, o1, etc.): Send reasoning as summaries. Reasoning-only messages (with no summary and only content blocks) are filtered out since these models don’t support reasoning content blocks in the request format.
  • gpt-oss models: Send reasoning as content blocks. Reasoning summaries in the request are converted to content blocks since gpt-oss expects reasoning as structured blocks, not summaries.
This conversion ensures compatibility across different model architectures for the structured Responses API. See Bifrost reasoning docs for detailed reasoning handling.

Token & Parameter Enforcement:
  • max_output_tokens is enforced to have a minimum of 16. Values below 16 are automatically set to 16.
  • The reasoning.max_tokens field is automatically removed from the JSON payload (the OpenAI Responses API doesn’t accept it).
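The model-family reasoning handling described above can be sketched as follows. The item shapes (summary entries, reasoning_text content blocks) and the function itself are illustrative assumptions, not Bifrost's internal representation:

```python
def prepare_reasoning_items(items: list, model: str) -> list:
    """Sketch: route reasoning as summaries or content blocks by model family."""
    is_gpt_oss = "gpt-oss" in model
    out = []
    for item in items:
        if item.get("type") != "reasoning":
            out.append(item)
            continue
        item = dict(item)
        if is_gpt_oss:
            # gpt-oss expects reasoning as content blocks: convert summaries.
            summaries = item.pop("summary", None) or []
            blocks = list(item.get("content") or [])
            blocks += [{"type": "reasoning_text", "text": s["text"]} for s in summaries]
            item["content"] = blocks
            out.append(item)
        elif item.get("summary"):
            # Non-gpt-oss: keep summaries, drop unsupported content blocks.
            item.pop("content", None)
            out.append(item)
        # Reasoning-only items without a summary are filtered out here.
    return out
```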
Other conversions:
  • Action types zoom and region are converted to screenshot
  • cache_control fields are stripped from messages and tools
  • Unsupported tool types are silently filtered (only these are supported: function, file_search, computer_use_preview, web_search, mcp, code_interpreter, image_generation, local_shell, custom, web_search_preview)
Response: Includes id, status (completed, incomplete, pending, error), an output array with message content, and token usage.

Streaming: Server-Sent Events with event types: response.created, response.in_progress, response.output_item.added, response.content_part.added, response.output_text.delta, response.function_call_arguments.delta, response.completed, response.incomplete. stream_options: { include_usage: true } is set by default for all streaming calls.
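A minimal consumer of the streaming events above might collect the text deltas like this (a sketch against the event names listed, assuming the standard `event:`/`data:` SSE framing):

```python
import json

def collect_output_text(payload: str) -> str:
    """Sketch: join response.output_text.delta events from a Responses SSE stream."""
    text = []
    for block in payload.strip().split("\n\n"):
        event_type, data = None, None
        for line in block.splitlines():
            if line.startswith("event: "):
                event_type = line[len("event: "):]
            elif line.startswith("data: "):
                data = json.loads(line[len("data: "):])
        if event_type == "response.output_text.delta" and data:
            text.append(data["delta"])
    return "".join(text)
```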

3. Text Completions (Legacy)

Text Completions is a legacy API. Use Chat Completions for new implementations.
Request Parameters
| Parameter | Type | Required | Notes |
|---|---|---|---|
| model | string | ✅ | Model identifier |
| prompt | string/array | ✅ | Completion prompt(s) |
| max_tokens | int | | Maximum output tokens |
| temperature | float | | Sampling temperature |
| top_p | float | | Nucleus sampling |
| stop | string/array | | Stop sequences |
| user | string | | Truncated to 64 chars |

  • Array prompts generate multiple completions. Finish reasons: stop or length. Streaming uses SSE format. stream_options: { include_usage: true } is set by default for streaming calls.
  • user field is truncated to 64 characters or set to nil if it exceeds the limit.
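The user-field rule above is worded ambiguously; one reading is that this legacy endpoint drops (rather than truncates) over-long values. A sketch under that assumption:

```python
def sanitize_user(user):
    """Sketch of one reading of the legacy rule: drop user IDs over 64 chars."""
    if user is not None and len(user) > 64:
        return None  # assumed: set to nil, unlike Chat Completions' truncation
    return user
```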

4. Embeddings

Request Parameters
| Parameter | Type | Required | Notes |
|---|---|---|---|
| model | string | ✅ | Model identifier |
| input | string/array | ✅ | Text(s) to embed (docs) |
| encoding_format | string | | float or base64 |
| dimensions | int | | Output embedding dimensions |
| user | string | | NOT truncated (unlike chat/text) |

  • No streaming support. Returns embedding array with usage counts.
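With encoding_format set to base64, each embedding arrives as a base64 string of little-endian float32 values. Decoding it is straightforward (helper name is illustrative):

```python
import base64
import struct

def decode_embedding(b64: str) -> list:
    """Decode a base64-encoded embedding into a list of float32 values."""
    raw = base64.b64decode(b64)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))
```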

5. Speech (Text-to-Speech)

Request Parameters
| Parameter | Type | Required | Notes |
|---|---|---|---|
| model | string | ✅ | tts-1 or tts-1-hd |
| input | string | ✅ | Text to convert to speech |
| voice | string | ✅ | alloy, echo, fable, onyx, nova, shimmer |
| response_format | string | | mp3, opus, aac, flac, wav, pcm |
| speed | float | | 0.25 to 4.0 (default 1.0) |

  • Returns raw binary audio. Streaming supported in SSE format (base64 chunks), but not all models support streaming. stream_options: { include_usage: true } is set by default for streaming calls.
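When streaming, reassembling the audio means base64-decoding each chunk and concatenating, as sketched below (this assumes the base64 strings have already been pulled out of the SSE events):

```python
import base64

def assemble_audio(chunks: list) -> bytes:
    """Sketch: concatenate base64 audio chunks from a TTS SSE stream."""
    return b"".join(base64.b64decode(c) for c in chunks)
```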

6. Transcriptions (Speech-to-Text)

Requests use multipart/form-data, not JSON.
Request Parameters
| Parameter | Type | Required | Notes |
|---|---|---|---|
| file | binary | ✅ | Audio file (multipart form-data) |
| model | string | ✅ | whisper-1 |
| language | string | | ISO-639-1 language code |
| prompt | string | | Optional prompt for context |
| temperature | float | | Sampling temperature |
| response_format | string | | json, text, srt, vtt, verbose_json |

  • Supported audio formats: mp3, mp4, mpeg, mpga, m4a, wav, webm
  • Response: Includes text, task, language, duration, and optionally word-level timing. Streaming supported in SSE format. stream_options: { include_usage: true } is set by default for streaming calls.
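Because this endpoint takes multipart/form-data rather than JSON, the request body must be assembled with a boundary. A hand-rolled sketch (most HTTP clients do this for you; the helper is illustrative):

```python
import io
import uuid

def build_multipart(file_bytes: bytes, filename: str, fields: dict):
    """Sketch: build a multipart/form-data body for a transcription request."""
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    for name, value in fields.items():
        buf.write((
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'
        ).encode())
    buf.write((
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'
    ).encode())
    buf.write(file_bytes)
    buf.write(f'\r\n--{boundary}--\r\n'.encode())
    return boundary, buf.getvalue()
```

The returned boundary goes into the Content-Type header as `multipart/form-data; boundary=...`.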

7. Files API

Upload

Request Parameters
| Parameter | Type | Required | Notes |
|---|---|---|---|
| file | binary | ✅ | File to upload (multipart form-data) |
| purpose | string | ✅ | batch, fine-tune, or assistants |
| filename | string | | Custom filename (defaults to file.jsonl) |

Response: FileObject with id, bytes, created_at, filename, purpose, status (docs)

List Files

Query Parameters
| Parameter | Type | Required | Notes |
|---|---|---|---|
| purpose | string | | Filter by purpose |
| limit | int | | Results per page |
| after | string | | Pagination cursor |
| order | string | | asc or desc |

Cursor-based pagination with has_more flag.
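The cursor pagination described above can be walked with a loop like the following. `fetch_page` stands in for whatever client call hits GET /v1/files; the helper and its ID-based cursor are assumptions based on the after/has_more semantics:

```python
def list_all_files(fetch_page):
    """Sketch: walk cursor-based pagination using the has_more flag."""
    files, after = [], None
    while True:
        page = fetch_page(after=after)
        files.extend(page["data"])
        if not page.get("has_more"):
            return files
        # Assumed: the cursor for the next page is the last item's id.
        after = page["data"][-1]["id"]
```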

Retrieve / Delete / Content

Operations:
  • GET /v1/files/{file_id} - Retrieve file metadata
  • DELETE /v1/files/{file_id} - Delete file
  • GET /v1/files/{file_id}/content - Download file content

8. Batch API

Create Batch

Request Parameters
| Parameter | Type | Required | Notes |
|---|---|---|---|
| input_file_id | string | Conditional | File ID OR requests array (not both) |
| requests | array | Conditional | BatchRequestItem objects (converted to JSONL) |
| endpoint | string | ✅ | Target endpoint (e.g., /v1/chat/completions) |
| completion_window | string | | 24h (default) |
| metadata | object | | Custom metadata |

Response: BifrostBatchCreateResponse with id, endpoint, input_file_id, status, created_at, request_counts (docs). Status values (BatchStatus): validating, failed, in_progress, finalizing, completed, expired, cancelling, cancelled.
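When supplying a requests array, each item ends up as one JSONL line in the standard batch input format. A sketch of that conversion (the req-{i} custom_id scheme is hypothetical):

```python
import json

def to_batch_jsonl(requests: list, endpoint: str) -> str:
    """Sketch: serialize batch request bodies into the JSONL input format."""
    lines = []
    for i, body in enumerate(requests):
        lines.append(json.dumps({
            "custom_id": f"req-{i}",  # hypothetical ID scheme for illustration
            "method": "POST",
            "url": endpoint,
            "body": body,
        }))
    return "\n".join(lines)
```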

List Batches

Query Parameters
| Parameter | Type | Required | Notes |
|---|---|---|---|
| limit | int | | Results per page |
| after | string | | Pagination cursor |

Retrieve / Cancel Batch

Operations:
  • GET /v1/batches/{batch_id} - Retrieve batch metadata
  • POST /v1/batches/{batch_id}/cancel - Cancel batch

Get Results

  1. Batch must be completed (has output_file_id)
  2. Download output file via Files API
  3. Parse JSONL - each BatchResultItem: {id, custom_id, response: {status_code, body}}
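Step 3 above can be sketched as a small parser that indexes result bodies by custom_id (the helper name is illustrative):

```python
import json

def parse_batch_results(jsonl: str) -> dict:
    """Sketch: index batch result bodies by custom_id."""
    results = {}
    for line in jsonl.splitlines():
        if not line.strip():
            continue
        item = json.loads(line)
        results[item["custom_id"]] = item["response"]["body"]
    return results
```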

9. List Models

GET /v1/models - Lists available models with metadata. Model IDs in Bifrost responses are prefixed with openai/ (e.g., openai/gpt-4o). Results are aggregated from all configured API keys. No request body or parameters required.

Common Error Codes

HTTP Status → Error Type mapping:
  • 400 - invalid_request_error
  • 401 - authentication_error
  • 403 - permission_error
  • 404 - not_found_error
  • 429 - rate_limit_error
  • 500 - api_error
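The mapping above is a plain lookup; a sketch, with an assumed fallback to api_error for unlisted statuses:

```python
ERROR_TYPES = {
    400: "invalid_request_error",
    401: "authentication_error",
    403: "permission_error",
    404: "not_found_error",
    429: "rate_limit_error",
    500: "api_error",
}

def error_type(status: int) -> str:
    """Map an HTTP status to its error type (fallback to api_error is assumed)."""
    return ERROR_TYPES.get(status, "api_error")
```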