Overview
Google Gemini’s API has a different structure from OpenAI’s. Bifrost performs extensive conversion, including:
- Role remapping - “assistant” → “model”, system messages integrated into the main flow
- Message grouping - Consecutive tool responses merged into single user message
- Parameter renaming - e.g., `max_completion_tokens` → `maxOutputTokens`, `stop` → `stopSequences`
- Function call handling - Tool call ID preservation and thought signature support
- Content modality - Support for text, images, video, code execution, and thought content
- Thinking/Reasoning - Thinking configuration mapped to Bifrost reasoning structure
Supported Operations
| Operation | Non-Streaming | Streaming | Endpoint |
|---|---|---|---|
| Chat Completions | ✅ | ✅ | /v1beta/models/{model}:generateContent |
| Responses API | ✅ | ✅ | /v1beta/models/{model}:generateContent |
| Speech (TTS) | ✅ | ✅ | /v1beta/models/{model}:generateContent |
| Transcriptions (STT) | ✅ | ✅ | /v1beta/models/{model}:generateContent |
| Embeddings | ✅ | - | /v1beta/models/{model}:embedContent |
| Files | ✅ | - | /upload/storage/v1beta/files |
| Batch | ✅ | - | /v1beta/batchJobs |
| List Models | ✅ | - | /v1beta/models |
1. Chat Completions
Request Parameters
Parameter Mapping
| Parameter | Transformation |
|---|---|
| `max_completion_tokens` | Renamed to `maxOutputTokens` |
| `temperature`, `top_p` | Direct pass-through |
| `stop` | Renamed to `stopSequences` |
| `response_format` | Converted to `responseMimeType` and `responseSchema` |
| `tools` | Schema restructured (see Tool Conversion) |
| `tool_choice` | Mapped to `functionCallingConfig` (see Tool Conversion) |
| `reasoning` | Mapped to `thinkingConfig` (see Reasoning / Thinking) |
| `top_k` | Via `extra_params` (Gemini-specific) |
| `presence_penalty`, `frequency_penalty` | Via `extra_params` |
| `seed` | Via `extra_params` |
Dropped Parameters
The following parameters are silently ignored: `logit_bias`, `logprobs`, `top_logprobs`, `parallel_tool_calls`, `service_tier`
Extra Parameters
Use `extra_params` (SDK) or pass fields directly in the request body (Gateway) for Gemini-specific fields:
- Gateway
- Go SDK
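As a sketch of the Gateway path, a request body can carry Gemini-specific fields next to the standard OpenAI-style parameters (the model name and values below are illustrative, and the helper is ours, not part of Bifrost):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// requestBody sketches a Gateway chat request body in which
// Gemini-specific fields (top_k, seed) sit alongside standard params.
func requestBody(prompt string) map[string]any {
	return map[string]any{
		"model": "gemini/gemini-2.5-flash", // illustrative model name
		"messages": []map[string]any{
			{"role": "user", "content": prompt},
		},
		"max_completion_tokens": 256,
		// Gemini-specific fields pass through the request body directly:
		"top_k": 40,
		"seed":  7,
	}
}

func main() {
	b, err := json.Marshal(requestBody("Hello"))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```

Bifrost renames the standard fields as described above and forwards the Gemini-specific ones untouched.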
Reasoning / Thinking
Documentation: See Bifrost Reasoning Reference
Parameter Mapping
- `reasoning.effort` → `thinkingConfig.thinkingLevel` (“low” → `LOW`, “high” → `HIGH`)
- `reasoning.max_tokens` → `thinkingConfig.thinkingBudget` (token budget for thinking)
- The `reasoning` parameter triggers `thinkingConfig.includeThoughts = true`
Supported Thinking Levels
- `"low"` / `"minimal"` → `LOW`
- `"medium"` / `"high"` → `HIGH`
- `null` or unspecified → based on `max_tokens`: -1 (dynamic), 0 (disabled), or a specific budget
Example
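A minimal sketch of the effort-to-level mapping described above (the helper name is ours, not Bifrost’s):

```go
package main

import "fmt"

// thinkingLevel sketches how a reasoning effort string maps to a
// Gemini thinking level. Unrecognized or empty efforts report ok=false,
// in which case the level is decided from max_tokens instead:
// -1 (dynamic), 0 (disabled), or a specific budget.
func thinkingLevel(effort string) (level string, ok bool) {
	switch effort {
	case "low", "minimal":
		return "LOW", true
	case "medium", "high":
		return "HIGH", true
	default:
		return "", false
	}
}

func main() {
	for _, e := range []string{"low", "medium", ""} {
		lvl, ok := thinkingLevel(e)
		fmt.Printf("%q -> %q (%v)\n", e, lvl, ok)
	}
}
```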
Message Conversion
Critical Caveats
- Role remapping: “assistant” → “model”, “system” → part of user/model content flow
- Consecutive tool responses: Tool response messages merged into single user message with function response parts
- Content flattening: Multi-part content in single message preserved as parts array
Image Conversion
- URL images: `{type: "image_url", image_url: {url: "..."}}` → `{type: "image", source: {type: "url", url: "..."}}`
- Base64 images: Data URL → `{type: "image", source: {type: "base64", media_type: "image/png", ...}}`
- Video content: Preserved with metadata (fps, start/end offset)
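Splitting a data URL into the `media_type` and base64 payload the target shape needs can be sketched as follows (the helper name is ours; Bifrost’s internal implementation may differ):

```go
package main

import (
	"fmt"
	"strings"
)

// parseDataURL splits a base64 data URL ("data:image/png;base64,....")
// into the media type and raw base64 payload needed by a
// {type: "base64"} image source.
func parseDataURL(u string) (mediaType, data string, err error) {
	if !strings.HasPrefix(u, "data:") {
		return "", "", fmt.Errorf("not a data URL")
	}
	meta, payload, ok := strings.Cut(strings.TrimPrefix(u, "data:"), ",")
	if !ok {
		return "", "", fmt.Errorf("missing payload separator")
	}
	mediaType = strings.TrimSuffix(meta, ";base64")
	return mediaType, payload, nil
}

func main() {
	mt, data, _ := parseDataURL("data:image/png;base64,iVBORw0KGgo=")
	fmt.Println(mt, len(data))
}
```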
Tool Conversion
Tool definitions are restructured with these mappings:
- `function.name` → `functionDeclarations.name` (preserved)
- `function.parameters` → `functionDeclarations.parameters` (Schema format)
- `function.description` → `functionDeclarations.description`
- `function.strict` → Dropped (not supported by Gemini)
Tool Choice Mapping
| OpenAI | Gemini |
|---|---|
| `"auto"` | `AUTO` (default) |
| `"none"` | `NONE` |
| `"required"` | `ANY` |
| Specific tool | `ANY` with `allowedFunctionNames` |
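The table above can be sketched as a switch (the type and helper names are ours, mirroring the shape of Gemini’s `functionCallingConfig`):

```go
package main

import "fmt"

// FunctionCallingConfig mirrors the shape of Gemini's functionCallingConfig.
type FunctionCallingConfig struct {
	Mode                 string
	AllowedFunctionNames []string
}

// mapToolChoice converts an OpenAI-style tool_choice into a Gemini
// functionCallingConfig. Any non-keyword value is treated as a
// specific function name and forced via allowedFunctionNames.
func mapToolChoice(choice string) FunctionCallingConfig {
	switch choice {
	case "", "auto":
		return FunctionCallingConfig{Mode: "AUTO"}
	case "none":
		return FunctionCallingConfig{Mode: "NONE"}
	case "required":
		return FunctionCallingConfig{Mode: "ANY"}
	default:
		return FunctionCallingConfig{Mode: "ANY", AllowedFunctionNames: []string{choice}}
	}
}

func main() {
	fmt.Printf("%+v\n", mapToolChoice("get_weather"))
}
```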
Response Conversion
Field Mapping
- `finishReason` → `finish_reason`:
  - `STOP` → `stop`
  - `MAX_TOKENS` → `length`
  - `SAFETY`, `RECITATION`, `LANGUAGE`, `BLOCKLIST`, `PROHIBITED_CONTENT`, `SPII`, `IMAGE_SAFETY` → `content_filter`
  - `MALFORMED_FUNCTION_CALL`, `UNEXPECTED_TOOL_CALL` → `tool_calls`
- `candidates[0].content.parts[0].text` → `choices[0].message.content` (if single text block)
- `candidates[0].content.parts[].functionCall` → `choices[0].message.tool_calls`
- `promptTokenCount` → `usage.prompt_tokens`
- `candidatesTokenCount` → `usage.completion_tokens`
- `totalTokenCount` → `usage.total_tokens`
- `cachedContentTokenCount` → `usage.prompt_tokens_details.cached_tokens`
- `thoughtsTokenCount` → `usage.completion_tokens_details.reasoning_tokens`
- Thought content (from `text` parts with `thought: true`) → `reasoning` field in stream deltas
- Function call `args` (map) → JSON string `arguments`
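The last mapping above amounts to serializing the structured `args` object into the JSON string that the OpenAI shape expects in `tool_calls[].function.arguments` (the helper name is ours):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// argsToString converts Gemini's structured function-call args into
// the JSON string form used by OpenAI-style tool_calls. Consumers must
// json.Unmarshal the string again to access individual arguments.
func argsToString(args map[string]any) (string, error) {
	b, err := json.Marshal(args)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	s, _ := argsToString(map[string]any{"city": "Paris"})
	fmt.Println(s)
}
```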
Streaming
Event structure:
- Streaming responses contain deltas in `delta.content` (text), `delta.reasoning` (thoughts), and `delta.toolCalls` (function calls)
- Function responses appear as text content in the delta
- `finish_reason` only set on the final chunk
- Usage metadata only included in the final chunk
2. Responses API
The Responses API uses the same underlying `/generateContent` endpoint but converts between OpenAI’s Responses format and Gemini’s native format.
Request Parameters
Parameter Mapping
| Parameter | Transformation |
|---|---|
| `max_output_tokens` | Renamed to `maxOutputTokens` |
| `temperature`, `top_p` | Direct pass-through |
| `instructions` | Converted to system instruction text |
| `input` (string or array) | Converted to messages |
| `tools` | Schema restructured (see Chat Completions) |
| `tool_choice` | Type mapped (see Chat Completions) |
| `reasoning` | Mapped to `thinkingConfig` (see Reasoning / Thinking) |
| `text` | Maps to `responseMimeType` and `responseSchema` |
| `stop` | Via `extra_params`, renamed to `stopSequences` |
| `top_k` | Via `extra_params` |
Extra Parameters
Use `extra_params` (SDK) or pass fields directly in the request body (Gateway):
- Gateway
- Go SDK
Input & Instructions
- Input: String wrapped as user message or array converted to messages
- Instructions: Becomes system instruction (single text block)
Tool Support
Supported types: `function`, `computer_use_preview`, `web_search`, `mcp`
Tool conversions same as Chat Completions with:
- Computer tools auto-configured (if specified in Bifrost request)
- Function-based tools always enabled
Response Conversion
- `finishReason` → `status`: `STOP`/`MAX_TOKENS`/other → `completed`; `SAFETY` → `incomplete`
- Output items conversion:
  - Text parts → `message` field
  - Function calls → `function_call` field
  - Thought content → `reasoning` field
- Usage fields preserved, with cache tokens mapped to `*_tokens_details.cached_tokens`
Streaming
Event structure: Similar to Chat Completions streaming
- `content_part.added` emitted for text and reasoning parts
- Item IDs generated as `msg_{responseID}_item_{outputIndex}`
3. Speech (Text-to-Speech)
Speech synthesis uses the underlying chat generation endpoint with the audio response modality.
Request Parameters
| Parameter | Transformation |
|---|---|
| `input` | Text to synthesize → `contents[0].parts[0].text` |
| `voice` | Voice name → `generationConfig.speechConfig.voiceConfig.prebuiltVoiceConfig.voiceName` |
| `response_format` | Only “wav” supported (default); auto-converted from PCM |
Voice Configuration
Single Voice:
Response Conversion
- Audio data extracted from `candidates[0].content.parts[].inlineData`
- Format conversion: Gemini returns PCM audio (s16le, 24kHz, mono)
- Auto-conversion: PCM → WAV when `response_format: "wav"` (default)
- Raw audio returned if `response_format` is omitted or an empty string
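The PCM → WAV auto-conversion amounts to prepending a 44-byte RIFF header to the raw samples; a self-contained sketch (not Bifrost’s actual code):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// wavFromPCM wraps raw s16le PCM (as Gemini returns it: 24 kHz, mono,
// 16-bit) in a minimal 44-byte WAV header.
func wavFromPCM(pcm []byte, sampleRate uint32) []byte {
	const numChannels, bitsPerSample = 1, 16
	byteRate := sampleRate * numChannels * bitsPerSample / 8
	blockAlign := uint16(numChannels * bitsPerSample / 8)

	le := binary.LittleEndian
	u32 := func(v uint32) []byte { b := make([]byte, 4); le.PutUint32(b, v); return b }
	u16 := func(v uint16) []byte { b := make([]byte, 2); le.PutUint16(b, v); return b }

	buf := make([]byte, 0, 44+len(pcm))
	buf = append(buf, "RIFF"...)
	buf = append(buf, u32(36+uint32(len(pcm)))...) // total chunk size
	buf = append(buf, "WAVEfmt "...)
	buf = append(buf, u32(16)...) // fmt subchunk size
	buf = append(buf, u16(1)...)  // audio format 1 = PCM
	buf = append(buf, u16(numChannels)...)
	buf = append(buf, u32(sampleRate)...)
	buf = append(buf, u32(byteRate)...)
	buf = append(buf, u16(blockAlign)...)
	buf = append(buf, u16(bitsPerSample)...)
	buf = append(buf, "data"...)
	buf = append(buf, u32(uint32(len(pcm)))...)
	return append(buf, pcm...)
}

func main() {
	wav := wavFromPCM(make([]byte, 4800), 24000)
	fmt.Println(len(wav)) // 44-byte header + PCM payload
}
```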
Supported Voices
Common Gemini voices include:
- `Chant-Female` - Female voice
- `Chant-Male` - Male voice
- Additional voices depend on model capabilities
4. Transcriptions (Speech-to-Text)
Transcriptions are implemented as chat completions with audio content and text prompts.
Request Parameters
| Parameter | Transformation |
|---|---|
| `file` | Audio bytes → `contents[].parts[].inlineData` |
| `prompt` | Instructions → `contents[0].parts[0].text` (defaults to “Generate a transcript of the speech.”) |
| `language` | Via `extra_params` (if supported by model) |
Audio Input Handling
Audio is sent as inline data with an auto-detected MIME type.
Extra Parameters
Safety settings and caching can be configured:
- Gateway
- Go SDK
Response Conversion
- Transcribed text extracted from `candidates[0].content.parts[].text`
- `task` set to `"transcribe"`
- Usage metadata mapped: `promptTokenCount` → `input_tokens`, `candidatesTokenCount` → `output_tokens`, `totalTokenCount` → `total_tokens`
5. Embeddings
Supports both single text and batch text embeddings via batch requests.
Request mapping:
- `input` → `requests[0].content.parts[0].text` (a single text; arrays are joined with a space)
- `dimensions` → `outputDimensionality`
- Extra task type and title via `extra_params`
Response mapping:
- `embeddings[].values` → Bifrost embedding array
- `metadata.billableCharacterCount` → Usage prompt tokens (fallback)
- Token counts extracted from usage metadata
6. Batch API
Request formats: Inline requests array or file-based input
Pagination: Token-based with `pageToken`
Endpoints:
- POST `/v1beta/batchJobs` - Create
- GET `/v1beta/batchJobs?pageSize={limit}&pageToken={token}` - List
- GET `/v1beta/batchJobs/{batch_id}` - Retrieve
- POST `/v1beta/batchJobs/{batch_id}:cancel` - Cancel
- Status mapping: `BATCH_STATE_PENDING`/`BATCH_STATE_RUNNING` → `in_progress`, `BATCH_STATE_SUCCEEDED` → `completed`, `BATCH_STATE_FAILED` → `failed`, `BATCH_STATE_CANCELLING` → `cancelling`, `BATCH_STATE_CANCELLED` → `cancelled`, `BATCH_STATE_EXPIRED` → `expired`
- Inline responses: Array in `dest.inlinedResponses`
- File-based responses: JSONL file in `dest.fileName`
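The status mapping above can be sketched as a plain switch (the helper name is ours):

```go
package main

import "fmt"

// batchStatus maps a Gemini batch state constant to the corresponding
// Bifrost batch status string.
func batchStatus(state string) string {
	switch state {
	case "BATCH_STATE_PENDING", "BATCH_STATE_RUNNING":
		return "in_progress"
	case "BATCH_STATE_SUCCEEDED":
		return "completed"
	case "BATCH_STATE_FAILED":
		return "failed"
	case "BATCH_STATE_CANCELLING":
		return "cancelling"
	case "BATCH_STATE_CANCELLED":
		return "cancelled"
	case "BATCH_STATE_EXPIRED":
		return "expired"
	default:
		return "unknown"
	}
}

func main() {
	fmt.Println(batchStatus("BATCH_STATE_RUNNING"))
}
```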
7. Files API
Supports file upload for batch processing and multimodal requests.
Request: `file` (binary) and `filename` (optional)
Field mapping:
- `name` → `id`
- `displayName` → `filename`
- `sizeBytes` → `size_bytes`
- `mimeType` → `content_type`
- `createTime` (RFC3339) → Converted to Unix timestamp
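The `createTime` conversion can be sketched with the standard library (the helper name is ours):

```go
package main

import (
	"fmt"
	"time"
)

// createTimeToUnix parses Gemini's RFC3339 createTime and returns the
// Unix timestamp that Bifrost exposes for the file object.
func createTimeToUnix(createTime string) (int64, error) {
	t, err := time.Parse(time.RFC3339, createTime)
	if err != nil {
		return 0, err
	}
	return t.Unix(), nil
}

func main() {
	ts, _ := createTimeToUnix("2024-01-01T00:00:00Z")
	fmt.Println(ts)
}
```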
- POST `/upload/storage/v1beta/files` - Upload
- GET `/v1beta/files?limit={limit}&pageToken={token}` (cursor pagination)
- GET `/v1beta/files/{file_id}` - Retrieve
- DELETE `/v1beta/files/{file_id}` - Delete
- GET `/v1beta/files/{file_id}/content` - Download
8. List Models
Request: GET `/v1beta/models?pageSize={limit}&pageToken={token}` (no body)
Field mapping:
- `name` (remove “models/” prefix) → `id` (add “gemini/” prefix)
- `displayName` → `name`
- `description` → `description`
- `inputTokenLimit` → `max_input_tokens`
- `outputTokenLimit` → `max_output_tokens`
- Context length = `inputTokenLimit + outputTokenLimit`
Pagination: cursor-based via `nextPageToken`
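The id and context-length derivations above can be sketched as (helper names are ours; the model name is illustrative):

```go
package main

import (
	"fmt"
	"strings"
)

// modelID strips Gemini's "models/" prefix and adds Bifrost's
// "gemini/" provider prefix to form the exposed model id.
func modelID(name string) string {
	return "gemini/" + strings.TrimPrefix(name, "models/")
}

// contextLength derives the reported context length from the model's
// input and output token limits.
func contextLength(inputLimit, outputLimit int) int {
	return inputLimit + outputLimit
}

func main() {
	fmt.Println(modelID("models/gemini-2.5-flash"))
	fmt.Println(contextLength(1048576, 65536))
}
```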
Content Type Support
Bifrost supports the following content modalities through Gemini:
| Content Type | Support | Notes |
|---|---|---|
| Text | ✅ | Full support |
| Images (URL/Base64) | ✅ | Converted to {type: "image", source: {...}} |
| Video | ✅ | With fps, start/end offset metadata |
| Audio | ⚠️ | Via file references only |
| PDF | ✅ | Via file references |
| Code Execution | ✅ | Auto-executed with results returned |
| Thinking/Reasoning | ✅ | Thought parts marked with thought: true |
| Function Calls | ✅ | With optional thought signatures |
Caveats
Tool Response Grouping
Severity: High
Behavior: Consecutive tool response messages merged into a single user message
Impact: Message count and structure changes
Code: chat.go:627-678

Thinking Content Handling
Severity: Medium
Behavior: Thought content appears as text parts with a `thought: true` flag
Impact: Requires checking the thought flag to distinguish from regular text
Code: chat.go:242-244, 302-304

Function Call Arguments Serialization
Severity: Low
Behavior: Tool call `args` (object) converted to `arguments` (JSON string)
Impact: Requires JSON parsing to access arguments
Code: chat.go:101-106

Thought Signature Base64 Encoding
Severity: Low
Behavior: `thoughtSignature` is base64 URL-safe encoded, auto-converted during unmarshal
Impact: Transparent to user; handled automatically
Code: types.go:1048-1063

Streaming Finish Reason Timing
Severity: Medium
Behavior: `finish_reason` only present in the final stream chunk, alongside usage metadata
Impact: Cannot determine completion until the end of the stream
Code: chat.go:206-208, 325-328

Cached Content Token Reporting
Severity: Low
Behavior: Cached tokens reported in `prompt_tokens_details.cached_tokens`; cannot distinguish cache creation vs. read
Impact: Billing estimates may be approximate
Code: utils.go:270-274

System Instruction Integration
Severity: Medium
Behavior: System instructions become the `systemInstruction` field (separate from messages), not included in the message array
Impact: Structure differs from OpenAI’s system message approach
Code: responses.go:34-46
