
Overview

Google Gemini’s API has a different structure from OpenAI’s. Bifrost performs extensive conversion, including:
  • Role remapping - “assistant” → “model”, system messages integrated into main flow
  • Message grouping - Consecutive tool responses merged into single user message
  • Parameter renaming - e.g., max_completion_tokens → maxOutputTokens, stop → stopSequences
  • Function call handling - Tool call ID preservation and thought signature support
  • Content modality - Support for text, images, video, code execution, and thought content
  • Thinking/Reasoning - Thinking configuration mapped to Bifrost reasoning structure

Supported Operations

| Operation | Non-Streaming | Streaming | Endpoint |
|---|---|---|---|
| Chat Completions | ✅ | ✅ | /v1beta/models/{model}:generateContent |
| Responses API | ✅ | ✅ | /v1beta/models/{model}:generateContent |
| Speech (TTS) | ✅ | ✅ | /v1beta/models/{model}:generateContent |
| Transcriptions (STT) | ✅ | ✅ | /v1beta/models/{model}:generateContent |
| Embeddings | ✅ | - | /v1beta/models/{model}:embedContent |
| Files | ✅ | - | /upload/storage/v1beta/files |
| Batch | ✅ | - | /v1beta/batchJobs |
| List Models | ✅ | - | /v1beta/models |

1. Chat Completions

Request Parameters

Parameter Mapping

| Parameter | Transformation |
|---|---|
| max_completion_tokens | Renamed to maxOutputTokens |
| temperature, top_p | Direct pass-through |
| stop | Renamed to stopSequences |
| response_format | Converted to responseMimeType and responseSchema |
| tools | Schema restructured (see Tool Conversion) |
| tool_choice | Mapped to functionCallingConfig (see Tool Conversion) |
| reasoning | Mapped to thinkingConfig (see Reasoning / Thinking) |
| top_k | Via extra_params (Gemini-specific) |
| presence_penalty, frequency_penalty | Via extra_params |
| seed | Via extra_params |

Dropped Parameters

The following parameters are silently ignored: logit_bias, logprobs, top_logprobs, parallel_tool_calls, service_tier

Extra Parameters

Use extra_params (SDK) or pass directly in request body (Gateway) for Gemini-specific fields:
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini/gemini-2.0-flash",
    "messages": [{"role": "user", "content": "Hello"}],
    "top_k": 40,
    "stop_sequences": ["###"]
  }'

Reasoning / Thinking

Documentation: See Bifrost Reasoning Reference

Parameter Mapping

  • reasoning.effort → thinkingConfig.thinkingLevel (“low” → LOW, “high” → HIGH)
  • reasoning.max_tokens → thinkingConfig.thinkingBudget (token budget for thinking)
  • reasoning parameter triggers thinkingConfig.includeThoughts = true

Supported Thinking Levels

  • "low" / "minimal" → LOW
  • "medium" / "high" → HIGH
  • null or unspecified → Based on max_tokens: -1 (dynamic), 0 (disabled), or specific budget

Example

// Request
{"reasoning": {"effort": "high", "max_tokens": 10000}}

// Gemini conversion
{"thinkingConfig": {"includeThoughts": true, "thinkingLevel": "HIGH", "thinkingBudget": 10000}}
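
The mapping above can be sketched as a small function; this is an illustrative approximation of the documented rules, not Bifrost’s actual implementation:

```python
def to_thinking_config(reasoning: dict) -> dict:
    """Map a Bifrost-style reasoning object to a Gemini thinkingConfig dict."""
    # Presence of the reasoning parameter always enables thought output.
    config = {"includeThoughts": True}

    effort = reasoning.get("effort")
    if effort in ("low", "minimal"):
        config["thinkingLevel"] = "LOW"
    elif effort in ("medium", "high"):
        config["thinkingLevel"] = "HIGH"

    max_tokens = reasoning.get("max_tokens")
    if max_tokens is not None:
        config["thinkingBudget"] = max_tokens  # explicit token budget
    elif effort is None:
        config["thinkingBudget"] = -1  # dynamic budget when nothing is specified
    return config
```

For the documented example, `to_thinking_config({"effort": "high", "max_tokens": 10000})` yields the Gemini conversion shown above.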

Message Conversion

Critical Caveats

  • Role remapping: “assistant” → “model”, “system” → part of user/model content flow
  • Consecutive tool responses: Tool response messages merged into single user message with function response parts
  • Content flattening: Multi-part content in single message preserved as parts array
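
The role remapping and tool-response merging can be sketched as follows; message and part field names are illustrative, simplified from the behavior this page describes:

```python
def group_tool_messages(messages: list[dict]) -> list[dict]:
    """Remap roles and merge consecutive tool responses into one user message."""
    out = []
    for msg in messages:
        role = msg["role"]
        if role == "tool":
            part = {"functionResponse": {"id": msg.get("tool_call_id"),
                                         "response": {"content": msg["content"]}}}
            # Extend the previous merged block if the run of tool messages continues.
            if out and out[-1].get("_merged_tool_block"):
                out[-1]["parts"].append(part)
            else:
                out.append({"role": "user", "parts": [part], "_merged_tool_block": True})
        else:
            gemini_role = "model" if role == "assistant" else role
            out.append({"role": gemini_role, "parts": [{"text": msg.get("content", "")}]})
    for m in out:
        m.pop("_merged_tool_block", None)  # drop the internal marker
    return out
```

Note that merging changes the message count, which is the high-severity caveat listed at the end of this page.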

Image Conversion

  • URL images: {type: "image_url", image_url: {url: "..."}} → {type: "image", source: {type: "url", url: "..."}}
  • Base64 images: Data URL → {type: "image", source: {type: "base64", media_type: "image/png", ...}}
  • Video content: Preserved with metadata (fps, start/end offset)
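
A minimal sketch of the image remapping above; the field names follow this page and are illustrative:

```python
def convert_image_part(part: dict) -> dict:
    """Convert an OpenAI-style image_url part into the image/source shape above."""
    url = part["image_url"]["url"]
    if url.startswith("data:"):
        # Data URL: split "data:image/png;base64,<payload>" into media type and payload.
        header, data = url.split(",", 1)
        media_type = header[len("data:"):].split(";")[0]
        return {"type": "image",
                "source": {"type": "base64", "media_type": media_type, "data": data}}
    return {"type": "image", "source": {"type": "url", "url": url}}
```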

Tool Conversion

Tool definitions are restructured with these mappings:
  • function.name → functionDeclarations.name (preserved)
  • function.parameters → functionDeclarations.parameters (Schema format)
  • function.description → functionDeclarations.description
  • function.strict → Dropped (not supported by Gemini)
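
The restructuring can be sketched like this; a simplified illustration of the mappings above, not Bifrost’s actual code:

```python
def to_function_declarations(tools: list[dict]) -> list[dict]:
    """Restructure OpenAI-style tool definitions into Gemini functionDeclarations."""
    decls = []
    for tool in tools:
        fn = tool["function"]
        decl = {"name": fn["name"]}  # name preserved as-is
        if "description" in fn:
            decl["description"] = fn["description"]
        if "parameters" in fn:
            decl["parameters"] = fn["parameters"]  # JSON Schema carried over
        # fn.get("strict") is intentionally dropped: Gemini has no equivalent.
        decls.append(decl)
    return [{"functionDeclarations": decls}]
```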

Tool Choice Mapping

| OpenAI | Gemini |
|---|---|
| "auto" | AUTO (default) |
| "none" | NONE |
| "required" | ANY |
| Specific tool | ANY with allowedFunctionNames |
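
The tool_choice mapping above can be sketched as follows (an illustrative approximation of the table, assuming the OpenAI `{"type": "function", "function": {"name": ...}}` shape for a specific tool):

```python
def to_function_calling_config(tool_choice) -> dict:
    """Map an OpenAI tool_choice value to a Gemini functionCallingConfig."""
    if tool_choice in (None, "auto"):
        return {"mode": "AUTO"}  # default behavior
    if tool_choice == "none":
        return {"mode": "NONE"}
    if tool_choice == "required":
        return {"mode": "ANY"}
    # Specific tool: restrict ANY mode to the named function.
    name = tool_choice["function"]["name"]
    return {"mode": "ANY", "allowedFunctionNames": [name]}
```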

Response Conversion

Field Mapping

  • finishReason → finish_reason:
    • STOP → stop
    • MAX_TOKENS → length
    • SAFETY, RECITATION, LANGUAGE, BLOCKLIST, PROHIBITED_CONTENT, SPII, IMAGE_SAFETY → content_filter
    • MALFORMED_FUNCTION_CALL, UNEXPECTED_TOOL_CALL → tool_calls
  • candidates[0].content.parts[0].text → choices[0].message.content (if single text block)
  • candidates[0].content.parts[].functionCall → choices[0].message.tool_calls
  • promptTokenCount → usage.prompt_tokens
  • candidatesTokenCount → usage.completion_tokens
  • totalTokenCount → usage.total_tokens
  • cachedContentTokenCount → usage.prompt_tokens_details.cached_tokens
  • thoughtsTokenCount → usage.completion_tokens_details.reasoning_tokens
  • Thought content (from text parts with thought: true) → reasoning field in stream deltas
  • Function call args (map) → JSON string arguments
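
The finish-reason and usage mappings above can be sketched as lookup tables; an illustrative rendering of the documented rules:

```python
# One OpenAI finish_reason per Gemini finishReason, per the list above.
FINISH_REASON_MAP = {
    "STOP": "stop",
    "MAX_TOKENS": "length",
    **{r: "content_filter" for r in ("SAFETY", "RECITATION", "LANGUAGE", "BLOCKLIST",
                                     "PROHIBITED_CONTENT", "SPII", "IMAGE_SAFETY")},
    "MALFORMED_FUNCTION_CALL": "tool_calls",
    "UNEXPECTED_TOOL_CALL": "tool_calls",
}

def map_usage(meta: dict) -> dict:
    """Translate Gemini usageMetadata counters into OpenAI-style usage fields."""
    return {
        "prompt_tokens": meta.get("promptTokenCount", 0),
        "completion_tokens": meta.get("candidatesTokenCount", 0),
        "total_tokens": meta.get("totalTokenCount", 0),
    }
```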

Streaming

Event structure:
  • Streaming responses contain deltas in delta.content (text), delta.reasoning (thoughts), delta.toolCalls (function calls)
  • Function responses appear as text content in the delta
  • finish_reason only set on final chunk
  • Usage metadata only included in final chunk

2. Responses API

The Responses API uses the same underlying /generateContent endpoint but converts between OpenAI’s Responses format and Gemini’s Messages format.

Request Parameters

Parameter Mapping

| Parameter | Transformation |
|---|---|
| max_output_tokens | Renamed to maxOutputTokens |
| temperature, top_p | Direct pass-through |
| instructions | Converted to system instruction text |
| input (string or array) | Converted to messages |
| tools | Schema restructured (see Chat Completions) |
| tool_choice | Type mapped (see Chat Completions) |
| reasoning | Mapped to thinkingConfig (see Reasoning / Thinking) |
| text | Maps to responseMimeType and responseSchema |
| stop | Via extra_params, renamed to stopSequences |
| top_k | Via extra_params |

Extra Parameters

Use extra_params (SDK) or pass directly in request body (Gateway):
curl -X POST http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini/gemini-2.0-flash",
    "input": "Hello, how are you?",
    "instructions": "You are a helpful assistant.",
    "top_k": 40
  }'

Input & Instructions

  • Input: String wrapped as user message or array converted to messages
  • Instructions: Becomes system instruction (single text block)

Tool Support

Supported types: function, computer_use_preview, web_search, mcp
Tool conversions are the same as for Chat Completions, with:
  • Computer tools auto-configured (if specified in Bifrost request)
  • Function-based tools always enabled

Response Conversion

  • finishReason → status: STOP/MAX_TOKENS/other → completed | SAFETY → incomplete
  • Output items conversion:
    • Text parts → message field
    • Function calls → function_call field
    • Thought content → reasoning field
  • Usage fields preserved with cache tokens mapped to *_tokens_details.cached_tokens

Streaming

Event structure: Similar to Chat Completions streaming
  • content_part.added emitted for text and reasoning parts
  • Item IDs generated as msg_{responseID}_item_{outputIndex}

3. Speech (Text-to-Speech)

Speech synthesis uses the underlying chat generation endpoint with audio response modality.

Request Parameters

| Parameter | Transformation |
|---|---|
| input | Text to synthesize → contents[0].parts[0].text |
| voice | Voice name → generationConfig.speechConfig.voiceConfig.prebuiltVoiceConfig.voiceName |
| response_format | Only “wav” supported (default); auto-converted from PCM |

Voice Configuration

Single Voice:
{
  "generationConfig": {
    "responseModalities": ["AUDIO"],
    "speechConfig": {
      "voiceConfig": {
        "prebuiltVoiceConfig": {
          "voiceName": "Chant-Female"
        }
      }
    }
  }
}
Multi-Speaker:
{
  "generationConfig": {
    "responseModalities": ["AUDIO"],
    "speechConfig": {
      "multiSpeakerVoiceConfig": {
        "speakerVoiceConfigs": [
          {
            "speaker": "Character A",
            "voiceConfig": {
              "prebuiltVoiceConfig": {
                "voiceName": "Chant-Female"
              }
            }
          }
        ]
      }
    }
  }
}

Response Conversion

  • Audio data extracted from candidates[0].content.parts[].inlineData
  • Format conversion: Gemini returns PCM audio (s16le, 24kHz, mono)
  • Auto-conversion: PCM → WAV when response_format: "wav" (default)
  • Raw audio returned if response_format is omitted or empty string
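
The PCM-to-WAV conversion can be sketched with the standard library; this wraps raw s16le, 24 kHz, mono PCM (the format this page says Gemini returns) in a WAV container:

```python
import io
import wave

def pcm_to_wav(pcm: bytes, rate: int = 24000) -> bytes:
    """Wrap raw s16le mono PCM samples in a WAV (RIFF) container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)     # mono
        w.setsampwidth(2)     # 16-bit (s16le) samples
        w.setframerate(rate)  # 24 kHz by default
        w.writeframes(pcm)
    return buf.getvalue()
```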

Supported Voices

Common Gemini voices include:
  • Chant-Female - Female voice
  • Chant-Male - Male voice
  • Additional voices depend on model capabilities
Check model documentation for complete list of supported voices.

4. Transcriptions (Speech-to-Text)

Transcriptions are implemented as chat completions with audio content and text prompts.

Request Parameters

| Parameter | Transformation |
|---|---|
| file | Audio bytes → contents[].parts[].inlineData |
| prompt | Instructions → contents[0].parts[0].text (defaults to “Generate a transcript of the speech.”) |
| language | Via extra_params (if supported by model) |

Audio Input Handling

Audio is sent as inline data with auto-detected MIME type:
{
  "contents": [
    {
      "parts": [
        {
          "text": "<prompt text>"
        },
        {
          "inlineData": {
            "mimeType": "audio/wav",
            "data": "<base64-encoded-audio>"
          }
        }
      ]
    }
  ]
}

Extra Parameters

Safety settings and caching can be configured:
curl -X POST http://localhost:8080/v1/audio/transcriptions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini/gemini-2.0-flash",
    "file": "<binary-audio-data>",
    "prompt": "Transcribe this audio in the original language."
  }'

Response Conversion

  • Transcribed text extracted from candidates[0].content.parts[].text
  • task set to "transcribe"
  • Usage metadata mapped:
    • promptTokenCountinput_tokens
    • candidatesTokenCountoutput_tokens
    • totalTokenCounttotal_tokens

5. Embeddings

Supports both single text and batch text embeddings via batch requests.
Request Parameters:
  • input → requests[0].content.parts[0].text (array inputs are joined with spaces into a single text)
  • dimensions → outputDimensionality
  • Extra task type and title via extra_params
Response Mapping:
  • embeddings[].values → Bifrost embedding array
  • metadata.billableCharacterCount → Usage prompt tokens (fallback)
  • Token counts extracted from usage metadata

6. Batch API

Request formats: Inline requests array or file-based input
Pagination: Token-based with pageToken
Endpoints:
  • POST /v1beta/batchJobs - Create
  • GET /v1beta/batchJobs?pageSize={limit}&pageToken={token} - List
  • GET /v1beta/batchJobs/{batch_id} - Retrieve
  • POST /v1beta/batchJobs/{batch_id}:cancel - Cancel
Response Structure:
  • Status mapping: BATCH_STATE_PENDING/BATCH_STATE_RUNNING → in_progress, BATCH_STATE_SUCCEEDED → completed, BATCH_STATE_FAILED → failed, BATCH_STATE_CANCELLING → cancelling, BATCH_STATE_CANCELLED → cancelled, BATCH_STATE_EXPIRED → expired
  • Inline responses: Array in dest.inlinedResponses
  • File-based responses: JSONL file in dest.fileName
Note: RFC3339 timestamps converted to Unix timestamps
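
The timestamp conversion noted above can be sketched with the standard library:

```python
from datetime import datetime

def rfc3339_to_unix(ts: str) -> int:
    """Convert an RFC3339 timestamp (e.g. from batch job metadata) to Unix seconds."""
    # fromisoformat does not accept the "Z" suffix on older Pythons,
    # so normalize it to an explicit UTC offset first.
    return int(datetime.fromisoformat(ts.replace("Z", "+00:00")).timestamp())
```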

7. Files API

Supports file upload for batch processing and multimodal requests.
Upload: Multipart/form-data with file (binary) and filename (optional)
Field mapping:
  • name → id
  • displayName → filename
  • sizeBytes → size_bytes
  • mimeType → content_type
  • createTime (RFC3339) → Converted to Unix timestamp
Endpoints:
  • POST /upload/storage/v1beta/files - Upload
  • GET /v1beta/files?limit={limit}&pageToken={token} (cursor pagination)
  • GET /v1beta/files/{file_id} - Retrieve
  • DELETE /v1beta/files/{file_id} - Delete
  • GET /v1beta/files/{file_id}/content - Download

8. List Models

Request: GET /v1beta/models?pageSize={limit}&pageToken={token} (no body)
Field mapping:
  • name (remove “models/” prefix) → id (add “gemini/” prefix)
  • displayName → name
  • description → description
  • inputTokenLimit → max_input_tokens
  • outputTokenLimit → max_output_tokens
  • Context length = inputTokenLimit + outputTokenLimit
Pagination: Token-based with nextPageToken
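
The field mapping above can be sketched as follows; the output field names follow this page and are illustrative:

```python
def map_model(m: dict) -> dict:
    """Map a Gemini model entry to the Bifrost-style shape described above."""
    name = m["name"]
    if name.startswith("models/"):
        name = name[len("models/"):]  # strip the "models/" prefix
    return {
        "id": "gemini/" + name,       # add the "gemini/" provider prefix
        "name": m.get("displayName"),
        "description": m.get("description"),
        "max_input_tokens": m.get("inputTokenLimit"),
        "max_output_tokens": m.get("outputTokenLimit"),
        # Context length is derived as the sum of the two limits.
        "context_length": m.get("inputTokenLimit", 0) + m.get("outputTokenLimit", 0),
    }
```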

Content Type Support

Bifrost supports the following content modalities through Gemini:
| Content Type | Support | Notes |
|---|---|---|
| Text | ✅ | Full support |
| Images (URL/Base64) | ✅ | Converted to {type: "image", source: {...}} |
| Video | ✅ | With fps, start/end offset metadata |
| Audio | ⚠️ | Via file references only |
| PDF | ✅ | Via file references |
| Code Execution | ✅ | Auto-executed with results returned |
| Thinking/Reasoning | ✅ | Thought parts marked with thought: true |
| Function Calls | ✅ | With optional thought signatures |

Caveats

Severity: High
Behavior: Consecutive tool response messages merged into single user message
Impact: Message count and structure changes
Code: chat.go:627-678

Severity: Medium
Behavior: Thought content appears as text parts with thought: true flag
Impact: Requires checking thought flag to distinguish from regular text
Code: chat.go:242-244, 302-304

Severity: Low
Behavior: Tool call args (object) converted to arguments (JSON string)
Impact: Requires JSON parsing to access arguments
Code: chat.go:101-106

Severity: Low
Behavior: thoughtSignature base64 URL-safe encoded, auto-converted during unmarshal
Impact: Transparent to user; handled automatically
Code: types.go:1048-1063

Severity: Medium
Behavior: finish_reason only present in final stream chunk with usage metadata
Impact: Cannot determine completion until end of stream
Code: chat.go:206-208, 325-328

Severity: Low
Behavior: Cached tokens reported in prompt_tokens_details.cached_tokens, cannot distinguish cache creation vs read
Impact: Billing estimates may be approximate
Code: utils.go:270-274

Severity: Medium
Behavior: System instructions become systemInstruction field (separate from messages), not included in message array
Impact: Structure differs from OpenAI’s system message approach
Code: responses.go:34-46