
Overview

Google Gemini’s API has a different structure from OpenAI’s. Bifrost performs extensive conversion, including:
  • Role remapping - “assistant” → “model”, system messages integrated into the main flow
  • Message grouping - Consecutive tool responses merged into a single user message
  • Parameter renaming - e.g., max_completion_tokens → maxOutputTokens, stop → stopSequences
  • Function call handling - Tool call ID preservation and thought signature support
  • Content modality - Support for text, images, video, code execution, and thought content
  • Thinking/Reasoning - Thinking configuration mapped to Bifrost reasoning structure

Supported Operations

| Operation | Non-Streaming | Streaming | Endpoint |
|---|---|---|---|
| Chat Completions | ✅ | ✅ | /v1beta/models/{model}:generateContent |
| Responses API | ✅ | ✅ | /v1beta/models/{model}:generateContent |
| Speech (TTS) | ✅ | ✅ | /v1beta/models/{model}:generateContent |
| Transcriptions (STT) | ✅ | ✅ | /v1beta/models/{model}:generateContent |
| Image Generation | ✅ | - | /v1beta/models/{model}:generateContent or /v1beta/models/{model}:predict (Imagen) |
| Image Edit | ✅ | - | /v1beta/models/{model}:generateContent or /v1beta/models/{model}:predict (Imagen) |
| Image Variation | - | - | Not supported |
| Embeddings | ✅ | - | /v1beta/models/{model}:embedContent |
| Files | ✅ | - | /upload/storage/v1beta/files |
| Batch | ✅ | - | /v1beta/batchJobs |
| List Models | ✅ | - | /v1beta/models |

Authentication

Gemini supports API key authentication in addition to OAuth2 Bearer token authentication. The implementation conditionally uses the appropriate method based on the endpoint type.

API Key Authentication

API key authentication is supported via two methods:
  1. Header Method (standard Gemini endpoints):
    • Format: x-goog-api-key: YOUR_API_KEY header
    • Used for: Standard Gemini endpoints (e.g., /v1beta/models/{model}:generateContent)
  2. Query Parameter Method (Imagen and custom endpoints):
    • Format: ?key=YOUR_API_KEY appended to request URLs
    • Used for: Imagen models and custom endpoints
    • Example: https://generativelanguage.googleapis.com/v1beta/models/imagen-4.0-generate-001:predict?key=YOUR_API_KEY
Bifrost automatically selects the appropriate authentication method based on the endpoint type.
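In pseudocode, the selection behaves roughly like this (a minimal sketch; `pick_auth` is a hypothetical helper, not Bifrost's actual API):

```python
def pick_auth(endpoint: str, api_key: str) -> dict:
    """Sketch of endpoint-based auth selection: Imagen :predict endpoints
    take the key as a ?key= query parameter; standard Gemini endpoints
    use the x-goog-api-key header."""
    if endpoint.endswith(":predict"):
        # Imagen (and similar custom) endpoints: query parameter method
        return {"query": {"key": api_key}, "headers": {}}
    # Standard Gemini endpoints: header method
    return {"query": {}, "headers": {"x-goog-api-key": api_key}}
```

Callers never choose a method themselves; the endpoint shape decides.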

1. Chat Completions

Request Parameters

Parameter Mapping

| Parameter | Transformation |
|---|---|
| max_completion_tokens | Renamed to maxOutputTokens |
| temperature, top_p | Direct pass-through |
| stop | Renamed to stopSequences |
| response_format | Converted to responseMimeType and responseJsonSchema |
| tools | Schema restructured (see Tool Conversion) |
| tool_choice | Mapped to functionCallingConfig (see Tool Conversion) |
| reasoning | Mapped to thinkingConfig (see Reasoning / Thinking) |
| top_k | Via extra_params (Gemini-specific) |
| presence_penalty, frequency_penalty | Via extra_params |
| seed | Via extra_params |

Dropped Parameters

The following parameters are silently ignored: logit_bias, logprobs, top_logprobs, parallel_tool_calls, service_tier
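The rename/drop behavior above can be sketched as a small lookup (illustrative only; `convert_params`, `RENAMES`, and `DROPPED` are hypothetical names, not Bifrost internals):

```python
# Hypothetical constants mirroring the mapping table above; the real
# converter inside Bifrost is more involved.
RENAMES = {"max_completion_tokens": "maxOutputTokens", "stop": "stopSequences"}
DROPPED = {"logit_bias", "logprobs", "top_logprobs",
           "parallel_tool_calls", "service_tier"}

def convert_params(params: dict) -> dict:
    out = {}
    for key, value in params.items():
        if key in DROPPED:
            continue  # silently ignored, as noted above
        out[RENAMES.get(key, key)] = value  # rename or pass through
    return out
```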

Extra Parameters

Use extra_params (SDK) or pass directly in request body (Gateway) for Gemini-specific fields:
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini/gemini-2.0-flash",
    "messages": [{"role": "user", "content": "Hello"}],
    "top_k": 40,
    "stop_sequences": ["###"]
  }'

Reasoning / Thinking

Documentation: See Bifrost Reasoning Reference

Parameter Mapping

  • reasoning.effort → thinkingConfig.thinkingLevel (“low” → LOW, “high” → HIGH)
  • reasoning.max_tokens → thinkingConfig.thinkingBudget (token budget for thinking)
  • reasoning parameter triggers thinkingConfig.includeThoughts = true

Supported Thinking Levels

  • "low" / "minimal" → LOW
  • "medium" / "high" → HIGH
  • null or unspecified → Based on max_tokens: -1 (dynamic), 0 (disabled), or specific budget
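Putting the two bullets together, the effort/budget mapping can be sketched as (hypothetical `thinking_config` helper, not Bifrost's real converter):

```python
def thinking_config(effort=None, max_tokens=None) -> dict:
    """Sketch of reasoning -> thinkingConfig mapping (illustrative only)."""
    cfg = {"includeThoughts": True}  # any reasoning param enables thoughts
    if effort in ("low", "minimal"):
        cfg["thinkingLevel"] = "LOW"
    elif effort in ("medium", "high"):
        cfg["thinkingLevel"] = "HIGH"
    if max_tokens is not None:
        # -1 means dynamic, 0 disabled, any other value is the budget
        cfg["thinkingBudget"] = max_tokens
    return cfg
```

Applied to the request example below, it yields the shown Gemini conversion.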

Example

// Request
{"reasoning": {"effort": "high", "max_tokens": 10000}}

// Gemini conversion
{"thinkingConfig": {"includeThoughts": true, "thinkingLevel": "HIGH", "thinkingBudget": 10000}}

Message Conversion

Critical Caveats

  • Role remapping: “assistant” → “model”, “system” → part of user/model content flow
  • Consecutive tool responses: Tool response messages merged into single user message with function response parts
  • Content flattening: Multi-part content in single message preserved as parts array
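The role remapping and tool-response merging described above can be sketched as (a simplified illustration; `merge_tool_responses` is a hypothetical helper that ignores images, tool call IDs, and other part types):

```python
def merge_tool_responses(messages: list[dict]) -> list[dict]:
    """Collapse runs of role == "tool" messages into one user message
    holding functionResponse parts, and remap "assistant" -> "model"."""
    out = []
    for msg in messages:
        if msg["role"] == "tool":
            part = {"functionResponse": {"name": msg.get("name"),
                                         "response": msg["content"]}}
            if out and out[-1].get("_merged"):
                out[-1]["parts"].append(part)  # extend the current run
            else:
                out.append({"role": "user", "parts": [part], "_merged": True})
        else:
            role = "model" if msg["role"] == "assistant" else msg["role"]
            out.append({"role": role, "parts": [{"text": msg["content"]}]})
    for m in out:
        m.pop("_merged", None)  # strip the internal marker
    return out
```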

Image Conversion

  • URL images: {type: "image_url", image_url: {url: "..."}} → {type: "image", source: {type: "url", url: "..."}}
  • Base64 images: Data URL → {type: "image", source: {type: "base64", media_type: "image/png", ...}}
  • Video content: Preserved with metadata (fps, start/end offset)

Tool Conversion

Tool definitions are restructured with these mappings:
  • function.name → functionDeclarations.name (preserved)
  • function.parameters → functionDeclarations.parameters (Schema format)
  • function.description → functionDeclarations.description
  • function.strict → Dropped (not supported by Gemini)

Tool Choice Mapping

| OpenAI | Gemini |
|---|---|
| "auto" | AUTO (default) |
| "none" | NONE |
| "required" | ANY |
| Specific tool | ANY with allowedFunctionNames |
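A sketch of this mapping (hypothetical `to_function_calling_config` helper):

```python
def to_function_calling_config(tool_choice) -> dict:
    """Map an OpenAI-style tool_choice to Gemini's functionCallingConfig.
    Sketch only; the real converter handles more input shapes."""
    if tool_choice in (None, "auto"):
        return {"mode": "AUTO"}  # default
    if tool_choice == "none":
        return {"mode": "NONE"}
    if tool_choice == "required":
        return {"mode": "ANY"}
    # Specific tool: {"type": "function", "function": {"name": "..."}}
    return {"mode": "ANY",
            "allowedFunctionNames": [tool_choice["function"]["name"]]}
```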

Response Conversion

Field Mapping

  • finishReason → finish_reason:
    • STOP → stop
    • MAX_TOKENS → length
    • SAFETY, RECITATION, LANGUAGE, BLOCKLIST, PROHIBITED_CONTENT, SPII, IMAGE_SAFETY → content_filter
    • MALFORMED_FUNCTION_CALL, UNEXPECTED_TOOL_CALL → tool_calls
  • candidates[0].content.parts[0].text → choices[0].message.content (if single text block)
  • candidates[0].content.parts[].functionCall → choices[0].message.tool_calls
  • promptTokenCount → usage.prompt_tokens
  • candidatesTokenCount → usage.completion_tokens
  • totalTokenCount → usage.total_tokens
  • cachedContentTokenCount → usage.prompt_tokens_details.cached_tokens
  • thoughtsTokenCount → usage.completion_tokens_details.reasoning_tokens
  • Thought content (from text parts with thought: true) → reasoning field in stream deltas
  • Function call args (map) → JSON string arguments
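The finish-reason portion of this mapping is a straight lookup; a sketch (hypothetical `FINISH_REASON_MAP` constant):

```python
# Hypothetical lookup mirroring the finishReason -> finish_reason bullets above.
FINISH_REASON_MAP = {
    "STOP": "stop",
    "MAX_TOKENS": "length",
    **dict.fromkeys(
        ["SAFETY", "RECITATION", "LANGUAGE", "BLOCKLIST",
         "PROHIBITED_CONTENT", "SPII", "IMAGE_SAFETY"],
        "content_filter",
    ),
    **dict.fromkeys(
        ["MALFORMED_FUNCTION_CALL", "UNEXPECTED_TOOL_CALL"], "tool_calls"
    ),
}
```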

Streaming

Event structure:
  • Streaming responses contain deltas in delta.content (text), delta.reasoning (thoughts), delta.toolCalls (function calls)
  • Function responses appear as text content in the delta
  • finish_reason only set on final chunk
  • Usage metadata only included in final chunk

2. Responses API

The Responses API uses the same underlying /generateContent endpoint but converts between OpenAI’s Responses format and Gemini’s Messages format.

Request Parameters

Parameter Mapping

| Parameter | Transformation |
|---|---|
| max_output_tokens | Renamed to maxOutputTokens |
| temperature, top_p | Direct pass-through |
| instructions | Converted to system instruction text |
| input (string or array) | Converted to messages |
| tools | Schema restructured (see Chat Completions) |
| tool_choice | Type mapped (see Chat Completions) |
| reasoning | Mapped to thinkingConfig (see Reasoning / Thinking) |
| text | Maps to responseMimeType and responseJsonSchema |
| stop | Via extra_params, renamed to stopSequences |
| top_k | Via extra_params |

Extra Parameters

Use extra_params (SDK) or pass directly in request body (Gateway):
curl -X POST http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini/gemini-2.0-flash",
    "input": "Hello, how are you?",
    "instructions": "You are a helpful assistant.",
    "top_k": 40
  }'

Input & Instructions

  • Input: String wrapped as user message or array converted to messages
  • Instructions: Becomes system instruction (single text block)

Tool Support

Supported types: function, computer_use_preview, web_search, mcp
Tool conversions are the same as for Chat Completions, with:
  • Computer tools auto-configured (if specified in Bifrost request)
  • Function-based tools always enabled

Response Conversion

  • finishReason → status: STOP/MAX_TOKENS/other → completed | SAFETY → incomplete
  • Output items conversion:
    • Text parts → message field
    • Function calls → function_call field
    • Thought content → reasoning field
  • Usage fields preserved with cache tokens mapped to *_tokens_details.cached_tokens

Streaming

Event structure: Similar to Chat Completions streaming
  • content_part.added emitted for text and reasoning parts
  • Item IDs generated as msg_{responseID}_item_{outputIndex}

3. Speech (Text-to-Speech)

Speech synthesis uses the underlying chat generation endpoint with audio response modality.

Request Parameters

| Parameter | Transformation |
|---|---|
| input | Text to synthesize → contents[0].parts[0].text |
| voice | Voice name → generationConfig.speechConfig.voiceConfig.prebuiltVoiceConfig.voiceName |
| response_format | Only “wav” supported (default); auto-converted from PCM |

Voice Configuration

Single Voice:
{
  "generationConfig": {
    "responseModalities": ["AUDIO"],
    "speechConfig": {
      "voiceConfig": {
        "prebuiltVoiceConfig": {
          "voiceName": "Chant-Female"
        }
      }
    }
  }
}
Multi-Speaker:
{
  "generationConfig": {
    "responseModalities": ["AUDIO"],
    "speechConfig": {
      "multiSpeakerVoiceConfig": {
        "speakerVoiceConfigs": [
          {
            "speaker": "Character A",
            "voiceConfig": {
              "prebuiltVoiceConfig": {
                "voiceName": "Chant-Female"
              }
            }
          }
        ]
      }
    }
  }
}

Response Conversion

  • Audio data extracted from candidates[0].content.parts[].inlineData
  • Format conversion: Gemini returns PCM audio (s16le, 24kHz, mono)
  • Auto-conversion: PCM → WAV when response_format: "wav" (default)
  • Raw audio returned if response_format is omitted or empty string
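The PCM → WAV auto-conversion amounts to wrapping the raw samples in a WAV container; a sketch using Python's stdlib `wave` module (`pcm_to_wav` is a hypothetical helper, parameters per the format described above):

```python
import io
import wave

def pcm_to_wav(pcm: bytes, rate: int = 24000) -> bytes:
    """Wrap raw PCM (s16le, mono) in a WAV container, mirroring the
    auto-conversion described above (illustrative sketch)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)     # mono
        w.setsampwidth(2)     # 16-bit (s16le) samples
        w.setframerate(rate)  # Gemini TTS returns 24 kHz
        w.writeframes(pcm)
    return buf.getvalue()
```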

Supported Voices

Common Gemini voices include:
  • Chant-Female - Female voice
  • Chant-Male - Male voice
  • Additional voices depend on model capabilities
Check model documentation for complete list of supported voices.

4. Transcriptions (Speech-to-Text)

Transcriptions are implemented as chat completions with audio content and text prompts.

Request Parameters

| Parameter | Transformation |
|---|---|
| file | Audio bytes → contents[].parts[].inlineData |
| prompt | Instructions → contents[0].parts[0].text (defaults to “Generate a transcript of the speech.”) |
| language | Via extra_params (if supported by model) |

Audio Input Handling

Audio is sent as inline data with auto-detected MIME type:
{
  "contents": [
    {
      "parts": [
        {
          "text": "<prompt text>"
        },
        {
          "inlineData": {
            "mimeType": "audio/wav",
            "data": "<base64-encoded-audio>"
          }
        }
      ]
    }
  ]
}

Extra Parameters

Safety settings and caching can be configured:
curl -X POST http://localhost:8080/v1/audio/transcriptions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini/gemini-2.0-flash",
    "file": "<binary-audio-data>",
    "prompt": "Transcribe this audio in the original language."
  }'

Response Conversion

  • Transcribed text extracted from candidates[0].content.parts[].text
  • task set to "transcribe"
  • Usage metadata mapped:
    • promptTokenCount → input_tokens
    • candidatesTokenCount → output_tokens
    • totalTokenCount → total_tokens

5. Embeddings

Supports both single-text and batch embeddings via batched requests.
Request Parameters:
  • input → requests[0].content.parts[0].text (single text; arrays are joined with a space)
  • dimensions → outputDimensionality
  • Task type and title via extra_params
Response Mapping:
  • embeddings[].values → Bifrost embedding array
  • metadata.billableCharacterCount → Usage prompt tokens (fallback)
  • Token counts extracted from usage metadata

6. Batch API

Request formats: Inline requests array or file-based input
Pagination: Token-based with pageToken
Endpoints:
  • POST /v1beta/batchJobs - Create
  • GET /v1beta/batchJobs?pageSize={limit}&pageToken={token} - List
  • GET /v1beta/batchJobs/{batch_id} - Retrieve
  • POST /v1beta/batchJobs/{batch_id}:cancel - Cancel
Response Structure:
  • Status mapping: BATCH_STATE_PENDING/BATCH_STATE_RUNNING → in_progress, BATCH_STATE_SUCCEEDED → completed, BATCH_STATE_FAILED → failed, BATCH_STATE_CANCELLING → cancelling, BATCH_STATE_CANCELLED → cancelled, BATCH_STATE_EXPIRED → expired
  • Inline responses: Array in dest.inlinedResponses
  • File-based responses: JSONL file in dest.fileName
Note: RFC3339 timestamps converted to Unix timestamps
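The status mapping and timestamp conversion can be sketched as (hypothetical names; illustrative only):

```python
from datetime import datetime

# Hypothetical lookup mirroring the status mapping above.
BATCH_STATE_MAP = {
    "BATCH_STATE_PENDING": "in_progress",
    "BATCH_STATE_RUNNING": "in_progress",
    "BATCH_STATE_SUCCEEDED": "completed",
    "BATCH_STATE_FAILED": "failed",
    "BATCH_STATE_CANCELLING": "cancelling",
    "BATCH_STATE_CANCELLED": "cancelled",
    "BATCH_STATE_EXPIRED": "expired",
}

def rfc3339_to_unix(ts: str) -> int:
    """Convert an RFC3339 timestamp to Unix seconds."""
    return int(datetime.fromisoformat(ts.replace("Z", "+00:00")).timestamp())
```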

7. Files API

Supports file upload for batch processing and multimodal requests.
Upload: Multipart/form-data with file (binary) and filename (optional)
Field mapping:
  • name → id
  • displayName → filename
  • sizeBytes → size_bytes
  • mimeType → content_type
  • createTime (RFC3339) → Converted to Unix timestamp
Endpoints:
  • POST /upload/storage/v1beta/files - Upload
  • GET /v1beta/files?limit={limit}&pageToken={token} (cursor pagination)
  • GET /v1beta/files/{file_id} - Retrieve
  • DELETE /v1beta/files/{file_id} - Delete
  • GET /v1beta/files/{file_id}/content - Download

8. Image Generation

Gemini supports two image generation formats depending on the model:
  1. Standard Gemini Format: Uses the /v1beta/models/{model}:generateContent endpoint
  2. Imagen Format: Uses the /v1beta/models/{model}:predict endpoint for Imagen models (detected automatically)

Parameter Mapping

| Parameter | Transformation |
|---|---|
| prompt | Text description of the image to generate |
| n | Number of images (mapped to sampleCount for Imagen, candidateCount for Gemini) |
| size | Image size in WxH format (e.g., "1024x1024"). Converted to Imagen’s imageSize + aspectRatio format |
| output_format | Output format: "png", "jpeg", "webp". Converted to MIME type for Imagen |
| seed | Seed for reproducible generation (passed directly) |
| negative_prompt | Negative prompt (passed directly) |

Extra Parameters

Use extra_params (SDK) or pass directly in request body (Gateway) for Gemini-specific fields:
| Parameter | Type | Notes |
|---|---|---|
| personGeneration | string | Person generation setting (Imagen only) |
| language | string | Language code (Imagen only) |
| enhancePrompt | bool | Prompt enhancement flag (Imagen only) |
| safetySettings / safety_settings | string/array | Safety settings configuration |
| cachedContent / cached_content | string | Cached content ID |
| labels | object | Custom labels map |
curl -X POST http://localhost:8080/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini/imagen-4.0-generate-001",
    "prompt": "A sunset over the mountains",
    "size": "1024x1024",
    "n": 2,
    "output_format": "png"
  }'

Request Conversion

Standard Gemini Format

  • Model mapping: bifrostReq.Model → req.Model, with bifrostReq.Input.Prompt → req.Contents[0].Parts[0].Text
  • Response modality: Set internally by Bifrost to generationConfig.responseModalities = ["IMAGE"] to indicate image generation
  • Image count: Specify the number of images via n → generationConfig.candidateCount
  • Extra parameters: Include safetySettings, cachedContent, and labels mapped directly

Imagen Format

  • Prompt: bifrostReq.Prompt → req.Instances[0].Prompt
  • Number of Images: n → req.Parameters.SampleCount
  • Size Conversion: size (WxH format) converted to:
    • imageSize: "1k" (if dimensions ≤ 1024), "2k" (if dimensions ≤ 2048). Sizes larger than "2k" are not supported by Imagen models.
    • aspectRatio: "1:1", "3:4", "4:3", "9:16", or "16:9" (based on width/height ratio)
  • Output Format: output_format ("png", "jpeg") → parameters.outputOptions.mimeType ("image/png", "image/jpeg")
  • Seed & Negative Prompt: Passed directly to seed and parameters.negativePrompt
  • Extra Parameters: personGeneration, language, enhancePrompt, safetySettings mapped to parameters

Response Conversion

Standard Gemini Format

  • Image Data: Extracts InlineData from candidates[0].content.parts[] with MIME type image/*
  • Output Format: Converts MIME type (image/png, image/jpeg, image/webp) → file extension (png, jpeg, webp)
  • Usage: Extracts token usage from usageMetadata
  • Multiple Images: Each image part becomes an ImageData entry in the response array

Imagen Format

  • Image Data: Each prediction in response.predictions[] → ImageData with b64_json from bytesBase64Encoded
  • Output Format: Converts prediction.mimeType → file extension for the outputFormat field (Imagen doesn’t support webp)
  • Index: Each prediction gets an index (0, 1, 2, …) in the response array

Size Conversion

For the Imagen format, size is converted between formats:
  • Supported image sizes: "1k" (≤1024), "2k" (≤2048)
  • Supported aspect ratios: "1:1", "3:4", "4:3", "9:16", "16:9"
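A sketch of the WxH → imageSize/aspectRatio conversion (hypothetical `imagen_size` helper; the real nearest-ratio selection may differ):

```python
from math import gcd

def imagen_size(size: str):
    """Convert a "WxH" size string to (imageSize, aspectRatio).
    Only the ratios Imagen supports are returned; selection is
    simplified here to exact matches with a "1:1" fallback."""
    w, h = (int(p) for p in size.lower().split("x"))
    if max(w, h) > 2048:
        raise ValueError("Imagen supports at most 2k output")
    image_size = "1k" if max(w, h) <= 1024 else "2k"
    supported = {(1, 1): "1:1", (3, 4): "3:4", (4, 3): "4:3",
                 (9, 16): "9:16", (16, 9): "16:9"}
    ratio = supported.get((w // gcd(w, h), h // gcd(w, h)), "1:1")
    return image_size, ratio
```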

Endpoint Selection

The provider automatically selects the endpoint based on model name:
  • Imagen models (detected via schemas.IsImagenModel()): Uses /v1beta/models/{model}:predict endpoint
  • Other models: Uses /v1beta/models/{model}:generateContent endpoint with image response modality

Streaming

Image generation streaming is not supported by Gemini.

9. Image Edit

Requests use multipart/form-data, not JSON.
Gemini supports image editing through two different APIs depending on the model:
  1. Standard Gemini Format: Uses the /v1beta/models/{model}:generateContent endpoint (for Gemini models)
  2. Imagen Format: Uses the /v1beta/models/{model}:predict endpoint (for Imagen models, detected automatically)
Request Parameters
| Parameter | Type | Notes |
|---|---|---|
| model | string | Model identifier (Gemini or Imagen model) |
| prompt | string | Text description of the edit |
| image[] | binary | Image file(s) to edit (supports multiple images) |
| mask | binary | Mask image file |
| type | string | Edit type: "inpainting", "outpainting", "inpaint_removal", "bgswap" (Imagen only) |
| n | int | Number of images to generate (1-10) |
| output_format | string | Output format: "png", "webp", "jpeg" |
| output_compression | int | Compression level (0-100%) |
| seed | int | Seed for reproducibility (via ExtraParams["seed"]) |
| negative_prompt | string | Negative prompt (via ExtraParams["negativePrompt"]) |
| guidanceScale | int | Guidance scale (via ExtraParams["guidanceScale"], Imagen only) |
| baseSteps | int | Base steps (via ExtraParams["baseSteps"], Imagen only) |
| maskMode | string | Mask mode (via ExtraParams["maskMode"], Imagen only): "MASK_MODE_USER_PROVIDED", "MASK_MODE_BACKGROUND", "MASK_MODE_FOREGROUND", "MASK_MODE_SEMANTIC" |
| dilation | float | Mask dilation (via ExtraParams["dilation"], Imagen only): range [0, 1] |
| maskClasses | int[] | Mask classes (via ExtraParams["maskClasses"], Imagen only): for MASK_MODE_SEMANTIC |

Request Conversion

Standard Gemini Format (Non-Imagen Models)

  • Model & Prompt: bifrostReq.Model → req.Model, bifrostReq.Input.Prompt → req.Contents[0].Parts[0].Text
  • Images: Each image in bifrostReq.Input.Images is converted to a Part with:
    • MIME type detection (image/jpeg, image/webp, image/png) with fallback to image/png
    • Base64 encoding: image.ImagePart.InlineData.Data (base64 string)
    • MIME type: Part.InlineData.MIMEType
  • Response Modality: GenerationConfig.ResponseModalities is set to [ModalityImage] to indicate image generation
  • Extra Parameters: Extracted from ExtraParams:
    • safetySettings / safety_settings → SafetySettings
    • cachedContent / cached_content → CachedContent
    • labels → Labels (map[string]string)

Imagen Format (Imagen Models)

  • Reference Images: Each image in bifrostReq.Input.Images is converted to ReferenceImage with:
    • ReferenceType: "REFERENCE_TYPE_RAW"
    • ReferenceID: Sequential IDs starting from 1
    • ReferenceImage.BytesBase64Encoded: Base64-encoded image data
  • Mask Configuration: If Params.Mask is provided or maskMode is specified:
    • Default maskMode: "MASK_MODE_USER_PROVIDED" when mask data is present
    • maskMode can be overridden via ExtraParams["maskMode"]
    • dilation extracted from ExtraParams["dilation"] (validated to range [0, 1])
    • maskClasses extracted from ExtraParams["maskClasses"] (for MASK_MODE_SEMANTIC)
    • Mask image (if provided) is base64-encoded and added as ReferenceType: "REFERENCE_TYPE_MASK"
  • Edit Mode Mapping: Params.Type is mapped to EditMode:
    • "inpainting""EDIT_MODE_INPAINT_INSERTION"
    • "outpainting""EDIT_MODE_OUTPAINT"
    • "inpaint_removal""EDIT_MODE_INPAINT_REMOVAL"
    • "bgswap""EDIT_MODE_BGSWAP"
    • If Type is not set, editMode can be specified directly via ExtraParams["editMode"]
  • Parameters:
    • n → Parameters.SampleCount
    • output_format → Parameters.OutputOptions.MimeType (converted: "png" → "image/png", etc.)
    • output_compression → Parameters.OutputOptions.CompressionQuality
    • seed (via ExtraParams["seed"]) → Parameters.Seed
    • negativePrompt (via ExtraParams["negativePrompt"]) → Parameters.NegativePrompt
    • guidanceScale (via ExtraParams["guidanceScale"]) → Parameters.GuidanceScale
    • baseSteps (via ExtraParams["baseSteps"]) → Parameters.BaseSteps
    • Additional Imagen-specific parameters: addWatermark, includeRaiReason, includeSafetyAttributes, personGeneration, safetySetting, language, storageUri
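The edit-type mapping above is a straight lookup; a sketch (hypothetical `EDIT_MODE_MAP` constant, not Bifrost's actual code):

```python
# Hypothetical lookup mirroring the Edit Mode Mapping bullets above.
EDIT_MODE_MAP = {
    "inpainting": "EDIT_MODE_INPAINT_INSERTION",
    "outpainting": "EDIT_MODE_OUTPAINT",
    "inpaint_removal": "EDIT_MODE_INPAINT_REMOVAL",
    "bgswap": "EDIT_MODE_BGSWAP",
}
```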
Response Conversion
  • Standard Gemini Format: Uses the same response conversion as image generation (see Image Generation section)
  • Imagen Format: Uses the same response conversion as Imagen image generation (see Image Generation section)
Endpoint Selection

The provider automatically selects the endpoint based on model name:
  • Imagen models (detected via schemas.IsImagenModel()): Uses /v1beta/models/{model}:predict endpoint
  • Other models: Uses /v1beta/models/{model}:generateContent endpoint with image response modality
Streaming

Image edit streaming is not supported by Gemini.

Image Variation

Image variation is not supported by Gemini.

10. List Models

Request: GET /v1beta/models?pageSize={limit}&pageToken={token} (no body)
Field mapping:
  • name (remove “models/” prefix) → id (add “gemini/” prefix)
  • displayName → name
  • description → description
  • inputTokenLimit → max_input_tokens
  • outputTokenLimit → max_output_tokens
  • Context length = inputTokenLimit + outputTokenLimit
Pagination: Token-based with nextPageToken
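The id mapping can be sketched as (hypothetical `to_bifrost_model_id` helper):

```python
def to_bifrost_model_id(gemini_name: str) -> str:
    """Strip the "models/" prefix and add the "gemini/" provider prefix,
    per the field mapping above."""
    return "gemini/" + gemini_name.removeprefix("models/")
```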

Content Type Support

Bifrost supports the following content modalities through Gemini:
| Content Type | Support | Notes |
|---|---|---|
| Text | ✅ | Full support |
| Images (URL/Base64) | ✅ | Converted to {type: "image", source: {...}} |
| Video | ✅ | With fps, start/end offset metadata |
| Audio | ⚠️ | Via file references only |
| PDF | ✅ | Via file references |
| Code Execution | ✅ | Auto-executed with results returned |
| Thinking/Reasoning | ✅ | Thought parts marked with thought: true |
| Function Calls | ✅ | With optional thought signatures |

Caveats

| Severity | Behavior | Impact | Code |
|---|---|---|---|
| High | Consecutive tool response messages merged into a single user message | Message count and structure changes | chat.go:627-678 |
| Medium | Thought content appears as text parts with thought: true flag | Requires checking the thought flag to distinguish from regular text | chat.go:242-244, 302-304 |
| Low | Tool call args (object) converted to arguments (JSON string) | Requires JSON parsing to access arguments | chat.go:101-106 |
| Low | thoughtSignature base64 URL-safe encoded, auto-converted during unmarshal | Transparent to user; handled automatically | types.go:1048-1063 |
| Medium | finish_reason only present in final stream chunk with usage metadata | Cannot determine completion until end of stream | chat.go:206-208, 325-328 |
| Low | Cached tokens reported in prompt_tokens_details.cached_tokens; cannot distinguish cache creation vs read | Billing estimates may be approximate | utils.go:270-274 |
| Medium | System instructions become systemInstruction field (separate from messages), not included in message array | Structure differs from OpenAI’s system message approach | responses.go:34-46 |