Overview

Vertex AI is Google’s unified ML platform, providing access to Google’s Gemini models, Anthropic Claude models, and other third-party LLMs through a single API. Bifrost’s Vertex AI integration provides:
  • Multi-model support - Unified interface for Gemini, Anthropic, and third-party models
  • OAuth2 authentication - Service account credentials with automatic token refresh
  • Project and region management - Automatic endpoint construction from GCP project/region
  • Model routing - Automatic provider detection (Gemini vs Anthropic) based on model name
  • Request conversion - Conversion to underlying provider format (Gemini or Anthropic)
  • Embeddings support - Vector generation with task type and truncation options
  • Model discovery - Paginated model listing with deployment information

Supported Operations

| Operation | Non-Streaming | Streaming | Endpoint |
|---|---|---|---|
| Chat Completions | ✅ | ✅ | `/generate` |
| Responses API | ✅ | ✅ | `/messages` |
| Embeddings | ✅ | - | `/embeddings` |
| List Models | ✅ | - | `/models` |
| Text Completions | ❌ | ❌ | - |
| Speech (TTS) | ❌ | ❌ | - |
| Transcriptions (STT) | ❌ | ❌ | - |
| Files | ❌ | ❌ | - |
| Batch | ❌ | ❌ | - |

Unsupported Operations (❌): Text Completions, Speech, Transcriptions, Files, and Batch are not supported by Vertex AI; these return `UnsupportedOperationError`.

Vertex-specific: Endpoints vary by model type. The Responses API is available for both Gemini and Anthropic models.

1. Chat Completions

Request Parameters

Core Parameter Mapping

| Parameter | Vertex Handling | Notes |
|---|---|---|
| `model` | Maps to Vertex model ID | Region-specific endpoint constructed automatically |
| All other params | Model-specific conversion | Converted per underlying provider (Gemini/Anthropic) |
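Routing between the Gemini and Anthropic conversion paths happens automatically based on the model name. The actual detection lives in `vertex.go`; the substring check below is a hypothetical, illustrative stand-in for that logic:

```python
def detect_vertex_provider(model: str) -> str:
    """Guess the underlying provider for a Vertex model name.

    Illustrative sketch only: Bifrost's real routing logic is in
    vertex.go; this substring check is an assumption, not its code.
    """
    if "claude" in model.lower():
        return "anthropic"
    return "gemini"

print(detect_vertex_provider("claude-3-5-sonnet"))  # anthropic
print(detect_vertex_provider("gemini-2.0-flash"))   # gemini
```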

Key Configuration

The key configuration for Vertex requires Google Cloud credentials:
```json
{
  "vertex_key_config": {
    "project_id": "my-gcp-project",
    "region": "us-central1",
    "auth_credentials": "{service-account-json}"
  }
}
```
Configuration Details:
  • project_id - GCP project ID (required)
  • region - GCP region for API endpoints (required)
    • Examples: us-central1, us-west1, europe-west1, global
  • auth_credentials - Service account JSON credentials (optional if using default credentials)

Authentication Methods

  1. Service Account JSON (recommended for production)
    {"auth_credentials": "{full-service-account-json}"}
    
  2. Application Default Credentials (for local development)
    • Requires GOOGLE_APPLICATION_CREDENTIALS environment variable
    • Leave auth_credentials empty
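Per the Caveats section, Bifrost caches OAuth2 access tokens and refreshes them when they expire, so only the first request pays the auth round trip. A minimal sketch of that caching pattern, with a hypothetical `fetch_token` callable standing in for the real OAuth2 token exchange:

```python
import time

class TokenCache:
    """Cache an OAuth2 access token and refresh it shortly before expiry.

    Sketch of the caching pattern only; Bifrost's actual implementation
    is in vertex.go. fetch_token is a hypothetical callable returning
    (token, expires_in_seconds).
    """

    def __init__(self, fetch_token, skew_seconds=60):
        self._fetch = fetch_token
        self._skew = skew_seconds  # refresh slightly early to avoid races
        self._token = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = time.monotonic()
        if self._token is None or now >= self._expires_at - self._skew:
            token, expires_in = self._fetch()
            self._token = token
            self._expires_at = now + expires_in
        return self._token
```

Subsequent calls to `get()` return the cached token until it nears expiry, at which point a single refresh is performed.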

Gemini Models

When using Google’s Gemini models, Bifrost converts requests to Gemini’s API format.

Parameter Mapping for Gemini

All Gemini-compatible parameters are supported. Special handling includes:
  • System prompts: Converted to Gemini’s system message format
  • Tool usage: Mapped to Gemini’s function calling format
  • Streaming: Uses Gemini’s streaming protocol
Refer to the Gemini documentation for conversion details.

Anthropic Models (Claude)

When using Anthropic models through Vertex AI, Bifrost converts requests to Anthropic’s message format.

Parameter Mapping for Anthropic

All Anthropic-standard parameters are supported:
  • Reasoning/Thinking: reasoning parameters converted to thinking structure
  • System messages: Extracted and placed in separate system field
  • Tool message grouping: Consecutive tool messages merged
  • API version: Automatically set to vertex-2023-10-16 for Anthropic models
Refer to the Anthropic documentation for conversion details.

Special Notes for Vertex + Anthropic

  • Responses API uses special /v1/messages endpoint
  • anthropic_version automatically set to vertex-2023-10-16
  • Minimum reasoning budget: 1024 tokens
  • Model field removed from request (Vertex uses different identification)

Region Selection

The region determines the API endpoint:
| Region | Endpoint | Purpose |
|---|---|---|
| `us-central1` | `us-central1-aiplatform.googleapis.com` | US Central |
| `us-west1` | `us-west1-aiplatform.googleapis.com` | US West |
| `europe-west1` | `europe-west1-aiplatform.googleapis.com` | Europe West |
| `global` | `aiplatform.googleapis.com` | Global (no region prefix) |
Availability varies by region. Check GCP documentation for model availability.
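The region-to-endpoint mapping follows the endpoint format given in the Configuration section, with the region prefix dropped for `global`. A sketch of that construction:

```python
def vertex_endpoint(region: str, project: str, resource: str) -> str:
    """Build a Vertex AI endpoint URL following the pattern
    https://{region}-aiplatform.googleapis.com/v1/projects/{project}/locations/{region}/{resource}
    with the region prefix omitted for the global region."""
    host = ("aiplatform.googleapis.com" if region == "global"
            else f"{region}-aiplatform.googleapis.com")
    return f"https://{host}/v1/projects/{project}/locations/{region}/{resource}"

print(vertex_endpoint("us-central1", "my-gcp-project", "models"))
```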

Streaming

Streaming format depends on model type:
  • Gemini models: Standard Gemini streaming with server-sent events
  • Anthropic models: Anthropic message streaming format
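Both formats arrive as server-sent events with JSON payloads on `data:` lines. A simplified client-side parser, assuming one payload per line (real SSE also allows multi-line data fields, comments, and `event:`/`id:` fields):

```python
import json

def parse_sse_chunks(stream_text: str):
    """Extract JSON payloads from server-sent-event 'data:' lines.

    Simplified sketch for illustration; not Bifrost's streaming code.
    Skips empty payloads and the conventional '[DONE]' sentinel.
    """
    for line in stream_text.splitlines():
        if line.startswith("data:"):
            payload = line[len("data:"):].strip()
            if payload and payload != "[DONE]":
                yield json.loads(payload)
```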

2. Responses API

The Responses API is available for both Anthropic (Claude) and Gemini models on Vertex AI.

Request Parameters

Core Parameter Mapping

| Parameter | Vertex Handling | Notes |
|---|---|---|
| `instructions` | Becomes system message | Model-specific conversion |
| `input` | Converted to messages | String or array support |
| `max_output_tokens` | Model-specific field mapping | Gemini vs Anthropic conversion |
| All other params | Model-specific conversion | Converted per underlying provider |

Gemini Models

For Gemini models, conversion follows Gemini’s Responses API format.

Anthropic Models (Claude)

For Anthropic models, conversion follows Anthropic’s message format:
  • instructions becomes system message
  • reasoning mapped to thinking structure
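The Anthropic-side conversion described above can be sketched as follows. The field names (`instructions` → `system`, `input` → `messages`, the `vertex-2023-10-16` version, the 1024-token minimum reasoning budget) come from this page; the exact shape of the `thinking` structure is an illustrative assumption:

```python
def responses_to_anthropic(req: dict) -> dict:
    """Sketch of converting a Responses API request into Anthropic's
    message format per the mapping above. The 'thinking' field shape is
    an assumption, not Bifrost's exact implementation."""
    body = {"anthropic_version": "vertex-2023-10-16"}
    if "instructions" in req:
        body["system"] = req["instructions"]        # instructions -> system
    user_input = req.get("input")
    if isinstance(user_input, str):                 # string or array input
        user_input = [{"type": "text", "text": user_input}]
    body["messages"] = [{"role": "user", "content": user_input}]
    if "max_output_tokens" in req:
        body["max_tokens"] = req["max_output_tokens"]
    if "reasoning" in req:                          # reasoning -> thinking
        budget = max(req["reasoning"].get("budget_tokens", 1024), 1024)
        body["thinking"] = {"type": "enabled", "budget_tokens": budget}
    return body
```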

Configuration

```bash
curl -X POST http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -H "X-Goog-Authorization: Bearer {token}" \
  -d '{
    "model": "vertex/claude-3-5-sonnet",
    "input": "What is AI?",
    "instructions": "You are a helpful assistant",
    "project_id": "my-gcp-project",
    "region": "us-central1"
  }'
```

Special Handling

  • Endpoint: /v1/messages (Anthropic format)
  • anthropic_version set to vertex-2023-10-16 automatically
  • Model and region fields removed from request
  • Raw request body passthrough supported
Refer to Anthropic Responses API for parameter details.

3. Embeddings

Embeddings are supported for Gemini and other models that support embedding generation.

Request Parameters

Core Parameters

| Parameter | Vertex Mapping | Notes |
|---|---|---|
| `input` | `instances[].content` | Text to embed |
| `dimensions` | `parameters.outputDimensionality` | Optional output size |
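The mapping above can be sketched as a payload builder. Field names (`instances[].content`, `parameters.outputDimensionality`, `task_type`, `title`, `autoTruncate`) come from this page; placing `task_type` and `title` on each instance is an assumption about the Vertex payload shape:

```python
def to_vertex_embedding_payload(req: dict) -> dict:
    """Sketch mapping an OpenAI-style embeddings request onto Vertex's
    predict payload, per the parameter table above. Illustrative only."""
    instances = [{"content": text} for text in req["input"]]
    if "task_type" in req:
        for inst in instances:
            inst["task_type"] = req["task_type"]
            if "title" in req:          # title accompanies task_type
                inst["title"] = req["title"]
    parameters = {}
    if "dimensions" in req:
        parameters["outputDimensionality"] = req["dimensions"]
    if "autoTruncate" in req:
        parameters["autoTruncate"] = req["autoTruncate"]
    return {"instances": instances, "parameters": parameters}
```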

Advanced Parameters

Use extra_params for embedding-specific options:
```bash
curl -X POST http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-004",
    "input": ["text to embed"],
    "dimensions": 256,
    "task_type": "RETRIEVAL_DOCUMENT",
    "title": "Document title",
    "project_id": "my-gcp-project",
    "region": "us-central1",
    "autoTruncate": true
  }'
```

Embedding Parameters

| Parameter | Type | Description |
|---|---|---|
| `task_type` | string | Task type hint: `RETRIEVAL_QUERY`, `RETRIEVAL_DOCUMENT`, `SEMANTIC_SIMILARITY`, `CLASSIFICATION`, `CLUSTERING` (optional) |
| `title` | string | Optional title to help the model produce better embeddings (used with `task_type`) |
| `autoTruncate` | boolean | Auto-truncate input to max tokens (defaults to `true`) |

Task Type Effects

Different task types optimize embeddings for specific use cases:
  • RETRIEVAL_DOCUMENT - Optimized for documents in retrieval systems
  • RETRIEVAL_QUERY - Optimized for queries searching documents
  • SEMANTIC_SIMILARITY - Optimized for semantic similarity tasks
  • CLASSIFICATION - For classification tasks
  • CLUSTERING - For clustering tasks

Response Conversion

Embeddings response includes vectors and truncation information:
```json
{
  "embeddings": [
    {
      "values": [0.1234, -0.5678, ...],
      "statistics": {
        "token_count": 15,
        "truncated": false
      }
    }
  ]
}
```
Response Fields:
  • values - Embedding vector as floats
  • statistics.token_count - Input token count
  • statistics.truncated - Whether input was truncated due to length
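As noted in the Caveats section, Vertex returns embedding values as float64 and Bifrost narrows them to float32, with minor precision loss. The effect can be reproduced by round-tripping values through IEEE-754 single precision:

```python
import struct

def to_float32(values):
    """Round-trip float64 values through IEEE-754 float32, mirroring the
    narrowing Bifrost applies to Vertex's float64 embedding values."""
    return [struct.unpack("f", struct.pack("f", v))[0] for v in values]

narrowed = to_float32([0.1234, -0.5678])
# The narrowed values differ from the originals only beyond roughly
# seven significant decimal digits, which is negligible for embeddings.
```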

4. List Models

Request Parameters

None required. Automatically uses project_id and region from key config.

Response Conversion

Lists models available in the specified project and region with metadata and deployment information:
```json
{
  "models": [
    {
      "name": "projects/{project}/locations/{region}/models/gemini-2.0-flash",
      "display_name": "Gemini 2.0 Flash",
      "description": "Fast multimodal model",
      "version_id": "1",
      "version_aliases": ["latest", "stable"],
      "capabilities": [...],
      "deployed_models": [...]
    }
  ],
  "next_page_token": "..."
}
```

Pagination

Model listing is paginated automatically. If more than 100 models exist, next_page_token will be present. Bifrost handles pagination internally.
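The token-following loop Bifrost runs internally can be sketched as below; `fetch_page` is a hypothetical callable standing in for the actual Vertex list call:

```python
def list_all_models(fetch_page):
    """Follow next_page_token until exhausted, as Bifrost does internally.

    fetch_page is a hypothetical callable: (page_token or None) -> response
    dict with 'models' and an optional 'next_page_token'.
    """
    models, token = [], None
    while True:
        page = fetch_page(token)
        models.extend(page.get("models", []))
        token = page.get("next_page_token")
        if not token:
            return models
```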

Caveats

  • Severity: High - Both `project_id` and `region` are required for all operations. Impact: requests fail without valid GCP project/region configuration. Code: `vertex.go:127-138`
  • Severity: Medium - Tokens are cached and automatically refreshed when expired. Impact: first request is slightly slower due to auth; cached for subsequent requests. Code: `vertex.go:34-55`
  • Severity: Medium - Automatic detection of Anthropic vs Gemini models. Impact: different conversion logic applied transparently. Code: `vertex.go` chat/responses endpoints
  • Severity: Low - The Responses API automatically routes to the Anthropic or Gemini implementation based on model. Impact: different conversion logic applied transparently per model. Code: `vertex.go:836-1080`
  • Severity: Low - `anthropic_version` is always set to `vertex-2023-10-16` for Claude. Impact: the Anthropic version cannot be overridden for Claude on Vertex. Code: `utils.go:33, 71`
  • Severity: Low - Vertex returns float64 embeddings, converted to float32 for Bifrost. Impact: minor precision loss (expected for embeddings). Code: `embedding.go:84-87`

Configuration

HTTP Settings:
  • OAuth2 authentication with automatic token refresh
  • Region-specific endpoints
  • Max Connections: 5000
  • Max Idle: 60 seconds
OAuth Scope: https://www.googleapis.com/auth/cloud-platform
Endpoint Format: https://{region}-aiplatform.googleapis.com/v1/projects/{project}/locations/{region}/{resource}
Note: For the global region, the endpoint is https://aiplatform.googleapis.com/v1/projects/{project}/locations/global/{resource}

Setup & Configuration

Vertex AI requires project configuration, region selection, and Google Cloud authentication. For detailed setup instructions, see Provider-Specific Authentication - Google Vertex in the Gateway Quickstart, which covers configuration via the Web UI, API, or config.json.