Overview
Vertex AI is Google’s unified ML platform providing access to Google’s Gemini models, Anthropic Claude models, and other third-party LLMs through a single API. Bifrost performs conversions including:
- Multi-model support - Unified interface for Gemini, Anthropic, and third-party models
- OAuth2 authentication - Service account credentials with automatic token refresh
- Project and region management - Automatic endpoint construction from GCP project/region
- Model routing - Automatic provider detection (Gemini vs Anthropic) based on model name
- Request conversion - Conversion to underlying provider format (Gemini or Anthropic)
- Embeddings support - Vector generation with task type and truncation options
- Model discovery - Paginated model listing with deployment information
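The model-routing behavior above (Gemini vs Anthropic) can be pictured with a short sketch; the claude substring check is an assumption used for illustration, not necessarily Bifrost's exact heuristic.

```go
package main

import (
	"fmt"
	"strings"
)

// isAnthropicModel sketches how a Vertex model ID might be routed to the
// Anthropic conversion path; the "claude" check is illustrative only.
func isAnthropicModel(model string) bool {
	return strings.Contains(strings.ToLower(model), "claude")
}

func main() {
	for _, m := range []string{"gemini-1.5-pro", "claude-3-5-sonnet@20240620"} {
		if isAnthropicModel(m) {
			fmt.Println(m, "-> Anthropic conversion")
		} else {
			fmt.Println(m, "-> Gemini conversion")
		}
	}
}
```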
Supported Operations
| Operation | Non-Streaming | Streaming | Endpoint |
|---|---|---|---|
| Chat Completions | ✅ | ✅ | /generate |
| Responses API | ✅ | ✅ | /messages |
| Embeddings | ✅ | - | /embeddings |
| List Models | ✅ | - | /models |
| Text Completions | ❌ | ❌ | - |
| Speech (TTS) | ❌ | ❌ | - |
| Transcriptions (STT) | ❌ | ❌ | - |
| Files | ❌ | ❌ | - |
| Batch | ❌ | ❌ | - |
Unsupported Operations (❌): Text Completions, Speech, Transcriptions, Files, and Batch are not supported by Vertex AI. These return UnsupportedOperationError.
Vertex-specific: Endpoints vary by model type. The Responses API is available for both Gemini and Anthropic models.
1. Chat Completions
Request Parameters
Core Parameter Mapping
| Parameter | Vertex Handling | Notes |
|---|---|---|
| model | Maps to Vertex model ID | Region-specific endpoint constructed automatically |
| All other params | Model-specific conversion | Converted per underlying provider (Gemini/Anthropic) |
Key Configuration
The key configuration for Vertex requires Google Cloud credentials:
- project_id - GCP project ID (required)
- region - GCP region for API endpoints (required)
  - Examples: us-central1, us-west1, europe-west1, global
- auth_credentials - Service account JSON credentials (optional if using default credentials)
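As a rough illustration of the shape these three keys take together, here is a placeholder Go struct; the type name and JSON tags are assumptions for this sketch, not Bifrost's actual key type.

```go
package vertexdoc

// VertexKeyConfig is a placeholder type sketching the Vertex key fields
// described above; the struct name and tags are illustrative only.
type VertexKeyConfig struct {
	ProjectID       string `json:"project_id"`       // required: GCP project ID
	Region          string `json:"region"`           // required: e.g. "us-central1" or "global"
	AuthCredentials string `json:"auth_credentials"` // optional: service account JSON; empty to use default credentials
}
```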
Authentication Methods
- Service Account JSON (recommended for production)
- Application Default Credentials (for local development)
  - Requires the GOOGLE_APPLICATION_CREDENTIALS environment variable
  - Leave auth_credentials empty
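Both methods boil down to obtaining a cloud-platform-scoped OAuth2 token. The sketch below uses the standard golang.org/x/oauth2/google package to illustrate the flow; it is not Bifrost's implementation, and the service-account.json filename is just an example.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"os"

	"golang.org/x/oauth2/google"
)

const scope = "https://www.googleapis.com/auth/cloud-platform"

func main() {
	ctx := context.Background()

	var creds *google.Credentials
	var err error
	if raw, readErr := os.ReadFile("service-account.json"); readErr == nil {
		// Service Account JSON (the auth_credentials path).
		creds, err = google.CredentialsFromJSON(ctx, raw, scope)
	} else {
		// Application Default Credentials (GOOGLE_APPLICATION_CREDENTIALS or ambient credentials).
		creds, err = google.FindDefaultCredentials(ctx, scope)
	}
	if err != nil {
		log.Fatal(err)
	}

	// The token source hands back a cached token and refreshes it when it expires.
	tok, err := creds.TokenSource.Token()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("token expires at:", tok.Expiry)
}
```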
Gemini Models
When using Google’s Gemini models, Bifrost converts requests to Gemini’s API format.
Parameter Mapping for Gemini
All Gemini-compatible parameters are supported. Special handling includes:
- System prompts: Converted to Gemini’s system message format
- Tool usage: Mapped to Gemini’s function calling format
- Streaming: Uses Gemini’s streaming protocol
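For orientation, here is roughly where a chat request with a system prompt lands in Gemini's generateContent shape; the JSON is hand-assembled for illustration and is not Bifrost's literal output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	// Illustrative target shape for Gemini's generateContent request: the system
	// prompt moves into systemInstruction and the chat turns into contents.
	body := map[string]any{
		"systemInstruction": map[string]any{
			"parts": []map[string]any{{"text": "You are a helpful assistant."}},
		},
		"contents": []map[string]any{
			{"role": "user", "parts": []map[string]any{{"text": "Hello!"}}},
		},
		"generationConfig": map[string]any{"maxOutputTokens": 256},
	}
	out, _ := json.MarshalIndent(body, "", "  ")
	fmt.Println(string(out))
}
```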
Anthropic Models (Claude)
When using Anthropic models through Vertex AI, Bifrost converts requests to Anthropic’s message format.
Parameter Mapping for Anthropic
All Anthropic-standard parameters are supported:
- Reasoning/Thinking: reasoning parameters converted to thinking structure
- System messages: Extracted and placed in a separate system field
- Tool message grouping: Consecutive tool messages merged
- API version: Automatically set to vertex-2023-10-16 for Anthropic models
Special Notes for Vertex + Anthropic
- Responses API uses the special /v1/messages endpoint
- anthropic_version automatically set to vertex-2023-10-16
- Minimum reasoning budget: 1024 tokens
- Model field removed from request (Vertex uses different identification)
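A minimal sketch of those body-level adjustments, written against a plain map rather than Bifrost's real types:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// prepareClaudeForVertex sketches the Vertex-specific adjustments: inject the
// pinned anthropic_version and drop the model field, since Vertex identifies
// the model through the URL rather than the request body.
func prepareClaudeForVertex(body map[string]any) map[string]any {
	body["anthropic_version"] = "vertex-2023-10-16"
	delete(body, "model")
	return body
}

func main() {
	body := map[string]any{
		"model":      "claude-3-5-sonnet@20240620",
		"max_tokens": 1024,
		"messages": []map[string]any{
			{"role": "user", "content": "Hello!"},
		},
	}
	out, _ := json.MarshalIndent(prepareClaudeForVertex(body), "", "  ")
	fmt.Println(string(out))
}
```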
Region Selection
The region determines the API endpoint:
| Region | Endpoint | Purpose |
|---|---|---|
| us-central1 | us-central1-aiplatform.googleapis.com | US Central |
| us-west1 | us-west1-aiplatform.googleapis.com | US West |
| europe-west1 | europe-west1-aiplatform.googleapis.com | Europe West |
| global | aiplatform.googleapis.com | Global (no region prefix) |
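Combining the region with the project yields the full base URL (see Endpoint Format under Configuration below); a small illustrative helper:

```go
package main

import "fmt"

// vertexEndpoint builds the base URL for a resource from project and region,
// following the documented endpoint format. The global region has no region
// prefix on the hostname.
func vertexEndpoint(project, region, resource string) string {
	host := "aiplatform.googleapis.com"
	if region != "global" {
		host = region + "-" + host
	}
	return fmt.Sprintf("https://%s/v1/projects/%s/locations/%s/%s", host, project, region, resource)
}

func main() {
	fmt.Println(vertexEndpoint("my-project", "us-central1", "publishers/google/models/gemini-1.5-pro:generateContent"))
	fmt.Println(vertexEndpoint("my-project", "global", "publishers/google/models/gemini-1.5-pro:generateContent"))
}
```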
Streaming
Streaming format depends on model type:
- Gemini models: Standard Gemini streaming with server-sent events
- Anthropic models: Anthropic message streaming format
2. Responses API
The Responses API is available for both Anthropic (Claude) and Gemini models on Vertex AI.
Request Parameters
Core Parameter Mapping
| Parameter | Vertex Handling | Notes |
|---|---|---|
| instructions | Becomes system message | Model-specific conversion |
| input | Converted to messages | String or array support |
| max_output_tokens | Model-specific field mapping | Gemini vs Anthropic conversion |
| All other params | Model-specific conversion | Converted per underlying provider |
Gemini Models
For Gemini models, conversion follows Gemini’s Responses API format.
Anthropic Models (Claude)
For Anthropic models, conversion follows Anthropic’s message format:
- instructions becomes the system message
- reasoning mapped to the thinking structure
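A rough sketch of those two mappings, applying the 1024-token minimum reasoning budget noted earlier; the output follows Anthropic's messages shape and is hand-assembled for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	instructions := "You are a terse assistant."
	reasoningBudget := 512 // requested budget; Vertex + Anthropic enforces a 1024-token minimum
	if reasoningBudget < 1024 {
		reasoningBudget = 1024
	}

	// Converted Anthropic-style body: instructions become the system field,
	// reasoning becomes the thinking block.
	body := map[string]any{
		"system": instructions,
		"thinking": map[string]any{
			"type":          "enabled",
			"budget_tokens": reasoningBudget,
		},
		"messages": []map[string]any{
			{"role": "user", "content": "Summarize this page."},
		},
		"max_tokens": 2048,
	}
	out, _ := json.MarshalIndent(body, "", "  ")
	fmt.Println(string(out))
}
```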
Configuration
Configuration examples for both the Gateway and the Go SDK are covered in the quickstart guides (see Setup & Configuration below).
Special Handling
- Endpoint: /v1/messages (Anthropic format)
- anthropic_version set to vertex-2023-10-16 automatically
- Model and region fields removed from request
- Raw request body passthrough supported
3. Embeddings
Embeddings are supported for Gemini and other models that support embedding generation.
Request Parameters
Core Parameters
| Parameter | Vertex Mapping | Notes |
|---|---|---|
| input | instances[].content | Text to embed |
| dimensions | parameters.outputDimensionality | Optional output size |
Advanced Parameters
Use extra_params for embedding-specific options; the same options apply whether you use the Gateway or the Go SDK.
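For reference, these options land in Vertex's predict-style embedding request roughly as follows; the body below is hand-assembled to show the target shape, not Bifrost's literal output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	// Vertex-side embedding request: the text goes into instances[].content,
	// task_type and title ride along on the instance, and outputDimensionality
	// plus autoTruncate sit under parameters.
	body := map[string]any{
		"instances": []map[string]any{
			{
				"content":   "What is the capital of France?",
				"task_type": "RETRIEVAL_QUERY",
			},
		},
		"parameters": map[string]any{
			"outputDimensionality": 256,
			"autoTruncate":         true,
		},
	}
	out, _ := json.MarshalIndent(body, "", "  ")
	fmt.Println(string(out))
}
```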
Embedding Parameters
| Parameter | Type | Description |
|---|---|---|
| task_type | string | Task type hint: RETRIEVAL_QUERY, RETRIEVAL_DOCUMENT, SEMANTIC_SIMILARITY, CLASSIFICATION, CLUSTERING (optional) |
| title | string | Optional title to help the model produce better embeddings (used with task_type) |
| autoTruncate | boolean | Auto-truncate input to max tokens (defaults to true) |
Task Type Effects
Different task types optimize embeddings for specific use cases:
- RETRIEVAL_DOCUMENT - Optimized for documents in retrieval systems
- RETRIEVAL_QUERY - Optimized for queries searching documents
- SEMANTIC_SIMILARITY - Optimized for semantic similarity tasks
- CLASSIFICATION - For classification tasks
- CLUSTERING - For clustering tasks
Response Conversion
The embeddings response includes vectors and truncation information:
- values - Embedding vector as floats
- statistics.token_count - Input token count
- statistics.truncated - Whether input was truncated due to length
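The float narrowing mentioned in the caveats below is mechanically simple; a sketch:

```go
package main

import "fmt"

// toFloat32 narrows Vertex's float64 embedding values to float32, the type
// Bifrost exposes; the small precision loss is acceptable for embeddings.
func toFloat32(values []float64) []float32 {
	out := make([]float32, len(values))
	for i, v := range values {
		out[i] = float32(v)
	}
	return out
}

func main() {
	fmt.Println(toFloat32([]float64{0.12345678901234, -0.98765432109876}))
}
```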
4. List Models
Request Parameters
None required. Automatically uses project_id and region from key config.
Response Conversion
Lists models available in the specified project and region with metadata and deployment information.
Pagination
Model listing is paginated automatically. If more than 100 models exist, next_page_token will be present. Bifrost handles pagination internally.
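Internally this amounts to following the page token until it comes back empty. The listPage helper below is a hypothetical stand-in for the underlying HTTP call, used only to illustrate the loop.

```go
package main

import "fmt"

// modelPage is a placeholder for one page of results; NextPageToken is empty
// on the final page.
type modelPage struct {
	Models        []string
	NextPageToken string
}

// listPage is a hypothetical stand-in for the HTTP call that fetches one page
// of models for the configured project and region.
func listPage(pageToken string) modelPage {
	if pageToken == "" {
		return modelPage{Models: []string{"gemini-1.5-pro"}, NextPageToken: "page-2"}
	}
	return modelPage{Models: []string{"claude-3-5-sonnet@20240620"}}
}

func main() {
	var all []string
	token := ""
	for {
		page := listPage(token)
		all = append(all, page.Models...)
		if page.NextPageToken == "" {
			break
		}
		token = page.NextPageToken
	}
	fmt.Println(all)
}
```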
Caveats
Project ID and Region Required
Severity: High
Behavior: Both project_id and region required for all operations
Impact: Request fails without valid GCP project/region configuration
Code: vertex.go:127-138
OAuth2 Token Management
Severity: Medium
Behavior: Tokens cached and automatically refreshed when expired
Impact: First request slightly slower due to auth; cached for subsequent requests
Code: vertex.go:34-55
Anthropic Model Detection
Severity: Medium
Behavior: Automatic detection of Anthropic vs Gemini models
Impact: Different conversion logic applied transparently
Code: vertex.go chat/responses endpoints
Model-Specific Responses API Handling
Severity: Low
Behavior: Responses API automatically routes to Anthropic or Gemini implementation based on model
Impact: Different conversion logic applied transparently per model
Code: vertex.go:836-1080
Anthropic Version Lock
Severity: Low
Behavior: anthropic_version always set to vertex-2023-10-16 for Claude
Impact: Cannot override Anthropic version for Claude on Vertex
Code: utils.go:33, 71
Embeddings Float64 Conversion
Severity: Low
Behavior: Vertex returns float64 embeddings, converted to float32 for Bifrost
Impact: Minor precision loss (expected for embeddings)
Code: embedding.go:84-87
Configuration
HTTP Settings: OAuth2 authentication with automatic token refresh | Region-specific endpoints | Max Connections 5000 | Max Idle 60 seconds
Scope: https://www.googleapis.com/auth/cloud-platform
Endpoint Format: https://{region}-aiplatform.googleapis.com/v1/projects/{project}/locations/{region}/{resource}
Note: For global region, endpoint is https://aiplatform.googleapis.com/v1/projects/{project}/locations/global/{resource}
Setup & Configuration
Vertex AI requires project configuration, region selection, and Google Cloud authentication. For detailed instructions on setting up Vertex AI, see the quickstart guides:
- Gateway
- Go SDK
See Provider-Specific Authentication - Google Vertex in the Gateway Quickstart for configuration steps using Web UI, API, or config.json.

