Overview

Vertex AI is Google’s unified ML platform, providing access to Google’s Gemini models, Anthropic Claude models, and other third-party LLMs through a single API. Bifrost’s Vertex AI integration provides:
  • Multi-model support - Unified interface for Gemini, Anthropic, and third-party models
  • OAuth2 authentication - Service account credentials with automatic token refresh
  • Project and region management - Automatic endpoint construction from GCP project/region
  • Model routing - Automatic provider detection (Gemini vs Anthropic) based on model name
  • Request conversion - Conversion to underlying provider format (Gemini or Anthropic)
  • Embeddings support - Vector generation with task type and truncation options
  • Model discovery - Paginated model listing with deployment information

Supported Operations

| Operation | Non-Streaming | Streaming | Endpoint |
|---|---|---|---|
| Chat Completions | ✅ | ✅ | `/generate` |
| Responses API | ✅ | ✅ | `/messages` |
| Embeddings | ✅ | - | `/embeddings` |
| List Models | ✅ | - | `/models` |
| Text Completions | ❌ | ❌ | - |
| Speech (TTS) | ❌ | ❌ | - |
| Transcriptions (STT) | ❌ | ❌ | - |
| Files | ❌ | ❌ | - |
| Batch | ❌ | ❌ | - |

Unsupported Operations (❌): Text Completions, Speech, Transcriptions, Files, and Batch are not supported by Vertex AI; these return `UnsupportedOperationError`.

Vertex-specific: Endpoints vary by model type. The Responses API is available for both Gemini and Anthropic models.

1. Chat Completions

Request Parameters

Core Parameter Mapping

| Parameter | Vertex Handling | Notes |
|---|---|---|
| `model` | Maps to Vertex model ID | Region-specific endpoint constructed automatically |
| All other params | Model-specific conversion | Converted per underlying provider (Gemini/Anthropic) |
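Routing between the Gemini and Anthropic conversion paths happens automatically based on the model name. The actual detection lives in `vertex.go`; the substring check below is a hypothetical, illustrative stand-in for that logic:

```python
def detect_vertex_provider(model: str) -> str:
    """Guess the underlying provider for a Vertex model name.

    Illustrative sketch only: Bifrost's real routing logic is in
    vertex.go; this substring check is an assumption, not its code.
    """
    if "claude" in model.lower():
        return "anthropic"
    return "gemini"

print(detect_vertex_provider("claude-3-5-sonnet"))  # anthropic
print(detect_vertex_provider("gemini-2.0-flash"))   # gemini
```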

Key Configuration

The key configuration for Vertex requires Google Cloud credentials:
```json
{
  "vertex_key_config": {
    "project_id": "my-gcp-project",
    "region": "us-central1",
    "auth_credentials": "{service-account-json}"
  }
}
```
Configuration Details:
  • project_id - GCP project ID (required)
  • region - GCP region for API endpoints (required)
    • Examples: us-central1, us-west1, europe-west1, global
  • auth_credentials - Service account JSON credentials (optional if using default credentials)

Authentication Methods

  1. Service Account JSON (recommended for production)
    {"auth_credentials": "{full-service-account-json}"}
    
  2. Application Default Credentials (for local development)
    • Requires GOOGLE_APPLICATION_CREDENTIALS environment variable
    • Leave auth_credentials empty
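Per the Caveats section, Bifrost caches OAuth2 access tokens and refreshes them when they expire, so only the first request pays the auth round trip. A minimal sketch of that caching pattern, with a hypothetical `fetch_token` callable standing in for the real OAuth2 token exchange:

```python
import time

class TokenCache:
    """Cache an OAuth2 access token and refresh it shortly before expiry.

    Sketch of the caching pattern only; Bifrost's actual implementation
    is in vertex.go. fetch_token is a hypothetical callable returning
    (token, expires_in_seconds).
    """

    def __init__(self, fetch_token, skew_seconds=60):
        self._fetch = fetch_token
        self._skew = skew_seconds  # refresh slightly early to avoid races
        self._token = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = time.monotonic()
        if self._token is None or now >= self._expires_at - self._skew:
            token, expires_in = self._fetch()
            self._token = token
            self._expires_at = now + expires_in
        return self._token
```

Subsequent calls to `get()` return the cached token until it nears expiry, at which point a single refresh is performed.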

Gemini Models

When using Google’s Gemini models, Bifrost converts requests to Gemini’s API format.

Parameter Mapping for Gemini

All Gemini-compatible parameters are supported. Special handling includes:
  • System prompts: Converted to Gemini’s system message format
  • Tool usage: Mapped to Gemini’s function calling format
  • Streaming: Uses Gemini’s streaming protocol
Refer to the Gemini documentation for conversion details.

Anthropic Models (Claude)

When using Anthropic models through Vertex AI, Bifrost converts requests to Anthropic’s message format.

Parameter Mapping for Anthropic

All Anthropic-standard parameters are supported:
  • Reasoning/Thinking: reasoning parameters converted to thinking structure
  • System messages: Extracted and placed in separate system field
  • Tool message grouping: Consecutive tool messages merged
  • API version: Automatically set to vertex-2023-10-16 for Anthropic models
Refer to the Anthropic documentation for conversion details.

Special Notes for Vertex + Anthropic

  • Responses API uses special /v1/messages endpoint
  • anthropic_version automatically set to vertex-2023-10-16
  • Minimum reasoning budget: 1024 tokens
  • Model field removed from request (Vertex uses different identification)

Region Selection

The region determines the API endpoint:
| Region | Endpoint | Purpose |
|---|---|---|
| `us-central1` | `us-central1-aiplatform.googleapis.com` | US Central |
| `us-west1` | `us-west1-aiplatform.googleapis.com` | US West |
| `europe-west1` | `europe-west1-aiplatform.googleapis.com` | Europe West |
| `global` | `aiplatform.googleapis.com` | Global (no region prefix) |
Availability varies by region. Check GCP documentation for model availability.
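The region-to-endpoint mapping follows the endpoint format given in the Configuration section, with the region prefix dropped for `global`. A sketch of that construction:

```python
def vertex_endpoint(region: str, project: str, resource: str) -> str:
    """Build a Vertex AI endpoint URL following the pattern
    https://{region}-aiplatform.googleapis.com/v1/projects/{project}/locations/{region}/{resource}
    with the region prefix omitted for the global region."""
    host = ("aiplatform.googleapis.com" if region == "global"
            else f"{region}-aiplatform.googleapis.com")
    return f"https://{host}/v1/projects/{project}/locations/{region}/{resource}"

print(vertex_endpoint("us-central1", "my-gcp-project", "models"))
```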

Streaming

Streaming format depends on model type:
  • Gemini models: Standard Gemini streaming with server-sent events
  • Anthropic models: Anthropic message streaming format
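Both formats arrive as server-sent events with JSON payloads on `data:` lines. A simplified client-side parser, assuming one payload per line (real SSE also allows multi-line data fields, comments, and `event:`/`id:` fields):

```python
import json

def parse_sse_chunks(stream_text: str):
    """Extract JSON payloads from server-sent-event 'data:' lines.

    Simplified sketch for illustration; not Bifrost's streaming code.
    Skips empty payloads and the conventional '[DONE]' sentinel.
    """
    for line in stream_text.splitlines():
        if line.startswith("data:"):
            payload = line[len("data:"):].strip()
            if payload and payload != "[DONE]":
                yield json.loads(payload)
```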

2. Responses API

The Responses API is available for both Anthropic (Claude) and Gemini models on Vertex AI.

Request Parameters

Core Parameter Mapping

| Parameter | Vertex Handling | Notes |
|---|---|---|
| `instructions` | Becomes system message | Model-specific conversion |
| `input` | Converted to messages | String or array support |
| `max_output_tokens` | Model-specific field mapping | Gemini vs Anthropic conversion |
| All other params | Model-specific conversion | Converted per underlying provider |

Gemini Models

For Gemini models, conversion follows Gemini’s Responses API format.

Anthropic Models (Claude)

For Anthropic models, conversion follows Anthropic’s message format:
  • instructions becomes system message
  • reasoning mapped to thinking structure
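The Anthropic-side conversion described above can be sketched as follows. The field names (`instructions` → `system`, `input` → `messages`, the `vertex-2023-10-16` version, the 1024-token minimum reasoning budget) come from this page; the exact shape of the `thinking` structure is an illustrative assumption:

```python
def responses_to_anthropic(req: dict) -> dict:
    """Sketch of converting a Responses API request into Anthropic's
    message format per the mapping above. The 'thinking' field shape is
    an assumption, not Bifrost's exact implementation."""
    body = {"anthropic_version": "vertex-2023-10-16"}
    if "instructions" in req:
        body["system"] = req["instructions"]        # instructions -> system
    user_input = req.get("input")
    if isinstance(user_input, str):                 # string or array input
        user_input = [{"type": "text", "text": user_input}]
    body["messages"] = [{"role": "user", "content": user_input}]
    if "max_output_tokens" in req:
        body["max_tokens"] = req["max_output_tokens"]
    if "reasoning" in req:                          # reasoning -> thinking
        budget = max(req["reasoning"].get("budget_tokens", 1024), 1024)
        body["thinking"] = {"type": "enabled", "budget_tokens": budget}
    return body
```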

Configuration

```bash
curl -X POST http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -H "X-Goog-Authorization: Bearer {token}" \
  -d '{
    "model": "vertex/claude-3-5-sonnet",
    "input": "What is AI?",
    "instructions": "You are a helpful assistant",
    "project_id": "my-gcp-project",
    "region": "us-central1"
  }'
```

Special Handling

  • Endpoint: /v1/messages (Anthropic format)
  • anthropic_version set to vertex-2023-10-16 automatically
  • Model and region fields removed from request
  • Raw request body passthrough supported
Refer to Anthropic Responses API for parameter details.

3. Embeddings

Embeddings are supported for Gemini and other models that support embedding generation.

Request Parameters

Core Parameters

| Parameter | Vertex Mapping | Notes |
|---|---|---|
| `input` | `instances[].content` | Text to embed |
| `dimensions` | `parameters.outputDimensionality` | Optional output size |
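The mapping above can be sketched as a payload builder. Field names (`instances[].content`, `parameters.outputDimensionality`, `task_type`, `title`, `autoTruncate`) come from this page; placing `task_type` and `title` on each instance is an assumption about the Vertex payload shape:

```python
def to_vertex_embedding_payload(req: dict) -> dict:
    """Sketch mapping an OpenAI-style embeddings request onto Vertex's
    predict payload, per the parameter table above. Illustrative only."""
    instances = [{"content": text} for text in req["input"]]
    if "task_type" in req:
        for inst in instances:
            inst["task_type"] = req["task_type"]
            if "title" in req:          # title accompanies task_type
                inst["title"] = req["title"]
    parameters = {}
    if "dimensions" in req:
        parameters["outputDimensionality"] = req["dimensions"]
    if "autoTruncate" in req:
        parameters["autoTruncate"] = req["autoTruncate"]
    return {"instances": instances, "parameters": parameters}
```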

Advanced Parameters

Use extra_params for embedding-specific options:
```bash
curl -X POST http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-004",
    "input": ["text to embed"],
    "dimensions": 256,
    "task_type": "RETRIEVAL_DOCUMENT",
    "title": "Document title",
    "project_id": "my-gcp-project",
    "region": "us-central1",
    "autoTruncate": true
  }'
```

Embedding Parameters

| Parameter | Type | Description |
|---|---|---|
| `task_type` | string | Task type hint: `RETRIEVAL_QUERY`, `RETRIEVAL_DOCUMENT`, `SEMANTIC_SIMILARITY`, `CLASSIFICATION`, `CLUSTERING` (optional) |
| `title` | string | Optional title to help the model produce better embeddings (used with `task_type`) |
| `autoTruncate` | boolean | Auto-truncate input to max tokens (defaults to `true`) |

Task Type Effects

Different task types optimize embeddings for specific use cases:
  • RETRIEVAL_DOCUMENT - Optimized for documents in retrieval systems
  • RETRIEVAL_QUERY - Optimized for queries searching documents
  • SEMANTIC_SIMILARITY - Optimized for semantic similarity tasks
  • CLASSIFICATION - For classification tasks
  • CLUSTERING - For clustering tasks

Response Conversion

Embeddings response includes vectors and truncation information:
```json
{
  "embeddings": [
    {
      "values": [0.1234, -0.5678, ...],
      "statistics": {
        "token_count": 15,
        "truncated": false
      }
    }
  ]
}
```
Response Fields:
  • values - Embedding vector as floats
  • statistics.token_count - Input token count
  • statistics.truncated - Whether input was truncated due to length
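As noted in the Caveats section, Vertex returns embedding values as float64 and Bifrost narrows them to float32, with minor precision loss. The effect can be reproduced by round-tripping values through IEEE-754 single precision:

```python
import struct

def to_float32(values):
    """Round-trip float64 values through IEEE-754 float32, mirroring the
    narrowing Bifrost applies to Vertex's float64 embedding values."""
    return [struct.unpack("f", struct.pack("f", v))[0] for v in values]

narrowed = to_float32([0.1234, -0.5678])
# The narrowed values differ from the originals only beyond roughly
# seven significant decimal digits, which is negligible for embeddings.
```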

4. List Models

Request Parameters

None required. Automatically uses project_id and region from key config.

Response Conversion

Lists models available in the specified project and region with metadata and deployment information:
```json
{
  "models": [
    {
      "name": "projects/{project}/locations/{region}/models/gemini-2.0-flash",
      "display_name": "Gemini 2.0 Flash",
      "description": "Fast multimodal model",
      "version_id": "1",
      "version_aliases": ["latest", "stable"],
      "capabilities": [...],
      "deployed_models": [...]
    }
  ],
  "next_page_token": "..."
}
```

Pagination

Model listing is paginated automatically. If more than 100 models exist, next_page_token will be present. Bifrost handles pagination internally.
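The token-following loop Bifrost runs internally can be sketched as below; `fetch_page` is a hypothetical callable standing in for the actual Vertex list call:

```python
def list_all_models(fetch_page):
    """Follow next_page_token until exhausted, as Bifrost does internally.

    fetch_page is a hypothetical callable: (page_token or None) -> response
    dict with 'models' and an optional 'next_page_token'.
    """
    models, token = [], None
    while True:
        page = fetch_page(token)
        models.extend(page.get("models", []))
        token = page.get("next_page_token")
        if not token:
            return models
```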

Caveats

  • Severity: High - Both `project_id` and `region` are required for all operations. Impact: requests fail without valid GCP project/region configuration. Code: `vertex.go:127-138`
  • Severity: Medium - Tokens are cached and automatically refreshed when expired. Impact: first request is slightly slower due to auth; cached for subsequent requests. Code: `vertex.go:34-55`
  • Severity: Medium - Automatic detection of Anthropic vs Gemini models. Impact: different conversion logic applied transparently. Code: `vertex.go` chat/responses endpoints
  • Severity: Low - The Responses API automatically routes to the Anthropic or Gemini implementation based on model. Impact: different conversion logic applied transparently per model. Code: `vertex.go:836-1080`
  • Severity: Low - `anthropic_version` is always set to `vertex-2023-10-16` for Claude. Impact: the Anthropic version cannot be overridden for Claude on Vertex. Code: `utils.go:33, 71`
  • Severity: Low - Vertex returns float64 embeddings, converted to float32 for Bifrost. Impact: minor precision loss (expected for embeddings). Code: `embedding.go:84-87`

Configuration

HTTP Settings:
  • OAuth2 authentication with automatic token refresh
  • Region-specific endpoints
  • Max Connections: 5000
  • Max Idle: 60 seconds
OAuth Scope: https://www.googleapis.com/auth/cloud-platform
Endpoint Format: https://{region}-aiplatform.googleapis.com/v1/projects/{project}/locations/{region}/{resource}
Note: For the global region, the endpoint is https://aiplatform.googleapis.com/v1/projects/{project}/locations/global/{resource}

Setup & Configuration

Vertex AI requires project configuration, region selection, and Google Cloud authentication. For detailed setup instructions, see Provider-Specific Authentication - Google Vertex in the Gateway Quickstart, which covers configuration via the Web UI, API, or config.json.