

Overview

Vertex AI is Google’s unified ML platform providing access to Google’s Gemini models, Anthropic Claude models, and other third-party LLMs through a single API. Bifrost handles the Vertex-specific details for you, including:
  • Multi-model support - Unified interface for Gemini, Anthropic, and third-party models
  • OAuth2 authentication - Service account credentials with automatic token refresh
  • Project and region management - Automatic endpoint construction from GCP project/region
  • Model routing - Automatic provider detection (Gemini vs Anthropic) based on model name
  • Request conversion - Conversion to underlying provider format (Gemini or Anthropic)
  • Embeddings support - Vector generation with task type and truncation options
  • Model discovery - Paginated model listing with deployment information

Supported Operations

| Operation | Non-Streaming | Streaming | Endpoint |
|---|---|---|---|
| Chat Completions | ✅ | ✅ | /generate |
| Responses API | ✅ | ✅ | /messages |
| Embeddings | ✅ | - | /embeddings |
| Image Generation | ✅ | - | /generateContent or /predict (Imagen) |
| Image Edit | ✅ | - | /generateContent or /predict (Imagen) |
| Video Generation | ✅ | - | /predictLongRunning (Veo models only) |
| Image Variation | ❌ | - | Not supported |
| List Models | ✅ | - | /models |
| Text Completions | ❌ | ❌ | - |
| Speech (TTS) | ❌ | ❌ | - |
| Transcriptions (STT) | ❌ | ❌ | - |
| Files | ❌ | ❌ | - |
| Batch | ❌ | ❌ | - |
Unsupported Operations (❌): Text Completions, Speech, Transcriptions, Files, and Batch are not supported by Vertex AI; these return UnsupportedOperationError.
Vertex-specific: Endpoints vary by model type. The Responses API is available for both Gemini and Anthropic models.

Setup & Configuration

Vertex AI requires Google Cloud project configuration and authentication credentials. Three authentication methods are supported.
The aliases field (mapping model names to fine-tuned model IDs or endpoint identifiers) requires v1.5.0-prerelease2 or later. On v1.4.x, use deployments inside vertex_key_config instead - see the v1.5.0 Migration Guide for details.
1. Service Account (JSON)

Provide a credential JSON string in auth_credentials. The JSON must contain a type field. Supported types: service_account (most common), impersonated_service_account, authorized_user, external_account, external_account_authorized_user.
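For reference, a service_account credential JSON (as downloaded from the GCP console) has roughly this shape; every value below is a placeholder:

```json
{
  "type": "service_account",
  "project_id": "my-gcp-project",
  "private_key_id": "abc123",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
  "client_email": "bifrost@my-gcp-project.iam.gserviceaccount.com",
  "client_id": "123456789012345678901",
  "token_uri": "https://oauth2.googleapis.com/token"
}
```

Paste the whole JSON string (or an env.VAR reference to it) into auth_credentials.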
Google Vertex AI Service Account (JSON) authentication setup in the Bifrost Web UI showing Project ID, Region, and Auth Credentials fields
  1. Navigate to “Model Providers” → “Configurations” → “Google Vertex”
  2. Click “Add Key” (or edit an existing key)
  3. Under Authentication Method, select “Service Account (JSON)”
  4. Set Project ID: Your Google Cloud project ID
  5. Set Project Number (Required only for fine-tuned models): Your GCP project number; leave blank for standard models
  6. Set Region: e.g., us-central1
  7. Set Auth Credentials: Paste your service account JSON or reference an env var (e.g., env.VERTEX_CREDENTIALS)
  8. Configure Aliases: Map model names to fine-tuned model IDs (if using fine-tuned models)
  9. Save

2. Application Default Credentials

Leave auth_credentials empty. Bifrost calls google.FindDefaultCredentials() - Google’s ADC library - which resolves credentials in this order:
  1. GOOGLE_APPLICATION_CREDENTIALS env var (path to a JSON credential file)
  2. Application default credential file (~/.config/gcloud/application_default_credentials.json, written by gcloud auth application-default login)
  3. GCE/GKE/Cloud Run/App Engine metadata server (attached service account or Workload Identity)
Google Vertex AI Application Default Credentials setup in the Bifrost Web UI showing Project ID and Region fields with no credential inputs
  1. Navigate to “Model Providers” → “Configurations” → “Google Vertex”
  2. Click “Add Key” (or edit an existing key)
  3. Under Authentication Method, select “Service Account (Attached)”
  4. Set Project ID: Your Google Cloud project ID
  5. Set Project Number (Required only for fine-tuned models): Your GCP project number; leave blank for standard models
  6. Set Region: e.g., us-central1
  7. Configure Aliases if needed
  8. Save
Ensure GOOGLE_APPLICATION_CREDENTIALS is set in your environment, or that Workload Identity / gcloud is configured.

3. API Key (Gemini and Fine-Tuned Models Only)

Set value to your Vertex API key. API key authentication is supported only for Gemini models and fine-tuned Gemini models. For Anthropic models on Vertex, use Service Account or Application Default Credentials.
Google Vertex AI API Key authentication setup in the Bifrost Web UI showing API Key, Project ID, Region, and Project Number fields
  1. Navigate to “Model Providers” → “Configurations” → “Google Vertex”
  2. Click “Add Key” (or edit an existing key)
  3. Under Authentication Method, select “API Key”
  4. Set API Key: Your Vertex AI API key
  5. Set Project ID: Your Google Cloud project ID
  6. Set Project Number (Required only for fine-tuned models): Your GCP project number; leave blank for standard models
  7. Set Region: e.g., us-central1
  8. Configure Aliases: Map short names to fine-tuned model IDs (e.g., my-model → 123456789)
  9. Save
Vertex AI support for fine-tuned models is currently in beta. Requests to non-Gemini fine-tuned models may fail, so please test and report any issues.
vertex_key_config fields:

| Field | Required | Description |
|---|---|---|
| project_id | Yes | Google Cloud project ID |
| region | Yes | GCP region (e.g., us-central1, eu-west1, global) |
| auth_credentials | No | Service account JSON string (leave empty for ADC) |
| project_number | No | GCP project number (required for fine-tuned models) |

Key-level fields:

| Field | Required | Description |
|---|---|---|
| value | No | Vertex API key (Gemini and fine-tuned models only; leave empty for Service Account / ADC) |
| aliases | No | Map model names to fine-tuned model IDs or endpoint identifiers (v1.5.0-prerelease2+) |
| models | Yes | Models this key can serve; use ["*"] to allow all |
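Putting the key-level and vertex_key_config fields together, a Vertex key entry in config.json might look like this (a sketch; field names follow the tables above, values are placeholders):

```json
{
  "models": ["*"],
  "aliases": {
    "my-gemini-ft": "1234567890"
  },
  "vertex_key_config": {
    "project_id": "my-gcp-project",
    "project_number": "123456789012",
    "region": "us-central1",
    "auth_credentials": "env.VERTEX_CREDENTIALS"
  }
}
```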

GKE Workload Identity Federation

When running Bifrost on GKE, Workload Identity Federation (WIF) lets pods authenticate to Vertex AI without managing service account keys. The pod inherits an IAM identity through the Kubernetes ServiceAccount, and Bifrost picks it up automatically via Application Default Credentials.
What you need:
  1. The GCP-side prerequisites below (API enabled, IAM service account, WIF binding)
  2. A Bifrost Vertex key using “Service Account (Attached)” auth - see Application Default Credentials for Web UI, API, config.json, and Go SDK setup. For Helm, see Helm - Google Vertex AI.
  3. The Kubernetes ServiceAccount annotated for WIF:
kubectl annotate serviceaccount KSA_NAME \
  --namespace NAMESPACE \
  iam.gke.io/gcp-service-account=IAM_SA_NAME@PROJECT_ID.iam.gserviceaccount.com
Replace IAM_SA_NAME with the IAM Service Account created in Step 3 below.

GCP Prerequisites

The Vertex AI API must be enabled in your project. Search for aiplatform in the API Library or run:
gcloud services enable aiplatform.googleapis.com --project=PROJECT_ID
WIF uses the IAM Credentials API for token exchange. Enable it as well:
gcloud services enable iamcredentials.googleapis.com --project=PROJECT_ID
Autopilot clusters: WIF is always enabled; skip this step.
Standard clusters: Enable the workload identity pool and GKE metadata server:
# Enable Workload Identity on the cluster
gcloud container clusters update CLUSTER_NAME \
  --location=LOCATION \
  --workload-pool=PROJECT_ID.svc.id.goog

# Enable GKE metadata server on each node pool
gcloud container node-pools update NODEPOOL_NAME \
  --cluster=CLUSTER_NAME \
  --location=LOCATION \
  --workload-metadata=GKE_METADATA
Verify:
gcloud container clusters describe CLUSTER_NAME \
  --location=LOCATION \
  --format="value(workloadIdentityConfig.workloadPool)"
# Expected: PROJECT_ID.svc.id.goog
Create a dedicated IAM Service Account (or use an existing one) and grant it the Vertex AI User role:
# Create the service account
gcloud iam service-accounts create IAM_SA_NAME \
  --display-name="Bifrost Vertex AI" \
  --project=PROJECT_ID

# Grant Vertex AI access
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:IAM_SA_NAME@PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"
Allow the Kubernetes ServiceAccount to impersonate the IAM Service Account:
gcloud iam service-accounts add-iam-policy-binding \
  IAM_SA_NAME@PROJECT_ID.iam.gserviceaccount.com \
  --role="roles/iam.workloadIdentityUser" \
  --member="serviceAccount:PROJECT_ID.svc.id.goog[NAMESPACE/KSA_NAME]"
Replace NAMESPACE and KSA_NAME with your Bifrost pod’s namespace and Kubernetes ServiceAccount name. Then annotate the Kubernetes ServiceAccount so GKE knows which IAM identity to map:
kubectl annotate serviceaccount KSA_NAME \
  --namespace NAMESPACE \
  iam.gke.io/gcp-service-account=IAM_SA_NAME@PROJECT_ID.iam.gserviceaccount.com
If deploying with the Bifrost Helm chart, set the annotation via serviceAccount.annotations in your values file - see Helm - Google Vertex AI for the full example.

Verify

From inside the Bifrost pod, confirm the GKE metadata server returns a token:
kubectl exec -n NAMESPACE POD_NAME -- \
  wget -qO- --header="Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token"
Replace NAMESPACE and POD_NAME with your Bifrost namespace and any running Bifrost pod name (e.g., bifrost-0 for a StatefulSet; use kubectl get pods -n NAMESPACE to find it). A JSON response with an access_token field confirms WIF is working. Then send a request through Bifrost to a Vertex model (e.g., vertex/gemini-2.5-flash) to verify end-to-end.

Troubleshooting

| Symptom | Likely Cause | Fix |
|---|---|---|
| "could not find default credentials" | GKE metadata server not enabled, or Kubernetes ServiceAccount missing WIF annotation | Enable the GKE metadata server on the node pool (Step 2); verify the iam.gke.io/gcp-service-account annotation on the ServiceAccount (Step 4) |
| 403 Forbidden from Vertex API | IAM Service Account lacks Vertex permissions | Grant roles/aiplatform.user to the IAM Service Account |
| 403 during token exchange | WIF binding missing | Run the add-iam-policy-binding command from Step 4; confirm roles/iam.workloadIdentityUser is granted |
| Wrong project or region errors | Bifrost config mismatch | Check project_id and region in the Vertex key configuration |

Beta Headers

For Anthropic models on Vertex AI, Bifrost validates anthropic-beta headers and drops unsupported headers from the request.
Supported: computer-use-*, compact-*, context-management-*, interleaved-thinking-*, context-1m-*
Not supported: structured-outputs-*, advanced-tool-use-*, mcp-client-*, prompt-caching-scope-*, files-api-*, skills-*, fast-mode-*, redact-thinking-*
You can override these defaults per provider via the Beta Headers tab in provider configuration or via beta_header_overrides. See the full support matrix in the Anthropic provider docs.
Vertex AI Beta Headers configuration tab showing supported and unsupported Anthropic beta features with override options

1. Chat Completions

Request Parameters

Core Parameter Mapping

| Parameter | Vertex Handling | Notes |
|---|---|---|
| model | Maps to Vertex model ID | Region-specific endpoint constructed automatically |
| All other params | Model-specific conversion | Converted per underlying provider (Gemini/Anthropic) |

Key Configuration

The key configuration for Vertex requires Google Cloud credentials:
{
  "vertex_key_config": {
    "project_id": "my-gcp-project",
    "region": "us-central1",
    "auth_credentials": "{service-account-json}"
  }
}
Configuration Details:
  • project_id - GCP project ID (required)
  • region - GCP region for API endpoints (required)
    • Examples: us-central1, us-west1, eu-west1, global
  • auth_credentials - Service account JSON credentials (optional if using default credentials)

Authentication Methods

  1. Service Account JSON (recommended for production)
    { "auth_credentials": "{full-service-account-json}" }
    
  2. Application Default Credentials (for local development)
    • Requires GOOGLE_APPLICATION_CREDENTIALS environment variable
    • Leave auth_credentials empty

Gemini Models

When using Google’s Gemini models, Bifrost converts requests to Gemini’s API format.

Parameter Mapping for Gemini

All Gemini-compatible parameters are supported. Special handling includes:
  • System prompts: Converted to Gemini’s system message format
  • Tool usage: Mapped to Gemini’s function calling format
  • Streaming: Uses Gemini’s streaming protocol
Refer to the Gemini documentation for full conversion details.

Anthropic Models (Claude)

When using Anthropic models through Vertex AI, Bifrost converts requests to Anthropic’s message format.

Parameter Mapping for Anthropic

All Anthropic-standard parameters are supported:
  • Reasoning/Thinking: reasoning parameters converted to thinking structure
  • System messages: Extracted and placed in separate system field
  • Tool message grouping: Consecutive tool messages merged
  • API version: Automatically set to vertex-2023-10-16 for Anthropic models
Refer to the Anthropic documentation for full conversion details.

Special Notes for Vertex + Anthropic

  • Responses API uses special /v1/messages endpoint
  • anthropic_version automatically set to vertex-2023-10-16
  • Minimum reasoning budget: 1024 tokens
  • Model field removed from request (Vertex uses different identification)
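As an illustration of those notes, a converted Claude-on-Vertex request body looks roughly like this (a sketch, not Bifrost's exact output: the model field is absent and anthropic_version is injected):

```json
{
  "anthropic_version": "vertex-2023-10-16",
  "max_tokens": 1024,
  "system": "You are a helpful assistant",
  "messages": [
    { "role": "user", "content": "What is AI?" }
  ],
  "thinking": { "type": "enabled", "budget_tokens": 1024 }
}
```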

Region Selection

The region determines the API endpoint:
| Region | Endpoint | Purpose |
|---|---|---|
| us-central1 | us-central1-aiplatform.googleapis.com | US Central |
| us-west1 | us-west1-aiplatform.googleapis.com | US West |
| eu-west1 | eu-west1-aiplatform.googleapis.com | Europe West |
| global | aiplatform.googleapis.com | Global (no region prefix) |
Model availability varies by region; check the GCP documentation for details.
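The endpoint rule in the table above can be sketched in a few lines (illustrative Python; Bifrost's actual implementation is in Go):

```python
def vertex_host(region: str) -> str:
    """Return the Vertex AI hostname for a region; "global" has no region prefix."""
    if region == "global":
        return "aiplatform.googleapis.com"
    return f"{region}-aiplatform.googleapis.com"

print(vertex_host("us-central1"))  # us-central1-aiplatform.googleapis.com
```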

Streaming

Streaming format depends on model type:
  • Gemini models: Standard Gemini streaming with server-sent events
  • Anthropic models: Anthropic message streaming format

2. Responses API

The Responses API is available for both Anthropic (Claude) and Gemini models on Vertex AI.

Request Parameters

Core Parameter Mapping

| Parameter | Vertex Handling | Notes |
|---|---|---|
| instructions | Becomes system message | Model-specific conversion |
| input | Converted to messages | String or array support |
| max_output_tokens | Model-specific field mapping | Gemini vs Anthropic conversion |
| All other params | Model-specific conversion | Converted per underlying provider |

Gemini Models

For Gemini models, conversion follows Gemini’s Responses API format.

Anthropic Models (Claude)

For Anthropic models, conversion follows Anthropic’s message format:
  • instructions becomes system message
  • reasoning mapped to thinking structure

Configuration

curl -X POST http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -H "X-Goog-Authorization: Bearer {token}" \
  -d '{
    "model": "vertex/claude-3-5-sonnet",
    "input": "What is AI?",
    "instructions": "You are a helpful assistant",
    "project_id": "my-gcp-project",
    "region": "us-central1"
  }'

Special Handling

  • Endpoint: /v1/messages (Anthropic format)
  • anthropic_version set to vertex-2023-10-16 automatically
  • Model and region fields removed from request
  • Raw request body passthrough supported
Refer to Anthropic Responses API for parameter details.

3. Embeddings

Embeddings are supported for Gemini and other models that support embedding generation.

Request Parameters

Core Parameters

| Parameter | Vertex Mapping | Notes |
|---|---|---|
| input | instances[].content | Text to embed |
| dimensions | parameters.outputDimensionality | Optional output size |

Advanced Parameters

Use extra_params for embedding-specific options:
curl -X POST http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vertex/text-embedding-004",
    "input": ["text to embed"],
    "dimensions": 256,
    "task_type": "RETRIEVAL_DOCUMENT",
    "title": "Document title",
    "project_id": "my-gcp-project",
    "region": "us-central1",
    "autoTruncate": true
  }'

Embedding Parameters

| Parameter | Type | Description |
|---|---|---|
| task_type | string | Optional task type hint: RETRIEVAL_QUERY, RETRIEVAL_DOCUMENT, SEMANTIC_SIMILARITY, CLASSIFICATION, CLUSTERING |
| title | string | Optional title to help the model produce better embeddings (used with task_type) |
| autoTruncate | boolean | Auto-truncate input to the model's max tokens (defaults to true) |
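Under the mapping above, the curl request translates to roughly this Vertex payload (an illustrative sketch of the upstream request body):

```json
{
  "instances": [
    {
      "content": "text to embed",
      "task_type": "RETRIEVAL_DOCUMENT",
      "title": "Document title"
    }
  ],
  "parameters": {
    "outputDimensionality": 256,
    "autoTruncate": true
  }
}
```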

Task Type Effects

Different task types optimize embeddings for specific use cases:
  • RETRIEVAL_DOCUMENT - Optimized for documents in retrieval systems
  • RETRIEVAL_QUERY - Optimized for queries searching documents
  • SEMANTIC_SIMILARITY - Optimized for semantic similarity tasks
  • CLASSIFICATION - For classification tasks
  • CLUSTERING - For clustering tasks

Response Conversion

The embeddings response includes vectors and truncation information:
{
  "embeddings": [
    {
      "values": [0.1234, -0.5678, ...],
      "statistics": {
        "token_count": 15,
        "truncated": false
      }
    }
  ]
}
Response Fields:
  • values - Embedding vector as floats
  • statistics.token_count - Input token count
  • statistics.truncated - Whether input was truncated due to length

4. Image Generation

Image Generation is supported for Gemini and Imagen on Vertex AI. The provider automatically routes to the appropriate format based on the model type.

Request Parameters

Core Parameter Mapping

| Parameter | Vertex Handling | Notes |
|---|---|---|
| model | Mapped to deployment/model identifier | Model type detected automatically |
| prompt | Model-specific conversion | Converted per underlying provider (Gemini/Imagen) |
| All other params | Model-specific conversion | Converted per underlying provider |

Model Type Detection

Vertex automatically detects the model type and uses the appropriate conversion:
  1. Gemini Models: Uses Gemini format (same as Gemini Image Generation)
  2. Imagen Models: Uses Imagen format (detected via IsImagenModel())
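The routing decision amounts to a model-name check. A rough Python sketch (the real check is IsImagenModel() in Bifrost's Go code; the exact naming rule here is an assumption):

```python
def is_imagen_model(model: str) -> bool:
    # Assumed rule: Imagen models are named "imagen-...".
    # Strip any provider prefix such as "vertex/" first.
    name = model.split("/")[-1]
    return name.startswith("imagen")

print(is_imagen_model("vertex/imagen-4.0-generate-001"))  # True
```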

Configuration

curl -X POST http://localhost:8080/v1/images/generations \
  -H "Content-Type: application/json" \
  -H "X-Goog-Authorization: Bearer {token}" \
  -d '{
    "model": "vertex/imagen-4.0-generate-001",
    "prompt": "A sunset over the mountains",
    "size": "1024x1024",
    "n": 2,
    "project_id": "my-gcp-project",
    "region": "us-central1"
  }'

Request Conversion

Vertex converts requests based on model type:
  • Gemini Models: Uses gemini.ToGeminiImageGenerationRequest() - same conversion as standard Gemini (see Gemini Image Generation)
  • Imagen Models: Uses gemini.ToImagenImageGenerationRequest() - Imagen-specific format with size/aspect ratio conversion
All request bodies are converted to map[string]interface{} and the region field is removed before sending to Vertex API.

Response Conversion

  • Gemini Models: Responses converted using GenerateContentResponse.ToBifrostImageGenerationResponse() - same as standard Gemini
  • Imagen Models: Responses converted using GeminiImagenResponse.ToBifrostImageGenerationResponse() - Imagen-specific format

Endpoint Selection

The provider automatically selects the endpoint based on model type:
  • Fine-tuned models: /v1beta1/projects/{projectNumber}/locations/{region}/endpoints/{deployment}:generateContent
  • Imagen models: /v1/projects/{projectID}/locations/{region}/publishers/google/models/{model}:predict
  • Gemini models: /v1/projects/{projectID}/locations/{region}/publishers/google/models/{model}:generateContent

Streaming

Image generation streaming is not supported by Vertex AI.

5. Image Edit

Requests use multipart/form-data, not JSON.
Image Edit is supported for Gemini and Imagen models on Vertex AI. The provider automatically routes to the appropriate format based on the model type.

Request Parameters
| Parameter | Type | Required | Notes |
|---|---|---|---|
| model | string | Yes | Model identifier (must be a Gemini or Imagen model) |
| prompt | string | Yes | Text description of the edit |
| image[] | binary | Yes | Image file(s) to edit (supports multiple images) |
| mask | binary | No | Mask image file |
| type | string | No | Edit type: "inpainting", "outpainting", "inpaint_removal", "bgswap" (Imagen only) |
| n | int | No | Number of images to generate (1-10) |
| output_format | string | No | Output format: "png", "webp", "jpeg" |
| output_compression | int | No | Compression level (0-100%) |
| seed | int | No | Seed for reproducibility (via ExtraParams["seed"]) |
| negative_prompt | string | No | Negative prompt (via ExtraParams["negativePrompt"]) |
| maskMode | string | No | Mask mode (via ExtraParams["maskMode"], Imagen only): "MASK_MODE_USER_PROVIDED", "MASK_MODE_BACKGROUND", "MASK_MODE_FOREGROUND", "MASK_MODE_SEMANTIC" |
| dilation | float | No | Mask dilation (via ExtraParams["dilation"], Imagen only): range [0, 1] |
| maskClasses | int[] | No | Mask classes (via ExtraParams["maskClasses"], Imagen only): for MASK_MODE_SEMANTIC |

Request Conversion

Vertex uses the same conversion functions as Gemini:
  1. Gemini Models: Uses gemini.ToGeminiImageEditRequest() - same conversion as standard Gemini (see Gemini Image Edit)
  2. Imagen Models: Uses gemini.ToImagenImageEditRequest() - Imagen-specific format with edit mode mapping and mask configuration (see Gemini Image Edit)
Model Validation: Only Gemini and Imagen models are supported; other models return ConfigurationError.

Request Body Processing:
  • All request bodies are converted to map[string]interface{} for Vertex API compatibility
  • The region field is removed before sending to Vertex API
  • For Gemini models, unsupported fields are stripped via stripVertexGeminiUnsupportedFields() (removes id from function_call and function_response)
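The field-stripping step can be sketched over a decoded request body like this (illustrative Python; Bifrost's actual stripVertexGeminiUnsupportedFields() is Go, and the field names follow the description above):

```python
def strip_unsupported_fields(body: dict) -> dict:
    """Remove "id" from function_call / function_response parts of a Gemini-style body."""
    for content in body.get("contents", []):
        for part in content.get("parts", []):
            for key in ("function_call", "function_response"):
                field = part.get(key)
                if isinstance(field, dict):
                    field.pop("id", None)  # Vertex rejects this field
    return body
```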

Response Conversion

  • Gemini Models: Responses converted using GenerateContentResponse.ToBifrostImageGenerationResponse() - same as standard Gemini
  • Imagen Models: Responses converted using GeminiImagenResponse.ToBifrostImageGenerationResponse() - Imagen-specific format
Endpoint Selection

The provider automatically selects the endpoint based on model type:
  • Gemini models: /v1/projects/{projectID}/locations/{region}/publishers/google/models/{model}:generateContent
  • Imagen models: /v1/projects/{projectID}/locations/{region}/publishers/google/models/{model}:predict
Streaming

Image edit streaming is not supported by Vertex AI.

Image Variation

Image variation is not supported by Vertex AI.

6. List Models

Request Parameters

None required. Automatically uses project_id and region from key config.

Response Conversion

Lists models available in the specified project and region with metadata and deployment information:
{
  "models": [
    {
      "name": "projects/{project}/locations/{region}/models/gemini-2.0-flash",
      "display_name": "Gemini 2.0 Flash",
      "description": "Fast multimodal model",
      "version_id": "1",
      "version_aliases": ["latest", "stable"],
      "capabilities": [...],
      "deployed_models": [...]
    }
  ],
  "next_page_token": "..."
}

Custom vs Non-Custom Models

Important: Vertex AI’s List Models API only returns custom fine-tuned models that have been deployed to your project. It does NOT return standard foundation models (Gemini, Claude, etc.).
To provide a complete model listing experience, Bifrost performs multi-pass model discovery:

Three-Pass Model Discovery

  1. First Pass - Custom Models from API Response
    • Queries Vertex AI’s List Models API
    • Returns only custom fine-tuned models deployed to your project
    • Custom models are identified by having deployment values that contain only digits
    • Example: "deployment": "1234567890"
  2. Second Pass - Non-Custom Models from Aliases
    • Adds standard foundation models from your aliases configuration
    • Non-custom models have alphanumeric deployment values (e.g., gemini-pro, claude-3-5-sonnet)
    • Filters by the key-level models allowlist, if specified
    • Example: "deployment": "gemini-2.0-flash"
  3. Third Pass - Allowed Models Not in Aliases
    • Adds models specified in models that weren’t in the aliases map
    • Ensures all explicitly allowed models appear in the list
    • Uses the model name itself as the deployment value
    • Skips digit-only model IDs (reserved for custom models)
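The custom-vs-non-custom distinction used across all three passes boils down to a digit-only check on the deployment value; sketched in Python (Bifrost's implementation lives in models.go):

```python
def is_custom_deployment(deployment: str) -> bool:
    # Custom fine-tuned models use numeric endpoint IDs (e.g. "1234567890");
    # foundation models have alphanumeric names (e.g. "gemini-2.0-flash").
    return deployment.isdigit()

print(is_custom_deployment("1234567890"))  # True
```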

Model Filtering Logic

  • If models is empty and no aliases are configured: No models are returned
  • If models is empty but aliases are configured: Only aliased models are returned
  • If models is ["*"]: All models from all three passes are included (unrestricted)
  • If models is non-empty: Only models/aliases whose request names appear in models are included
  • Duplicate Prevention: Each model ID is tracked to prevent duplicates across passes

Model Name Formatting

Non-custom models from aliases and allowed models are automatically formatted for display:
  • gemini-pro → “Gemini Pro”
  • claude-3-5-sonnet → “Claude 3 5 Sonnet”
  • gemini_2_flash → “Gemini 2 Flash”
Formatting uses title case and converts hyphens/underscores to spaces.
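That rule can be sketched as (illustrative Python, matching the examples above):

```python
def format_model_name(model_id: str) -> str:
    # Hyphens/underscores become spaces; each word is title-cased.
    words = model_id.replace("_", "-").split("-")
    return " ".join(w.capitalize() for w in words if w)

print(format_model_name("claude-3-5-sonnet"))  # Claude 3 5 Sonnet
```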

Example Configuration

{
  "aliases": {
    "my-gemini-ft": "1234567890",
    "my-claude-ft": "9876543210"
  },
  "vertex_key_config": {
    "project_id": "my-project",
    "region": "us-central1"
  }
}
Because both aliases map to digit-only (fine-tuned) model IDs and no models allowlist is set, this configuration returns only your custom fine-tuned models.

Pagination

Model listing is paginated automatically. If more than 100 models exist, next_page_token will be present. Bifrost handles pagination internally.

Caveats

Severity: High | Behavior: Both project_id and region are required for all operations | Impact: Requests fail without valid GCP project/region configuration | Code: vertex.go:127-138
Severity: Medium | Behavior: Tokens are cached and automatically refreshed when expired | Impact: First request is slightly slower due to auth; cached for subsequent requests | Code: vertex.go:34-55
Severity: Medium | Behavior: Automatic detection of Anthropic vs Gemini models | Impact: Different conversion logic applied transparently | Code: vertex.go chat/responses endpoints
Severity: Low | Behavior: Responses API automatically routes to the Anthropic or Gemini implementation based on model | Impact: Different conversion logic applied transparently per model | Code: vertex.go:836-1080
Severity: Low | Behavior: anthropic_version always set to vertex-2023-10-16 for Claude | Impact: Cannot override the Anthropic version for Claude on Vertex | Code: utils.go:33, 71
Severity: Low | Behavior: Vertex returns float64 embeddings, and Bifrost preserves that precision in normalized embedding responses | Impact: No precision loss in the /v1/embeddings response path | Code: embedding.go:84-91
Severity: High | Behavior: Vertex AI’s List Models API only returns custom fine-tuned models, NOT foundation models | Impact: Bifrost performs three-pass discovery to include foundation models from aliases and the key-level models allowlist | Why: This is a Vertex AI API limitation - foundation models must be explicitly configured | Code: models.go:76-217

Configuration

HTTP Settings: OAuth2 authentication with automatic token refresh | Region-specific endpoints | Max Connections 5000 | Max Idle 60 seconds
Scope: https://www.googleapis.com/auth/cloud-platform
Endpoint Format: https://{region}-aiplatform.googleapis.com/v1/projects/{project}/locations/{region}/{resource}
Note: For the global region, the endpoint is https://aiplatform.googleapis.com/v1/projects/{project}/locations/global/{resource}

Video Generation

Vertex AI routes video generation through Gemini’s Veo models using the predictLongRunning endpoint. All parameters are identical to Gemini Video Generation.
Only Veo models are supported (e.g., veo-2.0-generate-001). Passing a non-Veo model name returns a configuration error.
Supported Operations

| Operation | Supported | Notes |
|---|---|---|
| Generate | ✅ | POST /v1/videos |
| Retrieve | ✅ | GET /v1/videos/{id} |
| Download | ✅ | GET /v1/videos/{id}/content |
| Delete | ❌ | Not supported |
| List | ❌ | Not supported |
| Remix | ❌ | Not supported |