Overview

Azure is a cloud provider offering access to OpenAI and Anthropic models through the Azure OpenAI Service. Bifrost performs conversions including:
  • Deployment mapping - Model identifiers mapped to Azure deployment IDs with version handling
  • Authentication modes - API key or bearer token (OAuth) support
  • Model routing - Automatic provider detection (OpenAI vs Anthropic) based on deployment
  • API versioning - Configurable API versions with preview support for Responses API
  • Custom endpoints - Full control over Azure endpoint configuration
  • Multi-model support - Unified interface for OpenAI, Anthropic (via Azure), and Gemini models
  • Request/response pass-through - Support for raw request/response bodies for advanced use cases
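The model-routing bullet above (automatic OpenAI vs Anthropic detection) can be sketched as a simple model-name check. This is a hypothetical simplification, assuming name-prefix detection; the real logic in azure.go may also consult deployment metadata:

```python
def detect_provider(model: str) -> str:
    # Hypothetical helper: Bifrost's actual detection lives in azure.go
    # and may use more signals than the model name alone.
    name = model.removeprefix("azure/").lower()
    if name.startswith("claude"):
        return "anthropic"
    if name.startswith("gemini"):
        return "gemini"
    return "openai"
```

The detected provider decides which request/response conversion path is used downstream.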

Supported Operations

| Operation | Non-Streaming | Streaming | Endpoint |
| --- | --- | --- | --- |
| Chat Completions | ✅ | ✅ | /openai/v1/chat/completions |
| Responses API | ✅ | ✅ | /openai/v1/responses |
| Embeddings | ✅ | - | /openai/v1/embeddings |
| Files | ✅ | - | /openai/v1/files |
| List Models | ✅ | - | /openai/v1/models |
| Image Generation | ✅ | ✅ | /openai/v1/images/generations |
| Image Edit | ✅ | ✅ | /openai/v1/images/edits |
| Image Variation | - | - | - |
| Batch | - | - | - |
| Text Completions | - | - | - |
| Speech (TTS) | - | - | - |
Azure-specific: Batch operations and Text Completions are not supported by Azure OpenAI Service. Responses API uses preview API version and is available for both OpenAI and Anthropic models.

1. Chat Completions

Request Parameters

Core Parameter Mapping

| Parameter | Azure Handling | Notes |
| --- | --- | --- |
| model | Mapped to deployment_id | Supports version matching and base model matching |
| max_completion_tokens | Direct pass-through | OpenAI models only |
| temperature, top_p | Direct pass-through | Same across all models |
| All other params | Model-specific conversion | Converted per underlying provider (OpenAI/Anthropic) |

Authentication Configuration

Azure uses custom endpoint and deployment configuration:
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "api-key: YOUR_AZURE_API_KEY" \
  -d '{
    "model": "azure/gpt-4-deployment",
    "messages": [{"role": "user", "content": "Hello"}],
    "deployment": "my-gpt4-deployment",
    "endpoint": "https://my-org.openai.azure.com"
  }'

Key Configuration

Azure supports two authentication methods:

Azure Entra ID (Service Principal)

If client_id, client_secret, and tenant_id are all set, Azure Entra ID authentication is used and takes priority over API key authentication.
{
  "azure_key_config": {
    "endpoint": "https://your-org.openai.azure.com",
    "client_id": "your-client-id",
    "client_secret": "your-client-secret",
    "tenant_id": "your-tenant-id",
    "scopes": ["https://cognitiveservices.azure.com/.default"],
    "api_version": "2024-10-21",
    "deployments": {
      "gpt-4": "my-gpt4-deployment",
      "gpt-4-turbo": "my-gpt4-turbo-deployment",
      "claude-3": "my-claude-deployment"
    }
  }
}
Required Azure Roles:
  • For OpenAI models: Cognitive Services OpenAI User
  • For Anthropic models: Cognitive Services AI Services User

Direct Authentication (API Key)

{
  "azure_key_config": {
    "endpoint": "https://your-org.openai.azure.com",
    "api_version": "2024-10-21",
    "deployments": {
      "gpt-4": "my-gpt4-deployment",
      "gpt-4-turbo": "my-gpt4-turbo-deployment",
      "claude-3": "my-claude-deployment"
    }
  }
}
Configuration Details:
  • endpoint - Azure OpenAI resource endpoint (required)
  • client_id - Azure Entra ID client ID (optional, for Service Principal auth)
  • client_secret - Azure Entra ID client secret (optional, for Service Principal auth)
  • tenant_id - Azure Entra ID tenant ID (optional, for Service Principal auth)
  • scopes - OAuth scopes for token requests (default: ["https://cognitiveservices.azure.com/.default"])
  • api_version - API version to use (default: 2024-10-21)
  • deployments - Map of model names to deployment IDs (optional, can be provided per-request)
  • allowed_models - List of allowed models to use from this key (optional)
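The choice between the two authentication modes follows mechanically from which fields are present. A minimal sketch, assuming the azure_key_config field names above (the helper name is hypothetical):

```python
def select_auth_mode(azure_key_config: dict) -> str:
    # Entra ID (Service Principal) takes priority when client_id,
    # client_secret, and tenant_id are all present; otherwise the
    # plain api-key header is used.
    entra_fields = ("client_id", "client_secret", "tenant_id")
    if all(azure_key_config.get(f) for f in entra_fields):
        return "entra_id"
    return "api_key"
```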

Deployment Selection

Deployments can be specified at three levels (in order of precedence):
  1. Per-request (highest priority)
    {"deployment": "custom-deployment"}
    
  2. Key configuration
    {"deployments": {"gpt-4": "my-gpt4-deployment"}}
    
  3. Model name (lowest priority) - if no deployment is specified, the model name is used as the deployment ID directly

OpenAI Models

When using OpenAI models (GPT-4, GPT-4 Turbo, GPT-3.5-Turbo, etc.), Bifrost passes through OpenAI-compatible parameters directly.

Parameter Mapping for OpenAI

All OpenAI-standard parameters are supported. Refer to the OpenAI provider documentation for conversion details.

Anthropic Models

When using Anthropic models through Azure (Claude 3 family), Bifrost converts requests to Anthropic format.

Parameter Mapping for Anthropic

All Anthropic-standard parameters are supported with special handling:
  • Reasoning/Thinking: reasoning parameters converted to Anthropic’s thinking structure
  • System messages: Extracted and placed in separate system field
  • Tool message grouping: Consecutive tool messages merged
Refer to the Anthropic provider documentation for conversion details.

Special Notes for Azure + Anthropic

  • API version automatically set to 2023-06-01 for Anthropic models
  • Endpoints use /anthropic/v1/ paths internally
  • Authentication uses x-api-key header for Anthropic models
  • Minimum reasoning budget: 1024 tokens
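The Azure + Anthropic specifics above (fixed API version, x-api-key header, minimum thinking budget) can be expressed together as a sketch; the function is hypothetical and only restates the documented values:

```python
MIN_THINKING_BUDGET = 1024  # documented minimum reasoning budget

def anthropic_request_overrides(api_key: str, budget_tokens: int):
    # Anthropic models on Azure use x-api-key (not api-key) and a
    # fixed 2023-06-01 API version; the thinking budget is clamped.
    headers = {"x-api-key": api_key}
    api_version = "2023-06-01"
    budget = max(budget_tokens, MIN_THINKING_BUDGET)
    return headers, api_version, budget
```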

API Versioning

  • Default version: 2024-10-21 (supports latest OpenAI features)
  • Preview version: preview (used for Responses API)
  • Custom version: Set via api_version in key config
Different API versions may have different feature support. Bifrost automatically adjusts endpoint paths and parameters based on the configured version.
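Combining the version rules with the endpoint format documented later ({endpoint}/openai/v1/{path}?api-version={version}), URL construction can be sketched as follows. This is an illustrative helper, not Bifrost's actual implementation:

```python
DEFAULT_API_VERSION = "2024-10-21"

def build_azure_url(endpoint: str, path: str,
                    api_version: str = DEFAULT_API_VERSION) -> str:
    # The Responses API always uses the "preview" API version.
    if path == "responses":
        api_version = "preview"
    return f"{endpoint}/openai/v1/{path}?api-version={api_version}"
```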

Streaming

Streaming uses OpenAI or Anthropic format depending on model type:
  • OpenAI models: Standard OpenAI streaming with chat.completion.chunk events
  • Anthropic models: Anthropic streaming format with content blocks

2. Responses API

The Responses API is available for both OpenAI and Anthropic models on Azure and uses the preview API version.

Request Parameters

Core Parameter Mapping

| Parameter | Azure Handling | Notes |
| --- | --- | --- |
| instructions | Becomes system message | Model-specific conversion |
| input | Converted to user message(s) | String or array support |
| max_output_tokens | Model-specific field mapping | OpenAI vs Anthropic conversion |
| All other params | Model-specific conversion | Converted per underlying provider |

OpenAI Models

For OpenAI models (GPT-4, etc.), conversion follows OpenAI’s Responses API format.

Anthropic Models

For Anthropic models (Claude, etc.), conversion follows Anthropic’s message format:
  • instructions becomes system message
  • reasoning mapped to thinking structure
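The two mappings above can be sketched as a field-level transform. A hedged sketch, assuming string input and Anthropic's Messages API field names (system, messages, thinking); the helper itself is hypothetical:

```python
def responses_to_anthropic(req: dict) -> dict:
    out = {
        # instructions becomes the top-level system field
        "system": req.get("instructions"),
        # input (string form) becomes a single user message
        "messages": [{"role": "user", "content": req["input"]}],
        "max_tokens": req.get("max_output_tokens"),
    }
    if "reasoning" in req:
        # reasoning is mapped to Anthropic's thinking structure,
        # respecting the documented 1024-token minimum budget
        out["thinking"] = {
            "type": "enabled",
            "budget_tokens": max(req["reasoning"].get("budget_tokens", 1024), 1024),
        }
    return out
```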

Endpoint Configuration

curl -X POST http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "azure/claude-3-sonnet",
    "input": "Hello, how are you?",
    "instructions": "You are a helpful assistant",
    "deployment": "my-claude-deployment",
    "endpoint": "https://my-org.openai.azure.com"
  }' \
  -H "api-key: YOUR_AZURE_API_KEY"

Special Handling

  • Uses /openai/v1/responses endpoint with preview API version
  • All request body conversions handled automatically
  • Supports raw request body passthrough for advanced cases
OpenAI Models - gpt-oss Special Message Handling: For OpenAI models through Azure, see the OpenAI Responses API documentation for details on special gpt-oss model handling regarding reasoning conversion (summaries vs. content blocks).
Anthropic Models: Refer to the Anthropic Responses API documentation for parameter details.

3. Embeddings

Embeddings are supported for OpenAI models only (not available for Anthropic models on Azure).

Request Parameters

| Parameter | Azure Handling |
| --- | --- |
| input | Direct pass-through |
| model | Mapped to deployment |
| dimensions | Direct pass-through (when supported) |

curl -X POST http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": ["text to embed"],
    "deployment": "my-embedding-deployment"
  }' \
  -H "api-key: YOUR_AZURE_API_KEY"

Response Conversion

Embeddings response is passed through directly from Azure OpenAI with standard format:
{
  "data": [
    {
      "object": "embedding",
      "embedding": [0.1234, -0.5678, ...],
      "index": 0
    }
  ],
  "model": "text-embedding-3-small",
  "usage": {
    "prompt_tokens": 10,
    "total_tokens": 10
  }
}

4. Files API

Files operations are supported for OpenAI models only.

Supported Operations

| Operation | Support |
| --- | --- |
| Upload | ✅ |
| List | ✅ |
| Retrieve | ✅ |
| Delete | ✅ |
| Get Content | ✅ |
Files are stored in Azure and can be used with batch operations.

5. Image Generation

Image Generation is supported for OpenAI models on Azure and uses the OpenAI-compatible format.

Request Parameters

Core Parameter Mapping

| Parameter | Azure Handling | Notes |
| --- | --- | --- |
| model | Mapped to deployment_id | Deployment ID must be configured |
| prompt | Direct pass-through | Prompt text for image generation |
| All other params | Direct pass-through | Uses OpenAI format |
Azure uses the same conversion as OpenAI (see OpenAI Image Generation):
  • Model & Prompt: bifrostReq.Model → req.Model (mapped to deployment), bifrostReq.Prompt → req.Prompt
  • Parameters: All other fields from bifrostReq are embedded directly into the request struct via struct embedding

Configuration

curl -X POST http://localhost:8080/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "azure/dall-e-3",
    "prompt": "A sunset over the mountains",
    "size": "1024x1024",
    "n": 1,
    "deployment": "my-image-gen-deployment"
  }' \
  -H "api-key: YOUR_AZURE_API_KEY"

Response Conversion

  • Non-streaming: Azure responses are unmarshaled directly into BifrostImageGenerationResponse since Bifrost’s response schema is a superset of OpenAI’s format. All fields are passed through as-is.
  • Streaming: Azure streaming responses use Server-Sent Events (SSE) format with the same event types as OpenAI (see OpenAI Image Generation Streaming).

Streaming

Image generation streaming is supported and uses OpenAI’s streaming format with Server-Sent Events (SSE).

6. Image Edit

Requests use multipart/form-data, not JSON.
Image Edit is supported for OpenAI models on Azure and uses the OpenAI-compatible format. Azure uses the same conversion as OpenAI (see OpenAI Image Edit):
  • Request Conversion: Uses openai.HandleOpenAIImageEditRequest with Azure-specific URL construction
  • URL Format: {endpoint}/openai/deployments/{deployment}/images/edits?api-version={apiVersion}
  • Authentication: Azure API key or OAuth bearer token (via getAzureAuthHeaders)
  • Deployment Mapping: Model identifier mapped to Azure deployment ID
  • Response Conversion: Same as OpenAI - responses unmarshaled directly into BifrostImageGenerationResponse
  • Streaming: Supported via openai.HandleOpenAIImageEditStreamRequest with Azure-specific URL and authentication
Endpoint: /openai/deployments/{deployment}/images/edits?api-version={apiVersion}
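The deployment-scoped URL format above can be sketched directly (an illustrative helper; the real construction happens inside Bifrost's Azure provider):

```python
def image_edit_url(endpoint: str, deployment: str, api_version: str) -> str:
    # {endpoint}/openai/deployments/{deployment}/images/edits?api-version={apiVersion}
    return (f"{endpoint}/openai/deployments/{deployment}"
            f"/images/edits?api-version={api_version}")
```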

7. List Models

Request Parameters

None required.

Response Conversion

Lists available models/deployments configured in the Azure key. Response includes model metadata, capabilities, and lifecycle status.
{
  "data": [
    {
      "id": "gpt-4",
      "object": "model",
      "created": 1687882411,
      "status": "active",
      "lifecycle_status": "stable",
      "capabilities": {
        "chat_completion": true,
        "embeddings": false
      }
    }
  ]
}

Caveats

  • Severity: High - Model names must map to Azure deployment IDs. Impact: the request fails without a valid deployment mapping. Code: azure.go:145-200
  • Severity: Medium - OpenAI vs Anthropic is detected automatically from the model name. Impact: different conversion logic is applied transparently. Code: azure.go:92-114
  • Severity: Medium - The Responses API automatically uses the preview API version, which differs from the Chat Completions API version. Impact: Responses requests get an automatic version override. Code: azure.go:92-114, azure.go:109-113, azure.go:694
  • Severity: Low - Model version differences are ignored when matching to deployments. Impact: gpt-4 and gpt-4-turbo can map to the same deployment. Code: models.go:13-58
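The low-severity caveat (version-insensitive matching) can be sketched as an exact-then-base-model lookup. This is a hypothetical simplification of models.go, with illustrative suffixes only:

```python
VERSION_SUFFIXES = ("-turbo", "-32k")  # illustrative suffixes, not the real list

def match_deployment(model: str, deployments: dict[str, str]):
    # Exact match first.
    if model in deployments:
        return deployments[model]
    # Fall back to the base model with version-like suffixes stripped,
    # so gpt-4 and gpt-4-turbo can resolve to the same deployment.
    for suffix in VERSION_SUFFIXES:
        if model.endswith(suffix):
            return deployments.get(model[: -len(suffix)])
    return None
```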

Configuration

HTTP Settings: API Version 2024-10-21 (configurable) | Max Connections: 5000 | Max Idle: 60 seconds
Endpoint Format: https://{resource-name}.openai.azure.com/openai/v1/{path}?api-version={version}
Note: Bifrost automatically constructs URLs using the endpoint from the key configuration and the configured API version.

Setup & Configuration

Azure requires endpoint URLs, deployment mappings, and API version configuration. For detailed instructions on setting up Azure authentication, see the quickstart guides:
See Provider-Specific Authentication - Azure in the Gateway Quickstart for configuration steps using Web UI, API, or config.json.