Overview

Bifrost provides complete Anthropic API compatibility through protocol adaptation. The integration handles request transformation, response normalization, and error mapping between Anthropic’s Messages API specification and Bifrost’s internal processing pipeline. This lets you use Bifrost features such as governance, load balancing, semantic caching, and multi-provider support while keeping your existing Anthropic SDK-based architecture.

Endpoint: /anthropic

Setup

import anthropic

# Configure client to use Bifrost
client = anthropic.Anthropic(
    base_url="http://localhost:8080/anthropic",
    api_key="dummy-key"  # Keys handled by Bifrost
)

# Make requests as usual
response = client.messages.create(
    model="claude-3-sonnet-20240229",
    max_tokens=1000,
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.content[0].text)
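
For reference, the SDK call above amounts to an HTTP POST to the Messages path under the configured base URL. The sketch below builds that request with the standard library only and does not send it. `build_messages_request` is a hypothetical helper name, and the `/v1/messages` path and `anthropic-version` header reflect what the Anthropic SDK sends by default, not anything specific to Bifrost.

```python
import json
import urllib.request

BIFROST_BASE_URL = "http://localhost:8080/anthropic"  # same base URL as the SDK client


def build_messages_request(model, prompt, max_tokens=1000, api_key="dummy-key"):
    """Build the POST request the SDK would send; nothing goes over the wire."""
    body = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BIFROST_BASE_URL}/v1/messages",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "content-type": "application/json",
            "x-api-key": api_key,  # placeholder; keys are handled by Bifrost
            "anthropic-version": "2023-06-01",
        },
        method="POST",
    )


request = build_messages_request("claude-3-sonnet-20240229", "Hello!")
print(request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send it, assuming a Bifrost instance is listening on that address.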

Provider/Model Usage Examples

Use multiple providers through the same Anthropic SDK format by prefixing model names with the provider:
import anthropic

client = anthropic.Anthropic(
    base_url="http://localhost:8080/anthropic",
    api_key="dummy-key"
)

# Anthropic models (default)
anthropic_response = client.messages.create(
    model="claude-3-sonnet-20240229",
    max_tokens=1000,
    messages=[{"role": "user", "content": "Hello from Claude!"}]
)

# OpenAI models via Anthropic SDK format
openai_response = client.messages.create(
    model="openai/gpt-4o-mini",
    max_tokens=1000,
    messages=[{"role": "user", "content": "Hello from OpenAI!"}]
)

# Google Vertex models via Anthropic SDK format
vertex_response = client.messages.create(
    model="vertex/gemini-pro",
    max_tokens=1000,
    messages=[{"role": "user", "content": "Hello from Gemini!"}]
)

# Azure models
azure_response = client.messages.create(
    model="azure/gpt-4o",
    max_tokens=1000,
    messages=[{"role": "user", "content": "Hello from Azure!"}]
)

# Local Ollama models
ollama_response = client.messages.create(
    model="ollama/llama3.1:8b",
    max_tokens=1000,
    messages=[{"role": "user", "content": "Hello from Ollama!"}]
)
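
The naming convention in these examples can be illustrated with a small helper that splits a model string into a provider and a model name. This only mirrors the convention shown above: `split_model` is a hypothetical name, and Bifrost's own routing logic is not shown here and may differ.

```python
def split_model(model, default_provider="anthropic"):
    """Split 'provider/model' into (provider, model_name).

    A string without a '/' is treated as a model of the default provider,
    matching the unprefixed Claude example above.
    """
    provider, separator, name = model.partition("/")
    if not separator:
        return default_provider, model
    return provider, name


for model in ("claude-3-sonnet-20240229", "openai/gpt-4o-mini", "ollama/llama3.1:8b"):
    print(split_model(model))
```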

Adding Custom Headers

Pass custom headers required by Bifrost plugins, such as governance and telemetry:
import anthropic

client = anthropic.Anthropic(
    base_url="http://localhost:8080/anthropic",
    api_key="dummy-key",
    default_headers={
        "x-bf-vk": "vk_12345",  # Virtual key for governance
    }
)

response = client.messages.create(
    model="claude-3-sonnet-20240229",
    max_tokens=1000,
    messages=[{"role": "user", "content": "Hello with custom headers!"}]
)

Using Direct Keys

Pass API keys directly in requests to bypass Bifrost’s load balancing. You can pass any provider’s API key (OpenAI, Anthropic, Mistral, etc.) since Bifrost only looks for Authorization or x-api-key headers. This requires the Allow Direct API keys option to be enabled in Bifrost configuration.
Learn more: See Key Management for enabling direct API key usage.
import anthropic

# Using Anthropic's API key directly
client_with_direct_key = anthropic.Anthropic(
    base_url="http://localhost:8080/anthropic",
    api_key="sk-ant-your-anthropic-key"  # Anthropic's API key works
)

anthropic_response = client_with_direct_key.messages.create(
    model="claude-3-sonnet-20240229",
    max_tokens=1000,
    messages=[{"role": "user", "content": "Hello from Claude!"}]
)

# Or pass a different provider key per request using headers
client = anthropic.Anthropic(
    base_url="http://localhost:8080/anthropic",
    api_key="dummy-key"
)

# Use Anthropic key for Claude
anthropic_response = client.messages.create(
    model="claude-3-sonnet-20240229",
    max_tokens=1000,
    messages=[{"role": "user", "content": "Hello Claude!"}],
    extra_headers={
        "x-api-key": "sk-ant-your-anthropic-key"
    }
)

# Use OpenAI key for GPT models
openai_response = client.messages.create(
    model="openai/gpt-4o-mini",
    max_tokens=1000,
    messages=[{"role": "user", "content": "Hello GPT!"}],
    extra_headers={
        "Authorization": "Bearer sk-your-openai-key"
    }
)
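
When many requests carry direct keys, the header choice can be wrapped in a helper. The sketch below follows the pattern in the examples above (`x-api-key` for Claude models, `Authorization` for prefixed models). Since Bifrost looks for either header, this mapping is a convention and not a requirement, and `direct_key_headers` is a hypothetical name.

```python
def direct_key_headers(model, api_key):
    """Return extra headers carrying a provider key for a single request."""
    # Unprefixed models are treated as Anthropic models, as in the examples above.
    provider = model.split("/", 1)[0] if "/" in model else "anthropic"
    if provider == "anthropic":
        return {"x-api-key": api_key}
    return {"Authorization": f"Bearer {api_key}"}


print(direct_key_headers("openai/gpt-4o-mini", "sk-your-openai-key"))
```

The returned dictionary can be passed as `extra_headers` to `client.messages.create`.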

Async Inference

Submit inference requests asynchronously and poll for results later using the x-bf-async header. This is useful for long-running requests where you don’t want to hold a connection open. See Async Inference for full details.
Async inference requires a Logs Store to be configured and is not compatible with streaming.

Messages

import anthropic
import time

client = anthropic.Anthropic(
    base_url="http://localhost:8080/anthropic",
    api_key="dummy-key"
)

# Submit async request
initial = client.messages.create(
    model="anthropic/claude-sonnet-4-20250514",
    max_tokens=256,
    messages=[{"role": "user", "content": "Tell me a short story."}],
    extra_headers={"x-bf-async": "true"}
)

# If content is present, the request completed synchronously
if initial.content:
    print(initial.content[0].text)
else:
    # Poll until completed
    while True:
        time.sleep(2)
        poll = client.messages.create(
            model="anthropic/claude-sonnet-4-20250514",
            max_tokens=256,
            messages=[{"role": "user", "content": "Tell me a short story."}],
            extra_headers={"x-bf-async-id": initial.id}
        )
        if poll.content:
            print(poll.content[0].text)
            break
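
The loop above polls until a result arrives. A bounded variant is sketched below, assuming that an empty `content` means the job is still pending, as in the example above. `poll_until_ready` and its parameters are hypothetical names, and the demo uses stand-in objects in place of real responses so that it runs without a server.

```python
import time
from types import SimpleNamespace


def poll_until_ready(fetch, interval=2.0, timeout=60.0, sleep=time.sleep):
    """Call fetch() until its result has non-empty .content or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if getattr(result, "content", None):
            return result
        # Stop if another wait would run past the deadline.
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"async job not finished after {timeout} seconds")
        sleep(interval)


# Demo with simulated responses: two pending polls, then a completed one.
fake_responses = iter([
    SimpleNamespace(content=[]),
    SimpleNamespace(content=[]),
    SimpleNamespace(content=["Once upon a time..."]),
])
final = poll_until_ready(lambda: next(fake_responses), interval=0.0, timeout=5.0)
print(final.content[0])
```

In real use, `fetch` would wrap the `client.messages.create(...)` call that sends the `x-bf-async-id` header.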

Async Headers

| Header | Description |
| --- | --- |
| x-bf-async: true | Submit the request as an async job. Returns immediately with a job ID. |
| x-bf-async-id: <job-id> | Poll for results of a previously submitted async job. |
| x-bf-async-job-result-ttl: <seconds> | Override the default result TTL (default: 3600s). |
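
These headers can be assembled with two small helpers, sketched below. The function names are hypothetical, and sending the TTL override together with the submit request is an assumption, since the text above does not say which request carries it.

```python
def async_submit_headers(result_ttl_seconds=None):
    """Headers for submitting an async job, with an optional result TTL override."""
    headers = {"x-bf-async": "true"}
    if result_ttl_seconds is not None:
        # Assumption: the TTL override accompanies the submit request.
        headers["x-bf-async-job-result-ttl"] = str(result_ttl_seconds)
    return headers


def async_poll_headers(job_id):
    """Headers for polling a previously submitted async job."""
    return {"x-bf-async-id": job_id}


print(async_submit_headers(result_ttl_seconds=600))
print(async_poll_headers("job-123"))  # "job-123" is a placeholder ID
```

Either dictionary can be passed as `extra_headers` on the corresponding `client.messages.create` call.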

Supported Features

The Anthropic integration supports every feature that is available in both the Anthropic SDK and Bifrost core functionality. If both support a feature, it works through the integration without extra configuration in your SDK code.

Next Steps