
Overview

Anthropic’s API has significant structural differences from OpenAI’s format. Bifrost performs extensive conversion, including:
  • System message extraction - Removed from messages array, placed in separate system field
  • Tool message grouping - Consecutive tool messages merged into single user message
  • Thinking block transformation - reasoning parameters mapped to Anthropic’s thinking structure
  • Parameter renaming - e.g., max_completion_tokens → max_tokens, stop → stop_sequences
  • Content format conversion - Images, files, and other content types adapted to Anthropic’s schema
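As a rough illustration, the system-message extraction and parameter renaming described above can be sketched in Python (function and variable names here are illustrative, not Bifrost’s actual internals):

```python
def to_anthropic(openai_req: dict) -> dict:
    """Sketch of the OpenAI -> Anthropic request conversion (illustrative only)."""
    out = {"model": openai_req["model"]}

    # System message extraction: system messages leave the messages array
    # and become text blocks in the separate top-level system field.
    system_blocks = [
        {"type": "text", "text": m["content"]}
        for m in openai_req["messages"] if m["role"] == "system"
    ]
    if system_blocks:
        out["system"] = system_blocks
    out["messages"] = [m for m in openai_req["messages"] if m["role"] != "system"]

    # Parameter renaming.
    if "max_completion_tokens" in openai_req:
        out["max_tokens"] = openai_req["max_completion_tokens"]
    if "stop" in openai_req:
        out["stop_sequences"] = openai_req["stop"]
    return out
```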

Supported Operations

| Operation | Non-Streaming | Streaming | Endpoint |
|---|---|---|---|
| Chat Completions | ✅ | ✅ | /v1/messages |
| Responses API | ✅ | ✅ | /v1/messages |
| Text Completions | ✅ | ❌ | /v1/complete |
| Embeddings | ❌ | ❌ | - |
| Speech (TTS) | ❌ | ❌ | - |
| Transcriptions (STT) | ❌ | ❌ | - |
| Image Generation | ❌ | ❌ | - |
| Files | ✅ | - | /v1/files |
| Batch | ✅ | - | /v1/messages/batches |
| List Models | ✅ | - | /v1/models |
Unsupported Operations (❌): Embeddings, Speech, Transcriptions, and Image Generation are not supported by the upstream Anthropic API. These return UnsupportedOperationError.

Beta Headers

Bifrost automatically manages Anthropic beta headers — detecting required headers from request features and injecting them. Headers are validated per provider to prevent unsupported headers from reaching the upstream API.
| Beta Header | Auto-Injected |
|---|---|
| computer-use-2025-01-24 / computer-use-2025-11-24 | ✅ (tool type detection) |
| structured-outputs-2025-11-13 | ✅ (strict/output_format) |
| advanced-tool-use-2025-11-20 | ✅ (defer_loading/input_examples/allowed_callers) |
| mcp-client-2025-11-20 | ✅ (mcp_servers detection) |
| prompt-caching-scope-2026-01-05 | ✅ (cache_control.scope) |
| compact-2026-01-12 | ✅ (compaction edit) |
| context-management-2025-06-27 | ✅ (clear edits) |
| files-api-2025-04-14 | ✅ (files endpoint) |
| interleaved-thinking-2025-05-14 | ✅ (thinking enabled/adaptive) |
| skills-2025-10-02 | Passthrough |
| context-1m-2025-08-07 | Passthrough |
| fast-mode-2026-02-01 | ✅ (speed=fast) |
| redact-thinking-2026-02-12 | Passthrough |
Passthrough headers are not auto-injected but are validated and forwarded when set manually via the anthropic-beta request header. Unknown headers are forwarded to Anthropic only; for other providers (Vertex, Bedrock, Azure), unknown headers are silently dropped by default to prevent upstream errors.

Beta header overrides: you can override the default support per provider via the Beta Headers tab in provider configuration, or by setting beta_header_overrides in the provider’s network_config. See Beta Header Overrides for details.

1. Chat Completions

Request Parameters

Parameter Mapping

| Parameter | Transformation |
|---|---|
| max_completion_tokens | Renamed to max_tokens |
| temperature, top_p | Direct pass-through |
| stop | Renamed to stop_sequences |
| response_format | Converted to output_format |
| tools | Schema restructured (see Tool Conversion) |
| tool_choice | Type mapped (see Tool Conversion) |
| reasoning | Mapped to thinking (see Reasoning / Thinking) |
| user | Wrapped in metadata.user_id |
| top_k | Via extra_params (Anthropic-specific) |

Dropped Parameters

The following parameters are silently ignored: frequency_penalty, presence_penalty, logit_bias, logprobs, top_logprobs, seed, parallel_tool_calls, service_tier

Extra Parameters

Use extra_params (SDK) or pass directly in request body (Gateway) for Anthropic-specific fields:
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-3-5-sonnet",
    "messages": [{"role": "user", "content": "Hello"}],
    "top_k": 40
  }'
Anthropic also accepts a top-level "cache_control": {"type": "ephemeral"} object on /anthropic/v1/messages requests to enable automatic prompt caching, and Bifrost forwards that directive through unchanged.

Cache Control

Cache directives can be added to system messages, user messages, and tool definitions to enable prompt caching:
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-3-5-sonnet",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "This is cached context",
            "cache_control": {"type": "ephemeral"}
          }
        ]
      }
    ],
    "system": [
      {
        "type": "text",
        "text": "You are a helpful assistant",
        "cache_control": {"type": "ephemeral"}
      }
    ]
  }'

Reasoning / Thinking

Documentation: See Bifrost Reasoning Reference

Parameter Mapping

  • reasoning.effort → thinking.type (always mapped to "enabled")
  • reasoning.max_tokens → thinking.budget_tokens (token budget for thinking)

Critical Constraints

  • Minimum budget: 1024 tokens required; requests below this fail with an error
  • Dynamic budget: -1 is converted to 1024 automatically
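The two budget constraints above can be sketched as a small normalization step (a sketch only; the constant and function names are illustrative, not Bifrost’s internals):

```python
MIN_THINKING_BUDGET = 1024  # minimum thinking budget, per the constraints above


def thinking_budget(requested: int) -> int:
    """Normalize a reasoning.max_tokens value for Anthropic's thinking field.

    -1 (dynamic budgeting) is coerced to the minimum; other values below
    the minimum are rejected with an error.
    """
    if requested == -1:
        return MIN_THINKING_BUDGET
    if requested < MIN_THINKING_BUDGET:
        raise ValueError(f"thinking budget must be >= {MIN_THINKING_BUDGET}")
    return requested
```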

Example

// Request
{"reasoning": {"effort": "high", "max_tokens": 2048}}

// Anthropic conversion
{"thinking": {"type": "enabled", "budget_tokens": 2048}}

Message Conversion

Critical Caveats

  • System message extraction: System messages are removed from messages array and placed in separate system field. Multiple system messages become separate text blocks in the system array.
  • Tool message grouping: Consecutive tool messages are merged into single user message with tool_result content blocks.
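The tool-message grouping caveat can be illustrated with a short sketch (illustrative names; the internal bookkeeping key `_merged` is purely a device of this example):

```python
def group_tool_messages(messages: list[dict]) -> list[dict]:
    """Merge consecutive tool messages into one user message whose
    content is a list of tool_result blocks (illustrative sketch)."""
    out = []
    for msg in messages:
        if msg["role"] == "tool":
            block = {
                "type": "tool_result",
                "tool_use_id": msg["tool_call_id"],
                "content": msg["content"],
            }
            # Append to the previous merged user message if there is one.
            if out and out[-1]["role"] == "user" and out[-1].get("_merged"):
                out[-1]["content"].append(block)
            else:
                out.append({"role": "user", "content": [block], "_merged": True})
        else:
            out.append(dict(msg))
    for m in out:
        m.pop("_merged", None)
    return out
```

Note that the output has fewer messages than the input whenever two or more tool results arrive back to back, which is exactly the "message count and structure changes" caveat listed below.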

Image Conversion

  • URL images: {"type": "image_url", "image_url": {...}} → {"type": "image", "source": {"type": "url", ...}}
  • Base64 images: Data URL → {"type": "image", "source": {"type": "base64", "media_type": "image/png", ...}}
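A minimal sketch of both image conversions, assuming standard data-URL syntax (the function name is illustrative):

```python
def convert_image(part: dict) -> dict:
    """Convert an OpenAI image_url content part to Anthropic's image
    block (illustrative sketch)."""
    url = part["image_url"]["url"]
    if url.startswith("data:"):
        # Data URL, e.g. data:image/png;base64,AAAA -> base64 source.
        header, data = url.split(",", 1)
        media_type = header.removeprefix("data:").split(";", 1)[0]
        source = {"type": "base64", "media_type": media_type, "data": data}
    else:
        source = {"type": "url", "url": url}
    return {"type": "image", "source": source}
```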

Cache Control Locations

Cache directives supported on: system content blocks, user message content blocks, tool definitions (see Cache Control examples above)

Tool Conversion

Tool definitions are restructured: function.name → name, function.parameters → input_schema; function.strict is dropped. Tool choice mapping: "auto" → auto | "none" → none | "required" → any | specific tool → {"type": "tool", "name": "X"}
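The tool restructuring and tool_choice mapping above can be sketched as follows (illustrative names, not Bifrost’s internals):

```python
# Mapping for the string forms of tool_choice described above.
TOOL_CHOICE_MAP = {"auto": "auto", "none": "none", "required": "any"}


def convert_tool(tool: dict) -> dict:
    """Restructure an OpenAI tool definition; strict is dropped (sketch)."""
    fn = tool["function"]
    return {"name": fn["name"], "input_schema": fn["parameters"]}


def convert_tool_choice(choice) -> dict:
    """Map an OpenAI tool_choice to Anthropic's form (sketch)."""
    if isinstance(choice, str):
        return {"type": TOOL_CHOICE_MAP[choice]}
    # Specific tool selection.
    return {"type": "tool", "name": choice["function"]["name"]}
```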

Response Conversion

Field Mapping

  • stop_reason → finish_reason: end_turn/stop_sequence → stop, max_tokens → length, tool_use → tool_calls
  • input_tokens + cache_read_input_tokens + cache_creation_input_tokens → prompt_tokens (all cache counts rolled into the total)
  • Cache token breakdown surfaced in prompt_tokens_details:
    • cache_read_input_tokens → prompt_tokens_details.cached_read_tokens
    • cache_creation_input_tokens → prompt_tokens_details.cached_write_tokens
  • output_tokens → completion_tokens
  • thinking blocks → reasoning_details with index, type, text, and signature fields
  • Tool call arguments converted from JSON object → JSON string
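The usage rollup described above can be sketched as (illustrative sketch, not Bifrost’s actual code):

```python
def map_usage(anthropic_usage: dict) -> dict:
    """Roll cache counts into prompt_tokens and surface the breakdown
    in prompt_tokens_details (illustrative sketch)."""
    read = anthropic_usage.get("cache_read_input_tokens", 0)
    write = anthropic_usage.get("cache_creation_input_tokens", 0)
    return {
        "prompt_tokens": anthropic_usage["input_tokens"] + read + write,
        "completion_tokens": anthropic_usage["output_tokens"],
        "prompt_tokens_details": {
            "cached_read_tokens": read,
            "cached_write_tokens": write,
        },
    }
```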

Streaming

Event sequence: message_start → content_block_start → content_block_delta → content_block_stop → message_delta → message_stop
Delta types: text_delta → content | input_json_delta → tool arguments | thinking_delta → reasoning text | signature_delta → reasoning signature
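To make the delta types concrete, here is a sketch that folds a stream of content_block_delta events into text and tool-argument buffers (event shapes follow Anthropic’s streaming format; the function name is illustrative):

```python
def accumulate_deltas(events: list[dict]) -> dict:
    """Fold content_block_delta events into content and tool-argument
    strings, keyed by the delta types above (illustrative sketch)."""
    text, args = [], []
    for ev in events:
        if ev.get("type") != "content_block_delta":
            continue
        delta = ev["delta"]
        if delta["type"] == "text_delta":
            text.append(delta["text"])
        elif delta["type"] == "input_json_delta":
            # Partial JSON fragments for tool arguments.
            args.append(delta["partial_json"])
    return {"content": "".join(text), "arguments": "".join(args)}
```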

Caveats

  • Severity: High - System messages removed from messages array, placed in separate system field. Impact: message array structure differs from input. Code: chat.go:145-167
  • Severity: High - Consecutive tool messages merged into single user message. Impact: message count and structure change. Code: chat.go:169-216
  • Severity: High - reasoning.max_tokens must be >= 1024. Impact: requests with lower values fail with an error. Code: chat.go:113-115
  • Severity: Medium - reasoning.max_tokens = -1 converted to 1024. Impact: dynamic budgeting not supported. Code: chat.go:107-111
  • Severity: Medium - strict: true in tool definitions silently dropped. Impact: no schema validation enforcement. Code: chat.go:43-72
  • Severity: Low - Tool call input (object) serialized to arguments (JSON string). Code: chat.go:341-350

2. Responses API

The Responses API uses the same underlying /v1/messages endpoint but converts between OpenAI’s Responses format and Anthropic’s Messages format.

Request Parameters

Parameter Mapping

| Parameter | Transformation |
|---|---|
| max_output_tokens | Renamed to max_tokens |
| temperature, top_p | Direct pass-through |
| instructions | Becomes system message |
| tools | Schema restructured (see Chat Completions) |
| tool_choice | Type mapped (see Chat Completions) |
| reasoning | Mapped to thinking (see Reasoning / Thinking) |
| user | Wrapped in metadata.user_id |
| text | Converted to output_format |
| include | Via extra_params (Anthropic-specific) |
| stop | Via extra_params, renamed to stop_sequences |
| top_k | Via extra_params (Anthropic-specific) |
| truncation | Auto-set to "auto" for computer tools |

Extra Parameters

Use extra_params (SDK) or pass directly in request body (Gateway):
curl -X POST http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-3-5-sonnet",
    "input": "Hello, how are you?",
    "top_k": 40
  }'

Cache Control

Cache directives can be added to instructions (system) and input messages to enable prompt caching:
curl -X POST http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-3-5-sonnet",
    "instructions": "You are a helpful assistant. This instruction is cached.",
    "instructions_cache_control": {"type": "ephemeral"},
    "input": [
      {
        "type": "text",
        "text": "Answer this question",
        "cache_control": {"type": "ephemeral"}
      }
    ]
  }'

Input & Instructions

  • Input: String wrapped as user message or array converted to messages
  • Instructions: Becomes system message (same extraction as Chat Completions)
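The input and instructions handling above can be sketched as (illustrative names; not Bifrost’s internals):

```python
def build_messages(req: dict) -> dict:
    """Normalize a Responses API request: a string input becomes a
    single user message, a list passes through as messages, and
    instructions become the system field (illustrative sketch)."""
    raw = req["input"]
    messages = [{"role": "user", "content": raw}] if isinstance(raw, str) else list(raw)
    out = {"messages": messages}
    if "instructions" in req:
        out["system"] = req["instructions"]
    return out
```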

Tool Support

Supported types: function, computer_use_preview, web_search, mcp
Tool conversions are the same as Chat Completions, with: MCP tools mapped to mcp_servers (server_label → name, server_url → url) and computer tools auto-set with truncation: "auto"
Cache control is supported on instructions and input blocks (see Cache Control examples)

Response Conversion

  • stop_reason → status: end_turn/stop_sequence → completed, max_tokens → incomplete
  • Top-level input_tokens and output_tokens are rollups that include cache-related usage; they map directly: input_tokens → input_tokens | output_tokens → output_tokens
  • Cache-specific counts are exposed in details: cache_read_input_tokens → input_tokens_details.cached_read_tokens | cache_creation_input_tokens → input_tokens_details.cached_write_tokens
  • Output items: text → message | tool_use → function_call | thinking → reasoning

Streaming

Event sequence: message_start → content_block_start → content_block_delta → content_block_stop → message_delta → message_stop
Special handling: computer tool arguments accumulated across chunks (emitted on content_block_stop), synthetic content_part.added events emitted for text/reasoning, MCP calls use mcp_call_arguments_delta, item IDs generated as msg_{messageID}_item_{outputIndex}

3. Text Completions (Legacy)

Legacy API using /v1/complete endpoint. Streaming not supported.
Request: prompt auto-wrapped as \n\nHuman: {prompt}\n\nAssistant: | max_tokens → max_tokens_to_sample | temperature, top_p direct pass-through | top_k, stop via extra_params (stop → stop_sequences)
Response: completion → choices[0].text | stop_reason → finish_reason
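The legacy prompt wrapping can be sketched in one line (a sketch assuming no trailing whitespace after "Assistant:"):

```python
def wrap_prompt(prompt: str) -> str:
    """Wrap a plain prompt in the legacy Human/Assistant turn format
    used by /v1/complete (illustrative sketch)."""
    return f"\n\nHuman: {prompt}\n\nAssistant:"
```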

4. Batch API

Request formats: requests array (CustomID + Params) or input_file_id
Pagination: cursor-based with after_id, before_id, limit
Endpoints:
  • POST /v1/messages/batches - Create
  • GET /v1/messages/batches - List
  • GET /v1/messages/batches/{batch_id} - Retrieve
  • POST /v1/messages/batches/{batch_id}/cancel - Cancel
Response: JSONL format with {custom_id, result: {type, message}}
Status mapping: in_progress → InProgress, canceling → Cancelling, ended → Ended
Note: RFC3339Nano timestamps converted to Unix, multi-key retry supported
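The status mapping and timestamp conversion above can be sketched as follows (illustrative only; Python’s fromisoformat keeps at most microsecond precision, so the nanosecond fraction is simply discarded here):

```python
from datetime import datetime, timezone

# Batch status mapping from the note above.
STATUS_MAP = {"in_progress": "InProgress", "canceling": "Cancelling", "ended": "Ended"}


def rfc3339nano_to_unix(ts: str) -> int:
    """Convert an RFC3339Nano UTC timestamp (trailing Z assumed) to
    Unix seconds, truncating the fractional part (illustrative sketch)."""
    head, _, _frac = ts.rstrip("Z").partition(".")
    dt = datetime.fromisoformat(head).replace(tzinfo=timezone.utc)
    return int(dt.timestamp())
```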

5. Files API

Requires beta header: anthropic-beta: files-api-2025-04-14
Upload: multipart/form-data with file (required) and filename (optional)
Field mapping: id | filename | size_bytes → bytes | created_at (Unix) | mime_type → content_type
Endpoints: POST /v1/files, GET /v1/files (cursor pagination), GET /v1/files/{file_id}, DELETE /v1/files/{file_id}, GET /v1/files/{file_id}/content
Note: file purpose is always "batch", status always "processed"

6. List Models

Request: GET /v1/models?limit={defaultPageSize} (no body)
Field mapping: id (prefixed anthropic/) | display_name → name | created_at (Unix timestamp)
Pagination: token-based with NextPageToken, FirstID, LastID
Multi-key support: results aggregated from all keys, filtered by allowed_models if configured