Overview

Anthropic's API structure differs significantly from OpenAI's format. Bifrost performs extensive conversion, including:
  • System message extraction - Removed from messages array, placed in separate system field
  • Tool message grouping - Consecutive tool messages merged into single user message
  • Thinking block transformation - reasoning parameters mapped to Anthropic’s thinking structure
  • Parameter renaming - e.g., max_completion_tokens → max_tokens, stop → stop_sequences
  • Content format conversion - Images, files, and other content types adapted to Anthropic’s schema

Supported Operations

Operation            | Non-Streaming | Streaming | Endpoint
Chat Completions     | ✅            | ✅        | /v1/messages
Responses API        | ✅            | ✅        | /v1/messages
Text Completions     | ✅            | ❌        | /v1/complete
Embeddings           | ❌            | ❌        | -
Speech (TTS)         | ❌            | ❌        | -
Transcriptions (STT) | ❌            | ❌        | -
Image Generation     | ❌            | ❌        | -
Files                | ✅            | -         | /v1/files
Batch                | ✅            | -         | /v1/messages/batches
List Models          | ✅            | -         | /v1/models
Unsupported Operations (❌): Embeddings, Speech, Transcriptions, and Image Generation are not supported by the upstream Anthropic API. These return UnsupportedOperationError.

1. Chat Completions

Request Parameters

Parameter Mapping

Parameter             | Transformation
max_completion_tokens | Renamed to max_tokens
temperature, top_p    | Direct pass-through
stop                  | Renamed to stop_sequences
response_format       | Converted to output_format
tools                 | Schema restructured (see Tool Conversion)
tool_choice           | Type mapped (see Tool Conversion)
reasoning             | Mapped to thinking (see Reasoning / Thinking)
user                  | Wrapped in metadata.user_id
top_k                 | Via extra_params (Anthropic-specific)

Dropped Parameters

The following parameters are silently ignored: frequency_penalty, presence_penalty, logit_bias, logprobs, top_logprobs, seed, parallel_tool_calls, service_tier

Extra Parameters

Use extra_params (SDK) or pass directly in request body (Gateway) for Anthropic-specific fields:
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-3-5-sonnet",
    "messages": [{"role": "user", "content": "Hello"}],
    "top_k": 40
  }'

Cache Control

Cache directives can be added to system messages, user messages, and tool definitions to enable prompt caching:
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-3-5-sonnet",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "This is cached context",
            "cache_control": {"type": "ephemeral"}
          }
        ]
      }
    ],
    "system": [
      {
        "type": "text",
        "text": "You are a helpful assistant",
        "cache_control": {"type": "ephemeral"}
      }
    ]
  }'

Reasoning / Thinking

Documentation: See Bifrost Reasoning Reference

Parameter Mapping

  • reasoning.effort → thinking.type (always mapped to "enabled")
  • reasoning.max_tokens → thinking.budget_tokens (token budget for thinking)

Critical Constraints

  • Minimum budget: 1024 tokens required; requests below this fail with error
  • Dynamic budget: -1 is converted to 1024 automatically

Example

// Request
{"reasoning": {"effort": "high", "max_tokens": 2048}}

// Anthropic conversion
{"thinking": {"type": "enabled", "budget_tokens": 2048}}

Message Conversion

Critical Caveats

  • System message extraction: System messages are removed from messages array and placed in separate system field. Multiple system messages become separate text blocks in the system array.
  • Tool message grouping: Consecutive tool messages are merged into single user message with tool_result content blocks.
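A sketch of both transformations on a hypothetical request (payloads illustrative, not taken from Bifrost's source):

// OpenAI-style input: system message plus two consecutive tool messages
{
  "messages": [
    {"role": "system", "content": "You are helpful"},
    {"role": "user", "content": "Weather in Paris and Lyon?"},
    {"role": "assistant", "tool_calls": [
      {"id": "call_1", "type": "function", "function": {"name": "get_weather", "arguments": "{\"city\": \"Paris\"}"}},
      {"id": "call_2", "type": "function", "function": {"name": "get_weather", "arguments": "{\"city\": \"Lyon\"}"}}
    ]},
    {"role": "tool", "tool_call_id": "call_1", "content": "Sunny"},
    {"role": "tool", "tool_call_id": "call_2", "content": "Rainy"}
  ]
}

// Anthropic conversion (sketch): system extracted, tool messages merged into one user message
{
  "system": [{"type": "text", "text": "You are helpful"}],
  "messages": [
    {"role": "user", "content": "Weather in Paris and Lyon?"},
    {"role": "assistant", "content": [
      {"type": "tool_use", "id": "call_1", "name": "get_weather", "input": {"city": "Paris"}},
      {"type": "tool_use", "id": "call_2", "name": "get_weather", "input": {"city": "Lyon"}}
    ]},
    {"role": "user", "content": [
      {"type": "tool_result", "tool_use_id": "call_1", "content": "Sunny"},
      {"type": "tool_result", "tool_use_id": "call_2", "content": "Rainy"}
    ]}
  ]
}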

Image Conversion

  • URL images: {"type": "image_url", "image_url": {...}} → {"type": "image", "source": {"type": "url", ...}}
  • Base64 images: Data URL → {"type": "image", "source": {"type": "base64", "media_type": "image/png", ...}}
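For example, a data URL would be converted roughly as follows (sketch; values illustrative):

// OpenAI-style input
{"type": "image_url", "image_url": {"url": "data:image/png;base64,iVBORw0KG..."}}

// Anthropic conversion (sketch)
{"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": "iVBORw0KG..."}}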

Cache Control Locations

Cache directives supported on: system content blocks, user message content blocks, tool definitions (see Cache Control examples above)

Tool Conversion

Tool definitions are restructured: function.name → name, function.parameters → input_schema; function.strict is dropped. Tool choice mapping: "auto" → auto | "none" → none | "required" → any | specific tool → {"type": "tool", "name": "X"}
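A sketch of the restructuring (schema illustrative):

// OpenAI-style tool definition
{
  "type": "function",
  "function": {
    "name": "get_weather",
    "description": "Get weather for a city",
    "parameters": {"type": "object", "properties": {"city": {"type": "string"}}},
    "strict": true
  }
}

// Anthropic conversion (sketch): strict is silently dropped
{
  "name": "get_weather",
  "description": "Get weather for a city",
  "input_schema": {"type": "object", "properties": {"city": {"type": "string"}}}
}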

Response Conversion

Field Mapping

  • stop_reason → finish_reason: end_turn/stop_sequence → stop, max_tokens → length, tool_use → tool_calls
  • input_tokens → prompt_tokens | output_tokens → completion_tokens
  • Cache tokens: cache_read_input_tokens → prompt_tokens_details.cached_tokens (read vs. creation cannot be distinguished)
  • thinking blocks → reasoning_details with index, type, text, and signature fields
  • Tool call arguments converted from JSON object → JSON string
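A sketch of the stop-reason and usage mapping (values illustrative):

// Anthropic response (trimmed)
{
  "stop_reason": "end_turn",
  "usage": {"input_tokens": 120, "output_tokens": 45, "cache_read_input_tokens": 100}
}

// OpenAI-format conversion (sketch)
{
  "choices": [{"finish_reason": "stop", "message": {...}}],
  "usage": {
    "prompt_tokens": 120,
    "completion_tokens": 45,
    "prompt_tokens_details": {"cached_tokens": 100}
  }
}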

Streaming

Event sequence: message_start → content_block_start → content_block_delta → content_block_stop → message_delta → message_stop
Delta types: text_delta → content | input_json_delta → tool arguments | thinking_delta → reasoning text | signature_delta → reasoning signature
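An abbreviated Anthropic event stream for a plain text reply, which Bifrost translates into OpenAI-style chunks (payloads trimmed):

event: message_start
data: {"type": "message_start", "message": {"id": "msg_...", "role": "assistant", ...}}

event: content_block_start
data: {"type": "content_block_start", "index": 0, "content_block": {"type": "text", "text": ""}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hello"}}

event: content_block_stop
data: {"type": "content_block_stop", "index": 0}

event: message_delta
data: {"type": "message_delta", "delta": {"stop_reason": "end_turn"}, "usage": {"output_tokens": 12}}

event: message_stop
data: {"type": "message_stop"}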

Caveats

Severity | Behavior                                                             | Impact                                      | Code
High     | System messages removed from array, placed in separate system field | Message array structure differs from input  | chat.go:145-167
High     | Consecutive tool messages merged into single user message           | Message count and structure changes         | chat.go:169-216
High     | reasoning.max_tokens must be >= 1024                                | Requests with lower values fail with error  | chat.go:113-115
Medium   | reasoning.max_tokens = -1 converted to 1024                         | Dynamic budgeting not supported             | chat.go:107-111
Medium   | strict: true in tool definitions silently dropped                   | No schema validation enforcement            | chat.go:43-72
Low      | Tool call input (object) serialized to arguments (JSON string)      | -                                           | chat.go:341-350

2. Responses API

The Responses API uses the same underlying /v1/messages endpoint but converts between OpenAI’s Responses format and Anthropic’s Messages format.

Request Parameters

Parameter Mapping

Parameter         | Transformation
max_output_tokens | Renamed to max_tokens
temperature, top_p | Direct pass-through
instructions      | Becomes system message
tools             | Schema restructured (see Chat Completions)
tool_choice       | Type mapped (see Chat Completions)
reasoning         | Mapped to thinking (see Reasoning / Thinking)
user              | Wrapped in metadata.user_id
text              | Converted to output_format
include           | Via extra_params (Anthropic-specific)
stop              | Via extra_params, renamed to stop_sequences
top_k             | Via extra_params (Anthropic-specific)
truncation        | Auto-set to "auto" for computer tools

Extra Parameters

Use extra_params (SDK) or pass directly in request body (Gateway):
curl -X POST http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-3-5-sonnet",
    "input": "Hello, how are you?",
    "top_k": 40
  }'

Cache Control

Cache directives can be added to instructions (system) and input messages to enable prompt caching:
curl -X POST http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-3-5-sonnet",
    "instructions": "You are a helpful assistant. This instruction is cached.",
    "instructions_cache_control": {"type": "ephemeral"},
    "input": [
      {
        "type": "text",
        "text": "Answer this question",
        "cache_control": {"type": "ephemeral"}
      }
    ]
  }'

Input & Instructions

  • Input: String wrapped as user message or array converted to messages
  • Instructions: Becomes system message (same extraction as Chat Completions)
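Both behaviors, sketched (payloads illustrative):

// OpenAI Responses-style request with string input and instructions
{"input": "Hello", "instructions": "You are helpful"}

// Anthropic conversion (sketch)
{
  "system": [{"type": "text", "text": "You are helpful"}],
  "messages": [{"role": "user", "content": "Hello"}]
}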

Tool Support

Supported types: function, computer_use_preview, web_search, mcp
Tool conversions are the same as Chat Completions, with:
  • MCP tools mapped to mcp_servers (server_label → name, server_url → url)
  • Computer tools auto-set with truncation: "auto"
  • Cache control supported on instructions and input blocks (see Cache Control examples)
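A sketch of the MCP mapping (server values illustrative; the exact mcp_servers entry shape is an assumption based on the field mapping above):

// OpenAI Responses-style MCP tool
{"type": "mcp", "server_label": "docs", "server_url": "https://mcp.example.com"}

// Anthropic conversion (sketch): moved to top-level mcp_servers
{"mcp_servers": [{"type": "url", "name": "docs", "url": "https://mcp.example.com"}]}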

Response Conversion

  • stop_reason → status: end_turn/stop_sequence → completed, max_tokens → incomplete
  • input_tokens/output_tokens preserved, with cache tokens → *_tokens_details.cached_tokens
  • Output items: text → message | tool_use → function_call | thinking → reasoning

Streaming

Event sequence: message_start → content_block_start → content_block_delta → content_block_stop → message_delta → message_stop
Special handling:
  • Computer tool arguments accumulated across chunks (emitted on content_block_stop)
  • Synthetic content_part.added events emitted for text/reasoning
  • MCP calls use mcp_call_arguments_delta
  • Item IDs generated as msg_{messageID}_item_{outputIndex}

3. Text Completions (Legacy)

Legacy API using the /v1/complete endpoint. Streaming is not supported.
Request:
  • prompt auto-wrapped as \n\nHuman: {prompt}\n\nAssistant:
  • max_tokens → max_tokens_to_sample
  • temperature, top_p: direct pass-through
  • top_k, stop via extra_params (stop → stop_sequences)
Response:
  • completion → choices[0].text
  • stop_reason → finish_reason
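A sketch of the prompt wrapping (values illustrative; stripping of the anthropic/ provider prefix is an assumption):

// OpenAI-style text completion request
{"model": "anthropic/claude-3-5-sonnet", "prompt": "Say hello", "max_tokens": 50}

// Anthropic conversion (sketch)
{"model": "claude-3-5-sonnet", "prompt": "\n\nHuman: Say hello\n\nAssistant:", "max_tokens_to_sample": 50}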

4. Batch API

Request formats: requests array (CustomID + Params) or input_file_id
Pagination: Cursor-based with after_id, before_id, limit
Endpoints:
  • POST /v1/messages/batches - Create
  • GET /v1/messages/batches - List
  • GET /v1/messages/batches/{batch_id} - Retrieve
  • POST /v1/messages/batches/{batch_id}/cancel - Cancel
Response: JSONL format with {custom_id, result: {type, message}}
Status mapping: in_progress → InProgress, canceling → Cancelling, ended → Ended
Note: RFC3339Nano timestamps converted to Unix; multi-key retry supported
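A minimal creation sketch, assuming the requests-array format described above (body shape illustrative):

curl -X POST http://localhost:8080/v1/messages/batches \
  -H "Content-Type: application/json" \
  -d '{
    "requests": [
      {
        "custom_id": "req-1",
        "params": {
          "model": "anthropic/claude-3-5-sonnet",
          "max_tokens": 256,
          "messages": [{"role": "user", "content": "Hello"}]
        }
      }
    ]
  }'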

5. Files API

Requires beta header: anthropic-beta: files-api-2025-04-14
Upload: multipart/form-data with file (required) and filename (optional)
Field mapping: id | filename | size_bytes → bytes | created_at (Unix) | mime_type → content_type
Endpoints: POST /v1/files, GET /v1/files (cursor pagination), GET /v1/files/{file_id}, DELETE /v1/files/{file_id}, GET /v1/files/{file_id}/content
Note: File purpose is always "batch"; status is always "processed"
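A minimal upload sketch (file name illustrative; assumes the beta header is sent by the client and forwarded):

curl -X POST http://localhost:8080/v1/files \
  -H "anthropic-beta: files-api-2025-04-14" \
  -F "file=@batch_input.jsonl" \
  -F "filename=batch_input.jsonl"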

6. List Models

Request: GET /v1/models?limit={defaultPageSize} (no body)
Field mapping: id (prefixed anthropic/) | display_name → name | created_at (Unix timestamp)
Pagination: Token-based with NextPageToken, FirstID, LastID
Multi-key support: Results aggregated from all keys, filtered by allowed_models if configured
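A minimal listing sketch (response abbreviated; values illustrative):

curl http://localhost:8080/v1/models

// Abbreviated response (sketch)
{"data": [{"id": "anthropic/claude-3-5-sonnet", "name": "Claude 3.5 Sonnet", "created_at": 1718841600}]}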