
Overview

Cohere has a different API structure from OpenAI’s format. Bifrost performs conversions including:
  • Parameter renaming - e.g., max_completion_tokens → max_tokens, top_p → p, stop → stop_sequences
  • Message content conversion - String and content block formats handled
  • Tool conversion - Tool definitions and tool choice mapped to Cohere format
  • Thinking/Reasoning transformation - reasoning parameters mapped to Cohere’s thinking structure
  • Response format conversion - JSON schema handling adapted to Cohere’s format

Supported Operations

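| Operation | Non-Streaming | Streaming | Endpoint |
|---|---|---|---|
| Chat Completions | ✅ | ✅ | /v2/chat |
| Responses API | ✅ | ✅ | /v2/chat |
| Embeddings | ✅ | - | /v2/embed |
| List Models | ✅ | - | /v1/models |
| Text Completions | ❌ | ❌ | - |
| Speech (TTS) | ❌ | ❌ | - |
| Transcriptions (STT) | ❌ | ❌ | - |
| Files | ❌ | ❌ | - |
| Batch | ❌ | ❌ | - |
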
Unsupported Operations (❌): Text Completions, Speech, Transcriptions, Files, and Batch are not supported by the upstream Cohere API. These return UnsupportedOperationError.

1. Chat Completions

Request Parameters

Parameter Mapping

| Parameter | Transformation |
|---|---|
| max_completion_tokens | Renamed to max_tokens |
| temperature, top_p | temperature is passed through unchanged; top_p is renamed to p |
| stop | Renamed to stop_sequences |
| frequency_penalty, presence_penalty | Direct pass-through |
| response_format | Converted to structured format (see Response Format) |
| tools | Schema structure adapted (see Tool Conversion) |
| tool_choice | Type mapped (see Tool Conversion) |
| reasoning | Mapped to thinking (see Reasoning / Thinking) |
| user | Via extra_params (not directly supported in Cohere v2 API) |
| top_k | Via extra_params (Cohere-specific) |

Dropped Parameters

The following parameters are silently ignored: logit_bias, logprobs, top_logprobs, seed, parallel_tool_calls, service_tier
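
The renames and drops above can be sketched as a small lookup-driven function. This is only an illustration of the documented mapping, not Bifrost's actual code: convert_chat_params and the three constant names are hypothetical, and parameters that need structural conversion (tools, tool_choice, reasoning, response_format) are deliberately left out.

```python
# Hypothetical sketch of the documented parameter mapping (not Bifrost's code).
RENAMES = {
    "max_completion_tokens": "max_tokens",
    "top_p": "p",
    "stop": "stop_sequences",
}
PASS_THROUGH = {"temperature", "frequency_penalty", "presence_penalty"}
DROPPED = {
    "logit_bias", "logprobs", "top_logprobs",
    "seed", "parallel_tool_calls", "service_tier",
}


def convert_chat_params(openai_params: dict) -> dict:
    """Rename, pass through, or drop simple scalar parameters."""
    cohere_params = {}
    for name, value in openai_params.items():
        if name in DROPPED:
            continue  # silently ignored, as the list above describes
        if name in RENAMES:
            cohere_params[RENAMES[name]] = value
        elif name in PASS_THROUGH:
            cohere_params[name] = value
        # Anything else is outside the scope of this sketch.
    return cohere_params
```

For example, a request carrying max_completion_tokens, top_p and seed would come out with max_tokens and p set and no seed field at all.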

Extra Parameters

Use extra_params (SDK) or pass directly in request body (Gateway) for Cohere-specific fields:
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cohere/command-r-plus",
    "messages": [{"role": "user", "content": "Hello"}],
    "top_k": 40,
    "safety_mode": "STRICT",
    "log_probs": true,
    "strict_tool_choice": false
  }'

Reasoning / Thinking

Documentation: See Bifrost Reasoning Reference

Parameter Mapping

  • reasoning.effort → thinking.type (mapped to "enabled" or "disabled")
  • reasoning.max_tokens → thinking.token_budget (token budget for thinking)

Critical Constraints

  • Minimum budget: thinking.token_budget must be at least 1; a request with reasoning.max_tokens set to 0 is converted to thinking.type "disabled"
  • Dynamic budget: a reasoning.max_tokens of -1 is converted to 1 automatically

Example

// Request
{"reasoning": {"effort": "high", "max_tokens": 2048}}

// Cohere conversion
{"thinking": {"type": "enabled", "token_budget": 2048}}
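
The mapping and budget constraints in this section can be sketched as follows. The function name convert_reasoning is hypothetical, and the rule that an effort of "none" disables thinking is an assumption; the source only states that effort maps to "enabled" or "disabled".

```python
def convert_reasoning(reasoning):
    """Sketch: map an OpenAI-style reasoning object to a Cohere thinking object."""
    if reasoning is None:
        return None
    effort = reasoning.get("effort")
    budget = reasoning.get("max_tokens")
    # Assumption: effort "none" disables thinking; any other value enables it.
    # A budget of 0 is converted to disabled, per the constraints above.
    if effort == "none" or budget == 0:
        return {"type": "disabled"}
    thinking = {"type": "enabled"}
    if budget is not None:
        # -1 (dynamic budget) is converted to the minimum of 1 token.
        thinking["token_budget"] = 1 if budget == -1 else budget
    return thinking
```

Under these assumptions, the request from the example above yields the same thinking object shown in the Cohere conversion.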

Message Conversion

Content Handling

  • String content: Messages can have simple string content
  • Content blocks: Messages can have arrays of content blocks (text, images, thinking)
  • Image conversion: image_url blocks with URL are supported
  • Tool calls: Tool calls on assistant messages are converted to Cohere format
  • Tool messages: Tool call results are passed with tool_call_id

Tool Conversion

Tool definitions are adapted to Cohere format with the following mappings:
  • Function name → name (unchanged)
  • Function parameters → parameters (flexible JSON format)
  • Strict mode (strict: true) is silently dropped (not supported)
Tool choice mapping:
  • "none" → "NONE"
  • "auto" → "AUTO"; "required" → "REQUIRED"
  • Specific tool selection → "REQUIRED" (Cohere uses function-level selection)
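
A minimal sketch of the tool mappings above, assuming OpenAI-style tool definitions nested under a function key. The helper names are hypothetical, and the pairing of "auto" with "AUTO" and "required" with "REQUIRED" is an assumption about how the two values map.

```python
def convert_tool(tool: dict) -> dict:
    """Sketch: copy name, description and parameters; drop strict."""
    fn = tool["function"]
    converted = {"name": fn["name"], "parameters": fn.get("parameters", {})}
    if "description" in fn:
        converted["description"] = fn["description"]
    # strict is intentionally not copied: it is silently dropped.
    return {"type": "function", "function": converted}


def convert_tool_choice(choice):
    """Sketch: map an OpenAI tool_choice value to a Cohere tool choice string."""
    if isinstance(choice, str):
        # Assumption: "auto" pairs with "AUTO" and "required" with "REQUIRED".
        return {"none": "NONE", "auto": "AUTO", "required": "REQUIRED"}[choice]
    # A specific tool selection (an object) maps to "REQUIRED".
    return "REQUIRED"
```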

Response Format

Supported formats:
  • text - Plain text response
  • json_object - Structured JSON response
  • json_schema - JSON with schema validation (converted to json_object)
Schema is passed through response_format.json_schema field.
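
As a rough illustration of the json_schema handling, the sketch below rewrites the format type to json_object and carries the schema along in a json_schema field. The function name is hypothetical, and the assumption that the incoming schema sits under json_schema.schema (the OpenAI wrapper shape) is not confirmed by this page.

```python
def convert_response_format(response_format: dict) -> dict:
    """Sketch: adapt an OpenAI-style response_format for Cohere."""
    kind = response_format.get("type")
    if kind == "json_schema":
        # Assumption: the schema itself is nested under json_schema.schema.
        schema = response_format["json_schema"]["schema"]
        return {"type": "json_object", "json_schema": schema}
    # text and json_object keep their type unchanged.
    return {"type": kind}
```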

Response Conversion

Field Mapping

  • finish_reason: COMPLETE / STOP_SEQUENCE → stop, MAX_TOKENS → length, TOOL_CALL → tool_calls
  • Usage: input_tokens → prompt_tokens, output_tokens → completion_tokens
  • cached_tokens → prompt_tokens_details.cached_tokens (if present)
  • Tool call arguments are passed through unchanged: Cohere already returns them as a string, so no conversion is needed
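
The field mapping above amounts to a lookup table plus a small usage rewrite. The sketch below uses hypothetical helper names and returns None for finish reasons the list does not cover; how Bifrost treats unlisted reasons is not stated here.

```python
FINISH_REASONS = {
    "COMPLETE": "stop",
    "STOP_SEQUENCE": "stop",
    "MAX_TOKENS": "length",
    "TOOL_CALL": "tool_calls",
}


def convert_finish_reason(cohere_reason: str):
    """Sketch: unlisted reasons return None in this illustration."""
    return FINISH_REASONS.get(cohere_reason)


def convert_usage(cohere_tokens: dict) -> dict:
    """Sketch: rename token counters and nest cached_tokens when present."""
    usage = {
        "prompt_tokens": cohere_tokens.get("input_tokens", 0),
        "completion_tokens": cohere_tokens.get("output_tokens", 0),
    }
    if "cached_tokens" in cohere_tokens:
        usage["prompt_tokens_details"] = {
            "cached_tokens": cohere_tokens["cached_tokens"]
        }
    return usage
```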

Streaming

Event sequence: message-start → content-start → content-delta → content-end → message-end

Delta types:
  • content-delta with text → message content
  • content-delta with thinking → reasoning text
  • tool-call-start/delta/end → tool call events
  • tool-plan-delta → tool planning output
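
One way to picture the delta handling is a dispatcher keyed on the event type. The event shape used below (payload nested under delta.message, with content.text, content.thinking and tool_plan fields) is an assumption made for illustration, and classify_event is a hypothetical name.

```python
def classify_event(event: dict):
    """Sketch: return a (kind, payload) pair for one Cohere stream event."""
    kind = event.get("type")
    # Assumed shape: the payload lives under event["delta"]["message"].
    message = event.get("delta", {}).get("message", {})
    if kind == "content-delta":
        content = message.get("content", {})
        if "thinking" in content:
            return ("reasoning", content["thinking"])
        return ("content", content.get("text", ""))
    if kind in ("tool-call-start", "tool-call-delta", "tool-call-end"):
        return ("tool_call", kind)
    if kind == "tool-plan-delta":
        return ("tool_plan", message.get("tool_plan", ""))
    # message-start, content-start, content-end, message-end and others.
    return ("other", kind)
```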

Caveats

| Severity | Behavior | Impact | Code |
|---|---|---|---|
| Low | reasoning.max_tokens must be >= 1 | Very low; conversion happens automatically | chat.go:104-130 |
| Low | top_p parameter renamed to p | Parameter name changes internally | chat.go:99 |
| Low | strict: true in tool definitions silently dropped | No schema validation enforcement | chat.go:168-185 |
| Low | Tool arguments are already strings, no JSON serialization needed | Minimal; Cohere v2 API expects string format | chat.go:70-78 |

2. Responses API

The Responses API uses the same underlying /v2/chat endpoint but converts between OpenAI’s Responses format and Cohere’s format.

Request Parameters

Parameter Mapping

| Parameter | Transformation |
|---|---|
| max_output_tokens | Renamed to max_tokens |
| temperature, top_p | temperature is passed through unchanged; top_p is renamed to p |
| instructions | Becomes system message |
| text.format | Converted to response_format |
| tools | Schema restructured (see Chat Completions) |
| tool_choice | Type mapped (see Chat Completions) |
| reasoning | Mapped to thinking (see Reasoning / Thinking) |
| stop | Via extra_params, renamed to stop_sequences |
| top_k | Via extra_params (Cohere-specific) |
| frequency_penalty, presence_penalty | Via extra_params |

Extra Parameters

Use extra_params (SDK) or pass directly in request body (Gateway):
curl -X POST http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cohere/command-r-plus",
    "input": "Hello, how are you?",
    "top_k": 40,
    "stop": [".", "!"]
  }'

Input & Instructions

  • Input: String converted to user message or array converted to messages
  • Instructions: Becomes system message (prepended to messages)
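
The two rules above can be sketched as a single message-building step. build_messages is a hypothetical helper, and the sketch assumes that array input already consists of role/content message objects.

```python
def build_messages(input_value, instructions=None):
    """Sketch: turn Responses-style input and instructions into chat messages."""
    messages = []
    if instructions:
        # Instructions become a system message placed before everything else.
        messages.append({"role": "system", "content": instructions})
    if isinstance(input_value, str):
        messages.append({"role": "user", "content": input_value})
    else:
        # Assumption: array items are already role/content message objects.
        messages.extend(input_value)
    return messages
```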

Tool Support

Supported types: function

Tool conversions are the same as for Chat Completions.

Response Conversion

  • text → message, tool_use → function_call
  • input_tokens / output_tokens preserved
  • Token details with cached tokens support

Streaming

Event sequence: message-start → content-start → content-delta → content-end → message-end

Special handling:
  • Tool call arguments accumulated across chunks
  • Synthetic output_item.added events emitted for text/reasoning
  • Stable item IDs generated as msg_{messageID}_item_{outputIndex}
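
The ID pattern and the argument accumulation can be illustrated with a few lines. Both item_id and ToolCallAccumulator are hypothetical names, and keying the accumulator by output index is an assumption.

```python
def item_id(message_id: str, output_index: int) -> str:
    """Build an item ID following the msg_{messageID}_item_{outputIndex} pattern."""
    return f"msg_{message_id}_item_{output_index}"


class ToolCallAccumulator:
    """Sketch: collect tool call argument fragments per output index."""

    def __init__(self):
        self._parts = {}

    def add(self, output_index: int, fragment: str) -> None:
        self._parts.setdefault(output_index, []).append(fragment)

    def arguments(self, output_index: int) -> str:
        # Fragments are concatenated in arrival order into one JSON string.
        return "".join(self._parts.get(output_index, []))
```

Because the ID depends only on the message ID and the output index, every chunk belonging to the same output item resolves to the same identifier.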

3. Embeddings

Request Parameters

Parameter Mapping

| Parameter | Transformation |
|---|---|
| input (text or array) | Converted to texts array |
| dimensions | Renamed to output_dimension |
| input_type | Via extra_params (required, defaults to "search_document") |
| embedding_types | Via extra_params (array of embedding types) |
| truncate | Via extra_params (how to handle long inputs) |
| max_tokens | Via extra_params (max tokens to embed per input) |

Extra Parameters

Use extra_params for Cohere-specific embedding options:
curl -X POST http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cohere/embed-english-v3.0",
    "input": ["text to embed"],
    "input_type": "search_query",
    "embedding_types": ["float"],
    "truncate": "START"
  }'

Critical Notes

  • Input Type Required: Cohere v3+ models require input_type parameter (defaults to "search_document")
  • Embedding Types: Specify which embedding types to return (e.g., "float", "int8")
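
Putting the embedding parameter rules together, a request conversion might look like the sketch below. convert_embedding_request is a hypothetical name, and model handling is omitted because this page does not describe it.

```python
def convert_embedding_request(request: dict) -> dict:
    """Sketch: adapt an OpenAI-style embeddings request body for Cohere."""
    raw_input = request["input"]
    cohere = {
        # A single string becomes a one-element texts array.
        "texts": [raw_input] if isinstance(raw_input, str) else list(raw_input),
        # input_type falls back to "search_document" when not supplied.
        "input_type": request.get("input_type", "search_document"),
    }
    if "dimensions" in request:
        cohere["output_dimension"] = request["dimensions"]
    for key in ("embedding_types", "truncate", "max_tokens"):
        if key in request:
            cohere[key] = request[key]
    return cohere
```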

Response Conversion

  • embeddings.float → data[].embedding
  • meta.tokens → usage information
  • Multiple embedding types handled
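
A rough sketch of the float-embedding path is shown below. The function name is hypothetical, and the assumption that meta.tokens exposes an input_tokens counter that maps to prompt_tokens is not stated on this page.

```python
def convert_embedding_response(cohere_response: dict) -> dict:
    """Sketch: reshape embeddings.float into an OpenAI-style data array."""
    vectors = cohere_response.get("embeddings", {}).get("float", [])
    data = [
        {"object": "embedding", "index": index, "embedding": vector}
        for index, vector in enumerate(vectors)
    ]
    result = {"object": "list", "data": data}
    # Assumption: meta.tokens carries input_tokens, mapped here to prompt_tokens.
    tokens = cohere_response.get("meta", {}).get("tokens", {})
    if "input_tokens" in tokens:
        result["usage"] = {"prompt_tokens": tokens["input_tokens"]}
    return result
```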

4. List Models

  • Request: GET /v1/models?page_size={defaultPageSize}
  • Field mapping: Model data converted to standard format
  • Pagination: Cursor-based with next_page_token
  • Note: endpoint and default_only filters available via extra_params
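
Cursor-based pagination of this kind is usually consumed with a loop that follows next_page_token until it is absent. The sketch below takes the page fetcher as a parameter so it stays independent of any HTTP client; list_all_models, fetch_page and the models key in each page are assumptions for illustration.

```python
def list_all_models(fetch_page):
    """Sketch: follow next_page_token until the last page is reached.

    fetch_page is a caller-supplied function that takes a page token
    (None for the first page) and returns one decoded page as a dict.
    """
    models = []
    token = None
    while True:
        page = fetch_page(token)
        # Assumption: each page lists its entries under a "models" key.
        models.extend(page.get("models", []))
        token = page.get("next_page_token")
        if not token:
            return models
```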