# Cohere

> Cohere API conversion guide - parameter mapping, message handling, reasoning/thinking, and tool conversion

## Overview

Cohere's API is structured differently from OpenAI's. Bifrost converts between the two formats, including:

* **Parameter renaming** - e.g., `max_completion_tokens` → `max_tokens`, `top_p` → `p`, `stop` → `stop_sequences`
* **Message content conversion** - Both string and content-block formats are handled
* **Tool conversion** - Tool definitions and tool choice mapped to Cohere format
* **Thinking/Reasoning transformation** - `reasoning` parameters mapped to Cohere's `thinking` structure
* **Response format conversion** - JSON schema handling adapted to Cohere's format

### Supported Operations

| Operation            | Non-Streaming | Streaming | Endpoint     |
| -------------------- | ------------- | --------- | ------------ |
| Chat Completions     | ✅             | ✅         | `/v2/chat`   |
| Responses API        | ✅             | ✅         | `/v2/chat`   |
| Embeddings           | ✅             | -         | `/v2/embed`  |
| List Models          | ✅             | -         | `/v1/models` |
| Text Completions     | ❌             | ❌         | -            |
| Image Generation     | ❌             | ❌         | -            |
| Speech (TTS)         | ❌             | ❌         | -            |
| Transcriptions (STT) | ❌             | ❌         | -            |
| Files                | ❌             | ❌         | -            |
| Batch                | ❌             | ❌         | -            |

<Note>
  **Unsupported Operations** (❌): Text Completions, Image Generation, Speech, Transcriptions, Files, and Batch are not supported by the upstream Cohere API. These return `UnsupportedOperationError`.
</Note>

***

# 1. Chat Completions

## Request Parameters

### Parameter Mapping

| Parameter                               | Transformation                                                           |
| --------------------------------------- | ------------------------------------------------------------------------ |
| `max_completion_tokens`                 | Renamed to `max_tokens`                                                  |
| `temperature`, `top_p`                  | `temperature` passed through unchanged; `top_p` renamed to `p`           |
| `stop`                                  | Renamed to `stop_sequences`                                              |
| `frequency_penalty`, `presence_penalty` | Direct pass-through                                                      |
| `response_format`                       | Converted to structured format (see [Response Format](#response-format)) |
| `tools`                                 | Schema structure adapted (see [Tool Conversion](#tool-conversion))       |
| `tool_choice`                           | Type mapped (see [Tool Conversion](#tool-conversion))                    |
| `reasoning`                             | Mapped to `thinking` (see [Reasoning / Thinking](#reasoning--thinking))  |
| `user`                                  | Via `extra_params` (not directly supported in Cohere v2 API)             |
| `top_k`                                 | Via `extra_params` (Cohere-specific)                                     |

### Dropped Parameters

The following parameters are silently ignored: `logit_bias`, `logprobs`, `top_logprobs`, `seed`, `parallel_tool_calls`, `service_tier`
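
The renames and drops listed above can be pictured as a simple key-rewriting pass. The following is a minimal sketch, not Bifrost's actual implementation: `mapChatParams` and the two lookup tables are hypothetical names, and only the parameters named in this section are covered.

```go
package main

import "fmt"

// renamed lists the OpenAI-style keys that change name on the Cohere side.
var renamed = map[string]string{
	"max_completion_tokens": "max_tokens",
	"top_p":                 "p",
	"stop":                  "stop_sequences",
}

// dropped lists the keys that are silently ignored.
var dropped = map[string]bool{
	"logit_bias":          true,
	"logprobs":            true,
	"top_logprobs":        true,
	"seed":                true,
	"parallel_tool_calls": true,
	"service_tier":        true,
}

// mapChatParams is a hypothetical helper: it copies the input map,
// skipping dropped keys and applying the renames. All other keys
// (for example temperature) pass through unchanged.
func mapChatParams(in map[string]any) map[string]any {
	out := make(map[string]any, len(in))
	for key, value := range in {
		if dropped[key] {
			continue
		}
		if newKey, ok := renamed[key]; ok {
			key = newKey
		}
		out[key] = value
	}
	return out
}

func main() {
	out := mapChatParams(map[string]any{
		"max_completion_tokens": 256,
		"top_p":                 0.9,
		"stop":                  []string{"END"},
		"temperature":           0.3,
		"seed":                  42, // dropped
	})
	fmt.Println(out)
}
```

Because dropped parameters produce no error, a request that relies on `seed` or `logprobs` will succeed but behave as if those fields were never sent.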

### Extra Parameters

For Cohere-specific fields, use `extra_params` (SDK) or pass them directly in the request body (Gateway):

<Tabs>
  <Tab title="Gateway">
    ```bash theme={null}
    curl -X POST http://localhost:8080/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "cohere/command-r-plus",
        "messages": [{"role": "user", "content": "Hello"}],
        "top_k": 40,
        "safety_mode": "STRICT",
        "log_probs": true,
        "strict_tool_choice": false
      }'
    ```
  </Tab>

  <Tab title="Go SDK">
    ```go theme={null}
    resp, err := client.ChatCompletionRequest(schemas.NewBifrostContext(ctx, schemas.NoDeadline), &schemas.BifrostChatRequest{
        Provider: schemas.Cohere,
        Model:    "cohere/command-r-plus",
        Input:    messages,
        Params: &schemas.ChatParameters{
            ExtraParams: map[string]interface{}{
                "top_k": 40,
                "safety_mode": "STRICT",
                "log_probs": true,
                "strict_tool_choice": false,
            },
        },
    })
    ```
  </Tab>
</Tabs>

## Reasoning / Thinking

**Documentation**: See [Bifrost Reasoning Reference](/providers/reasoning)

### Parameter Mapping

* `reasoning.effort` → `thinking.type` (mapped to `"enabled"` or `"disabled"`)
* `reasoning.max_tokens` → `thinking.token_budget` (token budget for thinking)

### Critical Constraints

* **Minimum budget**: at least 1 token is required; a budget of 0 is converted to `thinking.type: "disabled"`
* **Dynamic budget**: `-1` is converted to `1` automatically

### Example

```json theme={null}
// Request
{"reasoning": {"effort": "high", "max_tokens": 2048}}

// Cohere conversion
{"thinking": {"type": "enabled", "token_budget": 2048}}
```
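
The budget rules above can be sketched in Go as follows. This is an illustration under stated assumptions rather than the code in `chat.go`: the `Reasoning` and `Thinking` types and the `toThinking` function are hypothetical, and treating only an effort of `"none"` as disabled is an assumption, since this page does not spell out which effort values map to `"disabled"`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Reasoning mirrors the OpenAI-style `reasoning` request field (simplified).
type Reasoning struct {
	Effort    string
	MaxTokens *int
}

// Thinking mirrors the Cohere-style `thinking` field (simplified).
type Thinking struct {
	Type        string `json:"type"`
	TokenBudget *int   `json:"token_budget,omitempty"`
}

// toThinking applies the documented constraints:
// a budget of 0 disables thinking, and -1 (dynamic) becomes 1.
func toThinking(r Reasoning) Thinking {
	// Assumption: only an explicit "none" effort disables thinking.
	if r.Effort == "none" {
		return Thinking{Type: "disabled"}
	}
	if r.MaxTokens == nil {
		return Thinking{Type: "enabled"}
	}
	budget := *r.MaxTokens
	switch budget {
	case 0:
		return Thinking{Type: "disabled"}
	case -1:
		budget = 1
	}
	return Thinking{Type: "enabled", TokenBudget: &budget}
}

func main() {
	budget := 2048
	out, _ := json.Marshal(toThinking(Reasoning{Effort: "high", MaxTokens: &budget}))
	fmt.Println(string(out)) // {"type":"enabled","token_budget":2048}
}
```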

## Message Conversion

### Content Handling

* **String content**: Messages can have simple string content
* **Content blocks**: Messages can have arrays of content blocks (text, images, thinking)
* **Image conversion**: `image_url` blocks with URL are supported
* **Tool calls**: Tool calls on assistant messages are converted to Cohere format
* **Tool messages**: Tool call results are passed with `tool_call_id`

## Tool Conversion

Tool definitions are adapted to Cohere format with the following mappings:

* Function `name` → `name` (unchanged)
* Function `parameters` → `parameters` (flexible JSON format)
* Strict mode (`strict: true`) is silently dropped (not supported)

Tool choice mapping:

* `"none"` → `"NONE"`
* `"auto"` → `"AUTO"`, `"required"` → `"REQUIRED"`
* Specific tool selection → `"REQUIRED"` (Cohere uses function-level selection)
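
As a rough illustration of the tool choice mapping, the sketch below reads the list as `"auto"` → `"AUTO"` and `"required"` → `"REQUIRED"`. The function `mapToolChoice` and its two-argument shape are hypothetical, and falling back to `"AUTO"` for unrecognized modes is an assumption.

```go
package main

import "fmt"

// mapToolChoice is a hypothetical helper. mode is the OpenAI-style string
// choice; functionName is non-empty when a specific tool was selected.
func mapToolChoice(mode, functionName string) string {
	if functionName != "" {
		// Selecting a specific tool forces a tool call.
		return "REQUIRED"
	}
	switch mode {
	case "none":
		return "NONE"
	case "required":
		return "REQUIRED"
	default:
		// "auto"; unknown modes also land here (assumption).
		return "AUTO"
	}
}

func main() {
	fmt.Println(mapToolChoice("none", ""))
	fmt.Println(mapToolChoice("auto", ""))
	fmt.Println(mapToolChoice("", "get_weather"))
}
```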

## Response Format

Supported formats:

* `text` - Plain text response
* `json_object` - Structured JSON response
* `json_schema` - JSON with schema validation (converted to `json_object`)

The schema is passed through in the `response_format.json_schema` field.

## Response Conversion

### Field Mapping

* `finish_reason`: `COMPLETE` / `STOP_SEQUENCE` → `stop`, `MAX_TOKENS` → `length`, `TOOL_CALL` → `tool_calls`
* `input_tokens` → `prompt_tokens` | `output_tokens` → `completion_tokens`
* `cached_tokens` → `prompt_tokens_details.cached_tokens` (if present)
* Tool call arguments are passed through as strings (no conversion needed; Cohere uses string format)
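
The field mapping above amounts to a small lookup plus a struct copy. The sketch below uses simplified, hypothetical types (`cohereUsage`, `openAIUsage`) rather than Bifrost's real schemas, and passing unlisted finish reasons through unchanged is an assumption.

```go
package main

import "fmt"

// cohereUsage is a simplified, hypothetical view of Cohere's usage block.
type cohereUsage struct {
	InputTokens  int
	OutputTokens int
	CachedTokens *int // nil when Cohere does not report cached tokens
}

// openAIUsage is a simplified, hypothetical view of the OpenAI-style usage.
type openAIUsage struct {
	PromptTokens     int
	CompletionTokens int
	CachedTokens     *int // stands in for prompt_tokens_details.cached_tokens
}

// mapFinishReason converts Cohere finish reasons to OpenAI-style values.
func mapFinishReason(reason string) string {
	switch reason {
	case "COMPLETE", "STOP_SEQUENCE":
		return "stop"
	case "MAX_TOKENS":
		return "length"
	case "TOOL_CALL":
		return "tool_calls"
	default:
		// Assumption: values not listed in the mapping pass through as-is.
		return reason
	}
}

// mapUsage renames the token counters and carries cached tokens if present.
func mapUsage(u cohereUsage) openAIUsage {
	return openAIUsage{
		PromptTokens:     u.InputTokens,
		CompletionTokens: u.OutputTokens,
		CachedTokens:     u.CachedTokens,
	}
}

func main() {
	fmt.Println(mapFinishReason("MAX_TOKENS"))
	fmt.Printf("%+v\n", mapUsage(cohereUsage{InputTokens: 12, OutputTokens: 30}))
}
```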

## Streaming

Event sequence: `message-start` → `content-start` → `content-delta` → `content-end` → `message-end`

Delta types:

* `content-delta` with text → message content
* `content-delta` with thinking → reasoning text
* `tool-call-start/delta/end` → tool call events
* `tool-plan-delta` → tool planning output
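
One way to picture the delta routing is a classifier over event types. This is only a sketch: `streamEvent` is a hypothetical, flattened stand-in for Cohere's nested stream payloads, and the category labels returned by `classify` are illustrative names, not Bifrost fields.

```go
package main

import (
	"fmt"
	"strings"
)

// streamEvent is a hypothetical, flattened view of one Cohere stream event.
type streamEvent struct {
	Type     string
	Text     string
	Thinking string
}

// classify reports which part of the outgoing chunk an event feeds.
func classify(ev streamEvent) string {
	switch {
	case ev.Type == "content-delta" && ev.Thinking != "":
		return "reasoning"
	case ev.Type == "content-delta":
		return "content"
	case strings.HasPrefix(ev.Type, "tool-call-"):
		return "tool_call"
	case ev.Type == "tool-plan-delta":
		return "tool_plan"
	default:
		// message-start, content-start, content-end, message-end
		return "lifecycle"
	}
}

func main() {
	events := []streamEvent{
		{Type: "message-start"},
		{Type: "content-delta", Thinking: "Let me check..."},
		{Type: "content-delta", Text: "Hello"},
		{Type: "tool-call-delta"},
		{Type: "message-end"},
	}
	for _, ev := range events {
		fmt.Printf("%-16s -> %s\n", ev.Type, classify(ev))
	}
}
```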

***

## Caveats

<Accordion title="Minimum Thinking Budget">
  **Severity**: Low
  **Behavior**: `reasoning.max_tokens` must be >= 1
  **Impact**: Very low; a budget of 0 disables thinking and `-1` is converted to `1` automatically
  **Code**: `chat.go:104-130`
</Accordion>

<Accordion title="Top P Renamed">
  **Severity**: Low
  **Behavior**: `top_p` parameter renamed to `p`
  **Impact**: Parameter name changes internally
  **Code**: `chat.go:99`
</Accordion>

<Accordion title="Strict Tool Mode Dropped">
  **Severity**: Low
  **Behavior**: `strict: true` in tool definitions silently dropped
  **Impact**: No schema validation enforcement
  **Code**: `chat.go:168-185`
</Accordion>

<Accordion title="Tool Arguments Format">
  **Severity**: Low
  **Behavior**: Tool arguments are already strings, no JSON serialization needed
  **Impact**: Minimal - Cohere v2 API expects string format
  **Code**: `chat.go:70-78`
</Accordion>

***

# 2. Responses API

The Responses API uses the same underlying `/v2/chat` endpoint but converts between OpenAI's Responses format and Cohere's format.

## Request Parameters

### Parameter Mapping

| Parameter                               | Transformation                                                          |
| --------------------------------------- | ----------------------------------------------------------------------- |
| `max_output_tokens`                     | Renamed to `max_tokens`                                                 |
| `temperature`, `top_p`                  | `temperature` passed through unchanged; `top_p` renamed to `p`          |
| `instructions`                          | Becomes system message                                                  |
| `text.format`                           | Converted to `response_format`                                          |
| `tools`                                 | Schema restructured (see [Chat Completions](#1-chat-completions))       |
| `tool_choice`                           | Type mapped (see [Chat Completions](#1-chat-completions))               |
| `reasoning`                             | Mapped to `thinking` (see [Reasoning / Thinking](#reasoning--thinking)) |
| `stop`                                  | Via `extra_params`, renamed to `stop_sequences`                         |
| `top_k`                                 | Via `extra_params` (Cohere-specific)                                    |
| `frequency_penalty`, `presence_penalty` | Via `extra_params`                                                      |

### Extra Parameters

Use `extra_params` (SDK) or pass the fields directly in the request body (Gateway):

<Tabs>
  <Tab title="Gateway">
    ```bash theme={null}
    curl -X POST http://localhost:8080/v1/responses \
      -H "Content-Type: application/json" \
      -d '{
        "model": "cohere/command-r-plus",
        "input": "Hello, how are you?",
        "top_k": 40,
        "stop": [".", "!"]
      }'
    ```
  </Tab>

  <Tab title="Go SDK">
    ```go theme={null}
    resp, err := client.ResponsesRequest(schemas.NewBifrostContext(ctx, schemas.NoDeadline), &schemas.BifrostResponsesRequest{
        Provider: schemas.Cohere,
        Model:    "cohere/command-r-plus",
        Input:    messages,
        Params: &schemas.ResponsesParameters{
            ExtraParams: map[string]interface{}{
                "top_k": 40,
                "stop": []string{".", "!"},
            },
        },
    })
    ```
  </Tab>
</Tabs>

## Input & Instructions

* **Input**: A string becomes a single user message; an array is converted to a list of messages
* **Instructions**: Becomes system message (prepended to messages)
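
The two rules above can be sketched as a single function. The `message` type and `buildMessages` are hypothetical simplifications (real messages can carry content blocks, not only strings), so treat this as an illustration of the ordering rather than the actual conversion code.

```go
package main

import (
	"errors"
	"fmt"
)

// message is a simplified, hypothetical chat message.
type message struct {
	Role    string
	Content string
}

// buildMessages prepends instructions as a system message, then appends
// the input: a string becomes one user message, an array is copied as-is.
func buildMessages(instructions string, input any) ([]message, error) {
	var msgs []message
	if instructions != "" {
		msgs = append(msgs, message{Role: "system", Content: instructions})
	}
	switch v := input.(type) {
	case string:
		msgs = append(msgs, message{Role: "user", Content: v})
	case []message:
		msgs = append(msgs, v...)
	default:
		return nil, errors.New("input must be a string or a message array")
	}
	return msgs, nil
}

func main() {
	msgs, err := buildMessages("Answer briefly.", "Hello, how are you?")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", msgs)
}
```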

## Tool Support

Supported types: `function`

Tool conversion is the same as in [Chat Completions](#1-chat-completions).

## Response Conversion

* `text` → `message` | `tool_use` → `function_call`
* `input_tokens` / `output_tokens` preserved
* Token details with cached tokens support

## Streaming

Event sequence: `message-start` → `content-start` → `content-delta` → `content-end` → `message-end`

Special handling:

* Tool call arguments accumulated across chunks
* Synthetic `output_item.added` events emitted for text/reasoning
* Stable item IDs generated as `msg_{messageID}_item_{outputIndex}`
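
The ID pattern and the argument accumulation can be illustrated with a small sketch. `itemID` follows the `msg_{messageID}_item_{outputIndex}` pattern stated above; `argAccumulator` is a hypothetical helper showing one way to join argument fragments per output index, not the structure Bifrost uses internally.

```go
package main

import (
	"fmt"
	"strings"
)

// itemID builds a stable item ID in the documented pattern.
func itemID(messageID string, outputIndex int) string {
	return fmt.Sprintf("msg_%s_item_%d", messageID, outputIndex)
}

// argAccumulator (hypothetical) collects tool call argument fragments
// that arrive across several stream chunks, keyed by output index.
type argAccumulator struct {
	parts map[int]*strings.Builder
}

func newArgAccumulator() *argAccumulator {
	return &argAccumulator{parts: make(map[int]*strings.Builder)}
}

// add appends one fragment to the buffer for the given output index.
func (a *argAccumulator) add(outputIndex int, fragment string) {
	b, ok := a.parts[outputIndex]
	if !ok {
		b = &strings.Builder{}
		a.parts[outputIndex] = b
	}
	b.WriteString(fragment)
}

// arguments returns everything accumulated so far for the output index.
func (a *argAccumulator) arguments(outputIndex int) string {
	if b, ok := a.parts[outputIndex]; ok {
		return b.String()
	}
	return ""
}

func main() {
	acc := newArgAccumulator()
	acc.add(1, `{"city":`)
	acc.add(1, `"Paris"}`)
	fmt.Println(itemID("abc123", 1), acc.arguments(1))
	// prints: msg_abc123_item_1 {"city":"Paris"}
}
```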

***

# 3. Embeddings

## Request Parameters

### Parameter Mapping

| Parameter               | Transformation                                                 |
| ----------------------- | -------------------------------------------------------------- |
| `input` (text or array) | Converted to `texts` array                                     |
| `dimensions`            | Renamed to `output_dimension`                                  |
| `input_type`            | Via `extra_params` (required, defaults to `"search_document"`) |
| `embedding_types`       | Via `extra_params` (array of embedding types)                  |
| `truncate`              | Via `extra_params` (how to handle long inputs)                 |
| `max_tokens`            | Via `extra_params` (max tokens to embed per input)             |

### Extra Parameters

Use `extra_params` (SDK) or pass the fields directly in the request body (Gateway) for Cohere-specific embedding options:

<Tabs>
  <Tab title="Gateway">
    ```bash theme={null}
    curl -X POST http://localhost:8080/v1/embeddings \
      -H "Content-Type: application/json" \
      -d '{
        "model": "cohere/embed-english-v3.0",
        "input": ["text to embed"],
        "input_type": "search_query",
        "embedding_types": ["float"],
        "truncate": "START"
      }'
    ```
  </Tab>

  <Tab title="Go SDK">
    ```go theme={null}
    resp, err := client.EmbeddingRequest(schemas.NewBifrostContext(ctx, schemas.NoDeadline), &schemas.BifrostEmbeddingRequest{
        Provider: schemas.Cohere,
        Model:    "cohere/embed-english-v3.0",
        Input: &schemas.EmbeddingInput{
            Texts: []string{"text to embed"},
        },
        Params: &schemas.EmbeddingParameters{
            Dimensions: schemas.Ptr(1024),
            ExtraParams: map[string]interface{}{
                "input_type": "search_query",
                "embedding_types": []string{"float"},
                "truncate": "START",
            },
        },
    })
    ```
  </Tab>
</Tabs>

### Critical Notes

* **Input Type Required**: Cohere v3+ models require the `input_type` parameter (defaults to `"search_document"` when omitted)
* **Embedding Types**: Specify which embedding types to return (e.g., `"float"`, `"int8"`)
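
To make the request-side rules concrete, here is a minimal sketch of how an OpenAI-style embedding request might be reshaped. The `embedRequest` type and `buildEmbedRequest` function are hypothetical and cover only `texts`, `output_dimension`, and `input_type`; the JSON field names follow the mapping table in this section.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// embedRequest is a simplified, hypothetical Cohere embed request body.
type embedRequest struct {
	Texts           []string `json:"texts"`
	InputType       string   `json:"input_type"`
	OutputDimension *int     `json:"output_dimension,omitempty"`
}

// buildEmbedRequest normalizes input to a texts array, renames dimensions,
// and falls back to "search_document" when input_type is not supplied.
func buildEmbedRequest(input any, dimensions *int, extra map[string]any) (embedRequest, error) {
	req := embedRequest{InputType: "search_document", OutputDimension: dimensions}
	switch v := input.(type) {
	case string:
		req.Texts = []string{v}
	case []string:
		req.Texts = v
	default:
		return embedRequest{}, errors.New("input must be a string or []string")
	}
	if t, ok := extra["input_type"].(string); ok && t != "" {
		req.InputType = t
	}
	return req, nil
}

func main() {
	dims := 1024
	req, err := buildEmbedRequest("text to embed", &dims, map[string]any{"input_type": "search_query"})
	if err != nil {
		panic(err)
	}
	out, _ := json.Marshal(req)
	fmt.Println(string(out))
}
```

Note that the default suits indexing workloads; for query-side embeddings, setting `input_type` to `"search_query"` explicitly, as in the examples above, avoids mixing the two.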

## Response Conversion

* `embeddings.float` → `data[].embedding`
* `meta.tokens` → usage information
* Multiple embedding types handled

***

# 4. List Models

**Request**: GET `/v1/models?page_size={defaultPageSize}`

**Field mapping**: Model data converted to standard format

**Pagination**: Cursor-based with `next_page_token`

**Note**: `endpoint` and `default_only` filters available via `extra_params`
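
Cursor-based pagination of this kind is usually consumed with a loop that follows `next_page_token` until it is empty. The sketch below shows that loop against an in-memory stand-in; `modelPage`, `listAllModels`, and the `fetch` callback are hypothetical names, and the model IDs are placeholders.

```go
package main

import "fmt"

// modelPage is a simplified, hypothetical page of list-models results.
type modelPage struct {
	Models        []string
	NextPageToken string
}

// listAllModels follows next_page_token until no further page is reported.
// fetch is a caller-supplied function that loads one page for a token.
func listAllModels(fetch func(pageToken string) (modelPage, error)) ([]string, error) {
	var all []string
	token := ""
	for {
		page, err := fetch(token)
		if err != nil {
			return nil, err
		}
		all = append(all, page.Models...)
		if page.NextPageToken == "" {
			return all, nil
		}
		token = page.NextPageToken
	}
}

func main() {
	// In-memory stand-in for the remote endpoint.
	pages := map[string]modelPage{
		"":   {Models: []string{"model-a", "model-b"}, NextPageToken: "p2"},
		"p2": {Models: []string{"model-c"}},
	}
	models, err := listAllModels(func(token string) (modelPage, error) {
		return pages[token], nil
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(models) // [model-a model-b model-c]
}
```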
