Reasoning - Bifrost

Overview

Reasoning (also called “thinking” in some providers) allows AI models to show their step-by-step thought process before providing a final answer. This feature is available across multiple providers with different implementations.

Bifrost normalizes all provider-specific reasoning formats to a consistent OpenAI-compatible structure using reasoning in requests and reasoning_details in responses.

Provider Support Matrix

Provider	Request Field	Response Field	Min Budget	Effort Levels	Streaming
OpenAI	`reasoning`	`reasoning_details`	None	`minimal`, `low`, `medium`, `high`	✅
Anthropic	`thinking`	Content blocks	1024 tokens	`enabled` only	✅
Bedrock (Anthropic)	`thinking`	Content blocks	1024 tokens	`enabled` only	✅
Gemini 2.5+	`thinking_config`	`thought` parts	1024	Budget-only	✅
Gemini 3.0+	`thinking_config`	`thought` parts	1024	`minimal`, `low`, `medium`, `high` + Budget	✅

Request Configuration

Chat Completions API

JSON
Go SDK

{
  "model": "provider/model-name",
  "messages": [...],
  "reasoning": {
    "effort": "high",
    "max_tokens": 4096
  }
}

package main

import (
	"github.com/maximhq/bifrost"
	"github.com/maximhq/bifrost/core/schemas"
)

chatReq := &schemas.BifrostChatRequest{
	Provider: schemas.OpenAI,
	Model:    "gpt-4o",
	Input: []schemas.ChatMessage{
		{
			Role: schemas.ChatMessageRoleUser,
			Content: &schemas.ChatMessageContent{
				ContentStr: schemas.Ptr("Explain quantum computing"),
			},
		},
	},
	Params: &schemas.ChatParameters{
		MaxCompletionTokens: schemas.Ptr(4096),
		Reasoning: &schemas.ChatReasoning{
			Effort:    schemas.Ptr("high"),
			MaxTokens: schemas.Ptr(4096),
		},
	},
}

Responses API

JSON
Go SDK

{
  "model": "provider/model-name",
  "input": [...],
  "reasoning": {
    "effort": "high",
    "max_tokens": 4096,
    "summary": "detailed"
  }
}

package main

import (
	"github.com/maximhq/bifrost/core/schemas"
)

responsesReq := &schemas.BifrostResponsesRequest{
	Provider: schemas.Anthropic,
	Model:    "claude-3-5-sonnet-20241022",
	Input: []schemas.ResponsesMessage{
		{
			Role: schemas.Ptr(schemas.ResponsesInputMessageRoleUser),
			Content: &schemas.ResponsesMessageContent{
				ContentStr: schemas.Ptr("Explain quantum computing"),
			},
		},
	},
	Params: &schemas.ResponsesParameters{
		MaxOutputTokens: schemas.Ptr(4096),
		Reasoning: &schemas.ResponsesParametersReasoning{
			Effort:    schemas.Ptr("high"),
			MaxTokens: schemas.Ptr(4096),
			Summary:   schemas.Ptr("detailed"),
		},
	},
}

Responses API supports both effort + max_tokens (like Chat Completions) and adds the optional summary parameter for output summarization.

Parameter Reference

Chat Completions API Parameters

Parameter	Type	Description
`effort`	`string`	Reasoning intensity level
`max_tokens`	`int`	Maximum tokens for reasoning (budget)

Responses API Parameters

Parameter	Type	Description
`effort`	`string`	Reasoning intensity level
`max_tokens`	`int`	Maximum tokens for reasoning (budget)
`summary`	`string`	Summary level: `brief`, `detailed`, or `json`

Responses API accepts the same effort and max_tokens parameters as Chat Completions, but adds an optional summary parameter for reasoning output summarization.

Provider-Specific Conversions

OpenAI

OpenAI uses effort-based reasoning only. Bifrost applies priority logic:

If reasoning.effort is provided → use it directly
Else if reasoning.max_tokens is provided → estimate effort from it
The max_tokens field is cleared before sending to OpenAI

Conversion Examples:

Effort (JSON)
Effort (Go)
Max Tokens (JSON)
Max Tokens (Go)

// Bifrost Request (with effort)
{
  "reasoning": {
    "effort": "high"
  }
}

// OpenAI Request Sent
{
  "reasoning": {
    "effort": "high"
  }
}

// Bifrost request with effort (native field)
chatReq := &schemas.BifrostChatRequest{
	Provider: schemas.OpenAI,
	Model:    "gpt-4o",
	Input:    messages,
	Params: &schemas.ChatParameters{
		MaxCompletionTokens: schemas.Ptr(4096),
		Reasoning: &schemas.ChatReasoning{
			Effort: schemas.Ptr("high"),
		},
	},
}

// OpenAI receives effort directly, max_tokens is cleared

// Bifrost Request (with max_tokens only)
{
  "max_completion_tokens": 4096,
  "reasoning": {
    "max_tokens": 3000
  }
}

// Estimation: ratio = 3000/4096 ≈ 0.73 → "high"
// OpenAI Request Sent
{
  "reasoning": {
    "effort": "high"
  }
}

// Bifrost request with max_tokens only
chatReq := &schemas.BifrostChatRequest{
	Provider: schemas.OpenAI,
	Model:    "gpt-4o",
	Input:    messages,
	Params: &schemas.ChatParameters{
		MaxCompletionTokens: schemas.Ptr(4096),
		Reasoning: &schemas.ChatReasoning{
			MaxTokens: schemas.Ptr(3000),
		},
	},
}

// Bifrost estimates effort from max_tokens
// ratio = 3000/4096 ≈ 0.73 → effort = "high"
// OpenAI receives effort, max_tokens cleared

Supported Effort Levels: minimal, low, medium, high

When minimal is encountered, it’s converted to low for non-OpenAI providers. OpenAI receives only: low, medium, high.

Anthropic

Anthropic uses a thinking parameter with different structure.

Request Conversion (JSON)
Request Conversion (Go)
Response Conversion (JSON)
Response Conversion (Go)

// Bifrost Request
{
  "reasoning": {
    "effort": "high",
    "max_tokens": 4096
  }
}

// Anthropic Request
{
  "thinking": {
    "type": "enabled",
    "budget_tokens": 4096
  }
}

// Using Bifrost Go SDK
chatReq := &schemas.BifrostChatRequest{
  Provider: schemas.Anthropic,
  Model:    "claude-3-5-sonnet-20241022",
  Input:    messages,
  Params: &schemas.ChatParameters{
    MaxCompletionTokens: schemas.Ptr(4096),
    Reasoning: &schemas.ChatReasoning{
      MaxTokens: schemas.Ptr(4096), // Anthropic native field
    },
  },
}

// Bifrost converts to Anthropic format:
// {
//   "thinking": {
//     "type": "enabled",
//     "budget_tokens": 4096
//   }
// }

// Anthropic Response (content blocks)
{
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me analyze this step by step...",
      "signature": "EqoBCkgIAR..."
    },
    {
      "type": "text",
      "text": "The answer is 42."
    }
  ]
}

// Bifrost Response
{
  "choices": [{
    "message": {
      "content": "The answer is 42.",
      "reasoning": "Let me analyze this step by step...",
      "reasoning_details": [{
        "index": 0,
        "type": "text",
        "text": "Let me analyze this step by step...",
        "signature": "EqoBCkgIAR..."
      }]
    }
  }]
}

// After calling Bifrost Chat Completions with reasoning
resp, err := client.ChatCompletionRequest(ctx, chatReq)
if err != nil {
  log.Fatal(err)
}

// Extract reasoning from response
choice := resp.Choices[0]
message := choice.Message

// Access combined reasoning text
reasoningText := message.Reasoning

// Access detailed reasoning blocks
for i, details := range message.ReasoningDetails {
  fmt.Printf("Block %d: %s\n", i, details.Text)
  if details.Signature != "" {
    fmt.Printf("  Signature: %s\n", details.Signature)
  }
}

Conversion Rules:

Bifrost	Anthropic	Notes
`reasoning.effort`	`thinking.type`	Always mapped to `"enabled"`
`reasoning.max_tokens`	`thinking.budget_tokens`	Token budget for reasoning

Critical Constraint: Anthropic requires reasoning.max_tokens >= 1024. Requests with lower values will fail with an error.

Dynamic Budget Handling:

Input Value	Converted To
`-1` (dynamic)	`1024` (minimum default)
`< 1024`	Error
`>= 1024`	Pass-through

Code Reference: core/providers/anthropic/chat.go:104-134

Bedrock (Anthropic Models)

Bedrock uses the same structure as Anthropic for Claude models.

Request (JSON)
Request (Go)

// Bifrost Request
{
  "reasoning": {
    "effort": "high",
    "max_tokens": 4096
  }
}

// Bedrock Request (for Anthropic/Claude models)
{
  "additionalModelRequestFields": {
    "reasoning_config": {
      "type": "enabled",
      "budget_tokens": 4096
    }
  }
}

// Using Bifrost Go SDK with Bedrock provider
chatReq := &schemas.BifrostChatRequest{
  Provider: schemas.Bedrock,
  Model:    "us.anthropic.claude-3-5-sonnet-20241022-v2:0",
  Input:    messages,
  Params: &schemas.ChatParameters{
    MaxCompletionTokens: schemas.Ptr(4096),
    Reasoning: &schemas.ChatReasoning{
      MaxTokens: schemas.Ptr(4096), // Bedrock Anthropic native field
    },
  },
}

// Bifrost converts to Bedrock format with reasoning_config

The same 1024 minimum token budget constraint applies to Bedrock Anthropic models. Attempts to set max_tokens below 1024 will result in an error.

Code Reference: core/providers/bedrock/utils.go:34-47

Bedrock (Nova Models)

Bedrock Nova models use an effort-based approach similar to OpenAI.

Request Conversion (JSON)
Request Conversion (Go)
Effort Levels

// Bifrost Request
{
  "reasoning": {
    "effort": "high",
    "max_tokens": 4096
  }
}

// Bedrock Request (for Nova models)
{
  "additionalModelRequestFields": {
    "reasoningConfig": {
      "type": "enabled",
      "maxReasoningEffort": "high"
    }
  }
}

// Using Bifrost Go SDK with Bedrock Nova
chatReq := &schemas.BifrostChatRequest{
  Provider: schemas.Bedrock,
  Model:    "us.amazon.nova-pro-v1:0",
  Input:    messages,
  Params: &schemas.ChatParameters{
    MaxCompletionTokens: schemas.Ptr(4096),
    Reasoning: &schemas.ChatReasoning{
      Effort: schemas.Ptr("high"), // Nova native field
    },
  },
}

// Bifrost converts to Bedrock Nova format:
// reasoningConfig: {
//   type: "enabled",
//   maxReasoningEffort: "high"
// }

Bifrost Effort	Nova Effort	Configuration
`minimal`, `low`	`"low"`	Normal parameters allowed
`medium`	`"medium"`	Normal parameters allowed
`high`	`"high"`	Clears `maxTokens`, `temperature`, `topP`

Key Differences from Anthropic:

No minimum token budget constraint
Uses effort levels instead of token budgets
High effort mode automatically clears conflicting parameters

Code Reference: core/providers/bedrock/utils.go:48-89

Gemini

Gemini uses thinking_config with dual support for both token budgets and effort levels, depending on the model version.

Model Version Support

Gemini Version	`thinkingBudget`	`thinkingLevel`	Notes
2.5+	✅	❌	Budget-only models
3.0+	✅	✅	Support both budget and level

Important: Only ONE parameter (thinkingBudget or thinkingLevel) should be sent to Gemini at a time. When both reasoning.max_tokens and reasoning.effort are provided in a Bifrost request, max_tokens takes priority and is converted to thinkingBudget.

Priority Rules

When both reasoning.max_tokens and reasoning.effort are present:

1. If max_tokens is provided → USE thinkingBudget (ignores effort)
2. Else if effort is provided:
   - Gemini 3.0+ → USE thinkingLevel (more native)
   - Gemini 2.5 → CONVERT effort to thinkingBudget
3. Else → disable reasoning

Budget Priority (JSON)
Effort to Level (Gemini 3.0+)
Effort to Budget (Gemini 2.5)

// Bifrost Request - Both fields provided
{
  "model": "gemini-3.0-flash",
  "reasoning": {
    "effort": "high",        // Ignored
    "max_tokens": 4096      // Takes priority
  }
}

// Gemini 3.0+ Request - Only budget sent
{
  "generation_config": {
    "thinking_config": {
      "include_thoughts": true,
      "thinking_budget": 4096
    }
  }
}

// Bifrost Request - Effort only
{
  "model": "gemini-3.0-flash",
  "reasoning": {
    "effort": "high"
  }
}

// Gemini 3.0+ Request - Converted to level
{
  "generation_config": {
    "thinking_config": {
      "include_thoughts": true,
      "thinking_level": "high"
    }
  }
}

// Bifrost Request - Effort only
{
  "model": "gemini-2.5-flash",
  "max_completion_tokens": 4096,
  "reasoning": {
    "effort": "high"
  }
}

// Gemini 2.5 Request - Converted to budget
// Calculation: 1024 + (0.80 × (4096 - 1024)) = 3482
{
  "generation_config": {
    "thinking_config": {
      "include_thoughts": true,
      "thinking_budget": 3482
    }
  }
}

Model-Specific Level Conversions

Gemini Pro models have stricter constraints on thinking levels:

Bifrost Effort	Non-Pro Models	Pro Models	Notes
`"none"`	Empty string	Empty string	Disables thinking
`"minimal"`	`"minimal"`	`"low"`	Pro doesn’t support minimal
`"low"`	`"low"`	`"low"`	Supported on all
`"medium"`	`"medium"`	`"high"`	Pro doesn’t support medium
`"high"`	`"high"`	`"high"`	Supported on all

Example:

// For "gemini-3.0-flash-thinking-exp" (non-Pro)
effort: "medium" → thinkingLevel: "medium"

// For "gemini-3.0-pro" (Pro model)
effort: "medium" → thinkingLevel: "high"  // Converted up

Special Values

Value	Field	Behavior	Use Case
`0`	`max_tokens`	`thinking_budget: 0`, `include_thoughts: false`	Explicitly disable reasoning
`-1`	`max_tokens`	`thinking_budget: -1`	Dynamic budget (Gemini decides)
`"none"`	`effort`	`thinking_budget: 0`, `include_thoughts: false`	Disable reasoning

Dynamic Budget (JSON)
Disable Reasoning (JSON)
Go SDK Examples

// Bifrost Request - Dynamic budget
{
  "reasoning": {
    "max_tokens": -1
  }
}

// Gemini Request - Sent as-is
{
  "generation_config": {
    "thinking_config": {
      "include_thoughts": true,
      "thinking_budget": -1
    }
  }
}

// Bifrost Request - Method 1
{
  "reasoning": {
    "max_tokens": 0
  }
}

// Bifrost Request - Method 2
{
  "reasoning": {
    "effort": "none"
  }
}

// Gemini Request - Both become
{
  "generation_config": {
    "thinking_config": {
      "include_thoughts": false,
      "thinking_budget": 0
    }
  }
}

// Using Bifrost Go SDK with Gemini
// Example 1: Dynamic budget
chatReq := &schemas.BifrostChatRequest{
  Provider: schemas.Gemini,
  Model:    "gemini-2.0-flash-thinking-exp-1219",
  Input:    messages,
  Params: &schemas.ChatParameters{
    MaxCompletionTokens: schemas.Ptr(4096),
    Reasoning: &schemas.ChatReasoning{
      MaxTokens: schemas.Ptr(-1), // Let Gemini decide
    },
  },
}

// Example 2: Effort-based for Gemini 3.0+
chatReq := &schemas.BifrostChatRequest{
  Provider: schemas.Gemini,
  Model:    "gemini-3.0-flash",
  Input:    messages,
  Params: &schemas.ChatParameters{
    MaxCompletionTokens: schemas.Ptr(4096),
    Reasoning: &schemas.ChatReasoning{
      Effort: schemas.Ptr("high"), // Converts to thinkingLevel
    },
  },
}

// Example 3: Budget-based (all versions)
chatReq := &schemas.BifrostChatRequest{
  Provider: schemas.Gemini,
  Model:    "gemini-2.5-flash",
  Input:    messages,
  Params: &schemas.ChatParameters{
    MaxCompletionTokens: schemas.Ptr(4096),
    Reasoning: &schemas.ChatReasoning{
      MaxTokens: schemas.Ptr(3000), // Direct budget
    },
  },
}

Response Conversion

Response (JSON)
Response (Go)

// Gemini Response
{
  "candidates": [{
    "content": {
      "parts": [
        {
          "thought": true,
          "text": "Analyzing the problem..."
        },
        {
          "text": "The answer is 42."
        }
      ]
    }
  }]
}

// Bifrost Response
{
  "choices": [{
    "message": {
      "content": "The answer is 42.",
      "reasoning": "Analyzing the problem...",
      "reasoning_details": [{
        "index": 0,
        "type": "text",
        "text": "Analyzing the problem..."
      }]
    }
  }]
}

// After calling Bifrost Chat Completions with Gemini
resp, err := client.ChatCompletionRequest(ctx, chatReq)
if err != nil {
  log.Fatal(err)
}

// Extract reasoning from response
choice := resp.Choices[0]
message := choice.Message

// Access combined reasoning text
fmt.Printf("Reasoning: %s\n", message.Reasoning)

// Access detailed reasoning blocks
for i, details := range message.ReasoningDetails {
  if details.Type == "text" {
    fmt.Printf("Thinking block %d:\n%s\n", i, details.Text)
  }
}

// Access final answer
fmt.Printf("Answer:\n%s\n", message.Content)

Conversion Summary

Bifrost → Gemini (Request):

Input	Gemini 2.5	Gemini 3.0+	Note
`max_tokens: 4096`	`thinking_budget: 4096`	`thinking_budget: 4096`	Direct pass-through
`max_tokens: -1`	`thinking_budget: -1`	`thinking_budget: -1`	Dynamic budget
`max_tokens: 0`	`thinking_budget: 0`	`thinking_budget: 0`	Disabled
`effort: "high"` only	`thinking_budget: 3482`*	`thinking_level: "high"`	Estimated or native
`effort: "medium"` only	`thinking_budget: 2330`*	`thinking_level: "medium"` or `"high"`**	Estimated or native
Both `effort` + `max_tokens`	Uses `max_tokens`	Uses `max_tokens`	Priority rule

* Assumes max_completion_tokens: 8192 (default), uses estimation formula
** Pro models convert "medium" to "high" Gemini → Bifrost (Response):

Gemini Field	Bifrost Field	Conversion
`thinking_budget`	`reasoning.max_tokens`	Direct mapping
`thinking_level`	`reasoning.effort`	Level → effort mapping
`thought: true` parts	`reasoning_details[]`	Array of reasoning blocks

Code References:

core/providers/gemini/utils.go (Chat Completions)
core/providers/gemini/responses.go (Responses API)
core/providers/gemini/types.go (Constants)

Two Reasoning Methods: Effort vs. Max Tokens

Bifrost supports two distinct reasoning models across different providers:

Reasoning Model Types

Model	Providers	Request Field	Native Format
Effort-Based	OpenAI, AWS Bedrock Nova	`reasoning.effort`	`reasoning_effort` (Chat) / `effort` (Responses)
Max-Tokens-Based	Anthropic, Cohere, Gemini	`reasoning.max_tokens`	`thinking.budget_tokens`

Important: Both effort and max_tokens can be specified in a single request. Bifrost uses a priority hierarchy to determine which field is used.

Priority Logic: Native vs. Estimated

When both effort and max_tokens are present in a request, Bifrost prioritizes the native compatible field for the target provider:

For Max-Tokens-Based Providers (Anthropic, Cohere, Gemini)

If reasoning.max_tokens is provided → USE IT (native field)
Else if reasoning.effort is provided → ESTIMATE max_tokens from effort
Else → disable reasoning

Example (Cohere):

// Request with both fields
{
  "reasoning": {
    "effort": "high",
    "max_tokens": 2000
  }
}

Result: Uses max_tokens: 2000 directly, ignores effort

For Effort-Based Providers (OpenAI, AWS Bedrock Nova)

If reasoning.effort is provided → USE IT (native field)
Else if reasoning.max_tokens is provided → ESTIMATE effort from max_tokens
Else → disable reasoning

Example (OpenAI Chat Completions):

// Request with both fields
{
  "reasoning": {
    "effort": "high",
    "max_tokens": 2000
  }
}

Result: Uses effort: "high" directly, strips max_tokens from JSON

Why Priority Matters

Reason 1: Accuracy - Native fields provide direct control without estimation lossReason 2: Consistency - Using native fields ensures the exact user intent is preservedReason 3: Performance - Avoids unnecessary conversions when native field is already provided

Estimator Functions

Bifrost provides two estimator functions to convert between reasoning methods. These are used when the native field is not available.

Function 1: Effort → Max Tokens

Function: GetBudgetTokensFromReasoningEffort() File: core/providers/utils/utils.go:1350-1387 Signature:

func GetBudgetTokensFromReasoningEffort(
    effort string,           // "minimal", "low", "medium", "high"
    minBudgetTokens int,     // Provider-specific minimum (e.g., 1024 for Anthropic)
    maxTokens int,           // Total completion tokens available
) (int, error)

Algorithm:

1. Define ratio for effort level:
   - "minimal"  → 2.5%  (0.025)
   - "low"      → 15%   (0.15)
   - "medium"   → 42.5% (0.425)
   - "high"     → 80%   (0.80)

2. Calculate budget:
   budget = minBudgetTokens + (ratio × (maxTokens - minBudgetTokens))

3. Clamp to valid range:
   if budget < minBudgetTokens → budget = minBudgetTokens
   if budget > maxTokens → budget = maxTokens

Conversion Examples (with minBudgetTokens=1024, maxTokens=4096):

Effort	Ratio	Calculation	Result
`minimal`	2.5%	1024 + 0.025 × 3072	1101 → 1024*
`low`	15%	1024 + 0.15 × 3072	1485
`medium`	42.5%	1024 + 0.425 × 3072	2330
`high`	80%	1024 + 0.80 × 3072	3482

*When result is below minimum, clamped to minBudgetTokens (for Anthropic minimum of 1024)

Error Handling:

if minBudgetTokens > maxTokens {
    return 0, fmt.Errorf("max_tokens must be > minBudgetTokens")
}

Code Example:

// Cohere: Convert effort to token budget
budgetTokens, err := providerUtils.GetBudgetTokensFromReasoningEffort(
    "high",                    // effort
    1,                         // Cohere min
    4096,                      // max completion tokens
)
// Returns: 3277 tokens

Function 2: Max Tokens → Effort

Function: GetReasoningEffortFromBudgetTokens() File: core/providers/utils/utils.go:1308-1345 Signature:

func GetReasoningEffortFromBudgetTokens(
    budgetTokens int,        // Reasoning token budget
    minBudgetTokens int,     // Provider-specific minimum
    maxTokens int,           // Total completion tokens available
) string                     // Returns: "low", "medium", "high"

Algorithm:

1. Normalize budget to valid range:
   if budget < min → budget = min
   if budget > max → budget = max

2. Calculate ratio:
   ratio = (budgetTokens - minBudgetTokens) / (maxTokens - minBudgetTokens)

3. Map ratio to effort level:
   if ratio ≤ 0.25  → "low"
   if ratio ≤ 0.60  → "medium"
   if ratio > 0.60  → "high"

Conversion Examples (with minBudgetTokens=1024, maxTokens=4096):

Budget Tokens	Ratio	Effort
1024	0%	`low`
1101	2.5%	`low`
1500	15.6%	`low`
1900	28.6%	`medium`
2500	48.1%	`medium`
3000	64.5%	`high`
3400	77.6%	`high`

Defensive Defaults:

if budgetTokens <= 0 {
    return "none"
}
if maxTokens <= 0 {
    return "medium"  // Safe default
}
if maxTokens <= minBudgetTokens {
    return "high"    // Can't calculate ratio
}

Code Example:

// Convert Anthropic budget back to effort for display
effort := providerUtils.GetReasoningEffortFromBudgetTokens(
    3000,   // budget tokens from Anthropic response
    1024,   // Anthropic minimum
    4096,   // max tokens
)
// Returns: "high"

Provider-Specific Constants

Different providers have different constraints on reasoning budget:

Min Budget Constants

Provider	File	MinBudgetTokens	Reason
Anthropic	`core/providers/anthropic/types.go`	1024	Anthropic API requirement
Bedrock Anthropic	`core/providers/bedrock/types.go`	1024	Same as Anthropic
Bedrock Nova	`core/providers/bedrock/types.go`	1	More flexible
Cohere	`core/providers/cohere/types.go`	1	Flexible
Gemini	`core/providers/gemini/types.go`	1024	Default minimum for conversions

Default Completion Tokens (for ratio calculation)

When max_completion_tokens is not provided, these defaults are used for ratio calculations:

Provider	Default	File
OpenAI, Anthropic, Cohere, Bedrock	4096	`core/providers/*/types.go`
Gemini	8192	`core/providers/gemini/types.go`

Effort-to-Token Conversion Examples

Example 1: Estimate tokens from effort (Anthropic)

JSON
Go SDK

Input:

{
  "model": "anthropic/claude-3-5-sonnet",
  "max_completion_tokens": 2000,
  "reasoning": {
    "effort": "high"
  }
}

Conversion Process:

effort = "high" → ratio = 0.80
minBudgetTokens = 1024 (Anthropic)
maxCompletionTokens = 2000
budget = 1024 + (0.80 × (2000 - 1024))
budget = 1024 + (0.80 × 976)
budget = 1024 + 780
Result: 1804 tokens

Anthropic Request Generated:

{
  "thinking": {
    "type": "enabled",
    "budget_tokens": 1804
  }
}

import (
  "github.com/maximhq/bifrost/core/providers/utils"
  "github.com/maximhq/bifrost/core/schemas"
)

// Using Bifrost Go SDK
chatReq := &schemas.BifrostChatRequest{
  Provider: schemas.Anthropic,
  Model:    "claude-3-5-sonnet-20241022",
  Input:    messages,
  Params: &schemas.ChatParameters{
    MaxCompletionTokens: schemas.Ptr(2000),
    Reasoning: &schemas.ChatReasoning{
      Effort: schemas.Ptr("high"), // Effort provided, max_tokens not set
    },
  },
}

// Bifrost automatically converts effort to budget tokens:
// 1. Get ratio for "high": 0.80
// 2. Calculate: 1024 + (0.80 × (2000 - 1024)) = 1804
// 3. Send to Anthropic with budget_tokens: 1804

// Alternatively, manually call the estimator function:
budgetTokens, _ := utils.GetBudgetTokensFromReasoningEffort(
  "high",     // effort
  1024,       // Anthropic minimum
  2000,       // max completion tokens
)
// Returns: 1804

Example 2: Estimate effort from tokens (Bedrock Nova)

JSON
Go SDK

Input:

{
  "model": "bedrock/us.amazon.nova-pro-v1:0",
  "max_completion_tokens": 4096,
  "reasoning": {
    "max_tokens": 2000
  }
}

Conversion Process:

budgetTokens = 2000
minBudgetTokens = 1 (Nova)
maxCompletionTokens = 4096
ratio = (2000 - 1) / (4096 - 1)
ratio = 1999 / 4095
ratio = 0.488 (48.8%)
Since 0.25 < 0.488 ≤ 0.60 → Result: “medium”

Bedrock Nova Request Generated:

{
  "reasoningConfig": {
    "type": "enabled",
    "maxReasoningEffort": "medium"
  }
}

import (
  "github.com/maximhq/bifrost/core/providers/utils"
  "github.com/maximhq/bifrost/core/schemas"
)

// Using Bifrost Go SDK with max_tokens (not effort)
chatReq := &schemas.BifrostChatRequest{
  Provider: schemas.Bedrock,
  Model:    "us.amazon.nova-pro-v1:0",
  Input:    messages,
  Params: &schemas.ChatParameters{
    MaxCompletionTokens: schemas.Ptr(4096),
    Reasoning: &schemas.ChatReasoning{
      MaxTokens: schemas.Ptr(2000), // Max tokens provided, effort not set
    },
  },
}

// Bifrost automatically estimates effort from max_tokens:
// 1. Calculate ratio: (2000 - 1) / (4096 - 1) = 0.488
// 2. Since 0.25 < 0.488 ≤ 0.60 → "medium"
// 3. Send to Bedrock Nova with effort: "medium"

// Alternatively, manually call the estimator function:
effort := utils.GetReasoningEffortFromBudgetTokens(
  2000,  // budget tokens
  1,     // Nova minimum
  4096,  // max completion tokens
)
// Returns: "medium"

Example 3: Both fields provided (priority used)

JSON
Go SDK

Input:

{
  "model": "anthropic/claude-3-5-sonnet",
  "max_completion_tokens": 4096,
  "reasoning": {
    "effort": "medium",
    "max_tokens": 2500
  }
}

Logic for Max-Tokens-Based Provider:

Check: Is max_tokens provided? → YES
Use max_tokens directly (ignore effort)
Validate: 2500 >= 1024? → YES

Anthropic Request Generated:

{
  "thinking": {
    "type": "enabled",
    "budget_tokens": 2500
  }
}

Note: The effort: "medium" is completely ignored because max_tokens takes priority.

import "github.com/maximhq/bifrost/core/schemas"

// Using Bifrost Go SDK with BOTH effort and max_tokens
chatReq := &schemas.BifrostChatRequest{
  Provider: schemas.Anthropic,
  Model:    "claude-3-5-sonnet-20241022",
  Input:    messages,
  Params: &schemas.ChatParameters{
    MaxCompletionTokens: schemas.Ptr(4096),
    Reasoning: &schemas.ChatReasoning{
      Effort:    schemas.Ptr("medium"),   // Provided but ignored
      MaxTokens: schemas.Ptr(2500),       // This takes priority
    },
  },
}

// Bifrost Priority Logic:
// 1. For max-tokens-based providers (Anthropic):
//    → Check if max_tokens is provided? YES
//    → Use it directly: 2500
//    → Ignore effort: "medium"
//    → Validate: 2500 >= 1024? YES ✓
// 2. Send to Anthropic with budget_tokens: 2500

// Result: effort is completely ignored, max_tokens is used

Response Format

Bifrost Standard Response

All providers return reasoning in a normalized reasoning_details array:

{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "Final response text",
      "reasoning_details": [
        {
          "index": 0,
          "type": "text",
          "text": "Step-by-step reasoning content...",
          "signature": "optional_signature_for_verification"
        }
      ]
    }
  }]
}

Reasoning Details Fields

Field	Type	Description	Present In
`index`	`int`	Position in reasoning sequence	All
`type`	`string`	Content type (`text`, `encrypted`, `summary`)	All
`text`	`string`	Reasoning content	Chat Completions
`summary`	`string`	Reasoning summary	Responses API
`signature`	`string`	Cryptographic signature for verification	Anthropic, Bedrock

Type Mappings

Reasoning Type	When Used	Source
`reasoning.text`	Direct thinking/reasoning content	Anthropic, Gemini, Bedrock
`reasoning.encrypted`	Signature-verified reasoning	Anthropic, Bedrock Nova
`reasoning.summary`	Summarized reasoning (Responses API)	All providers

OpenAI Implementation: OpenAI (both Chat Completions and Responses API) is effort-based, following the standard priority logic: if effort is provided, it’s used directly; if only max_tokens is provided, effort is estimated from it. The max_tokens field is then cleared before JSON serialization via MarshalJSON (core/providers/openai/types.go:383-453), since OpenAI’s APIs don’t accept it.

Streaming

Stream Event Types

Provider	Reasoning Event	Signature Event
OpenAI	`reasoning` (top-level)	N/A
Anthropic	`thinking_delta`	`signature_delta`
Bedrock	`thinking_delta`	`signature_delta`
Gemini	`thought` (in content)	`thought_signature`

Anthropic Streaming Example

// Stream events
event: content_block_start
data: {"type": "content_block_start", "content_block": {"type": "thinking"}}

event: content_block_delta
data: {"type": "content_block_delta", "delta": {"type": "thinking_delta", "thinking": "Let me"}}

event: content_block_delta
data: {"type": "content_block_delta", "delta": {"type": "thinking_delta", "thinking": " analyze..."}}

event: content_block_delta
data: {"type": "content_block_delta", "delta": {"type": "signature_delta", "signature": "EqoB..."}}

event: content_block_stop
data: {"type": "content_block_stop"}

Bifrost Stream Response

// Thinking delta
{
  "choices": [{
    "delta": {
      "reasoning_details": [{
        "index": 0,
        "type": "text",
        "text": "Let me analyze..."
      }]
    }
  }]
}

// Signature delta
{
  "choices": [{
    "delta": {
      "reasoning_details": [{
        "index": 0,
        "signature": "EqoB..."
      }]
    }
  }]
}

Caveats Summary

Minimum Budget (Anthropic/Bedrock)

Severity: High Behavior: reasoning.max_tokens must be >= 1024 Impact: Requests with lower values fail with error Workaround: Always set max_tokens >= 1024 for Anthropic/Bedrock

Dynamic Budget Not Supported

Severity: Medium Behavior: reasoning.max_tokens = -1 converted to 1024 Impact: Dynamic budgeting not available on Anthropic/Bedrock Workaround: Set explicit token budget

Effort Level Normalization

Severity: Low Behavior: OpenAI’s minimal converted to low when routing to other providers Impact: Slightly different reasoning behavior

Signature Field Provider-Specific

Severity: Low Behavior: signature field only present in Anthropic/Bedrock responses Impact: Signature-based verification only available for these providers

Thinking Type Always Enabled

Severity: Low Behavior: Anthropic’s thinking.type always set to "enabled" regardless of effort Impact: Cannot disable thinking once reasoning param is present

Gemini: Only One Parameter Sent

Severity: Medium Behavior: When both effort and max_tokens are provided, only thinkingBudget is sent to Gemini (effort is dropped) Impact: Effort value is completely ignored when max_tokens is present Workaround: Provide only the parameter you want to use

Gemini: Model Version Differences

Severity: Medium Behavior: Gemini 2.5 only supports thinkingBudget, while 3.0+ supports both thinkingBudget and thinkingLevel Impact: Effort-only requests on 2.5 are converted to budget; on 3.0+ they use native levels Note: Bifrost automatically detects version and uses appropriate conversion

Gemini Pro: Limited Level Support

Severity: Low Behavior: Pro models only support “low” and “high” thinking levels Impact: "minimal" → "low", "medium" → "high" for Pro models Note: Non-Pro models support all four levels: minimal, low, medium, high

Complete Provider Comparison

Reasoning Model

Provider	Model Type	Budget Type	Min Budget	Signature Support
OpenAI	Effort-based	Effort-based	None	❌
Anthropic	Thinking blocks	Token budget	1024	✅
Bedrock (Anthropic)	Reasoning config	Token budget	1024	✅
Bedrock (Nova)	Reasoning config	Effort-based	None	❌
Gemini 2.5+	Thinking config	Token budget	1024	✅
Gemini 3.0+	Thinking config	Dual (budget + level)	1024	✅

Parameter Support

Provider	`effort`	`max_tokens`	`summary`	Streaming
OpenAI	✅ (4 levels)	✅	❌	✅
Anthropic	❌ (binary)	✅	✅	✅
Bedrock (Anthropic)	❌ (binary)	✅	✅	✅
Bedrock (Nova)	✅ (3 levels)	⚠️ (ignored)	❌	✅
Gemini 2.5+	⚠️ (converts to budget)	✅	❌	✅
Gemini 3.0+	✅ (4 levels)	✅	❌	✅

Troubleshooting

Anthropic: “reasoning.max_tokens must be >= 1024”

Cause: Attempting to use reasoning with max_tokens < 1024 Solution: Ensure reasoning.max_tokens >= 1024 for Anthropic/Bedrock Anthropic models

// ❌ Invalid
{"reasoning": {"effort": "high", "max_tokens": 500}}

// ✅ Valid
{"reasoning": {"effort": "high", "max_tokens": 1024}}

OpenAI: Model doesn’t support reasoning

Cause: Using an older model that doesn’t support reasoning (e.g., gpt-4-turbo) Solution: Use models with reasoning support: gpt-4o, gpt-4o-mini (o1 series with native reasoning)

Bedrock Nova: `max_tokens` parameter being ignored

Expected Behavior: Bedrock Nova uses effort-based reasoning only Solution: Provide effort parameter instead of max_tokens for Nova models

// ✅ Correct for Nova
{"reasoning": {"effort": "high"}}

Overview

Quick Start

Providers & Guides

SDK Integrations

MCP Gateway

Custom plugins

Open Source Features

Enterprise Features

​Overview

​Provider Support Matrix

​Request Configuration

​Chat Completions API

​Responses API

​Parameter Reference

​Chat Completions API Parameters

​Responses API Parameters

​Provider-Specific Conversions

​OpenAI

​Anthropic

​Bedrock (Anthropic Models)

​Bedrock (Nova Models)

​Gemini

​Model Version Support

​Priority Rules

​Model-Specific Level Conversions

​Special Values

​Response Conversion

​Conversion Summary

​Two Reasoning Methods: Effort vs. Max Tokens

​Reasoning Model Types

​Priority Logic: Native vs. Estimated

​For Max-Tokens-Based Providers (Anthropic, Cohere, Gemini)

​For Effort-Based Providers (OpenAI, AWS Bedrock Nova)

​Estimator Functions

​Function 1: Effort → Max Tokens

​Function 2: Max Tokens → Effort

​Provider-Specific Constants

​Min Budget Constants

​Default Completion Tokens (for ratio calculation)

​Effort-to-Token Conversion Examples

​Example 1: Estimate tokens from effort (Anthropic)

​Example 2: Estimate effort from tokens (Bedrock Nova)

​Example 3: Both fields provided (priority used)

​Response Format

​Bifrost Standard Response

​Reasoning Details Fields

​Type Mappings

​Streaming

​Stream Event Types

​Anthropic Streaming Example

​Bifrost Stream Response

​Caveats Summary

​Complete Provider Comparison

​Reasoning Model

​Parameter Support

​Troubleshooting

​Anthropic: “reasoning.max_tokens must be >= 1024”

​OpenAI: Model doesn’t support reasoning

​Bedrock Nova: max_tokens parameter being ignored

Overview

Provider Support Matrix

Request Configuration

Chat Completions API

Responses API

Parameter Reference

Chat Completions API Parameters

Responses API Parameters

Provider-Specific Conversions

OpenAI

Anthropic

Bedrock (Anthropic Models)

Bedrock (Nova Models)

Gemini

Model Version Support

Priority Rules

Model-Specific Level Conversions

Special Values

Response Conversion

Conversion Summary

Two Reasoning Methods: Effort vs. Max Tokens

Reasoning Model Types

Priority Logic: Native vs. Estimated

For Max-Tokens-Based Providers (Anthropic, Cohere, Gemini)

For Effort-Based Providers (OpenAI, AWS Bedrock Nova)

Estimator Functions

Function 1: Effort → Max Tokens

Function 2: Max Tokens → Effort

Provider-Specific Constants

Min Budget Constants

Default Completion Tokens (for ratio calculation)

Effort-to-Token Conversion Examples

Example 1: Estimate tokens from effort (Anthropic)

Example 2: Estimate effort from tokens (Bedrock Nova)

Example 3: Both fields provided (priority used)

Response Format

Bifrost Standard Response

Reasoning Details Fields

Type Mappings

Streaming

Stream Event Types

Anthropic Streaming Example

Bifrost Stream Response

Caveats Summary

Complete Provider Comparison

Reasoning Model

Parameter Support

Troubleshooting

Anthropic: “reasoning.max_tokens must be >= 1024”

OpenAI: Model doesn’t support reasoning

Bedrock Nova: `max_tokens` parameter being ignored