
Overview

Reasoning (also called “thinking” in some providers) allows AI models to show their step-by-step thought process before providing a final answer. This feature is available across multiple providers with different implementations.
Bifrost normalizes all provider-specific reasoning formats to a consistent OpenAI-compatible structure using reasoning in requests and reasoning_details in responses.

Provider Support Matrix

| Provider | Request Field | Response Field | Min Budget | Effort Levels | Streaming |
| --- | --- | --- | --- | --- | --- |
| OpenAI | reasoning | reasoning_details | None | minimal, low, medium, high | ✅ |
| Anthropic | thinking | Content blocks | 1024 tokens | enabled only | ✅ |
| Bedrock (Anthropic) | thinking | Content blocks | 1024 tokens | enabled only | ✅ |
| Gemini | thinking_config | thought parts | None | off, low, medium, high | ✅ |

Request Configuration

Chat Completions API

{
  "model": "provider/model-name",
  "messages": [...],
  "reasoning": {
    "effort": "high",
    "max_tokens": 4096
  }
}

Responses API

{
  "model": "provider/model-name",
  "input": [...],
  "reasoning": {
    "effort": "high",
    "max_tokens": 4096,
    "summary": "detailed"
  }
}
Responses API supports both effort + max_tokens (like Chat Completions) and adds the optional summary parameter for output summarization.

Parameter Reference

Chat Completions API Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| effort | string | Reasoning intensity level |
| max_tokens | int | Maximum tokens for reasoning (budget) |

Responses API Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| effort | string | Reasoning intensity level |
| max_tokens | int | Maximum tokens for reasoning (budget) |
| summary | string | Summary level: brief, detailed, or json |
Responses API accepts the same effort and max_tokens parameters as Chat Completions, but adds an optional summary parameter for reasoning output summarization.

Provider-Specific Conversions

OpenAI

OpenAI uses effort-based reasoning only. Bifrost applies priority logic:
  1. If reasoning.effort is provided → use it directly
  2. Else if reasoning.max_tokens is provided → estimate effort from it
  3. The max_tokens field is cleared before sending to OpenAI
Conversion Examples:
// Bifrost Request (with effort)
{
  "reasoning": {
    "effort": "high"
  }
}

// OpenAI Request Sent
{
  "reasoning": {
    "effort": "high"
  }
}
Supported Effort Levels: minimal, low, medium, high
When minimal is encountered, it’s converted to low for non-OpenAI providers. OpenAI receives only: low, medium, high.
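The minimal → low downgrade for non-OpenAI providers can be sketched in Go (the function name is illustrative, not Bifrost's actual identifier):

```go
package main

import "fmt"

// normalizeEffort downgrades the OpenAI-specific "minimal" level to "low"
// before an effort value is forwarded to a non-OpenAI provider.
func normalizeEffort(effort string) string {
	if effort == "minimal" {
		return "low"
	}
	return effort
}

func main() {
	fmt.Println(normalizeEffort("minimal")) // low
	fmt.Println(normalizeEffort("high"))    // high
}
```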

Anthropic

Anthropic uses a thinking parameter with different structure.
// Bifrost Request
{
  "reasoning": {
    "effort": "high",
    "max_tokens": 4096
  }
}

// Anthropic Request
{
  "thinking": {
    "type": "enabled",
    "budget_tokens": 4096
  }
}
Conversion Rules:
| Bifrost | Anthropic | Notes |
| --- | --- | --- |
| reasoning.effort | thinking.type | Always mapped to "enabled" |
| reasoning.max_tokens | thinking.budget_tokens | Token budget for reasoning |
Critical Constraint: Anthropic requires reasoning.max_tokens >= 1024. Requests with lower values will fail with an error.
Dynamic Budget Handling:
| Input Value | Converted To |
| --- | --- |
| -1 (dynamic) | 1024 (minimum default) |
| < 1024 | Error |
| >= 1024 | Pass-through |
Code Reference: core/providers/anthropic/chat.go:104-134
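The dynamic-budget rules in the table above can be sketched as a small Go function (a sketch only; the constant and function names are illustrative, and the actual logic lives in core/providers/anthropic/chat.go):

```go
package main

import "fmt"

const anthropicMinBudgetTokens = 1024 // Anthropic API minimum

// normalizeAnthropicBudget applies the dynamic-budget handling rules:
// -1 requests a dynamic budget and falls back to the minimum, values
// below the minimum are rejected, and valid values pass through.
func normalizeAnthropicBudget(budget int) (int, error) {
	switch {
	case budget == -1:
		return anthropicMinBudgetTokens, nil
	case budget < anthropicMinBudgetTokens:
		return 0, fmt.Errorf("reasoning.max_tokens must be >= %d, got %d",
			anthropicMinBudgetTokens, budget)
	default:
		return budget, nil
	}
}

func main() {
	for _, b := range []int{-1, 500, 4096} {
		v, err := normalizeAnthropicBudget(b)
		fmt.Println(b, "->", v, err)
	}
}
```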

Bedrock (Anthropic Models)

Bedrock uses the same structure as Anthropic for Claude models.
// Bifrost Request
{
  "reasoning": {
    "effort": "high",
    "max_tokens": 4096
  }
}

// Bedrock Request (for Anthropic/Claude models)
{
  "additionalModelRequestFields": {
    "reasoning_config": {
      "type": "enabled",
      "budget_tokens": 4096
    }
  }
}
The same 1024 minimum token budget constraint applies to Bedrock Anthropic models. Attempts to set max_tokens below 1024 will result in an error.
Code Reference: core/providers/bedrock/utils.go:34-47

Bedrock (Nova Models)

Bedrock Nova models use an effort-based approach similar to OpenAI.
// Bifrost Request
{
  "reasoning": {
    "effort": "high",
    "max_tokens": 4096
  }
}

// Bedrock Request (for Nova models)
{
  "additionalModelRequestFields": {
    "reasoningConfig": {
      "type": "enabled",
      "maxReasoningEffort": "high"
    }
  }
}
Key Differences from Anthropic:
  • No minimum token budget constraint
  • Uses effort levels instead of token budgets
  • High effort mode automatically clears conflicting parameters
Code Reference: core/providers/bedrock/utils.go:48-89

Gemini

Gemini uses thinking_config with effort-based configuration.
// Bifrost Request
{
  "reasoning": {
    "effort": "high",
    "max_tokens": 4096
  }
}

// Gemini Request
{
  "generation_config": {
    "thinking_config": {
      "thinking_budget": 4096
    }
  }
}
Effort Level Mapping:
| Bifrost Effort | Gemini Mode |
| --- | --- |
| Not set | off |
| low | Uses budget |
| medium | Uses budget |
| high | Uses budget |
Code Reference: core/providers/gemini/chat.go

Two Reasoning Methods: Effort vs. Max Tokens

Bifrost supports two distinct reasoning models across different providers:

Reasoning Model Types

| Model | Providers | Request Field | Native Format |
| --- | --- | --- | --- |
| Effort-Based | OpenAI, AWS Bedrock Nova | reasoning.effort | reasoning_effort (Chat) / effort (Responses) |
| Max-Tokens-Based | Anthropic, Cohere, Gemini | reasoning.max_tokens | thinking.budget_tokens |
Important: Both effort and max_tokens can be specified in a single request. Bifrost uses a priority hierarchy to determine which field is used.

Priority Logic: Native vs. Estimated

When both effort and max_tokens are present in a request, Bifrost prioritizes the native compatible field for the target provider:

For Max-Tokens-Based Providers (Anthropic, Cohere, Gemini)

1. If reasoning.max_tokens is provided → USE IT (native field)
2. Else if reasoning.effort is provided → ESTIMATE max_tokens from effort
3. Else → disable reasoning
Example (Cohere):
// Request with both fields
{
  "reasoning": {
    "effort": "high",
    "max_tokens": 2000
  }
}
Result: Uses max_tokens: 2000 directly, ignores effort

For Effort-Based Providers (OpenAI, AWS Bedrock Nova)

1. If reasoning.effort is provided → USE IT (native field)
2. Else if reasoning.max_tokens is provided → ESTIMATE effort from max_tokens
3. Else → disable reasoning
Example (OpenAI Chat Completions):
// Request with both fields
{
  "reasoning": {
    "effort": "high",
    "max_tokens": 2000
  }
}
Result: Uses effort: "high" directly, strips max_tokens from JSON
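The priority hierarchy for both provider families can be sketched in Go. All identifiers below are illustrative, not Bifrost's actual code; the estimator functions are passed in as callbacks since they are documented separately below:

```go
package main

import "fmt"

// ReasoningParams carries both possible reasoning fields from a request.
type ReasoningParams struct {
	Effort    string
	MaxTokens int
}

// Max-tokens-based providers (Anthropic, Cohere, Gemini): the native
// field is max_tokens; effort is only used to estimate a budget.
func resolveBudget(p ReasoningParams, estimate func(effort string) int) (int, bool) {
	switch {
	case p.MaxTokens > 0:
		return p.MaxTokens, true // native field wins
	case p.Effort != "":
		return estimate(p.Effort), true // estimated from effort
	default:
		return 0, false // reasoning disabled
	}
}

// Effort-based providers (OpenAI, Bedrock Nova): the native field is
// effort; max_tokens is only used to estimate an effort level.
func resolveEffort(p ReasoningParams, estimate func(budget int) string) (string, bool) {
	switch {
	case p.Effort != "":
		return p.Effort, true // native field wins
	case p.MaxTokens > 0:
		return estimate(p.MaxTokens), true // estimated from budget
	default:
		return "", false // reasoning disabled
	}
}

func main() {
	p := ReasoningParams{Effort: "high", MaxTokens: 2000}
	budget, _ := resolveBudget(p, func(string) int { return 0 })
	effort, _ := resolveEffort(p, func(int) string { return "" })
	fmt.Println(budget, effort) // 2000 high
}
```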
Why the native field takes priority:
  • Accuracy: native fields provide direct control without estimation loss
  • Consistency: using native fields preserves the exact user intent
  • Performance: avoids unnecessary conversions when the native field is already provided

Estimator Functions

Bifrost provides two estimator functions to convert between reasoning methods. These are used when the native field is not available.

Function 1: Effort → Max Tokens

Function: GetBudgetTokensFromReasoningEffort()
File: core/providers/utils/utils.go:1350-1387
Signature:
func GetBudgetTokensFromReasoningEffort(
    effort string,           // "minimal", "low", "medium", "high"
    minBudgetTokens int,     // Provider-specific minimum (e.g., 1024 for Anthropic)
    maxTokens int,           // Total completion tokens available
) (int, error)
Algorithm:
1. Define ratio for effort level:
   - "minimal"  → 2.5%  (0.025)
   - "low"      → 15%   (0.15)
   - "medium"   → 42.5% (0.425)
   - "high"     → 80%   (0.80)

2. Calculate budget:
   budget = minBudgetTokens + (ratio × (maxTokens - minBudgetTokens))

3. Clamp to valid range:
   if budget < minBudgetTokens → budget = minBudgetTokens
   if budget > maxTokens → budget = maxTokens
Conversion Examples (with minBudgetTokens=1024, maxTokens=4096):

| Effort | Ratio | Calculation | Result |
| --- | --- | --- | --- |
| minimal | 2.5% | 1024 + 0.025 × 3072 | 1101 → 1024* |
| low | 15% | 1024 + 0.15 × 3072 | 1485 |
| medium | 42.5% | 1024 + 0.425 × 3072 | 2330 |
| high | 80% | 1024 + 0.80 × 3072 | 3482 |

*When the result falls below the minimum, it is clamped to minBudgetTokens (1024 for Anthropic).
Error Handling:
if minBudgetTokens > maxTokens {
    return 0, fmt.Errorf("max_tokens must be > minBudgetTokens")
}
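The three-step algorithm and its error check can be sketched as a standalone Go function. This is an illustrative copy of GetBudgetTokensFromReasoningEffort, and its integer truncation may round differently from the real implementation:

```go
package main

import "fmt"

// budgetFromEffort converts an effort level to a reasoning token budget:
// pick the ratio for the level, interpolate between the provider minimum
// and the available completion tokens, then clamp to the valid range.
func budgetFromEffort(effort string, minBudgetTokens, maxTokens int) (int, error) {
	if minBudgetTokens > maxTokens {
		return 0, fmt.Errorf("max_tokens must be > minBudgetTokens")
	}
	ratios := map[string]float64{"minimal": 0.025, "low": 0.15, "medium": 0.425, "high": 0.80}
	ratio, ok := ratios[effort]
	if !ok {
		return 0, fmt.Errorf("unknown effort level: %q", effort)
	}
	budget := minBudgetTokens + int(ratio*float64(maxTokens-minBudgetTokens))
	if budget < minBudgetTokens {
		budget = minBudgetTokens
	}
	if budget > maxTokens {
		budget = maxTokens
	}
	return budget, nil
}

func main() {
	b, _ := budgetFromEffort("high", 1, 4096) // Cohere-style minimum of 1
	fmt.Println(b)                            // 3277
}
```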
Code Example:
// Cohere: Convert effort to token budget
budgetTokens, err := providerUtils.GetBudgetTokensFromReasoningEffort(
    "high",                    // effort
    1,                         // Cohere min
    4096,                      // max completion tokens
)
// Returns: 3277 tokens

Function 2: Max Tokens → Effort

Function: GetReasoningEffortFromBudgetTokens()
File: core/providers/utils/utils.go:1308-1345
Signature:
func GetReasoningEffortFromBudgetTokens(
    budgetTokens int,        // Reasoning token budget
    minBudgetTokens int,     // Provider-specific minimum
    maxTokens int,           // Total completion tokens available
) string                     // Returns: "low", "medium", "high"
Algorithm:
1. Normalize budget to valid range:
   if budget < min → budget = min
   if budget > max → budget = max

2. Calculate ratio:
   ratio = (budgetTokens - minBudgetTokens) / (maxTokens - minBudgetTokens)

3. Map ratio to effort level:
   if ratio ≤ 0.25  → "low"
   if ratio ≤ 0.60  → "medium"
   if ratio > 0.60  → "high"
Conversion Examples (with minBudgetTokens=1024, maxTokens=4096):

| Budget Tokens | Ratio | Effort |
| --- | --- | --- |
| 1024 | 0% | low |
| 1101 | 2.5% | low |
| 1500 | 15.6% | low |
| 1900 | 28.6% | medium |
| 2500 | 48.1% | medium |
| 3000 | 64.5% | high |
| 3400 | 77.6% | high |
Defensive Defaults:
if budgetTokens <= 0 {
    return "none"
}
if maxTokens <= 0 {
    return "medium"  // Safe default
}
if maxTokens <= minBudgetTokens {
    return "high"    // Can't calculate ratio
}
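The normalize/ratio/map steps plus the defensive defaults combine into one small function. Again this is an illustrative sketch of GetReasoningEffortFromBudgetTokens, not the real code:

```go
package main

import "fmt"

// effortFromBudget maps a reasoning token budget back to an effort level:
// apply the defensive defaults, clamp the budget into range, then bucket
// the resulting ratio at the 0.25 and 0.60 thresholds.
func effortFromBudget(budgetTokens, minBudgetTokens, maxTokens int) string {
	if budgetTokens <= 0 {
		return "none"
	}
	if maxTokens <= 0 {
		return "medium" // safe default
	}
	if maxTokens <= minBudgetTokens {
		return "high" // ratio undefined
	}
	if budgetTokens < minBudgetTokens {
		budgetTokens = minBudgetTokens
	}
	if budgetTokens > maxTokens {
		budgetTokens = maxTokens
	}
	ratio := float64(budgetTokens-minBudgetTokens) / float64(maxTokens-minBudgetTokens)
	switch {
	case ratio <= 0.25:
		return "low"
	case ratio <= 0.60:
		return "medium"
	default:
		return "high"
	}
}

func main() {
	fmt.Println(effortFromBudget(2000, 1, 4096))    // medium (Nova example below)
	fmt.Println(effortFromBudget(3000, 1024, 4096)) // high
}
```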
Code Example:
// Convert Anthropic budget back to effort for display
effort := providerUtils.GetReasoningEffortFromBudgetTokens(
    3000,   // budget tokens from Anthropic response
    1024,   // Anthropic minimum
    4096,   // max tokens
)
// Returns: "high"

Provider-Specific Constants

Different providers have different constraints on reasoning budget:

Min Budget Constants

| Provider | File | MinBudgetTokens | Reason |
| --- | --- | --- | --- |
| Anthropic | core/providers/anthropic/types.go | 1024 | Anthropic API requirement |
| Bedrock Anthropic | core/providers/bedrock/types.go | 1024 | Same as Anthropic |
| Bedrock Nova | core/providers/bedrock/types.go | 1 | More flexible |
| Cohere | core/providers/cohere/types.go | 1 | Flexible |
| Gemini | core/providers/gemini/types.go | 1 | Flexible |
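For illustration, these minimums can be mirrored as a lookup table (the real constants live in each provider's types.go; the map keys here are made up):

```go
package main

import "fmt"

// minBudgetTokens mirrors the per-provider minimum reasoning budgets
// listed in the table above.
var minBudgetTokens = map[string]int{
	"anthropic":         1024,
	"bedrock-anthropic": 1024,
	"bedrock-nova":      1,
	"cohere":            1,
	"gemini":            1,
}

func main() {
	fmt.Println(minBudgetTokens["anthropic"]) // 1024
}
```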

Default Completion Tokens (for ratio calculation)

When max_completion_tokens is not provided, these defaults are used for ratio calculations:
| Provider | Default | File |
| --- | --- | --- |
| All providers | 4096 | core/providers/*/types.go |

Effort-to-Token Conversion Examples

Example 1: Estimate tokens from effort (Anthropic)

Input:
{
  "model": "anthropic/claude-3-5-sonnet",
  "max_completion_tokens": 2000,
  "reasoning": {
    "effort": "high"
  }
}
Conversion Process:
  1. effort = "high" → ratio = 0.80
  2. minBudgetTokens = 1024 (Anthropic)
  3. maxCompletionTokens = 2000
  4. budget = 1024 + (0.80 × (2000 - 1024))
  5. budget = 1024 + (0.80 × 976)
  6. budget = 1024 + 780
  7. Result: 1804 tokens
Anthropic Request Generated:
{
  "thinking": {
    "type": "enabled",
    "budget_tokens": 1804
  }
}

Example 2: Estimate effort from tokens (Bedrock Nova)

Input:
{
  "model": "bedrock/us.amazon.nova-pro-v1:0",
  "max_completion_tokens": 4096,
  "reasoning": {
    "max_tokens": 2000
  }
}
Conversion Process:
  1. budgetTokens = 2000
  2. minBudgetTokens = 1 (Nova)
  3. maxCompletionTokens = 4096
  4. ratio = (2000 - 1) / (4096 - 1)
  5. ratio = 1999 / 4095
  6. ratio = 0.488 (48.8%)
  7. Since 0.25 < 0.488 ≤ 0.60 → Result: "medium"
Bedrock Nova Request Generated:
{
  "reasoningConfig": {
    "type": "enabled",
    "maxReasoningEffort": "medium"
  }
}

Example 3: Both fields provided (priority used)

Input:
{
  "model": "anthropic/claude-3-5-sonnet",
  "max_completion_tokens": 4096,
  "reasoning": {
    "effort": "medium",
    "max_tokens": 2500
  }
}
Logic for Max-Tokens-Based Provider:
  1. Check: Is max_tokens provided? → YES
  2. Use max_tokens directly (ignore effort)
  3. Validate: 2500 >= 1024? → YES
Anthropic Request Generated:
{
  "thinking": {
    "type": "enabled",
    "budget_tokens": 2500
  }
}
Note: The effort: "medium" is completely ignored because max_tokens takes priority.

Response Format

Bifrost Standard Response

All providers return reasoning in a normalized reasoning_details array:
{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "Final response text",
      "reasoning_details": [
        {
          "index": 0,
          "type": "text",
          "text": "Step-by-step reasoning content...",
          "signature": "optional_signature_for_verification"
        }
      ]
    }
  }]
}

Reasoning Details Fields

| Field | Type | Description | Present In |
| --- | --- | --- | --- |
| index | int | Position in reasoning sequence | All |
| type | string | Content type (text, encrypted, summary) | All |
| text | string | Reasoning content | Chat Completions |
| summary | string | Reasoning summary | Responses API |
| signature | string | Cryptographic signature for verification | Anthropic, Bedrock |

Type Mappings

| Reasoning Type | When Used | Source |
| --- | --- | --- |
| reasoning.text | Direct thinking/reasoning content | Anthropic, Gemini, Bedrock |
| reasoning.encrypted | Signature-verified reasoning | Anthropic, Bedrock Nova |
| reasoning.summary | Summarized reasoning (Responses API) | All providers |
OpenAI Implementation: OpenAI (both Chat Completions and Responses API) is effort-based, following the standard priority logic: if effort is provided, it’s used directly; if only max_tokens is provided, effort is estimated from it. The max_tokens field is then cleared before JSON serialization via MarshalJSON (core/providers/openai/types.go:383-453), since OpenAI’s APIs don’t accept it.

Streaming

Stream Event Types

| Provider | Reasoning Event | Signature Event |
| --- | --- | --- |
| OpenAI | reasoning (top-level) | N/A |
| Anthropic | thinking_delta | signature_delta |
| Bedrock | thinking_delta | signature_delta |
| Gemini | thought (in content) | thought_signature |

Anthropic Streaming Example

// Stream events
event: content_block_start
data: {"type": "content_block_start", "content_block": {"type": "thinking"}}

event: content_block_delta
data: {"type": "content_block_delta", "delta": {"type": "thinking_delta", "thinking": "Let me"}}

event: content_block_delta
data: {"type": "content_block_delta", "delta": {"type": "thinking_delta", "thinking": " analyze..."}}

event: content_block_delta
data: {"type": "content_block_delta", "delta": {"type": "signature_delta", "signature": "EqoB..."}}

event: content_block_stop
data: {"type": "content_block_stop"}

Bifrost Stream Response

// Thinking delta
{
  "choices": [{
    "delta": {
      "reasoning_details": [{
        "index": 0,
        "type": "text",
        "text": "Let me analyze..."
      }]
    }
  }]
}

// Signature delta
{
  "choices": [{
    "delta": {
      "reasoning_details": [{
        "index": 0,
        "signature": "EqoB..."
      }]
    }
  }]
}
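Client code typically concatenates the text deltas per reasoning index to rebuild the full reasoning trace. A minimal Go sketch, assuming the chunk shape shown above (struct and function names are illustrative):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// StreamDelta models the delta payloads shown above.
type StreamDelta struct {
	Choices []struct {
		Delta struct {
			ReasoningDetails []struct {
				Index     int    `json:"index"`
				Type      string `json:"type,omitempty"`
				Text      string `json:"text,omitempty"`
				Signature string `json:"signature,omitempty"`
			} `json:"reasoning_details"`
		} `json:"delta"`
	} `json:"choices"`
}

// accumulateReasoning concatenates text deltas per reasoning index.
func accumulateReasoning(chunks []string) map[int]string {
	out := map[int]string{}
	for _, raw := range chunks {
		var c StreamDelta
		if err := json.Unmarshal([]byte(raw), &c); err != nil {
			continue // skip malformed chunks
		}
		for _, ch := range c.Choices {
			for _, d := range ch.Delta.ReasoningDetails {
				out[d.Index] += d.Text
			}
		}
	}
	return out
}

func main() {
	chunks := []string{
		`{"choices":[{"delta":{"reasoning_details":[{"index":0,"type":"text","text":"Let me"}]}}]}`,
		`{"choices":[{"delta":{"reasoning_details":[{"index":0,"type":"text","text":" analyze..."}]}}]}`,
	}
	fmt.Println(accumulateReasoning(chunks)[0]) // Let me analyze...
}
```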

Caveats Summary

| Severity | Behavior | Impact | Workaround |
| --- | --- | --- | --- |
| High | reasoning.max_tokens must be >= 1024 (Anthropic/Bedrock Anthropic) | Requests with lower values fail with an error | Always set max_tokens >= 1024 for Anthropic/Bedrock |
| Medium | reasoning.max_tokens = -1 is converted to 1024 | Dynamic budgeting not available on Anthropic/Bedrock | Set an explicit token budget |
| Low | OpenAI's minimal is converted to low when routing to other providers | Slightly different reasoning behavior | — |
| Low | signature field only present in Anthropic/Bedrock responses | Signature-based verification only available for these providers | — |
| Low | Anthropic's thinking.type is always set to "enabled" regardless of effort | Cannot disable thinking once the reasoning param is present | — |

Complete Provider Comparison

Reasoning Model

| Provider | Model Type | Budget Type | Min Budget | Signature Support |
| --- | --- | --- | --- | --- |
| OpenAI | Effort-based | Effort-based | None | ❌ |
| Anthropic | Thinking blocks | Token budget | 1024 | ✅ |
| Bedrock (Anthropic) | Reasoning config | Token budget | 1024 | ✅ |
| Bedrock (Nova) | Reasoning config | Effort-based | None | ✅ |
| Gemini | Thinking config | Token-based | None | ❌ |

Parameter Support

| Provider | effort | max_tokens | summary | Streaming |
| --- | --- | --- | --- | --- |
| OpenAI | ✅ (4 levels) | ⚠️ (estimated, then stripped) | ✅ (Responses API) | ✅ |
| Anthropic | ❌ (binary) | ✅ | ❌ | ✅ |
| Bedrock (Anthropic) | ❌ (binary) | ✅ | ❌ | ✅ |
| Bedrock (Nova) | ✅ (3 levels) | ⚠️ (ignored) | ❌ | ✅ |
| Gemini | ✅ (implicit) | ✅ | ❌ | ✅ |

Troubleshooting

Anthropic: “reasoning.max_tokens must be >= 1024”

Cause: Attempting to use reasoning with max_tokens < 1024 Solution: Ensure reasoning.max_tokens >= 1024 for Anthropic/Bedrock Anthropic models
// ❌ Invalid
{"reasoning": {"effort": "high", "max_tokens": 500}}

// ✅ Valid
{"reasoning": {"effort": "high", "max_tokens": 1024}}

OpenAI: Model doesn’t support reasoning

Cause: Using an older model that doesn’t support reasoning (e.g., gpt-4-turbo) Solution: Use models with reasoning support, such as gpt-4o and gpt-4o-mini, or the o1 series with native reasoning

Bedrock Nova: max_tokens parameter being ignored

Expected Behavior: Bedrock Nova uses effort-based reasoning only Solution: Provide effort parameter instead of max_tokens for Nova models
// ✅ Correct for Nova
{"reasoning": {"effort": "high"}}