Overview
Reasoning (also called “thinking” by some providers) allows AI models to show their step-by-step thought process before providing a final answer. This feature is available across multiple providers with different implementations. Bifrost normalizes all provider-specific reasoning formats to a consistent OpenAI-compatible structure, using reasoning in requests and reasoning_details in responses.
Provider Support Matrix
| Provider | Request Field | Response Field | Min Budget | Effort Levels | Streaming |
|---|---|---|---|---|---|
| OpenAI | reasoning | reasoning_details | None | minimal, low, medium, high | ✅ |
| Anthropic | thinking | Content blocks | 1024 tokens | enabled only | ✅ |
| Bedrock (Anthropic) | thinking | Content blocks | 1024 tokens | enabled only | ✅ |
| Gemini | thinking_config | thought parts | None | off, low, medium, high | ✅ |
Request Configuration
Chat Completions API
- JSON
- Go SDK
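As a minimal sketch of a Chat Completions request (the model identifier and message are placeholders), reasoning is enabled by attaching a reasoning object with effort and/or max_tokens:

```json
{
  "model": "anthropic/claude-sonnet-4",
  "messages": [
    { "role": "user", "content": "How many prime numbers are there below 50?" }
  ],
  "reasoning": {
    "effort": "medium",
    "max_tokens": 2048
  }
}
```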
Responses API
- JSON
- Go SDK
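A comparable sketch for the Responses API (the model and input values are placeholders); the optional summary field is added alongside effort and max_tokens:

```json
{
  "model": "openai/o3-mini",
  "input": "Explain the Monty Hall problem.",
  "reasoning": {
    "effort": "high",
    "max_tokens": 4096,
    "summary": "detailed"
  }
}
```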
The Responses API supports both effort and max_tokens (like Chat Completions) and adds an optional summary parameter for output summarization.
Parameter Reference
Chat Completions API Parameters
| Parameter | Type | Description |
|---|---|---|
| effort | string | Reasoning intensity level |
| max_tokens | int | Maximum tokens for reasoning (budget) |
Responses API Parameters
| Parameter | Type | Description |
|---|---|---|
| effort | string | Reasoning intensity level |
| max_tokens | int | Maximum tokens for reasoning (budget) |
| summary | string | Summary level: brief, detailed, or json |
The Responses API accepts the same effort and max_tokens parameters as Chat Completions, but adds an optional summary parameter for reasoning output summarization.
Provider-Specific Conversions
OpenAI
OpenAI uses effort-based reasoning only. Bifrost applies priority logic:
- If reasoning.effort is provided → use it directly
- Else if reasoning.max_tokens is provided → estimate effort from it
- The max_tokens field is cleared before sending to OpenAI
- Effort (JSON)
- Effort (Go)
- Max Tokens (JSON)
- Max Tokens (Go)
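For illustration (values are examples; surrounding request fields are omitted), a Bifrost reasoning object that carries both fields is reduced to OpenAI's native Chat Completions field by the priority logic above:

```json
{ "reasoning": { "effort": "high", "max_tokens": 3000 } }
```

is serialized for OpenAI as:

```json
{ "reasoning_effort": "high" }
```

If only max_tokens were provided, an effort level would first be estimated from it (see the estimator functions below) before serialization.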
Supported effort values: minimal, low, medium, high
When minimal is encountered, it is converted to low for non-OpenAI providers, so those providers receive only low, medium, and high.
Anthropic
Anthropic uses a thinking parameter with a different structure.
- Request Conversion (JSON)
- Request Conversion (Go)
- Response Conversion (JSON)
- Response Conversion (Go)
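As a sketch of the request conversion (the token value is illustrative), a Bifrost reasoning budget maps onto Anthropic's thinking parameter per the table below:

```json
{ "reasoning": { "max_tokens": 2048 } }
```

becomes:

```json
{ "thinking": { "type": "enabled", "budget_tokens": 2048 } }
```

On the response side, Anthropic's thinking content blocks and their signatures are folded back into reasoning_details (see Response Format).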
| Bifrost | Anthropic | Notes |
|---|---|---|
| reasoning.effort | thinking.type | Always mapped to "enabled" |
| reasoning.max_tokens | thinking.budget_tokens | Token budget for reasoning |
| Input Value | Converted To |
|---|---|
| -1 (dynamic) | 1024 (minimum default) |
| < 1024 | Error |
| >= 1024 | Pass-through |
core/providers/anthropic/chat.go:104-134
Bedrock (Anthropic Models)
Bedrock uses the same structure as Anthropic for Claude models.
- Request (JSON)
- Request (Go)
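Because the structure matches Anthropic, a sketch of the converted reasoning fragment for a Bedrock Claude model looks the same (the budget value is illustrative, and the fragment may be nested inside Bedrock-specific request fields):

```json
{ "thinking": { "type": "enabled", "budget_tokens": 1536 } }
```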
The same 1024 minimum token budget constraint applies to Bedrock Anthropic models. Attempts to set max_tokens below 1024 will result in an error.
core/providers/bedrock/utils.go:34-47
Bedrock (Nova Models)
Bedrock Nova models use an effort-based approach similar to OpenAI; a request sketch follows the notes below.
- Request Conversion (JSON)
- Request Conversion (Go)
- Effort Levels
- No minimum token budget constraint
- Uses effort levels instead of token budgets
- High effort mode automatically clears conflicting parameters
core/providers/bedrock/utils.go:48-89
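A sketch of the Bifrost-side request for a Nova model (the model identifier and message are placeholders); only the effort level is used, and a max_tokens value would be estimated into an effort level (see Example 2 below) rather than sent as a budget:

```json
{
  "model": "bedrock/amazon.nova-pro-v1:0",
  "messages": [
    { "role": "user", "content": "Plan a three-course dinner menu." }
  ],
  "reasoning": { "effort": "medium" }
}
```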
Gemini
Gemini uses thinking_config with effort-based configuration.
- Request Conversion (JSON)
- Request Conversion (Go)
- Response Conversion (JSON)
- Response Conversion (Go)
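As a rough sketch of the conversion (the exact field names inside thinking_config, such as thinking_budget, are assumptions and may differ in Bifrost's actual Gemini payload; the budget value is illustrative), an effort level is translated into a token budget:

```json
{ "reasoning": { "effort": "medium" } }
```

is sent to Gemini roughly as:

```json
{ "thinking_config": { "thinking_budget": 2048 } }
```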
| Bifrost Effort | Gemini Mode |
|---|---|
| Not set | off |
| low | Uses budget |
| medium | Uses budget |
| high | Uses budget |
core/providers/gemini/chat.go
Two Reasoning Methods: Effort vs. Max Tokens
Bifrost supports two distinct reasoning models across different providers:
Reasoning Model Types
| Model | Providers | Request Field | Native Format |
|---|---|---|---|
| Effort-Based | OpenAI, AWS Bedrock Nova | reasoning.effort | reasoning_effort (Chat) / effort (Responses) |
| Max-Tokens-Based | Anthropic, Cohere, Gemini | reasoning.max_tokens | thinking.budget_tokens |
Priority Logic: Native vs. Estimated
When both effort and max_tokens are present in a request, Bifrost prioritizes the field that is native to the target provider:
For Max-Tokens-Based Providers (Anthropic, Cohere, Gemini)
Bifrost uses max_tokens (e.g., max_tokens: 2000) directly and ignores effort.
For Effort-Based Providers (OpenAI, AWS Bedrock Nova)
Bifrost uses effort (e.g., effort: "high") directly and strips max_tokens from the JSON.
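For example, a request carrying both fields (values taken from the cases above):

```json
{ "reasoning": { "effort": "high", "max_tokens": 2000 } }
```

An Anthropic-, Cohere-, or Gemini-bound request uses the 2000-token budget and ignores effort, while an OpenAI- or Nova-bound request uses effort: "high" and drops max_tokens.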
Why Priority Matters
- Accuracy: Native fields provide direct control without estimation loss
- Consistency: Using native fields ensures the exact user intent is preserved
- Performance: Avoids unnecessary conversions when the native field is already provided
Estimator Functions
Bifrost provides two estimator functions to convert between reasoning methods. These are used when the native field is not available.
Function 1: Effort → Max Tokens
Function: GetBudgetTokensFromReasoningEffort()
File: core/providers/utils/utils.go:1350-1387
Signature:
With minBudgetTokens=1024 and maxTokens=4096, effort levels map to token budgets as follows:
| Effort | Ratio | Calculation | Result |
|---|---|---|---|
| minimal | 2.5% | 1024 + 0.025 × 3072 | 1101 → 1024* |
| low | 15% | 1024 + 0.15 × 3072 | 1485 |
| medium | 42.5% | 1024 + 0.425 × 3072 | 2330 |
| high | 80% | 1024 + 0.80 × 3072 | 3482 |
*When the result is below the minimum, it is clamped to minBudgetTokens (1024 for Anthropic)
Function 2: Max Tokens → Effort
Function: GetReasoningEffortFromBudgetTokens()
File: core/providers/utils/utils.go:1308-1345
Signature:
With minBudgetTokens=1024 and maxTokens=4096, token budgets map to effort levels as follows:
| Budget Tokens | Ratio | Effort |
|---|---|---|
| 1024 | 0% | low |
| 1101 | 2.5% | low |
| 1500 | 15.6% | low |
| 1900 | 28.6% | medium |
| 2500 | 48.1% | medium |
| 3000 | 64.5% | high |
| 3400 | 77.6% | high |
Provider-Specific Constants
Different providers have different constraints on reasoning budget:
Min Budget Constants
| Provider | File | MinBudgetTokens | Reason |
|---|---|---|---|
| Anthropic | core/providers/anthropic/types.go | 1024 | Anthropic API requirement |
| Bedrock Anthropic | core/providers/bedrock/types.go | 1024 | Same as Anthropic |
| Bedrock Nova | core/providers/bedrock/types.go | 1 | More flexible |
| Cohere | core/providers/cohere/types.go | 1 | Flexible |
| Gemini | core/providers/gemini/types.go | 1 | Flexible |
Default Completion Tokens (for ratio calculation)
When max_completion_tokens is not provided, these defaults are used for ratio calculations:
| Provider | Default | File |
|---|---|---|
| All providers | 4096 | core/providers/*/types.go |
Effort-to-Token Conversion Examples
Example 1: Estimate tokens from effort (Anthropic)
- JSON
- Go SDK
Input:
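A sketch of the request (model and message are placeholders; max_completion_tokens supplies the 2000 value used in the calculation):

```json
{
  "model": "anthropic/claude-sonnet-4",
  "messages": [
    { "role": "user", "content": "Summarize the trade-offs of microservices." }
  ],
  "max_completion_tokens": 2000,
  "reasoning": { "effort": "high" }
}
```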
Conversion Process:
- effort = "high" → ratio = 0.80
- minBudgetTokens = 1024 (Anthropic)
- maxCompletionTokens = 2000
- budget = 1024 + (0.80 × (2000 - 1024))
- budget = 1024 + (0.80 × 976)
- budget = 1024 + 780
- Result: 1804 tokens
Example 2: Estimate effort from tokens (Bedrock Nova)
- JSON
- Go SDK
Input:
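A sketch of the request (model and message are placeholders; no max_completion_tokens is set, so the 4096 default applies):

```json
{
  "model": "bedrock/amazon.nova-pro-v1:0",
  "messages": [
    { "role": "user", "content": "Draft a project kickoff agenda." }
  ],
  "reasoning": { "max_tokens": 2000 }
}
```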
Conversion Process:
- budgetTokens = 2000
- minBudgetTokens = 1 (Nova)
- maxCompletionTokens = 4096
- ratio = (2000 - 1) / (4096 - 1)
- ratio = 1999 / 4095
- ratio = 0.488 (48.8%)
- Since 0.25 < 0.488 ≤ 0.60 → Result: “medium”
Example 3: Both fields provided (priority used)
- JSON
- Go SDK
Input:
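A sketch of the request to a max-tokens-based provider (model and message are placeholders), with both fields present:

```json
{
  "model": "anthropic/claude-sonnet-4",
  "messages": [
    { "role": "user", "content": "Compare quicksort and mergesort." }
  ],
  "reasoning": { "effort": "medium", "max_tokens": 2500 }
}
```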
Logic for Max-Tokens-Based Provider:
- Check: Is max_tokens provided? → YES
- Use max_tokens directly (ignore effort)
- Validate: 2500 >= 1024? → YES
Note: The effort: "medium" is completely ignored because max_tokens takes priority.
Response Format
Bifrost Standard Response
All providers return reasoning in a normalized reasoning_details array:
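A sketch of the normalized fragment (the reasoning text and signature values are placeholders, and which fields appear depends on the provider and API, per the tables below):

```json
{
  "reasoning_details": [
    {
      "index": 0,
      "type": "reasoning.text",
      "text": "First, break the problem into smaller steps...",
      "signature": "EqQBCk..."
    }
  ]
}
```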
Reasoning Details Fields
| Field | Type | Description | Present In |
|---|---|---|---|
| index | int | Position in reasoning sequence | All |
| type | string | Content type (text, encrypted, summary) | All |
| text | string | Reasoning content | Chat Completions |
| summary | string | Reasoning summary | Responses API |
| signature | string | Cryptographic signature for verification | Anthropic, Bedrock |
Type Mappings
| Reasoning Type | When Used | Source |
|---|---|---|
| reasoning.text | Direct thinking/reasoning content | Anthropic, Gemini, Bedrock |
| reasoning.encrypted | Signature-verified reasoning | Anthropic, Bedrock Nova |
| reasoning.summary | Summarized reasoning (Responses API) | All providers |
OpenAI Implementation: OpenAI (both Chat Completions and Responses API) is effort-based, following the standard priority logic: if effort is provided, it’s used directly; if only max_tokens is provided, effort is estimated from it. The max_tokens field is then cleared before JSON serialization via MarshalJSON (core/providers/openai/types.go:383-453), since OpenAI’s APIs don’t accept it.
Streaming
Stream Event Types
| Provider | Reasoning Event | Signature Event |
|---|---|---|
| OpenAI | reasoning (top-level) | N/A |
| Anthropic | thinking_delta | signature_delta |
| Bedrock | thinking_delta | signature_delta |
| Gemini | thought (in content) | thought_signature |
Anthropic Streaming Example
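A sketch of the raw Anthropic events that carry reasoning during streaming (shapes follow Anthropic's published streaming format; values are placeholders): a thinking_delta carries incremental reasoning text, and a signature_delta carries the verification signature.

```json
{
  "type": "content_block_delta",
  "index": 0,
  "delta": { "type": "thinking_delta", "thinking": "Let me work through this step by step..." }
}
```

```json
{
  "type": "content_block_delta",
  "index": 0,
  "delta": { "type": "signature_delta", "signature": "EqQBCk..." }
}
```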
Bifrost Stream Response
Caveats Summary
Minimum Budget (Anthropic/Bedrock)
Severity: High
Behavior: reasoning.max_tokens must be >= 1024
Impact: Requests with lower values fail with an error
Workaround: Always set max_tokens >= 1024 for Anthropic/Bedrock
Dynamic Budget Not Supported
Severity: Medium
Behavior: reasoning.max_tokens = -1 is converted to 1024
Impact: Dynamic budgeting is not available on Anthropic/Bedrock
Workaround: Set an explicit token budget
Effort Level Normalization
Severity: Low
Behavior: OpenAI’s minimal is converted to low when routing to other providers
Impact: Slightly different reasoning behavior
Signature Field Provider-Specific
Severity: Low
Behavior: The signature field is only present in Anthropic/Bedrock responses
Impact: Signature-based verification is only available for these providers
Thinking Type Always Enabled
Severity: Low
Behavior: Anthropic’s thinking.type is always set to "enabled" regardless of effort
Impact: Cannot disable thinking once the reasoning parameter is present
Complete Provider Comparison
Reasoning Model
| Provider | Model Type | Budget Type | Min Budget | Signature Support |
|---|---|---|---|---|
| OpenAI | Effort-based | Effort-based | None | ❌ |
| Anthropic | Thinking blocks | Token budget | 1024 | ✅ |
| Bedrock (Anthropic) | Reasoning config | Token budget | 1024 | ✅ |
| Bedrock (Nova) | Reasoning config | Effort-based | None | ❌ |
| Gemini | Thinking config | Token-based | None | ✅ |
Parameter Support
| Provider | effort | max_tokens | summary | Streaming |
|---|---|---|---|---|
| OpenAI | ✅ (4 levels) | ✅ | ❌ | ✅ |
| Anthropic | ❌ (binary) | ✅ | ✅ | ✅ |
| Bedrock (Anthropic) | ❌ (binary) | ✅ | ✅ | ✅ |
| Bedrock (Nova) | ✅ (3 levels) | ⚠️ (ignored) | ❌ | ✅ |
| Gemini | ✅ (implicit) | ✅ | ❌ | ✅ |
Troubleshooting
Anthropic: “reasoning.max_tokens must be >= 1024”
Cause: Attempting to use reasoning with max_tokens < 1024
Solution: Ensure reasoning.max_tokens >= 1024 for Anthropic/Bedrock Anthropic models
OpenAI: Model doesn’t support reasoning
Cause: Using an older model that doesn’t support reasoning (e.g., gpt-4-turbo)
Solution: Use models with reasoning support: gpt-4o, gpt-4o-mini (o1 series with native reasoning)
Bedrock Nova: max_tokens parameter being ignored
Expected Behavior: Bedrock Nova uses effort-based reasoning only
Solution: Provide effort parameter instead of max_tokens for Nova models

