Overview
Reasoning (also called “thinking” in some providers) allows AI models to show their step-by-step thought process before providing a final answer. This feature is available across multiple providers with different implementations. Bifrost normalizes all provider-specific reasoning formats to a consistent OpenAI-compatible structure using reasoning in requests and reasoning_details in responses.
Provider Support Matrix
| Provider | Request Field | Response Field | Min Budget | Effort Levels | Streaming |
|---|---|---|---|---|---|
| OpenAI | reasoning | reasoning_details | None | minimal, low, medium, high | ✅ |
| Anthropic | thinking | Content blocks | 1024 tokens | enabled only | ✅ |
| Bedrock (Anthropic) | thinking | Content blocks | 1024 tokens | enabled only | ✅ |
| Gemini 2.5+ | thinking_config | thought parts | 1024 tokens | Budget-only | ✅ |
| Gemini 3.0+ | thinking_config | thought parts | 1024 tokens | minimal, low, medium, high + Budget | ✅ |
Request Configuration
Chat Completions API
- JSON
- Go SDK
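A minimal sketch of what such a Chat Completions request body can look like. The reasoning fields come from the parameter reference below; the model name, message, and surrounding OpenAI-compatible envelope are illustrative assumptions:

```json
{
  "model": "anthropic/claude-sonnet-4",
  "messages": [
    { "role": "user", "content": "How many prime numbers are there below 100?" }
  ],
  "reasoning": {
    "effort": "medium",
    "max_tokens": 2048
  }
}
```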
Responses API
- JSON
- Go SDK
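A comparable sketch for the Responses API, assuming the usual OpenAI-compatible input field; the summary value comes from the parameter table below and the model name is a placeholder:

```json
{
  "model": "openai/gpt-5",
  "input": "Summarize the trade-offs between the two caching strategies.",
  "reasoning": {
    "effort": "high",
    "summary": "detailed"
  }
}
```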
The Responses API supports both effort and max_tokens (like Chat Completions) and adds an optional summary parameter for output summarization.
Parameter Reference
Chat Completions API Parameters
| Parameter | Type | Description |
|---|---|---|
| effort | string | Reasoning intensity level |
| max_tokens | int | Maximum tokens for reasoning (budget) |
Responses API Parameters
| Parameter | Type | Description |
|---|---|---|
| effort | string | Reasoning intensity level |
| max_tokens | int | Maximum tokens for reasoning (budget) |
| summary | string | Summary level: brief, detailed, or json |
The Responses API accepts the same effort and max_tokens parameters as Chat Completions, but adds an optional summary parameter for reasoning output summarization.
Provider-Specific Conversions
OpenAI
OpenAI uses effort-based reasoning only. Bifrost applies the following priority logic:
- If reasoning.effort is provided → use it directly
- Else if reasoning.max_tokens is provided → estimate effort from it
- The max_tokens field is cleared before sending to OpenAI
- Effort (JSON)
- Effort (Go)
- Max Tokens (JSON)
- Max Tokens (Go)
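As a rough sketch of the effort path, a Bifrost reasoning fragment like this (values illustrative):

```json
{ "reasoning": { "effort": "high", "max_tokens": 3000 } }
```

would reach OpenAI's Chat Completions API carrying only the native effort field, since max_tokens is cleared before serialization:

```json
{ "reasoning_effort": "high" }
```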
Supported effort levels: minimal, low, medium, high.
When minimal is encountered, it’s converted to low for non-OpenAI providers. OpenAI receives only: low, medium, high.
Anthropic
Anthropic uses a thinking parameter with a different structure.
- Request Conversion (JSON)
- Request Conversion (Go)
- Response Conversion (JSON)
- Response Conversion (Go)
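As a sketch of the mapping in the tables below, a Bifrost reasoning fragment such as (value illustrative):

```json
{ "reasoning": { "max_tokens": 2048 } }
```

is converted to Anthropic's thinking parameter roughly as:

```json
{ "thinking": { "type": "enabled", "budget_tokens": 2048 } }
```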
| Bifrost | Anthropic | Notes |
|---|---|---|
| reasoning.effort | thinking.type | Always mapped to "enabled" |
| reasoning.max_tokens | thinking.budget_tokens | Token budget for reasoning |

| Input Value | Converted To |
|---|---|
| -1 (dynamic) | 1024 (minimum default) |
| < 1024 | Error |
| >= 1024 | Pass-through |
core/providers/anthropic/chat.go:104-134
Bedrock (Anthropic Models)
Bedrock uses the same structure as Anthropic for Claude models.
- Request (JSON)
- Request (Go)
The same 1024 minimum token budget constraint applies to Bedrock Anthropic models. Attempts to set max_tokens below 1024 will result in an error.
core/providers/bedrock/utils.go:34-47
Bedrock (Nova Models)
Bedrock Nova models use an effort-based approach similar to OpenAI.
- Request Conversion (JSON)
- Request Conversion (Go)
- Effort Levels
- No minimum token budget constraint
- Uses effort levels instead of token budgets
- High effort mode automatically clears conflicting parameters
core/providers/bedrock/utils.go:48-89
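For illustration, a Nova request carries only an effort level; per the estimation rules later in this page, a max_tokens-only request would instead be converted into an effort level. The model ID and message below are placeholders:

```json
{
  "model": "bedrock/amazon.nova-pro-v1:0",
  "messages": [
    { "role": "user", "content": "Plan a three-step rollout." }
  ],
  "reasoning": { "effort": "medium" }
}
```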
Gemini
Gemini uses thinking_config with dual support for both token budgets and effort levels, depending on the model version.
Model Version Support
| Gemini Version | thinkingBudget | thinkingLevel | Notes |
|---|---|---|---|
| 2.5+ | ✅ | ❌ | Budget-only models |
| 3.0+ | ✅ | ✅ | Support both budget and level |
Priority Rules
When both reasoning.max_tokens and reasoning.effort are present:
- Budget Priority (JSON)
- Effort to Level (Gemini 3.0+)
- Effort to Budget (Gemini 2.5)
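A sketch of the priority rule: when a request fragment like the first object below targets Gemini, only the budget is forwarded (second object) and the effort value is dropped. Field names follow this page's thinking_config naming; values are illustrative:

```json
{ "reasoning": { "effort": "high", "max_tokens": 4096 } }
```

```json
{ "thinking_config": { "thinking_budget": 4096 } }
```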
Model-Specific Level Conversions
Gemini Pro models have stricter constraints on thinking levels:
| Bifrost Effort | Non-Pro Models | Pro Models | Notes |
|---|---|---|---|
| "none" | Empty string | Empty string | Disables thinking |
| "minimal" | "minimal" | "low" | Pro doesn’t support minimal |
| "low" | "low" | "low" | Supported on all |
| "medium" | "medium" | "high" | Pro doesn’t support medium |
| "high" | "high" | "high" | Supported on all |
Special Values
| Value | Field | Behavior | Use Case |
|---|---|---|---|
| 0 | max_tokens | thinking_budget: 0, include_thoughts: false | Explicitly disable reasoning |
| -1 | max_tokens | thinking_budget: -1 | Dynamic budget (Gemini decides) |
| "none" | effort | thinking_budget: 0, include_thoughts: false | Disable reasoning |
- Dynamic Budget (JSON)
- Disable Reasoning (JSON)
- Go SDK Examples
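A sketch of the special values, shown as illustrative before/after pairs (the wrapper keys are labels for this example only, not real payload fields):

```json
{
  "dynamic_budget": {
    "bifrost": { "reasoning": { "max_tokens": -1 } },
    "gemini": { "thinking_config": { "thinking_budget": -1 } }
  },
  "disable_reasoning": {
    "bifrost": { "reasoning": { "effort": "none" } },
    "gemini": { "thinking_config": { "thinking_budget": 0, "include_thoughts": false } }
  }
}
```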
Response Conversion
- Response (JSON)
- Response (Go)
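As a sketch of the response direction, a Gemini candidate part flagged with thought: true (first fragment, values illustrative) is surfaced in Bifrost's normalized reasoning_details array (second fragment; field names from the response format section below):

```json
{ "parts": [ { "thought": true, "text": "Compare both options before answering..." } ] }
```

```json
{ "reasoning_details": [ { "index": 0, "type": "reasoning.text", "text": "Compare both options before answering..." } ] }
```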
Conversion Summary
Bifrost → Gemini (Request):
| Input | Gemini 2.5 | Gemini 3.0+ | Note |
|---|---|---|---|
| max_tokens: 4096 | thinking_budget: 4096 | thinking_budget: 4096 | Direct pass-through |
| max_tokens: -1 | thinking_budget: -1 | thinking_budget: -1 | Dynamic budget |
| max_tokens: 0 | thinking_budget: 0 | thinking_budget: 0 | Disabled |
| effort: "high" only | thinking_budget: 3482* | thinking_level: "high" | Estimated or native |
| effort: "medium" only | thinking_budget: 2330* | thinking_level: "medium" or "high"** | Estimated or native |
| Both effort + max_tokens | Uses max_tokens | Uses max_tokens | Priority rule |
* Estimated with the conversion formula, using max_completion_tokens: 8192 (default).
** Pro models convert "medium" to "high".
Gemini → Bifrost (Response):
| Gemini Field | Bifrost Field | Conversion |
|---|---|---|
| thinking_budget | reasoning.max_tokens | Direct mapping |
| thinking_level | reasoning.effort | Level → effort mapping |
| thought: true parts | reasoning_details[] | Array of reasoning blocks |
- core/providers/gemini/utils.go (Chat Completions)
- core/providers/gemini/responses.go (Responses API)
- core/providers/gemini/types.go (Constants)
Two Reasoning Methods: Effort vs. Max Tokens
Bifrost supports two distinct reasoning models across different providers:
Reasoning Model Types
| Model | Providers | Request Field | Native Format |
|---|---|---|---|
| Effort-Based | OpenAI, AWS Bedrock Nova | reasoning.effort | reasoning_effort (Chat) / effort (Responses) |
| Max-Tokens-Based | Anthropic, Cohere, Gemini | reasoning.max_tokens | thinking.budget_tokens |
Priority Logic: Native vs. Estimated
When both effort and max_tokens are present in a request, Bifrost prioritizes the natively compatible field for the target provider:
For Max-Tokens-Based Providers (Anthropic, Cohere, Gemini)
Bifrost uses max_tokens (e.g., 2000) directly and ignores effort.
For Effort-Based Providers (OpenAI, AWS Bedrock Nova)
Bifrost uses effort (e.g., "high") directly and strips max_tokens from the JSON.
Why Priority Matters
Reason 1: Accuracy - Native fields provide direct control without estimation loss
Reason 2: Consistency - Using native fields ensures the exact user intent is preserved
Reason 3: Performance - Avoids unnecessary conversions when the native field is already provided
Estimator Functions
Bifrost provides two estimator functions to convert between reasoning methods. These are used when the native field is not available.
Function 1: Effort → Max Tokens
Function: GetBudgetTokensFromReasoningEffort()
File: core/providers/utils/utils.go:1350-1387
Example conversions (with minBudgetTokens=1024, maxTokens=4096):
| Effort | Ratio | Calculation | Result |
|---|---|---|---|
| minimal | 2.5% | 1024 + 0.025 × 3072 | 1101 → 1024* |
| low | 15% | 1024 + 0.15 × 3072 | 1485 |
| medium | 42.5% | 1024 + 0.425 × 3072 | 2330 |
| high | 80% | 1024 + 0.80 × 3072 | 3482 |
*When the result is below the minimum, it is clamped to minBudgetTokens (1024 for Anthropic)
Function 2: Max Tokens → Effort
Function: GetReasoningEffortFromBudgetTokens()
File: core/providers/utils/utils.go:1308-1345
Example conversions (with minBudgetTokens=1024, maxTokens=4096):
| Budget Tokens | Ratio | Effort |
|---|---|---|
| 1024 | 0% | low |
| 1101 | 2.5% | low |
| 1500 | 15.6% | low |
| 1900 | 28.6% | medium |
| 2500 | 48.1% | medium |
| 3000 | 64.5% | high |
| 3400 | 77.6% | high |
Provider-Specific Constants
Different providers have different constraints on reasoning budget:
Min Budget Constants
| Provider | File | MinBudgetTokens | Reason |
|---|---|---|---|
| Anthropic | core/providers/anthropic/types.go | 1024 | Anthropic API requirement |
| Bedrock Anthropic | core/providers/bedrock/types.go | 1024 | Same as Anthropic |
| Bedrock Nova | core/providers/bedrock/types.go | 1 | More flexible |
| Cohere | core/providers/cohere/types.go | 1 | Flexible |
| Gemini | core/providers/gemini/types.go | 1024 | Default minimum for conversions |
Default Completion Tokens (for ratio calculation)
When max_completion_tokens is not provided, these defaults are used for ratio calculations:
| Provider | Default | File |
|---|---|---|
| OpenAI, Anthropic, Cohere, Bedrock | 4096 | core/providers/*/types.go |
| Gemini | 8192 | core/providers/gemini/types.go |
Effort-to-Token Conversion Examples
Example 1: Estimate tokens from effort (Anthropic)
- JSON
- Go SDK
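A plausible input for this example, reconstructed from the conversion steps below (the model name is a placeholder):

```json
{
  "model": "anthropic/claude-sonnet-4",
  "max_completion_tokens": 2000,
  "reasoning": { "effort": "high" }
}
```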
Input: a request with effort: "high" and max_completion_tokens: 2000 targeting an Anthropic model (see the sketch above).
Conversion Process:
- effort = "high" → ratio = 0.80
- minBudgetTokens = 1024 (Anthropic)
- maxCompletionTokens = 2000
- budget = 1024 + (0.80 × (2000 - 1024)) = 1024 + (0.80 × 976) = 1024 + 780
- Result: 1804 tokens
Example 2: Estimate effort from tokens (Bedrock Nova)
- JSON
- Go SDK
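A plausible input for this example, reconstructed from the conversion steps below; no max_completion_tokens is set, so the 4096 default applies (the model ID is a placeholder):

```json
{
  "model": "bedrock/amazon.nova-pro-v1:0",
  "reasoning": { "max_tokens": 2000 }
}
```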
Input: a request with max_tokens: 2000 and no explicit max_completion_tokens, targeting a Bedrock Nova model (see the sketch above).
Conversion Process:
- budgetTokens = 2000
- minBudgetTokens = 1 (Nova)
- maxCompletionTokens = 4096 (default)
- ratio = (2000 - 1) / (4096 - 1) = 1999 / 4095 = 0.488 (48.8%)
- Since 0.25 < 0.488 ≤ 0.60 → Result: "medium"
Example 3: Both fields provided (priority used)
- JSON
- Go SDK
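A plausible input for this example, reconstructed from the steps below (the model name is a placeholder):

```json
{
  "model": "anthropic/claude-sonnet-4",
  "reasoning": { "effort": "medium", "max_tokens": 2500 }
}
```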
Input: a request with both effort: "medium" and max_tokens: 2500 targeting a max-tokens-based provider (see the sketch above).
Logic for Max-Tokens-Based Provider:
- Check: Is max_tokens provided? → YES
- Use max_tokens directly (ignore effort)
- Validate: 2500 >= 1024? → YES
Note: The effort: "medium" is completely ignored because max_tokens takes priority.
Response Format
Bifrost Standard Response
All providers return reasoning in a normalized reasoning_details array:
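A minimal sketch of the array itself (the enclosing response envelope is omitted here); field names come from the table below, and the text and signature values are illustrative:

```json
{
  "reasoning_details": [
    {
      "index": 0,
      "type": "reasoning.text",
      "text": "First, enumerate the primes below 100, then count them.",
      "signature": "EqQBCkgIARABGAIiQ..."
    }
  ]
}
```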
Reasoning Details Fields
| Field | Type | Description | Present In |
|---|---|---|---|
| index | int | Position in reasoning sequence | All |
| type | string | Content type (text, encrypted, summary) | All |
| text | string | Reasoning content | Chat Completions |
| summary | string | Reasoning summary | Responses API |
| signature | string | Cryptographic signature for verification | Anthropic, Bedrock |
Type Mappings
| Reasoning Type | When Used | Source |
|---|---|---|
| reasoning.text | Direct thinking/reasoning content | Anthropic, Gemini, Bedrock |
| reasoning.encrypted | Signature-verified reasoning | Anthropic, Bedrock Nova |
| reasoning.summary | Summarized reasoning (Responses API) | All providers |
OpenAI Implementation: OpenAI (both Chat Completions and Responses API) is effort-based, following the standard priority logic: if effort is provided, it’s used directly; if only max_tokens is provided, effort is estimated from it. The max_tokens field is then cleared before JSON serialization via MarshalJSON (core/providers/openai/types.go:383-453), since OpenAI’s APIs don’t accept it.
Streaming
Stream Event Types
| Provider | Reasoning Event | Signature Event |
|---|---|---|
| OpenAI | reasoning (top-level) | N/A |
| Anthropic | thinking_delta | signature_delta |
| Bedrock | thinking_delta | signature_delta |
| Gemini | thought (in content) | thought_signature |
Anthropic Streaming Example
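As a sketch, the JSON data payloads of two consecutive Anthropic content_block_delta events, one carrying a thinking_delta and one a signature_delta (SSE framing omitted, values illustrative):

```json
{ "type": "content_block_delta", "index": 0, "delta": { "type": "thinking_delta", "thinking": "Let me break the problem into steps..." } }
{ "type": "content_block_delta", "index": 0, "delta": { "type": "signature_delta", "signature": "EqQBCkgIARABGAIiQ..." } }
```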
Bifrost Stream Response
Caveats Summary
Minimum Budget (Anthropic/Bedrock)
Severity: High
Behavior: reasoning.max_tokens must be >= 1024
Impact: Requests with lower values fail with an error
Workaround: Always set max_tokens >= 1024 for Anthropic/Bedrock
Dynamic Budget Not Supported
Severity: Medium
Behavior: reasoning.max_tokens = -1 is converted to 1024
Impact: Dynamic budgeting not available on Anthropic/Bedrock
Workaround: Set an explicit token budget
Effort Level Normalization
Severity: Low
Behavior: OpenAI’s minimal is converted to low when routing to other providers
Impact: Slightly different reasoning behavior
Signature Field Provider-Specific
Severity: Low
Behavior: signature field only present in Anthropic/Bedrock responses
Impact: Signature-based verification only available for these providers
Thinking Type Always Enabled
Severity: Low
Behavior: Anthropic’s thinking.type is always set to "enabled" regardless of effort
Impact: Cannot disable thinking once the reasoning param is present
Gemini: Only One Parameter Sent
Severity: Medium
Behavior: When both effort and max_tokens are provided, only thinkingBudget is sent to Gemini (effort is dropped)
Impact: Effort value is completely ignored when max_tokens is present
Workaround: Provide only the parameter you want to use
Gemini: Model Version Differences
Severity: Medium
Behavior: Gemini 2.5 only supports thinkingBudget, while 3.0+ supports both thinkingBudget and thinkingLevel
Impact: Effort-only requests on 2.5 are converted to budget; on 3.0+ they use native levels
Note: Bifrost automatically detects the model version and uses the appropriate conversion
Gemini Pro: Limited Level Support
Severity: Low
Behavior: Pro models only support “low” and “high” thinking levels
Impact: "minimal" → "low", "medium" → "high" for Pro models
Note: Non-Pro models support all four levels: minimal, low, medium, high
Complete Provider Comparison
Reasoning Model
| Provider | Model Type | Budget Type | Min Budget | Signature Support |
|---|---|---|---|---|
| OpenAI | Effort-based | Effort-based | None | ❌ |
| Anthropic | Thinking blocks | Token budget | 1024 | ✅ |
| Bedrock (Anthropic) | Reasoning config | Token budget | 1024 | ✅ |
| Bedrock (Nova) | Reasoning config | Effort-based | None | ❌ |
| Gemini 2.5+ | Thinking config | Token budget | 1024 | ✅ |
| Gemini 3.0+ | Thinking config | Dual (budget + level) | 1024 | ✅ |
Parameter Support
| Provider | effort | max_tokens | summary | Streaming |
|---|---|---|---|---|
| OpenAI | ✅ (4 levels) | ✅ | ❌ | ✅ |
| Anthropic | ❌ (binary) | ✅ | ✅ | ✅ |
| Bedrock (Anthropic) | ❌ (binary) | ✅ | ✅ | ✅ |
| Bedrock (Nova) | ✅ (3 levels) | ⚠️ (ignored) | ❌ | ✅ |
| Gemini 2.5+ | ⚠️ (converts to budget) | ✅ | ❌ | ✅ |
| Gemini 3.0+ | ✅ (4 levels) | ✅ | ❌ | ✅ |
Troubleshooting
Anthropic: “reasoning.max_tokens must be >= 1024”
Cause: Attempting to use reasoning with max_tokens < 1024
Solution: Ensure reasoning.max_tokens >= 1024 for Anthropic/Bedrock Anthropic models
OpenAI: Model doesn’t support reasoning
Cause: Using an older model that doesn’t support reasoning (e.g., gpt-4-turbo)
Solution: Use models with reasoning support: gpt-4o, gpt-4o-mini (o1 series with native reasoning)
Bedrock Nova: max_tokens parameter being ignored
Expected Behavior: Bedrock Nova uses effort-based reasoning only
Solution: Provide effort parameter instead of max_tokens for Nova models

