When an LLM returns tool calls in its response, Bifrost does not automatically execute them. Instead, your application explicitly calls the tool execution API, giving you full control over:
Which tool calls to execute
User approval workflows
Security validation
Audit logging
The basic flow is: Chat Request → Review Tool Calls → Execute Tools → Continue Conversation. For detailed architecture diagrams, see the MCP Architecture documentation.
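The review step in this flow can be sketched as a small gate that sits between the chat response and tool execution. This is a minimal illustration rather than Bifrost's own API: ToolCall, Message, Executor, ReviewAndExecute, and demoExecutor are hypothetical stand-ins for the SDK types and the execution call, and answering a denied call with a tool message is just one possible policy.

```go
package main

import "fmt"

// ToolCall and Message are simplified, hypothetical stand-ins for the SDK's schema types.
type ToolCall struct {
	ID        string
	Name      string
	Arguments string // JSON-encoded arguments as returned by the model
}

type Message struct {
	Role       string
	ToolCallID string
	Content    string
}

// Executor is a hypothetical abstraction over the tool execution call.
type Executor func(tc ToolCall) (Message, error)

// ReviewAndExecute runs only allowlisted tool calls, writes one audit entry
// per decision, and returns one tool message per requested call so the
// conversation can continue.
func ReviewAndExecute(calls []ToolCall, allowed map[string]bool, exec Executor, audit func(entry string)) []Message {
	results := make([]Message, 0, len(calls))
	for _, tc := range calls {
		if !allowed[tc.Name] {
			audit(fmt.Sprintf("denied %s (%s)", tc.Name, tc.ID))
			// Assumption: the denial is reported back to the model as a tool message.
			results = append(results, Message{Role: "tool", ToolCallID: tc.ID, Content: "tool call denied by policy"})
			continue
		}
		msg, err := exec(tc)
		if err != nil {
			audit(fmt.Sprintf("failed %s (%s): %v", tc.Name, tc.ID, err))
			results = append(results, Message{Role: "tool", ToolCallID: tc.ID, Content: "tool execution failed: " + err.Error()})
			continue
		}
		audit(fmt.Sprintf("executed %s (%s)", tc.Name, tc.ID))
		results = append(results, msg)
	}
	return results
}

// demoExecutor is a fake executor used only for illustration.
func demoExecutor(tc ToolCall) (Message, error) {
	return Message{Role: "tool", ToolCallID: tc.ID, Content: "ok:" + tc.Name}, nil
}

func main() {
	calls := []ToolCall{
		{ID: "call_1", Name: "read_file", Arguments: `{"path":"/tmp/a"}`},
		{ID: "call_2", Name: "delete_file", Arguments: `{"path":"/tmp/a"}`},
	}
	allowed := map[string]bool{"read_file": true}
	history := ReviewAndExecute(calls, allowed, demoExecutor, func(e string) { fmt.Println("audit:", e) })
	for _, m := range history {
		fmt.Printf("%s -> %s\n", m.ToolCallID, m.Content)
	}
}
```

In a real application the Executor would wrap the actual tool execution call, and the allowlist could be replaced by an interactive user approval prompt.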
The /v1/mcp/tool/execute endpoint uses the same authentication as other inference endpoints like /v1/chat/completions:
| Auth Configuration | Behavior |
|---|---|
| disable_auth_on_inference: true | No auth required |
| disable_auth_on_inference: false | Auth required |
Virtual keys and authentication are independent layers that work together. For details on how to use virtual keys with authentication, see Authentication and Virtual Keys.
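As a rough sketch of calling the endpoint over HTTP, the helper below builds a POST request to /v1/mcp/tool/execute and attaches an Authorization header only when one is supplied, which corresponds to the case where auth is required. Several details are assumptions rather than confirmed API: the request body shape (an OpenAI-style tool call object), the base URL http://localhost:8080, and the names ToolCallBody, ToolFunction, and NewToolExecuteRequest. The exact credential scheme depends on your auth configuration, so the caller passes the full header value.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// ToolFunction and ToolCallBody mirror an OpenAI-style tool call.
// Assumption: the endpoint accepts this shape as its request body.
type ToolFunction struct {
	Name      string `json:"name"`
	Arguments string `json:"arguments"` // JSON-encoded arguments string
}

type ToolCallBody struct {
	ID       string       `json:"id"`
	Type     string       `json:"type"`
	Function ToolFunction `json:"function"`
}

// NewToolExecuteRequest builds a POST request for the tool execution endpoint.
// authHeader is sent verbatim as the Authorization header; pass "" when
// auth is disabled on inference endpoints.
func NewToolExecuteRequest(baseURL, authHeader string, call ToolCallBody) (*http.Request, error) {
	payload, err := json.Marshal(call)
	if err != nil {
		return nil, fmt.Errorf("encode tool call: %w", err)
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/v1/mcp/tool/execute", bytes.NewReader(payload))
	if err != nil {
		return nil, fmt.Errorf("build request: %w", err)
	}
	req.Header.Set("Content-Type", "application/json")
	if authHeader != "" {
		req.Header.Set("Authorization", authHeader)
	}
	return req, nil
}

func main() {
	call := ToolCallBody{
		ID:       "call_1",
		Type:     "function",
		Function: ToolFunction{Name: "example_tool", Arguments: `{"path":"/tmp"}`},
	}
	// Assumed local gateway address; no credentials attached in this example.
	req, err := NewToolExecuteRequest("http://localhost:8080", "", call)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
}
```

The request is only constructed here, not sent; pass it to an http.Client to execute it against a running gateway.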
LLMs often request multiple tools in a single response. You can execute them sequentially or in parallel:
Sequential

for _, toolCall := range *response.Choices[0].Message.ToolCalls {
    result, err := client.ExecuteChatMCPTool(ctx, toolCall)
    if err != nil {
        // Handle error
        continue
    }
    history = append(history, *result)
}

Parallel

toolCalls := *response.Choices[0].Message.ToolCalls
results := make([]*schemas.ChatMessage, len(toolCalls))
var wg sync.WaitGroup
for i, toolCall := range toolCalls {
    wg.Add(1)
    go func(idx int, tc schemas.ChatAssistantMessageToolCall) {
        defer wg.Done()
        result, err := client.ExecuteChatMCPTool(ctx, tc)
        if err == nil {
            // Each goroutine writes only to its own index
            results[idx] = result
        }
    }(i, toolCall)
}
wg.Wait()
// Append successful results in the original tool call order
for _, result := range results {
    if result != nil {
        history = append(history, *result)
    }
}
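The parallel pattern discards errors from failed tool calls. One way to keep them, shown here as a sketch under stated assumptions, is to collect errors in a second slice indexed the same way as the results. ToolCall, Message, ExecuteAll, and demoExec are hypothetical stand-ins, not Bifrost SDK types; the exec parameter takes the place of the real execution call.

```go
package main

import (
	"fmt"
	"sync"
)

// ToolCall and Message are simplified, hypothetical stand-ins for the SDK's schema types.
type ToolCall struct{ ID, Name string }
type Message struct{ ToolCallID, Content string }

// ExecuteAll runs every call concurrently and returns results and errors in
// input order, so a failure at index i is visible as errs[i] instead of
// being dropped.
func ExecuteAll(calls []ToolCall, exec func(ToolCall) (*Message, error)) ([]*Message, []error) {
	results := make([]*Message, len(calls))
	errs := make([]error, len(calls))
	var wg sync.WaitGroup
	for i, tc := range calls {
		wg.Add(1)
		go func(idx int, tc ToolCall) {
			defer wg.Done()
			msg, err := exec(tc)
			if err != nil {
				// Each goroutine writes only to its own index, so no mutex is needed.
				errs[idx] = fmt.Errorf("tool %s (%s): %w", tc.Name, tc.ID, err)
				return
			}
			results[idx] = msg
		}(i, tc)
	}
	wg.Wait()
	return results, errs
}

// demoExec is a fake executor used only for illustration; it fails for the tool named "bad".
func demoExec(tc ToolCall) (*Message, error) {
	if tc.Name == "bad" {
		return nil, fmt.Errorf("unknown tool")
	}
	return &Message{ToolCallID: tc.ID, Content: "ok:" + tc.Name}, nil
}

func main() {
	calls := []ToolCall{{ID: "1", Name: "a"}, {ID: "2", Name: "bad"}, {ID: "3", Name: "c"}}
	results, errs := ExecuteAll(calls, demoExec)
	for i := range calls {
		if errs[i] != nil {
			fmt.Println("error:", errs[i])
			continue
		}
		fmt.Println("result:", results[i].Content)
	}
}
```

Because results and errors share the input order, the caller can decide per tool call whether to retry, report the failure to the model, or abort the turn.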
Tool execution responses are designed to be directly appended to your conversation history:
// Tool result is already in the correct format
toolResult, err := client.ExecuteChatMCPTool(ctx, toolCall)
if err != nil {
    // Handle the error before using toolResult
} else {
    // Just append it directly
    history = append(history, *toolResult)
}