> ## Documentation Index > Fetch the complete documentation index at: https://docs.getbifrost.ai/llms.txt > Use this file to discover all available pages before exploring further. # Bifrost AI Gateway > The fastest way to build AI applications that never go down. A high-performance AI gateway unifying 20+ providers through a single OpenAI-compatible API. Bifrost is a high-performance AI gateway that unifies access to 20+ providers OpenAI, Anthropic, AWS Bedrock, Google Vertex, Azure, and more, through a unified API. Deploy in seconds with zero configuration and get automatic failover, load balancing, semantic caching, and enterprise-grade governance. In sustained benchmarks at 5,000 requests per second, Bifrost adds only **11 µs** of overhead per request. Bifrost architecture diagram

## Get started Deploy the HTTP API gateway with a built-in web UI for visual configuration and real-time monitoring Integrate directly into your Go application for maximum performance and control *** ## Open source features Replace existing AI SDK connections by changing just the base URL. Keep your code, gain fallbacks and governance. Seamless failover between providers and models. When your primary provider fails, Bifrost switches to backups automatically. Intelligent API key distribution with weighted load balancing, model-specific filtering, and automatic failover. The primary governance entity. Control access permissions, budgets, rate limits, and routing per consumer. Direct requests to specific models, providers, and keys. Implement weighted strategies and automatic fallbacks. Hierarchical cost control with budgets and rate limits at virtual key, team, and customer levels. Control which MCP tools are available per virtual key with strict allow-lists. Intelligent response caching based on semantic similarity. Reduce costs and latency for similar queries. Monitor every AI request in real-time. Track performance, debug issues, and analyze usage patterns. Native Prometheus metrics via scraping or Push Gateway for monitoring and alerting. OTLP integration for distributed tracing with Grafana, New Relic, Honeycomb, and more. Built-in Prometheus-based monitoring tracking HTTP-level and upstream provider metrics. Extensible middleware architecture. Build Go or WASM plugins for custom logic. Mock AI provider responses for testing, development, and simulation. *** ## MCP Gateway Enable AI models to discover and execute external tools dynamically via the **Model Context Protocol**. Bifrost acts as both an MCP client and server, connecting to external tool servers and exposing tools to clients like Claude Desktop. Learn how Bifrost integrates MCP to transform static chat models into action-capable agents. Execute MCP tools with full control over approval, security validation, and conversation flow. Autonomous tool execution with configurable auto-approval for trusted operations. Let AI write Python to orchestrate multiple tools - 50% less tokens, 40% lower latency. Five auth types — None, Headers, OAuth 2.0, Per-User OAuth, Per-User Headers. Lazy auth for per-user. Expose Bifrost itself as an MCP server so Claude Desktop, Cursor, and other MCP clients can use your configured tools. Register custom tools directly in your application and expose them via MCP. *** ## Enterprise features Advanced capabilities for teams running production AI systems at scale. Enterprise deployments include private networking, custom security controls, and governance features designed for enterprise-grade reliability. Content safety with AWS Bedrock Guardrails, Azure Content Safety, Google Model Armor, and Patronus AI for real-time protection. Predictive scaling with real-time health monitoring, automatically optimizing traffic across providers. High-availability with automatic service discovery, gossip-based sync, and zero-downtime deployments. OpenID Connect integration, user-level governance, team sync, and compliance frameworks. Fine-grained permissions with custom roles controlling access across all Bifrost resources. Deploy within your private cloud infrastructure with VPC isolation and enhanced security controls. Immutable audit trails for SOC 2, GDPR, HIPAA, and ISO 27001 compliance. Native Datadog integration for APM traces, LLM Observability, and metrics. Automated export of request logs and telemetry to storage systems and data lakes. *** ## SDK integrations Use Bifrost as a drop-in replacement for popular AI SDKs with zero code changes - just update the base URL. Drop-in replacement for the OpenAI Python and Node.js SDKs. Drop-in replacement for the Anthropic Python and TypeScript SDKs. Native AWS Bedrock SDK integration with full model support. Drop-in replacement for the Google GenAI SDK. Compatibility with LiteLLM proxy and SDK for unified model access. Integration with the LangChain framework for building AI applications. Integration with PydanticAI for type-safe AI agent development. *** ## Supported providers Bifrost supports 20+ AI providers through a single unified API. Configure multiple providers and Bifrost handles routing, failover, and load balancing automatically. See the [full provider support matrix](/providers/supported-providers/overview) for detailed capability comparisons. GPT-4o, o1, GPT-4, and more with full feature support. Claude 4, Claude 3.5, and Claude 3 model family. Multi-model access with native AWS authentication. Gemini and PaLM models with OAuth2 authentication. OpenAI models via Azure with deployment management. Gemini models with vision, audio, and embeddings. Ultra-fast inference with LPU hardware acceleration. Mistral and Mixtral models with tool support. Command models with chat, embeddings, and reasoning. High-speed inference with full streaming support. Local inference with OpenAI-compatible format. Inference API with chat, vision, TTS, and STT. Route to multiple providers with reasoning support. Web search integration with reasoning support. Text-to-speech and speech-to-text models. OpenAI-compatible with streaming and embeddings. Grok models with vision and reasoning support. Chat and streaming with tool calling support. Prediction-based architecture with async modes. SGLang runtime with streaming and embeddings. Self-hosted OpenAI-compatible inference with chat, embeddings, and STT.