# Run Your Own Benchmarks

> Step-by-step guide to benchmark Bifrost in your own environment using the official benchmarking tool.

## Overview

Want to see Bifrost's performance in your specific environment? The [**Bifrost Benchmarking Repository**](https://github.com/maximhq/bifrost-benchmarking) provides everything you need to run performance tests tailored to your infrastructure and workload requirements.

**What You Can Test:**

* **Custom Instance Sizes** - Test on your preferred AWS/GCP/Azure instances
* **Your Workload Patterns** - Use your actual request/response sizes
* **Different Configurations** - Compare various Bifrost settings
* **Provider Comparisons** - Benchmark against other AI gateways or raw OpenAI
* **Load Scenarios** - Test burst loads, sustained traffic, and endurance

The repo also ships two companion tools:

* **[mocker](https://github.com/maximhq/bifrost-benchmarking/tree/main/mocker)**: a mock LLM provider server with configurable latency, failures, and rate limits. Point your gateways at it to measure pure gateway overhead with zero API costs.
* **[hitter](https://github.com/maximhq/bifrost-benchmarking/tree/main/hitter)**: a load generator for stress-testing a single Bifrost deployment with realistic multi-model/streaming traffic.

> **💡 Open Source**: The benchmarking tool is completely open source! Feel free to submit pull requests if you think anything is missing or could be improved.
***

## Prerequisites

Before running benchmarks, ensure you have:

* **Go 1.24+** installed on your testing machine
* **Bifrost instance** running and accessible
* **Target providers** configured in Bifrost (real providers, or the [mocker](https://github.com/maximhq/bifrost-benchmarking/tree/main/mocker) for cost-free runs)
* **Network access** between benchmark tool and Bifrost
* **Sufficient resources** on the testing machine to generate load

***

## Quick Start

### **1. Clone the Repository**

```bash
git clone https://github.com/maximhq/bifrost-benchmarking.git
cd bifrost-benchmarking
```

### **2. Build the Benchmark Tool**

```bash
go build benchmark.go
```

This creates a `benchmark` executable (or `benchmark.exe` on Windows).

### **3. Configure Gateway Ports**

Create a `.env` file in the repo root with the port of each gateway you plan to benchmark. The tool reads ports from here, not from flags:

```env
BIFROST_PORT=8080
OPENAI_API_KEY=sk-...  # only needed when benchmarking raw OpenAI
```

To compare against other gateways, add their port variables too; the [repo README](https://github.com/maximhq/bifrost-benchmarking#readme) lists every supported gateway and its `.env` variable.

### **4. Run Your First Benchmark**

Either `-rate` (fixed RPS) or `-users` (fixed concurrency) is required:

```bash
# Basic benchmark: 500 RPS for 10 seconds
./benchmark -provider bifrost -rate 500

# Custom benchmark: 1000 RPS for 30 seconds
./benchmark -provider bifrost -rate 1000 -duration 30 -output my_results.json
```

> **⚠️ Note**: Omitting `-provider` benchmarks **all** providers sequentially, including `openai`, which sends real requests to `api.openai.com` using your `OPENAI_API_KEY`.
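Because a missing `-provider` flag can trigger real OpenAI traffic, it may be worth guarding invocations with a small wrapper. The following is a minimal sketch, not part of the benchmarking repo: the function names `env_value`, `has_provider_flag`, and `safe_benchmark` are hypothetical, and `env_value` assumes plain `KEY=value` lines without inline comments or quoting.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helpers (illustrative only, not shipped with the repo).

# Print the value of KEY from a .env-style file; fail if the key is absent.
# Assumes simple KEY=value lines; the last matching line wins.
env_value() {
  local key="$1" file="${2:-.env}" line
  line=$(grep -E "^${key}=" "$file" 2>/dev/null | tail -n 1)
  [ -n "$line" ] || return 1
  printf '%s\n' "${line#*=}"
}

# Succeed only if the argument list names a provider explicitly.
has_provider_flag() {
  local arg
  for arg in "$@"; do
    case "$arg" in
      -provider|-provider=*|--provider|--provider=*) return 0 ;;
    esac
  done
  return 1
}

# Forward to ./benchmark only when -provider is present, so a run never
# falls through to "all providers" by accident.
safe_benchmark() {
  if ! has_provider_flag "$@"; then
    echo "refusing to run: pass -provider explicitly" >&2
    return 2
  fi
  ./benchmark "$@"
}
```

With these sourced into a shell, `env_value BIFROST_PORT` confirms the port is defined before a run, and `safe_benchmark -provider bifrost -rate 500` behaves like the plain command while rejecting calls that omit `-provider`.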
***

## Configuration Options

### **Basic Configuration**

| Flag           | Required | Description | Default |
| -------------- | -------- | ----------- | ------- |
| `-rate`        | ✅\*      | Requests per second (mutually exclusive with `-users`) | None |
| `-users`       | ✅\*      | Concurrent users to maintain (mutually exclusive with `-rate`) | None |
| `-provider`    | ❌        | Gateway to benchmark: `bifrost`, `openai`, or another supported gateway (full list in the [repo README](https://github.com/maximhq/bifrost-benchmarking#readme)); empty runs all | None (all) |
| `-duration`    | ❌        | Test duration in seconds | `10` |
| `-output`      | ❌        | Results output file | `results.json` |
| `-big-payload` | ❌        | Use a \~10KB request payload instead of the \~200B default | `false` |

\* Exactly one of `-rate` or `-users` must be provided.

### **Advanced Configuration**

| Flag                | Description | Default |
| ------------------- | ----------- | ------- |
| `-timeout`          | Request timeout; set to duration + expected backend latency | `300` |
| `-cooldown`         | Cooldown between provider tests | `60` |
| `-model`            | Model to put in the request payload | `gpt-4o-mini` |
| `-host`             | Host address of the gateway servers | `localhost` |
| `-path`             | API path to hit (e.g. `chat/completions`, `embeddings`) | `chat/completions` |
| `-suffix`           | URL route suffix prepended to the path | `v1` |
| `-request-type`     | `chat` or `embedding`; controls payload shape | `chat` |
| `-prompt-file`      | File whose content is used as the prompt (for large-prompt tests) | `""` |
| `-ramp-up`          | Gradually ramp users up (only with `-users`) | `false` |
| `-ramp-up-duration` | Seconds to ramp from 1 to `-users` users | `0` |
| `-debug`            | Detailed logging and periodic status updates | `false` |

### **Rate vs. Users Mode**

* **`-rate`** sends requests at a constant RPS regardless of response times. Best for measuring throughput capacity and latency under a known load.
* **`-users`** keeps exactly N requests in flight at all times; as one completes, the next is dispatched. Throughput becomes ≈ `users / avg_latency`. Best for simulating connection pools and realistic client behavior.

***

## Benchmark Scenarios

### **1. Basic Performance Test**

Test standard performance with typical request sizes:

```bash
./benchmark -provider bifrost -rate 1000 -duration 60 -output basic_test.json
```

**Use Case**: General performance validation

### **2. High-Load Stress Test**

Push your instance to its limits:

```bash
./benchmark -provider bifrost -rate 5000 -duration 120 -output stress_test.json
```

**Use Case**: Capacity planning and SLA validation

### **3. Large Payload Test**

Test with bigger request/response sizes:

```bash
./benchmark -provider bifrost -rate 500 -duration 60 -big-payload -output large_payload.json
```

**Use Case**: Document processing, code generation workloads

### **4. Endurance Test**

Long-running stability test:

```bash
./benchmark -provider bifrost -rate 1000 -duration 1800 -timeout 2100 -output endurance_test.json
```

**Use Case**: Production readiness validation (30-minute test)

### **5. Concurrent Users with Ramp-Up**

Simulate realistic traffic that gradually builds:

```bash
./benchmark -provider bifrost -users 500 -duration 600 -ramp-up -ramp-up-duration 120 -output rampup_test.json
```

**Use Case**: Realistic user behavior: ramps from 1 to 500 concurrent users over 2 minutes, then holds

### **6. Comparative Benchmarking**

Compare Bifrost against other gateways (each gateway's port comes from `.env`):

```bash
# Test Bifrost
./benchmark -provider bifrost -rate 1000 -duration 60 -output bifrost_results.json

# Test another gateway (its port configured in .env; supported gateways listed in the repo README).
# Replace <gateway> with that gateway's provider name.
./benchmark -provider <gateway> -rate 1000 -duration 60 -output gateway_results.json

# Test direct OpenAI (needs OPENAI_API_KEY in .env; note the explicit path)
./benchmark -provider openai -path v1/chat/completions -rate 100 -duration 60 -output openai_results.json
```

***

## Understanding Results

The benchmark tool writes per-provider metrics to the output file (keyed by provider, latest run per provider):

### **Key Metrics Explained**

```json
{
  "bifrost": {
    "requests": 30000,
    "rate": 500.12,
    "success_rate": 99.8,
    "mean_latency_ms": 45.2,
    "p50_latency_ms": 42.1,
    "p99_latency_ms": 156.7,
    "max_latency_ms": 203.4,
    "throughput_rps": 498.5,
    "timestamp": "2025-01-14T10:30:00Z",
    "status_code_counts": {
      "200": 29940,
      "500": 60
    },
    "server_peak_memory_mb": 256.7,
    "server_avg_memory_mb": 189.3,
    "drop_reasons": {
      "HTTP 500": 60
    }
  }
}
```

### **Critical Performance Indicators**

**Success Rate:**

* **Target**: >99.9% for production readiness
* **Excellent**: 100% (perfect reliability)

**Latency Metrics:**

* **P50 (Median)**: Typical user experience
* **P99**: Worst-case user experience
* **Mean**: Overall average performance

**Memory Usage:**

* **Peak / Average**: server-side RSS sampled during the run. The tool finds the gateway process by its configured port, so run the benchmark on the same machine as the gateway to capture memory stats

**Drop Reasons:**

* Categorized failure analysis (timeouts, HTTP errors, connection failures)

***

## Instance Sizing Recommendations

Based on your benchmark results, use these guidelines for production sizing:

### **Resource Planning Matrix**

| Target RPS        | Memory Usage | Recommended Instance | Notes                          |
| ----------------- | ------------ | -------------------- | ------------------------------ |
| **\< 1,000**      | \< 1GB       | t3.small             | Cost-effective for light loads |
| **1,000 - 3,000** | 1-2GB        | t3.medium            | Balanced performance/cost      |
| **3,000 - 5,000** | 2-4GB        | t3.large             | High-performance production    |
| **5,000+**        | 3-6GB        | t3.xlarge+           | Enterprise/mission-critical    |

### **Configuration Tuning Based on Results**

**If seeing high latency:**

* Increase `initial_pool_size`
* Increase `buffer_size`
* Consider larger instance

**If memory usage is high:**

* Decrease `initial_pool_size`
* Optimize `buffer_size`
* Monitor for memory leaks

**If success rate \< 100%:**

* Reduce request rate
* Increase timeout settings
* Check provider limits

***

## Advanced Testing Scenarios

### **Burst Load Testing**

Simulate traffic spikes:

```bash
# Normal load
./benchmark -provider bifrost -rate 1000 -duration 300 -output normal_load.json

# Burst load (simulate 5x spike)
./benchmark -provider bifrost -rate 5000 -duration 60 -output burst_load.json
```

### **Multi-Instance Testing**

Test horizontal scaling. Environment variables override `.env`, so you can target multiple instances in parallel:

```bash
# Instance 1
BIFROST_PORT=8080 ./benchmark -provider bifrost -rate 2500 -duration 120 -output instance_1.json &

# Instance 2
BIFROST_PORT=8081 ./benchmark -provider bifrost -rate 2500 -duration 120 -output instance_2.json &

# Wait for both to complete
wait
```

### **Embeddings Benchmarking**

Benchmark embeddings endpoints, optionally with very large prompts from a file:

```bash
./benchmark -provider bifrost -request-type embedding -path embeddings \
  -model text-embedding-3-small -prompt-file 10kbprompt.txt -rate 10 -duration 30
```

The repo root includes `10kbprompt.txt` and `50kbprompt.txt` as ready-made fixtures.

***

## Continuous Benchmarking

### **Automated Testing Pipeline**

Set up regular performance regression testing:

```bash
#!/bin/bash
# daily_benchmark.sh

# Timestamped output directory, one per run
DATE=$(date +%Y%m%d_%H%M%S)
OUTPUT_DIR="benchmarks/$DATE"
mkdir -p "$OUTPUT_DIR"

# Run standard benchmarks
./benchmark -provider bifrost -rate 1000 -duration 300 -output "$OUTPUT_DIR/standard.json"
./benchmark -provider bifrost -rate 3000 -duration 180 -output "$OUTPUT_DIR/high_load.json"
./benchmark -provider bifrost -rate 500 -duration 600 -big-payload -output "$OUTPUT_DIR/large_payload.json"

echo "Benchmarks completed: $OUTPUT_DIR"
```

### **Performance Monitoring Integration**

Monitor key metrics over time:

* **Success rate trends**
* **Latency percentile changes**
* **Memory usage patterns**
* **Throughput capacity**

***

## Troubleshooting

### **Common Issues**

**"Either --rate or --users flag must be provided":**

* Exactly one of `-rate` or `-users` is required; they are mutually exclusive.

**Connection Refused:**

```bash
# Check if Bifrost is running
curl http://localhost:8080/health

# Verify port configuration
netstat -an | grep 8080
```

* Check that the provider's port (e.g. `BIFROST_PORT`) is defined in the `.env` file at the repo root.

**"No process found on port":**

* The gateway isn't running, or the `.env` port is wrong. The benchmark still runs; only memory stats are skipped.

**"Attack for \[Provider] timed out":**

* Raise `-timeout`; it must cover `duration + backend latency`.
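The `duration + backend latency` rule can be applied mechanically when scripting runs. A minimal sketch follows; the helper name `bench_timeout` and its 300-second default margin are assumptions chosen for illustration (the margin mirrors the 1800 → 2100 values used in the endurance test), so pick a margin that reflects your own backend latency.

```bash
#!/usr/bin/env bash
# Hypothetical helper: derive a -timeout value (seconds) from the test
# duration plus a margin for backend latency. The 300 s default is an
# assumption for illustration, not a value prescribed by the tool.
bench_timeout() {
  local duration="$1" latency_margin="${2:-300}"
  echo $(( duration + latency_margin ))
}

DURATION=1800
TIMEOUT=$(bench_timeout "$DURATION" 300)

# Print the command instead of running it, so the sketch is safe to try.
echo ./benchmark -provider bifrost -rate 1000 -duration "$DURATION" -timeout "$TIMEOUT"
# prints: ./benchmark -provider bifrost -rate 1000 -duration 1800 -timeout 2100
```

Dropping the `echo` turns the last line into a real run with a timeout that always tracks the chosen duration.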
**High Error Rates:**

* Check provider API key limits
* Verify Bifrost configuration
* Monitor upstream provider status
* Reduce request rate for baseline test

**Inconsistent Results:**

* Run multiple test iterations
* Account for network variability
* Use longer test durations (60+ seconds)
* Isolate testing environment
* Point the gateway at the repo's [mock provider](https://github.com/maximhq/bifrost-benchmarking/tree/main/mocker) to eliminate upstream variability

***

## Next Steps

### **After Running Benchmarks**

1. **Analyze Results**: Compare against [official benchmarks](./getting-started)
2. **Optimize Configuration**: Tune based on your specific results
3. **Plan Capacity**: Size instances based on measured performance
4. **Set Up Monitoring**: Track key metrics in production

### **Compare Results**

* **[t3.medium Performance](./t3.medium)** - Compare against medium instance results
* **[t3.xlarge Performance](./t3.xl)** - Compare against high-performance configuration

**Ready to benchmark? Clone the [repository](https://github.com/maximhq/bifrost-benchmarking) and start testing!**