Creates a chat completion for the provided messages. Streaming is supported via server-sent events (SSE).
Bearer token authentication. Use your provider API key or Bifrost authentication token.
Virtual keys (prefixed with sk-bf-) can also be passed here.
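A minimal request sketch follows, showing the Bearer header and a `provider/model` payload. The gateway URL and key values here are placeholders, not values confirmed by this page; substitute your own deployment's endpoint and credentials.

```python
import json

# Placeholder values -- replace with your own gateway URL and key.
BIFROST_URL = "http://localhost:8080/v1/chat/completions"  # assumed endpoint path
API_KEY = "sk-bf-your-key"  # provider key, Bifrost token, or sk-bf- virtual key

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

payload = {
    "model": "openai/gpt-4",  # provider/model format
    "messages": [
        {"role": "user", "content": "Hello!"},
    ],
}

# Send with any HTTP client, for example:
#   requests.post(BIFROST_URL, headers=headers, data=json.dumps(payload))
print(json.dumps(payload, indent=2))
```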
Model in provider/model format (e.g., openai/gpt-4)
"openai/gpt-4"
List of messages in the conversation
Fallback models in provider/model format
Whether to stream the response
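When `stream` is enabled, responses arrive as SSE frames. The sketch below assumes the common OpenAI-style `data: {...}` / `data: [DONE]` framing; the fallback model name in the payload is a hypothetical example, not one taken from this page.

```python
import json

payload = {
    "model": "openai/gpt-4",
    "messages": [{"role": "user", "content": "Hi"}],
    "fallbacks": ["anthropic/claude-3-sonnet"],  # hypothetical fallback, provider/model format
    "stream": True,
}

def parse_sse_line(line: str):
    """Decode one SSE line; return None for keep-alives and the [DONE] sentinel."""
    if not line.startswith("data: "):
        return None  # comment/keep-alive lines start with ":" and carry no data
    body = line[len("data: "):]
    if body.strip() == "[DONE]":
        return None  # end-of-stream sentinel
    return json.loads(body)

# Example frame as a provider might emit it:
chunk = parse_sse_line('data: {"choices": [{"delta": {"content": "Hel"}}]}')
```

Each decoded chunk carries an incremental `delta`; concatenating the deltas reconstructs the full completion text.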
Allowed range: -2 <= x <= 2
Allowed range: -2 <= x <= 2
Format for the response
Allowed range: 0 <= x <= 2
Allowed values: none, auto, required
Deterministic sampling seed
Nucleus sampling parameter
Allowed range: 0 <= x <= 1
Number of most likely tokens to return at each position
Allowed range: 0 <= x <= 20
Up to 4 sequences where the API will stop generating tokens
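The sampling constraints above can be sketched as a payload. The parameter names below follow the usual OpenAI convention and are assumptions inferred from the descriptions, not names confirmed by this page.

```python
payload = {
    "model": "openai/gpt-4",
    "messages": [{"role": "user", "content": "Summarize SSE in one line."}],
    # Field names below are assumed OpenAI-style names, inferred from the descriptions.
    "frequency_penalty": 0.5,  # must satisfy -2 <= x <= 2
    "presence_penalty": 0.0,   # must satisfy -2 <= x <= 2
    "temperature": 0.7,        # must satisfy 0 <= x <= 2
    "seed": 42,                # deterministic sampling seed
    "top_p": 0.9,              # nucleus sampling, 0 <= x <= 1
    "top_logprobs": 5,         # most likely tokens per position, 0 <= x <= 20
    "stop": ["\n\n", "END"],   # up to 4 stop sequences
}

def in_range(value, lo, hi):
    """Check a numeric parameter against its documented inclusive range."""
    return lo <= value <= hi
```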
Predicted output content for the model to reference (OpenAI only). Can reduce latency.
Prompt cache retention policy
Allowed values: in-memory, 24h
Web search options for chat completions (OpenAI only)
Allowed values: low, medium, high
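The OpenAI-only extras above might be combined as below. The `prompt_cache_retention` and `web_search_options.search_context_size` field names are assumptions inferred from the descriptions, not names confirmed by this page.

```python
payload = {
    "model": "openai/gpt-4",
    "messages": [{"role": "user", "content": "What changed in HTTP/3?"}],
    # Field names below are assumptions inferred from the descriptions.
    "prompt_cache_retention": "24h",  # or "in-memory"
    "web_search_options": {
        "search_context_size": "medium",  # low, medium, or high
    },
}
```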