
Overview

The Prompts plugin connects the Prompt Repository to inference. It loads committed prompt versions from the config store and prepends their messages to Chat Completions and Responses requests. It also merges model parameters from the stored version with the incoming request (request values take precedence).

What it does:
  • Resolves which prompt and version to apply per request (default: HTTP headers).
  • Injects the version’s message history before the client’s messages.
  • Applies the version’s model parameters as defaults, then overrides with whatever the client sent for the same parameters.
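The merge rule in the last bullet can be sketched as a plain map merge: the stored version supplies defaults, and anything the client sent for the same key wins. This is an illustrative sketch only; the real plugin operates on typed request structs, and `mergeParams` is a hypothetical name.

```go
package main

import "fmt"

// mergeParams is a hypothetical sketch of the merge rule: start from the
// committed version's model parameters, then let any parameter the client
// sent override the stored value for the same key.
func mergeParams(versionParams, requestParams map[string]any) map[string]any {
	merged := make(map[string]any, len(versionParams)+len(requestParams))
	for k, v := range versionParams {
		merged[k] = v // defaults from the stored version
	}
	for k, v := range requestParams {
		merged[k] = v // request values take precedence
	}
	return merged
}

func main() {
	version := map[string]any{"temperature": 0.2, "stream": true}
	request := map[string]any{"temperature": 0.9}
	merged := mergeParams(version, request)
	// temperature comes from the request; stream survives from the version
	fmt.Println(merged["temperature"], merged["stream"])
}
```

Note that this is also why a version committed with streaming enabled can turn a non-streaming request into a streaming one: `stream: true` survives the merge unless the client sets `stream` itself.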

Prerequisites

  • Config store with Prompt Repository tables (typically PostgreSQL). File-backed config alone does not store prompts.
  • Prompts authored and committed as versions in the UI or via the /api/prompt-repo/... HTTP API (see docs/openapi/openapi.yaml in the repository).
  • A prompt ID (UUID) for each prompt you reference at runtime. You can read it from the repository API or the playground.

How it works

  1. Transport (HTTP): Incoming headers bf-prompt-id and bf-prompt-version are copied onto the Bifrost context (header name matching is case-insensitive).
  2. Resolve: The plugin looks up the prompt and the requested version. If bf-prompt-version is omitted, the prompt’s latest committed version is used.
  3. Parameters: Version model parameters are merged into the request; any field already set on the request wins.
  4. Messages: Messages from the committed version are prepended to messages (chat) or input (responses). Your request body adds the user turn(s) after the template.
If the prompt ID is missing, the plugin does nothing and the request passes through unchanged.
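The resolve-and-inject steps above can be sketched as follows. The types and function names here are hypothetical stand-ins, not the plugin's actual internals; the sketch only mirrors the documented behavior (latest committed version when no version is requested, warning-and-passthrough on an unknown version, template messages prepended before the client's messages).

```go
package main

import "fmt"

// Message mirrors a chat message for illustration.
type Message struct {
	Role, Content string
}

// PromptVersion is a hypothetical stand-in for a committed version.
type PromptVersion struct {
	Number   int
	Messages []Message
}

// resolveVersion returns the requested version, or the latest committed
// version when none was requested (requested == 0 in this sketch).
func resolveVersion(versions []PromptVersion, requested int) *PromptVersion {
	var latest *PromptVersion
	for i := range versions {
		v := &versions[i]
		if requested != 0 && v.Number == requested {
			return v
		}
		if latest == nil || v.Number > latest.Number {
			latest = v
		}
	}
	if requested != 0 {
		return nil // unknown version: log a warning, skip injection
	}
	return latest
}

// injectMessages prepends the version's message history to the client's messages.
func injectMessages(v *PromptVersion, clientMsgs []Message) []Message {
	if v == nil {
		return clientMsgs // no template: request passes through unchanged
	}
	return append(append([]Message{}, v.Messages...), clientMsgs...)
}

func main() {
	versions := []PromptVersion{
		{Number: 1, Messages: []Message{{Role: "system", Content: "v1 template"}}},
		{Number: 2, Messages: []Message{{Role: "system", Content: "v2 template"}}},
	}
	out := injectMessages(resolveVersion(versions, 0), []Message{{Role: "user", Content: "hi"}})
	fmt.Println(len(out), out[0].Content) // latest template first, then the user turn
}
```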

HTTP headers (gateway)

| Header | Required | Description |
| --- | --- | --- |
| bf-prompt-id | Yes, to enable injection | UUID of the prompt in the repository. |
| bf-prompt-version | No | Integer version number (e.g. 3 for v3). If omitted, the latest committed version for that prompt is used. |
Invalid or unknown IDs or versions are logged as warnings; the plugin does not fail the request, which proceeds without template injection.

Example: Chat Completions

Use the same JSON body as a normal chat request. Only the headers select the template.
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "bf-prompt-id: YOUR-PROMPT-UUID" \
  -H "x-bf-vk: sk-bf-your-virtual-key" \
  -d '{
    "model": "openai/gpt-5.4",
    "messages": [
      {
        "role": "user",
        "content": "Tell me about Bifrost Gateway?"
      }
    ]
  }'
When you commit a version from the playground, the Stream setting is saved in that version's model parameters. The example curl above does not set "stream": true in the JSON body, but if the committed version was saved with streaming enabled, the merged parameters still include stream: true, so the request is handled as streaming even though the client did not send stream explicitly. In Logs, that run shows Type: Chat Stream and the full conversation: the committed system template, the user message from the request body, and the assistant reply. The provider receives the stored messages from the prompt version prepended before your user message, with the version's model parameters applied underneath whatever the request itself set.

Example: Responses API

curl -X POST http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -H "bf-prompt-id: YOUR-PROMPT-UUID" \
  -H "bf-prompt-version: 4" \
  -H "x-bf-vk: sk-bf-your-virtual-key" \
  -d '{
    "model": "openai/gpt-5-nano-2025-08-07",
    "input": "What is Pale Blue Dot?"
  }'

Streaming

If the committed version’s model parameters include "stream": true, the plugin may enable streaming on the HTTP transport so behavior matches the saved version. A stream value set explicitly by the client still takes precedence over the stored one, following the usual merge rules.

Cache and updates

The plugin keeps an in-memory cache of prompts and versions (loaded with a small number of store queries at startup). When you create, update, or delete prompts or versions through the gateway APIs, the server reloads that cache so new commits are visible without a full process restart.
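The cache-plus-reload behavior can be sketched as a read-mostly map guarded by an RWMutex, where a reload swaps in fresh state after any create, update, or delete. This is an illustrative sketch with hypothetical names, not the plugin's actual cache:

```go
package main

import (
	"fmt"
	"sync"
)

// promptCache is a hypothetical sketch of the in-memory cache: requests
// take read locks, and a reload (triggered by gateway API writes) swaps
// in fresh state so new commits are visible without a process restart.
type promptCache struct {
	mu      sync.RWMutex
	prompts map[string][]int // prompt ID -> committed version numbers
}

// Versions returns the committed version numbers cached for a prompt.
func (c *promptCache) Versions(id string) []int {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.prompts[id]
}

// Reload replaces the cached state, e.g. after a new version is committed.
func (c *promptCache) Reload(fresh map[string][]int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.prompts = fresh
}

func main() {
	c := &promptCache{prompts: map[string][]int{"p1": {1}}}
	c.Reload(map[string][]int{"p1": {1, 2}}) // a new commit triggers a reload
	fmt.Println(c.Versions("p1"))
}
```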

Go SDK and custom resolution

For embedded Bifrost (Go SDK), register the plugin with prompts.Init and a config store that implements the prompt tables API. The default resolver reads the same logical keys from BifrostContext:
  • prompts.PromptIDKey (bf-prompt-id)
  • prompts.PromptVersionKey (bf-prompt-version)
Set them on the context you pass to ChatCompletion / Responses if you are not going through the HTTP transport hooks. For advanced routing (for example, choosing a prompt from governance metadata), implement prompts.PromptResolver in plugins/prompts/main.go and use prompts.InitWithResolver.
See also

  • Playground — create folders, prompts, sessions, and committed versions.
  • Writing Go plugins — plugin interfaces and lifecycle.
  • Built-in plugin name in code: prompts (github.com/maximhq/bifrost/plugins/prompts).