Overview
The Prompts plugin connects the Prompt Repository to inference. It loads committed prompt versions from the config store and prepends their messages to Chat Completions and Responses requests. It also merges model parameters from the stored version with the incoming request (request values take precedence). What it does:- Resolves which prompt and version to apply per request (default: HTTP headers).
- Injects the version’s message history before the client’s messages.
- Applies the version’s
modelparameters as defaults, then overrides with whatever the client sent for the same parameters.
Prerequisites
- Config store with Prompt Repository tables (typically PostgreSQL). File-backed config alone does not store prompts.
- Prompts authored and committed as versions in the UI or via the
/api/prompt-repo/...HTTP API (seedocs/openapi/openapi.yamlin the repository). - A prompt ID (UUID) for each prompt you reference at runtime. You can read it from the repository API or the playground.
How it works
- Transport (HTTP): Incoming headers
bf-prompt-idandbf-prompt-versionare copied onto the Bifrost context (header name matching is case-insensitive). - Resolve: The plugin looks up the prompt and the requested version. If
bf-prompt-versionis omitted, the prompt’s latest committed version is used. - Parameters: Version
modelparameters are merged into the request; any field already set on the request wins. - Messages: Messages from the committed version are prepended to
messages(chat) orinput(responses). Your request body adds the user turn(s) after the template.
HTTP headers (gateway)
| Header | Required | Description |
|---|---|---|
bf-prompt-id | Yes, to enable injection | UUID of the prompt in the repository. |
bf-prompt-version | No | Integer version number (e.g. 3 for v3). If omitted, the latest committed version for that prompt is used. |
Example: Chat Completions
Use the same JSON body as a normal chat request. Only the headers select the template.
curl above does not set "stream": true in the JSON body, but if the committed version was saved with streaming enabled (as in the screenshot), the merged parameters still include stream: true, so the request is handled as streaming even though the client did not send stream explicitly.

Example: Responses API
Streaming
If the committed version’s model parameters include"stream": true, the plugin may set streaming on the HTTP transport so behavior matches the saved version. Client-side stream flags still interact with the merged parameters as usual.
Cache and updates
The plugin keeps an in-memory cache of prompts and versions (loaded with a small number of store queries at startup). When you create, update, or delete prompts or versions through the gateway APIs, the server reloads that cache so new commits are visible without a full process restart.Go SDK and custom resolution
For embedded Bifrost (Go SDK), register the plugin withprompts.Init and a config store that implements the prompt tables API. The default resolver reads the same logical keys from BifrostContext:
prompts.PromptIDKey(bf-prompt-id)prompts.PromptVersionKey(bf-prompt-version)
ChatCompletion / Responses if you are not going through the HTTP transport hooks.
For advanced routing (for example, choosing a prompt from governance metadata), implement prompts.PromptResolver in plugins/prompts/main.go and use prompts.InitWithResolver.
Related
- Playground — create folders, prompts, sessions, and committed versions.
- Writing Go plugins — plugin interfaces and lifecycle.
- Built-in plugin name in code:
prompts(github.com/maximhq/bifrost/plugins/prompts).

