POST /v1/async/chat/completions
Create async chat completion
curl --request POST \
  --url http://localhost:8080/v1/async/chat/completions \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "openai/gpt-4",
  "messages": [
    {
      "role": "assistant",
      "name": "<string>",
      "content": "<string>",
      "tool_call_id": "<string>",
      "refusal": "<string>",
      "audio": {
        "id": "<string>",
        "data": "<string>",
        "expires_at": 123,
        "transcript": "<string>"
      },
      "reasoning": "<string>",
      "reasoning_details": [
        {
          "id": "<string>",
          "index": 123,
          "type": "reasoning.summary",
          "summary": "<string>",
          "text": "<string>",
          "signature": "<string>",
          "data": "<string>"
        }
      ],
      "annotations": [
        {
          "type": "<string>",
          "url_citation": {
            "start_index": 123,
            "end_index": 123,
            "title": "<string>",
            "url": "<string>",
            "sources": {},
            "type": "<string>"
          }
        }
      ],
      "tool_calls": [
        {
          "function": {
            "name": "<string>",
            "arguments": "<string>"
          },
          "index": 123,
          "type": "<string>",
          "id": "<string>"
        }
      ]
    }
  ],
  "fallbacks": [
    "<string>"
  ],
  "stream": true,
  "frequency_penalty": 0,
  "logit_bias": {},
  "logprobs": true,
  "max_completion_tokens": 123,
  "metadata": {},
  "modalities": [
    "<string>"
  ],
  "parallel_tool_calls": true,
  "presence_penalty": 0,
  "prompt_cache_key": "<string>",
  "reasoning": {
    "effort": "none",
    "max_tokens": 123
  },
  "response_format": {},
  "safety_identifier": "<string>",
  "service_tier": "<string>",
  "stream_options": {
    "include_obfuscation": true,
    "include_usage": true
  },
  "store": true,
  "temperature": 1,
  "tool_choice": "none",
  "tools": [
    {
      "type": "function",
      "custom": {},
      "cache_control": {
        "type": "ephemeral",
        "ttl": "<string>"
      }
    }
  ],
  "seed": 123,
  "top_p": 0.5,
  "top_logprobs": 10,
  "stop": "<string>",
  "prediction": {
    "type": "<string>",
    "content": "<string>"
  },
  "prompt_cache_retention": "in-memory",
  "web_search_options": {
    "search_context_size": "low",
    "user_location": {
      "type": "<string>",
      "approximate": {
        "city": "<string>",
        "country": "<string>",
        "region": "<string>",
        "timezone": "<string>"
      }
    }
  },
  "truncation": "<string>",
  "user": "<string>",
  "verbosity": "low"
}
'
Example response:

{
  "id": "<string>",
  "status": "pending",
  "created_at": "2023-11-07T05:31:56Z",
  "expires_at": "2023-11-07T05:31:56Z",
  "completed_at": "2023-11-07T05:31:56Z",
  "status_code": 123,
  "result": "<unknown>",
  "error": {
    "event_id": "<string>",
    "type": "<string>",
    "is_bifrost_error": true,
    "status_code": 123,
    "error": {
      "type": "<string>",
      "code": "<string>",
      "message": "<string>",
      "param": "<string>",
      "event_id": "<string>"
    },
    "extra_fields": {
      "provider": "openai",
      "model_requested": "<string>",
      "request_type": "<string>"
    }
  }
}

Headers

x-bf-async-job-result-ttl
integer
default:3600

Time-to-live in seconds for the job result after completion. Defaults to 3600 (1 hour). After expiry, the job result is automatically cleaned up.
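As a sketch, the TTL override can be sent alongside the usual JSON headers. The URL and header name come from the examples above; the request is only constructed here, not sent:

```python
import json
import urllib.request

payload = {
    "model": "openai/gpt-4",
    "messages": [{"role": "user", "content": "Hello"}],
}

# Build (but do not send) the async request, keeping the job result
# for 2 hours instead of the default 3600 seconds (1 hour).
req = urllib.request.Request(
    "http://localhost:8080/v1/async/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "x-bf-async-job-result-ttl": "7200",  # seconds; default is 3600
    },
    method="POST",
)
```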

Body

application/json
model
string
required

Model in provider/model format (e.g., openai/gpt-4)

Example:

"openai/gpt-4"

messages
object[]
required

List of messages in the conversation
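Since only `model` and `messages` are required, a valid request body can be far smaller than the full example above. A minimal sketch:

```python
import json

# Minimal body: only "model" and "messages" are required; every other
# field shown in the full request example is optional.
body = {
    "model": "openai/gpt-4",
    "messages": [
        {"role": "user", "content": "Summarize this release note."},
    ],
}

# "model" uses provider/model format, e.g. "openai/gpt-4".
provider, _, model_name = body["model"].partition("/")
print(json.dumps(body))
```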

fallbacks
string[]

Fallback models in provider/model format

stream
boolean

Whether to stream the response

frequency_penalty
number
Required range: -2 <= x <= 2
logit_bias
object
logprobs
boolean
max_completion_tokens
integer
metadata
object
modalities
string[]
parallel_tool_calls
boolean
presence_penalty
number
Required range: -2 <= x <= 2
prompt_cache_key
string
reasoning
object
response_format
object

Format for the response

safety_identifier
string
service_tier
string
stream_options
object
store
boolean
temperature
number
Required range: 0 <= x <= 2
tool_choice
Available options:
none,
auto,
required
tools
object[]
seed
integer

Deterministic sampling seed

top_p
number

Nucleus sampling parameter

Required range: 0 <= x <= 1
top_logprobs
integer

Number of most likely tokens to return at each position

Required range: 0 <= x <= 20
stop

Up to 4 sequences where the API will stop generating tokens
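A client-side guard like the following (an illustrative helper, not part of the API) can keep a request within the four-sequence limit; it assumes `stop` may be a single string or a list of strings, as in the OpenAI-style schema:

```python
def normalize_stop(stop):
    """Coerce a stop value to a list and enforce the 4-sequence limit.

    Assumes stop is either one string or a list of strings.
    """
    seqs = [stop] if isinstance(stop, str) else list(stop)
    if len(seqs) > 4:
        raise ValueError("stop accepts at most 4 sequences")
    return seqs
```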

prediction
object

Predicted output content for the model to reference (OpenAI only). Can reduce latency.

prompt_cache_retention
enum<string>

Prompt cache retention policy

Available options:
in-memory,
24h
web_search_options
object

Web search options for chat completions (OpenAI only)

truncation
string
user
string
verbosity
enum<string>
Available options:
low,
medium,
high

Response

Job accepted for processing

Response returned when creating or polling an async job

id
string
required

Unique identifier for the async job

status
enum<string>
required

The status of an async job

Available options:
pending,
processing,
completed,
failed
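A client typically polls until the job leaves the two in-flight states. This sketch encodes the four documented statuses (the function name is illustrative, and the polling call itself is not shown):

```python
# The four documented job statuses, split by whether the job is finished.
IN_FLIGHT = {"pending", "processing"}
TERMINAL = {"completed", "failed"}

def is_done(job: dict) -> bool:
    """Return True once the job's status is terminal (completed or failed)."""
    status = job["status"]
    if status not in IN_FLIGHT | TERMINAL:
        raise ValueError(f"unknown job status: {status!r}")
    return status in TERMINAL
```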
created_at
string<date-time>
required

When the job was created

expires_at
string<date-time>

When the job result expires and will be cleaned up

completed_at
string<date-time>

When the job completed (successfully or with failure)

status_code
integer

HTTP status code of the completed operation

result
any

The result of the completed operation (shape depends on the request type)

error
object

Error response from Bifrost
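When a job fails, the nested `error` object carries the details. A sketch (the helper name is illustrative) for surfacing a readable message from the documented shape, using `error.error.message` and `error.extra_fields.provider`:

```python
def describe_error(job: dict) -> str:
    """Summarize a failed job from the documented error shape."""
    err = job.get("error") or {}
    inner = err.get("error") or {}
    provider = (err.get("extra_fields") or {}).get("provider", "unknown")
    message = inner.get("message") or err.get("type") or "unknown error"
    return f"[{provider}] {message}"

failed_job = {
    "id": "job_123",
    "status": "failed",
    "error": {
        "type": "provider_error",
        "error": {"message": "rate limit exceeded"},
        "extra_fields": {"provider": "openai"},
    },
}
print(describe_error(failed_job))  # [openai] rate limit exceeded
```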