# Async Inference

> Submit inference requests asynchronously and poll for results later.

## Overview

Async inference uses a fire-and-forget pattern for gateway requests: submit a normal inference payload to an async endpoint, get a `job_id` immediately, and poll later for the final result.

This is a gateway-only feature: it is not available in the Go SDK, and it requires a Logs Store to be configured.

## How It Works

```mermaid
sequenceDiagram
    participant Client
    participant Gateway as Bifrost Gateway
    participant Worker as Async Worker
    participant Provider

    Client->>Gateway: POST /v1/async/chat/completions
    Gateway-->>Client: 202 Accepted + {id, status: "pending"}
    Gateway->>Worker: Queue async job
    Worker->>Provider: Execute inference request
    Provider-->>Worker: Response or error
    Client->>Gateway: GET /v1/async/chat/completions/{job_id}
    alt Job pending or processing
        Gateway-->>Client: 202 Accepted + status
    else Job completed or failed
        Gateway-->>Client: 200 OK + result/error
    end
```

## Supported Endpoints

Streaming is not supported on async endpoints.

| Request Type      | Submit (POST)                    | Poll (GET)                                |
| ----------------- | -------------------------------- | ----------------------------------------- |
| Text completions  | `/v1/async/completions`          | `/v1/async/completions/{job_id}`          |
| Chat completions  | `/v1/async/chat/completions`     | `/v1/async/chat/completions/{job_id}`     |
| Responses API     | `/v1/async/responses`            | `/v1/async/responses/{job_id}`            |
| Embeddings        | `/v1/async/embeddings`           | `/v1/async/embeddings/{job_id}`           |
| Speech            | `/v1/async/audio/speech`         | `/v1/async/audio/speech/{job_id}`         |
| Transcriptions    | `/v1/async/audio/transcriptions` | `/v1/async/audio/transcriptions/{job_id}` |
| Image generations | `/v1/async/images/generations`   | `/v1/async/images/generations/{job_id}`   |
| Image edits       | `/v1/async/images/edits`         | `/v1/async/images/edits/{job_id}`         |
| Image variations  | `/v1/async/images/variations`    | `/v1/async/images/variations/{job_id}`    |
| OCR               | `/v1/async/ocr`                  | `/v1/async/ocr/{job_id}`                  |
| Rerank            | `/v1/async/rerank`               | `/v1/async/rerank/{job_id}`               |

## Submitting a Request

Use the same JSON body as the synchronous endpoint, but switch to the `/v1/async/` path.

```bash
curl -X POST http://localhost:8080/v1/async/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-bf-vk: sk-bf-your-virtual-key" \
  -H "x-bf-async-job-result-ttl: 3600" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": "Summarize the latest release notes in 3 bullets"
      }
    ]
  }'
```

**Response (`202 Accepted`)**

```json
{
  "id": "1e89b165-d4fe-49e8-beb2-3e157f2df02f",
  "status": "pending",
  "created_at": "2026-02-19T08:10:17.831Z"
}
```

## Polling for Results

Use `GET` on the matching endpoint, substituting the `id` returned by the submit response for `{job_id}`.

```bash
curl -X GET http://localhost:8080/v1/async/chat/completions/1e89b165-d4fe-49e8-beb2-3e157f2df02f \
  -H "x-bf-vk: sk-bf-your-virtual-key"
```

**Response codes:**

* `202 Accepted`: job is still `pending` or `processing`
* `200 OK`: job is `completed` or `failed`

**Pending example (`202`)**

```json
{
  "id": "1e89b165-d4fe-49e8-beb2-3e157f2df02f",
  "status": "pending",
  "created_at": "2026-02-19T08:10:17.831Z"
}
```

**Completed example (`200`)**

```json
{
  "id": "1e89b165-d4fe-49e8-beb2-3e157f2df02f",
  "status": "completed",
  "created_at": "2026-02-19T08:10:17.831Z",
  "completed_at": "2026-02-19T08:10:19.412Z",
  "expires_at": "2026-02-19T09:10:19.412Z",
  "status_code": 200,
  "result": {
    "id": "chatcmpl-123",
    "object": "chat.completion"
  }
}
```

**Failed example (`200`)**

```json
{
  "id": "1e89b165-d4fe-49e8-beb2-3e157f2df02f",
  "status": "failed",
  "created_at": "2026-02-19T08:10:17.831Z",
  "completed_at": "2026-02-19T08:10:19.412Z",
  "expires_at": "2026-02-19T09:10:19.412Z",
  "status_code": 429,
  "error": {
    "error": {
      "message": "rate limit exceeded",
      "type": "rate_limit_error"
    }
  }
}
```

## Job Lifecycle

| Status       | Meaning                                  | Transition Trigger                    |
| ------------ | ---------------------------------------- | ------------------------------------- |
| `pending`    | Job record is created and queued         | Immediate status on submit            |
| `processing` | Background worker has picked up the job  | Worker starts execution               |
| `completed`  | Operation succeeded and result is stored | Provider call completes successfully  |
| `failed`     | Operation failed and error is stored     | Provider call returns a Bifrost error |

## Result TTL and Expiration

* Default TTL is **3600 seconds (1 hour)**.
* TTL starts from **completion time**, not submission time.
* Server default is configured in `client.async_job_result_ttl`.
* Per-request override uses `x-bf-async-job-result-ttl`.
* If the header is invalid or `<= 0`, Bifrost falls back to the default TTL.
* Expired jobs return `404 Job not found or expired`.
* Expired async jobs are cleaned up every minute.

## Virtual Key Authorization

* If a job is created with a virtual key, the job stores that virtual key identity.
* Polling must use the same virtual key value.
* Missing or mismatched virtual keys fail lookup and return `404 Job not found or expired`.
* Jobs created without a virtual key are not virtual-key scoped, so they can be polled by any caller that passes your gateway auth/middleware checks.

## Observability

* Async executions are logged like synchronous requests.
* The logging metadata includes `isAsyncRequest: true`, which appears as an **Async** badge in the Logs UI.
* Background execution still uses Bifrost request APIs, so LLM plugin hooks (governance, logging, cost tracking, etc.) are executed for the actual inference run.

## Limitations

* Gateway-only feature (not available in Go SDK).
* Streaming is not supported on async endpoints.
* Requires Logs Store to register async routes.
* Jobs stuck in `processing` are not auto-expired by TTL cleanup. Cleanup only deletes jobs with `expires_at` set (completed/failed).
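
## Example Client Sketch

Because jobs stuck in `processing` are never expired by TTL cleanup, a polling client probably wants its own deadline. The submit-and-poll flow described on this page could be sketched in Python roughly as follows. This is not an official client: the helper names (`http_json`, `submit_job`, `poll_job`), the polling interval, and the timeout are illustrative assumptions, and `BASE_URL` simply mirrors the `localhost:8080` address used in the curl examples. Only the paths, headers, and status codes come from the documentation above.

```python
import json
import time
import urllib.error
import urllib.request

BASE_URL = "http://localhost:8080"  # assumption: same local gateway as the curl examples


def http_json(method, url, headers, body=None):
    """Send one HTTP request and return (status_code, parsed JSON or None)."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, headers=headers, method=method)
    try:
        with urllib.request.urlopen(req) as resp:
            status, raw = resp.status, resp.read()
    except urllib.error.HTTPError as err:  # non-2xx responses, e.g. 404 for expired jobs
        status, raw = err.code, err.read()
    try:
        return status, json.loads(raw)
    except ValueError:  # empty or non-JSON body
        return status, None


def submit_job(path, payload, virtual_key=None, ttl_seconds=None, send=http_json):
    """POST the payload to /v1/async/{path} and return the job id from the 202 response."""
    headers = {"Content-Type": "application/json"}
    if virtual_key:
        headers["x-bf-vk"] = virtual_key
    if ttl_seconds is not None:
        # Per-request TTL override; invalid or <= 0 values fall back to the server default.
        headers["x-bf-async-job-result-ttl"] = str(ttl_seconds)
    status, body = send("POST", f"{BASE_URL}/v1/async/{path}", headers, payload)
    if status != 202 or not body:
        raise RuntimeError(f"submit failed with HTTP {status}: {body}")
    return body["id"]


def poll_job(path, job_id, virtual_key=None, interval=1.0, timeout=60.0,
             send=http_json, sleep=time.sleep):
    """Poll until the gateway answers 200 (completed or failed) or the deadline passes."""
    # Polling must reuse the virtual key the job was created with.
    headers = {"x-bf-vk": virtual_key} if virtual_key else {}
    deadline = time.monotonic() + timeout
    while True:
        status, body = send("GET", f"{BASE_URL}/v1/async/{path}/{job_id}", headers)
        if status == 200:
            return body  # caller checks body["status"]: "completed" or "failed"
        if status == 404:
            raise LookupError("job not found or expired (or virtual key mismatch)")
        if status != 202:
            raise RuntimeError(f"unexpected HTTP {status} while polling: {body}")
        if time.monotonic() >= deadline:
            # Client-side guard: the server does not expire jobs stuck in processing.
            raise TimeoutError(f"job {job_id} still not finished after {timeout}s")
        sleep(interval)
```

A caller would pass the endpoint suffix from the Supported Endpoints table (for example `"chat/completions"`) to both helpers, then branch on the returned `status` field: a `200` response covers both `completed` (read `result`) and `failed` (read `error` and `status_code`). The `send` and `sleep` parameters exist only so the transport can be swapped out, for instance in tests.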