
Overview

SGL (SGLang) is an OpenAI-compatible inference engine, run locally or remotely, that serves models with high throughput. Bifrost delegates all operations to the OpenAI provider implementation. Key features:
  • OpenAI API compatibility - Identical request/response format
  • Full streaming support - Server-Sent Events with usage tracking
  • Tool calling - Complete function definition and execution
  • Text embeddings - Support for embedding models
  • Parameter filtering - Removes unsupported fields for compatibility

Supported Operations

| Operation | Non-Streaming | Streaming | Endpoint |
| --- | --- | --- | --- |
| Chat Completions | ✅ | ✅ | /v1/chat/completions |
| Responses API | ✅ | ✅ | /v1/chat/completions |
| Text Completions | ✅ | ✅ | /v1/completions |
| Embeddings | ✅ | - | /v1/embeddings |
| List Models | ✅ | - | /v1/models |
| Speech (TTS) | ❌ | ❌ | - |
| Transcriptions (STT) | ❌ | ❌ | - |
| Files | ❌ | - | - |
| Batch | ❌ | - | - |
Unsupported Operations (❌): Speech, Transcriptions, Files, and Batch are not supported by the upstream SGL API. These return UnsupportedOperationError.

SGL is typically self-hosted. Ensure BaseURL is configured correctly, pointing to your SGL instance (e.g., http://localhost:8000).
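
Because the upstream API is OpenAI-compatible, you can sanity-check the endpoint you plan to use as BaseURL by pointing a stock OpenAI client directly at the SGL instance. A minimal sketch, assuming SGL is listening on http://localhost:8000 and serving a model named meta-llama/Llama-3.1-8B-Instruct (both are placeholders for your deployment; self-hosted instances often accept any API key):

```python
from openai import OpenAI

# Point the client at the SGL instance instead of api.openai.com.
# The base URL and model name are placeholders for your deployment;
# a dummy API key is used because self-hosted instances typically
# do not require one.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model name
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```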

1. Chat Completions

Request Parameters

SGL supports all standard OpenAI chat completion parameters. For full parameter reference and behavior, see OpenAI Chat Completions.

Filtered Parameters

Removed for SGL compatibility:
  • prompt_cache_key - Not supported
  • verbosity - OpenAI-specific
  • store - Not supported
  • service_tier - OpenAI-specific
SGL supports all standard OpenAI message types, tools, responses, and streaming formats. For details on message handling, tool conversion, responses, and streaming, refer to OpenAI Chat Completions.
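
As one example, streaming behaves the same as it does against OpenAI, including usage reporting on the final chunk. A sketch under the same assumptions as above (local instance at http://localhost:8000, placeholder model name):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# Server-Sent Events stream; include_usage requests a final chunk that
# carries token usage for the whole request.
stream = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model name
    messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
    if chunk.usage:  # only populated on the final chunk
        print(f"\n\nTokens used: {chunk.usage.total_tokens}")
```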

2. Responses API

Fallback to Chat Completions with format conversion:
ResponsesRequest → ChatRequest → Response conversion
Same parameter support as Chat Completions.

3. Text Completions

SGL supports the legacy text completion format:
| Parameter | Mapping |
| --- | --- |
| prompt | Direct pass-through |
| max_tokens | max_tokens |
| temperature, top_p | Direct pass-through |
| frequency_penalty, presence_penalty | Supported |
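
A minimal sketch of a legacy completion request, under the same assumptions as above (local SGL instance, placeholder model name):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# Legacy /v1/completions call; the prompt and sampling parameters
# pass through to the SGL server unchanged.
completion = client.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model name
    prompt="The three laws of robotics are",
    max_tokens=128,
    temperature=0.7,
    top_p=0.9,
    frequency_penalty=0.0,
    presence_penalty=0.0,
)
print(completion.choices[0].text)
```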

4. Embeddings

SGL supports text embeddings for vector generation:
| Parameter | Notes |
| --- | --- |
| input | Text or array of texts |
| model | Embedding model name |
| encoding_format | "float" or "base64" |
| dimensions | Model-specific dimension count |
Response returns embedding vectors with usage information.
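
A sketch of an embeddings request, assuming the instance at http://localhost:8000 is serving an embedding model; the model name below is a placeholder:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# Embed a batch of texts with whatever embedding model the instance serves.
result = client.embeddings.create(
    model="intfloat/e5-mistral-7b-instruct",  # placeholder embedding model
    input=["first document", "second document"],
    encoding_format="float",
)

for item in result.data:
    print(len(item.embedding))        # vector dimension per input
print(result.usage.total_tokens)      # usage information
```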

5. List Models

Lists the models available on the SGL server, along with their capabilities.
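
For example, under the same assumptions as above:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# Enumerate the models the SGL server is currently serving.
for model in client.models.list():
    print(model.id)
```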

Unsupported Features

| Feature | Reason |
| --- | --- |
| Speech/TTS | Not offered by SGL API |
| Transcription/STT | Not offered by SGL API |
| Batch Operations | Not offered by SGL API |
| File Management | Not offered by SGL API |

SGL requires the BaseURL to be configured to point at your SGL instance (e.g., http://localhost:8000 for local, https://sgl.example.com for remote).

Caveats

| Severity | Behavior | Impact | Code |
| --- | --- | --- | --- |
| High | BaseURL must be explicitly configured | Requests fail without proper configuration | Validated in NewSGLProvider |
| Medium | Cache control directives are removed from messages | Prompt caching features don't work | Stripped during JSON marshaling |
| Low | OpenAI-specific fields filtered out | prompt_cache_key, verbosity, store removed | filterOpenAISpecificParameters |
| Low | User field longer than 64 characters is silently dropped | Longer user identifiers are lost | SanitizeUserField enforces 64-char max |