Overview

Perplexity is an OpenAI-compatible API with built-in web search capabilities and reasoning support. Bifrost performs conversions including:
  • OpenAI-compatible base - Uses OpenAI’s chat format as foundation
  • Web search parameters - Search mode, domain filters, recency filters, and location-based search
  • Reasoning effort mapping - reasoning.effort mapped to Perplexity’s reasoning_effort with special handling for “minimal”
  • Search results inclusion - Citations, search results, and videos included in response
  • Special usage tracking - Citation tokens, search queries, and reasoning tokens tracked separately

Supported Operations

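| Operation | Non-Streaming | Streaming | Endpoint |
|---|---|---|---|
| Chat Completions | ✅ | ✅ | /chat/completions |
| Responses API | ✅ | ✅ | /chat/completions |
| Text Completions | ❌ | ❌ | - |
| Embeddings | ❌ | ❌ | - |
| Speech (TTS) | ❌ | ❌ | - |
| Transcriptions (STT) | ❌ | ❌ | - |
| Files | ❌ | ❌ | - |
| Batch | ❌ | ❌ | - |
| List Models | ❌ | ❌ | - |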
Unsupported Operations (❌): Text Completions, Embeddings, Speech, Transcriptions, Files, Batch, and List Models are not supported by the upstream Perplexity API. These return UnsupportedOperationError.

1. Chat Completions

Request Parameters

Perplexity supports most OpenAI chat completion parameters. For standard parameter reference, see OpenAI Chat Completions.

Perplexity-Specific Constraints

  • No function calling: tools and tool_choice are silently dropped
  • Dropped parameters: stop, logit_bias, logprobs, top_logprobs, seed, parallel_tool_calls, service_tier
  • Reasoning: Uses reasoning_effort instead of reasoning object (see Reasoning & Effort)
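
The constraints above amount to a filter over the incoming request body. The following is a minimal sketch of that idea, not Bifrost's actual code: the names `strip_unsupported` and `DROPPED_PARAMS` are hypothetical, and only the parameter names come from the list above.

```python
# Hypothetical illustration of the "silently dropped" behavior described above.
# The parameter names come from the constraints list; everything else is assumed.
DROPPED_PARAMS = {
    "tools", "tool_choice",  # no function calling
    "stop", "logit_bias", "logprobs", "top_logprobs",
    "seed", "parallel_tool_calls", "service_tier",
}


def strip_unsupported(request: dict) -> dict:
    """Return a copy of the request without the parameters listed as dropped."""
    return {key: value for key, value in request.items() if key not in DROPPED_PARAMS}


request = {
    "model": "sonar",
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0.2,
    "stop": ["\n\n"],
    "tools": [{"type": "function", "function": {"name": "lookup"}}],
}
print(sorted(strip_unsupported(request)))
```

Because the drop is silent, a client that relies on stop or tools receives no error. Checking for these keys before sending a request is one way to catch the mismatch early.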

Perplexity-Specific Parameters

Pass Perplexity-specific search and configuration fields through extra_params (SDK) or directly in the request body (Gateway):
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sonar",
    "messages": [{"role": "user", "content": "What is the latest news?"}],
    "search_mode": "web",
    "language_preference": "en",
    "return_images": true,
    "return_related_questions": true,
    "disable_search": false,
    "search_domain_filter": ["news.example.com"],
    "search_recency_filter": "week"
  }'

Search Parameters

| Parameter | Type | Description |
|---|---|---|
| search_mode | string | Search mode: "web", "academic", "news", etc. |
| language_preference | string | Language preference (e.g., "en", "fr") |
| search_domain_filter | string[] | Restrict search to specific domains |
| return_images | boolean | Include images in search results |
| return_related_questions | boolean | Return related questions |
| search_recency_filter | string | Recency filter: "hour", "day", "week", "month", "year" |
| search_after_date_filter | string | Search results after date (ISO format) |
| search_before_date_filter | string | Search results before date (ISO format) |
| last_updated_after_filter | string | Content last updated after date |
| last_updated_before_filter | string | Content last updated before date |
| disable_search | boolean | Disable web search entirely |
| enable_search_classifier | boolean | Enable search classifier |
| top_k | integer | Top-k results to use |

Media Parameters

| Parameter | Type | Description |
|---|---|---|
| web_search_options | object[] | Array of web search option configurations with user location support |
| media_response.overrides.return_videos | boolean | Return videos in results |
| media_response.overrides.return_images | boolean | Return images in results |

Web Search Options

Configure detailed search behavior including location:
{
  "web_search_options": [
    {
      "search_context_size": "high",
      "user_location": {
        "latitude": 40.7128,
        "longitude": -74.0060,
        "city": "New York",
        "country": "US",
        "region": "NY"
      },
      "image_search_relevance_enhanced": true
    }
  ]
}
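
When the location varies per user, it can be convenient to assemble the entry programmatically. The sketch below uses only the field names shown in the JSON above; the helper name `build_web_search_option` and its default for `search_context_size` are illustrative assumptions.

```python
import json


def build_web_search_option(city, region, country, latitude, longitude,
                            context_size="high"):
    """Assemble one web_search_options entry carrying a user location."""
    return {
        "search_context_size": context_size,
        "user_location": {
            "latitude": latitude,
            "longitude": longitude,
            "city": city,
            "country": country,
            "region": region,
        },
    }


# web_search_options is an array, so the entry is wrapped in a list.
body = {
    "model": "sonar",
    "messages": [{"role": "user", "content": "Coffee shops open now near me"}],
    "web_search_options": [
        build_web_search_option("New York", "NY", "US", 40.7128, -74.0060),
    ],
}
print(json.dumps(body, indent=2))
```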

Reasoning & Effort

Parameter Mapping

  • reasoning.effort → reasoning_effort
  • Supported efforts: "low", "medium", "high"
  • Special conversion: "minimal" → "low" (Perplexity normalizes to low/medium/high)
  • reasoning.max_tokens is silently dropped (Perplexity doesn’t support token budget control)

Example

// Request
{"reasoning": {"effort": "high"}}

// Perplexity conversion
{"reasoning_effort": "high"}

// Special case: "minimal" effort
{"reasoning": {"effort": "minimal"}}
→ {"reasoning_effort": "low"}
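
The same mapping can be written as a small function. This is a sketch of the rules listed above, not the code in chat.go; the function name `map_reasoning` is hypothetical.

```python
def map_reasoning(request: dict) -> dict:
    """Replace the OpenAI-style reasoning object with reasoning_effort."""
    # Copy everything except the reasoning object, which is not forwarded.
    converted = {key: value for key, value in request.items() if key != "reasoning"}
    reasoning = request.get("reasoning") or {}
    effort = reasoning.get("effort")
    if effort is not None:
        # "minimal" has no counterpart, so it becomes "low".
        converted["reasoning_effort"] = "low" if effort == "minimal" else effort
    # reasoning.max_tokens is intentionally not copied: it is dropped.
    return converted


print(map_reasoning({"reasoning": {"effort": "high"}}))
print(map_reasoning({"reasoning": {"effort": "minimal", "max_tokens": 1024}}))
```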

Response Conversion

Search Results Inclusion

Perplexity responses include additional fields for search integration:
  • citations[] - Source citations from search
  • search_results[] - Full search results with metadata
  • videos[] - Video results from search
These fields are preserved in the Bifrost response for client use.

Usage Details

Extended usage tracking specific to Perplexity:
| Field | Source | Description |
|---|---|---|
| completion_tokens_details.citation_tokens | usage.citation_tokens | Tokens used for citations |
| completion_tokens_details.num_search_queries | usage.num_search_queries | Number of web search queries performed |
| completion_tokens_details.reasoning_tokens | usage.reasoning_tokens | Tokens consumed by reasoning process |
| usage.cost | usage.cost | Cost of the request |
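
The field mapping above can be sketched as a function that moves the three Perplexity-specific counters under completion_tokens_details and leaves everything else in place. The name `convert_usage` is hypothetical, and the sample input assumes the upstream usage object also carries the standard token counts.

```python
# Counters that move from the top level of usage into completion_tokens_details.
DETAIL_FIELDS = ("citation_tokens", "num_search_queries", "reasoning_tokens")


def convert_usage(upstream: dict) -> dict:
    """Nest Perplexity-specific counters; pass other fields (incl. cost) through."""
    converted = {k: v for k, v in upstream.items() if k not in DETAIL_FIELDS}
    details = {name: upstream[name] for name in DETAIL_FIELDS if name in upstream}
    if details:
        converted["completion_tokens_details"] = details
    return converted


upstream_usage = {
    "prompt_tokens": 100,
    "completion_tokens": 150,
    "total_tokens": 250,
    "citation_tokens": 25,
    "num_search_queries": 3,
    "reasoning_tokens": 40,
    "cost": {"prompt_cost": 0.001, "completion_cost": 0.002},
}
print(convert_usage(upstream_usage))
```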

Example Response

{
  "id": "...",
  "choices": [...],
  "usage": {
    "prompt_tokens": 100,
    "completion_tokens": 150,
    "total_tokens": 250,
    "completion_tokens_details": {
      "citation_tokens": 25,
      "num_search_queries": 3,
      "reasoning_tokens": 40
    },
    "cost": { "prompt_cost": 0.001, "completion_cost": 0.002 }
  },
  "citations": ["https://example.com/article1", "https://example.com/article2"],
  "search_results": [
    {
      "title": "...",
      "url": "...",
      "snippet": "...",
      "date": "2025-01-15"
    }
  ],
  "videos": [
    {
      "title": "...",
      "url": "...",
      "duration": 300
    }
  ]
}
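
On the client side, the extra fields can be read like any other JSON keys. The sketch below works on a parsed response shaped like the example above; the helper name `summarize_search` is hypothetical, and every lookup uses a default because a response with search disabled may omit these fields.

```python
def summarize_search(response: dict) -> dict:
    """Collect the search-related parts of a parsed response."""
    details = response.get("usage", {}).get("completion_tokens_details", {})
    return {
        "sources": response.get("citations", []),
        "result_titles": [r.get("title") for r in response.get("search_results", [])],
        "video_urls": [v.get("url") for v in response.get("videos", [])],
        "search_queries": details.get("num_search_queries", 0),
    }


# Abridged sample with made-up values, following the example response layout.
sample = {
    "usage": {"completion_tokens_details": {"num_search_queries": 3}},
    "citations": ["https://example.com/article1", "https://example.com/article2"],
    "search_results": [{"title": "Article 1", "url": "https://example.com/article1"}],
    "videos": [],
}
print(summarize_search(sample))
```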

Streaming

Perplexity uses the OpenAI-compatible streaming format. The event sequence is:
  • chat.completion.chunk events with delta updates
  • Standard OpenAI finish reason mapping
Streaming with web search may return search results in final chunks.
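
A client consuming the stream typically concatenates the delta content and watches the last chunks for search data. The sketch below operates on already-parsed chunk dictionaries. The placement of citations, search_results, and videos as top-level chunk fields is an assumption, and `collect_stream` is a hypothetical helper name.

```python
def collect_stream(chunks):
    """Accumulate text, the finish reason, and any search fields from chunks."""
    text_parts = []
    finish_reason = None
    extras = {}
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            if delta.get("content"):
                text_parts.append(delta["content"])
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
        # Assumed location of search data: top-level keys on (final) chunks.
        for field in ("citations", "search_results", "videos"):
            if chunk.get(field):
                extras[field] = chunk[field]
    return {"text": "".join(text_parts), "finish_reason": finish_reason, **extras}


sample_chunks = [
    {"object": "chat.completion.chunk",
     "choices": [{"index": 0, "delta": {"role": "assistant", "content": "Hello"},
                  "finish_reason": None}]},
    {"object": "chat.completion.chunk",
     "choices": [{"index": 0, "delta": {"content": " world"}, "finish_reason": None}]},
    {"object": "chat.completion.chunk",
     "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}],
     "citations": ["https://example.com/article1"]},
]
result = collect_stream(sample_chunks)
print(result["text"], result["finish_reason"])
```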

Caveats

| Severity | Behavior | Impact | Code |
|---|---|---|---|
| High | Tool-related parameters are silently dropped | Function calling not available | chat.go:8-36 |
| Medium | "minimal" effort is mapped to "low" (Perplexity only supports low/medium/high) | Requested minimal effort becomes low effort | chat.go:30-36, responses.go:25-30 |
| Low | reasoning.max_tokens is silently dropped | No control over reasoning token budget | chat.go:29-36 |
| Low | stop parameter is silently dropped | Stop sequences not enforced | chat.go:8-36 |

2. Responses API

The Responses API is adapted for Perplexity by converting to the Chat Completions format internally and returning results in Responses format.

Request Parameters

Parameter Mapping

| Parameter | Transformation |
|---|---|
| max_output_tokens | Direct pass-through to max_tokens |
| temperature, top_p | Direct pass-through |
| instructions | Converted to system message (prepended) |
| reasoning.effort | Mapped to reasoning_effort (see Reasoning & Effort) |
| text.format | Passed through as response_format |
| input (string/array) | Converted to messages |

Extra Parameters

Same Perplexity-specific search and configuration parameters as Chat Completions (see Perplexity-Specific Parameters).
curl -X POST http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sonar",
    "instructions": "You are a helpful assistant with web search capabilities",
    "input": "What is the latest news in technology?",
    "search_mode": "news",
    "return_images": true
  }'

Conversion Details

  • instructions becomes a system message prepended to input messages
  • input (string or array) converted to user message(s)
  • Response converted to Responses API format with same search results and extended usage details
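
The request side of this conversion can be sketched as follows. This is an illustration of the documented mapping rather than the code in responses.go. The function name `responses_to_chat` is hypothetical, array input items are assumed to be objects with role and content keys, and forwarding of the Perplexity-specific extra parameters is left out for brevity.

```python
def responses_to_chat(request: dict) -> dict:
    """Convert a Responses-style request into a Chat Completions-style one."""
    messages = []
    # instructions becomes a system message placed before the input messages.
    if request.get("instructions"):
        messages.append({"role": "system", "content": request["instructions"]})
    user_input = request.get("input")
    if isinstance(user_input, str):
        messages.append({"role": "user", "content": user_input})
    elif isinstance(user_input, list):
        for item in user_input:  # assumed shape: {"role": ..., "content": ...}
            messages.append({"role": item.get("role", "user"),
                             "content": item.get("content")})

    chat = {"model": request["model"], "messages": messages}
    if "max_output_tokens" in request:
        chat["max_tokens"] = request["max_output_tokens"]
    for name in ("temperature", "top_p"):
        if name in request:
            chat[name] = request[name]
    effort = (request.get("reasoning") or {}).get("effort")
    if effort:
        chat["reasoning_effort"] = "low" if effort == "minimal" else effort
    text_format = (request.get("text") or {}).get("format")
    if text_format:
        chat["response_format"] = text_format
    return chat


chat_request = responses_to_chat({
    "model": "sonar",
    "instructions": "You are a helpful assistant with web search capabilities",
    "input": "What is the latest news in technology?",
    "max_output_tokens": 512,
    "reasoning": {"effort": "minimal"},
})
print(chat_request)
```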

Response Format

Same as Chat Completions with search results, citations, and extended usage tracking preserved.

Streaming

Responses streaming uses the same OpenAI-compatible streaming as Chat Completions, with results adapted to Responses format.