curl --request POST \
--url http://localhost:8080/openai/deployments/{deployment-id}/responses \
--header 'Content-Type: application/json' \
--data '
{
"model": "<string>",
"input": "<string>",
"fallbacks": [
"<string>"
],
"stream": true,
"background": true,
"conversation": "<string>",
"include": [
"<string>"
],
"instructions": "<string>",
"max_output_tokens": 123,
"max_tool_calls": 123,
"metadata": {},
"parallel_tool_calls": true,
"previous_response_id": "<string>",
"prompt_cache_key": "<string>",
"reasoning": {
"effort": "<string>"
},
"safety_identifier": "<string>",
"service_tier": "<string>",
"stream_options": {
"include_usage": true
},
"store": true,
"temperature": 123,
"text": {
"max_tokens": 123
},
"top_logprobs": 123,
"top_p": 123,
"tool_choice": "auto",
"tools": [
{
"type": "function",
"function": {
"name": "<string>",
"arguments": {}
}
}
],
"truncation": "<string>"
}
'
{
"id": "chatcmpl-123",
"object": "chat.completion",
"choices": [
{
"index": 0,
"message": {
"role": "user",
"content": "Hello, how are you?",
"tool_call_id": "<string>",
"tool_calls": [
{
"function": {
"name": "get_weather",
"arguments": "{\"location\": \"San Francisco, CA\"}"
},
"id": "tool_123",
"type": "function"
}
],
"refusal": "<string>",
"annotations": [
{
"type": "<string>",
"url_citation": {
"start_index": 123,
"end_index": 123,
"title": "<string>",
"url": "<string>",
"sources": "<unknown>",
"type": "<string>"
}
}
],
"thought": "<string>"
},
"finish_reason": "stop",
"stop": "<string>",
"log_probs": {
"content": [
{
"logprob": -0.123,
"token": "hello",
"bytes": [
123
],
"top_logprobs": [
{
"logprob": -0.456,
"token": "world",
"bytes": [
123
]
}
]
}
],
"refusal": [
{
"logprob": -0.456,
"token": "world",
"bytes": [
123
]
}
]
}
}
],
"data": [
{
"index": 123,
"object": "<string>",
"embedding": [
123
]
}
],
"speech": {
"usage": {
"characters": 123
},
"audio": "aSDinaTvuI8gbWludGxpZnk="
},
"transcribe": {
"text": "<string>",
"logprobs": [
{
"token": "<string>",
"log_prob": 123
}
],
"usage": {
"prompt_tokens": 123,
"completion_tokens": 123,
"total_tokens": 123
}
},
"messages": [
{
"role": "user",
"content": "<string>"
}
],
"conversation_id": "<string>",
"finish_reason": "<string>",
"stop_reason": "<string>",
"stop_sequence": "<string>",
"prompt_cache": {
"status": "<string>"
},
"model": "gpt-4o",
"created": 1677652288,
"service_tier": "<string>",
"system_fingerprint": "<string>",
"usage": {
"prompt_tokens": 56,
"completion_tokens": 31,
"total_tokens": 87,
"completion_tokens_details": {
"reasoning_tokens": 123,
"audio_tokens": 123,
"accepted_prediction_tokens": 123,
"rejected_prediction_tokens": 123
}
},
"extra_fields": {
"provider": "openai",
"request_type": "list_models",
"model_requested": "<string>",
"model_params": {
"temperature": 0.7,
"top_p": 0.9,
"top_k": 40,
"max_tokens": 1000,
"stop_sequences": [
"\n\n",
"END"
],
"presence_penalty": 0,
"frequency_penalty": 0,
"tools": [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get current weather for a location",
"parameters": {
"type": "object",
"description": "<string>",
"properties": {},
"required": [
"<string>"
],
"enum": [
"<string>"
]
}
},
"id": "<string>"
}
],
"tool_choice": {
"type": "auto",
"function": {
"name": "get_weather"
}
},
"parallel_tool_calls": true
},
"latency": 1234,
"billed_usage": {
"prompt_tokens": 123,
"completion_tokens": 123,
"search_units": 123,
"classifications": 123
},
"raw_response": {}
}
}
OpenAI-compatible responses endpoint for specific deployments.
curl --request POST \
--url http://localhost:8080/openai/deployments/{deployment-id}/responses \
--header 'Content-Type: application/json' \
--data '
{
"model": "<string>",
"input": "<string>",
"fallbacks": [
"<string>"
],
"stream": true,
"background": true,
"conversation": "<string>",
"include": [
"<string>"
],
"instructions": "<string>",
"max_output_tokens": 123,
"max_tool_calls": 123,
"metadata": {},
"parallel_tool_calls": true,
"previous_response_id": "<string>",
"prompt_cache_key": "<string>",
"reasoning": {
"effort": "<string>"
},
"safety_identifier": "<string>",
"service_tier": "<string>",
"stream_options": {
"include_usage": true
},
"store": true,
"temperature": 123,
"text": {
"max_tokens": 123
},
"top_logprobs": 123,
"top_p": 123,
"tool_choice": "auto",
"tools": [
{
"type": "function",
"function": {
"name": "<string>",
"arguments": {}
}
}
],
"truncation": "<string>"
}
'
{
"id": "chatcmpl-123",
"object": "chat.completion",
"choices": [
{
"index": 0,
"message": {
"role": "user",
"content": "Hello, how are you?",
"tool_call_id": "<string>",
"tool_calls": [
{
"function": {
"name": "get_weather",
"arguments": "{\"location\": \"San Francisco, CA\"}"
},
"id": "tool_123",
"type": "function"
}
],
"refusal": "<string>",
"annotations": [
{
"type": "<string>",
"url_citation": {
"start_index": 123,
"end_index": 123,
"title": "<string>",
"url": "<string>",
"sources": "<unknown>",
"type": "<string>"
}
}
],
"thought": "<string>"
},
"finish_reason": "stop",
"stop": "<string>",
"log_probs": {
"content": [
{
"logprob": -0.123,
"token": "hello",
"bytes": [
123
],
"top_logprobs": [
{
"logprob": -0.456,
"token": "world",
"bytes": [
123
]
}
]
}
],
"refusal": [
{
"logprob": -0.456,
"token": "world",
"bytes": [
123
]
}
]
}
}
],
"data": [
{
"index": 123,
"object": "<string>",
"embedding": [
123
]
}
],
"speech": {
"usage": {
"characters": 123
},
"audio": "aSDinaTvuI8gbWludGxpZnk="
},
"transcribe": {
"text": "<string>",
"logprobs": [
{
"token": "<string>",
"log_prob": 123
}
],
"usage": {
"prompt_tokens": 123,
"completion_tokens": 123,
"total_tokens": 123
}
},
"messages": [
{
"role": "user",
"content": "<string>"
}
],
"conversation_id": "<string>",
"finish_reason": "<string>",
"stop_reason": "<string>",
"stop_sequence": "<string>",
"prompt_cache": {
"status": "<string>"
},
"model": "gpt-4o",
"created": 1677652288,
"service_tier": "<string>",
"system_fingerprint": "<string>",
"usage": {
"prompt_tokens": 56,
"completion_tokens": 31,
"total_tokens": 87,
"completion_tokens_details": {
"reasoning_tokens": 123,
"audio_tokens": 123,
"accepted_prediction_tokens": 123,
"rejected_prediction_tokens": 123
}
},
"extra_fields": {
"provider": "openai",
"request_type": "list_models",
"model_requested": "<string>",
"model_params": {
"temperature": 0.7,
"top_p": 0.9,
"top_k": 40,
"max_tokens": 1000,
"stop_sequences": [
"\n\n",
"END"
],
"presence_penalty": 0,
"frequency_penalty": 0,
"tools": [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get current weather for a location",
"parameters": {
"type": "object",
"description": "<string>",
"properties": {},
"required": [
"<string>"
],
"enum": [
"<string>"
]
}
},
"id": "<string>"
}
],
"tool_choice": {
"type": "auto",
"function": {
"name": "get_weather"
}
},
"parallel_tool_calls": true
},
"latency": 1234,
"billed_usage": {
"prompt_tokens": 123,
"completion_tokens": 123,
"search_units": 123,
"classifications": 123
},
"raw_response": {}
}
}
Azure deployment ID
Model identifier in 'provider/model' format
Simple text input for the response
auto, any, none, required, tool
OpenAI-compatible responses response
Unique response identifier
"chatcmpl-123"
Response type
text.completion, chat.completion, embedding, speech, transcribe, responses.completion "chat.completion"
Array of completion choices for chat and text completions. Not present for responses type.
Show child attributes
Choice index
0
Show child attributes
Role of the message sender
user, assistant, system, tool "user"
Message content - can be simple text or structured content with text and images
"Hello, how are you?"
ID of the tool call (for tool messages)
Tool calls made by assistant
Show child attributes
Unique tool call identifier
"tool_123"
Tool call type
function "function"
Refusal message from assistant
Message annotations
Show child attributes
Annotation type
Show child attributes
Start index in the text
End index in the text
Citation title
Citation URL
Citation sources
Citation type
Assistant's internal thought process
Reason completion stopped
stop, length, tool_calls, content_filter, function_call "stop"
Stop sequence that ended generation
Show child attributes
Log probabilities for content
Show child attributes
Log probability
-0.123
Token
"hello"
Byte representation
Show child attributes
The conversation ID.
The reason the model stopped generating tokens.
The reason the model stopped generating tokens.
The stop sequence that was generated.
Model used for generation
"gpt-4o"
Unix timestamp of creation
1677652288
Service tier used
System fingerprint
Show child attributes
Tokens in the prompt
56
Tokens in the completion
31
Total tokens used
87
Show child attributes
Tokens used for reasoning
Tokens used for audio
Accepted prediction tokens
Rejected prediction tokens
Show child attributes
AI model provider
openai, anthropic, azure, bedrock, cohere, vertex, mistral, ollama, gemini, groq, openrouter, sgl, parasail, elevenlabs, perplexity, cerebras "openai"
Request type
list_models, text_completion, chat_completion, chat_completion_stream, responses, responses_stream, embedding, speech, speech_stream, transcription, transcription_stream
Model requested
Show child attributes
Controls randomness in the output
0 <= x <= 2
0.7
Nucleus sampling parameter
0 <= x <= 1
0.9
Top-k sampling parameter
x >= 1
40
Maximum number of tokens to generate
x >= 1
1000
Sequences that stop generation
["\n\n", "END"]Penalizes repeated tokens
-2 <= x <= 20
Penalizes frequent tokens
-2 <= x <= 20
Available tools for the model
Show child attributes
Tool type
function "function"
Show child attributes
Function name
"get_weather"
Function description
"Get current weather for a location"
Show child attributes
Parameter type
"object"
Parameter description
Parameter properties (JSON Schema)
Required parameter names
Enum values for parameters
Unique tool identifier
Show child attributes
How tools should be chosen
none, auto, any, function, required "auto"
Enable parallel tool execution
true
Request latency in milliseconds
1234
Raw provider response
Was this page helpful?