Azure OpenAI Compatible Text Completions

curl --request POST \
  --url http://localhost:8080/openai/deployments/{deployment-id}/completions \
  --header 'Content-Type: application/json' \
  --data '{
  "model": "anthropic/claude-2.1",
  "prompt": "The benefits of artificial intelligence include",
  "stream": false,
  "best_of": 123,
  "echo": false,
  "frequency_penalty": 0,
  "logit_bias": {},
  "logprobs": 123,
  "max_tokens": 1000,
  "n": 1,
  "presence_penalty": 0,
  "seed": 123,
  "stop": "<string>",
  "stream_options": {
    "include_usage": true
  },
  "suffix": "<string>",
  "temperature": 1,
  "top_p": 0.5,
  "user": "<string>",
  "fallbacks": [
    "anthropic/claude-3-sonnet-20240229",
    "openai/gpt-4o"
  ]
}'

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "user",
        "content": "Hello, how are you?",
        "tool_call_id": "<string>",
        "tool_calls": [
          {
            "id": "tool_123",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"location\": \"San Francisco, CA\"}"
            }
          }
        ],
        "refusal": "<string>",
        "annotations": [
          {
            "type": "<string>",
            "url_citation": {
              "start_index": 123,
              "end_index": 123,
              "title": "<string>",
              "url": "<string>",
              "sources": "<any>",
              "type": "<string>"
            }
          }
        ],
        "thought": "<string>"
      },
      "finish_reason": "stop",
      "stop": "<string>",
      "log_probs": {
        "content": [
          {
            "bytes": [
              123
            ],
            "logprob": -0.123,
            "token": "hello",
            "top_logprobs": [
              {
                "bytes": [
                  123
                ],
                "logprob": -0.456,
                "token": "world"
              }
            ]
          }
        ],
        "refusal": [
          {
            "bytes": [
              123
            ],
            "logprob": -0.456,
            "token": "world"
          }
        ]
      }
    }
  ],
  "data": [
    {
      "index": 123,
      "object": "<string>",
      "embedding": [
        123
      ]
    }
  ],
  "speech": {
    "usage": {
      "characters": 123
    },
    "audio": "aSDinaTvuI8gbWludGxpZnk="
  },
  "transcribe": {
    "text": "<string>",
    "logprobs": [
      {
        "token": "<string>",
        "log_prob": 123
      }
    ],
    "usage": {
      "prompt_tokens": 123,
      "completion_tokens": 123,
      "total_tokens": 123
    }
  },
  "messages": [
    {
      "role": "user",
      "content": "<string>"
    }
  ],
  "conversation_id": "<string>",
  "finish_reason": "<string>",
  "stop_reason": "<string>",
  "stop_sequence": "<string>",
  "prompt_cache": {
    "status": "<string>"
  },
  "model": "gpt-4o",
  "created": 1677652288,
  "service_tier": "<string>",
  "system_fingerprint": "<string>",
  "usage": {
    "prompt_tokens": 56,
    "completion_tokens": 31,
    "total_tokens": 87,
    "completion_tokens_details": {
      "reasoning_tokens": 123,
      "audio_tokens": 123,
      "accepted_prediction_tokens": 123,
      "rejected_prediction_tokens": 123
    }
  },
  "extra_fields": {
    "provider": "openai",
    "request_type": "list_models",
    "model_requested": "<string>",
    "model_params": {
      "temperature": 0.7,
      "top_p": 0.9,
      "top_k": 40,
      "max_tokens": 1000,
      "stop_sequences": [
        "\n\n",
        "END"
      ],
      "presence_penalty": 0,
      "frequency_penalty": 0,
      "tools": [
        {
          "id": "<string>",
          "type": "function",
          "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
              "type": "object",
              "description": "<string>",
              "properties": {},
              "required": [
                "<string>"
              ],
              "enum": [
                "<string>"
              ]
            }
          }
        }
      ],
      "tool_choice": {
        "type": "auto",
        "function": {
          "name": "get_weather"
        }
      },
      "parallel_tool_calls": true
    },
    "latency": 1234,
    "billed_usage": {
      "prompt_tokens": 123,
      "completion_tokens": 123,
      "search_units": 123,
      "classifications": 123
    },
    "raw_response": {}
  }
}
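
A response shaped like the example above can be reduced to the handful of fields most callers read. The following is a minimal sketch, not part of the API: the helper name `summarize_response` is hypothetical, and it only touches keys that appear in the example response, treating each of them as optional.

```python
def summarize_response(resp: dict) -> dict:
    """Pick out model, first-choice content, finish reason, and token total.

    Every key is read defensively because the response schema is shared
    across several response types and many fields may be absent.
    """
    usage = resp.get("usage") or {}
    choices = resp.get("choices") or []
    first = choices[0] if choices else {}
    message = first.get("message") or {}
    return {
        "model": resp.get("model"),
        "content": message.get("content"),
        "finish_reason": first.get("finish_reason"),
        "total_tokens": usage.get("total_tokens"),
    }
```

Missing sections come back as `None` rather than raising, which suits a schema where `choices`, `data`, `speech`, and `transcribe` are alternatives.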

POST /openai/deployments/{deployment-id}/completions

Path Parameters

deployment-id

string

required

Azure deployment ID
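
As an illustration of how the path parameter slots into the endpoint, here is a small sketch that builds the request URL. The base URL is the one used in the curl example; the function name `completions_url` and the deployment ID `my-deployment` are hypothetical.

```python
from urllib.parse import quote

BASE_URL = "http://localhost:8080"  # host and port from the curl example


def completions_url(deployment_id: str, base_url: str = BASE_URL) -> str:
    """Return the text completions URL for one deployment."""
    if not deployment_id:
        raise ValueError("deployment_id is required")
    # Percent-encode the ID so characters such as '/' cannot change the path.
    encoded = quote(deployment_id, safe="")
    return f"{base_url.rstrip('/')}/openai/deployments/{encoded}/completions"


print(completions_url("my-deployment"))
```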

Body

application/json

model

string

required

Model identifier in 'provider/model' format (e.g., 'anthropic/claude-2.1')

Example:

"anthropic/claude-2.1"

prompt

Text prompt for completion

Example:

"The benefits of artificial intelligence include"

stream

boolean

default:false

If set, partial message deltas will be sent, like in ChatGPT. Tokens will be sent as data-only server-sent events as they become available.
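
A client consuming the stream has to pick the `data:` lines out of the event stream and decode each one. The sketch below shows one way to do that over an iterable of already-decoded text lines. Two things here are assumptions rather than documented behavior: that the stream ends with a `[DONE]` sentinel (the OpenAI convention), and that each chunk carries its text under `choices[0].text`.

```python
import json
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the decoded JSON payload of each data-only server-sent event."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed end-of-stream sentinel
            break
        yield json.loads(payload)


# Hypothetical chunks, shaped after the OpenAI text completions stream.
sample = [
    'data: {"choices": [{"text": "Hel"}]}',
    "",
    'data: {"choices": [{"text": "lo"}]}',
    "data: [DONE]",
]
text = "".join(chunk["choices"][0]["text"] for chunk in iter_sse_data(sample))
print(text)
```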

best_of

integer

Generates best_of completions server-side and returns the 'best' one. See n for comparison.

echo

boolean

default:false

Echo back the prompt in addition to the completion.

frequency_penalty

number

Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

Required range: -2 <= x <= 2

logit_bias

object

Modify the likelihood of specified tokens appearing in the completion. Accepts a JSON object that maps tokens to an associated bias value from -100 to 100.
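
To make the shape of this object concrete, the sketch below validates a token-to-bias mapping against the documented -100 to 100 range before it is placed in the request body. The helper name and the token key `"50256"` are hypothetical; which token identifiers are valid depends on the model's tokenizer.

```python
def make_logit_bias(biases: dict) -> dict:
    """Return a copy of the mapping after checking every bias is in [-100, 100]."""
    for token, bias in biases.items():
        if not -100 <= bias <= 100:
            raise ValueError(f"bias for token {token!r} must be in [-100, 100], got {bias}")
    return dict(biases)


# Hypothetical token key; a strongly negative bias makes the token unlikely.
logit_bias = make_logit_bias({"50256": -100})
```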

logprobs

integer

Include the log probabilities on the logprobs most likely tokens, as well as the chosen tokens.

max_tokens

integer

Maximum number of tokens to generate

Required range: x >= 1

Example:

1000

n

integer

default:1

How many completions to generate for each prompt.

presence_penalty

number

Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

Required range: -2 <= x <= 2

seed

integer

This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
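
One way to act on this advice is to compare `system_fingerprint` across two responses produced with the same seed and parameters. The following is a sketch under that reading; the helper name `same_backend` and the fingerprint values are hypothetical.

```python
def same_backend(first: dict, second: dict) -> bool:
    """Return True only if both responses report the same system_fingerprint.

    A missing fingerprint is treated as "unknown", so it never counts as a match.
    """
    fp_first = first.get("system_fingerprint")
    fp_second = second.get("system_fingerprint")
    return fp_first is not None and fp_first == fp_second
```

If this returns `False` for two otherwise identical seeded requests, differing outputs may stem from a backend change rather than from sampling.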

stop

Up to 4 sequences where the API will stop generating further tokens.
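
A client can enforce the four-sequence limit before sending the request. The sketch below assumes the field accepts either a single string (as in the curl example) or a list of strings; the list form is an assumption, and the helper name `normalize_stop` is hypothetical.

```python
from typing import List, Optional, Union

MAX_STOP_SEQUENCES = 4  # limit stated in the parameter description


def normalize_stop(stop: Union[str, List[str], None]) -> Optional[List[str]]:
    """Return stop sequences as a list, rejecting more than four entries."""
    if stop is None:
        return None
    sequences = [stop] if isinstance(stop, str) else list(stop)
    if len(sequences) > MAX_STOP_SEQUENCES:
        raise ValueError(f"at most {MAX_STOP_SEQUENCES} stop sequences are allowed")
    return sequences
```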

stream_options

object

suffix

string

The suffix that comes after a completion of inserted text.

temperature

number

Controls randomness in the output. Higher values make the output more random, while lower values make it more deterministic.

Required range: 0 <= x <= 2

top_p

number

Controls diversity via nucleus sampling. 0.5 means half of all likelihood-weighted options are considered.

Required range: 0 <= x <= 1
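
The required ranges listed for the numeric parameters can be checked client-side before a request is sent. This is a minimal sketch that encodes only the ranges stated on this page; the function name `check_sampling_params` is hypothetical, and the server remains the authority on what it accepts.

```python
# Inclusive ranges copied from the parameter descriptions on this page.
RANGES = {
    "frequency_penalty": (-2, 2),
    "presence_penalty": (-2, 2),
    "temperature": (0, 2),
    "top_p": (0, 1),
}


def check_sampling_params(body: dict) -> list:
    """Return a list of human-readable range violations (empty if none)."""
    errors = []
    for name, (low, high) in RANGES.items():
        if name in body and not low <= body[name] <= high:
            errors.append(f"{name} must be between {low} and {high}, got {body[name]}")
    if "max_tokens" in body and body["max_tokens"] < 1:
        errors.append(f"max_tokens must be at least 1, got {body['max_tokens']}")
    return errors
```

Returning all violations at once, instead of raising on the first, lets a caller report every problem in a single pass.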

user

string

A unique identifier representing your end-user, which can help OpenAI to monitor and detect abuse.

fallbacks

string[]

Fallback model names in 'provider/model' format

Example:

[
  "anthropic/claude-3-sonnet-20240229",
  "openai/gpt-4o"
]
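
Both `model` and `fallbacks` use the 'provider/model' format, so a client can validate them together when assembling the request body. The sketch below does that; the helper names are hypothetical, and the check is purely syntactic (it does not know which providers or models the server supports).

```python
import json


def check_model_id(model_id: str) -> str:
    """Return model_id unchanged if it looks like 'provider/model', else raise."""
    provider, sep, model = model_id.partition("/")
    if not (sep and provider and model):
        raise ValueError(f"expected 'provider/model', got {model_id!r}")
    return model_id


def build_body(model: str, prompt: str, fallbacks=()) -> dict:
    """Assemble a request body with a validated model and fallback list."""
    body = {"model": check_model_id(model), "prompt": prompt}
    if fallbacks:
        body["fallbacks"] = [check_model_id(m) for m in fallbacks]
    return body


body = build_body(
    "anthropic/claude-2.1",
    "The benefits of artificial intelligence include",
    fallbacks=["anthropic/claude-3-sonnet-20240229", "openai/gpt-4o"],
)
print(json.dumps(body, indent=2))
```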

Response

Azure OpenAI-compatible text completion response

id

string

Unique response identifier

Example:

"chatcmpl-123"

object

enum<string>

Response type

Available options:

text.completion,

chat.completion,

embedding,

speech,

transcribe,

responses.completion

Example:

"chat.completion"

choices

object[]

Array of completion choices for chat and text completions. Not present for responses type.

data

object[]

Array of embedding objects

speech

object

transcribe

object

messages

object[]

Array of messages for responses type.

conversation_id

string

The conversation ID.

finish_reason

string

The reason the model stopped generating tokens.

stop_reason

string

The reason the model stopped generating tokens.

stop_sequence

string

The stop sequence that was generated.

prompt_cache

object

model

string

Model used for generation

Example:

"gpt-4o"

created

integer

Unix timestamp of creation

Example:

1677652288

service_tier

string

Service tier used

system_fingerprint

string

System fingerprint

usage

object

extra_fields

object
