Body
Model identifier in 'provider/model' format (e.g., 'openai/gpt-4o-mini', 'anthropic/claude-3-sonnet-20240229')
"openai/gpt-4o-mini"
Array of chat messages. Minimum length: 1.
Maximum number of tokens to generate. Note: this is an alias for max_completion_tokens and will be overridden by it if both are present.
x >= 1
1000
Controls randomness in the output. Higher values make the output more random, while lower values make it more deterministic.
0 <= x <= 2
Controls diversity via nucleus sampling. 0.5 means half of all likelihood-weighted options are considered.
0 <= x <= 1
Number of chat completion choices to generate for each input message.
If set, partial message deltas will be sent, like in ChatGPT. Tokens will be sent as data-only server-sent events as they become available.
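Below is a minimal sketch of consuming the stream, assuming an OpenAI-style /v1/chat/completions endpoint that emits data-only SSE lines ending with a [DONE] sentinel; the base URL and API key are placeholders.

```python
import json

import requests

URL = "https://gateway.example.com/v1/chat/completions"  # placeholder base URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}       # placeholder key

payload = {
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Tell me a short story."}],
    "stream": True,
}

# Data-only SSE: each event arrives as a line beginning with "data: ",
# and an OpenAI-style stream ends with a literal "data: [DONE]" sentinel.
with requests.post(URL, headers=HEADERS, json=payload, stream=True) as resp:
    resp.raise_for_status()
    for raw in resp.iter_lines():
        if not raw:
            continue  # skip SSE keep-alive blank lines
        line = raw.decode("utf-8")
        if line.startswith("data: "):
            line = line[len("data: "):]
        if line == "[DONE]":
            break
        delta = json.loads(line)["choices"][0]["delta"]
        print(delta.get("content") or "", end="", flush=True)
```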
Up to 4 sequences where the API will stop generating further tokens.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
-2 <= x <= 2
Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
-2 <= x <= 2
Modify the likelihood of specified tokens appearing in the completion. Accepts a JSON object that maps tokens to an associated bias value from -100 to 100.
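The shape of logit_bias in a request body, as a brief sketch; the token IDs below are placeholders, since real IDs depend on the model's tokenizer.

```python
payload = {
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Name a color."}],
    "logit_bias": {
        "1234": -100,  # placeholder token ID: -100 effectively bans the token
        "5678": 25,    # placeholder token ID: positive values favor the token
    },
}
```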
Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the content of message.
An integer between 0 and 20. The number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used.
0 <= x <= 20
The maximum number of tokens that can be generated in the chat completion.
A set of key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format.
A list of modalities to use for the response.
Whether to enable parallel tool calls. If set to true, the model will be able to call multiple tools in a single response.
A key to use for caching the prompt.
The reasoning effort to use for the response.
A unique identifier for the safety settings to use for the response.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
The service tier to use for the response. Can be auto or default.
auto, default
Whether to store the request and response in the log store.
Controls which (if any) function is called by the model. none means the model will not call a function and instead generates a message. auto means the model can pick between generating a message or calling a function. required means the model must call a function. Specifying a particular function via {"type": "function", "function": {"name": "my_function"}} forces the model to call that function.
none, auto, required
A unique identifier representing your end-user, which can help OpenAI to monitor and detect abuse.
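A sketch of the forced function call described under tool_choice above; it assumes the endpoint also accepts an OpenAI-style tools array, and the function name and schema here are hypothetical.

```python
payload = {
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    # Hypothetical tool definition; assumes the endpoint accepts an
    # OpenAI-style tools array alongside tool_choice.
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    # Forces a call to get_weather rather than a plain text answer.
    "tool_choice": {"type": "function", "function": {"name": "get_weather"}},
}
```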
The verbosity level of the response.
Fallback model names in 'provider/model' format
[
"anthropic/claude-3-sonnet-20240229",
"openai/gpt-4o"
]
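Putting the body together, a minimal non-streaming request sketch; the base URL and API key are placeholders, and the fallback field name used here (fallback_models) is an assumption rather than a confirmed parameter name.

```python
import requests

URL = "https://gateway.example.com/v1/chat/completions"  # placeholder base URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}       # placeholder key

payload = {
    "model": "openai/gpt-4o-mini",
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize nucleus sampling in one sentence."},
    ],
    "max_tokens": 1000,
    "temperature": 0.7,
    "seed": 42,  # best-effort determinism; verify via system_fingerprint
    # Field name assumed for illustration; fallbacks are tried in order
    # when the primary model fails.
    "fallback_models": [
        "anthropic/claude-3-sonnet-20240229",
        "openai/gpt-4o",
    ],
}

resp = requests.post(URL, headers=HEADERS, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

Response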
Anthropic-compatible message response
Unique response identifier
"chatcmpl-123"
Response type
text.completion, chat.completion, embedding, speech, transcribe, responses.completion
"chat.completion"
Array of completion choices for chat and text completions. Not present for responses type.
Array of embedding objects
Array of messages for responses type.
The conversation ID.
The reason the model stopped generating tokens.
The stop sequence that was generated.
Model used for generation
"gpt-4o"
Unix timestamp of creation
1677652288
Service tier used
System fingerprint
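As a brief sketch, reading the fields above from the request example in the Body section; the names follow the OpenAI-style examples shown here (chatcmpl-123, chat.completion), and the exact fields present vary for embedding and responses types.

```python
data = resp.json()  # continuing from the request sketch in the Body section

print(data["id"])       # unique response identifier, e.g. "chatcmpl-123"
print(data["object"])   # response type, e.g. "chat.completion"
print(data["model"])    # model used for generation, e.g. "gpt-4o"
print(data["created"])  # Unix timestamp of creation, e.g. 1677652288

choice = data["choices"][0]
print(choice["finish_reason"])  # the reason the model stopped generating

# When a seed was supplied, compare this value across requests to detect
# backend changes that can break best-effort determinism.
print(data.get("system_fingerprint"))
```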

