POST /litellm/v1/audio/transcriptions
Create transcription (LiteLLM - OpenAI Whisper)
curl --request POST \
  --url http://localhost:8080/litellm/v1/audio/transcriptions \
  --header 'Content-Type: multipart/form-data' \
  --form model=whisper-1 \
  --form file='@example-file' \
  --form 'language=<string>' \
  --form 'prompt=<string>' \
  --form response_format=json \
  --form temperature=0.5 \
  --form timestamp_granularities=word \
  --form stream=true \
  --form 'fallbacks=<string>'
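
The same request can be sent from Python. A minimal sketch, assuming the proxy is reachable at http://localhost:8080 and that audio.mp3 is a placeholder path for a local audio file; only model and file are required, the remaining form fields are optional.

# Minimal sketch: multipart transcription request against the LiteLLM proxy.
# Assumes the proxy at http://localhost:8080 and a local file "audio.mp3".
import requests

url = "http://localhost:8080/litellm/v1/audio/transcriptions"

with open("audio.mp3", "rb") as audio:
    response = requests.post(
        url,
        files={"file": audio},             # required: audio file to transcribe
        data={
            "model": "whisper-1",          # required: model identifier
            "response_format": "json",     # optional: json, text, srt, verbose_json, vtt
            "temperature": 0.5,            # optional: 0 <= x <= 1
        },
        timeout=60,
    )

response.raise_for_status()
print(response.json()["text"])

A successful request returns a JSON body like the following: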
{
  "duration": 123,
  "language": "<string>",
  "logprobs": [
    {
      "bytes": [
        123
      ],
      "logprob": 123,
      "token": "<string>"
    }
  ],
  "segments": [
    {
      "id": 123,
      "seek": 123,
      "start": 123,
      "end": 123,
      "text": "<string>",
      "tokens": [
        123
      ],
      "temperature": 123,
      "avg_logprob": 123,
      "compression_ratio": 123,
      "no_speech_prob": 123
    }
  ],
  "task": "<string>",
  "text": "<string>",
  "usage": {
    "type": "tokens",
    "input_tokens": 123,
    "input_token_details": {
      "text_tokens": 123,
      "audio_tokens": 123
    },
    "output_tokens": 123,
    "total_tokens": 123,
    "seconds": 123
  },
  "words": [
    {
      "word": "<string>",
      "start": 123,
      "end": 123
    }
  ],
  "extra_fields": {
    "request_type": "<string>",
    "provider": "openai",
    "model_requested": "<string>",
    "model_deployment": "<string>",
    "latency": 123,
    "chunk_index": 123,
    "raw_request": {},
    "raw_response": {},
    "cache_debug": {
      "cache_hit": true,
      "cache_id": "<string>",
      "hit_type": "<string>",
      "provider_used": "<string>",
      "model_used": "<string>",
      "input_tokens": 123,
      "threshold": 123,
      "similarity": 123
    }
  }
}
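
A short sketch of reading this payload, assuming response is the requests.Response from the Python example above. Note that segments and words are typically populated only for verbose_json output (and, for words, only when timestamp_granularities=word was requested), so they are read defensively here.

# Sketch: pull the transcript and the optional verbose_json extras out of the
# response body shown above. Optional keys are read with .get() because most
# of them are absent unless response_format=verbose_json was requested.
result = response.json()

print(result["text"])                                  # full transcript
print(result.get("language"), result.get("duration"))

for segment in result.get("segments", []):
    print(f'{segment["start"]:.2f}-{segment["end"]:.2f} {segment["text"]}')

for word in result.get("words", []):                   # needs timestamp_granularities=word
    print(word["word"], word["start"], word["end"])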

Body

multipart/form-data

model
string
required
Model identifier (e.g., whisper-1)
Example: "whisper-1"

file
file
required
Audio file to transcribe

language
string
Language of the audio (ISO 639-1)

prompt
string
Prompt to guide transcription

response_format
enum<string>
Available options: json, text, srt, verbose_json, vtt

temperature
number
Required range: 0 <= x <= 1

timestamp_granularities
enum<string>[]
Available options: word, segment

stream
boolean

fallbacks
string[]
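
For word-level timestamps, the two output options are typically combined: verbose_json output plus timestamp_granularities=word. A hedged sketch, reusing the hypothetical proxy URL and audio.mp3 path from the example above:

# Sketch: request word-level timestamps. Assumes verbose_json output is needed
# for the "words" array to be populated, mirroring upstream Whisper behavior.
import requests

with open("audio.mp3", "rb") as audio:
    resp = requests.post(
        "http://localhost:8080/litellm/v1/audio/transcriptions",
        files={"file": audio},
        data={
            "model": "whisper-1",
            "response_format": "verbose_json",
            "timestamp_granularities": "word",   # available options: word, segment
        },
        timeout=60,
    )

for word in resp.json().get("words", []):
    print(word["word"], word["start"], word["end"])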

Response

Successful response

duration
number

language
string

logprobs
object[]

segments
object[]

task
string

text
string

usage
object

words
object[]

extra_fields
object
Additional fields included in responses
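
extra_fields carries gateway-side request metadata (provider, model deployment, latency, cache diagnostics) alongside the transcription itself. A short sketch, assuming result is the parsed JSON body from the example above, that logs these fields when present:

# Sketch: inspect the gateway-added metadata. All keys are read defensively
# because the contents of extra_fields can vary per request.
extra = result.get("extra_fields", {})
print("provider:", extra.get("provider"))
print("model deployment:", extra.get("model_deployment"))
print("latency:", extra.get("latency"))

cache = extra.get("cache_debug") or {}
if cache.get("cache_hit"):
    print("cache hit:", cache.get("cache_id"), "similarity:", cache.get("similarity"))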