POST /litellm/v1/audio/transcriptions
Create transcription (LiteLLM - OpenAI Whisper)
curl --request POST \
  --url http://localhost:8080/litellm/v1/audio/transcriptions \
  --header 'Content-Type: multipart/form-data' \
  --form model=whisper-1 \
  --form file='@example-file' \
  --form 'language=<string>' \
  --form 'prompt=<string>' \
  --form response_format=json \
  --form temperature=0.5 \
  --form timestamp_granularities=word \
  --form stream=true \
  --form 'fallbacks=<string>'
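
The same request can be sent from Python. A minimal sketch, assuming the proxy is reachable at http://localhost:8080 and that audio.mp3 is a placeholder path for a local audio file; only model and file are required, the remaining form fields are optional.

# Minimal sketch: multipart transcription request against the LiteLLM proxy.
# Assumes the proxy at http://localhost:8080 and a local file "audio.mp3".
import requests

url = "http://localhost:8080/litellm/v1/audio/transcriptions"

with open("audio.mp3", "rb") as audio:
    response = requests.post(
        url,
        files={"file": audio},             # required: audio file to transcribe
        data={
            "model": "whisper-1",          # required: model identifier
            "response_format": "json",     # optional: json, text, srt, verbose_json, vtt
            "temperature": 0.5,            # optional: 0 <= x <= 1
        },
        timeout=60,
    )

response.raise_for_status()
print(response.json()["text"])

A successful request returns a JSON body like the following: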
{
  "duration": 123,
  "language": "<string>",
  "logprobs": [
    {
      "bytes": [
        123
      ],
      "logprob": 123,
      "token": "<string>"
    }
  ],
  "segments": [
    {
      "id": 123,
      "seek": 123,
      "start": 123,
      "end": 123,
      "text": "<string>",
      "tokens": [
        123
      ],
      "temperature": 123,
      "avg_logprob": 123,
      "compression_ratio": 123,
      "no_speech_prob": 123
    }
  ],
  "task": "<string>",
  "text": "<string>",
  "usage": {
    "type": "tokens",
    "input_tokens": 123,
    "input_token_details": {
      "text_tokens": 123,
      "audio_tokens": 123
    },
    "output_tokens": 123,
    "total_tokens": 123,
    "seconds": 123
  },
  "words": [
    {
      "word": "<string>",
      "start": 123,
      "end": 123
    }
  ],
  "extra_fields": {
    "request_type": "<string>",
    "provider": "openai",
    "model_requested": "<string>",
    "model_deployment": "<string>",
    "latency": 123,
    "chunk_index": 123,
    "raw_request": {},
    "raw_response": {},
    "cache_debug": {
      "cache_hit": true,
      "cache_id": "<string>",
      "hit_type": "<string>",
      "provider_used": "<string>",
      "model_used": "<string>",
      "input_tokens": 123,
      "threshold": 123,
      "similarity": 123
    }
  }
}
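
A short sketch of reading this payload, assuming response is the requests.Response from the Python example above. Note that segments and words are typically populated only for verbose_json output (and, for words, only when timestamp_granularities=word was requested), so they are read defensively here.

# Sketch: pull the transcript and the optional verbose_json extras out of the
# response body shown above. Optional keys are read with .get() because most
# of them are absent unless response_format=verbose_json was requested.
result = response.json()

print(result["text"])                                  # full transcript
print(result.get("language"), result.get("duration"))

for segment in result.get("segments", []):
    print(f'{segment["start"]:.2f}-{segment["end"]:.2f} {segment["text"]}')

for word in result.get("words", []):                   # needs timestamp_granularities=word
    print(word["word"], word["start"], word["end"])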

Body

multipart/form-data

model
string
required
Model identifier (e.g., whisper-1)
Example: "whisper-1"

file
file
required
Audio file to transcribe

language
string
Language of the audio (ISO 639-1)

prompt
string
Prompt to guide transcription

response_format
enum<string>
Available options: json, text, srt, verbose_json, vtt

temperature
number
Required range: 0 <= x <= 1

timestamp_granularities
enum<string>[]
Available options: word, segment

stream
boolean

fallbacks
string[]
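
For word-level timestamps, the two output options are typically combined: verbose_json output plus timestamp_granularities=word. A hedged sketch, reusing the hypothetical proxy URL and audio.mp3 path from the example above:

# Sketch: request word-level timestamps. Assumes verbose_json output is needed
# for the "words" array to be populated, mirroring upstream Whisper behavior.
import requests

with open("audio.mp3", "rb") as audio:
    resp = requests.post(
        "http://localhost:8080/litellm/v1/audio/transcriptions",
        files={"file": audio},
        data={
            "model": "whisper-1",
            "response_format": "verbose_json",
            "timestamp_granularities": "word",   # available options: word, segment
        },
        timeout=60,
    )

for word in resp.json().get("words", []):
    print(word["word"], word["start"], word["end"])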

Response

Successful response

duration
number

language
string

logprobs
object[]

segments
object[]

task
string

text
string

usage
object

words
object[]

extra_fields
object
Additional fields included in responses
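
extra_fields carries gateway-side request metadata (provider, model deployment, latency, cache diagnostics) alongside the transcription itself. A short sketch, assuming result is the parsed JSON body from the example above, that logs these fields when present:

# Sketch: inspect the gateway-added metadata. All keys are read defensively
# because the contents of extra_fields can vary per request.
extra = result.get("extra_fields", {})
print("provider:", extra.get("provider"))
print("model deployment:", extra.get("model_deployment"))
print("latency:", extra.get("latency"))

cache = extra.get("cache_debug") or {}
if cache.get("cache_hit"):
    print("cache hit:", cache.get("cache_id"), "similarity:", cache.get("similarity"))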