Create transcription

curl --request POST \
  --url http://localhost:8080/v1/audio/transcriptions \
  --header 'Content-Type: multipart/form-data' \
  --form 'model=<string>' \
  --form file='@example-file' \
  --form 'fallbacks=<string>' \
  --form stream=true \
  --form 'language=<string>' \
  --form 'prompt=<string>' \
  --form response_format=json \
  --form 'file_format=<string>'

{
  "duration": 123,
  "language": "<string>",
  "logprobs": [
    {
      "bytes": [
        123
      ],
      "logprob": 123,
      "token": "<string>"
    }
  ],
  "segments": [
    {
      "id": 123,
      "seek": 123,
      "start": 123,
      "end": 123,
      "text": "<string>",
      "tokens": [
        123
      ],
      "temperature": 123,
      "avg_logprob": 123,
      "compression_ratio": 123,
      "no_speech_prob": 123
    }
  ],
  "task": "<string>",
  "text": "<string>",
  "usage": {
    "type": "tokens",
    "input_tokens": 123,
    "input_token_details": {
      "text_tokens": 123,
      "audio_tokens": 123
    },
    "output_tokens": 123,
    "total_tokens": 123,
    "seconds": 123
  },
  "words": [
    {
      "word": "<string>",
      "start": 123,
      "end": 123
    }
  ],
  "extra_fields": {
    "request_type": "<string>",
    "provider": "openai",
    "model_requested": "<string>",
    "model_deployment": "<string>",
    "latency": 123,
    "chunk_index": 123,
    "raw_request": {},
    "raw_response": {},
    "cache_debug": {
      "cache_hit": true,
      "cache_id": "<string>",
      "hit_type": "<string>",
      "provider_used": "<string>",
      "model_used": "<string>",
      "input_tokens": 123,
      "threshold": 123,
      "similarity": 123
    }
  }
}
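The sample response above nests gateway metadata under extra_fields and token accounting under usage. As a minimal sketch of reading those parts from a parsed response, the helper below pulls out a few of them; describe_response is a hypothetical name, and it assumes any of these objects may be missing from a given response, which the reference does not state either way.

```python
def describe_response(resp):
    """Collect a few top-level and nested fields from a parsed transcription response.

    Assumption: extra_fields, cache_debug and usage may each be absent or null,
    so every lookup falls back to an empty dict instead of raising KeyError.
    """
    extra = resp.get("extra_fields") or {}
    cache = extra.get("cache_debug") or {}
    usage = resp.get("usage") or {}
    return {
        "text": resp.get("text", ""),
        "provider": extra.get("provider"),
        "latency": extra.get("latency"),
        "cache_hit": bool(cache.get("cache_hit", False)),
        "total_tokens": usage.get("total_tokens"),
    }


# Illustrative data shaped like the sample response; the values are made up.
sample = {
    "text": "Hello there.",
    "usage": {"type": "tokens", "input_tokens": 10, "output_tokens": 4, "total_tokens": 14},
    "extra_fields": {"provider": "openai", "latency": 120, "cache_debug": {"cache_hit": True}},
}
summary = describe_response(sample)
```

The reference does not give a unit for latency, so the sketch passes the value through unchanged.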

POST /v1/audio/transcriptions

Body

multipart/form-data

model (string, required)
  Model in provider/model format

file (required)
  Audio file to transcribe

fallbacks (string[])

stream (boolean)

language (string)

prompt (string)

response_format (enum<string>)
  Available options: json, text, srt, verbose_json, vtt

file_format (string)
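The body parameters above can also be sent without curl. The following is a rough sketch using only the Python standard library; build_multipart and transcribe are hypothetical helper names, the base URL is taken from the curl sample, and the model name in the usage note is only an illustration of the provider/model format. The fallbacks parameter is left out because the reference does not say how a string[] value is encoded in a multipart form.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(fields, file_name, file_bytes, file_content_type="application/octet-stream"):
    """Encode text fields plus one part named "file" as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
            f"Content-Type: {file_content_type}\r\n\r\n"
        ).encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path, model, base_url="http://localhost:8080", **options):
    """POST an audio file to /v1/audio/transcriptions and return the parsed JSON.

    Assumption: response_format is json or verbose_json, so the body is JSON;
    text, srt and vtt would need response.read() instead of json.load().
    """
    with open(path, "rb") as fh:
        audio = fh.read()
    fields = {"model": model, **options}
    body, content_type = build_multipart(fields, os.path.basename(path), audio)
    request = urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With a server listening on localhost:8080, a call might look like transcribe("example-file.wav", "openai/whisper-1", response_format="verbose_json", language="en"). The boundary is generated per request and placed in the Content-Type header, which is the step curl performs automatically for --form.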

Response

Successful response

duration (number)

language (string)

logprobs (object[])

segments (object[])

task (string)

text (string)

usage (object)

words (object[])

extra_fields (object)
  Additional fields included in responses
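The segments and words arrays carry per-item start and end values alongside the text. As a small sketch of walking them, the helpers below turn each entry into a printable line; segment_lines and word_lines are hypothetical names, and treating start and end as seconds is an assumption, since the reference only gives their type as number.

```python
def segment_lines(resp):
    """Return one "[start - end] text" line per entry in resp["segments"].

    Assumption: start and end are offsets in seconds.
    """
    lines = []
    for seg in resp.get("segments") or []:
        lines.append(f"[{seg['start']:.2f} - {seg['end']:.2f}] {seg['text'].strip()}")
    return lines


def word_lines(resp):
    """Return one "[start - end] word" line per entry in resp["words"]."""
    return [
        f"[{w['start']:.2f} - {w['end']:.2f}] {w['word']}"
        for w in resp.get("words") or []
    ]


# Illustrative data shaped like the documented response; the values are made up.
sample = {
    "text": "Hello there.",
    "segments": [{"id": 0, "start": 0, "end": 2.5, "text": " Hello there."}],
    "words": [
        {"word": "Hello", "start": 0.0, "end": 1.1},
        {"word": "there", "start": 1.2, "end": 2.5},
    ],
}
seg_out = segment_lines(sample)
word_out = word_lines(sample)
```

Both helpers return an empty list when the array is missing, which covers responses that include only the text field.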
