Convert speech to text using file upload with the Lightning STT POST API
- Raw Bytes (application/octet-stream) - Send raw audio data with all parameters as query parameters
- Audio URL (application/json) - Provide only a URL to an audio file in the JSON body, with all other parameters as query parameters

| Method | Content Type | Use Case | Parameters |
|---|---|---|---|
| Raw Bytes | application/octet-stream | Streaming audio data, real-time processing | Query parameters |
| Audio URL | application/json | Remote audio files, webhook processing | Query parameters |
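
The two input methods differ only in the request body and Content-Type header. The sketch below builds (but does not send) one request of each kind with Python's standard library. The endpoint URL, the `url` key in the JSON body, and the query parameter names `model` and `language` are assumptions made for illustration; check them against the actual API reference before use.

```python
import json
import urllib.parse
import urllib.request

# Placeholder endpoint (assumption): substitute the real Lightning STT URL.
ENDPOINT = "https://api.example.com/lightning/stt"


def build_raw_bytes_request(audio: bytes, api_key: str, **params: str) -> urllib.request.Request:
    """Raw Bytes method: audio in the body, every option in the query string."""
    url = f"{ENDPOINT}?{urllib.parse.urlencode(params)}"
    return urllib.request.Request(
        url,
        data=audio,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/octet-stream",
        },
    )


def build_audio_url_request(audio_url: str, api_key: str, **params: str) -> urllib.request.Request:
    """Audio URL method: the JSON body carries only the URL; options stay in the query string."""
    url = f"{ENDPOINT}?{urllib.parse.urlencode(params)}"
    # The "url" key name is an assumption, not confirmed by the text above.
    body = json.dumps({"url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


raw_req = build_raw_bytes_request(b"abc", "YOUR_API_KEY", model="lightning", language="en")
url_req = build_audio_url_request("https://example.com/a.wav", "YOUR_API_KEY", model="lightning")
```

Passing either object to `urllib.request.urlopen` would send it; that step is left out here so the sketch runs without network access.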
Set the language to multi to enable automatic language detection from the supported list. The default is en (English).
API key authentication using Bearer token format.
Include your API key in the Authorization header as: Bearer YOUR_API_KEY
The ASR model to use for transcription
Available options: lightning
"lightning"
Language of the audio file. Use multi for automatic language detection
Available options: it, es, en, pt, hi, de, fr, uk, ru, kn, ml, pl, mr, gu, cs, sk, te, or, nl, bn, lv, et, ro, pa, fi, sv, bg, ta, hu, da, lt, mt, multi
URL to the webhook to receive the transcription results
"https://example.com/webhook"
Extra parameters to pass to the transcription. These will be added to the request body as a JSON object. Add comma-separated key-value pairs to the query string, e.g. "custom_key:custom_value,custom_key2:custom_value2"
"custom_key:custom_value,custom_key2:custom_value2"
Whether to include word and utterance level timestamps in the response
Whether to perform speaker diarization
Whether to predict the age group of the speaker
Available options: true, false
Whether to predict the gender of the speaker
Available options: true, false
Whether to predict speaker emotions
Available options: true, false
Raw audio bytes. Content-Type header should specify the audio format (e.g., audio/wav, audio/mp3). All parameters are passed as query parameters.
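
Since every option travels in the query string, it can help to assemble that string in one place. The following is a minimal sketch, not the official client: the parameter names (`model`, `language`, `webhook_extra`, and flag names such as `word_timestamps` or `diarize`) are assumed for illustration, since the text above describes the parameters without naming them. It also assumes the key-value format has no escaping mechanism.

```python
from typing import Optional
from urllib.parse import urlencode


def format_webhook_extra(extra: dict[str, str]) -> str:
    """Join pairs as "key:value,key2:value2", the format described above."""
    for key, value in extra.items():
        # Assumption: no escaping exists, so separators inside a key or
        # value would be ambiguous and are rejected here.
        if "," in key or ":" in key or "," in value:
            raise ValueError(f"unsupported character in pair {key!r}:{value!r}")
    return ",".join(f"{key}:{value}" for key, value in extra.items())


def build_query(
    model: str = "lightning",
    language: str = "en",
    webhook_extra: Optional[dict[str, str]] = None,
    **flags: bool,
) -> str:
    """Build the query string; boolean flags are sent as "true"/"false"."""
    params = {"model": model, "language": language}
    for name, enabled in flags.items():
        params[name] = "true" if enabled else "false"
    if webhook_extra:
        params["webhook_extra"] = format_webhook_extra(webhook_extra)
    return urlencode(params)


query = build_query(language="multi", word_timestamps=True, diarize=False)
print(query)
```

`urlencode` percent-encodes the `:` and `,` separators in the extras string, which is the expected behaviour for a query string value.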
Speech transcribed successfully
Status of the transcription request
"success"
The transcribed text from the audio file
"Hello world."
Duration of the audio file in seconds
1.7
Word-level timestamps in seconds.
List of utterances with start and end times
Predicted age group of the speaker (e.g., infant, teenager, adult, old)
Available options: infant, teenager, adult, old
"adult"
Predicted gender of the speaker if requested
Available options: male, female
"male"
Predicted emotions of the speaker if requested
Metadata about the transcription
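
To show how the response fields above might be consumed, here is a hedged sketch that parses a response body into a small dataclass. The JSON key names (`status`, `transcription`, `audio_length`, `words`, `utterances`, `age`, `gender`) are assumptions, because the text above gives only field descriptions and example values; adjust them to match the real payload.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Transcript:
    text: str
    duration_s: Optional[float] = None
    words: list[dict[str, Any]] = field(default_factory=list)
    utterances: list[dict[str, Any]] = field(default_factory=list)
    age: Optional[str] = None
    gender: Optional[str] = None


def parse_response(raw: str) -> Transcript:
    """Parse a response body; optional fields stay empty when not requested."""
    data = json.loads(raw)
    if data.get("status") != "success":
        raise RuntimeError(f"transcription failed: {data.get('status')!r}")
    return Transcript(
        text=data.get("transcription", ""),
        duration_s=data.get("audio_length"),
        words=data.get("words") or [],
        utterances=data.get("utterances") or [],
        age=data.get("age"),
        gender=data.get("gender"),
    )


# Sample built from the example values listed above; key names are assumed.
sample = (
    '{"status": "success", "transcription": "Hello world.", '
    '"audio_length": 1.7, "age": "adult", "gender": "male"}'
)
result = parse_response(sample)
print(result.text, result.duration_s)
```

Fields that depend on optional request flags (timestamps, age, gender) default to empty values, so the same parser works whether or not those flags were set.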