- Raw Audio Bytes (application/octet-stream) - Send raw audio data with all parameters as query parameters
- Audio URL (application/json) - Provide only a URL to an audio file in the JSON body, with all other parameters as query parameters
Authentication
This endpoint requires authentication using a Bearer token in the Authorization header.
Input Methods
Choose the input method that best fits your use case:

| Method | Content Type | Use Case | Parameters |
|---|---|---|---|
| Raw Bytes | application/octet-stream | Streaming audio data, real-time processing | Query parameters |
| Audio URL | application/json | Remote audio files, webhook processing | Query parameters |
Code Examples
Method 1: Raw Audio Bytes (application/octet-stream)
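The following is a minimal sketch of Method 1 using only the Python standard library. It builds the request without sending it. The endpoint URL is a placeholder, and the query parameter names `model` and `language` are assumptions, since neither appears on this page; check them against the live API reference before use.

```python
import urllib.parse
import urllib.request

# Placeholder endpoint (assumption): replace with the real ASR URL.
ASR_ENDPOINT = "https://api.example.com/asr"


def build_raw_bytes_request(audio, api_key, *, language="en",
                            content_type="application/octet-stream"):
    """Build (but do not send) a POST request whose body is raw audio bytes."""
    # 'model' and 'language' are assumed query parameter names.
    query = urllib.parse.urlencode({"model": "lightning", "language": language})
    return urllib.request.Request(
        f"{ASR_ENDPOINT}?{query}",
        data=audio,  # the audio bytes are the entire request body
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            # The Body section mentions audio/wav or audio/mp3 as alternatives.
            "Content-Type": content_type,
        },
    )
```

To send it, pass the returned object to `urllib.request.urlopen` and read the JSON response from the result.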
Method 2: Audio URL (application/json)
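A comparable sketch for Method 2 sends only a JSON body that points at the audio file, with everything else in the query string. The endpoint URL is a placeholder, and the JSON key `url` and the query parameter names `model` and `language` are assumptions not confirmed by this page.

```python
import json
import urllib.parse
import urllib.request

# Placeholder endpoint (assumption): replace with the real ASR URL.
ASR_ENDPOINT = "https://api.example.com/asr"


def build_audio_url_request(audio_url, api_key, *, language="en"):
    """Build (but do not send) a POST request that references a remote audio file."""
    # 'model' and 'language' are assumed query parameter names.
    query = urllib.parse.urlencode({"model": "lightning", "language": language})
    # The body carries only the audio location; the 'url' key is hypothetical.
    body = json.dumps({"url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        f"{ASR_ENDPOINT}?{query}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Keeping the body limited to the audio location mirrors the description above, where all other options travel as query parameters.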
Supported Languages
The Lightning ASR model supports automatic language detection and transcription across 30+ languages. For the full list of supported languages, please check ASR Supported Languages.
Use multi to enable automatic language detection from the supported list. The default is en (English).
Authorizations
API key authentication using Bearer token format.
Include your API key in the Authorization header as: Bearer YOUR_API_KEY
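As a small illustration of the header format, the helper below assembles the Authorization header from an API key. The environment variable name `ASR_API_KEY` is a hypothetical choice for this sketch, not something the service defines.

```python
import os
from typing import Optional


def bearer_header(api_key: Optional[str] = None) -> dict:
    """Return the Authorization header in 'Bearer YOUR_API_KEY' form."""
    # Fall back to a (hypothetical) environment variable when no key is passed.
    key = api_key if api_key is not None else os.environ.get("ASR_API_KEY", "")
    key = key.strip()
    if not key:
        raise ValueError("missing API key")
    return {"Authorization": f"Bearer {key}"}
```

Reading the key from the environment keeps it out of source control; failing early on an empty key avoids sending a request that is certain to be rejected.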
Query Parameters
The ASR model to use for transcription
Allowed values: lightning. Example: "lightning"
Language of the audio file. Use multi for automatic language detection
Allowed values: it, es, en, pt, hi, de, fr, uk, ru, kn, ml, pl, mr, gu, cs, sk, te, or, nl, bn, lv, et, ro, pa, fi, sv, bg, ta, hu, da, lt, mt, multi
Whether to include word-level timestamps in the response
Whether to predict the age group of the speaker
Allowed values: true, false
Whether to predict the gender of the speaker
Allowed values: true, false
Whether to predict speaker emotions
Allowed values: true, false
Body
Raw audio bytes. Content-Type header should specify the audio format (e.g., audio/wav, audio/mp3). All parameters are passed as query parameters.
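Because every option travels in the query string, it can help to validate values before building the URL. The sketch below checks the language code against the list given on this page and serializes boolean flags as `true` or `false`. The parameter names `model` and `language`, and any flag names passed as keyword arguments (for example `word_timestamps`), are hypothetical, since the page does not show the real names.

```python
import urllib.parse

# Language codes as listed under Query Parameters, plus "multi" for auto-detection.
SUPPORTED_LANGUAGES = frozenset(
    "it es en pt hi de fr uk ru kn ml pl mr gu cs sk te or nl bn lv et ro "
    "pa fi sv bg ta hu da lt mt multi".split()
)


def build_query(language="en", **flags):
    """Return a URL-encoded query string for a transcription request."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {language!r}")
    # 'model' and 'language' are assumed parameter names.
    params = {"model": "lightning", "language": language}
    for name, enabled in flags.items():
        # Boolean options are sent as the literal strings "true" / "false".
        params[name] = "true" if enabled else "false"
    return urllib.parse.urlencode(params)
```

For example, `build_query("multi", word_timestamps=True)` produces a query string requesting automatic language detection with one flag switched on.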
Response
Speech transcribed successfully
Status of the transcription request
Example: "success"
The transcribed text from the audio file
Example: "Hello world."
Duration of the audio file in seconds
Example: 1.7
Word-level timestamps in seconds.
Predicted age group of the speaker
Allowed values: infant, teenager, adult, old. Example: "adult"
Predicted gender of the speaker if requested
Allowed values: male, female. Example: "male"
Predicted emotions of the speaker if requested
Metadata about the transcription
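The response fields described above could be read into a small typed structure, as in the sketch below. The JSON key names (`status`, `transcription`, `audio_length`, `age`, `gender`) are hypothetical, because this page lists the field descriptions without their names; adjust them to match the actual payload.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transcription:
    status: str
    text: str
    duration_seconds: float
    age: Optional[str] = None     # present only when age prediction is requested
    gender: Optional[str] = None  # present only when gender prediction is requested


def parse_transcription(payload: str) -> Transcription:
    """Parse a JSON response body; all key names here are assumptions."""
    data = json.loads(payload)
    if data.get("status") != "success":
        raise RuntimeError(f"transcription failed: {data.get('status')!r}")
    return Transcription(
        status=data["status"],
        text=data["transcription"],
        duration_seconds=float(data["audio_length"]),
        age=data.get("age"),
        gender=data.get("gender"),
    )
```

Optional predictions default to `None`, so the same parser works whether or not the corresponding query flags were enabled.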

