POST /api/v1/lightning/get_text
Convert speech to text
curl --request POST \
  --url https://waves-api.smallest.ai/api/v1/lightning/get_text \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/octet-stream' \
  --data-binary '@audio.wav'
{
  "status": "success",
  "transcription": "Hello world.",
  "word_timestamps": [
    {
      "word": "Hello",
      "start": 0,
      "end": 0.5
    },
    {
      "word": "world.",
      "start": 0.6,
      "end": 0.9
    }
  ],
  "age": "adult",
  "gender": "male",
  "emotions": {
    "happiness": 0.8,
    "sadness": 0.15,
    "disgust": 0.02,
    "fear": 0.03,
    "anger": 0.05
  },
  "metadata": {
    "filename": "audio.mp3",
    "duration": 1.7,
    "fileSize": 1000000
  }
}
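A minimal Python sketch of working with a decoded response of the shape shown above (the values are the documented example, not live output): it pulls out the transcript, derives per-word durations from the word-level timestamps, and picks the highest-scoring emotion.

```python
import json

# Example response as documented above.
response = json.loads("""
{
  "status": "success",
  "transcription": "Hello world.",
  "word_timestamps": [
    {"word": "Hello", "start": 0, "end": 0.5},
    {"word": "world.", "start": 0.6, "end": 0.9}
  ],
  "emotions": {"happiness": 0.8, "sadness": 0.15, "disgust": 0.02,
               "fear": 0.03, "anger": 0.05}
}
""")

# Per-word durations (seconds) from the word-level timestamps.
durations = {w["word"]: round(w["end"] - w["start"], 3)
             for w in response.get("word_timestamps", [])}

# Highest-scoring emotion, present only when emotion_detection was enabled.
dominant = max(response["emotions"], key=response["emotions"].get)

print(response["transcription"])  # Hello world.
print(durations)                  # {'Hello': 0.5, 'world.': 0.3}
print(dominant)                   # happiness
```

Note that `word_timestamps`, `age`, `gender`, and `emotions` appear only when the corresponding query flags are enabled, so guard for their absence as shown with `.get()`.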
The ASR POST API allows you to convert speech to text using two different input methods:
  1. Raw Audio Bytes (application/octet-stream) - Send raw audio data in the request body; all parameters are passed as query parameters
  2. Audio URL (application/json) - Send a JSON body containing only a URL to an audio file; all other parameters are passed as query parameters
Both methods use our Lightning ASR model with automatic language detection across 30+ languages.

Authentication

This endpoint requires authentication using a Bearer token in the Authorization header:
Authorization: Bearer YOUR_API_KEY

Input Methods

Choose the input method that best fits your use case:
Method      Content Type               Use Case                                    Parameters
Raw Bytes   application/octet-stream   Streaming audio data, real-time processing  Query parameters
Audio URL   application/json           Remote audio files, webhook processing      Query parameters

Code Examples

Method 1: Raw Audio Bytes (application/octet-stream)

curl --request POST \
  --url "https://waves-api.smallest.ai/api/v1/lightning/get_text?model=lightning&language=en&word_timestamps=true&age_detection=true&gender_detection=true&emotion_detection=true" \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: audio/wav' \
  --data-binary '@/path/to/your/audio.wav'

Method 2: Audio URL (application/json)

curl --request POST \
  --url "https://waves-api.smallest.ai/api/v1/lightning/get_text?model=lightning&language=en&word_timestamps=true&age_detection=true&gender_detection=true&emotion_detection=true" \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
    "url": "https://example.com/audio.mp3"
  }'
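The same requests can be sketched in Python. Because every ASR option travels as a query parameter, a small helper that assembles the URL keeps both methods consistent; the helper name and flag handling below are illustrative, not part of the API.

```python
from urllib.parse import urlencode

BASE = "https://waves-api.smallest.ai/api/v1/lightning/get_text"

def build_url(model="lightning", language="en", **flags):
    """Assemble the endpoint URL; all ASR options are query parameters.

    Boolean flags (word_timestamps, age_detection, gender_detection,
    emotion_detection) are serialized as lowercase strings, matching
    the curl examples above.
    """
    params = {"model": model, "language": language}
    params.update({k: str(v).lower() for k, v in flags.items()})
    return f"{BASE}?{urlencode(params)}"

url = build_url(word_timestamps=True, emotion_detection=True)
print(url)
```

The resulting URL is then used with either a raw-bytes body (Content-Type set to the audio format) or a JSON body containing only the `url` field, exactly as in the two curl examples.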

Supported Languages

The Lightning ASR model supports automatic language detection and transcription across 30+ languages. For the full list of supported languages, please check ASR Supported Languages.
Specify the language of the input audio using its ISO 639-1 code. Use multi to enable automatic language detection from the supported list. The default is en (English).
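A small client-side sketch (a hypothetical helper, not part of the API) that validates the language code against the supported set before issuing a request; the fallback to `multi` for unrecognized codes is a design choice of this helper, leaning on the model's automatic detection.

```python
# ISO 639-1 codes listed under the `language` query parameter, plus "multi".
SUPPORTED = {
    "it", "es", "en", "pt", "hi", "de", "fr", "uk", "ru", "kn", "ml",
    "pl", "mr", "gu", "cs", "sk", "te", "or", "nl", "bn", "lv", "et",
    "ro", "pa", "fi", "sv", "bg", "ta", "hu", "da", "lt", "mt", "multi",
}

def normalize_language(code=None):
    """Return a valid `language` value: the lowercased code if supported,
    "en" when omitted (the documented default), otherwise "multi" to let
    the model auto-detect the language."""
    if code is None:
        return "en"
    code = code.lower()
    return code if code in SUPPORTED else "multi"

print(normalize_language("HI"))   # hi
print(normalize_language(None))   # en
print(normalize_language("ja"))   # multi
```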

Authorizations

Authorization
string
header
required

API key authentication using Bearer token format. Include your API key in the Authorization header as: Bearer YOUR_API_KEY

Query Parameters

model
enum<string>
required

The ASR model to use for transcription

Available options:
lightning
Example:

"lightning"

language
enum<string>
default:en

Language of the audio file. Use multi for automatic language detection

Available options:
it,
es,
en,
pt,
hi,
de,
fr,
uk,
ru,
kn,
ml,
pl,
mr,
gu,
cs,
sk,
te,
or,
nl,
bn,
lv,
et,
ro,
pa,
fi,
sv,
bg,
ta,
hu,
da,
lt,
mt,
multi
word_timestamps
boolean
default:false

Whether to include word-level timestamps in the response

age_detection
enum<string>
default:false

Whether to predict age group of the speaker

Available options:
true,
false
gender_detection
enum<string>
default:false

Whether to predict the gender of the speaker

Available options:
true,
false
emotion_detection
enum<string>
default:false

Whether to predict speaker emotions

Available options:
true,
false

Body

Raw audio bytes (for the octet-stream method). The Content-Type header should specify the audio format (e.g., audio/wav, audio/mp3). For the URL method, send a JSON body containing a single url field pointing to the audio file. All other parameters are passed as query parameters.

Response

Speech transcribed successfully

status
string

Status of the transcription request

Example:

"success"

transcription
string

The transcribed text from the audio file

Example:

"Hello world."

audio_length
number

Duration of the audio file in seconds

Example:

1.7

word_timestamps
object[]

Word-level timestamps in seconds.

age
enum<string>

Predicted age group of the speaker, returned when age_detection is enabled

Available options:
infant,
teenager,
adult,
old
Example:

"adult"

gender
enum<string>

Predicted gender of the speaker if requested

Available options:
male,
female
Example:

"male"

emotions
object

Predicted emotions of the speaker if requested

metadata
object

Metadata about the transcription