Lightning v3.1

Generate speech from text (Lightning V3.1)

curl --request POST \
  --url https://waves-api.smallest.ai/api/v1/lightning-v3.1/get_speech \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "text": "<string>",
  "voice_id": "<string>",
  "sample_rate": 44100,
  "speed": 1,
  "language": "en",
  "output_format": "pcm",
  "pronunciation_dicts": [
    "<string>"
  ]
}
'

"<string>"

Overview

Lightning v3.1 is a 44 kHz text-to-speech model that delivers natural, expressive, and realistic speech synthesis.

Key Features

Voice Cloning Support: Compatible with cloned voices
Ultra-Low Latency: Optimized for real-time applications
Multi-Language: Supports English (en) and Hindi (hi)
Multiple Output Formats: PCM, MP3, WAV, and mulaw
Flexible Sample Rates: 8000, 16000, 24000, or 44100 Hz
Speed Control: Adjustable from 0.5x to 2x

Available Voices

Voice ID	Gender	Accent
sophia	Female	US
sandra	Female	US
rachel	Female	US
lauren	Female	US
hannah	Female	US
vanessa	Female	US
brooke	Female	US
megan	Female	US
robert	Male	US
johnny	Male	US
ethan	Male	US
lucas	Male	US
daniel	Male	US
edward	Male	British
vaibhav	Male	Indian
hitesh	Male	Indian
gaurav	Male	Indian
vivaan	Male	Indian
arjun	Male	Indian
kunal	Male	Indian
siddharth	Male	Indian
advika	Female	Indian
aisha	Female	Indian
yuvika	Female	Indian
ishani	Female	Indian
anuja	Female	Indian

Authorizations

Authorization (string, header, required)

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
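
As a rough illustration of the header described above, the following Python sketch builds the same POST request as the curl example using only the standard library. The function name `build_request` and the constant `API_URL` are hypothetical names chosen for this sketch, not part of any official SDK. The request is constructed but not sent.

```python
import json
import urllib.request

# Endpoint URL taken from the curl example in this page.
API_URL = "https://waves-api.smallest.ai/api/v1/lightning-v3.1/get_speech"


def build_request(token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request with Bearer authentication.

    `build_request` is a hypothetical helper for illustration only.
    """
    if not token:
        raise ValueError("token must be a non-empty string")
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    req = build_request("YOUR_TOKEN", {"text": "Hello", "voice_id": "sophia"})
    print(req.get_method(), req.full_url)
    # To actually send the request: urllib.request.urlopen(req).read()
```

Keeping request construction separate from sending makes it easy to inspect the headers before any network call is made.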

Body

application/json

text (string, required)

The text to convert to speech.

voice_id (string, required)

The voice identifier to use for speech generation.

sample_rate (enum<integer>, default: 44100)

The sample rate for the generated audio.

Available options: 8000, 16000, 24000, 44100

speed (number, default: 1)

The speed of the generated speech.

Required range: 0.5 <= x <= 2

language (enum<string>, default: en)

Determines how numbers are spelled out. If set to 'en', numbers will be read in English. If set to 'hi', numbers will be read in Hindi.

Available options: en, hi

output_format (enum<string>, default: pcm)

The format of the output audio.

Available options: pcm, mp3, wav, mulaw

pronunciation_dicts (string[])

The IDs of the pronunciation dictionaries to use for speech generation.
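
The body fields and their constraints can be checked on the client before a request is sent. The sketch below is one possible way to do that in Python. `build_payload` is a hypothetical helper, and the allowed values are copied from the field descriptions on this page. Whether the server rejects out-of-range values in the same way is not stated here, so treat the checks as a convenience only.

```python
# Allowed values copied from the field descriptions on this page.
SAMPLE_RATES = {8000, 16000, 24000, 44100}
LANGUAGES = {"en", "hi"}
OUTPUT_FORMATS = {"pcm", "mp3", "wav", "mulaw"}


def build_payload(text, voice_id, sample_rate=44100, speed=1, language="en",
                  output_format="pcm", pronunciation_dicts=None):
    """Return a JSON-serializable request body, raising ValueError on bad input.

    Hypothetical helper for illustration; not part of an official SDK.
    """
    if not text:
        raise ValueError("text is required")
    if not voice_id:
        raise ValueError("voice_id is required")
    if sample_rate not in SAMPLE_RATES:
        raise ValueError(f"sample_rate must be one of {sorted(SAMPLE_RATES)}")
    if not 0.5 <= speed <= 2:
        raise ValueError("speed must satisfy 0.5 <= speed <= 2")
    if language not in LANGUAGES:
        raise ValueError(f"language must be one of {sorted(LANGUAGES)}")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"output_format must be one of {sorted(OUTPUT_FORMATS)}")

    payload = {
        "text": text,
        "voice_id": voice_id,
        "sample_rate": sample_rate,
        "speed": speed,
        "language": language,
        "output_format": output_format,
    }
    # pronunciation_dicts is optional, so it is only included when provided.
    if pronunciation_dicts:
        payload["pronunciation_dicts"] = list(pronunciation_dicts)
    return payload
```

For example, `build_payload("Hello", "sophia", speed=1.25, output_format="wav")` returns a dictionary ready to be serialized with `json.dumps`, while `speed=3` raises `ValueError`.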

Response

Synthesized speech retrieved successfully.

A PCM int16 WAV file at the specified sample rate.
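
If the response body turns out to be headerless 16-bit PCM samples (an assumption; this page does not spell out the exact byte layout for `output_format: "pcm"`), the bytes can be wrapped in a WAV container with Python's standard `wave` module so that ordinary audio players can open them. The helper name `pcm_to_wav_bytes` and the mono channel count are assumptions made for this sketch.

```python
import io
import wave


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int = 44100, channels: int = 1) -> bytes:
    """Wrap raw int16 PCM samples in a WAV container.

    Assumes 16-bit samples and, by default, a single (mono) channel;
    neither is confirmed by this page.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)           # 2 bytes per sample = int16
        wav.setframerate(sample_rate)  # must match the requested sample_rate
        wav.writeframes(pcm)
    return buf.getvalue()


if __name__ == "__main__":
    # Two silent int16 samples stand in for real response bytes.
    wav_bytes = pcm_to_wav_bytes(b"\x00\x00\x00\x00", sample_rate=44100)
    with open("speech.wav", "wb") as f:
        f.write(wav_bytes)
```

The `sample_rate` passed here should be the same value sent in the request body, otherwise playback speed and pitch will be wrong.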
