Text to Speech

Generate speech from text (Lightning v2)

curl --request POST \
  --url https://waves-api.smallest.ai/api/v1/lightning-v2/get_speech \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "text": "<string>",
  "voice_id": "<string>",
  "sample_rate": 24000,
  "speed": 1,
  "consistency": 0.5,
  "similarity": 0,
  "enhancement": 1,
  "language": "auto",
  "output_format": "pcm",
  "pronunciation_dicts": [
    "<string>"
  ]
}
'

Example response: "<string>" (placeholder for the returned audio data)
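The curl call above can also be reproduced in Python with only the standard library. This is a minimal sketch that assumes Python 3 and urllib.request. The helper name build_speech_request is hypothetical and is not part of the API. It builds the request without sending it, so you can check the placeholders before making any network call.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
URL = "https://waves-api.smallest.ai/api/v1/lightning-v2/get_speech"


def build_speech_request(api_key: str, text: str, voice_id: str, **options) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl example.

    `options` holds optional body fields such as sample_rate or output_format;
    fields left out fall back to the server-side defaults.
    """
    body = {"text": text, "voice_id": voice_id, **options}
    return urllib.request.Request(
        URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_speech_request("YOUR_API_KEY", "Hello world", "YOUR_VOICE_ID", sample_rate=24000)
    # Sending it requires network access and a valid key:
    # with urllib.request.urlopen(req) as resp:
    #     audio_bytes = resp.read()
    print(req.get_method(), req.full_url)
```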

POST /api/v1/lightning-v2/get_speech


Authorizations

Authorization (string, header, required)

Bearer authentication header of the form Bearer <api_key>, where <api_key> is your API key.
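You can keep the key out of source code by reading it from an environment variable when you build the header. The variable name SMALLEST_API_KEY and the helper auth_headers are hypothetical conventions for this sketch. The API does not define either of them.

```python
import os


def auth_headers(env_var: str = "SMALLEST_API_KEY") -> dict:
    """Return the Authorization header, reading the key from an environment variable.

    The default variable name is a hypothetical convention, not part of the API.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set {env_var} to your API key")
    # The header value is the scheme "Bearer", a space, then the key.
    return {"Authorization": f"Bearer {api_key}"}
```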

Body (application/json)

text (string, required)

The text to convert to speech.

voice_id (string, required)

The voice identifier to use for speech generation.

sample_rate (integer, default: 24000)

The sample rate of the generated audio, in Hz.

Required range: 8000 <= x <= 24000
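The sample_rate sets how many samples make up one second of audio, so it also determines how long a given amount of raw output lasts. This sketch assumes raw int16 PCM (2 bytes per sample) and mono audio. The channel count is an assumption, because this reference does not state it. Exclude any WAV header bytes from num_bytes. The helper pcm_duration_seconds is hypothetical.

```python
def pcm_duration_seconds(num_bytes: int, sample_rate: int = 24000, channels: int = 1) -> float:
    """Duration of raw int16 PCM audio.

    Assumes mono output (channels=1); the channel count is not stated in this reference.
    """
    if not 8000 <= sample_rate <= 24000:
        raise ValueError("sample_rate must be between 8000 and 24000")
    bytes_per_sample = 2  # int16
    return num_bytes / (sample_rate * bytes_per_sample * channels)


print(pcm_duration_seconds(48_000))        # 1.0 second at the default 24 kHz
print(pcm_duration_seconds(48_000, 8000))  # 3.0 seconds at 8 kHz
```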

speed (number, default: 1)

The speed of the generated speech.

Required range: 0.5 <= x <= 2

consistency (number, default: 0.5)

This parameter controls word repetition and skipping. Decrease it to prevent skipped words, and increase it to prevent repetition.

Required range: 0 <= x <= 1
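The consistency guidance can be expressed as a small tuning rule. The helper adjust_consistency, the symptom labels, and the 0.1 step are hypothetical choices for illustration. Only the direction of each adjustment and the 0 to 1 range come from this reference.

```python
def adjust_consistency(current: float, symptom: str, step: float = 0.1) -> float:
    """Nudge `consistency` following the documented guidance.

    symptom "skipped_words" -> decrease; symptom "repetition" -> increase.
    The labels and the 0.1 step size are illustrative assumptions.
    """
    if symptom == "skipped_words":
        new = current - step
    elif symptom == "repetition":
        new = current + step
    else:
        raise ValueError(f"unknown symptom: {symptom!r}")
    # Clamp to the documented range 0 <= x <= 1 and round away float noise.
    return round(min(1.0, max(0.0, new)), 4)
```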

similarity (number, default: 0)

This parameter controls the similarity between the generated speech and the reference audio. Increase it to make the speech more similar to the reference audio.

Required range: 0 <= x <= 1

enhancement (number, default: 1)

Enhances speech quality at the cost of increased latency.

Required range: 0 <= x <= 2
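You can check the documented ranges on the client before sending a request, which catches mistakes without a round trip. This sketch assumes the server enforces the same inclusive bounds. The helper validate_numeric_options is hypothetical.

```python
# Documented inclusive ranges for the numeric body fields.
RANGES = {
    "sample_rate": (8000, 24000),
    "speed": (0.5, 2),
    "consistency": (0, 1),
    "similarity": (0, 1),
    "enhancement": (0, 2),
}


def validate_numeric_options(body: dict) -> None:
    """Raise ValueError for any numeric field outside its documented range.

    Fields missing from `body` are skipped, since the server applies defaults.
    """
    for field, (low, high) in RANGES.items():
        if field in body and not low <= body[field] <= high:
            raise ValueError(f"{field}={body[field]} is outside {low}..{high}")
    # sample_rate is documented as an integer, not a float.
    if "sample_rate" in body and not isinstance(body["sample_rate"], int):
        raise ValueError("sample_rate must be an integer")
```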

language (enum<string>, default: auto)

Language code for text normalization (e.g., how numbers, dates, and abbreviations are spelled out). Set to 'auto' for automatic language detection, or specify a language code like 'en' or 'hi' to normalize text according to that language's rules.

Available options: auto, en, hi, ta, kn, mr, bn, gu, ar, he, fr, de, pl, ru, it, nl, es, sv, ml, te
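A client can reject unsupported language codes early. In this sketch, normalize_language is a hypothetical helper. Falling back to "auto" when no code is given mirrors the documented default. Trimming and lowercasing the input is a client-side convenience, not API behavior.

```python
from typing import Optional

# Language codes listed above; "auto" enables automatic detection.
SUPPORTED_LANGUAGES = frozenset({
    "auto", "en", "hi", "ta", "kn", "mr", "bn", "gu", "ar", "he",
    "fr", "de", "pl", "ru", "it", "nl", "es", "sv", "ml", "te",
})


def normalize_language(code: Optional[str]) -> str:
    """Return a valid `language` value, falling back to "auto" when none is given."""
    if code is None:
        return "auto"
    code = code.strip().lower()
    if code not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {code!r}")
    return code
```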

output_format (enum<string>, default: pcm)

The format of the output audio.

Available options: pcm, mp3, wav, mulaw
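When you save the response to disk, pick a file extension that matches output_format. The extension mapping below, in particular .pcm and .ulaw for the raw formats, and the helper output_path are assumptions for illustration.

```python
from pathlib import Path

# Hypothetical file-extension choices for each documented output_format value.
EXTENSIONS = {"pcm": ".pcm", "mp3": ".mp3", "wav": ".wav", "mulaw": ".ulaw"}


def output_path(stem: str, output_format: str = "pcm") -> Path:
    """Return a file path whose extension matches output_format."""
    try:
        suffix = EXTENSIONS[output_format]
    except KeyError:
        raise ValueError(f"unsupported output_format: {output_format!r}") from None
    return Path(stem).with_suffix(suffix)
```

For example, output_path("greeting", "mp3") returns Path("greeting.mp3"). Calling path.write_bytes(audio_bytes) on the result then stores the response body.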

pronunciation_dicts (string[])

The IDs of the pronunciation dictionaries to use for speech generation. Each array item is the ID of one pronunciation dictionary.
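Because pronunciation_dicts is optional, a client can add it only when there are dictionary IDs to send. The helper with_pronunciation_dicts is a hypothetical sketch. Dropping duplicate IDs is a client-side choice, not documented API behavior.

```python
from typing import Iterable, Optional


def with_pronunciation_dicts(body: dict, dict_ids: Optional[Iterable[str]]) -> dict:
    """Return a copy of `body` with `pronunciation_dicts` set.

    Duplicate IDs are dropped (first-seen order is kept), and the field is
    omitted entirely when no IDs are given.
    """
    unique_ids = list(dict.fromkeys(dict_ids or []))
    new_body = dict(body)
    if unique_ids:
        new_body["pronunciation_dicts"] = unique_ids
    else:
        new_body.pop("pronunciation_dicts", None)
    return new_body
```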

Response

Synthesized speech retrieved successfully.

A PCM int16 WAV file at the specified sample rate.
