Text to Speech

Stream speech from text (Lightning v2)

curl --request POST \
  --url https://waves-api.smallest.ai/api/v1/lightning-v2/stream \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "text": "<string>",
  "voice_id": "<string>",
  "sample_rate": 24000,
  "speed": 1,
  "consistency": 0.5,
  "similarity": 0,
  "enhancement": 1,
  "language": "auto",
  "output_format": "pcm",
  "pronunciation_dicts": [
    "<string>"
  ]
}
'

{
  "data": "event: chunk\ndata: <WAV_DATA>\ndone: false\n"
}

Overview

The Lightning v2 SSE API provides real-time text-to-speech streaming with high-quality voice synthesis. It uses Server-Sent Events (SSE) to deliver audio chunks as they are generated, so playback can begin before the full audio has been synthesized. For an end-to-end example of how to use the Lightning v2 SSE API, see Text to Speech (SSE) Example.

When to Use

Interactive Applications: Perfect for chatbots, virtual assistants, and other applications requiring immediate voice responses
Long-Form Content: Efficiently stream audio for articles, stories, or other long-form content without buffering delays
Voice User Interfaces: Create natural-sounding voice interfaces with minimal perceived latency
Accessibility Solutions: Provide real-time audio versions of written content for users with visual impairments

How It Works

Make a POST Request: Send your text and voice settings to the API endpoint
Receive Audio Chunks: The API processes your text and streams the audio back as base64-encoded chunks of 1024 bytes each
Process the Stream: Handle the SSE events to decode and play audio chunks sequentially
End of Stream: The API sends a completion event when all audio has been delivered
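
The steps above can be sketched as a small parser. This is a minimal illustration rather than an official client: it assumes each event carries `event:`, `data:`, and `done:` fields as in the sample response, that `data:` holds base64-encoded audio, and that the completion event is signalled by `done: true` (the sample only shows `done: false`, so the exact completion marker is an assumption). The function name `iter_audio_chunks` is hypothetical.

```python
import base64
from typing import Iterable, Iterator


def iter_audio_chunks(lines: Iterable[str]) -> Iterator[bytes]:
    """Yield decoded audio bytes from SSE lines until completion is reported."""
    for raw in lines:
        line = raw.strip()
        if not line or ":" not in line:
            continue  # blank lines separate SSE events
        field, _, value = line.partition(":")
        field, value = field.strip(), value.strip()
        if field == "data" and value:
            # Assumption: the data field is a base64-encoded audio chunk.
            yield base64.b64decode(value)
        elif field == "done" and value.lower() == "true":
            return  # assumed completion marker


# Synthetic events shaped like the sample response (not real API output).
sample = [
    "event: chunk",
    "data: " + base64.b64encode(b"\x00\x01\x02\x03").decode("ascii"),
    "done: false",
    "",
    "event: chunk",
    "data: " + base64.b64encode(b"\x04\x05").decode("ascii"),
    "done: true",
]
audio = b"".join(iter_audio_chunks(sample))
```

Decoding chunk by chunk, instead of collecting the whole response first, is what lets playback start while later chunks are still arriving.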

Authorizations

Authorization (string, header, required)

Bearer authentication header of the form Bearer <api_key>, where <api_key> is your API key.

Body

application/json

text (string, required)

The text to convert to speech.

voice_id (string, required)

The voice identifier to use for speech generation.

sample_rate (integer, default: 24000)

The sample rate of the generated audio, in Hz.

Required range: 8000 <= x <= 24000

speed (number, default: 1)

The speed of the generated speech.

Required range: 0.5 <= x <= 2

consistency (number, default: 0.5)

Controls word repetition and skipping. Decrease it to prevent skipped words; increase it to prevent repetition.

Required range: 0 <= x <= 1

similarity (number, default: 0)

This parameter controls the similarity between the generated speech and the reference audio. Increase it to make the speech more similar to the reference audio.

Required range: 0 <= x <= 1

enhancement (number, default: 1)

Enhances speech quality at the cost of increased latency.

Required range: 0 <= x <= 2

language (enum<string>, default: auto)

Language code for text normalization (e.g., how numbers, dates, and abbreviations are spelled out). Set to 'auto' for automatic language detection, or specify a language code like 'en' or 'hi' to normalize text according to that language's rules.

Available options: auto, en, hi, ta, kn, mr, bn, gu, ar, he, fr, de, pl, ru, it, nl, es, sv, ml, te

output_format (enum<string>, default: pcm)

The format of the output audio.

Available options: pcm, mp3, wav, mulaw
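
With the default `pcm` format, the decoded chunks are raw samples with no container, so most players cannot open them directly. The sketch below wraps concatenated PCM bytes in a WAV container using Python's standard `wave` module. It assumes 16-bit mono samples, which this page does not state, so verify the sample width and channel count against real output first. The helper name `pcm_to_wav` is hypothetical.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 24000) -> bytes:
    """Wrap raw PCM samples in a WAV container (assumes 16-bit mono PCM)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)   # assumption: mono audio
        wav_file.setsampwidth(2)   # assumption: 16-bit (2-byte) samples
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return buffer.getvalue()


# 0.1 s of silence at 24000 Hz: 2400 frames of 2 bytes each.
wav_bytes = pcm_to_wav(b"\x00\x00" * 2400, sample_rate=24000)
```

The `sample_rate` passed here should match the `sample_rate` sent in the request; otherwise the audio plays back at the wrong speed.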

pronunciation_dicts (string[])

The IDs of the pronunciation dictionaries to use for speech generation.
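
The defaults, ranges, and allowed values listed above can be checked on the client before a request is sent. The following is a minimal sketch of such a check, not part of any official SDK. The function name `build_payload` and the voice ID `"my_voice"` are hypothetical; the defaults and limits are copied from the parameter descriptions on this page.

```python
DEFAULTS = {
    "sample_rate": 24000,
    "speed": 1,
    "consistency": 0.5,
    "similarity": 0,
    "enhancement": 1,
    "language": "auto",
    "output_format": "pcm",
}
RANGES = {
    "sample_rate": (8000, 24000),
    "speed": (0.5, 2),
    "consistency": (0, 1),
    "similarity": (0, 1),
    "enhancement": (0, 2),
}
LANGUAGES = {
    "auto", "en", "hi", "ta", "kn", "mr", "bn", "gu", "ar", "he",
    "fr", "de", "pl", "ru", "it", "nl", "es", "sv", "ml", "te",
}
OUTPUT_FORMATS = {"pcm", "mp3", "wav", "mulaw"}


def build_payload(text, voice_id, pronunciation_dicts=None, **options):
    """Return a request body with defaults applied, or raise ValueError."""
    if not text or not voice_id:
        raise ValueError("text and voice_id are required")
    unknown = set(options) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    payload = {"text": text, "voice_id": voice_id, **DEFAULTS, **options}
    for name, (low, high) in RANGES.items():
        if not low <= payload[name] <= high:
            raise ValueError(f"{name} must satisfy {low} <= x <= {high}")
    if payload["language"] not in LANGUAGES:
        raise ValueError("unsupported language code")
    if payload["output_format"] not in OUTPUT_FORMATS:
        raise ValueError("unsupported output_format")
    if pronunciation_dicts:
        payload["pronunciation_dicts"] = list(pronunciation_dicts)
    return payload


# "my_voice" is a placeholder, not a real voice ID.
payload = build_payload("Hello there", "my_voice", speed=1.5)
```

Validating locally gives a clearer error than a rejected request, but the server remains the authority on what it accepts.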

Response

Synthesized speech retrieved successfully.
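
For readers working outside curl, the request can be sketched with Python's standard library. This mirrors the curl example at the top of the page; the helper names `build_request` and `stream_lines` are hypothetical, and `YOUR_API_KEY` and `my_voice` are placeholders. The sketch assumes the response body arrives as line-oriented SSE text, as the sample response suggests.

```python
import json
import urllib.request

STREAM_URL = "https://waves-api.smallest.ai/api/v1/lightning-v2/stream"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build the POST request with the bearer token and JSON body."""
    return urllib.request.Request(
        STREAM_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def stream_lines(request: urllib.request.Request):
    """Yield response lines as they arrive (performs the network call)."""
    with urllib.request.urlopen(request) as response:
        for raw in response:
            yield raw.decode("utf-8").rstrip("\r\n")


# Placeholders only; nothing is sent until stream_lines() is iterated.
request = build_request(
    "YOUR_API_KEY", {"text": "Hello there", "voice_id": "my_voice"}
)
```

Iterating `stream_lines(request)` yields the `event:`, `data:`, and `done:` lines one at a time, which can then be decoded as described under How It Works.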
