POST /api/v1/{model}/get_speech

The Waves API provides advanced text-to-speech capabilities with multiple voice options and customizable sample rates. This API allows you to convert text into natural-sounding speech in various languages and accents.

You can get your API key/Bearer token by logging into the Waves platform and clicking on API key in the left panel.

Models

Waves API supports multiple models for speech synthesis. Currently, we offer:

  • Lightning: Our first and fastest model, optimized for low-latency applications.

To use a specific model, adjust the URL path in your API requests:

https://waves-api.smallest.ai/api/v1/<model>/get_speech

For example, to use the Lightning model:

https://waves-api.smallest.ai/api/v1/lightning/get_speech
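
The URL pattern above can be sketched as a small helper. This is a minimal illustration, not an official client: the names `BASE_URL`, `SUPPORTED_MODELS`, and `speech_url` are hypothetical, and the model set contains only the model listed in this reference.

```python
# Base path taken from the URLs shown in this reference.
BASE_URL = "https://waves-api.smallest.ai/api/v1"

# Hypothetical allow-list; 'lightning' is the only model this page documents.
SUPPORTED_MODELS = {"lightning"}


def speech_url(model: str = "lightning") -> str:
    """Return the get_speech endpoint URL for the given model name."""
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"unsupported model: {model!r}")
    return f"{BASE_URL}/{model}/get_speech"
```

Checking the model name locally is a design choice for this sketch; it would need updating if further models are announced.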

We’re continuously working on new models to enhance our speech synthesis capabilities. Check announcements for the latest updates.

API Specification and Code Samples

Note: The interactive “Try it out” feature will be available very soon.

If you need an API key or have any questions, please contact our support team at support@smallest.ai.

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Headers

Authorization
string
required

Bearer token for authentication. Format is 'Bearer {token}'
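
As a rough sketch, the header described above could be assembled like this. The helper name `auth_headers` is hypothetical, and sending `Content-Type: application/json` alongside it is an assumption based on the JSON request body documented below.

```python
def auth_headers(token: str) -> dict:
    """Build request headers with a Bearer token (hypothetical helper)."""
    token = token.strip()
    if not token:
        raise ValueError("API token is empty")
    return {
        # Format documented in this reference: 'Bearer {token}'
        "Authorization": f"Bearer {token}",
        # Assumption: the JSON body is declared explicitly.
        "Content-Type": "application/json",
    }
```

In practice the token would typically be read from an environment variable or a secrets store rather than hard-coded.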

Path Parameters

model
enum<string>
required

The model to use for speech synthesis. Currently, only 'lightning' is available.

Available options:
lightning

Body

application/json
text
string
required

The text to be synthesized into speech

voice_id
enum<string>
required

Voice IDs you can use with this API.

Available options:
emily, jasmine, arman, james, mithali, aravind, raj, diya, raman, ananya, isha, william, aarav, monika, niharika, deepika, raghav, kajal, radhika, mansi, nisha, saurabh, pooja, saina, sanya

add_wav_header
boolean
default: true

Whether to include a WAV header in the returned audio.

sample_rate
enum<integer>
default: 24000

Sample rate of the output audio in Hz. Allowed values are 8000, 16000, or 24000.

Available options:
8000, 16000, 24000

speed
number
default: 1

The speed of the generated speech. Allowed range: 0.5 to 2.

Required range: 0.5 < x < 2
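
The body fields above can be validated client-side before sending. The following is a minimal sketch under stated assumptions: `build_body`, `VOICE_IDS`, and `SAMPLE_RATES` are hypothetical names, and the speed check uses the stricter "0.5 < x < 2" reading because this reference does not say whether the endpoints of the range are accepted.

```python
# Voice IDs and sample rates copied from the options listed in this reference.
VOICE_IDS = {
    "emily", "jasmine", "arman", "james", "mithali", "aravind", "raj",
    "diya", "raman", "ananya", "isha", "william", "aarav", "monika",
    "niharika", "deepika", "raghav", "kajal", "radhika", "mansi",
    "nisha", "saurabh", "pooja", "saina", "sanya",
}
SAMPLE_RATES = {8000, 16000, 24000}


def build_body(text, voice_id, *, add_wav_header=True,
               sample_rate=24000, speed=1.0) -> dict:
    """Validate the documented fields and return a JSON-serializable body."""
    if not text:
        raise ValueError("text is required")
    if voice_id not in VOICE_IDS:
        raise ValueError(f"unknown voice_id: {voice_id!r}")
    if sample_rate not in SAMPLE_RATES:
        raise ValueError("sample_rate must be 8000, 16000, or 24000")
    # Assumption: exclusive bounds, the stricter of the two stated ranges.
    if not 0.5 < speed < 2:
        raise ValueError("speed must be between 0.5 and 2")
    return {
        "text": text,
        "voice_id": voice_id,
        "add_wav_header": bool(add_wav_header),
        "sample_rate": sample_rate,
        "speed": speed,
    }
```

For example, `build_body("Hello", "emily")` fills in the documented defaults for the three optional fields.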

Response

200 - audio/wav

A WAV file containing 16-bit PCM (int16) audio at the requested sample rate.
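
To illustrate the full round trip, here is a hedged sketch that uses only the Python standard library. The function names `synthesize` and `wav_info` are hypothetical, error handling is omitted, and the code assumes the response body is a complete WAV file that the `wave` module can parse (that is, `add_wav_header` is left at `true`).

```python
import io
import json
import urllib.request
import wave


def synthesize(token, text, voice_id, model="lightning", sample_rate=24000) -> bytes:
    """POST to get_speech and return the raw response bytes (sketch only)."""
    url = f"https://waves-api.smallest.ai/api/v1/{model}/get_speech"
    body = json.dumps({
        "text": text,
        "voice_id": voice_id,
        "add_wav_header": True,
        "sample_rate": sample_rate,
    }).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    # The 30-second timeout is an arbitrary choice for this sketch.
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()


def wav_info(data: bytes) -> dict:
    """Read basic format fields from WAV bytes held in memory."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),  # 2 for int16
            "sample_rate": wav.getframerate(),
            "frames": wav.getnframes(),
        }
```

A caller might write the bytes returned by `synthesize` to a `.wav` file and use `wav_info` to confirm that the sample width is 2 bytes and the sample rate matches the request.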