Model Overview

Pricing

Find a detailed description of each model, along with its capabilities and supported languages.

Waves is a platform to deliver real-time hyper-realistic text to speech.

Introduction

Welcome to the Smallest AI platform! This guide will help you get started quickly by generating your first text-to-speech audio using Python and the Smallest AI API.

Quickstart

Learn how to authenticate your API requests and manage access keys securely.

Authentication

HTTP vs HTTP Streaming vs Websockets

Learn how to synthesize your text using the Smallest AI API.

How to use Text to Speech

Learn how to convert streamed text to speech in real time.

How to stream LLM to TTS in Realtime

Learn how to retrieve available voices, models, and languages.

Get available Voices, Models and Languages

Train our model on your voice and generate a high-quality professional voice clone.

Types of Cloning: Instant vs Professional

How to Create an Instant Voice Clone

How to create an Instant Voice Clone using Python SDK

How to delete your Voice Clone using Python SDK

How to Create a Professional Voice Clone

Build LiveKit voice agents using Smallest AI TTS plugin.

LiveKit

Learn how to integrate Smallest AI TTS in Plivo for telephony solutions.

Telephony: Plivo

Learn how to integrate Smallest AI TTS in Vonage for telephony solutions.

Telephony: Vonage

Create and manage your projects in Waves.

Projects

Best Practices for Recording Reference Audio

Voice Cloning - Best Practices

Best practices for recording high-quality reference audio.

Professional Voice Cloning - Best Practices

Learn best practices for text formatting for optimal Audio Generation.

Text to Speech - Best Practices

Learn how to authenticate requests using API keys.

Learn about WebSocket support for our Text-to-Speech (TTS) API, how it works, and when to use it.

WebSocket

WebSocket Support for TTS API

Get speech for a given text using the Waves API.

Text to Speech

Stream speech for a given text using the Lightning v2 SSE API.

Text to Speech (SSE)

The Lightning v2 WebSocket API provides real-time text-to-speech streaming with high-quality voice synthesis. It delivers audio chunks over a WebSocket connection as they are generated, so playback can start with low latency instead of waiting for the entire audio file to be produced. For an end-to-end example of how to use the Lightning v2 WebSocket API, see the [Text to Speech (WS) Example](https://github.com/smallest-inc/waves-examples/tree/main/lightning_v2/ws_streaming).
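The chunk-by-chunk delivery described above can be illustrated without touching the network. The following is a minimal sketch, not the API's actual client code: `fake_chunk_stream` is a hypothetical stand-in for a WebSocket receive loop, and the audio format (16-bit mono PCM at 24 kHz) is an assumption made for illustration, not a value taken from the Waves documentation.

```python
import io
import wave
from typing import Iterable, Iterator

# Assumed audio format for illustration only; check the API reference
# for the real sample rate and encoding.
SAMPLE_RATE_HZ = 24_000
SAMPLE_WIDTH_BYTES = 2  # 16-bit PCM
CHANNELS = 1            # mono


def fake_chunk_stream(total_bytes: int, chunk_bytes: int) -> Iterator[bytes]:
    """Hypothetical stand-in for a WebSocket receive loop.

    Yields silent PCM chunks until total_bytes have been produced.
    """
    sent = 0
    while sent < total_bytes:
        size = min(chunk_bytes, total_bytes - sent)
        yield bytes(size)  # zero-filled bytes, i.e. silence
        sent += size


def write_chunks_to_wav(chunks: Iterable[bytes], target) -> int:
    """Write each chunk into a WAV container as it arrives.

    Returns the number of PCM bytes written.
    """
    written = 0
    with wave.open(target, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav.setframerate(SAMPLE_RATE_HZ)
        for chunk in chunks:
            # A real player could hand the chunk to an audio device here
            # instead of (or in addition to) writing it to a file.
            wav.writeframes(chunk)
            written += len(chunk)
    return written


# One second of audio under the assumed format, delivered in 4 KiB chunks.
one_second = SAMPLE_RATE_HZ * SAMPLE_WIDTH_BYTES * CHANNELS
buffer = io.BytesIO()
total = write_chunks_to_wav(fake_chunk_stream(one_second, 4096), buffer)
print(f"wrote {total} PCM bytes")
```

In a real integration the generator would be replaced by the messages received from the WebSocket connection; the linked example repository shows the actual request and response handling.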

| Model ID | Description | Languages Supported |
| --- | --- | --- |
| lightning | Fastest model with an RTF of 0.01, generating 10 seconds of audio in 100 ms. | English, Hindi |
| lightning-large | More emotional depth and expressiveness, supports voice cloning, latency under 300 ms. | English, Hindi |
| lightning-multilingual | Supports 30 languages, currently in beta. | 30 languages |
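The RTF figure quoted for `lightning` can be checked with simple arithmetic. The sketch below assumes RTF (real-time factor) means synthesis time divided by the duration of the generated audio, which is the usual definition but is not spelled out in the table. The function names are illustrative and not part of any Smallest AI SDK.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def synthesis_time_ms(audio_seconds: float, rtf: float) -> float:
    """Estimated synthesis time in milliseconds for a given RTF."""
    return audio_seconds * rtf * 1000.0


# Figures from the table: 10 seconds of audio generated in 100 ms.
rtf = real_time_factor(synthesis_seconds=0.1, audio_seconds=10.0)
print(f"RTF: {rtf:.2f}")
print(f"Time for 10 s of audio: {synthesis_time_ms(10.0, 0.01):.0f} ms")
```

An RTF below 1 means audio is produced faster than it plays back; the lower the value, the more headroom there is for real-time use.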
