Python SDK
Smallest AI builds high-speed multilingual voice models for real-time applications, generating 10 seconds of realistic audio in as little as ~100 milliseconds. With this SDK, you can convert text into high-quality audio with humanlike expressiveness.
Currently, the library supports direct synthesis and synthesis of streamed LLM output, both synchronously and asynchronously.
You can access the source code for the Python SDK on our GitHub repository.
Table of Contents
- Installation
- Get the API Key
- Best Practices for Input Text
- Examples
- Available Methods
- Technical Note: WAV Headers in Streaming Audio
Installation
To install the latest version available:
When using an SDK in your application, make sure to pin to at least the major version (e.g., ==1.*). This keeps your application stable and avoids potential issues from breaking changes in future updates.
Get the API Key
- Visit waves.smallest.ai and sign up or log in.
- Navigate to the API Key tab in your account dashboard.
- Create a new API Key and copy it.
- Export the API Key in your environment with the name SMALLEST_API_KEY to allow secure access for authentication.
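Once the key is exported, application code can read it from the environment instead of hard-coding it. The following is a minimal sketch using only the standard library; `load_api_key` is a hypothetical helper name, not part of the SDK.

```python
import os


def load_api_key(env_var="SMALLEST_API_KEY"):
    """Return the API key from the environment, or fail with a clear message."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Failing early avoids a less obvious authentication error later on.
        raise RuntimeError(f"{env_var} is not set; export it before running.")
    return key
```

The returned value can then be passed as the api_key parameter described in the Examples section.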
Best Practices for Input Text
For optimal voice generation results:
- For English, provide the input in Latin script (e.g., “Hello, how are you?”).
- For Hindi, provide the input in Devanagari script (e.g., “नमस्ते, आप कैसे हैं?”).
- For code-mixed input, use Latin script for English and Devanagari script for Hindi (e.g., “Hello, आप कैसे हैं?”).
Note: The transliterate parameter is not fully supported and may not perform consistently. It is recommended to avoid relying on this parameter.
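A quick pre-flight check can flag input whose script does not match the guidance above. This is only a sketch: `scripts_in` is a hypothetical helper (not part of the SDK), and it recognizes just ASCII letters and the basic Devanagari Unicode block (U+0900 to U+097F).

```python
def scripts_in(text):
    """Return the set of scripts ('latin', 'devanagari') found in the text."""
    found = set()
    for ch in text:
        if "\u0900" <= ch <= "\u097f":
            # Basic Devanagari block: letters, vowel signs, and punctuation.
            found.add("devanagari")
        elif ch.isascii() and ch.isalpha():
            found.add("latin")
    return found


english = scripts_in("Hello, how are you?")
hindi = scripts_in("नमस्ते, आप कैसे हैं?")
mixed = scripts_in("Hello, आप कैसे हैं?")
```

Text intended as Hindi that reports only 'latin' is romanized input, which would depend on the transliterate parameter.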
Examples
Sync
Synchronous text-to-speech synthesis client.
Basic Usage:
Parameters:
- api_key: Your API key (can be set via SMALLEST_API_KEY environment variable)
- model: TTS model to use (default: “lightning”)
- sample_rate: Audio sample rate (default: 24000)
- voice: Voice ID (default: “emily”)
- speed: Speech speed multiplier (default: 1.0)
- add_wav_header: Include WAV header in output (default: True)
- transliterate: Enable text transliteration (default: False)
- remove_extra_silence: Remove additional silence (default: True)
These parameters are part of the Smallest instance. They can be set when creating the instance (as shown above). However, the synthesize function also accepts kwargs, allowing you to override any of these parameters on a per-request basis.
For example, you can modify the speech speed and sample rate just for a particular synthesis request:
Override Parameters Example:
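The override behaviour can be modelled roughly as follows. This is a sketch of the semantics only, not the SDK's implementation: `TTSOptions` and `options_for_request` are hypothetical names, and the default values are copied from the parameter list above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TTSOptions:
    # Hypothetical container; defaults mirror the documented parameter list.
    model: str = "lightning"
    sample_rate: int = 24000
    voice: str = "emily"
    speed: float = 1.0
    add_wav_header: bool = True
    transliterate: bool = False
    remove_extra_silence: bool = True


def options_for_request(defaults, **overrides):
    """Return options for one request; the instance defaults stay unchanged."""
    # dataclasses.replace raises TypeError for an unknown parameter name.
    return replace(defaults, **overrides)


defaults = TTSOptions()
request = options_for_request(defaults, speed=1.5, sample_rate=16000)
```

Here `request` carries the overridden speed and sample rate, while `defaults` keeps the values set at construction and applies again to the next call.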
Async
Asynchronous text-to-speech synthesis client.
Basic Usage:
Parameters:
- api_key: Your API key (can be set via SMALLEST_API_KEY environment variable)
- model: TTS model to use (default: “lightning”)
- sample_rate: Audio sample rate (default: 24000)
- voice: Voice ID (default: “emily”)
- speed: Speech speed multiplier (default: 1.0)
- add_wav_header: Include WAV header in output (default: True)
- transliterate: Enable text transliteration (default: False)
- remove_extra_silence: Remove additional silence (default: True)
These parameters are part of the AsyncSmallest instance. They can be set when creating the instance (as shown above). However, the synthesize function also accepts kwargs, allowing you to override any of these parameters on a per-request basis.
For example, you can modify the speech speed and sample rate just for a particular synthesis request:
Override Parameters Example:
LLM to Speech
The TextToAudioStream class provides real-time text-to-speech processing, converting streaming text into audio output. It’s useful for applications like voice assistants, live captioning, or chatbots that require immediate audio feedback.
Parameters:
- tts_instance: Text-to-speech engine (Smallest or AsyncSmallest)
- queue_timeout: Wait time for new text (seconds, default: 5.0)
- max_retries: Number of retry attempts for failed synthesis (default: 3)
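One plausible reading of queue_timeout and max_retries is sketched below. It is an illustration under stated assumptions, not the class's actual implementation: `drain_text_queue` is a hypothetical function, `synthesize` stands for any callable that turns text into audio bytes, and the sketch assumes the stream ends when no new text arrives within the timeout.

```python
import queue


def drain_text_queue(text_queue, synthesize, queue_timeout=5.0, max_retries=3):
    """Yield audio for each queued text item until the queue stays empty."""
    while True:
        try:
            text = text_queue.get(timeout=queue_timeout)
        except queue.Empty:
            return  # No new text within queue_timeout seconds: stop.

        for attempt in range(1, max_retries + 1):
            try:
                audio = synthesize(text)
                break
            except Exception:
                if attempt == max_retries:
                    raise  # Give up after max_retries failed attempts.
        yield audio
```

A producer (for example, a loop reading LLM output) would put text on the queue while a consumer iterates over the generator and plays or stores each chunk.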
Output Format:
The processor yields raw audio data chunks without WAV headers for streaming efficiency. These chunks can be:
- Played directly through an audio device.
- Saved to a file.
- Streamed over a network.
- Further processed as needed.
Available Methods
Technical Note: WAV Headers in Streaming Audio
When streaming audio, WAV headers are excluded from individual chunks for efficiency. Reasons include:
- Headers contain metadata for the entire audio file, which isn’t suitable for streaming chunks.
- Including headers may cause playback artifacts when concatenating chunks.
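Both points can be checked with the standard library. The sketch below assumes 16-bit mono PCM at 24 kHz (the bit depth and channel count are assumptions, not stated above), and `pcm_to_wav_bytes` is a hypothetical helper name.

```python
import io
import struct
import wave


def pcm_to_wav_bytes(pcm, sample_rate=24000, channels=1, sample_width=2):
    """Wrap raw PCM bytes in a WAV container, built in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


chunk = b"\x00\x00" * 1200  # 50 ms of silence at 24 kHz, 16-bit mono
wav_a = pcm_to_wav_bytes(chunk)
wav_b = pcm_to_wav_bytes(chunk)

# Bytes 40-43 of the 44-byte PCM header hold the length of the audio data.
declared_length = struct.unpack("<I", wav_a[40:44])[0]

# Naive concatenation: the first header still declares one chunk of data,
# and the second header ends up in the middle of the byte stream.
naive = wav_a + wav_b
stray_header_at = naive.find(b"RIFF", 4)
```

A player that trusts the first header stops after one chunk, while one that ignores the declared length interprets the second header as audio samples, which is a likely source of clicks at chunk boundaries.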
Best Practices for Audio Streaming
- Stream raw PCM audio data without headers.
- Add a WAV header only when saving the complete audio stream or initializing playback.
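As a sketch of the second practice, the standard `wave` module can write a single header and then append raw chunks as they arrive. `save_pcm_stream` is a hypothetical helper, and the 16-bit mono format is an assumption; adjust `channels` and `sample_width` to match the actual stream.

```python
import wave


def save_pcm_stream(chunks, path, sample_rate=24000, channels=1, sample_width=2):
    """Write an iterable of raw PCM chunks to one WAV file.

    Returns the number of audio frames written.
    """
    total_bytes = 0
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        for chunk in chunks:
            # Only raw samples are appended; the header is finalized on close.
            wav.writeframes(chunk)
            total_bytes += len(chunk)
    return total_bytes // (channels * sample_width)
```

Because `chunks` can be any iterable, a generator that yields audio as it is produced can be passed in directly.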