Real-time Text-to-Speech Synthesis
The WavesStreamingTTS class provides high-performance text-to-speech conversion with configurable streaming parameters. This implementation is optimized for low-latency applications where immediate audio feedback is critical, such as voice assistants, live narration, or interactive applications.
Configuration Setup
The streaming TTS system uses a TTSConfig object to manage synthesis parameters:
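A configuration might look like the following sketch. The import path is an assumption (verify it against the installed SDK version); the parameter names and defaults follow the Configuration Parameters list on this page:

```python
# Assumed import path -- check against your installed SDK version.
from smallestai.waves import TTSConfig

config = TTSConfig(
    api_key="YOUR_SMALLEST_AI_API_KEY",  # your Smallest AI API key
    voice_id="aditi",                    # voice to synthesize with
    language="en",                       # language code (default: "en")
    sample_rate=24000,                   # output sample rate in Hz (default: 24000)
    speed=1.0,                           # 1.0 = normal speed
    consistency=0.5,                     # voice consistency, range 0.0-1.0
    enhancement=1,                       # audio enhancement level
    similarity=0.0,                      # voice similarity, range 0.0-1.0
    max_buffer_flush_ms=0,               # 0 = flush audio as soon as it is ready
)
```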
Basic Text Synthesis
For straightforward text-to-speech conversion, use the synthesize method:
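A minimal sketch, assuming the import path shown and that synthesize returns the complete utterance as raw PCM bytes (both assumptions — confirm against the SDK reference):

```python
# Assumed import path and return type -- verify against the SDK you have installed.
from smallestai.waves import WavesStreamingTTS, TTSConfig

config = TTSConfig(api_key="YOUR_SMALLEST_AI_API_KEY", voice_id="aditi")
tts = WavesStreamingTTS(config)

# synthesize() converts a complete string in one call and (assumed)
# returns raw PCM bytes for the whole utterance.
audio = tts.synthesize("Hello! This is a test of real-time text to speech.")
print(f"received {len(audio)} bytes of PCM audio")
```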
Streaming Text Input
For real-time applications where text arrives incrementally, use synthesize_streaming:
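A sketch of incremental synthesis. The async-generator shape of synthesize_streaming, and the import path, are assumptions — consult the SDK reference for the exact signature:

```python
# Assumed import path and call shape -- verify against the SDK reference.
import asyncio
from smallestai.waves import WavesStreamingTTS, TTSConfig

async def narrate(text_pieces):
    """Feed incrementally arriving text (e.g. from an LLM) to the TTS engine."""
    tts = WavesStreamingTTS(TTSConfig(api_key="YOUR_SMALLEST_AI_API_KEY",
                                      voice_id="aditi"))
    audio_chunks = []
    for piece in text_pieces:  # text arrives piece by piece
        # (assumed) synthesize_streaming yields PCM chunks as they are ready
        async for chunk in tts.synthesize_streaming(piece):
            audio_chunks.append(chunk)  # hand each chunk to your audio device here
    return audio_chunks

asyncio.run(narrate(["Hello there. ", "This text arrived ", "in pieces."]))
```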
Saving Audio to WAV File
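Python's standard wave module is enough for this step. A minimal sketch, assuming 16-bit little-endian mono PCM, which is typical for TTS output — confirm the exact format against the SDK's documentation:

```python
import wave

def save_wav(chunks, path, sample_rate=24000):
    """Write raw PCM chunks to a WAV file.

    Assumes 16-bit little-endian mono PCM; confirm against the
    SDK's documented output format.
    """
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)            # mono
        wf.setsampwidth(2)            # 16-bit samples = 2 bytes each
        wf.setframerate(sample_rate)  # must match the configured sample_rate
        for chunk in chunks:
            wf.writeframes(chunk)

# Example with synthetic silence standing in for real API chunks:
save_wav([b"\x00\x00" * 24000], "out.wav", sample_rate=24000)  # one second of silence
```

The sample rate passed to the writer must match the sample_rate in your TTSConfig, or the saved file will play back at the wrong pitch and speed.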
The chunks are headerless PCM, so producing a standard WAV file amounts to prepending a header that records the sample rate, sample width, and channel count.

Configuration Parameters
- voice_id: Voice identifier (e.g., "aditi", "male-1", "female-2")
- api_key: Your Smallest AI API key
- language: Language code for synthesis (default: "en")
- sample_rate: Audio sample rate in Hz (default: 24000)
- speed: Speech speed multiplier (default: 1.0 = normal speed; 0.5 = half speed, 2.0 = double speed)
- consistency: Voice consistency parameter (default: 0.5, range: 0.0-1.0)
- enhancement: Audio enhancement level (default: 1)
- similarity: Voice similarity parameter (default: 0, range: 0.0-1.0)
- max_buffer_flush_ms: Maximum buffer time in milliseconds before forcing audio output (default: 0)
Output Format
The streaming TTS returns raw PCM audio data as bytes objects. Each chunk represents a portion of the synthesized audio that can be:

- Played directly through audio hardware
- Saved to audio files (WAV, MP3, etc.)
- Streamed over network protocols
- Processed with additional audio effects
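Because the byte layout is known, each chunk's playback duration can be computed directly from its length, which is useful for scheduling playback or measuring latency. A small sketch, again assuming 16-bit mono PCM (confirm against the SDK's documented format):

```python
def chunk_duration_seconds(chunk, sample_rate=24000, sample_width=2, channels=1):
    """Playback duration of a raw PCM chunk in seconds.

    sample_width=2 (16-bit) and channels=1 are typical for TTS output
    but are assumptions -- confirm against the SDK documentation.
    """
    frames = len(chunk) // (sample_width * channels)
    return frames / sample_rate

# 48000 bytes of 16-bit mono PCM at 24 kHz -> 24000 frames -> 1.0 second
print(chunk_duration_seconds(b"\x00" * 48000))  # 1.0
```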

