Stream TTS audio in real-time via WebSocket or SSE — first chunk in ~200ms.
Streaming TTS delivers audio chunks as they are generated, so playback starts immediately instead of waiting for the full file. The first chunk arrives in ~200ms.

Streamed audio output:
Persistent connections provide continuous, low-latency audio. Best for conversational AI and real-time apps.

Endpoint: `wss://api.smallest.ai/waves/v1/lightning-v3.1/get_speech/stream`
```python
import asyncio
import base64
import json
import os
import wave

import websockets

API_KEY = os.environ["SMALLEST_API_KEY"]
WS_URL = "wss://api.smallest.ai/waves/v1/lightning-v3.1/get_speech/stream"


async def stream_tts(text):
    audio_chunks = []
    async with websockets.connect(
        WS_URL,
        extra_headers={"Authorization": f"Bearer {API_KEY}"},
    ) as ws:
        await ws.send(json.dumps({
            "text": text,
            "voice_id": "magnus",
            "sample_rate": 24000,
        }))
        while True:
            response = await ws.recv()
            data = json.loads(response)
            if data["status"] == "chunk":
                audio = base64.b64decode(data["data"]["audio"])
                audio_chunks.append(audio)
            elif data["status"] == "complete":
                break

    # Save as WAV (16-bit mono PCM at 24 kHz)
    raw = b"".join(audio_chunks)
    with wave.open("streamed.wav", "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(24000)
        wf.writeframes(raw)
    print(f"Saved streamed.wav ({len(audio_chunks)} chunks)")


asyncio.run(stream_tts("Streaming delivers audio in real-time for voice assistants and chatbots."))
```
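The example above buffers every chunk and writes a file at the end. For genuinely low-latency playback you would hand each chunk to an audio sink the moment it arrives. A minimal sketch of that pattern, assuming the same message format as above; the `sink` callable is a stand-in for whatever audio library you use:

```python
import asyncio
import base64
import json


async def pump_chunks(ws_messages, sink):
    """Decode chunk messages as they arrive and push raw PCM to a sink.

    `ws_messages` is any async iterator of JSON strings shaped like the
    server responses above; `sink` is a callable standing in for your
    audio output (e.g. a sounddevice stream's write method).
    """
    async for raw in ws_messages:
        data = json.loads(raw)
        if data["status"] == "chunk":
            sink(base64.b64decode(data["data"]["audio"]))
        elif data["status"] == "complete":
            break


# Demo with a fake message stream (no network needed):
async def fake_messages():
    for payload in (b"hello", b"world"):
        yield json.dumps({"status": "chunk",
                          "data": {"audio": base64.b64encode(payload).decode()}})
    yield json.dumps({"status": "complete"})


received = []
asyncio.run(pump_chunks(fake_messages(), received.append))
print(b"".join(received))  # b'helloworld'
```

With a real connection you would pass the WebSocket object itself as `ws_messages`, since `websockets` connections are async-iterable over incoming messages.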
For real-time applications where text arrives incrementally (e.g., from an LLM), the SDK supports streaming text input:
```python
from smallestai.waves import TTSConfig, WavesStreamingTTS

config = TTSConfig(voice_id="magnus", api_key="YOUR_API_KEY", sample_rate=24000)
streaming_tts = WavesStreamingTTS(config)


def text_stream():
    """Simulates text arriving word by word (e.g., from an LLM)."""
    text = "Streaming synthesis with chunked text input."
    for word in text.split():
        yield word + " "


audio_chunks = []
for chunk in streaming_tts.synthesize_streaming(text_stream()):
    audio_chunks.append(chunk)  # In a real app, play each chunk immediately
```
Use WebSocket when sending multiple TTS requests over time (conversations, voice bots). Use SSE for simple one-shot streaming where you don’t need a persistent connection.
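For the SSE path, the server sends `data:` lines over a streaming HTTP response. A minimal sketch of the client side; note that the commented endpoint path and the JSON payload shape are illustrative assumptions mirroring the WebSocket chunk format above, not documented values, so check the API reference for the actual SSE route:

```python
import base64
import json


def parse_sse_audio(lines):
    """Yield decoded audio bytes from SSE `data:` lines.

    Assumes each event's data field is a JSON object carrying a
    base64-encoded `audio` key (an assumption based on the WebSocket
    chunk format shown earlier).
    """
    for line in lines:
        if line.startswith("data:"):
            payload = json.loads(line[len("data:"):].strip())
            if "audio" in payload:
                yield base64.b64decode(payload["audio"])


# With a streaming-capable HTTP client you would do something like:
#
#   resp = requests.post(SSE_URL,  # hypothetical SSE endpoint
#                        headers={"Authorization": f"Bearer {API_KEY}"},
#                        json={"text": "...", "voice_id": "magnus"},
#                        stream=True)
#   for chunk in parse_sse_audio(resp.iter_lines(decode_unicode=True)):
#       play(chunk)  # hand each chunk to your audio sink

# Demo with canned SSE lines (no network needed):
sample = [
    'data: {"audio": "' + base64.b64encode(b"pcm").decode() + '"}',
    "",
    'data: {"done": true}',
]
print(b"".join(parse_sse_audio(sample)))  # b'pcm'
```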