# Real-time streaming best practices
Follow these recommendations to keep Lightning STT latencies low while preserving transcript fidelity in real-time scenarios.
## Chunk Size and Streaming Rate

### Recommended Chunk Size

- Optimal: 4096 bytes per chunk
- Range: 1024 to 8192 bytes
- Consistency: Maintain consistent chunk sizes when possible

Sending audio in 4096-byte chunks provides the best balance between latency and processing efficiency.
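As a minimal sketch (assuming your preprocessed audio is already available as an ArrayBuffer of raw PCM bytes), chunking might look like this; `chunkAudio` is an illustrative helper, not part of the API:

```javascript
// Sketch: slice a raw PCM buffer into fixed 4096-byte chunks.
// `pcmBuffer` is assumed to be an ArrayBuffer of preprocessed audio.
const CHUNK_SIZE = 4096;

function* chunkAudio(pcmBuffer, chunkSize = CHUNK_SIZE) {
  for (let offset = 0; offset < pcmBuffer.byteLength; offset += chunkSize) {
    // slice() clamps to the buffer end, so the last chunk may be smaller
    yield pcmBuffer.slice(offset, offset + chunkSize);
  }
}
```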
### Streaming Rate

- Interval: Send chunks every 50-100 ms
- Avoid: Sending chunks too rapidly (< 20 ms) or too slowly (> 200 ms)
- Consistency: Maintain regular intervals for predictable latency
```javascript
// Good: consistent 50 ms intervals
setInterval(sendChunk, 50);

// Avoid: variable or very short intervals
setTimeout(sendChunk, Math.random() * 10); // too fast and inconsistent
```
## Handling Partial vs Final Transcripts
The API sends two types of transcripts:
### Partial Transcripts (`is_final: false`)

- Purpose: Show interim results for immediate user feedback
- Behavior: May change as more audio is processed
- Use case: Display “live” transcription as the user speaks

```javascript
if (!message.is_final) {
  // Show the partial transcript with a visual indicator (e.g., grayed out)
  displayPartialTranscript(message.transcript);
}
```
### Final Transcripts (`is_final: true`)

- Purpose: Confirmed transcription for a segment
- Behavior: Stable and won’t change
- Use case: Store in a database, display as confirmed text

```javascript
if (message.is_final) {
  // Store the final transcript
  saveTranscript(message.full_transcript);

  // Update the UI with confirmed text
  displayFinalTranscript(message.full_transcript);
}
```
## Audio Preprocessing

### Before Streaming

- Convert to the correct format: Ensure audio matches the `encoding` parameter (`linear16`, `linear32`, `alaw`, `mulaw`, `opus`, `ogg_opus`)
- Set the sample rate: Match the `sample_rate` parameter in your WebSocket URL
- Mono channel: Downmix stereo/multi-channel audio to mono
- Normalize levels: Prevent clipping and ensure consistent volume
### Example Preprocessing

```python
import numpy as np
import soundfile as sf
from scipy import signal


def preprocess_audio(input_path, target_sample_rate=16000):
    """Preprocess audio for WebSocket streaming."""
    audio, sample_rate = sf.read(input_path)

    # Downmix to mono
    if audio.ndim > 1:
        audio = np.mean(audio, axis=1)

    # Resample if needed
    if sample_rate != target_sample_rate:
        audio = signal.resample(
            audio, int(len(audio) * target_sample_rate / sample_rate)
        )

    # Normalize to prevent clipping
    max_val = np.abs(audio).max()
    if max_val > 0:
        audio = audio / max_val * 0.95

    # Convert to 16-bit PCM (linear16)
    audio_int16 = (audio * 32767).astype(np.int16)
    return audio_int16, target_sample_rate
```
## Error Handling and Reconnection

### Connection Errors
Implement robust error handling for network issues:
```javascript
let reconnectAttempts = 0;
const maxReconnectAttempts = 5;

function connect() {
  const ws = new WebSocket(url.toString());

  ws.onerror = (error) => {
    console.error("WebSocket error:", error);
  };

  ws.onclose = (event) => {
    // 1000 = normal closure; anything else triggers a retry with backoff
    if (event.code !== 1000 && reconnectAttempts < maxReconnectAttempts) {
      reconnectAttempts++;
      const delay = Math.min(1000 * Math.pow(2, reconnectAttempts), 30000);
      console.log(`Reconnecting in ${delay}ms...`);
      setTimeout(connect, delay);
    }
  };

  ws.onopen = () => {
    reconnectAttempts = 0; // Reset on successful connection
  };

  return ws;
}
```
### Handling Connection Drops

- Detect drops: Monitor connection state and implement a heartbeat/ping
- Buffer audio: Store audio chunks during disconnection
- Resume streaming: Continue from where you left off after reconnection, as sketched below
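A minimal sketch of the buffer-and-resume idea (it assumes the socket passed in is the one managed by the reconnection logic above, with `flushBuffer` wired to `ws.onopen`):

```javascript
// Sketch: buffer chunks while disconnected, flush once the socket reopens.
const pendingChunks = [];

function sendOrBuffer(ws, chunk) {
  if (ws.readyState === WebSocket.OPEN) {
    ws.send(chunk);
  } else {
    pendingChunks.push(chunk); // hold audio until the connection is back
  }
}

// Call from ws.onopen to resume from where you left off
function flushBuffer(ws) {
  while (pendingChunks.length > 0 && ws.readyState === WebSocket.OPEN) {
    ws.send(pendingChunks.shift());
  }
}
```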
## Session Management

### Session Lifecycle

1. Establish connection: Create the WebSocket with proper authentication
2. Stream audio: Send chunks at regular intervals
3. Handle responses: Process partial and final transcripts
4. End session: Send `{"type": "end"}` when done
5. Close connection: Gracefully close the WebSocket
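Strung together, the lifecycle looks roughly like this (the endpoint URL is a placeholder; take the real URL, authentication scheme, and parameters from the API reference):

```javascript
// Sketch of the full session lifecycle. The endpoint is a placeholder;
// substitute the real URL and authentication from the API reference.
const url = new URL("wss://example.com/v1/stt/stream"); // hypothetical endpoint
url.searchParams.set("sample_rate", "16000");
url.searchParams.set("encoding", "linear16");

const ws = new WebSocket(url.toString()); // 1. Establish connection

ws.onopen = () => {
  // 2. Stream audio: send chunks at regular intervals (see above)
};

ws.onmessage = (event) => {
  // 3. Handle responses: process partial and final transcripts
  const message = JSON.parse(event.data);
  if (message.is_final) saveTranscript(message.full_transcript);
};

// 4-5. When done, send {"type": "end"} and close gracefully
// (see "Graceful Shutdown" below).
```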
### Graceful Shutdown

To close a session properly, send the end token and wait for the server to respond with `is_last=true` before closing the WebSocket connection:
```javascript
function endTranscription(ws) {
  // Send the end signal
  ws.send(JSON.stringify({ type: "end" }));

  // Wait for the is_last=true response before closing
  ws.onmessage = (event) => {
    const message = JSON.parse(event.data);
    if (message.is_last === true) {
      ws.close(1000, "Transcription complete");
    }
  };
}
```
Do not close the WebSocket immediately after sending the end token. Always wait for the `is_last=true` response to ensure all audio has been processed and final transcripts are received.
## Latency Optimization

### Minimize Processing Delays

- Preprocess offline: Convert the audio format before streaming
- Use optimal encoding: `linear16` at 16 kHz for the best latency/quality balance
- Consistent chunking: Avoid variable chunk sizes that cause processing delays
### Network Optimization
- Stable connection: Use reliable network connections
- Monitor bandwidth: Ensure sufficient bandwidth for audio streaming
- Reduce overhead: Minimize unnecessary data in WebSocket messages
## Quality Checklist
- Use 16 kHz mono `linear16` whenever possible for optimal latency
- Stream in 4096-byte chunks at 50-100ms intervals
- Handle partial transcripts for immediate user feedback
- Store final transcripts for accuracy and persistence
- Implement reconnection logic for production reliability
- Monitor session state to detect and handle errors gracefully
- Test with real audio to validate latency and accuracy
## For Low Latency

- Use `linear16` encoding at 16 kHz
- Stream chunks every 50 ms
- Process responses asynchronously
- Avoid blocking operations in message handlers, as shown in the sketch below
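One way to keep handlers non-blocking is to defer anything heavy out of `onmessage`; a sketch (where `renderTranscript` stands in for your own UI or storage code):

```javascript
// Sketch: keep ws.onmessage cheap and defer heavy work.
ws.onmessage = (event) => {
  const message = JSON.parse(event.data); // cheap parse only
  // Defer rendering/storage so the handler returns immediately
  queueMicrotask(() => renderTranscript(message)); // renderTranscript: your code
};
```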
## For High Accuracy

- Use higher sample rates (44.1 kHz or 48 kHz) when latency allows
- Enable `word_timestamps` for precise timing (see the sketch after this list)
- Wait for `is_final=true` before committing transcripts
- Use `full_transcript` for the complete session text
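If `word_timestamps` is passed the same way as `sample_rate` (as a query parameter on the WebSocket URL; confirm this in the API reference), an accuracy-oriented connection might look like:

```javascript
// Sketch: accuracy-oriented connection settings.
// Treating word_timestamps as a URL parameter is an assumption here;
// confirm the exact mechanism in the API reference.
const url = new URL("wss://example.com/v1/stt/stream"); // hypothetical endpoint
url.searchParams.set("sample_rate", "44100"); // higher rate when latency allows
url.searchParams.set("encoding", "linear16");
url.searchParams.set("word_timestamps", "true");
const ws = new WebSocket(url.toString());
```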
## For Production

- Implement connection pooling for multiple sessions, as sketched below
- Add rate limiting to prevent overwhelming the API
- Log session IDs for debugging and support
- Monitor transcription quality and latency metrics
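As a sketch of the pooling idea (the class and the `createSession` callback are illustrative, not part of the API):

```javascript
// Sketch: a tiny pool that caps concurrent transcription sessions.
class SessionPool {
  constructor(maxSessions = 4) {
    this.maxSessions = maxSessions;
    this.active = new Map(); // sessionId -> WebSocket
  }

  // createSession is your own WebSocket setup code (auth, handlers, etc.)
  acquire(sessionId, createSession) {
    if (this.active.size >= this.maxSessions) {
      throw new Error("Session pool exhausted; queue the request or retry later");
    }
    const ws = createSession();
    this.active.set(sessionId, ws);
    console.log(`[pool] session ${sessionId} opened`); // log session IDs for support
    return ws;
  }

  release(sessionId) {
    const ws = this.active.get(sessionId);
    if (ws) ws.close(1000, "Session released");
    this.active.delete(sessionId);
  }
}
```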