Real-time streaming best practices

Follow these recommendations to keep Lightning STT latencies low while preserving transcript fidelity in real-time scenarios.

Chunk Size and Streaming Rate

Chunk Size

  • Optimal: 4096 bytes per chunk
  • Range: 1024 to 8192 bytes
  • Consistency: Maintain consistent chunk sizes when possible
Sending audio in 4096-byte chunks provides the best balance between latency and processing efficiency.
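
A minimal sketch of the chunking itself, assuming the raw audio is already in an ArrayBuffer (chunkAudio and audioBuffer are illustrative names, not part of the API):

// Sketch: slice a raw audio buffer into 4096-byte chunks
const CHUNK_SIZE = 4096;

function* chunkAudio(audioBuffer) {
  for (let offset = 0; offset < audioBuffer.byteLength; offset += CHUNK_SIZE) {
    yield audioBuffer.slice(offset, offset + CHUNK_SIZE);
  }
}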

Streaming Rate

  • Interval: Send chunks every 50-100ms
  • Avoid: Sending chunks too rapidly (< 20ms) or too slowly (> 200ms)
  • Consistency: Maintain regular intervals for predictable latency
// Good: Consistent 50ms intervals
setInterval(sendChunk, 50);

// Avoid: Variable or very short intervals
setInterval(sendChunk, Math.random() * 10); // Too fast and inconsistent

Handling Partial vs Final Transcripts

The API sends two types of transcripts:

Partial Transcripts (is_final: false)

  • Purpose: Show interim results for immediate user feedback
  • Behavior: May change as more audio is processed
  • Use case: Display “live” transcription as the user speaks
if (!message.is_final) {
  // Show partial transcript with visual indicator (e.g., grayed out)
  displayPartialTranscript(message.transcript);
}

Final Transcripts (is_final: true)

  • Purpose: Confirmed transcription for a segment
  • Behavior: Stable and won’t change
  • Use case: Store in database, display as confirmed text
if (message.is_final) {
  // Store final transcript
  saveTranscript(message.full_transcript);
  // Update UI with confirmed text
  displayFinalTranscript(message.full_transcript);
}

Audio Preprocessing

Before Streaming

  1. Convert to correct format: Ensure audio matches the encoding parameter (linear16, linear32, alaw, mulaw, opus, ogg_opus)
  2. Set sample rate: Match the sample_rate parameter in your WebSocket URL
  3. Mono channel: Downmix stereo/multi-channel to mono
  4. Normalize levels: Prevent clipping and ensure consistent volume

Example Preprocessing

import numpy as np
import soundfile as sf
from scipy import signal

def preprocess_audio(input_path, target_sample_rate=16000):
    """Preprocess audio for WebSocket streaming"""
    audio, sample_rate = sf.read(input_path)

    # Downmix stereo/multi-channel to mono
    if audio.ndim > 1:
        audio = np.mean(audio, axis=1)

    # Resample if needed
    if sample_rate != target_sample_rate:
        audio = signal.resample(audio, int(len(audio) * target_sample_rate / sample_rate))

    # Normalize to prevent clipping, leaving a little headroom
    max_val = np.abs(audio).max()
    if max_val > 0:
        audio = audio / max_val * 0.95

    # Convert to 16-bit PCM (linear16)
    audio_int16 = (audio * 32767).astype(np.int16)

    return audio_int16, target_sample_rate

Error Handling and Reconnection

Connection Errors

Implement robust error handling for network issues:
let reconnectAttempts = 0;
const maxReconnectAttempts = 5;

function connect() {
  const ws = new WebSocket(url.toString());
  
  ws.onerror = (error) => {
    console.error("WebSocket error:", error);
  };
  
  ws.onclose = (event) => {
    if (event.code !== 1000 && reconnectAttempts < maxReconnectAttempts) {
      reconnectAttempts++;
      const delay = Math.min(1000 * Math.pow(2, reconnectAttempts), 30000);
      console.log(`Reconnecting in ${delay}ms...`);
      setTimeout(connect, delay);
    }
  };
  
  ws.onopen = () => {
    reconnectAttempts = 0; // Reset on successful connection
  };
  
  return ws;
}

Handling Connection Drops

  • Detect drops: Monitor connection state and implement heartbeat/ping
  • Buffer audio: Store audio chunks during disconnection
  • Resume streaming: Continue from where you left off after reconnection
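
One way to sketch the buffering and resume logic (pendingChunks, sendOrBuffer, and flushBuffer are illustrative helpers, not part of the API):

// Sketch: buffer chunks while disconnected, flush after reconnecting
const pendingChunks = [];

function sendOrBuffer(ws, chunk) {
  if (ws.readyState === WebSocket.OPEN) {
    ws.send(chunk);
  } else {
    pendingChunks.push(chunk); // hold audio until the socket recovers
  }
}

function flushBuffer(ws) {
  // Call from ws.onopen, alongside resetting the reconnect counter
  while (pendingChunks.length > 0 && ws.readyState === WebSocket.OPEN) {
    ws.send(pendingChunks.shift());
  }
}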

Session Management

Session Lifecycle

  1. Establish connection: Create WebSocket with proper authentication
  2. Stream audio: Send chunks at regular intervals
  3. Handle responses: Process partial and final transcripts
  4. End session: Send {"type": "end"} when done
  5. Close connection: Gracefully close the WebSocket
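
Tied together, a minimal session loop might look like this sketch (startMicrophone and handleMessage are placeholders, and authentication depends on your deployment):

// Sketch of the lifecycle steps above; helper names are placeholders
const ws = new WebSocket(url.toString());       // 1. establish connection

ws.onopen = () =>
  startMicrophone((chunk) => ws.send(chunk));   // 2. stream audio

ws.onmessage = (event) =>
  handleMessage(JSON.parse(event.data));        // 3. handle partial/final transcripts

function stopSession() {
  ws.send(JSON.stringify({ type: "end" }));     // 4. end session
  // 5. close only after is_last=true arrives (see Graceful Shutdown below)
}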

Graceful Shutdown

To properly close a session, send the end token and wait for the server to respond with is_last=true before closing the WebSocket connection:
function endTranscription(ws) {
  // Send end signal
  ws.send(JSON.stringify({ type: "end" }));

  // Wait for is_last=true response before closing.
  // Use addEventListener so the existing transcript handler keeps running.
  ws.addEventListener("message", (event) => {
    const message = JSON.parse(event.data);
    if (message.is_last === true) {
      ws.close(1000, "Transcription complete");
    }
  });
}
Do not close the WebSocket immediately after sending the end token. Always wait for the is_last=true response to ensure all audio has been processed and final transcripts are received.

Latency Optimization

Minimize Processing Delays

  • Preprocess offline: Convert audio format before streaming
  • Use optimal encoding: linear16 at 16 kHz for best latency/quality balance
  • Consistent chunking: Avoid variable chunk sizes that cause processing delays

Network Optimization

  • Stable connection: Use reliable network connections
  • Monitor bandwidth: Ensure sufficient bandwidth for audio streaming
  • Reduce overhead: Minimize unnecessary data in WebSocket messages

Quality Checklist

  1. Use 16 kHz mono linear16 whenever possible for optimal latency
  2. Stream in 4096-byte chunks at 50-100ms intervals
  3. Handle partial transcripts for immediate user feedback
  4. Store final transcripts for accuracy and persistence
  5. Implement reconnection logic for production reliability
  6. Monitor session state to detect and handle errors gracefully
  7. Test with real audio to validate latency and accuracy

Performance Tips

For Low Latency

  • Use linear16 encoding at 16 kHz
  • Stream chunks every 50ms
  • Process responses asynchronously
  • Avoid blocking operations in message handlers
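
As a sketch of a non-blocking handler (persistTranscript and updateLiveTranscript are placeholders):

// Sketch: keep the message handler thin and defer heavy work
ws.onmessage = (event) => {
  const message = JSON.parse(event.data);
  // Defer rendering/persistence off the handler's hot path
  queueMicrotask(() => {
    if (message.is_final) {
      persistTranscript(message.full_transcript);  // placeholder
    } else {
      updateLiveTranscript(message.transcript);    // placeholder
    }
  });
};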

For High Accuracy

  • Use higher sample rates (44.1 kHz or 48 kHz) when latency allows
  • Enable word_timestamps for precise timing
  • Wait for is_final=true before committing transcripts
  • Use full_transcript for complete session text
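
For example, when building the connection URL (a sketch; the base URL is a placeholder, while encoding, sample_rate, and word_timestamps are the parameters referenced in this guide):

// Sketch: request word timestamps at a higher sample rate
const url = new URL("wss://api.example.com/v1/listen");  // placeholder base URL
url.searchParams.set("encoding", "linear16");
url.searchParams.set("sample_rate", "48000");
url.searchParams.set("word_timestamps", "true");
const ws = new WebSocket(url.toString());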

For Production

  • Implement connection pooling for multiple sessions
  • Add rate limiting to prevent overwhelming the API
  • Log session IDs for debugging and support
  • Monitor transcription quality and latency metrics
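
A simple client-side rate limiter can be sketched like this (MIN_INTERVAL_MS and sendRateLimited are illustrative, not API features):

// Sketch: space outgoing chunks at least MIN_INTERVAL_MS apart
const MIN_INTERVAL_MS = 50;
let lastSendAt = 0;

function sendRateLimited(ws, chunk) {
  const now = Date.now();
  const wait = Math.max(0, lastSendAt + MIN_INTERVAL_MS - now);
  lastSendAt = now + wait;
  setTimeout(() => ws.send(chunk), wait);
}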