Real-time streaming best practices

Follow these recommendations to keep Lightning STT latencies low while preserving transcript fidelity in real-time scenarios.

Chunk Size and Streaming Rate

Chunk Size

  • Optimal: 4096 bytes per chunk
  • Range: 1024 to 8192 bytes
  • Consistency: Maintain consistent chunk sizes when possible
Sending audio in 4096-byte chunks provides the best balance between latency and processing efficiency.
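
A minimal sketch of the chunking itself, assuming the raw audio is already in an ArrayBuffer (chunkAudio and audioBuffer are illustrative names, not part of the API):

// Sketch: slice a raw audio buffer into 4096-byte chunks
const CHUNK_SIZE = 4096;

function* chunkAudio(audioBuffer) {
  for (let offset = 0; offset < audioBuffer.byteLength; offset += CHUNK_SIZE) {
    yield audioBuffer.slice(offset, offset + CHUNK_SIZE);
  }
}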

Streaming Rate

  • Interval: Send chunks every 50-100ms
  • Avoid: Sending chunks too rapidly (< 20ms) or too slowly (> 200ms)
  • Consistency: Maintain regular intervals for predictable latency
// Good: Consistent 50ms intervals
setInterval(sendChunk, 50);

// Avoid: Variable or very short intervals
setInterval(sendChunk, Math.random() * 10); // Too fast and inconsistent

Handling Partial vs Final Transcripts

The API sends two types of transcripts:

Partial Transcripts (is_final: false)

  • Purpose: Show interim results for immediate user feedback
  • Behavior: May change as more audio is processed
  • Use case: Display “live” transcription as the user speaks
if (!message.is_final) {
  // Show partial transcript with visual indicator (e.g., grayed out)
  displayPartialTranscript(message.transcript);
}

Final Transcripts (is_final: true)

  • Purpose: Confirmed transcription for a segment
  • Behavior: Stable and won’t change
  • Use case: Store in database, display as confirmed text
if (message.is_final) {
  // Store final transcript
  saveTranscript(message.full_transcript);
  // Update UI with confirmed text
  displayFinalTranscript(message.full_transcript);
}

Audio Preprocessing

Before Streaming

  1. Convert to correct format: Ensure audio matches the encoding parameter (linear16, linear32, alaw, mulaw, opus, ogg_opus)
  2. Set sample rate: Match the sample_rate parameter in your WebSocket URL
  3. Mono channel: Downmix stereo/multi-channel to mono
  4. Normalize levels: Prevent clipping and ensure consistent volume

Example Preprocessing

import numpy as np
import soundfile as sf
from scipy import signal

def preprocess_audio(input_path, target_sample_rate=16000):
    """Preprocess audio for WebSocket streaming"""
    audio, sample_rate = sf.read(input_path)

    # Downmix stereo/multi-channel to mono
    if audio.ndim > 1:
        audio = np.mean(audio, axis=1)

    # Resample if needed
    if sample_rate != target_sample_rate:
        audio = signal.resample(audio, int(len(audio) * target_sample_rate / sample_rate))

    # Normalize to prevent clipping, leaving a little headroom
    max_val = np.abs(audio).max()
    if max_val > 0:
        audio = audio / max_val * 0.95

    # Convert to 16-bit PCM (linear16)
    audio_int16 = (audio * 32767).astype(np.int16)

    return audio_int16, target_sample_rate

Error Handling and Reconnection

Connection Errors

Implement robust error handling for network issues:
let reconnectAttempts = 0;
const maxReconnectAttempts = 5;

function connect() {
  const ws = new WebSocket(url.toString());
  
  ws.onerror = (error) => {
    console.error("WebSocket error:", error);
  };
  
  ws.onclose = (event) => {
    if (event.code !== 1000 && reconnectAttempts < maxReconnectAttempts) {
      reconnectAttempts++;
      const delay = Math.min(1000 * Math.pow(2, reconnectAttempts), 30000);
      console.log(`Reconnecting in ${delay}ms...`);
      setTimeout(connect, delay);
    }
  };
  
  ws.onopen = () => {
    reconnectAttempts = 0; // Reset on successful connection
  };
  
  return ws;
}

Handling Connection Drops

  • Detect drops: Monitor connection state and implement heartbeat/ping
  • Buffer audio: Store audio chunks during disconnection
  • Resume streaming: Continue from where you left off after reconnection
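
One way to sketch the buffering and resume logic (pendingChunks, sendOrBuffer, and flushBuffer are illustrative helpers, not part of the API):

// Sketch: buffer chunks while disconnected, flush after reconnecting
const pendingChunks = [];

function sendOrBuffer(ws, chunk) {
  if (ws.readyState === WebSocket.OPEN) {
    ws.send(chunk);
  } else {
    pendingChunks.push(chunk); // hold audio until the socket recovers
  }
}

function flushBuffer(ws) {
  // Call from ws.onopen, alongside resetting the reconnect counter
  while (pendingChunks.length > 0 && ws.readyState === WebSocket.OPEN) {
    ws.send(pendingChunks.shift());
  }
}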

Session Management

Session Lifecycle

  1. Establish connection: Create WebSocket with proper authentication
  2. Stream audio: Send chunks at regular intervals
  3. Handle responses: Process partial and final transcripts
  4. End session: Send {"type": "end"} when done
  5. Close connection: Gracefully close the WebSocket
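
Tied together, a minimal session loop might look like this sketch (startMicrophone and handleMessage are placeholders, and authentication depends on your deployment):

// Sketch of the lifecycle steps above; helper names are placeholders
const ws = new WebSocket(url.toString());       // 1. establish connection

ws.onopen = () =>
  startMicrophone((chunk) => ws.send(chunk));   // 2. stream audio

ws.onmessage = (event) =>
  handleMessage(JSON.parse(event.data));        // 3. handle partial/final transcripts

function stopSession() {
  ws.send(JSON.stringify({ type: "end" }));     // 4. end session
  // 5. close only after is_last=true arrives (see Graceful Shutdown below)
}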

Graceful Shutdown

To properly close a session, send the end token and wait for the server to respond with is_last=true before closing the WebSocket connection:
function endTranscription(ws) {
  // Send end signal
  ws.send(JSON.stringify({ type: "end" }));

  // Wait for is_last=true response before closing.
  // Use addEventListener so the existing transcript handler keeps running.
  ws.addEventListener("message", (event) => {
    const message = JSON.parse(event.data);
    if (message.is_last === true) {
      ws.close(1000, "Transcription complete");
    }
  });
}
Do not close the WebSocket immediately after sending the end token. Always wait for the is_last=true response to ensure all audio has been processed and final transcripts are received.

Latency Optimization

Minimize Processing Delays

  • Preprocess offline: Convert audio format before streaming
  • Use optimal encoding: linear16 at 16 kHz for best latency/quality balance
  • Consistent chunking: Avoid variable chunk sizes that cause processing delays

Network Optimization

  • Stable connection: Use reliable network connections
  • Monitor bandwidth: Ensure sufficient bandwidth for audio streaming
  • Reduce overhead: Minimize unnecessary data in WebSocket messages

Quality Checklist

  1. Use 16 kHz mono linear16 whenever possible for optimal latency
  2. Stream in 4096-byte chunks at 50-100ms intervals
  3. Handle partial transcripts for immediate user feedback
  4. Store final transcripts for accuracy and persistence
  5. Implement reconnection logic for production reliability
  6. Monitor session state to detect and handle errors gracefully
  7. Test with real audio to validate latency and accuracy

Performance Tips

For Low Latency

  • Use linear16 encoding at 16 kHz
  • Stream chunks every 50ms
  • Process responses asynchronously
  • Avoid blocking operations in message handlers
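
As a sketch of a non-blocking handler (persistTranscript and updateLiveTranscript are placeholders):

// Sketch: keep the message handler thin and defer heavy work
ws.onmessage = (event) => {
  const message = JSON.parse(event.data);
  // Defer rendering/persistence off the handler's hot path
  queueMicrotask(() => {
    if (message.is_final) {
      persistTranscript(message.full_transcript);  // placeholder
    } else {
      updateLiveTranscript(message.transcript);    // placeholder
    }
  });
};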

For High Accuracy

  • Use higher sample rates (44.1 kHz or 48 kHz) when latency allows
  • Enable word_timestamps for precise timing
  • Wait for is_final=true before committing transcripts
  • Use full_transcript for complete session text
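
For example, when building the connection URL (a sketch; the base URL is a placeholder, while encoding, sample_rate, and word_timestamps are the parameters referenced in this guide):

// Sketch: request word timestamps at a higher sample rate
const url = new URL("wss://api.example.com/v1/listen");  // placeholder base URL
url.searchParams.set("encoding", "linear16");
url.searchParams.set("sample_rate", "48000");
url.searchParams.set("word_timestamps", "true");
const ws = new WebSocket(url.toString());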

For Production

  • Implement connection pooling for multiple sessions
  • Add rate limiting to prevent overwhelming the API
  • Log session IDs for debugging and support
  • Monitor transcription quality and latency metrics
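
A simple client-side rate limiter can be sketched like this (MIN_INTERVAL_MS and sendRateLimited are illustrative, not API features):

// Sketch: space outgoing chunks at least MIN_INTERVAL_MS apart
const MIN_INTERVAL_MS = 50;
let lastSendAt = 0;

function sendRateLimited(ws, chunk) {
  const now = Date.now();
  const wait = Math.max(0, lastSendAt + MIN_INTERVAL_MS - now);
  lastSendAt = now + wait;
  setTimeout(() => ws.send(chunk), wait);
}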