ASR WebSocket Best Practices

This guide covers best practices for implementing, optimizing, and troubleshooting the Waves ASR WebSocket API for production applications.

Audio Quality Optimization

Sample Rate Selection

Choose the optimal sample rate for your use case:

16 kHz (Recommended)

Best for: Speech recognition, real-time applications
  • Optimal balance of quality and performance
  • Lower bandwidth requirements
  • Faster processing times

8 kHz

Best for: Telephony applications
  • Standard for phone call quality
  • Minimal bandwidth usage
  • Good for voice-only content

44.1 kHz

Best for: High-fidelity audio
  • Music or broadcast content
  • Higher accuracy for complex audio
  • Increased bandwidth and processing time

48 kHz

Best for: Video/multimedia
  • Professional audio production
  • Maximum quality requirements
  • Higher resource consumption
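As a quick reference, the recommendations above can be captured in a small lookup helper. The use-case keys and the helper name here are illustrative, not part of the Waves API:

```javascript
// Map common use cases to the sample rates recommended above.
const SAMPLE_RATES = {
    speech: 16000,      // recommended default for real-time ASR
    telephony: 8000,    // standard phone-call quality
    broadcast: 44100,   // music / high-fidelity content
    multimedia: 48000,  // video and professional audio
};

function pickSampleRate(useCase) {
    // Fall back to the 16 kHz recommendation for unknown use cases.
    return SAMPLE_RATES[useCase] ?? 16000;
}
```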

Audio Format Guidelines

Recommended Configuration:
const optimalConfig = {
    audioEncoding: 'linear16',
    audioSampleRate: '16000',
    audioChannels: '1',        // Mono for efficiency
    addPunctuation: 'true',
};

Format-Specific Tips:
16-bit Linear PCM (Recommended)
  • Uncompressed, high quality
  • Predictable bandwidth usage
  • Wide compatibility
  • Best accuracy/performance ratio
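The "predictable bandwidth" point follows directly from the format: uncompressed linear PCM bandwidth is sampleRate × (bitDepth / 8) × channels. A sketch of the arithmetic (the function name is ours):

```javascript
// Bandwidth of uncompressed linear PCM is fully determined by the format.
function pcmBytesPerSecond(sampleRate, bitDepth, channels) {
    return sampleRate * (bitDepth / 8) * channels;
}

// The recommended 16 kHz / 16-bit / mono configuration:
pcmBytesPerSecond(16000, 16, 1); // 32000 bytes/s, about 31.25 KiB/s
```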

Audio Preprocessing

Implement client-side audio processing for better results:
// Audio preprocessing example
function preprocessAudio(audioBuffer) {
    const processedBuffer = new Float32Array(audioBuffer.length);
    
    // 1. Find the peak level with a loop rather than spreading the buffer
    //    into Math.max, which can overflow the call stack for large buffers
    let maxValue = 0;
    for (let i = 0; i < audioBuffer.length; i++) {
        maxValue = Math.max(maxValue, Math.abs(audioBuffer[i]));
    }
    const normalizationFactor = maxValue > 0 ? 0.8 / maxValue : 1;
    
    // 2. Apply normalization and a simple pre-emphasis (high-pass) filter
    //    to reduce low-frequency noise: y[i] = x[i] - 0.95 * x[i - 1]
    for (let i = 0; i < audioBuffer.length; i++) {
        let sample = audioBuffer[i] * normalizationFactor;
        
        if (i > 0) {
            sample -= 0.95 * audioBuffer[i - 1] * normalizationFactor;
        }
        
        processedBuffer[i] = sample;
    }
    
    return processedBuffer;
}

// Apply in audio processor
processor.onaudioprocess = (e) => {
    const inputData = e.inputBuffer.getChannelData(0);
    const processedData = preprocessAudio(inputData);
    
    // Convert to Int16 and send
    const int16Data = new Int16Array(processedData.length);
    for (let i = 0; i < processedData.length; i++) {
        int16Data[i] = Math.max(-32768, Math.min(32767, processedData[i] * 32768));
    }
    
    if (ws.readyState === WebSocket.OPEN) {
        ws.send(int16Data.buffer);
    }
};

Troubleshooting Guide

Common Issues and Solutions

Connection Failures

Symptoms:
  • WebSocket connection fails immediately
  • Connection timeout errors
  • “Failed to connect” messages
Solutions:
  1. Verify API Key: Ensure your API key is valid and properly formatted
  2. Check Subscription: Confirm you have an active Enterprise plan
  3. Network Issues: Test connectivity and check firewall settings
  4. Rate Limiting: Implement exponential backoff for reconnections
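The exponential backoff in step 4 might be sketched like this. The base delay, cap, and reconnect wiring are illustrative choices, not API requirements:

```javascript
// Delay doubles with each failed attempt, up to a cap.
function backoffDelayMs(attempt, baseMs = 1000, maxMs = 30000) {
    return Math.min(baseMs * 2 ** attempt, maxMs);
}

function connectWithBackoff(url, attempt = 0) {
    const ws = new WebSocket(url);
    ws.onopen = () => { attempt = 0; }; // reset after a successful connection
    ws.onclose = () => {
        const delay = backoffDelayMs(attempt);
        console.log(`Reconnecting in ${delay} ms`);
        setTimeout(() => connectWithBackoff(url, attempt + 1), delay);
    };
    return ws;
}
```

Adding random jitter to each delay also helps avoid many clients reconnecting in lockstep after an outage.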
No Transcription Output

Symptoms:
  • WebSocket connects but no transcription responses
  • Audio sent but no text returned
  • Silent failures
Solutions:
  1. Audio Format: Verify audio encoding matches parameters
  2. Audio Quality: Ensure audio contains actual speech
  3. Chunk Size: Check if audio chunks are appropriate size
  4. Parameters: Validate all connection parameters
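A lightweight pre-flight check for step 4 could look like the sketch below. The accepted values mirror this guide's examples; consult the API reference for the authoritative list:

```javascript
// Minimal sanity check of connection parameters before opening the socket.
function validateConfig(config) {
    const errors = [];
    if (config.audioEncoding !== 'linear16') {
        errors.push(`unsupported audioEncoding: ${config.audioEncoding}`);
    }
    if (!['8000', '16000', '44100', '48000'].includes(config.audioSampleRate)) {
        errors.push(`unexpected audioSampleRate: ${config.audioSampleRate}`);
    }
    if (!['1', '2'].includes(config.audioChannels)) {
        errors.push(`unexpected audioChannels: ${config.audioChannels}`);
    }
    return errors; // empty array means the config looks sane
}
```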
High Latency

Symptoms:
  • Slow transcription responses
  • Poor real-time performance
Solutions:
  1. Reduce Chunk Size: Use smaller audio chunks (0.5-1 second)
  2. Optimize Audio Processing: Minimize client-side processing
  3. Network Optimization: Use faster network connection
  4. Parameter Tuning: Adjust speechEndThreshold for faster responses
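To verify a chunk falls in the 0.5-1 second range suggested in step 1, its duration can be computed from its byte length. For 16-bit linear PCM each sample is 2 bytes per channel; the helper name is ours:

```javascript
// Duration represented by one 16-bit linear PCM chunk.
function chunkDurationMs(byteLength, sampleRate, channels = 1) {
    const samples = byteLength / (2 * channels);
    return (samples / sampleRate) * 1000;
}

// An 8192-sample buffer at 16 kHz mono is 16384 bytes:
chunkDurationMs(16384, 16000); // 512 ms, within the suggested range
```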