Waves ASR WebSocket API

The ASR (Automatic Speech Recognition) WebSocket API provides real-time speech-to-text transcription capabilities. This API accepts audio streams and returns transcribed text with support for multiple languages and configurable parameters.

Key Features

  • Real-time Transcription: Stream audio and receive instant transcription results
  • Multi-language Support: English and Hindi, including mixed English-Hindi speech
  • Multiple Audio Formats: Support for linear16, FLAC, μ-law, and Opus encoding
  • Configurable Parameters: Customize sample rate, punctuation, and more
  • Voice Activity Detection: Optional voice activity events for enhanced control
  • Sensitive Data Redaction: Built-in PCI, SSN, and number redaction capabilities

Endpoint

Production URL: wss://waves-api.smallest.ai/api/v1/asr

Authentication

For authentication details, see the Authentication Guide.

Subscription Requirements

ASR functionality is available only to Enterprise Monthly and Enterprise Yearly subscribers.

Quick Start

  1. Obtain API Key: Get your API key from the Waves platform
  2. Connect: Establish a WebSocket connection with authentication
  3. Configure: Set audio parameters via query string parameters
  4. Stream: Send audio data as binary messages
  5. Receive: Get real-time transcription results
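
Steps 2 to 4 above can be sketched as follows. This is a minimal sketch rather than an official client: only the endpoint URL comes from this page, while the query parameter names (language, encoding, sample_rate), their default values, and the 4096-byte chunk size are assumptions chosen for illustration. Check the API reference and the Authentication Guide for the actual parameter names and for how to pass the API key.

```python
from urllib.parse import urlencode

ASR_URL = "wss://waves-api.smallest.ai/api/v1/asr"  # production endpoint from this page


def build_asr_url(language="en", encoding="linear16", sample_rate=16000):
    """Attach audio settings to the endpoint as query string parameters.

    The parameter names used here are hypothetical placeholders.
    """
    params = {"language": language, "encoding": encoding, "sample_rate": sample_rate}
    return f"{ASR_URL}?{urlencode(params)}"


def iter_chunks(audio: bytes, chunk_size: int = 4096):
    """Yield fixed-size slices of raw audio, each meant to be sent as one binary message."""
    for start in range(0, len(audio), chunk_size):
        yield audio[start:start + chunk_size]


if __name__ == "__main__":
    print(build_asr_url())
    print([len(chunk) for chunk in iter_chunks(b"\x00" * 10000)])
```

A WebSocket client of your choice would open the URL returned by build_asr_url and send each chunk from iter_chunks as a binary frame.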

Supported Languages

Language   Code   Notes
English    en     High accuracy
Hindi      hi     Supports mixed English-Hindi

Audio Format Support

Format     Description         Use Case
linear16   16-bit linear PCM   High quality, recommended
flac       FLAC compressed     Compressed audio files
mulaw      μ-law encoded       Telephony applications
opus       Opus compressed     Browser-native formats
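
When streaming linear16, it helps to know how many bytes correspond to a given slice of audio. The sketch below relies only on the fact that linear16 uses 2 bytes per sample; the sample rates and chunk durations in the usage note are illustrative assumptions, not values required by the API.

```python
BYTES_PER_SAMPLE = 2  # linear16 stores each sample in 16 bits


def linear16_chunk_bytes(sample_rate_hz: int, chunk_ms: int, channels: int = 1) -> int:
    """Return the number of bytes covering chunk_ms milliseconds of linear16 audio."""
    return sample_rate_hz * channels * BYTES_PER_SAMPLE * chunk_ms // 1000
```

For example, linear16_chunk_bytes(16000, 100) gives 3200 bytes for 100 ms of mono audio at 16 kHz.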

Response Types

The API provides the following types of responses:
  • Final Results: Complete transcriptions for speech segments
  • End of Turn: Indicates completion of a speech turn
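
A client typically accumulates final results until an end-of-turn message arrives. The sketch below illustrates that pattern only: the message schema is not given on this page, so the JSON field names ("type", "text") and the values ("final", "end_of_turn") are hypothetical and must be replaced with the ones from the API reference.

```python
import json


def collect_turn(messages):
    """Join final transcripts until an end-of-turn message arrives.

    The field names and type values are hypothetical placeholders,
    not the documented response schema.
    """
    segments = []
    for raw in messages:
        msg = json.loads(raw)
        if msg.get("type") == "final":
            segments.append(msg.get("text", ""))
        elif msg.get("type") == "end_of_turn":
            break  # stop at the end of the current speech turn
    return " ".join(segments)
```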

Error Handling

The API provides detailed error messages for:
  • Invalid parameters
  • Authentication failures
  • Audio format mismatches
  • Connection timeouts
  • Subscription issues

Pricing

  • Default Rate: $0.025 per minute
  • Billing: Per second of audio processed
  • Custom Rates: Available for Enterprise plans
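
Because billing is per second at a per-minute rate, a cost estimate is the number of seconds times the rate, divided by 60. The sketch below assumes a whole number of processed seconds and the default rate; how fractional seconds are rounded is not stated on this page, so it is left out.

```python
from decimal import Decimal

DEFAULT_RATE_PER_MINUTE = Decimal("0.025")  # USD, default rate from this page


def estimate_cost(audio_seconds: int,
                  rate_per_minute: Decimal = DEFAULT_RATE_PER_MINUTE) -> Decimal:
    """Estimate the charge in USD for a whole number of processed seconds."""
    return Decimal(audio_seconds) * rate_per_minute / Decimal(60)
```

For example, 90 seconds of audio at the default rate comes to 90 × 0.025 / 60 = $0.0375.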