Waves ASR WebSocket API

The ASR (Automatic Speech Recognition) WebSocket API provides real-time speech-to-text transcription capabilities. This API accepts audio streams and returns transcribed text with support for multiple languages and configurable parameters.

Key Features

  • Real-time Transcription: Stream audio and receive instant transcription results
  • Multi-language Support: English, Hindi (including mixed English-Hindi speech), and the other languages listed under Supported Languages
  • Multiple Audio Formats: Support for linear16, FLAC, μ-law, and Opus encoding
  • Configurable Parameters: Customize sample rates, punctuation and more
  • Voice Activity Detection: Optional voice activity events for enhanced control
  • Sensitive Data Redaction: Built-in PCI, SSN, and number redaction capabilities

Endpoint

Production URL: wss://waves-api.smallest.ai/api/v1/asr

Authentication

For authentication details, see the Authentication Guide.

Subscription Requirements

ASR functionality is exclusively available to Enterprise Monthly or Enterprise Yearly subscribers.

Quick Start

  1. Obtain API Key: Get your API key from the Waves platform
  2. Connect: Establish an authenticated WebSocket connection
  3. Configure: Set audio parameters via query strings
  4. Stream: Send audio data as binary messages
  5. Receive: Get real-time transcription results
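
Steps 3 and 4 above can be sketched as follows. This is a minimal sketch, not an official client: the query parameter names (`language`, `encoding`, `sample_rate`) and the 4096-byte chunk size are assumptions made for illustration, so check the API reference for the actual names and the recommended frame size. Authentication and the WebSocket connection itself are left out because they depend on the Authentication Guide.

```python
from typing import Iterator
from urllib.parse import urlencode

ASR_URL = "wss://waves-api.smallest.ai/api/v1/asr"  # production endpoint from this page


def build_asr_url(**params: object) -> str:
    """Append audio parameters to the endpoint as a query string."""
    if not params:
        return ASR_URL
    return f"{ASR_URL}?{urlencode(params)}"


def iter_chunks(audio: bytes, chunk_size: int = 4096) -> Iterator[bytes]:
    """Split raw audio into fixed-size pieces, each sent as one binary message."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(audio), chunk_size):
        yield audio[start:start + chunk_size]


# Hypothetical parameter names; the real ones are defined in the API reference.
url = build_asr_url(language="en", encoding="linear16", sample_rate=16000)
chunks = list(iter_chunks(b"\x00" * 10000))
print(url)
print([len(c) for c in chunks])
```

Each chunk produced by `iter_chunks` would be passed to the WebSocket client's binary send call once the connection is open.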

Supported Languages

Language            Code    Notes
English             en      High accuracy
Hindi               hi      Supports mixed English-Hindi
Spanish             es      -
French              fr      -
German              de      -
Russian             ru      -
Portuguese          pt      -
Japanese            ja      -
Italian             it      -
Dutch               nl      -
Chinese Mandarin    zh      Available on request
Chinese Cantonese   zh-hk   Available on request
Turkish             tr      Available on request
Vietnamese          vi      Available on request
Thai                th      Available on request
Indonesian          id      Available on request
Ukrainian           uk      Available on request
Tamil               ta      Available on request
Marathi             mr      Available on request
Telugu              te      Available on request
Polish              pl      Available on request
Greek               el      Available on request
Hungarian           hu      Available on request
Romanian            ro      Available on request
Czech               cs      Available on request
Swedish             sv      Available on request
Bulgarian           bg      Available on request
Danish              da      Available on request
Finnish             fi      Available on request

Audio Format Support

Format     Description          Use Case
linear16   16-bit linear PCM    High quality, recommended
flac       FLAC compressed      Compressed audio files
mulaw      μ-law encoded        Telephony applications
opus       Opus compressed      Browser-native formats
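
To make the recommended linear16 format concrete, the sketch below converts floating-point samples into 16-bit signed PCM bytes. Little-endian byte order is an assumption here (it is the common convention for raw PCM); this page does not state the byte order the API expects. The helper name `float_to_linear16` is illustrative, not part of the API.

```python
import struct
from typing import Sequence


def float_to_linear16(samples: Sequence[float]) -> bytes:
    """Convert float samples in [-1.0, 1.0] to 16-bit signed PCM bytes."""
    ints = []
    for s in samples:
        clipped = max(-1.0, min(1.0, s))  # clamp out-of-range samples
        ints.append(int(round(clipped * 32767)))
    # "<" = little-endian (assumed), "h" = signed 16-bit integer
    return struct.pack(f"<{len(ints)}h", *ints)


pcm = float_to_linear16([0.0, 1.0, -1.0])
print(pcm.hex())
```

Each sample occupies two bytes, so one second of mono audio at a 16000 Hz sample rate is 32000 bytes.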

Response Types

The API provides the following types of responses:
  • Final Results: Complete transcriptions for speech segments
  • End of Turn: Indicates completion of a speech turn
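
One way a client might dispatch on the two response types listed above is sketched below. The message schema is not specified in this section, so the JSON field names (`type`, `text`) and the values `final` and `end_of_turn` are placeholders, not the API's actual payload format.

```python
import json
from typing import List, Optional


def handle_message(raw: str, transcript: List[str]) -> Optional[str]:
    """Collect final results; return the joined turn text on end of turn."""
    msg = json.loads(raw)
    kind = msg.get("type")  # hypothetical field name
    if kind == "final":  # hypothetical value for a Final Result
        transcript.append(msg.get("text", ""))
        return None
    if kind == "end_of_turn":  # hypothetical value for End of Turn
        turn = " ".join(transcript)
        transcript.clear()  # start a fresh buffer for the next turn
        return turn
    return None  # ignore message types this sketch does not know


buffer: List[str] = []
handle_message('{"type": "final", "text": "hello"}', buffer)
handle_message('{"type": "final", "text": "world"}', buffer)
print(handle_message('{"type": "end_of_turn"}', buffer))
```

Buffering final results until the end-of-turn signal lets the caller act on a whole utterance instead of on each segment.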

Error Handling

The API provides detailed error messages for:
  • Invalid parameters
  • Authentication failures
  • Audio format mismatches
  • Connection timeouts
  • Subscription issues

Pricing

  • Default Rate: $0.025 per minute
  • Billing: Per second of audio processed
  • Custom Rates: Available for Enterprise plans
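
As a worked example of the default rate: $0.025 per minute billed per second is 0.025 / 60, roughly $0.000417 per second, so 90 seconds of audio costs 90 × 0.025 / 60 = $0.0375 and one hour costs $1.50. The sketch below performs the same arithmetic. Rounding partial seconds up is an assumption; this page does not say how fractional seconds are billed.

```python
import math
from decimal import Decimal

RATE_PER_MINUTE = Decimal("0.025")  # default rate in USD, from this page


def estimate_cost(audio_seconds: float) -> Decimal:
    """Estimate the cost in USD of a given audio duration at the default rate."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    billed_seconds = math.ceil(audio_seconds)  # assumption: partial seconds round up
    return RATE_PER_MINUTE * billed_seconds / 60


print(estimate_cost(90))
print(estimate_cost(3600))
```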