Waves ASR WebSocket API

The ASR (Automatic Speech Recognition) WebSocket API provides real-time speech-to-text transcription. It accepts a stream of audio and returns transcribed text, with support for multiple languages and configurable parameters.

Key Features

Real-time Transcription: Stream audio and receive transcription results while the audio is still being sent
Multi-language Support: English, Hindi (including mixed English-Hindi speech), and the other languages listed under Supported Languages
Multiple Audio Formats: Support for linear16, FLAC, μ-law, and Opus encoding
Configurable Parameters: Customize sample rates, punctuation and more
Voice Activity Detection: Optional voice activity events for enhanced control
Sensitive Data Redaction: Built-in PCI, SSN, and number redaction capabilities

Endpoint

Production URL: wss://waves-api.smallest.ai/api/v1/asr

Authentication

For authentication details, see the Authentication Guide.

Subscription Requirements

ASR functionality is exclusively available to Enterprise Monthly or Enterprise Yearly subscribers.

Quick Start

Obtain API Key: Get your API key from the Waves platform
Connect: Open a WebSocket connection to the endpoint and authenticate
Configure: Set audio parameters as query-string parameters on the connection URL
Stream: Send audio data as binary messages
Receive: Get real-time transcription results
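The five steps above can be sketched as a small asyncio client. This is a minimal sketch under several assumptions: it uses the third-party `websockets` library (version 13 or later, for `websockets.asyncio.client`), the query parameter names `language`, `encoding` and `sample_rate` are hypothetical placeholders, the API key is assumed to travel as a Bearer token (check the Authentication Guide), and how the client signals the end of audio is not covered in this overview, so it is left out.

```python
import asyncio
from urllib.parse import urlencode

ASR_URL = "wss://waves-api.smallest.ai/api/v1/asr"


def build_url(language: str = "en", encoding: str = "linear16", sample_rate: int = 16000) -> str:
    """Append audio settings to the endpoint as query-string parameters.

    The parameter names are hypothetical placeholders; use the names
    given in the API reference.
    """
    params = {"language": language, "encoding": encoding, "sample_rate": sample_rate}
    return f"{ASR_URL}?{urlencode(params)}"


async def transcribe(api_key: str, audio_chunks, **settings) -> None:
    """Stream binary audio chunks and print every message the server sends back."""
    # Imported lazily so build_url() works without the dependency installed.
    from websockets.asyncio.client import connect  # websockets >= 13

    # Assumption: the API key is sent as a Bearer token in the Authorization header.
    headers = {"Authorization": f"Bearer {api_key}"}
    async with connect(build_url(**settings), additional_headers=headers) as ws:

        async def send_audio() -> None:
            for chunk in audio_chunks:
                await ws.send(chunk)      # bytes are sent as a binary WebSocket frame
                await asyncio.sleep(0.1)  # pace like a live stream of 100 ms chunks
            # End-of-audio signalling is not described here; add it per the API reference.

        sender = asyncio.create_task(send_audio())
        try:
            # Results arrive while audio is still being sent; this loop ends
            # when the server closes the connection.
            async for message in ws:
                print(message)
        finally:
            sender.cancel()
```

For example, `asyncio.run(transcribe("YOUR_API_KEY", chunks, language="hi"))`, where `chunks` is any iterable of 100 ms `linear16` byte strings. Sending and receiving run concurrently so transcription results are not held back until the whole file has been uploaded.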

Supported Languages

Language	Code	Notes
English	`en`	High accuracy
Hindi	`hi`	Supports mixed English-Hindi
Spanish	`es`	-
French	`fr`	-
German	`de`	-
Russian	`ru`	-
Portuguese	`pt`	-
Japanese	`ja`	-
Italian	`it`	-
Dutch	`nl`	-
Chinese Mandarin	`zh`	Available on request
Chinese Cantonese	`zh-hk`	Available on request
Turkish	`tr`	Available on request
Vietnamese	`vi`	Available on request
Thai	`th`	Available on request
Indonesian	`id`	Available on request
Ukrainian	`uk`	Available on request
Tamil	`ta`	Available on request
Marathi	`mr`	Available on request
Telugu	`te`	Available on request
Polish	`pl`	Available on request
Greek	`el`	Available on request
Hungarian	`hu`	Available on request
Romanian	`ro`	Available on request
Czech	`cs`	Available on request
Swedish	`sv`	Available on request
Bulgarian	`bg`	Available on request
Danish	`da`	Available on request
Finnish	`fi`	Available on request
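Because the "Available on request" languages have to be enabled before use, a client may want to check a code against the table before opening a connection. The helper below is a hypothetical sketch that simply mirrors the table; the function name and its return values are illustrative, not part of the API.

```python
# Codes copied from the Supported Languages table.
GENERALLY_AVAILABLE = {"en", "hi", "es", "fr", "de", "ru", "pt", "ja", "it", "nl"}
ON_REQUEST = {
    "zh", "zh-hk", "tr", "vi", "th", "id", "uk", "ta", "mr", "te",
    "pl", "el", "hu", "ro", "cs", "sv", "bg", "da", "fi",
}


def language_status(code: str) -> str:
    """Classify a language code as 'available' or 'on request' (hypothetical helper)."""
    normalized = code.strip().lower()  # accept variants such as "zh-HK"
    if normalized in GENERALLY_AVAILABLE:
        return "available"
    if normalized in ON_REQUEST:
        return "on request"
    raise ValueError(f"unsupported language code: {code!r}")
```

Failing fast on an unknown code gives a clearer error than a rejected connection.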

Audio Format Support

Format	Description	Use Case
linear16	16-bit linear PCM	High quality, recommended
flac	FLAC compressed	Compressed audio files
mulaw	μ-law encoded	Telephony applications
opus	Opus compressed	Browser-native formats
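For the recommended `linear16` format, the payload is raw 16-bit PCM, so a standard WAV file can be streamed by stripping its header and sending fixed-duration slices of frames. A minimal sketch using Python's standard `wave` module; the 100 ms chunk size is an illustrative choice, not a documented requirement.

```python
import wave
from typing import BinaryIO, Iterator


def linear16_chunks(wav_file: BinaryIO, chunk_ms: int = 100) -> Iterator[bytes]:
    """Yield raw 16-bit PCM data from a WAV file in chunks of `chunk_ms` milliseconds."""
    with wave.open(wav_file, "rb") as wav:
        # linear16 means uncompressed PCM with 2 bytes per sample.
        if wav.getsampwidth() != 2 or wav.getcomptype() != "NONE":
            raise ValueError("expected uncompressed 16-bit PCM for linear16")
        # A frame holds one sample per channel, so a chunk is
        # frames * channels * 2 bytes long (3200 bytes for 100 ms of 16 kHz mono).
        frames_per_chunk = wav.getframerate() * chunk_ms // 1000
        while True:
            data = wav.readframes(frames_per_chunk)
            if not data:
                break
            yield data
```

The sample rate read from the file (`wav.getframerate()`) is the value to pass as the sample-rate setting, so that the server interprets the bytes correctly.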

Response Types

The API returns the following response types:

Final Results: Complete transcriptions for speech segments
End of Turn: Indicates completion of a speech turn
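A common pattern is to buffer final results and emit one complete utterance whenever an end-of-turn message arrives. The JSON message schema is not documented in this overview, so the field names and values below (`type`, `final`, `end_of_turn`, `text`) are hypothetical placeholders to be replaced with the ones in the API reference.

```python
import json
from typing import List, Optional

# Hypothetical message fields: replace with the names from the API reference.
TYPE_KEY = "type"
FINAL_RESULT = "final"
END_OF_TURN = "end_of_turn"
TEXT_KEY = "text"


class TurnAssembler:
    """Collect final-result text and return one string per completed speech turn."""

    def __init__(self) -> None:
        self._segments: List[str] = []

    def handle(self, raw_message: str) -> Optional[str]:
        message = json.loads(raw_message)
        kind = message.get(TYPE_KEY)
        if kind == FINAL_RESULT:
            self._segments.append(message.get(TEXT_KEY, "").strip())
        elif kind == END_OF_TURN:
            turn = " ".join(s for s in self._segments if s)
            self._segments.clear()
            return turn
        # Any other message kind (e.g. voice activity events) is ignored here.
        return None
```

Keeping the turn logic separate from the socket loop makes it easy to test with recorded messages.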

Error Handling

The API provides detailed error messages for:

Invalid parameters
Authentication failures
Audio format mismatches
Connection timeouts
Subscription issues
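Of these, only connection timeouts are usually transient; invalid parameters, authentication failures and subscription issues will fail the same way on every reconnect. A minimal sketch of retrying with exponential backoff, assuming failures surface as Python's built-in `TimeoutError` or `ConnectionError` (how a particular WebSocket library reports them may differ). The function names and delay values are illustrative.

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")

# Assumption: only transient network failures are retried; other errors propagate.
RETRYABLE = (TimeoutError, asyncio.TimeoutError, ConnectionError)


def backoff_delays(retries: int, base: float = 0.5, cap: float = 8.0) -> List[float]:
    """Exponential backoff schedule: base, 2*base, 4*base, ... capped at `cap` seconds."""
    return [min(cap, base * 2 ** i) for i in range(retries)]


async def run_with_retries(
    session: Callable[[], Awaitable[T]],
    attempts: int = 5,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Run `session` (one connect-and-stream pass), retrying transient failures."""
    for delay in backoff_delays(attempts - 1):
        try:
            return await session()
        except RETRYABLE:
            await sleep(delay)
    return await session()  # final attempt: any error propagates to the caller
```

The `sleep` parameter exists so the schedule can be tested without real waiting.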

Pricing

Default Rate: $0.025 per minute
Billing: Per second of audio processed
Custom Rates: Available for Enterprise plans
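Since the rate is quoted per minute but billing is per second, the charge for a stream is seconds × 0.025 / 60 USD. A minimal sketch of that arithmetic, using `Decimal` to avoid floating-point rounding; rounding a partial second up to the next whole second is an assumption, not something this page states.

```python
import math
from decimal import Decimal

RATE_PER_MINUTE = Decimal("0.025")  # default rate in USD


def estimate_cost(audio_seconds: float) -> Decimal:
    """Estimate the charge for a stream at the default rate, billed per second."""
    # Assumption: a partial second is rounded up to the next whole second.
    billable_seconds = math.ceil(audio_seconds)
    return RATE_PER_MINUTE * billable_seconds / 60
```

At the default rate, a 90-second recording comes to $0.0375 and an hour of audio to $1.50.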
