Waves ASR WebSocket API
The ASR (Automatic Speech Recognition) WebSocket API provides real-time speech-to-text transcription capabilities. This API accepts audio streams and returns transcribed text with support for multiple languages and configurable parameters.
Key Features
- Real-time Transcription: Stream audio and receive instant transcription results
- Multi-language Support: English, Hindi (including mixed English-Hindi speech), and the other languages listed under Supported Languages
- Multiple Audio Formats: Support for linear16, FLAC, μ-law, and Opus encoding
- Configurable Parameters: Customize sample rates, punctuation and more
- Voice Activity Detection: Optional voice activity events for enhanced control
- Sensitive Data Redaction: Built-in PCI, SSN, and number redaction capabilities
Endpoint
Production URL: wss://waves-api.smallest.ai/api/v1/asr
Authentication
For authentication details, see the Authentication Guide.
Subscription Requirements
ASR functionality is exclusively available to Enterprise Monthly or Enterprise Yearly subscribers.
Quick Start
- Obtain API Key: Get your API key from the Waves platform
- Connect: Establish a WebSocket connection with authentication
- Configure: Set audio parameters via query strings
- Stream: Send audio data as binary messages
- Receive: Get real-time transcription results
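The Configure and Stream steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the documented interface: the query parameter names `language`, `encoding`, and `sample_rate` are hypothetical (this page only says that audio parameters are set via query strings), the 4096-byte chunk size is an arbitrary choice, and authentication is left out because it is covered in the Authentication Guide.

```python
from urllib.parse import urlencode

ASR_URL = "wss://waves-api.smallest.ai/api/v1/asr"

# Encodings listed under Audio Format Support on this page.
SUPPORTED_ENCODINGS = {"linear16", "flac", "mulaw", "opus"}


def build_asr_url(language: str, encoding: str, sample_rate: int) -> str:
    """Return the endpoint URL with audio settings as query parameters.

    The parameter names are assumptions made for illustration; check the
    API reference for the real ones.
    """
    if encoding not in SUPPORTED_ENCODINGS:
        raise ValueError(f"unsupported encoding: {encoding!r}")
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    query = urlencode(
        {"language": language, "encoding": encoding, "sample_rate": sample_rate}
    )
    return f"{ASR_URL}?{query}"


def iter_chunks(audio: bytes, chunk_size: int = 4096):
    """Yield successive slices of audio, one per binary WebSocket message."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(audio), chunk_size):
        yield audio[start:start + chunk_size]
```

A WebSocket client of your choice would open the URL returned by `build_asr_url` and send each chunk from `iter_chunks` as a binary message, reading transcription results as they arrive.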
Supported Languages
Language | Code | Notes |
---|---|---|
English | en | High accuracy |
Hindi | hi | Supports mixed English-Hindi |
Spanish | es | - |
French | fr | - |
German | de | - |
Russian | ru | - |
Portuguese | pt | - |
Japanese | ja | - |
Italian | it | - |
Dutch | nl | - |
Chinese Mandarin | zh | Available on request |
Chinese Cantonese | zh-hk | Available on request |
Turkish | tr | Available on request |
Vietnamese | vi | Available on request |
Thai | th | Available on request |
Indonesian | id | Available on request |
Ukrainian | uk | Available on request |
Tamil | ta | Available on request |
Marathi | mr | Available on request |
Telugu | te | Available on request |
Polish | pl | Available on request |
Greek | el | Available on request |
Hungarian | hu | Available on request |
Romanian | ro | Available on request |
Czech | cs | Available on request |
Swedish | sv | Available on request |
Bulgarian | bg | Available on request |
Danish | da | Available on request |
Finnish | fi | Available on request |
Audio Format Support
Format | Description | Use Case |
---|---|---|
linear16 | 16-bit linear PCM | High quality, recommended |
flac | FLAC compressed | Compressed audio files |
mulaw | μ-law encoded | Telephony applications |
opus | Opus compressed | Browser-native formats |
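As an illustration of the recommended linear16 format, the sketch below converts floating-point samples to 16-bit PCM bytes. It assumes little-endian byte order and input samples in the range -1.0 to 1.0; neither detail is stated on this page, so treat both as assumptions. The function names are hypothetical.

```python
import struct


def floats_to_linear16(samples) -> bytes:
    """Convert float samples to 16-bit signed PCM (little-endian assumed)."""
    ints = []
    for sample in samples:
        # Clip out-of-range values so they cannot overflow 16 bits.
        clipped = max(-1.0, min(1.0, sample))
        ints.append(int(round(clipped * 32767)))
    return struct.pack(f"<{len(ints)}h", *ints)


def linear16_bytes_per_second(sample_rate: int, channels: int = 1) -> int:
    """Each linear16 sample occupies 2 bytes per channel."""
    return sample_rate * channels * 2
```

At a 16000 Hz sample rate, mono linear16 audio amounts to 32000 bytes per second, which helps when choosing how much audio to place in each binary message.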
Response Types
The API provides the following types of responses:
- Final Results: Complete transcriptions for speech segments
- End of Turn: Indicates completion of a speech turn
Error Handling
The API provides detailed error messages for:
- Invalid parameters
- Authentication failures
- Audio format mismatches
- Connection timeouts
- Subscription issues
Pricing
- Default Rate: $0.025 per minute
- Billing: Per second of audio processed
- Custom Rates: Available for Enterprise plans
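Because the rate is quoted per minute but billing is per second, a quick estimate divides the per-minute rate by 60. The sketch below assumes whole seconds of audio and no minimum charge; how partial seconds are rounded is not stated here, and `estimate_cost` is a hypothetical helper, not part of the API.

```python
from decimal import Decimal

DEFAULT_RATE_PER_MINUTE = Decimal("0.025")  # USD, default rate from this page


def estimate_cost(
    audio_seconds: int,
    rate_per_minute: Decimal = DEFAULT_RATE_PER_MINUTE,
) -> Decimal:
    """Estimate the charge in USD for a number of seconds of processed audio."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    # Per-second billing: prorate the per-minute rate over 60 seconds.
    return rate_per_minute * audio_seconds / 60
```

For example, 90 seconds of audio at the default rate comes to $0.0375, and one hour comes to $1.50.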