ASR WebSocket API Reference
Complete reference documentation for the Waves ASR WebSocket API parameters, responses, and error codes.
Endpoint
Production URL: wss://waves-api.smallest.ai/api/v1/asr
Connection Parameters
All parameters are passed as query strings in the WebSocket URL. Parameters are case-sensitive; unknown or invalid parameters result in an error response and connection closure.
Core Parameters
Language of the audio input
Currently Supported Languages:
IN Region:
- en: English
- hi: Hindi

- en: English
- hi: Hindi
- es: Spanish
- fr: French
- de: German
- ru: Russian
- pt: Portuguese
- ja: Japanese
- it: Italian
- nl: Dutch
- zh: Chinese Mandarin
- zh-hk: Chinese Cantonese
- tr: Turkish
- vi: Vietnamese
- th: Thai
- id: Indonesian
- uk: Ukrainian
- ta: Tamil
- mr: Marathi
- te: Telugu
- pl: Polish
- el: Greek
- hu: Hungarian
- ro: Romanian
- cs: Czech
- sv: Swedish
- bg: Bulgarian
- da: Danish
- fi: Finnish
Audio encoding format
Valid Values:
- linear16: 16-bit linear PCM
- flac: FLAC compressed
- mulaw: μ-law encoded
- opus: Opus compressed
Sample rate in Hz
Range: 8000-48000 (integer)
Recommended: 16000 for optimal performance
Number of audio channels
Range: 1+ (integer)
Recommended: 1 (mono) for efficiency
Optional Parameters
Add punctuation to transcripts
Values: true, false
Duration in milliseconds to determine end of speech
Range: 10-60000 (integer)
Default: 300ms
Emit voice activity detection events
Values: true, false
Redact sensitive data types
Values: Comma-separated array
- "pci": Payment card information
- "ssn": Social security numbers
- "numbers": Generic number redaction
Speech endpointing behavior
Values:
- "true": Enable automatic endpointing
- "false": Disable endpointing
- 10-60000: Custom threshold in milliseconds
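Because invalid values close the connection, it can help to check settings on the client before connecting. The following is a minimal sketch based only on the ranges and enum values listed above; the function and argument names (`validate_settings`, `end_of_speech_ms`, and so on) are local, hypothetical names and are not the API's query parameter names.

```python
# Allowed values copied from the parameter lists in this reference.
VALID_ENCODINGS = {"linear16", "flac", "mulaw", "opus"}
VALID_REDACTIONS = {"pci", "ssn", "numbers"}


def validate_settings(encoding, sample_rate, channels,
                      end_of_speech_ms=300, redact=()):
    """Return a list of problems; an empty list means the values look valid."""
    problems = []
    if encoding not in VALID_ENCODINGS:
        problems.append(f"unsupported encoding: {encoding!r}")
    if not isinstance(sample_rate, int) or not 8000 <= sample_rate <= 48000:
        problems.append("sample rate must be an integer in 8000-48000")
    if not isinstance(channels, int) or channels < 1:
        problems.append("channels must be an integer >= 1")
    if not isinstance(end_of_speech_ms, int) or not 10 <= end_of_speech_ms <= 60000:
        problems.append("end-of-speech duration must be an integer in 10-60000")
    unknown = sorted(set(redact) - VALID_REDACTIONS)
    if unknown:
        problems.append(f"unsupported redaction types: {unknown}")
    return problems


print(validate_settings("linear16", 16000, 1))
print(validate_settings("mp3", 4000, 0))
```

Returning a list rather than raising on the first failure lets a caller report every problem at once.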
URL Construction Examples
Basic Connection
Advanced Configuration
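As a rough illustration of assembling a connection URL in Python, the sketch below appends query parameters to the production endpoint. It uses only parameter names that appear elsewhere in this reference (`audioSampleRate`, `audioChannels`, `emitVoiceActivity`); the helper name `build_asr_url` is hypothetical, and authentication is not covered here.

```python
from urllib.parse import urlencode

BASE_URL = "wss://waves-api.smallest.ai/api/v1/asr"


def build_asr_url(params):
    """Append query parameters to the endpoint (hypothetical helper)."""
    encoded = {}
    for key, value in params.items():
        # Render Python booleans as lowercase true/false, as in this reference.
        if isinstance(value, bool):
            value = "true" if value else "false"
        encoded[key] = value
    query = urlencode(encoded)
    return f"{BASE_URL}?{query}" if query else BASE_URL


url = build_asr_url({
    "audioSampleRate": 16000,
    "audioChannels": 1,
    "emitVoiceActivity": True,
})
print(url)
```

Since parameters are case-sensitive, keeping the names in one dictionary makes typos easier to spot than hand-written query strings.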
Audio Data Format
Sending Audio
Send audio data as binary messages that match your specified encoding, sample rate, and channels.
Format Specifications
16-bit Linear PCM
- Bit depth: 16-bit signed integers
- Byte order: Little-endian
- Sample rate: Match audioSampleRate parameter
- Channels: Match audioChannels parameter
- Recommended chunk size: 32,000 bytes (1 second at 16kHz mono)
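One way to prepare binary messages is to slice a raw PCM buffer into fixed-size chunks. This is a minimal sketch assuming headerless 16-bit little-endian PCM already held in memory; `iter_chunks` is a hypothetical helper, and the actual sending (for example through a WebSocket client library) is left out.

```python
def iter_chunks(pcm, chunk_size=32000, channels=1):
    """Yield consecutive slices of raw 16-bit PCM, each at most chunk_size bytes."""
    frame_bytes = 2 * channels  # 2 bytes per 16-bit sample, per channel
    if chunk_size <= 0 or chunk_size % frame_bytes:
        raise ValueError("chunk_size must be a positive multiple of the frame size")
    for start in range(0, len(pcm), chunk_size):
        yield pcm[start:start + chunk_size]


# 70,000 bytes of placeholder audio: two full chunks and one shorter final chunk.
sizes = [len(chunk) for chunk in iter_chunks(bytes(70000))]
print(sizes)
```

Keeping the chunk size a multiple of the frame size avoids splitting a sample across two messages.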
Recommended Chunk Sizes
For optimal real-time performance:
Sample Rate | Bit Depth | Channels | Duration | Chunk Size |
---|---|---|---|---|
16kHz | 16-bit | 1 (mono) | 1 second | 32,000 bytes |
16kHz | 16-bit | 1 (mono) | 2 seconds | 64,000 bytes |
8kHz | 16-bit | 1 (mono) | 1 second | 16,000 bytes |
44.1kHz | 16-bit | 1 (mono) | 1 second | 88,200 bytes |
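The chunk sizes in the table follow from sample rate × bytes per sample × channels × duration. A short sketch of that arithmetic, with a hypothetical helper name `chunk_size_bytes`:

```python
def chunk_size_bytes(sample_rate_hz, seconds=1.0, bit_depth=16, channels=1):
    """Bytes of uncompressed PCM needed to hold `seconds` of audio."""
    frames = int(sample_rate_hz * seconds)
    return frames * (bit_depth // 8) * channels


print(chunk_size_bytes(16000))       # 16kHz mono, 1 second
print(chunk_size_bytes(16000, 2))    # 16kHz mono, 2 seconds
print(chunk_size_bytes(8000))        # 8kHz mono, 1 second
print(chunk_size_bytes(44100))       # 44.1kHz mono, 1 second
```

This applies to uncompressed linear PCM only; compressed encodings such as FLAC or Opus do not have a fixed byte count per second.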
Response Format
The API returns JSON responses with the following structure:
Standard Response
Response Fields
The transcribed text content
Indicates if this marks the end of a speech turn
- true: End of speech segment detected
- false: More speech expected
Response Flow Examples
1. End of Turn Detection
Voice Activity Events
When emitVoiceActivity=true, additional events may be sent.
Error Responses
Errors are sent as JSON before closing the connection:
Error Format
Common Error Types
Parameter Validation Errors
Invalid Parameter Range
Solution: Check parameter constraints and adjust values
Invalid Enum Value
Solution: Use only supported enum values
Missing Required Parameter
Solution: Include all required parameters
Authentication Errors
Unauthorized
Insufficient Subscription
Solution: Upgrade to Enterprise Monthly or Enterprise Yearly
Insufficient Credits
Solution: Add credits to account or upgrade plan
Connection Errors
Socket Timeout
Solution: Ensure continuous audio streaming or implement keep-alive
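One possible way to keep audio flowing during pauses is to send short chunks of silence. The sketch below only generates silent linear16 audio (all-zero samples); whether the service accepts silence as a keep-alive is an assumption that this reference does not confirm, and `silence_chunk` is a hypothetical helper.

```python
def silence_chunk(sample_rate_hz=16000, channels=1, duration_ms=100):
    """Return duration_ms of silent 16-bit PCM (all-zero samples)."""
    frames = sample_rate_hz * duration_ms // 1000
    # 2 bytes per 16-bit sample, per channel; bytes(n) yields n zero bytes.
    return bytes(frames * 2 * channels)


chunk = silence_chunk()
print(len(chunk))
```

This applies to linear16 only; silence in mulaw or a compressed encoding is not a run of zero bytes.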
Rate Limit Exceeded
Solution: Implement connection pooling and respect rate limits
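A common general-purpose way to respect rate limits when reconnecting is capped exponential backoff. This reference does not specify retry guidance, so the delays below are illustrative assumptions, and `backoff_delays` is a hypothetical helper.

```python
def backoff_delays(attempts, base_seconds=1.0, cap_seconds=30.0):
    """Delay before each retry: doubles every attempt, never above the cap."""
    return [min(cap_seconds, base_seconds * 2 ** i) for i in range(attempts)]


delays = backoff_delays(5, base_seconds=1.0, cap_seconds=8.0)
print(delays)
```

In practice a small random jitter is often added to each delay so that many clients do not retry at the same moment.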
Rate Limits & Quotas
Limit Type | Enterprise Plan | Notes |
---|---|---|
Connection Timeout | 30 seconds | If no audio received |
Concurrent Connections | Plan-dependent | Contact support for limits |
Audio Duration | Plan-dependent | Based on subscription tier |
API Rate | Plan-dependent | Requests per minute limit |