ASR WebSocket API Reference

Complete reference documentation for the Waves ASR WebSocket API parameters, responses, and error codes.

Endpoint

Production URL: wss://waves-api.smallest.ai/api/v1/asr

Connection Parameters

All parameters are passed in the query string of the WebSocket URL. Parameters are case-sensitive; unknown or invalid parameters result in an error response, after which the connection is closed.

Core Parameters

audioLanguage

enum

default:"en"

required

Language of the audio input.

Currently Supported Languages:

IN Region:

en - English
hi - Hindi

US Region:

en - English
hi - Hindi
es - Spanish
fr - French
de - German
ru - Russian
pt - Portuguese
ja - Japanese
it - Italian
nl - Dutch

Available on Request (Both Regions):

zh - Chinese Mandarin
zh-hk - Chinese Cantonese
tr - Turkish
vi - Vietnamese
th - Thai
id - Indonesian
uk - Ukrainian
ta - Tamil
mr - Marathi
te - Telugu
pl - Polish
el - Greek
hu - Hungarian
ro - Romanian
cs - Czech
sv - Swedish
bg - Bulgarian
da - Danish
fi - Finnish

audioEncoding

enum

default:"linear16"

required

Audio encoding format.

Valid Values:

linear16 - 16-bit linear PCM
flac - FLAC compressed
mulaw - μ-law encoded
opus - Opus compressed

audioSampleRate

number

default:"16000"

required

Sample rate in Hz.

Range: 8000-48000 (integer)
Recommended: 16000 for optimal performance

audioChannels

number

default:"1"

required

Number of audio channels.

Range: 1+ (integer)
Recommended: 1 (mono) for efficiency

Optional Parameters

addPunctuation

boolean

Add punctuation to transcripts.

Values: true, false

speechEndThreshold

number

Duration in milliseconds to determine end of speech.

Range: 10-60000 (integer)
Default: 300 ms

emitVoiceActivity

boolean

Emit voice activity detection events.

Values: true, false

redactSensitiveData

array

Redact sensitive data types.

Values: Comma-separated array

"pci" - Payment card information
"ssn" - Social security numbers
"numbers" - Generic number redaction

speechEndpointing

string|number

Speech endpointing behavior.

Values:

"true" - Enable automatic endpointing
"false" - Disable endpointing
10-60000 - Custom threshold in milliseconds
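
Because an invalid parameter closes the connection, it can help to check values on the client before connecting. The following is a minimal sketch that encodes only the ranges and enum values listed above. `validate_params` and its helpers are hypothetical names, not part of any Waves SDK, and `audioLanguage` is left unchecked because the accepted set depends on the region.

```python
VALID_ENCODINGS = {"linear16", "flac", "mulaw", "opus"}
VALID_REDACTIONS = {"pci", "ssn", "numbers"}


def _is_int(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def _in_range(value, low, high=None) -> bool:
    # True if value is an integer >= low and, when high is given, <= high.
    return _is_int(value) and value >= low and (high is None or value <= high)


def validate_params(params: dict) -> list:
    """Return human-readable problems found in `params`; empty list if none."""
    problems = []

    encoding = params.get("audioEncoding")
    if encoding is not None and encoding not in VALID_ENCODINGS:
        problems.append(f"audioEncoding must be one of {sorted(VALID_ENCODINGS)}")

    rate = params.get("audioSampleRate")
    if rate is not None and not _in_range(rate, 8000, 48000):
        problems.append("audioSampleRate must be an integer in 8000-48000")

    channels = params.get("audioChannels")
    if channels is not None and not _in_range(channels, 1):
        problems.append("audioChannels must be an integer of at least 1")

    threshold = params.get("speechEndThreshold")
    if threshold is not None and not _in_range(threshold, 10, 60000):
        problems.append("speechEndThreshold must be an integer in 10-60000")

    # speechEndpointing accepts "true", "false", or a millisecond threshold.
    endpointing = params.get("speechEndpointing")
    if (endpointing is not None and endpointing not in ("true", "false")
            and not _in_range(endpointing, 10, 60000)):
        problems.append('speechEndpointing must be "true", "false", or 10-60000')

    unknown = set(params.get("redactSensitiveData", [])) - VALID_REDACTIONS
    if unknown:
        problems.append(f"redactSensitiveData has unsupported values: {sorted(unknown)}")

    return problems
```

An empty list means the supplied values fall within the documented constraints; the server remains the final authority on what it accepts.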

URL Construction Examples

Basic Connection

wss://waves-api.smallest.ai/api/v1/asr?api_key=YOUR_API_KEY&audioEncoding=linear16&audioSampleRate=16000&audioChannels=1

Advanced Configuration

wss://waves-api.smallest.ai/api/v1/asr?api_key=YOUR_API_KEY&audioLanguage=en&audioEncoding=linear16&audioSampleRate=16000&audioChannels=1&addPunctuation=true&speechEndThreshold=500&redactSensitiveData=pci,ssn
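
URLs like the two above can also be assembled programmatically. The sketch below uses only the Python standard library. `build_asr_url` is a hypothetical helper name, and the conversions it applies (booleans to `true`/`false`, lists joined with commas) are assumptions inferred from the example URLs rather than documented behavior.

```python
from urllib.parse import urlencode

ASR_ENDPOINT = "wss://waves-api.smallest.ai/api/v1/asr"


def build_asr_url(api_key: str, **params) -> str:
    """Build a connection URL from keyword arguments named after the parameters."""
    query = {"api_key": api_key}
    for name, value in params.items():
        if isinstance(value, bool):
            # Python's str(True) is "True"; the examples use lowercase.
            value = "true" if value else "false"
        elif isinstance(value, (list, tuple)):
            # e.g. redactSensitiveData=["pci", "ssn"] becomes "pci,ssn"
            value = ",".join(value)
        query[name] = value
    # safe="," keeps commas literal, matching the Advanced Configuration example.
    return f"{ASR_ENDPOINT}?{urlencode(query, safe=',')}"


url = build_asr_url(
    "YOUR_API_KEY",
    audioEncoding="linear16",
    audioSampleRate=16000,
    audioChannels=1,
)
```

Parameter names are passed through unchanged, so their case-sensitive spelling must be exact in the keyword arguments.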

Audio Data Format

Sending Audio

Send audio data as binary messages that match your specified encoding, sample rate, and channels.
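
As an illustration, the loop below splits a buffer of encoded audio into fixed-size binary messages. It is a sketch under stated assumptions: `stream_audio` is a hypothetical helper, `ws` stands for any object with an async `send(bytes)` method (for example a connection from the third-party `websockets` package, which sends `bytes` as binary frames), and the optional pacing delay is an illustrative choice, not a documented requirement.

```python
import asyncio


async def stream_audio(ws, audio: bytes, chunk_size: int = 32000,
                       pace_seconds: float = 0.0) -> int:
    """Send `audio` as binary messages of at most `chunk_size` bytes.

    Returns the number of messages sent.
    """
    sent = 0
    for start in range(0, len(audio), chunk_size):
        # Slicing bytes yields bytes, so each message goes out as binary.
        await ws.send(audio[start:start + chunk_size])
        sent += 1
        if pace_seconds:
            # Optional delay to approximate real-time delivery.
            await asyncio.sleep(pace_seconds)
    return sent
```

For linear16 audio, an even `chunk_size` that is a multiple of `2 * audioChannels` keeps every message aligned to whole sample frames.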

Format Specifications

linear16

16-bit Linear PCM

Bit depth: 16-bit signed integers
Byte order: Little-endian
Sample rate: Match audioSampleRate parameter
Channels: Match audioChannels parameter
Recommended chunk size: 32,000 bytes (1 second at 16kHz mono)

Recommended Chunk Sizes

For optimal real-time performance:

Sample Rate	Bit Depth	Channels	Duration	Chunk Size
16kHz	16-bit	1 (mono)	1 second	32,000 bytes
16kHz	16-bit	1 (mono)	2 seconds	64,000 bytes
8kHz	16-bit	1 (mono)	1 second	16,000 bytes
44.1kHz	16-bit	1 (mono)	1 second	88,200 bytes
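
The chunk sizes in the table follow from sample rate × bytes per sample × channels × duration. A small sketch of that arithmetic for uncompressed PCM, where `chunk_size_bytes` is a hypothetical helper name:

```python
def chunk_size_bytes(sample_rate_hz: int, channels: int = 1,
                     seconds: float = 1.0, bit_depth: int = 16) -> int:
    """Size in bytes of `seconds` of uncompressed PCM audio."""
    bytes_per_sample = bit_depth // 8  # 16-bit samples occupy 2 bytes
    return int(sample_rate_hz * bytes_per_sample * channels * seconds)


one_second_16k = chunk_size_bytes(16000)              # 32000
two_seconds_16k = chunk_size_bytes(16000, seconds=2)  # 64000
one_second_8k = chunk_size_bytes(8000)                # 16000
one_second_44k = chunk_size_bytes(44100)              # 88200
```

The formula applies to linear16 only; compressed encodings such as flac and opus do not have a fixed byte count per second.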

Response Format

The API returns JSON responses with the following structure:

Standard Response

{
    "text": "transcribed text here",
    "isEndOfTurn": false
}

Response Fields

text

string

The transcribed text content

isEndOfTurn

boolean

Indicates if this marks the end of a speech turn

true: End of speech segment detected
false: More speech expected
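
A client might map these two fields onto a small typed structure. The sketch below assumes each transcript arrives as a JSON text message with exactly the fields described above; `Transcript` and `parse_transcript` are hypothetical names.

```python
import json
from dataclasses import dataclass


@dataclass
class Transcript:
    text: str
    is_end_of_turn: bool


def parse_transcript(raw: str) -> Transcript:
    """Decode one JSON response message into a Transcript."""
    payload = json.loads(raw)
    return Transcript(
        text=payload["text"],
        is_end_of_turn=bool(payload["isEndOfTurn"]),
    )


result = parse_transcript('{"text": "transcribed text here", "isEndOfTurn": false}')
```

A `KeyError` from this function indicates a message that is not a standard transcript response, which the caller should handle separately.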

Response Flow Examples

1. End of Turn Detection

// Final result with end of turn
{ "text": "Hello, this is the end.", "isEndOfTurn": true }

Voice Activity Events

When emitVoiceActivity=true, additional events may be sent:

{
    "event": "voice_activity",
    "speaking": true,
    "timestamp": 1234567890
}
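
With voice activity events enabled, a receive loop has to tell message kinds apart. One possible approach, sketched here, is to inspect which keys are present. `classify_message` is a hypothetical helper, and keying on `event`, `text`, and `message` is an assumption based on the sample payloads in this reference, not a documented discriminator.

```python
import json


def classify_message(raw: str) -> str:
    """Return "voice_activity", "transcript", "error", or "unknown"."""
    payload = json.loads(raw)
    if payload.get("event") == "voice_activity":
        return "voice_activity"
    if "text" in payload and "isEndOfTurn" in payload:
        return "transcript"
    if "message" in payload:
        # Matches the error shape described under Error Responses.
        return "error"
    return "unknown"
```

Returning `"unknown"` instead of raising lets a client log and skip message types it does not recognize.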

Error Responses

Errors are sent as JSON messages before the connection is closed:

Error Format

{
    "message": "error message",
    "error": "detailed error info"
}

Common Error Types

Parameter Validation Errors

Invalid Parameter Range

{
    "message": "Invalid input data",
    "error": "audioSampleRate must be at least 8000"
}

Cause: Parameter value outside allowed range
Solution: Check parameter constraints and adjust values

Invalid Enum Value

{
    "message": "Invalid input data", 
    "error": "audioLanguage must be one of the following values: en, hi"
}

Cause: Invalid enum value provided
Solution: Use only supported enum values

Missing Required Parameter

{
    "message": "Invalid input data",
    "error": "audioEncoding is required"
}

Cause: Required parameter not provided
Solution: Include all required parameters

Authentication Errors

Unauthorized

{
    "message": "Unauthorized",
    "error": "Invalid or missing API key"
}

Cause: Invalid, missing, or malformed API key
Solution: Verify API key format and validity

Insufficient Subscription

{
    "message": "No subscription",
    "error": "ASR requires Enterprise plan"
}

Cause: Account lacks required Enterprise subscription
Solution: Upgrade to Enterprise Monthly or Enterprise Yearly

Insufficient Credits

{
    "message": "Insufficient credits",
    "error": "Account balance too low"
}

Cause: Account balance insufficient for operation
Solution: Add credits to account or upgrade plan

Connection Errors

Socket Timeout

{
    "message": "socket timeout"
}

Cause: No audio received for 30 seconds
Solution: Ensure continuous audio streaming or implement keep-alive

Rate Limit Exceeded

{
    "message": "Rate limit exceeded",
    "error": "Too many concurrent connections"
}

Cause: Exceeded concurrent connection limits
Solution: Implement connection pooling and respect rate limits
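
A client can group the errors above by their `message` field to decide how to react. The following sketch assumes the `message` strings are exactly those shown in the samples, and the category names mirror the headings of this section. `ERROR_CATEGORIES` and `describe_error` are hypothetical names.

```python
import json

# Keys are the "message" values from the samples above;
# values are the section each error is listed under.
ERROR_CATEGORIES = {
    "Invalid input data": "validation",
    "Unauthorized": "authentication",
    "No subscription": "authentication",
    "Insufficient credits": "authentication",
    "socket timeout": "connection",
    "Rate limit exceeded": "connection",
}


def describe_error(raw: str) -> tuple:
    """Return (category, summary) for one JSON error message."""
    payload = json.loads(raw)
    message = payload.get("message", "")
    # The socket timeout sample carries no "error" field.
    detail = payload.get("error", "")
    category = ERROR_CATEGORIES.get(message, "unknown")
    summary = f"{message}: {detail}" if detail else message
    return category, summary
```

Validation and authentication errors generally call for a configuration or account fix, so retrying the same request unchanged is unlikely to help.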

Rate Limits & Quotas

Limit Type	Enterprise Plan	Notes
Connection Timeout	30 seconds	If no audio received
Concurrent Connections	Plan-dependent	Contact support for limits
Audio Duration	Plan-dependent	Based on subscription tier
API Rate	Plan-dependent	Requests per minute limit
