ASR WebSocket API Reference

Complete reference documentation for the Waves ASR WebSocket API parameters, responses, and error codes.

Endpoint

Production URL: wss://waves-api.smallest.ai/api/v1/asr

Connection Parameters

All parameters are passed as query-string parameters in the WebSocket URL. Parameters are case-sensitive. An unknown or invalid parameter results in an error response, after which the connection is closed.

Core Parameters

audioLanguage
enum
default:"en"
required
Language of the audio input. Valid values:
  • en - English
  • hi - Hindi
audioEncoding
enum
default:"linear16"
required
Audio encoding format. Valid values:
  • linear16 - 16-bit linear PCM
  • flac - FLAC compressed
  • mulaw - μ-law encoded
  • opus - Opus compressed
audioSampleRate
number
default:"16000"
required
Sample rate in Hz. Range: 8000-48000 (integer). Recommended: 16000 for optimal performance.
audioChannels
number
default:"1"
required
Number of audio channels. Range: 1+ (integer). Recommended: 1 (mono) for efficiency.

Optional Parameters

addPunctuation
boolean
Add punctuation to transcripts. Values: true, false
speechEndThreshold
number
Duration in milliseconds used to determine end of speech. Range: 10-60000 (integer). Default: 300 ms.
emitVoiceActivity
boolean
Emit voice activity detection events. Values: true, false
redactSensitiveData
array
Sensitive data types to redact. Values: comma-separated list of any of the following:
  • "pci" - Payment card information
  • "ssn" - Social security numbers
  • "numbers" - Generic number redaction
speechEndpointing
string|number
Speech endpointing behavior. Values:
  • "true" - Enable automatic endpointing
  • "false" - Disable endpointing
  • 10-60000 - Custom threshold in milliseconds
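Because an invalid parameter closes the connection, it can help to check values against the documented enums and ranges before connecting. The following is a minimal sketch, not part of the Waves API: `validate_params` and `_int_in_range` are hypothetical helper names, the input is assumed to be a plain dict keyed by the parameter names above, and `speechEndpointing` is not covered. The server remains the authority on what it accepts.

```python
LANGUAGES = {"en", "hi"}
ENCODINGS = {"linear16", "flac", "mulaw", "opus"}
REDACT_TYPES = {"pci", "ssn", "numbers"}


def _int_in_range(value, low, high=None):
    """True if value is a plain integer in [low, high]; no upper bound when high is None."""
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    return value >= low and (high is None or value <= high)


def validate_params(params):
    """Return a list of problems; an empty list means all checked values are within the documented limits."""
    problems = []
    if params.get("audioLanguage", "en") not in LANGUAGES:
        problems.append("audioLanguage must be 'en' or 'hi'")
    if params.get("audioEncoding", "linear16") not in ENCODINGS:
        problems.append("audioEncoding must be linear16, flac, mulaw, or opus")
    if not _int_in_range(params.get("audioSampleRate", 16000), 8000, 48000):
        problems.append("audioSampleRate must be an integer in 8000-48000")
    if not _int_in_range(params.get("audioChannels", 1), 1):
        problems.append("audioChannels must be an integer >= 1")
    if "speechEndThreshold" in params and not _int_in_range(
        params["speechEndThreshold"], 10, 60000
    ):
        problems.append("speechEndThreshold must be an integer in 10-60000")
    # Assumes redactSensitiveData is held as a list of strings before URL encoding.
    unknown = set(params.get("redactSensitiveData", [])) - REDACT_TYPES
    if unknown:
        problems.append("unknown redactSensitiveData types: " + ", ".join(sorted(unknown)))
    return problems
```

Returning a list rather than raising on the first problem lets a caller report every invalid value in one pass.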

URL Construction Examples

Basic Connection

wss://waves-api.smallest.ai/api/v1/asr?api_key=YOUR_API_KEY&audioEncoding=linear16&audioSampleRate=16000&audioChannels=1

Advanced Configuration

wss://waves-api.smallest.ai/api/v1/asr?api_key=YOUR_API_KEY&audioLanguage=en&audioEncoding=linear16&audioSampleRate=16000&audioChannels=1&addPunctuation=true&speechEndThreshold=500&redactSensitiveData=pci,ssn
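URLs like the ones above can also be assembled programmatically. The sketch below uses Python's standard `urllib.parse.urlencode`; `build_asr_url` is a hypothetical helper name, and the handling of booleans and lists simply mirrors how those values appear in the examples (`addPunctuation=true`, `redactSensitiveData=pci,ssn`).

```python
from urllib.parse import urlencode

ASR_ENDPOINT = "wss://waves-api.smallest.ai/api/v1/asr"


def build_asr_url(api_key, **params):
    """Build the connection URL; parameter names are passed through unchanged."""
    query = {"api_key": api_key}
    for name, value in params.items():
        if isinstance(value, bool):
            value = "true" if value else "false"  # booleans as lowercase strings
        elif isinstance(value, (list, tuple)):
            value = ",".join(value)  # e.g. redactSensitiveData=pci,ssn
        query[name] = value
    # safe="," leaves the comma in comma-separated values unescaped,
    # matching the form used in the URL examples.
    return f"{ASR_ENDPOINT}?{urlencode(query, safe=',')}"


url = build_asr_url(
    "YOUR_API_KEY",
    audioEncoding="linear16",
    audioSampleRate=16000,
    audioChannels=1,
)
print(url)
```

Keyword arguments keep their insertion order, so the query string lists parameters in the order they were passed.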

Audio Data Format

Sending Audio

Send audio data as binary messages that match your specified encoding, sample rate, and channels.

Format Specifications

16-bit Linear PCM
  • Bit depth: 16-bit signed integers
  • Byte order: Little-endian
  • Sample rate: Match audioSampleRate parameter
  • Channels: Match audioChannels parameter
  • Recommended chunk size: 32,000 bytes (1 second at 16kHz mono)
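If the audio source produces floating-point samples, they need to be converted to this layout before sending. A minimal sketch for mono audio, assuming input samples in the range -1.0 to 1.0; `floats_to_linear16` is a hypothetical helper name, and scaling by 32767 is one common convention rather than something this API prescribes.

```python
import struct


def floats_to_linear16(samples):
    """Convert floats in [-1.0, 1.0] to 16-bit signed little-endian PCM bytes."""
    ints = []
    for sample in samples:
        sample = max(-1.0, min(1.0, sample))  # clamp so the value fits in 16 bits
        ints.append(int(round(sample * 32767)))
    # "<" selects little-endian byte order, "h" a signed 16-bit integer.
    return struct.pack("<%dh" % len(ints), *ints)


pcm = floats_to_linear16([0.0] * 16000)  # one second of silence at 16 kHz mono
print(len(pcm))
```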
For optimal real-time performance:
Sample Rate   Bit Depth   Channels   Duration    Chunk Size
16kHz         16-bit      1 (mono)   1 second    32,000 bytes
16kHz         16-bit      1 (mono)   2 seconds   64,000 bytes
8kHz          16-bit      1 (mono)   1 second    16,000 bytes
44.1kHz       16-bit      1 (mono)   1 second    88,200 bytes
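The chunk sizes in the table follow from sample rate × bytes per sample × channels × duration. The sketch below reproduces that arithmetic and slices a PCM buffer into chunks of the resulting size; `chunk_size_bytes` and `iter_chunks` are hypothetical helper names, and each yielded chunk would be sent as one binary message.

```python
def chunk_size_bytes(sample_rate, channels=1, seconds=1.0, bytes_per_sample=2):
    """Bytes needed for `seconds` of audio (bytes_per_sample=2 for 16-bit PCM)."""
    return int(sample_rate * channels * bytes_per_sample * seconds)


def iter_chunks(audio, size):
    """Yield consecutive slices of `audio`; the last slice may be shorter than `size`."""
    for start in range(0, len(audio), size):
        yield audio[start:start + size]


size = chunk_size_bytes(16000)   # 1 second at 16 kHz mono
audio = bytes(70000)             # placeholder buffer of zero bytes
chunk_lengths = [len(chunk) for chunk in iter_chunks(audio, size)]
print(size, chunk_lengths)
```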

Response Format

The API returns JSON responses with the following structure:

Standard Response

{
    "text": "transcribed text here",
    "isEndOfTurn": false
}

Response Fields

text
string
The transcribed text content
isEndOfTurn
boolean
Indicates if this marks the end of a speech turn
  • true: End of speech segment detected
  • false: More speech expected

Response Flow Examples

1. End of Turn Detection

// Final result with end of turn
{ "text": "Hello, this is the end.", "isEndOfTurn": true }
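On the client side, each JSON message can be decoded into a small record and filtered on `isEndOfTurn`. This is a minimal sketch: `Transcript`, `parse_transcript`, and `completed_turns` are hypothetical names, and the code makes no assumption about whether messages with `isEndOfTurn: false` carry cumulative or incremental text, since the reference does not say.

```python
import json
from dataclasses import dataclass


@dataclass
class Transcript:
    text: str
    is_end_of_turn: bool


def parse_transcript(raw):
    """Decode one standard response into a Transcript."""
    data = json.loads(raw)
    return Transcript(text=data["text"], is_end_of_turn=bool(data["isEndOfTurn"]))


def completed_turns(raw_messages):
    """Yield the text of each message that marks the end of a speech turn."""
    for raw in raw_messages:
        transcript = parse_transcript(raw)
        if transcript.is_end_of_turn:
            yield transcript.text


messages = [
    '{"text": "transcribed text here", "isEndOfTurn": false}',
    '{"text": "Hello, this is the end.", "isEndOfTurn": true}',
]
turns = list(completed_turns(messages))
print(turns)
```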

Voice Activity Events

When emitVoiceActivity=true, additional events may be sent:
{
    "event": "voice_activity",
    "speaking": true,
    "timestamp": 1234567890
}
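With voice activity events enabled, a client receives more than one message shape on the same connection and needs to tell them apart. One possible approach, sketched below, keys on the `event` field; `classify_message` is a hypothetical helper, and the assumption that transcript messages never carry an `event` field is mine, not something the reference states. The unit of `timestamp` is also not specified here, so the sketch passes it through untouched.

```python
import json


def classify_message(raw):
    """Return (kind, payload), where kind is "voice_activity", "transcript", or "unknown"."""
    data = json.loads(raw)
    if data.get("event") == "voice_activity":
        return "voice_activity", data
    if "text" in data and "isEndOfTurn" in data:
        return "transcript", data
    return "unknown", data


kind, payload = classify_message(
    '{"event": "voice_activity", "speaking": true, "timestamp": 1234567890}'
)
print(kind, payload["speaking"])
```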

Error Responses

Errors are sent as JSON before closing the connection:

Error Format

{
    "message": "error message",
    "error": "detailed error info"
}
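A client can turn a payload in this format into an exception so that the error is surfaced before the connection closes. A minimal sketch, assuming that only error payloads contain both `message` and `error` keys; `AsrError` and `raise_if_error` are hypothetical names.

```python
import json


class AsrError(Exception):
    """Raised when the server sends a payload in the error format."""

    def __init__(self, message, detail):
        super().__init__(f"{message}: {detail}")
        self.message = message
        self.detail = detail


def raise_if_error(raw):
    """Raise AsrError if `raw` matches the error format; otherwise return the decoded payload."""
    data = json.loads(raw)
    if "message" in data and "error" in data:
        raise AsrError(data["message"], data["error"])
    return data


try:
    raise_if_error('{"message": "error message", "error": "detailed error info"}')
except AsrError as exc:
    caught = exc
    print(f"ASR error: {exc}")
```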

Common Error Types

  • Parameter validation errors
  • Authentication errors
  • Connection errors

Rate Limits & Quotas

Limit Type               Enterprise Plan   Notes
Connection Timeout       30 seconds        If no audio received
Concurrent Connections   Plan-dependent    Contact support for limits
Audio Duration           Plan-dependent    Based on subscription tier
API Rate                 Plan-dependent    Requests per minute limit