ASR WebSocket API Reference

Complete reference documentation for the Waves ASR WebSocket API: connection parameters, responses, and error codes.

Requests are authenticated with your Smallest AI API key (bearer token authentication). In the URL examples below, the key is passed as the api_key query parameter.

Endpoint

Production URL: wss://waves-api.smallest.ai/api/v1/asr

Connection Parameters

All parameters are passed as query strings in the WebSocket URL. Parameter names are case-sensitive, and unknown or invalid parameters cause an error response followed by connection closure.

Core Parameters

audioLanguage
enum
default:"en"
required
Language of the audio input.
Currently supported languages:
IN Region:
  • en - English
  • hi - Hindi
US Region:
  • en - English
  • hi - Hindi
  • es - Spanish
  • fr - French
  • de - German
  • ru - Russian
  • pt - Portuguese
  • ja - Japanese
  • it - Italian
  • nl - Dutch
Available on Request (Both Regions):
  • zh - Chinese Mandarin
  • zh-hk - Chinese Cantonese
  • tr - Turkish
  • vi - Vietnamese
  • th - Thai
  • id - Indonesian
  • uk - Ukrainian
  • ta - Tamil
  • mr - Marathi
  • te - Telugu
  • pl - Polish
  • el - Greek
  • hu - Hungarian
  • ro - Romanian
  • cs - Czech
  • sv - Swedish
  • bg - Bulgarian
  • da - Danish
  • fi - Finnish
audioEncoding
enum
default:"linear16"
required
Audio encoding format.
Valid values:
  • linear16 - 16-bit linear PCM
  • flac - FLAC compressed
  • mulaw - μ-law encoded
  • opus - Opus compressed
audioSampleRate
number
default:"16000"
required
Sample rate in Hz.
Range: 8000-48000 (integer)
Recommended: 16000 for optimal performance
audioChannels
number
default:"1"
required
Number of audio channels.
Range: 1 or more (integer)
Recommended: 1 (mono) for efficiency
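Because invalid values close the connection, it can help to check the core parameters before connecting. The sketch below is a hypothetical helper (`validate_core_params` is not part of any Smallest AI SDK) that mirrors the constraints listed above; it only knows the generally available languages per region, and the server remains the authority on what is accepted.

```python
# Hypothetical client-side pre-flight check mirroring the documented
# constraints on the core parameters. The server stays authoritative, and the
# language sets below cover only the generally available languages per region.

SUPPORTED_LANGUAGES = {
    "IN": {"en", "hi"},
    "US": {"en", "hi", "es", "fr", "de", "ru", "pt", "ja", "it", "nl"},
}
VALID_ENCODINGS = {"linear16", "flac", "mulaw", "opus"}


def validate_core_params(params: dict, region: str = "US") -> list:
    """Return a list of problems; an empty list means the values look valid.

    Missing keys fall back to the documented defaults.
    """
    problems = []
    language = params.get("audioLanguage", "en")
    if language not in SUPPORTED_LANGUAGES[region]:
        problems.append(f"audioLanguage {language!r} is not generally available in {region}")
    encoding = params.get("audioEncoding", "linear16")
    if encoding not in VALID_ENCODINGS:
        problems.append(f"audioEncoding {encoding!r} is not a valid encoding")
    rate = params.get("audioSampleRate", 16000)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(rate, bool) or not isinstance(rate, int) or not 8000 <= rate <= 48000:
        problems.append("audioSampleRate must be an integer in 8000-48000")
    channels = params.get("audioChannels", 1)
    if isinstance(channels, bool) or not isinstance(channels, int) or channels < 1:
        problems.append("audioChannels must be an integer >= 1")
    return problems
```

For example, `validate_core_params({"audioLanguage": "es"}, region="IN")` reports one problem, because Spanish is listed only for the US region.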

Optional Parameters

addPunctuation
boolean
Add punctuation to transcripts.
Values: true, false
speechEndThreshold
number
Duration in milliseconds used to determine the end of speech.
Range: 10-60000 (integer)
Default: 300 ms
emitVoiceActivity
boolean
Emit voice activity detection events.
Values: true, false
redactSensitiveData
array
Sensitive data types to redact.
Values: comma-separated list of:
  • "pci" - Payment card information
  • "ssn" - Social security numbers
  • "numbers" - Generic number redaction
speechEndpointing
string|number
Speech endpointing behavior.
Values:
  • "true" - Enable automatic endpointing
  • "false" - Disable endpointing
  • 10-60000 - Custom threshold in milliseconds
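The optional parameters mix booleans, millisecond integers, and a comma-separated list. A minimal sketch, assuming you build the query string yourself: the hypothetical helper `optional_params` converts Python values into the string forms documented above and omits anything left as `None`, so the server applies its defaults.

```python
# Hypothetical helper: convert Python values into the documented query-string
# forms. Parameters left as None are omitted so server defaults apply.

REDACTION_TYPES = ("pci", "ssn", "numbers")


def _milliseconds(name: str, value) -> str:
    # speechEndThreshold and a numeric speechEndpointing both accept 10-60000 ms.
    if isinstance(value, bool) or not isinstance(value, int) or not 10 <= value <= 60000:
        raise ValueError(f"{name} must be an integer in 10-60000 (ms)")
    return str(value)


def _flag(value: bool) -> str:
    return "true" if value else "false"


def optional_params(add_punctuation=None, speech_end_threshold=None,
                    emit_voice_activity=None, redact=None,
                    speech_endpointing=None) -> dict:
    params = {}
    if add_punctuation is not None:
        params["addPunctuation"] = _flag(add_punctuation)
    if speech_end_threshold is not None:
        params["speechEndThreshold"] = _milliseconds("speechEndThreshold", speech_end_threshold)
    if emit_voice_activity is not None:
        params["emitVoiceActivity"] = _flag(emit_voice_activity)
    if redact:
        unknown = [kind for kind in redact if kind not in REDACTION_TYPES]
        if unknown:
            raise ValueError(f"unknown redaction types: {unknown}")
        params["redactSensitiveData"] = ",".join(redact)
    if speech_endpointing is not None:
        # A bool toggles endpointing; an int sets a custom threshold in ms.
        if isinstance(speech_endpointing, bool):
            params["speechEndpointing"] = _flag(speech_endpointing)
        else:
            params["speechEndpointing"] = _milliseconds("speechEndpointing", speech_endpointing)
    return params
```

For example, `optional_params(add_punctuation=True, speech_end_threshold=500, redact=["pci", "ssn"])` yields the same optional values as the advanced configuration URL shown below.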

URL Construction Examples

Basic Connection

wss://waves-api.smallest.ai/api/v1/asr?api_key=YOUR_API_KEY&audioEncoding=linear16&audioSampleRate=16000&audioChannels=1

Advanced Configuration

wss://waves-api.smallest.ai/api/v1/asr?api_key=YOUR_API_KEY&audioLanguage=en&audioEncoding=linear16&audioSampleRate=16000&audioChannels=1&addPunctuation=true&speechEndThreshold=500&redactSensitiveData=pci,ssn
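Hand-assembling these URLs is error-prone. The sketch below uses Python's standard `urllib.parse.urlencode` to produce the two example URLs above; `build_asr_url` is a hypothetical helper name. Commas are passed through `safe=","` so list values look exactly like the examples, while other special characters are percent-encoded.

```python
from urllib.parse import urlencode

BASE_URL = "wss://waves-api.smallest.ai/api/v1/asr"


def build_asr_url(api_key: str, **params) -> str:
    """Hypothetical helper: append api_key and parameters as query strings.

    Booleans are lowercased to match the documented "true"/"false" spelling,
    and lists or tuples are joined with commas (e.g. redactSensitiveData).
    """
    query = {"api_key": api_key}
    for name, value in params.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        elif isinstance(value, (list, tuple)):
            value = ",".join(value)
        query[name] = str(value)
    # Keyword-argument order is preserved, so the query keeps the caller's order.
    return f"{BASE_URL}?{urlencode(query, safe=',')}"


basic = build_asr_url("YOUR_API_KEY", audioEncoding="linear16",
                      audioSampleRate=16000, audioChannels=1)
advanced = build_asr_url("YOUR_API_KEY", audioLanguage="en", audioEncoding="linear16",
                         audioSampleRate=16000, audioChannels=1, addPunctuation=True,
                         speechEndThreshold=500, redactSensitiveData=["pci", "ssn"])
```

Both `basic` and `advanced` reproduce the URLs shown in this section character for character.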

Audio Data Format

Sending Audio

Send audio as binary WebSocket messages whose encoding, sample rate, and channel count match the audioEncoding, audioSampleRate, and audioChannels parameters.

Format Specifications

16-bit Linear PCM
  • Bit depth: 16-bit signed integers
  • Byte order: Little-endian
  • Sample rate: Match audioSampleRate parameter
  • Channels: Match audioChannels parameter
  • Recommended chunk size: 32,000 bytes (1 second at 16kHz mono)
For optimal real-time performance:

Sample Rate | Bit Depth | Channels | Duration  | Chunk Size
16 kHz      | 16-bit    | 1 (mono) | 1 second  | 32,000 bytes
16 kHz      | 16-bit    | 1 (mono) | 2 seconds | 64,000 bytes
8 kHz       | 16-bit    | 1 (mono) | 1 second  | 16,000 bytes
44.1 kHz    | 16-bit    | 1 (mono) | 1 second  | 88,200 bytes
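Every row in the table follows from one formula: chunk size in bytes = sample rate × duration × channels × (bit depth / 8). The sketch below computes that, packs floating-point samples into 16-bit signed little-endian PCM with the standard `struct` module, and slices a buffer into chunks. The function names are illustrative, and the float-to-integer scaling by 32767 is one common convention, not something this API mandates.

```python
import struct


def chunk_size_bytes(sample_rate: int, duration_s: float,
                     channels: int = 1, bit_depth: int = 16) -> int:
    """Bytes of uncompressed PCM covering duration_s seconds."""
    frames = round(sample_rate * duration_s)
    return frames * channels * (bit_depth // 8)


def floats_to_linear16(samples) -> bytes:
    """Convert floats in [-1.0, 1.0] to 16-bit signed little-endian PCM."""
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    ints = [int(round(s * 32767)) for s in clipped]
    # "<" forces little-endian byte order; "h" is a signed 16-bit integer.
    return struct.pack(f"<{len(ints)}h", *ints)


def iter_chunks(pcm: bytes, chunk_bytes: int):
    """Yield consecutive slices of at most chunk_bytes bytes.

    Keep chunk_bytes a multiple of 2 * channels so no sample is split.
    """
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]
```

`chunk_size_bytes(44100, 1)` returns 88200, matching the last table row; each slice from `iter_chunks` would then be sent as one binary message.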

Response Format

The API returns JSON responses with the following structure:

Standard Response

{
    "text": "transcribed text here",
    "isEndOfTurn": false
}

Response Fields

text
string
The transcribed text content
isEndOfTurn
boolean
Indicates whether this response marks the end of a speech turn:
  • true: End of speech segment detected
  • false: More speech expected
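A minimal sketch of parsing a TranscriptionResponse into a typed object with the standard `json` and `dataclasses` modules. The `final_turns` helper assumes that the text carried by an `isEndOfTurn: true` response is the complete text for that turn; this page does not say whether earlier non-final texts are cumulative or incremental, so verify that against real traffic before relying on it.

```python
import json
from dataclasses import dataclass


@dataclass
class TranscriptionResponse:
    text: str
    is_end_of_turn: bool


def parse_transcription(raw: str) -> TranscriptionResponse:
    """Parse one JSON text frame from the server into a TranscriptionResponse."""
    data = json.loads(raw)
    return TranscriptionResponse(text=data["text"],
                                 is_end_of_turn=bool(data.get("isEndOfTurn", False)))


def final_turns(raw_messages) -> list:
    """Collect texts from end-of-turn responses only.

    Assumption: an isEndOfTurn=true response holds the final text of its turn.
    """
    parsed = (parse_transcription(raw) for raw in raw_messages)
    return [response.text for response in parsed if response.is_end_of_turn]
```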

Response Flow Examples

1. End of Turn Detection

// Final result with end of turn
{ "text": "Hello, this is the end.", "isEndOfTurn": true }

Voice Activity Events

When emitVoiceActivity=true, additional events may be sent:
{
    "event": "voice_activity",
    "speaking": true,
    "timestamp": 1234567890
}
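With emitVoiceActivity=true, transcripts, voice activity events, and errors can arrive on the same socket. One way to route them, sketched under the assumption that the payload shapes on this page are the only ones you will see: check for `"event": "voice_activity"`, then a `text` field, then the `message` field used by the error payloads in Error Responses. `classify_message` is a hypothetical name.

```python
import json


def classify_message(raw: str) -> tuple:
    """Label a server message as 'voice_activity', 'transcript', 'error', or 'unknown'.

    The checks follow the payload shapes documented on this page; anything
    else is labelled 'unknown' rather than guessed at.
    """
    data = json.loads(raw)
    if data.get("event") == "voice_activity":
        return "voice_activity", data
    if "text" in data:
        return "transcript", data
    if "message" in data:
        return "error", data
    return "unknown", data
```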

Error Responses

Errors are sent as JSON before closing the connection:

Error Format

{
    "message": "error message",
    "error": "detailed error info"
}

Common Error Types

Parameter Validation Errors

{
    "message": "Invalid input data",
    "error": "audioSampleRate must be at least 8000"
}
Cause: Parameter value outside allowed range
Solution: Check parameter constraints and adjust values
{
    "message": "Invalid input data", 
    "error": "audioLanguage must be one of the following values: en, hi"
}
Cause: Invalid enum value provided
Solution: Use only supported enum values
{
    "message": "Invalid input data",
    "error": "audioEncoding is required"
}
Cause: Required parameter not provided
Solution: Include all required parameters

Authentication Errors

{
    "message": "Unauthorized",
    "error": "Invalid or missing API key"
}
Cause: Invalid, missing, or malformed API key
Solution: Verify API key format and validity
{
    "message": "No subscription",
    "error": "ASR requires Enterprise plan"
}
Cause: Account lacks required Enterprise subscription
Solution: Upgrade to Enterprise Monthly or Enterprise Yearly
{
    "message": "Insufficient credits",
    "error": "Account balance too low"
}
Cause: Account balance insufficient for operation
Solution: Add credits to account or upgrade plan

Connection Errors

{
    "message": "socket timeout"
}
Cause: No audio received for 30 seconds
Solution: Ensure continuous audio streaming or implement keep-alive
{
    "message": "Rate limit exceeded",
    "error": "Too many concurrent connections"
}
Cause: Exceeded concurrent connection limits
Solution: Implement connection pooling and respect rate limits
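The errors above fall into two groups: ones a client could reasonably retry (a timeout or a concurrency limit) and ones that need a configuration or account change first. The sketch below encodes that split as a hypothetical policy, not a documented server contract, and pairs it with exponential backoff with full jitter, a common technique for spreading out reconnect attempts.

```python
import json
import random

# Hypothetical policy: documented error messages treated as transient.
# "Invalid input data", "Unauthorized", "No subscription", and
# "Insufficient credits" need a fix before reconnecting, so they are not retried.
RETRYABLE_MESSAGES = {"Rate limit exceeded", "socket timeout"}


def should_retry(raw_error: str) -> bool:
    """Return True if the error payload's message is in the retryable set."""
    message = json.loads(raw_error).get("message", "")
    return message in RETRYABLE_MESSAGES


def backoff_delay(attempt: int, base_s: float = 1.0, cap_s: float = 30.0) -> float:
    """Full-jitter backoff: a random delay in [0, min(cap_s, base_s * 2**attempt)]."""
    return random.uniform(0, min(cap_s, base_s * 2 ** attempt))
```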

Rate Limits & Quotas

Limit Type             | Enterprise Plan | Notes
Connection Timeout     | 30 seconds      | If no audio received
Concurrent Connections | Plan-dependent  | Contact support for limits
Audio Duration         | Plan-dependent  | Based on subscription tier
API Rate               | Plan-dependent  | Requests per minute limit
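To avoid the 30-second connection timeout during pauses in your audio source, a client can track when it last sent audio. The sketch below assumes, without confirmation from this page, that the server counts short frames of silent linear16 audio (all-zero samples) as received audio; `KeepAlive` and its 10-second safety margin are hypothetical choices. The injectable clock makes the timer testable without waiting.

```python
import time

IDLE_TIMEOUT_S = 30.0  # documented: connection closes if no audio arrives for 30 s


class KeepAlive:
    """Tracks when audio was last sent and reports when silence should be sent.

    Assumption: the server treats silent linear16 frames as received audio.
    """

    def __init__(self, sample_rate: int = 16000, channels: int = 1,
                 margin_s: float = 10.0, clock=time.monotonic):
        self.clock = clock
        self.margin_s = margin_s
        self.last_sent = clock()
        # 100 ms of 16-bit silence: sample_rate // 10 frames, 2 bytes per sample.
        self.silence = bytes((sample_rate // 10) * channels * 2)

    def mark_sent(self) -> None:
        """Call after sending any audio chunk (real or silence)."""
        self.last_sent = self.clock()

    def due(self) -> bool:
        """True once the idle time reaches the timeout minus the safety margin."""
        return self.clock() - self.last_sent >= IDLE_TIMEOUT_S - self.margin_s
```

In a send loop, when `due()` is true you would send `silence` as a binary message and call `mark_sent()`.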