This guide covers supported encoding formats, sample rates, and recommendations for optimal real-time transcription quality.

Supported Encoding Formats

The Lightning STT WebSocket API supports the following audio encoding formats for real-time streaming:
| Encoding | Description | Use Case |
| --- | --- | --- |
| linear16 | 16-bit linear PCM | Recommended for best quality |
| linear32 | 32-bit linear PCM | High-fidelity audio |
| alaw | A-law encoding | Telephony systems |
| mulaw | μ-law encoding | Telephony systems (North America) |
| opus | Opus compressed audio | Low bandwidth, high quality |
| ogg_opus | Ogg Opus container | Ogg container with Opus codec |
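As an illustration of the recommended linear16 format, here is a minimal sketch of packing floating-point samples (in the range -1.0 to 1.0) into 16-bit little-endian PCM bytes; the sample values are arbitrary placeholders:

```python
import numpy as np

# Example float samples in [-1.0, 1.0] (placeholder values)
samples = np.array([0.0, 0.5, -0.5, 1.0], dtype=np.float32)

# Scale to the int16 range and clip to avoid overflow
pcm16 = np.clip(samples * 32767.0, -32768, 32767).astype(np.int16)

# linear16 payload: 2 bytes per sample, little-endian on most platforms
payload = pcm16.tobytes()
```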

Supported Sample Rates

Sample rate is the number of times the audio signal is sampled per second. Higher sample rates capture more detail and generally yield better quality, but they also increase the amount of data you need to stream. The WebSocket API supports the following sample rates:
  • 8000 Hz
  • 16000 Hz
  • 22050 Hz
  • 24000 Hz
  • 44100 Hz
  • 48000 Hz
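The bandwidth cost of a given sample rate is easy to estimate for uncompressed PCM. A small helper (illustrative only, not part of the API):

```python
def pcm_byte_rate(sample_rate_hz, bits_per_sample=16, channels=1):
    """Bytes per second of uncompressed PCM audio."""
    return sample_rate_hz * (bits_per_sample // 8) * channels

# linear16 mono at 16 kHz streams 32,000 bytes per second;
# at 48 kHz that triples to 96,000 bytes per second.
rate_16k = pcm_byte_rate(16000)
rate_48k = pcm_byte_rate(48000)
```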

Audio Requirements

Chunk Size

We recommend 4096 bytes per chunk. Sending audio in consistent 4096-byte chunks balances processing latency against network overhead: chunks that are too small generate an excessive number of requests, while chunks that are too large delay transcription results.
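To see what a 4096-byte chunk means in time, you can compute its duration from the encoding parameters (this helper is illustrative, not part of the API):

```python
def chunk_duration_ms(chunk_bytes=4096, sample_rate_hz=16000, bytes_per_sample=2):
    """Duration of one audio chunk in milliseconds (uncompressed PCM, mono)."""
    samples_per_chunk = chunk_bytes / bytes_per_sample
    return 1000.0 * samples_per_chunk / sample_rate_hz

# For linear16 at 16 kHz, a 4096-byte chunk holds 2048 samples, i.e. 128 ms of audio.
duration = chunk_duration_ms()
```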

Channels

Currently, we support only single-channel (mono) transcription. Multi-channel support is coming soon.

Streaming Rate

For optimal real-time performance:
  • Stream chunks at regular intervals (e.g., every 50-100ms)
  • Maintain consistent chunk sizes when possible
  • Avoid sending chunks too rapidly or too slowly
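The pacing guidance above can be sketched as a small asyncio loop. This is a minimal sketch, not a full client: `ws_send` stands in for whatever send coroutine your WebSocket library provides (for example, `ws.send` from the `websockets` package):

```python
import asyncio

async def stream_chunks(ws_send, chunks, interval_s=0.1):
    """Send audio chunks at a steady interval (e.g. every 100 ms)."""
    for chunk in chunks:
        await ws_send(chunk)          # hand the chunk to the WebSocket client
        await asyncio.sleep(interval_s)  # pace the stream; avoid bursts
```

In a real client you would match `interval_s` to the audio duration of each chunk (e.g. 128 ms for 4096-byte linear16 chunks at 16 kHz) so the stream keeps pace with real time.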

Format Recommendations

Best Quality (Default)

Use 16 kHz mono Linear PCM (linear16) for the optimal mix of accuracy and processing speed:
Encoding: linear16
Sample Rate: 16000 Hz
Channels: Mono
Chunk Size: 4096 bytes

Telephony Quality

Use 8 kHz μ-law or A-law encoding for low bandwidth usage:
Encoding: mulaw or alaw
Sample Rate: 8000 Hz
Channels: Mono
Chunk Size: 4096 bytes

High Fidelity

For broadcast or high-quality scenarios, use higher sample rates:
Encoding: linear16 or linear32
Sample Rate: 44100 or 48000 Hz
Channels: Mono
Chunk Size: 4096 bytes

Audio Preprocessing

Before streaming audio to the WebSocket API, ensure your audio is:
  1. Converted to the correct format: Use the specified encoding (linear16, linear32, alaw, mulaw, opus, or ogg_opus)
  2. Set to the correct sample rate: Match the sample_rate parameter in your WebSocket URL
  3. Mono channel: Downmix stereo or multi-channel audio to mono
  4. Properly chunked: Split audio into 4096-byte chunks for streaming

Example: Converting Audio for Streaming

import numpy as np
import soundfile as sf

# Read audio file
audio, sample_rate = sf.read('input.wav')

# Convert to mono if stereo
if len(audio.shape) > 1:
    audio = np.mean(audio, axis=1)

# Resample to 16 kHz if needed
if sample_rate != 16000:
    from scipy import signal
    audio = signal.resample(audio, int(len(audio) * 16000 / sample_rate))

# Convert to 16-bit PCM (clip first to avoid integer overflow)
audio_int16 = (np.clip(audio, -1.0, 1.0) * 32767).astype(np.int16)

# Split into 4096-byte chunks (each int16 sample is 2 bytes, so 2048 samples per chunk)
chunk_size = 4096
chunks = [audio_int16[i:i+chunk_size//2] for i in range(0, len(audio_int16), chunk_size//2)]

Query Parameters

Specify encoding and sample rate in the WebSocket connection URL:
const url = new URL("wss://waves-api.smallest.ai/api/v1/lightning/get_text");
url.searchParams.append("encoding", "linear16");
url.searchParams.append("sample_rate", "16000");
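If you are connecting from Python instead of JavaScript, the same URL can be built with the standard library (a sketch equivalent to the snippet above):

```python
from urllib.parse import urlencode

# Build the WebSocket URL with encoding and sample_rate query parameters
base = "wss://waves-api.smallest.ai/api/v1/lightning/get_text"
params = {"encoding": "linear16", "sample_rate": "16000"}
url = f"{base}?{urlencode(params)}"
```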