Supported Encoding Formats
The Lightning STT WebSocket API supports the following audio encoding formats for real-time streaming:

| Encoding | Description | Use Case |
|---|---|---|
| linear16 | 16-bit linear PCM | Recommended for best quality |
| linear32 | 32-bit linear PCM | High-fidelity audio |
| alaw | A-law encoding | Telephony systems |
| mulaw | μ-law encoding | Telephony systems (North America) |
| opus | Opus compressed audio | Low bandwidth, high quality |
| ogg_opus | Ogg Opus container | Ogg container with Opus codec |
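The fixed-width encodings in the table have a known sample size (PCM at 2 or 4 bytes, A-law and μ-law at 1 byte each as 8-bit G.711). You need that size when sizing chunks or computing data rates. Below is a minimal Python sketch that checks an encoding name on the client before connecting. The helper `sample_width` is hypothetical and not part of the API.

```python
from typing import Optional

# Encodings listed in the table above.
SUPPORTED_ENCODINGS = {"linear16", "linear32", "alaw", "mulaw", "opus", "ogg_opus"}

# Fixed bytes per sample for the uncompressed PCM and 8-bit G.711 encodings.
# opus and ogg_opus are compressed, so their size per sample is not fixed.
BYTES_PER_SAMPLE = {"linear16": 2, "linear32": 4, "alaw": 1, "mulaw": 1}


def sample_width(encoding: str) -> Optional[int]:
    """Return the sample width in bytes, or None for compressed encodings.

    Raises ValueError for names that are not in the supported list.
    """
    if encoding not in SUPPORTED_ENCODINGS:
        raise ValueError(f"unsupported encoding: {encoding!r}")
    return BYTES_PER_SAMPLE.get(encoding)
```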
Supported Sample Rates
Sample rate is the number of times per second that the audio signal is measured. A higher sample rate captures more detail, because the highest frequency a recording can represent is half its sample rate. The cost is more audio data to store and transmit. The WebSocket API supports the following sample rates:
- 8000 Hz
- 16000 Hz
- 22050 Hz
- 24000 Hz
- 44100 Hz
- 48000 Hz
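This makes the size tradeoff concrete. For uncompressed PCM, the raw data rate is sample_rate × bytes per sample × channels. The hypothetical helper below is a sketch that defaults to linear16 mono (2 bytes per sample, 1 channel) and computes the rate for each supported sample rate. For example, 48000 Hz linear16 produces 96,000 bytes per second, three times the 32,000 bytes per second of 16000 Hz.

```python
# Sample rates accepted by the WebSocket API, per the list above.
SUPPORTED_SAMPLE_RATES = (8000, 16000, 22050, 24000, 44100, 48000)


def pcm_bytes_per_second(sample_rate: int, bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Raw data rate of uncompressed PCM (defaults to 16-bit mono, i.e. linear16)."""
    if sample_rate not in SUPPORTED_SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    return sample_rate * bytes_per_sample * channels


# Print the linear16 mono data rate for every supported sample rate.
for rate in SUPPORTED_SAMPLE_RATES:
    print(f"{rate:>5} Hz -> {pcm_bytes_per_second(rate):>6} bytes/s")
```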
Audio Requirements
Chunk Size
The recommended chunk size is 4096 bytes.
Sending audio in consistent 4096-byte chunks helps keep latency low and processing efficient. Chunk size is a tradeoff. Smaller chunks mean more requests and more per-message network overhead. Larger chunks add processing latency, because more audio must accumulate before each send. A 4096-byte chunk balances the number of requests against the size of each request.
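Splitting a buffer into 4096-byte chunks can be sketched as follows. The function name `iter_chunks` is illustrative. Because 4096 is a multiple of both 2 and 4, chunk boundaries never split a linear16 or linear32 sample, provided the buffer itself starts on a sample boundary.

```python
from typing import Iterator

CHUNK_SIZE = 4096  # recommended bytes per chunk


def iter_chunks(audio: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield consecutive chunk_size-byte slices; only the last may be shorter."""
    for start in range(0, len(audio), chunk_size):
        yield audio[start:start + chunk_size]
```

For example, a 10,000-byte buffer yields chunks of 4096, 4096, and 1808 bytes.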
Channels
Currently, we support only single-channel (mono) transcription. Multi-channel support is coming soon.
Streaming Rate
For optimal real-time performance:
- Stream chunks at regular intervals (e.g., every 50-100 ms)
- Keep chunk sizes consistent when possible
- Avoid sending chunks much faster or much slower than real time
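One way to keep a steady real-time rate is to pace sends by the amount of audio each chunk contains. For linear16 mono at 16 kHz, a 4096-byte chunk holds 4096 / (16000 × 2) = 0.128 s of audio. The sketch below assumes PCM input and an async `send` coroutine supplied by your WebSocket client library. `stream_paced` and `chunk_duration_s` are hypothetical helper names.

```python
import asyncio
from typing import Awaitable, Callable, Iterable


def chunk_duration_s(chunk: bytes, sample_rate: int, bytes_per_sample: int = 2, channels: int = 1) -> float:
    """Seconds of audio in a PCM chunk (defaults to linear16 mono)."""
    return len(chunk) / (sample_rate * bytes_per_sample * channels)


async def stream_paced(
    chunks: Iterable[bytes],
    send: Callable[[bytes], Awaitable[None]],  # assumed: your client's async send
    sample_rate: int,
    bytes_per_sample: int = 2,
) -> None:
    """Send each chunk, then wait until its audio would have finished playing."""
    loop = asyncio.get_running_loop()
    next_send = loop.time()
    for chunk in chunks:
        await send(chunk)
        # Advance a fixed timeline instead of sleeping a flat amount, so the
        # time spent in send() does not accumulate as drift.
        next_send += chunk_duration_s(chunk, sample_rate, bytes_per_sample)
        await asyncio.sleep(max(0.0, next_send - loop.time()))
```

Scheduling against a running timeline keeps the average rate at real time even when individual sends are slow.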
Format Recommendations
Best Quality (Default)
Use 16 kHz mono linear PCM (linear16) for the best balance of accuracy and processing speed.
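Here is a minimal sketch of building the connection URL for this default profile. This guide names the `sample_rate` query parameter. The `encoding` parameter name and the base URL below are hypothetical placeholders, so check the API reference for the real endpoint and parameter names.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: replace with the real Lightning STT WebSocket URL.
BASE_URL = "wss://api.example.com/stt/stream"


def build_stream_url(encoding: str = "linear16", sample_rate: int = 16000) -> str:
    """Append encoding and sample_rate as query parameters.

    sample_rate is named in this guide; "encoding" as a parameter name is assumed.
    """
    query = urlencode({"encoding": encoding, "sample_rate": sample_rate})
    return f"{BASE_URL}?{query}"


print(build_stream_url())
# wss://api.example.com/stt/stream?encoding=linear16&sample_rate=16000
```

The same helper covers the telephony and high-fidelity profiles described in this section, for example `build_stream_url("mulaw", 8000)` or `build_stream_url("linear16", 48000)`.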
Telephony Quality
Use 8 kHz μ-law (mulaw) or A-law (alaw) encoding for low-bandwidth usage.
High Fidelity
For broadcast or other high-quality scenarios, use a higher sample rate such as 44100 Hz or 48000 Hz.
Audio Preprocessing
Before streaming audio to the WebSocket API, make sure it is:
- Converted to the correct format: Use one of the supported encodings (linear16, linear32, alaw, mulaw, opus, or ogg_opus)
- Set to the correct sample rate: Match the sample_rate parameter in your WebSocket URL
- Mono channel: Downmix stereo or multi-channel audio to mono
- Properly chunked: Split audio into 4096-byte chunks for streaming
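Of these steps, downmixing is simple to do by hand. The following is a minimal sketch for 16-bit stereo PCM using only the Python standard library. It assumes little-endian interleaved samples, as found in WAV files. Whether the API expects little-endian linear16 is an assumption to confirm in the API reference. Resampling and encoding conversion usually need an audio library and are not shown.

```python
import array
import sys


def downmix_stereo_linear16(pcm: bytes) -> bytes:
    """Average interleaved left/right linear16 samples into mono linear16.

    Assumes little-endian, interleaved stereo input and returns little-endian mono.
    """
    if len(pcm) % 4:
        raise ValueError("stereo linear16 data must be whole 4-byte frames")
    samples = array.array("h")  # signed 16-bit integers on mainstream platforms
    samples.frombytes(pcm)
    if sys.byteorder == "big":
        samples.byteswap()  # reinterpret little-endian input on big-endian hosts
    # Integer average of each L/R pair; the result always fits in 16 bits.
    mono = array.array("h", ((samples[i] + samples[i + 1]) // 2 for i in range(0, len(samples), 2)))
    if sys.byteorder == "big":
        mono.byteswap()  # convert back to little-endian output
    return mono.tobytes()
```

Averaging the two channels, rather than keeping only one, preserves speech that is panned toward either side.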

