Audio Specifications

Input Methods

Our API supports two input methods for transcribing audio:

Method	Content Type	Use Case
Raw Bytes	`application/octet-stream`	Upload audio files directly from your system
Audio URL	`application/json`	Process audio files hosted on a remote server
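
As a rough illustration of how the two methods differ, the sketch below builds (but does not send) one request of each kind. The endpoint URL, the `Authorization` scheme, and the JSON field name `url` are assumptions made for this example and are not taken from this page; check the API reference for the real values.

```python
import json
import urllib.request

# Hypothetical endpoint: replace with the transcription URL from the API reference.
ENDPOINT = "https://api.example.com/pulse/transcribe"


def build_raw_bytes_request(audio: bytes, api_key: str) -> urllib.request.Request:
    """Raw Bytes method: the request body is the audio file itself."""
    return urllib.request.Request(
        ENDPOINT,
        data=audio,
        method="POST",
        headers={
            "Content-Type": "application/octet-stream",
            "Authorization": f"Bearer {api_key}",  # auth scheme is an assumption
        },
    )


def build_audio_url_request(audio_url: str, api_key: str) -> urllib.request.Request:
    """Audio URL method: the body is JSON that points at a hosted file."""
    # The field name "url" is an assumption for illustration.
    body = json.dumps({"url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # auth scheme is an assumption
        },
    )
```

Either request object could then be passed to `urllib.request.urlopen` once the placeholder values are replaced.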

Supported Formats

The Pulse STT API supports the following audio formats for pre-recorded transcription.

Format	Extension	Codec	Notes
WAV	`.wav`	PCM, Linear PCM	Recommended for best quality
MP3	`.mp3`	MPEG Audio Layer III	Widely compatible
FLAC	`.flac`	Free Lossless Audio Codec	Lossless compression
OGG	`.ogg`	Vorbis, Opus	Open source format
M4A	`.m4a`	AAC, ALAC	Apple format
WebM	`.webm`	Opus, Vorbis	Web-optimized
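
A client can reject obviously unsupported files before uploading. The helper below is a minimal sketch that checks only the file extension against the table above; the function name is illustrative, and an extension match does not guarantee that the codec inside the container is supported.

```python
from pathlib import Path

# Extensions taken from the Supported Formats table.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg", ".m4a", ".webm"}


def is_supported_audio(filename: str) -> bool:
    """Return True if the file extension appears in the supported list."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```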

Audio Requirements

Sample Rate

Recommended: 16 kHz (16,000 Hz)
Supported range: All sample rates
Optimal: 16 kHz mono for speech recognition

Channels

Only single-channel (mono) transcription is currently supported. Multi-channel support is planned.
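
For WAV input, the sample rate and channel count can be checked locally before upload. The following is a sketch using Python's standard `wave` module; the function name and the wording of the messages are illustrative only.

```python
import wave


def inspect_wav(source) -> list[str]:
    """Return notes about how a WAV file differs from 16 kHz mono.

    `source` may be a file path (str) or a binary file-like object.
    """
    notes = []
    with wave.open(source, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
    if channels != 1:
        notes.append(f"{channels} channels found; downmix to mono before upload")
    if rate != 16000:
        notes.append(f"sample rate is {rate} Hz; 16000 Hz is recommended")
    return notes
```

An empty list means the file already matches the recommended 16 kHz mono configuration.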

Limits

Maximum file size: No limit
Session timeout: 10 minutes per session

For faster processing, split long files into chunks and upload the chunks in parallel.
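
One way to chunk and parallelize, sketched for WAV input with the standard library only. The `upload` callable is a hypothetical function supplied by the caller that sends one chunk and returns its transcript; the 60-second chunk length and 4 workers are arbitrary defaults, not values from this page. Fixed-length cuts can split a word across two chunks, so production code may prefer to cut at silences.

```python
import io
import wave
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator


def split_wav(source, chunk_seconds: float) -> Iterator[bytes]:
    """Yield complete WAV files (as bytes), each at most chunk_seconds long."""
    with wave.open(source, "rb") as wav:
        frames_per_chunk = int(wav.getframerate() * chunk_seconds)
        if frames_per_chunk < 1:
            raise ValueError("chunk_seconds is too small")
        while True:
            frames = wav.readframes(frames_per_chunk)
            if not frames:
                break
            buf = io.BytesIO()
            with wave.open(buf, "wb") as out:
                # Copy the source format so every chunk is a valid WAV file.
                out.setnchannels(wav.getnchannels())
                out.setsampwidth(wav.getsampwidth())
                out.setframerate(wav.getframerate())
                out.writeframes(frames)
            yield buf.getvalue()


def transcribe_in_parallel(
    source,
    upload: Callable[[bytes], str],
    chunk_seconds: float = 60.0,
    max_workers: int = 4,
) -> list[str]:
    """Upload chunks concurrently and return results in chunk order."""
    chunks = list(split_wav(source, chunk_seconds))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the order of the inputs.
        return list(pool.map(upload, chunks))
```

Because the results come back in chunk order, joining them reconstructs the transcript of the whole file.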

Format Recommendations

Best Quality

Use 16 kHz mono Linear PCM (audio/wav) for the optimal mix of accuracy and processing speed. This configuration mirrors Waves’ recommended production setup for real-time speech workloads.

Format: WAV (Linear PCM)
Sample Rate: 16 kHz
Channels: Mono
Bit Depth: 16-bit
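
The four settings above map directly onto WAV header fields. As a sketch, the helper below writes a sequence of integer samples as 16 kHz, mono, 16-bit Linear PCM using Python's standard `wave` module; the function name is illustrative.

```python
import array
import wave


def write_pcm16_mono_wav(target, samples, sample_rate: int = 16000) -> None:
    """Write signed 16-bit samples as a mono Linear PCM WAV file.

    `target` may be a file path (str) or a writable binary file-like object.
    """
    pcm = array.array("h", samples)  # "h" = signed 16-bit integers
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)       # Channels: Mono
        wav.setsampwidth(2)       # Bit Depth: 16-bit = 2 bytes per sample
        wav.setframerate(sample_rate)  # Sample Rate: 16 kHz by default
        wav.writeframes(pcm.tobytes())
```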

Balanced (Telephony & Voice)

Use 8 kHz, 8-bit μ-law encoding to keep bandwidth usage low. It provides standard quality for voice-only applications such as phone calls.

Format: MP3 or μ-law
Sample Rate: 8 kHz
Channels: Mono
Bitrate: 64–96 kbps
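
The lower end of this bitrate range follows from the encoding itself: 8,000 samples per second at 8 bits per sample is 64 kbps. The small helper below (an illustrative name, not part of any API) works through that arithmetic and compares it with the 16 kHz, 16-bit configuration from the Best Quality section.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> float:
    """Bitrate of a fixed-width sample stream in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000


telephony = pcm_bitrate_kbps(8000, 8)     # 8 kHz, 8-bit μ-law, mono  -> 64.0
best_quality = pcm_bitrate_kbps(16000, 16)  # 16 kHz, 16-bit PCM, mono -> 256.0
```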

Web-Optimized / High Fidelity

For broadcast, captioning, or multimedia scenarios, capture audio at a higher sample rate (44.1–48 kHz). Expect higher bandwidth usage and longer processing times than with the other profiles.

Format: WebM (Opus) or FLAC
Sample Rate: 44.1–48 kHz
Channels: Mono or Stereo (downmix before upload)
Bitrate: 96–160 kbps
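
Since stereo input needs to be downmixed before upload, here is a minimal sketch of one way to do it for 16-bit stereo WAV files, averaging the left and right samples of each frame. It uses only the standard library; the function name is illustrative, and compressed formats such as WebM or FLAC would need an external tool or decoder instead.

```python
import array
import wave


def downmix_to_mono(src, dst) -> None:
    """Average the two channels of a 16-bit stereo WAV into a mono WAV.

    `src` and `dst` may be file paths (str) or binary file-like objects.
    """
    with wave.open(src, "rb") as wav:
        if wav.getnchannels() != 2 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo PCM")
        rate = wav.getframerate()
        # Samples are interleaved: left, right, left, right, ...
        stereo = array.array("h", wav.readframes(wav.getnframes()))

    mono = array.array(
        "h",
        ((stereo[i] + stereo[i + 1]) // 2 for i in range(0, len(stereo), 2)),
    )

    with wave.open(dst, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(rate)  # sample rate is kept unchanged
        out.writeframes(mono.tobytes())
```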
