Pre-recorded best practices

Follow these recommendations to keep Lightning STT latencies low while preserving transcript fidelity.

Audio preprocessing workflow

Convert with FFmpeg

```shell
# Convert to 16 kHz mono, 16-bit WAV (recommended ingest format)
ffmpeg -i input.mp3 -ar 16000 -ac 1 -sample_fmt s16 output.wav

# Convert to MP3 with speech-friendly settings (16 kHz mono, 128 kb/s)
ffmpeg -i input.wav -ar 16000 -ac 1 -b:a 128k output.mp3
```

Python example

```python
from pydub import AudioSegment  # pydub shells out to FFmpeg for decoding

# Decode, resample to 16 kHz, and downmix to mono before export.
audio = AudioSegment.from_file("input.mp3")
audio = audio.set_frame_rate(16000).set_channels(1)
audio.export("output.wav", format="wav")
```
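After exporting, it is worth confirming that a file really is 16 kHz mono 16-bit PCM before submitting a large batch. A minimal sketch using Python's standard-library `wave` module (`check_format` is an illustrative name, not part of any API):

```python
import wave

def check_format(path: str) -> None:
    """Raise AssertionError if the WAV file is not 16 kHz, mono, 16-bit PCM."""
    with wave.open(path, "rb") as wf:
        assert wf.getframerate() == 16000, f"expected 16 kHz, got {wf.getframerate()} Hz"
        assert wf.getnchannels() == 1, f"expected mono, got {wf.getnchannels()} channels"
        assert wf.getsampwidth() == 2, f"expected 16-bit, got {8 * wf.getsampwidth()}-bit"

# check_format("output.wav")
```

Running this on every converted file before upload catches resampling mistakes early, when they are cheap to fix.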

JavaScript example

```javascript
import { createFFmpeg, fetchFile } from '@ffmpeg/ffmpeg';

const ffmpeg = createFFmpeg({ log: true });
await ffmpeg.load(); // downloads the WebAssembly FFmpeg core on first use

// Copy the source file into ffmpeg.wasm's virtual filesystem,
// convert to 16 kHz mono WAV, then read the result back out.
ffmpeg.FS('writeFile', 'input.mp3', await fetchFile('input.mp3'));
await ffmpeg.run('-i', 'input.mp3', '-ar', '16000', '-ac', '1', 'output.wav');
const data = ffmpeg.FS('readFile', 'output.wav');
```

Quality checklist

  1. Use 16 kHz mono whenever possible; downsample higher-fidelity recordings.
  2. Normalize audio levels so peaks stay consistent across large batches.
  3. Remove silence at the beginning and end to avoid wasted compute.
  4. Handle multiple speakers by enabling diarization when agents and customers share a channel.
  5. Test with a sample clip before launching full backfills to validate accuracy and metadata.
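Items 2 and 3 above can be sketched without extra dependencies for 16-bit mono WAV files. This is a minimal illustration, not a production loudness pipeline; `SILENCE_THRESHOLD`, `TARGET_PEAK`, and the `process` name are assumptions chosen for the example:

```python
import array
import wave

SILENCE_THRESHOLD = 500   # abs sample value below which audio counts as silence (assumed)
TARGET_PEAK = 30000       # peak level after normalization, below the 16-bit max of 32767

def process(in_path: str, out_path: str) -> None:
    """Trim leading/trailing silence and peak-normalize a 16-bit mono WAV."""
    with wave.open(in_path, "rb") as wf:
        params = wf.getparams()
        samples = array.array("h", wf.readframes(wf.getnframes()))

    # Checklist item 3: drop silence before the first and after the last loud sample.
    loud = [i for i, s in enumerate(samples) if abs(s) >= SILENCE_THRESHOLD]
    if loud:
        samples = samples[loud[0]:loud[-1] + 1]

    # Checklist item 2: scale so the peak lands at TARGET_PEAK, keeping levels
    # consistent across a batch.
    peak = max((abs(s) for s in samples), default=0)
    if peak:
        scale = TARGET_PEAK / peak
        samples = array.array("h", (int(s * scale) for s in samples))

    with wave.open(out_path, "wb") as wf:
        wf.setparams(params._replace(nframes=len(samples)))
        wf.writeframes(samples.tobytes())

# process("output.wav", "clean.wav")
```

Peak normalization is the simplest choice here; for large heterogeneous batches, loudness-based normalization (e.g. FFmpeg's `loudnorm` filter) usually gives more consistent results.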