This page provides performance benchmarks for Lightning STT, including latency, accuracy, and throughput metrics.

Latency Metrics

End-to-End Latency

  • Average Latency: ~64ms
  • P50 Latency: 60-65ms (median: 50% of requests complete within this time)
  • P95 Latency: 80-100ms (95% of requests complete within this time)
  • P99 Latency: 100-150ms (99% of requests complete within this time)
Measured on 16kHz mono PCM audio in English.
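To compute percentile figures like these from your own measurements, sort the recorded latencies and take the value at the required rank. The sketch below uses the nearest-rank method. This page does not say which percentile method produced the numbers above, so the method choice and the helper names `percentile` and `summarize` are assumptions for illustration.

```python
import math


def percentile(latencies_ms: list[float], p: float) -> float:
    """Return the p-th percentile of latencies_ms using the nearest-rank method."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(latencies_ms)
    # Nearest rank: the smallest sample with at least p% of samples at or below it.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def summarize(latencies_ms: list[float]) -> dict[str, float]:
    """Average plus the P50/P95/P99 figures reported on this page."""
    return {
        "avg": sum(latencies_ms) / len(latencies_ms),
        "p50": percentile(latencies_ms, 50),
        "p95": percentile(latencies_ms, 95),
        "p99": percentile(latencies_ms, 99),
    }


if __name__ == "__main__":
    # Made-up sample latencies in milliseconds, only to exercise the functions.
    samples = [58.0, 61.5, 62.0, 63.2, 64.1, 66.7, 71.0, 88.4, 97.3, 120.5]
    print(summarize(samples))
```

P99 only becomes meaningful with a large number of samples. With ten samples, as in the usage example, P95 and P99 both resolve to the single slowest request.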

Time-to-First-Transcript

  • HTTP POST: ~64ms for complete transcription
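Here is a minimal sketch for timing HTTP POST requests yourself. It assumes you already have a function that sends one transcription request and blocks until the complete transcript comes back. `send_request` and `fake_request` are hypothetical placeholders, not part of any Lightning STT client. Your own measurement includes your network round trip, so it may differ from the figure above.

```python
import time
from typing import Callable


def time_requests(send_request: Callable[[], object], runs: int = 20) -> list[float]:
    """Call send_request `runs` times and return each wall-clock latency in ms."""
    latencies_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        send_request()  # hypothetical: one HTTP POST that returns the full transcript
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


if __name__ == "__main__":
    def fake_request() -> None:
        # Stand-in for a real HTTP POST; sleeps for roughly the ~64ms average.
        time.sleep(0.064)

    samples = time_requests(fake_request, runs=5)
    print(f"mean latency: {sum(samples) / len(samples):.1f} ms")
```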

Accuracy Metrics

Word Error Rate (WER)

All models were evaluated on the FLEURS dataset, a standardized multilingual speech benchmark that allows a fair cross-model comparison.
| Language | WER   |
|----------|-------|
| English  | 5.1%  |
| Italian  | 4.2%  |
| Spanish  | 5.4%  |
| Hindi    | 11.4% |
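Word Error Rate counts the word-level substitutions (S), deletions (D) and insertions (I) needed to turn the recognized text into the reference transcript, divided by the number of reference words (N): WER = (S + D + I) / N. A 5.1% WER therefore means roughly 5 word errors per 100 reference words. The sketch below computes WER with a word-level edit distance. It only splits on whitespace, whereas published evaluations usually normalize casing and punctuation first, so its output is not directly comparable to the table. The function name is illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the previous reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))  # empty reference prefix: j insertions
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # empty hypothesis prefix: i deletions
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (or a match when cost is 0)
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("the" -> "a") out of six reference words: 16.7%
    print(f"{word_error_rate('the cat sat on the mat', 'the cat sat on a mat'):.1%}")
```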

Throughput

Requests Per Second

| Audio Length   | HTTP POST (requests/s) |
|----------------|------------------------|
| Short (< 5s)   | 50-100                 |
| Medium (5-30s) | 20-50                  |
| Long (30s+)    | 10-20                  |

Throughput varies based on audio length, format, and server load.
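For capacity planning, you can turn the throughput table into a rough estimate of how long a batch will take. This sketch assumes every clip in the batch falls into the same length bucket and that throughput stays within the ranges above, which depends on format and server load. The function names are hypothetical.

```python
def rps_range(duration_s: float) -> tuple[int, int]:
    """Return the (low, high) requests-per-second range from the table for one clip length."""
    if duration_s < 0:
        raise ValueError("duration must be non-negative")
    if duration_s < 5:
        return (50, 100)  # Short (< 5s)
    if duration_s < 30:
        return (20, 50)   # Medium (5-30s)
    return (10, 20)       # Long (30s+)


def batch_seconds_range(num_clips: int, duration_s: float) -> tuple[float, float]:
    """Estimate (best, worst) wall-clock seconds to process num_clips equal-length clips."""
    low_rps, high_rps = rps_range(duration_s)
    # Best case runs at the top of the range, worst case at the bottom.
    return num_clips / high_rps, num_clips / low_rps


if __name__ == "__main__":
    best, worst = batch_seconds_range(1_000, duration_s=3.0)
    print(f"1,000 short clips: roughly {best:.0f}-{worst:.0f} s")
```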

Performance by Audio Format

Linear16 (PCM)

  • Latency: Lowest (~64ms)
  • Accuracy: Highest
  • Bandwidth: Highest
  • Best for: High-quality applications

Opus

  • Latency: Low (~70-80ms)
  • Accuracy: High
  • Bandwidth: Low
  • Best for: Browser/mobile applications

FLAC

  • Latency: Medium (~80-90ms)
  • Accuracy: Highest
  • Bandwidth: Medium
  • Best for: Archival/quality-critical use cases

μ-law

  • Latency: Low (~65-75ms)
  • Accuracy: Good
  • Bandwidth: Lowest
  • Best for: Telephony applications
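For uncompressed formats, bandwidth follows directly from sample rate × bits per sample × channels. This is why linear16 sits at the top of the bandwidth scale and μ-law near the bottom. The sketch assumes 16-bit samples for linear16 and 8-bit samples for μ-law. The ITU-T G.711 telephony standard uses 8-bit μ-law at 8kHz. Opus and FLAC are compressed with variable bitrates, so they are left out. The helper name is illustrative.

```python
def raw_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> float:
    """Bitrate of uncompressed, fixed-width audio in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000


# linear16: 16-bit samples; 16kHz mono matches the benchmark setup on this page.
LINEAR16_16K_KBPS = raw_bitrate_kbps(16_000, 16)
# mu-law: 8-bit samples; 8kHz mono is the usual G.711 telephony setup.
MULAW_8K_KBPS = raw_bitrate_kbps(8_000, 8)

if __name__ == "__main__":
    print(f"linear16 @ 16 kHz: {LINEAR16_16K_KBPS:.0f} kbps")  # 256 kbps
    print(f"mu-law @ 8 kHz:    {MULAW_8K_KBPS:.0f} kbps")      # 64 kbps
```

At the same 16kHz sample rate, μ-law would need 128 kbps, half of linear16, because each sample is 8 bits instead of 16.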

Performance by Language

High-Performance Languages

  • Italian: 4.2% WER, ~64ms latency
  • English: 5.1% WER, ~64ms latency
  • Spanish: 5.4% WER, ~64ms latency
  • Portuguese: 7.1% WER, ~64ms latency
  • German: 8.5% WER, ~64ms latency
  • French: 9.2% WER, ~64ms latency

Regional Variations

  • Indian languages: 10-15% WER, ~90-100ms latency
  • Eastern European languages: 9-12% WER, ~85-95ms latency

Feature Impact on Performance

Diarization

  • Latency Impact: +10-20ms
  • Accuracy Impact: Minimal
  • Use When: Multiple speakers present

Word Timestamps

  • Latency Impact: +5-10ms
  • Accuracy Impact: None
  • Use When: Timing information needed

Emotion Detection

  • Latency Impact: +15-25ms
  • Accuracy Impact: None
  • Use When: Emotion analysis required

Age/Gender Detection

  • Latency Impact: +10-15ms
  • Accuracy Impact: None
  • Use When: Demographic analysis needed
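To budget latency before enabling features, you can add the listed overheads to the ~64ms average. This sketch assumes the overheads simply add up when several features are combined, which this page does not state. The dictionary keys are hypothetical labels, not the API's actual parameter names.

```python
BASE_LATENCY_MS = 64  # average end-to-end latency reported above

# (low, high) latency overheads in ms, copied from the feature list above.
# Keys are illustrative labels only, not real request parameters.
FEATURE_OVERHEAD_MS = {
    "diarization": (10, 20),
    "word_timestamps": (5, 10),
    "emotion_detection": (15, 25),
    "age_gender_detection": (10, 15),
}


def estimated_latency_ms(features: list[str]) -> tuple[int, int]:
    """Return a (low, high) latency estimate, assuming feature overheads are additive."""
    low = high = BASE_LATENCY_MS
    for name in features:
        extra_low, extra_high = FEATURE_OVERHEAD_MS[name]
        low += extra_low
        high += extra_high
    return low, high


if __name__ == "__main__":
    print(estimated_latency_ms(["diarization", "word_timestamps"]))
```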

Optimization Tips

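  • Use a 16kHz sample rate, the rate used for the benchmarks above, for the best balance of audio quality and bandwidth
  • Choose the linear16 format for the lowest latency
  • Enable only the features you need, since each one adds latency
  • Use batch processing when latency isn’t critical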
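If your audio starts as floating-point samples, converting it to linear16 comes down to scaling each sample to a signed 16-bit integer. The sketch below assumes little-endian byte order and input that is already 16kHz mono. Check the API reference for the byte order and sample-rate settings it actually expects. `to_linear16` is a hypothetical helper name.

```python
import struct


def to_linear16(samples: list[float]) -> bytes:
    """Encode float samples in [-1.0, 1.0] as signed 16-bit little-endian PCM."""
    # Clip out-of-range values so they cannot overflow the 16-bit range.
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    ints = [round(s * 32767) for s in clipped]
    # "<" = little-endian (assumed), "h" = signed 16-bit integer.
    return struct.pack(f"<{len(ints)}h", *ints)


if __name__ == "__main__":
    one_second = to_linear16([0.0] * 16_000)  # 1 s of silence at 16kHz mono
    print(len(one_second), "bytes")  # 32000 bytes
```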

Next Steps