This page provides performance benchmarks for Lightning STT, including latency, accuracy, and throughput metrics.
Latency Metrics
End-to-End Latency
- Average Latency: ~64ms
- P50 Latency: 60-65ms (median - 50% of requests complete within this time)
- P95 Latency: 80-100ms (95% of requests complete within this time)
- P99 Latency: 100-150ms (99% of requests complete within this time)
Measured on 16kHz mono PCM audio in English.
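To compare your own measurements against these figures, the sketch below computes the same summary statistics from a list of latency samples. It assumes the nearest-rank percentile definition, which is one common convention; this page does not state which method was used for the published numbers. The function names are illustrative only.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Rank is 1-based; clamp to 1 so that very small pct values still work.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms):
    """Return the average, P50, P95 and P99 of latency samples (milliseconds)."""
    return {
        "avg": sum(latencies_ms) / len(latencies_ms),
        "p50": percentile(latencies_ms, 50),
        "p95": percentile(latencies_ms, 95),
        "p99": percentile(latencies_ms, 99),
    }
```

Tail percentiles need many samples to be stable: with fewer than 100 requests, P99 is simply the slowest request observed.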
Time-to-First-Transcript
- HTTP POST: ~64ms for complete transcription
Accuracy Metrics
Word Error Rate (WER)
All models were evaluated on the FLEURS dataset, a standardised multilingual speech benchmark ensuring fair cross-model comparison.
| Language | WER |
|---|---|
| English | 5.1% |
| Italian | 4.2% |
| Spanish | 5.4% |
| Hindi | 11.4% |
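WER is the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of words in the reference. The sketch below computes it with a word-level edit distance. It splits on whitespace and applies no text normalisation (casing, punctuation, numerals); the normalisation used for the figures in the table is not stated here, so results from this sketch may not be directly comparable.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

For example, one wrong word in a six-word reference gives a WER of 1/6, or about 16.7%.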
Throughput
Requests Per Second
| Audio Length | HTTP POST |
|---|---|
| Short (< 5s) | 50-100 |
| Medium (5-30s) | 20-50 |
| Long (30s+) | 10-20 |
Throughput varies with audio length, format, and server load.
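As a rough planning aid, the table can be turned into a wall-clock estimate for a batch of requests. The sketch below is an assumption-laden illustration: it treats the published ranges as sustained rates, places audio of exactly 30s in the long bucket, and uses hypothetical function names.

```python
# (upper bound in seconds, (min requests/s, max requests/s)), taken from the table.
THROUGHPUT_RPS = [
    (5.0, (50, 100)),           # Short (< 5s)
    (30.0, (20, 50)),           # Medium (5-30s)
    (float("inf"), (10, 20)),   # Long (30s+)
]


def rps_range(audio_seconds):
    """Return the (min, max) requests-per-second range for a clip length."""
    for upper_bound, rps in THROUGHPUT_RPS:
        if audio_seconds < upper_bound:
            return rps
    return THROUGHPUT_RPS[-1][1]


def estimate_batch_seconds(num_requests, audio_seconds):
    """Return (best case, worst case) seconds to process num_requests clips."""
    low_rps, high_rps = rps_range(audio_seconds)
    return num_requests / high_rps, num_requests / low_rps
```

Under these assumptions, 1,000 clips of 3 seconds each would take between 10 and 20 seconds.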
Audio Format Performance
Linear16 (PCM)
- Latency: Lowest (~64ms)
- Accuracy: Highest
- Bandwidth: Highest
- Best for: High-quality applications
Opus
- Latency: Low (~70-80ms)
- Accuracy: High
- Bandwidth: Low
- Best for: Browser/mobile applications
FLAC
- Latency: Medium (~80-90ms)
- Accuracy: Highest
- Bandwidth: Medium
- Best for: Archival/quality-critical use cases
μ-law
- Latency: Low (~65-75ms)
- Accuracy: Good
- Bandwidth: Lowest
- Best for: Telephony applications
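The bandwidth of the uncompressed formats follows directly from sample rate, sample size, and channel count. The sketch below shows that arithmetic. The 8kHz, 8-bit figure for μ-law is the usual telephony configuration and is an assumption here, not something this page specifies. Opus and FLAC are compressed, so their bitrate depends on encoder settings and audio content and is not modelled.

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels=1):
    """Bitrate of uncompressed audio in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000


# Linear16 at the 16kHz mono setting used for the latency measurements.
linear16_kbps = pcm_bitrate_kbps(16000, 16)  # 256.0

# μ-law at 8kHz with 8 bits per sample (assumed telephony configuration).
mulaw_kbps = pcm_bitrate_kbps(8000, 8)  # 64.0
```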
Language Performance
- Italian: 4.2% WER, ~64ms latency
- English: 5.1% WER, ~64ms latency
- Spanish: 5.4% WER, ~64ms latency
- Portuguese: 7.1% WER, ~64ms latency
- German: 8.5% WER, ~64ms latency
- French: 9.2% WER, ~64ms latency
Regional Variations
- Indian Languages: 10-15% WER, ~90-100ms latency
- Eastern European: 9-12% WER, ~85-95ms latency
Feature Impact
Diarization
- Latency Impact: +10-20ms
- Accuracy Impact: Minimal
- Use When: Multiple speakers present
Word Timestamps
- Latency Impact: +5-10ms
- Accuracy Impact: None
- Use When: Timing information needed
Emotion Detection
- Latency Impact: +15-25ms
- Accuracy Impact: None
- Use When: Emotion analysis required
Age/Gender Detection
- Latency Impact: +10-15ms
- Accuracy Impact: None
- Use When: Demographic analysis needed
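The per-feature overheads can be combined into a rough latency budget. The sketch below assumes that the overheads simply add to the ~64ms baseline, which this page does not confirm; features may overlap in practice. The dictionary keys are illustrative labels, not API parameter names.

```python
BASE_LATENCY_MS = 64  # approximate average latency quoted for linear16 audio

# (min, max) added latency in milliseconds, taken from the figures above.
FEATURE_OVERHEAD_MS = {
    "diarization": (10, 20),
    "word_timestamps": (5, 10),
    "emotion_detection": (15, 25),
    "age_gender_detection": (10, 15),
}


def estimate_latency_ms(features):
    """Return a (low, high) latency estimate for the enabled features,
    assuming the overheads are additive."""
    low = high = BASE_LATENCY_MS
    for name in features:
        feature_low, feature_high = FEATURE_OVERHEAD_MS[name]
        low += feature_low
        high += feature_high
    return low, high
```

With diarization and word timestamps enabled, this gives an estimate of 79 to 94ms; with all four features, 104 to 134ms.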
Optimization Tips
- Use 16kHz sample rate for optimal balance
- Choose linear16 format for lowest latency
- Enable only needed features to reduce latency
- Batch process when latency isn’t critical
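Before uploading, it can help to confirm that a file already matches the recommended input settings. The sketch below uses Python's standard `wave` module to check a WAV file for 16kHz, mono, 16-bit samples. The function name and messages are illustrative, and it only inspects the file; it does not resample or convert.

```python
import wave


def check_wav(source):
    """Return a list of mismatches against 16kHz mono 16-bit PCM.

    `source` is a file path or a binary file-like object.
    An empty list means the file matches.
    """
    problems = []
    with wave.open(source, "rb") as wav:
        if wav.getframerate() != 16000:
            problems.append(f"sample rate is {wav.getframerate()} Hz, expected 16000")
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels, expected 1 (mono)")
        if wav.getsampwidth() != 2:
            problems.append(f"{wav.getsampwidth() * 8}-bit samples, expected 16-bit")
    return problems
```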