Skip to main content Latency Metrics
Time-to-First-Transcript (TTFT)
Our Pulse STT model provides State of the art TTFT latency of ~64ms , which is one of the least in the world.
TTFT (Time to First Transcript) measures the latency between when a user stops speaking and when the model returns the complete transcript. Lower TTFT means faster response times and better user experience in real-time applications. Model Latency (ms) Smallest Pulse STT 64 Deepgram Nova 2 76 Deepgram Nova 3 71 Assembly AI Universal 698
Accuracy Metrics
Word Error Rate (WER)
All models were evaluated on the FLEURS dataset, a standardised multilingual speech benchmark ensuring fair cross-model comparison.
Language WER English 5.1% Italian 4.2% Spanish 5.4% Hindi 11.4%
Throughput
Requests Per Second
Audio Length HTTP POST Short (< 5s) 50-100 Medium (5-30s) 20-50 Long (30s+) 10-20
Throughput varies based on audio length, format, and server load
Linear16 (PCM)
Latency : Lowest (~64ms)
Accuracy : Highest
Bandwidth : Highest
Best for : High-quality applications
Opus
Latency : Low (~70-80ms)
Accuracy : High
Bandwidth : Low
Best for : Browser/mobile applications
FLAC
Latency : Medium (~80-90ms)
Accuracy : Highest
Bandwidth : Medium
Best for : Archival/quality-critical use cases
μ-law
Latency : Low (~65-75ms)
Accuracy : Good
Bandwidth : Lowest
Best for : Telephony applications
Italian : 4.2% WER, ~64ms latency
English : 5.1% WER, ~64ms latency
Spanish : 5.4% WER, ~64ms latency
Portuguese : 7.1% WER, ~64ms latency
German : 8.5% WER, ~64ms latency
French : 9.2% WER, ~64ms latency
Regional Variations
Indian Languages : 10-15% WER, ~90-100ms latency
Eastern European : 9-12% WER, ~85-95ms latency
Diarization
Latency Impact : +10-20ms
Accuracy Impact : Minimal
Use When : Multiple speakers present
Word Timestamps
Latency Impact : +5-10ms
Accuracy Impact : None
Use When : Timing information needed
Emotion Detection
Latency Impact : +15-25ms
Accuracy Impact : None
Use When : Emotion analysis required
Optimization Tips
Use 16kHz sample rate for optimal balance
Choose linear16 format for lowest latency
Enable only needed features to reduce latency
Batch process when latency isn’t critical
Next Steps