Performance

Latency Metrics

Time-to-First-Transcript (TTFT)

Our Pulse STT model delivers a state-of-the-art TTFT of ~64 ms, one of the lowest in the world.

TTFT Comparison Analysis

TTFT (Time to First Transcript) measures the latency between the moment a user stops speaking and the moment the model returns the complete transcript. A lower TTFT means faster responses and a better user experience in real-time applications.

Model	TTFT (ms)
Smallest Pulse STT	64
Deepgram Nova 2	76
Deepgram Nova 3	71
AssemblyAI Universal	698
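
To get a comparable number for your own setup, one option is to time repeated calls and report the median. The sketch below assumes a hypothetical `transcribe` callable that wraps whichever client you use; it is not part of any SDK. It measures the full round trip from your machine, so results may differ from how the figures above were measured.

```python
import statistics
import time
from typing import Callable, Dict, List


def measure_latency_ms(
    transcribe: Callable[[bytes], str], audio: bytes, runs: int = 20
) -> Dict[str, float]:
    """Time repeated calls to `transcribe` and summarise them in milliseconds."""
    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        transcribe(audio)  # hypothetical client call; replace with your own
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    # Approximate 95th percentile by index into the sorted samples.
    p95_index = min(len(samples) - 1, int(0.95 * len(samples)))
    return {
        "median_ms": statistics.median(samples),
        "p95_ms": samples[p95_index],
        "max_ms": samples[-1],
    }


def fake_transcribe(audio: bytes) -> str:
    # Stand-in for a real request so the sketch runs offline.
    time.sleep(0.005)
    return "hello world"


stats = measure_latency_ms(fake_transcribe, b"\x00" * 3200, runs=10)
print(stats)
```

Reporting the median alongside a high percentile keeps a single slow request from distorting the result.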

Accuracy Metrics

Word Error Rate (WER)

All models were evaluated on the FLEURS dataset, a standardised multilingual speech benchmark ensuring fair cross-model comparison.

Language	WER
English	5.1%
Italian	4.2%
Spanish	5.4%
Hindi	11.4%
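
WER is the number of word substitutions, deletions, and insertions needed to turn the model output into the reference transcript, divided by the number of words in the reference. A minimal sketch of that calculation follows; the lowercasing and whitespace tokenisation are assumptions for illustration, not necessarily the normalisation used for the figures above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of five reference words.
print(word_error_rate("the quick brown fox jumps", "the quick brown box jumps"))  # 0.2
```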

Throughput

Requests Per Second

Audio Length	Requests per Second (HTTP POST)
Short (< 5s)	50-100
Medium (5-30s)	20-50
Long (30s+)	10-20

Throughput varies based on audio length, format, and server load.
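
The ranges in the table can be turned into a rough estimate of how long a backlog of files will take. This is a back-of-the-envelope sketch: the bucket names are hypothetical, a clip of exactly 30 s is treated as long, and it assumes the listed rates are sustained for the whole run.

```python
from typing import Tuple

# Requests-per-second ranges copied from the table above (HTTP POST).
RPS_RANGES = {
    "short": (50, 100),  # < 5 s
    "medium": (20, 50),  # 5-30 s
    "long": (10, 20),    # 30 s+
}


def bucket_for(duration_s: float) -> str:
    """Map a clip duration in seconds to a row of the table."""
    if duration_s < 5:
        return "short"
    if duration_s < 30:
        return "medium"
    return "long"


def backlog_seconds(num_files: int, duration_s: float) -> Tuple[float, float]:
    """Return (best case, worst case) processing time in seconds."""
    low_rps, high_rps = RPS_RANGES[bucket_for(duration_s)]
    return num_files / high_rps, num_files / low_rps


best, worst = backlog_seconds(10_000, 12.0)
print(f"{best:.0f}-{worst:.0f} s")  # 200-500 s
```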

Performance by Audio Format

Linear16 (PCM)

Latency: Lowest (~64ms)
Accuracy: Highest
Bandwidth: Highest
Best for: High-quality applications

Opus

Latency: Low (~70-80ms)
Accuracy: High
Bandwidth: Low
Best for: Browser/mobile applications

FLAC

Latency: Medium (~80-90ms)
Accuracy: Highest
Bandwidth: Medium
Best for: Archival/quality-critical use cases

μ-law

Latency: Low (~65-75ms)
Accuracy: Good
Bandwidth: Lowest
Best for: Telephony applications
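
For the two uncompressed formats, bandwidth follows directly from sample rate and sample size. The sketch below assumes mono audio, 16 kHz for Linear16, and 8 kHz for μ-law (a common telephony rate that this page does not state). Opus and FLAC are compressed, so their bitrate depends on the encoder and the content and is not estimated here.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> float:
    """Bitrate of an uncompressed stream in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0


def clip_size_bytes(bitrate_kbps: float, seconds: float) -> int:
    """Payload size of a clip at the given bitrate, excluding container headers."""
    return int(bitrate_kbps * 1000 * seconds / 8)


linear16_kbps = pcm_bitrate_kbps(16_000, 16)  # assumed 16 kHz, 16-bit mono
mulaw_kbps = pcm_bitrate_kbps(8_000, 8)       # assumed 8 kHz, 8-bit mono

print(linear16_kbps, mulaw_kbps)            # 256.0 64.0
print(clip_size_bytes(linear16_kbps, 10))   # 320000
```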

Performance by Language

High-Performance Languages

Italian: 4.2% WER, ~64ms latency
English: 5.1% WER, ~64ms latency
Spanish: 5.4% WER, ~64ms latency
Portuguese: 7.1% WER, ~64ms latency
German: 8.5% WER, ~64ms latency
French: 9.2% WER, ~64ms latency

Regional Variations

Indian Languages: 10-15% WER, ~90-100ms latency
Eastern European: 9-12% WER, ~85-95ms latency

Feature Impact on Performance

Diarization

Latency Impact: +10-20ms
Accuracy Impact: Minimal
Use When: Multiple speakers present

Word Timestamps

Latency Impact: +5-10ms
Accuracy Impact: None
Use When: Timing information needed

Emotion Detection

Latency Impact: +15-25ms
Accuracy Impact: None
Use When: Emotion analysis required
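
The figures above can be combined into a rough latency budget. This sketch assumes the overheads simply add to the ~64 ms baseline, which this page does not confirm, and the feature keys are hypothetical labels rather than API parameter names.

```python
from typing import Iterable, Tuple

BASE_TTFT_MS = 64  # baseline TTFT quoted on this page

# (low, high) added latency in milliseconds, copied from the sections above.
FEATURE_OVERHEAD_MS = {
    "diarization": (10, 20),
    "word_timestamps": (5, 10),
    "emotion_detection": (15, 25),
}


def estimated_latency_ms(features: Iterable[str]) -> Tuple[int, int]:
    """Return a (low, high) latency estimate, assuming overheads are additive."""
    low = high = BASE_TTFT_MS
    for name in features:
        feature_low, feature_high = FEATURE_OVERHEAD_MS[name]
        low += feature_low
        high += feature_high
    return low, high


print(estimated_latency_ms(["diarization", "word_timestamps"]))  # (79, 94)
```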

Optimization Tips

Use a 16 kHz sample rate for an optimal balance.
Choose the linear16 format for the lowest latency.
Enable only the features you need to reduce latency.
Use batch processing when latency isn’t critical.
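
Before uploading, it can help to confirm that a WAV file already matches the first two tips. The sketch below uses Python's standard `wave` module; the `check_wav` and `make_silent_wav` helpers are hypothetical names written for this example, and 16-bit samples are assumed to be what linear16 means.

```python
import io
import wave
from typing import List


def check_wav(source) -> List[str]:
    """Return warnings for a WAV file given as a path or file-like object."""
    warnings: List[str] = []
    with wave.open(source, "rb") as wav:
        if wav.getframerate() != 16_000:
            warnings.append(f"sample rate is {wav.getframerate()} Hz, expected 16000 Hz")
        if wav.getsampwidth() != 2:
            warnings.append(f"sample width is {wav.getsampwidth() * 8} bits, expected 16 bits")
    return warnings


def make_silent_wav(sample_rate: int, sample_width: int = 2) -> io.BytesIO:
    """Build one second of mono silence in memory so the sketch runs offline."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00" * sample_rate * sample_width)
    buffer.seek(0)
    return buffer


print(check_wav(make_silent_wav(16_000)))  # []
print(check_wav(make_silent_wav(44_100)))
```

An empty list means the file needs no conversion; otherwise, resample or re-encode it with your audio tool of choice before sending.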

Next Steps

Inverse Text Normalization (ITN)
Metrics Overview
