- Accuracy – how close transcripts are to the ground truth.
- Latency & throughput – how quickly and efficiently results arrive.
- Enrichment quality – how reliable diarization, timestamps, and metadata are.
Accuracy metrics
Word Error Rate (WER)
- Formula:
WER = (Substitutions + Deletions + Insertions) / Total Words. - Interprets overall transcript fidelity; normalize casing/punctuation before computing.
Character Error Rate (CER)
- Parallel to WER but at the character level; useful for languages with compact scripts or heavy compounding.
Sentence accuracy
- Percentage of sentences that match ground truth exactly.
- Tracks readability for QA/customer-support recaps.
Latency & throughput
Time to First Result (TTFR)
- Measures the delay between request start and first interim token.
- For real-time agents, keep TTFR below ~30 ms to maintain natural turn-taking.
End-to-end latency
- Wall-clock time from submission to final transcript.
- Report p50/p90/p95 to capture outliers introduced by long files or retries.
Real-Time Factor (RTF)
RTF = Processing Time / Audio Duration.- Values less than 1 indicate faster-than-real-time processing; Lightning STT typically runs near 0.4 RTF on clean inputs.
Enrichment quality
| Metric | What to watch | Why it matters |
|---|---|---|
| Diarization accuracy | % of words with correct speaker_id | Call-center QA, coaching, compliance |
| Word timestamp drift | Gap between predicted and reference timestamps | Subtitle alignment and editing |
| Sentence-level timestamps | % of audio covered by utterances segments | Chaptering, meeting notes |
| Emotion/age/gender precision | Confidence distribution | Routing, analytics, compliance flags |
Coverage & robustness
- Language detection accuracy: share of files that land on the intended ISO 639-1 code or auto-detected language.
- Noise robustness: WER delta at multiple SNR levels (clean vs. +5 dB noise, etc.).
- Accent/domain diversity: track WER per accent or scenario (support, media, meetings) to avoid blind spots.
Operational metrics
- Requests per second / concurrent sessions: validate you stay within quota and plan scaling needs.
- Cost per minute: Lightning STT bills per second at $0.025/minute list price—include enrichment toggles when modeling cost.
- Retry volume: differentiate infrastructure retries (HTTP 5xx) from transcription failures to spot upstream vs downstream issues.
Reporting checklist
- Describe dataset composition (language, accent, domain, duration).
- Publish WER/CER, TTFR, and RTF with averages and percentiles.
- Include enrichment coverage (how many segments include diarization/timestamps).
- Summarize cost/latency impact when enabling optional features.
- Link to reproducible scripts or notebooks for auditing.

