Metrics Overview - Waves

Accuracy metrics
Word Error Rate (WER)
Character Error Rate (CER)
Sentence accuracy
Latency & throughput
Time to First Result (TTFR)
End-to-end latency
Real-Time Factor (RTF)
Enrichment quality
Coverage & robustness
Operational metrics
Reporting checklist

Lightning STT evaluations revolve around four pillars:

Accuracy – how close transcripts are to the ground truth.
Latency & throughput – how quickly and efficiently results arrive.
Enrichment quality – how reliable diarization, timestamps, and metadata are.

Accuracy metrics

Word Error Rate (WER)

Formula: WER = (Substitutions + Deletions + Insertions) / Total Words.
Interprets overall transcript fidelity; normalize casing/punctuation before computing.

Character Error Rate (CER)

Parallel to WER but at the character level; useful for languages with compact scripts or heavy compounding.

Sentence accuracy

Percentage of sentences that match ground truth exactly.
Tracks readability for QA/customer-support recaps.

Latency & throughput

Time to First Result (TTFR)

Measures the delay between request start and first interim token.
For real-time agents, keep TTFR below ~30 ms to maintain natural turn-taking.

End-to-end latency

Wall-clock time from submission to final transcript.
Report p50/p90/p95 to capture outliers introduced by long files or retries.

Real-Time Factor (RTF)

RTF = Processing Time / Audio Duration.
Values less than 1 indicate faster-than-real-time processing; Lightning STT typically runs near 0.4 RTF on clean inputs.

Enrichment quality

Metric	What to watch	Why it matters
Diarization accuracy	% of words with correct `speaker_id`	Call-center QA, coaching, compliance
Word timestamp drift	Gap between predicted and reference timestamps	Subtitle alignment and editing
Sentence-level timestamps	% of audio covered by `utterances` segments	Chaptering, meeting notes
Emotion/age/gender precision	Confidence distribution	Routing, analytics, compliance flags

Coverage & robustness

Language detection accuracy: share of files that land on the intended ISO 639-1 code or auto-detected language.
Noise robustness: WER delta at multiple SNR levels (clean vs. +5 dB noise, etc.).
Accent/domain diversity: track WER per accent or scenario (support, media, meetings) to avoid blind spots.

Operational metrics

Requests per second / concurrent sessions: validate you stay within quota and plan scaling needs.
Cost per minute: Lightning STT bills per second at $0.025/minute list price—include enrichment toggles when modeling cost.
Retry volume: differentiate infrastructure retries (HTTP 5xx) from transcription failures to spot upstream vs downstream issues.

Reporting checklist

Describe dataset composition (language, accent, domain, duration).
Publish WER/CER, TTFR, and RTF with averages and percentiles.
Include enrichment coverage (how many segments include diarization/timestamps).
Summarize cost/latency impact when enabling optional features.
Link to reproducible scripts or notebooks for auditing.

Performance Evaluation Walkthrough

⌘I