Fastest real-time speech-to-text transcription using the Lightning STT API.
The Waves speech-to-text (STT) stack processes audio via the https://waves-api.smallest.ai/api/v1/lightning/get_text endpoint and returns low-latency transcripts, with configurable languages, formats, and pricing tiers suited to enterprise deployments.

Feature highlights

Our models specialize in processing audio to preserve information that is often lost during conventional speech-to-text conversion.
  • 30+ languages – automatic language detection or ISO 639-1 codes (en, hi, etc.).
  • Diarization – split the transcript into speaker turns, with each turn attributed to a speaker.
  • Timestamps – receive sentence-level and word-level timing information.
  • Age prediction – estimate the age group of each speaker.
  • Gender prediction – detect the gender of speakers.
  • Emotion detection – report the emotional tone as strength values across 5 core emotion types.
  • Low latency – streaming pipeline tuned for ~64 ms time to first transcript.

Supported languages

Language        Code
Italian         it
Spanish         es
English         en
Portuguese      pt
Hindi           hi
German          de
French          fr
Ukrainian       uk
Russian         ru
Kannada         kn
Malayalam       ml
Polish          pl
Marathi         mr
Gujarati        gu
Czech           cs
Slovak          sk
Telugu          te
Oriya (Odia)    or
Dutch           nl
Bengali         bn
Latvian         lv
Estonian        et
Romanian        ro
Punjabi         pa
Finnish         fi
Swedish         sv
Bulgarian       bg
Tamil           ta
Hungarian       hu
Danish          da
Lithuanian      lt
Maltese         mt
Use language=multi to auto-detect across the full list or specify one of the codes above to pin the model to a single language.
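
As a rough illustration of the language setting, the sketch below validates a code against the list above and prepares a request for the endpoint. Only the endpoint URL, the language parameter, and the language codes come from this page. The helper names, passing language as a query parameter, the POST method, the Bearer authorization header, and the raw WAV body are all assumptions made for illustration, so check the API reference before relying on them.

```python
from urllib.parse import urlencode
import urllib.request

# Endpoint as given on this page.
ENDPOINT = "https://waves-api.smallest.ai/api/v1/lightning/get_text"

# Language codes from the supported-languages table above.
SUPPORTED_LANGUAGES = {
    "it", "es", "en", "pt", "hi", "de", "fr", "uk", "ru", "kn", "ml",
    "pl", "mr", "gu", "cs", "sk", "te", "or", "nl", "bn", "lv", "et",
    "ro", "pa", "fi", "sv", "bg", "ta", "hu", "da", "lt", "mt",
}


def build_transcription_url(language: str = "multi") -> str:
    """Return the endpoint URL with a validated language value.

    Hypothetical helper: sending `language` as a query parameter is an
    assumption, not something this page confirms.
    """
    code = language.strip().lower()
    if code != "multi" and code not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {language!r}")
    return f"{ENDPOINT}?{urlencode({'language': code})}"


def build_request(audio: bytes, api_key: str,
                  language: str = "multi") -> urllib.request.Request:
    """Prepare (but do not send) a transcription request.

    Assumptions: POST method, Bearer token auth, raw WAV bytes as the body.
    """
    return urllib.request.Request(
        build_transcription_url(language),
        data=audio,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "audio/wav",
        },
    )
```

Calling build_transcription_url("hi") pins the request to Hindi, while the default of "multi" leaves language detection to the service. Validating the code locally surfaces typos before any audio is uploaded.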

Next steps