Pulse is a low-latency automatic speech recognition model built for real-time transcription and pre-recorded audio processing. It supports 32+ languages with automatic language detection, speaker diarization, word-level timestamps, and emotion detection.

32+ Languages: Automatic detection
Low Latency: Real-time streaming
Diarization: Speaker identification
Emotion: Tone analysis

Model Overview

Developed by: Smallest AI
Model type: Speech-to-Text / Automatic Speech Recognition
Languages: 32+ (auto-detection or ISO 639-1 codes)
License: Proprietary
Input formats: WAV, MP3, FLAC, OGG, WebM
Modes: Pre-recorded (HTTP) and Real-time (WebSocket)

Key Capabilities

Pre-Recorded

Transcribe audio files via HTTP POST — raw bytes or URL input.
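
The two input styles mentioned above (raw bytes or a URL) can be sketched roughly as follows. This is only an illustration of how such requests might be shaped: the endpoint `PULSE_URL`, the `Authorization` header, and the JSON field `url` are hypothetical placeholders, not values taken from the Pulse documentation. Check the Pulse API Reference for the real ones. The code only builds the requests and does not send them.

```python
import json
import urllib.request

# Hypothetical placeholders: the real endpoint, auth scheme, and field names
# are defined in the Pulse API Reference, not here.
PULSE_URL = "https://api.example.com/pulse/transcribe"
API_KEY = "YOUR_API_KEY"


def build_bytes_request(audio: bytes, content_type: str = "audio/wav") -> urllib.request.Request:
    """Build a POST request whose body is the raw audio bytes."""
    return urllib.request.Request(
        PULSE_URL,
        data=audio,
        method="POST",
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": content_type,
        },
    )


def build_url_request(audio_url: str) -> urllib.request.Request:
    """Build a POST request whose JSON body points at a hosted audio file."""
    body = json.dumps({"url": audio_url}).encode("utf-8")  # "url" is an assumed field name
    return urllib.request.Request(
        PULSE_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
```

Either request could then be passed to `urllib.request.urlopen` once the placeholders are replaced with documented values.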

Real-Time Streaming

WebSocket-based live transcription with sub-second latency.
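
A streaming client usually sends audio in small, fixed-duration chunks rather than one large payload. The sketch below shows only that chunking step, under assumptions that do not come from the Pulse documentation: 16-bit mono PCM input, a 16 kHz sample rate, and 100 ms chunks. The function name `pcm_chunks` is hypothetical, and the WebSocket connection itself is left out because its URL and message format are defined in the Pulse API Reference.

```python
from typing import Iterator


def pcm_chunks(
    pcm: bytes,
    sample_rate: int = 16000,   # assumed sample rate, in Hz
    chunk_ms: int = 100,        # assumed chunk duration, in milliseconds
    bytes_per_sample: int = 2,  # 16-bit mono PCM (assumption)
) -> Iterator[bytes]:
    """Yield successive chunks of PCM audio, each about chunk_ms long."""
    # Samples per chunk, then converted to bytes.
    chunk_size = sample_rate * chunk_ms // 1000 * bytes_per_sample
    if chunk_size <= 0:
        raise ValueError("chunk_ms is too small for this sample rate")
    for start in range(0, len(pcm), chunk_size):
        # The final chunk may be shorter than chunk_size.
        yield pcm[start:start + chunk_size]
```

In a real client, each yielded chunk would be sent as one binary WebSocket message, paced at roughly the chunk duration when the audio comes from a file instead of a live microphone.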

Rich Features

Word timestamps, speaker diarization, PII redaction, emotion and age/gender detection.

Detailed benchmarks and evaluation coming soon

The full model card, with performance benchmarks, accuracy metrics, and comparison data, is in progress. For current capabilities, see the Pulse API Reference and STT Benchmarks.