Pulse

Pulse is a low-latency automatic speech recognition model built for real-time transcription and pre-recorded audio processing. It supports 32+ languages with automatic language detection, speaker diarization, word-level timestamps, and emotion detection.

32+ Languages: automatic detection

Low Latency: real-time streaming

Diarization: speaker identification

Emotion: tone analysis

Model Overview


Developed by	Smallest AI
Model type	Speech-to-Text / Automatic Speech Recognition
Languages	32+ (auto-detection or ISO 639-1 codes)
License	Proprietary
Input formats	WAV, MP3, FLAC, OGG, WebM
Modes	Pre-recorded (HTTP) and Real-time (WebSocket)
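
As a small illustration of the table above, a client might check a file's extension and language code before uploading anything. This is only a sketch: the helper name `check_input` is hypothetical, the extension set simply mirrors the input formats listed in the table, and treating `None` as "use auto-detection" is an assumption about how a client could model that choice, not documented API behavior.

```python
from pathlib import Path
from typing import Optional

# Extensions mirroring the input formats listed in the table above.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg", ".webm"}


def check_input(path: str, language: Optional[str] = None) -> None:
    """Raise ValueError if the file type or language code looks unusable.

    Hypothetical helper: language=None stands for auto-detection here,
    which is an assumption, not a documented convention.
    """
    ext = Path(path).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported audio format: {ext or '(none)'}")
    if language is not None:
        # ISO 639-1 codes are two lowercase ASCII letters, e.g. "en".
        looks_valid = (
            len(language) == 2
            and language.isascii()
            and language.isalpha()
            and language.islower()
        )
        if not looks_valid:
            raise ValueError(f"not an ISO 639-1 style code: {language!r}")
```

The check covers only the shape of the language code. Whether a given code is among the supported languages would still need to be confirmed against the API reference.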

Key Capabilities

Pre-Recorded

Transcribe audio files via HTTP POST, supplying either raw audio bytes or a URL that points to the audio.
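
The two input styles (raw bytes and URL) can be sketched as two request builders. Everything API-specific here is an assumption made for illustration: the endpoint URL is a placeholder, and the `Authorization: Bearer` header, the `language` query parameter, and the `url` JSON field are hypothetical names. The real values belong to the Pulse API Reference.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import urlencode

# Placeholder endpoint, NOT the real Pulse URL. Take it from the API reference.
PLACEHOLDER_URL = "https://api.example.com/v1/pulse/transcribe"


def build_bytes_request(
    audio: bytes,
    api_key: str,
    content_type: str = "audio/wav",
    language: Optional[str] = None,
    url: str = PLACEHOLDER_URL,
) -> urllib.request.Request:
    """Build a POST whose body is the raw audio (assumed request layout)."""
    if language is not None:
        # Hypothetical query parameter name.
        url = f"{url}?{urlencode({'language': language})}"
    headers = {
        "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        "Content-Type": content_type,
    }
    return urllib.request.Request(url, data=audio, headers=headers, method="POST")


def build_url_request(
    audio_url: str,
    api_key: str,
    url: str = PLACEHOLDER_URL,
) -> urllib.request.Request:
    """Build a POST that references hosted audio (assumed request layout)."""
    body = json.dumps({"url": audio_url}).encode("utf-8")  # hypothetical field
    headers = {
        "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

Neither function sends anything. With a real endpoint and key, the request would be sent with `urllib.request.urlopen(req)` and the response body parsed according to the documented schema.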

Real-Time Streaming

WebSocket-based live transcription with sub-second latency.
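
Live transcription generally means sending audio in small pieces rather than as one file. The sketch below shows only that chunking step, under stated assumptions: 16 kHz, 16-bit, mono PCM and 100 ms chunks are illustrative defaults, not documented Pulse requirements, and `pcm_chunks` is a hypothetical helper name. The WebSocket connection and message format are left out because this page does not describe them.

```python
from typing import Iterator


def pcm_chunks(
    audio: bytes,
    sample_rate: int = 16000,  # assumed sample rate in Hz
    sample_width: int = 2,     # bytes per sample (16-bit PCM assumed)
    channels: int = 1,         # mono assumed
    chunk_ms: int = 100,       # assumed chunk duration
) -> Iterator[bytes]:
    """Yield consecutive fixed-duration slices of raw PCM audio."""
    frame_bytes = sample_width * channels
    # Whole frames per chunk, so a sample is never split across chunks.
    chunk_bytes = (sample_rate * chunk_ms // 1000) * frame_bytes
    if chunk_bytes <= 0:
        raise ValueError("chunk_ms is too small for this sample rate")
    for start in range(0, len(audio), chunk_bytes):
        yield audio[start:start + chunk_bytes]
```

In a streaming client, each yielded chunk would typically be sent as one binary WebSocket message as soon as it is captured. Smaller chunks lower the delay before the first partial result, at the cost of more messages.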

Rich Features

Word timestamps, speaker diarization, PII redaction, emotion and age/gender detection.
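
To show how word timestamps and diarization can be combined on the client side, the sketch below merges consecutive words from the same speaker into turns. The response schema is not described on this page, so the `word`, `start`, `end`, and `speaker` keys are hypothetical, as is the `speaker_turns` helper. Map them to the actual field names from the API reference.

```python
from typing import Any, Dict, List


def speaker_turns(words: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Merge consecutive same-speaker words into turns.

    Each input item is assumed to look like
    {"word": str, "start": float, "end": float, "speaker": str},
    which is a hypothetical shape used only for this example.
    """
    turns: List[Dict[str, Any]] = []
    for w in words:
        if turns and turns[-1]["speaker"] == w["speaker"]:
            # Same speaker keeps talking: extend the current turn.
            turns[-1]["text"] += " " + w["word"]
            turns[-1]["end"] = w["end"]
        else:
            # Speaker changed (or first word): open a new turn.
            turns.append(
                {
                    "speaker": w["speaker"],
                    "start": w["start"],
                    "end": w["end"],
                    "text": w["word"],
                }
            )
    return turns
```

Turn boundaries come straight from the per-word speaker labels, so the result is only as reliable as the diarization itself.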

Detailed benchmarks and evaluation coming soon

The full model card, with performance benchmarks, accuracy metrics, and comparison data, is in progress. For current capabilities, see the Pulse API Reference and STT Benchmarks.
