32+ Languages
Automatic detection
Low Latency
Real-time streaming
Diarization
Speaker identification
Emotion
Tone analysis
Model Overview
| Attribute | Details |
| --- | --- |
| Developed by | Smallest AI |
| Model type | Speech-to-Text / Automatic Speech Recognition |
| Languages | 32+ (auto-detection or ISO 639-1 codes) |
| License | Proprietary |
| Input formats | WAV, MP3, FLAC, OGG, WebM |
| Modes | Pre-recorded (HTTP) and Real-time (WebSocket) |
Key Capabilities
Pre-Recorded
Transcribe audio files via HTTP POST, sending either the raw audio bytes or a URL that points to the file.
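
As a rough illustration of the two input styles, the sketch below builds (but does not send) a POST request using only the Python standard library. The endpoint URL, the bearer-token header, the `language` query parameter, and the `{"url": ...}` JSON body are all assumptions made for illustration; the actual names and shapes should be taken from the Pulse API Reference.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical placeholder endpoint; substitute the URL from the Pulse API Reference.
ENDPOINT = "https://api.example.com/pulse/transcribe"

# Standard MIME types for the input formats listed in the model overview.
CONTENT_TYPES = {
    "wav": "audio/wav",
    "mp3": "audio/mpeg",
    "flac": "audio/flac",
    "ogg": "audio/ogg",
    "webm": "audio/webm",
}


def build_transcription_request(api_key, *, audio=None, audio_format="wav",
                                audio_url=None, language=None):
    """Build (but do not send) a POST request carrying raw bytes or a URL."""
    if (audio is None) == (audio_url is None):
        raise ValueError("provide exactly one of audio or audio_url")

    # Assumed: language is an ISO 639-1 code passed as a query parameter,
    # and leaving it out requests automatic detection.
    query = "?" + urllib.parse.urlencode({"language": language}) if language else ""

    if audio is not None:
        # Raw-bytes mode: the audio itself is the request body.
        body = audio
        content_type = CONTENT_TYPES[audio_format]
    else:
        # URL mode (assumed shape): a small JSON document naming the file.
        body = json.dumps({"url": audio_url}).encode("utf-8")
        content_type = "application/json"

    headers = {
        "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        "Content-Type": content_type,
    }
    return urllib.request.Request(ENDPOINT + query, data=body,
                                  headers=headers, method="POST")
```

Passing the returned object to `urllib.request.urlopen` would send it. Keeping request construction separate from sending makes it easy to inspect the request before any network call is made.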
Real-Time Streaming
WebSocket-based live transcription with sub-second latency.
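
For live transcription, a client typically splits audio into small frames and sends them at roughly real-time pace. The sketch below shows only that chunking and pacing step. It assumes 16 kHz, 16-bit mono PCM and 100 ms frames, none of which are confirmed by this page, and it takes a generic `send` coroutine (hypothetical, for example the `send` method of whatever WebSocket client is in use) rather than committing to a particular library or to the Pulse message format.

```python
import asyncio
from typing import Awaitable, Callable, Iterator


def iter_pcm_chunks(pcm: bytes, sample_rate: int = 16000, chunk_ms: int = 100,
                    bytes_per_sample: int = 2) -> Iterator[bytes]:
    """Yield consecutive fixed-duration slices of mono PCM audio."""
    chunk_size = sample_rate * bytes_per_sample * chunk_ms // 1000
    for offset in range(0, len(pcm), chunk_size):
        yield pcm[offset:offset + chunk_size]


async def stream_pcm(send: Callable[[bytes], Awaitable[None]], pcm: bytes,
                     sample_rate: int = 16000, chunk_ms: int = 100,
                     pace: bool = True) -> None:
    """Send audio chunk by chunk through the supplied `send` coroutine."""
    for chunk in iter_pcm_chunks(pcm, sample_rate, chunk_ms):
        await send(chunk)
        if pace:
            # Sleep for the chunk duration so a file is sent at playback speed,
            # the way a live microphone would deliver it.
            await asyncio.sleep(chunk_ms / 1000)
```

Smaller frames reduce the delay before the server sees new audio but increase the number of messages; the right frame size and encoding for Pulse should be taken from its API reference.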
Rich Features
Word timestamps, speaker diarization, PII redaction, emotion and age/gender detection.
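
Word timestamps and diarization are often combined on the client side to produce speaker turns. The sketch below assumes each transcribed word arrives as a dictionary with `word`, `start`, `end`, and `speaker` keys. These field names are hypothetical and are not taken from the Pulse response schema, so they would need to be adapted to the actual output.

```python
from itertools import groupby


def speaker_turns(words):
    """Merge consecutive words from the same speaker into one turn.

    `words` is a time-ordered list of dicts with the hypothetical keys
    "word", "start", "end" (seconds), and "speaker".
    """
    turns = []
    # groupby only merges adjacent items, which is what a turn is.
    for speaker, group in groupby(words, key=lambda w: w["speaker"]):
        group = list(group)
        turns.append({
            "speaker": speaker,
            "start": group[0]["start"],
            "end": group[-1]["end"],
            "text": " ".join(w["word"] for w in group),
        })
    return turns
```

Because only adjacent words are merged, a speaker who talks, is interrupted, and then talks again produces two separate turns.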
Detailed benchmarks and evaluation coming soon
A full model card with performance benchmarks, accuracy metrics, and comparison data is in progress. For current capabilities, see the Pulse API Reference and STT Benchmarks.

