Enabling Word Timestamps
Pre-Recorded API
Add `word_timestamps=true` to your Pulse STT query parameters. This works for both raw-byte uploads (`Content-Type: audio/wav`) and JSON requests with hosted audio URLs.
Sample request
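A minimal sketch of a raw-byte upload with the parameter set, using Python's `requests` library. The endpoint URL, the auth header, and the assumption that the `words` array sits at the top level of the response body are placeholders for illustration, not documented Pulse STT values; substitute your own:

```python
import requests

# Placeholder endpoint and auth scheme; substitute the real Pulse STT
# endpoint and credentials from your own account.
STT_URL = "https://api.example.com/v1/stt"
API_KEY = "YOUR_API_KEY"

with open("meeting.wav", "rb") as f:
    audio_bytes = f.read()

response = requests.post(
    STT_URL,
    params={"word_timestamps": "true"},  # enables the words array
    headers={
        "Authorization": f"Bearer {API_KEY}",  # assumed auth scheme
        "Content-Type": "audio/wav",           # raw-byte upload
    },
    data=audio_bytes,
)
response.raise_for_status()
# Assumes the words array sits at the top level of the JSON body.
print(response.json().get("words", []))

# For hosted audio, send a JSON body instead, e.g.
# requests.post(STT_URL, params={"word_timestamps": "true"},
#               json={"url": "https://example.com/audio.wav"})
# (the "url" field name here is an assumption, not from the docs).
```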
Real-Time WebSocket API
Add `word_timestamps=true` to your WebSocket connection query parameters when connecting to the Pulse STT WebSocket API.
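A sketch of opening the real-time connection with the parameter in the query string, using the third-party `websockets` package. The `wss://` URL is a placeholder, authentication is omitted, and the assumption that each message exposes a top-level `words` array is illustrative only:

```python
import asyncio
import json

import websockets  # third-party package: pip install websockets

# Placeholder URL; substitute the real Pulse STT WebSocket endpoint and add
# authentication (header or query parameter) per your credentials.
WS_URL = "wss://api.example.com/v1/stt/stream?word_timestamps=true"

async def stream_transcripts() -> None:
    async with websockets.connect(WS_URL) as ws:
        # In a real integration you would also send audio chunks with
        # ws.send(chunk) while reading results.
        async for message in ws:
            result = json.loads(message)
            # Assumes the words array sits at the top level of each message.
            for w in result.get("words", []):
                print(w["word"], w["start"], w["end"], w.get("confidence"))

asyncio.run(stream_transcripts())
```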
Output Format and Fields of Interest
Responses include a `words` array with `word`, `start`, `end`, and `confidence` fields (`confidence` is real-time only). When diarization is enabled, the array also includes `speaker` (integer ID for real-time, string label for pre-recorded) and `speaker_confidence` (0.0 to 1.0, real-time only) fields.
Pre-Recorded API Response
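For orientation only, a single entry of the pre-recorded `words` array might look like the sketch below, shown as a Python literal. The values are placeholders and the surrounding response envelope is not shown; `speaker` appears only when `diarize=true`:

```python
# Illustrative pre-recorded `words` entry; values are placeholders.
# Pre-recorded responses omit per-word confidence (see the table below)
# and use string speaker labels when diarize=true.
word_entry = {
    "word": "hello",
    "start": 0.12,
    "end": 0.48,
    "speaker": "speaker_0",
}
```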
Real-Time WebSocket API Response
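Likewise, a real-time `words` entry carries the real-time-only fields; again the values are placeholders and the surrounding message envelope is not shown:

```python
# Illustrative real-time `words` entry; values are placeholders.
# confidence, integer speaker IDs, and speaker_confidence are
# real-time-only fields (see the table below).
word_entry = {
    "word": "hello",
    "start": 0.12,
    "end": 0.48,
    "confidence": 0.97,
    "speaker": 0,
    "speaker_confidence": 0.92,
}
```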
When `diarize=true` is enabled, the `words` array also includes `speaker` and `speaker_confidence` fields.
Response Fields
| Field | Type | When Included | Description |
|---|---|---|---|
| `word` | string | `word_timestamps=true` | The transcribed word |
| `start` | number | `word_timestamps=true` | Start time in seconds |
| `end` | number | `word_timestamps=true` | End time in seconds |
| `confidence` | number | `word_timestamps=true` (real-time only) | Confidence score for the word (0.0 to 1.0) |
| `speaker` | integer (real-time) / string (pre-recorded) | `diarize=true` | Speaker label. The real-time API uses integer IDs (0, 1, …); the pre-recorded API uses string labels (`speaker_0`, `speaker_1`, …) |
| `speaker_confidence` | number | `diarize=true` (real-time only) | Confidence score for the speaker assignment (0.0 to 1.0) |
Use Cases
- Caption generation: Create synchronized captions for video or live streams
- Subtitle tracks: Generate SRT or VTT subtitle files (see the sketch after this list)
- Analytics: Align transcripts with audio playback for detailed analysis
- Search: Enable time-based search within audio content
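
As an example of the subtitle use case, the sketch below turns a `words` array (as described above) into SRT cues. The grouping size and the assumption that `words` is already parsed out of a response are illustrative choices:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def words_to_srt(words: list[dict], words_per_cue: int = 7) -> str:
    """Group word-timestamp entries into numbered SRT cues."""
    cues = []
    for i in range(0, len(words), words_per_cue):
        chunk = words[i : i + words_per_cue]
        start = srt_timestamp(chunk[0]["start"])
        end = srt_timestamp(chunk[-1]["end"])
        text = " ".join(w["word"] for w in chunk)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)

# Placeholder data shaped like the words array described above.
sample = [
    {"word": "hello", "start": 0.12, "end": 0.48},
    {"word": "world", "start": 0.50, "end": 0.85},
]
print(words_to_srt(sample))
```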

