Sentence-level timestamps

Pre-Recorded Real-Time Sentence-level timestamps (utterances) are supported in both Pre-Recorded and Real-Time transcription APIs. The utterances array aggregates contiguous words into sentence-level segments, providing structured timing information for longer audio chunks.

Enabling sentence-level timestamps

Pre-Recorded API

For the Pre-Recorded API, set word_timestamps=true in your query parameters. When word timestamps are enabled, the response includes both words and utterances arrays.

Sentence-level timestamps (utterances) are supported in both Pre-Recorded and Real-Time APIs. Use sentence_timestamps=true for Real-Time API.

curl --request POST \
  --url "https://api.smallest.ai/waves/v1/pulse/get_text?language=en&word_timestamps=true&diarize=true" \
  --header "Authorization: Bearer $SMALLEST_API_KEY" \
  --header "Content-Type: audio/wav" \
  --data-binary "@/path/to/audio.wav"

Real-Time API (WebSocket)

For the Real-Time WebSocket API, set sentence_timestamps=true as a query parameter when establishing the WebSocket connection.

const url = new URL("wss://api.smallest.ai/waves/v1/pulse/get_text");
url.searchParams.append("language", "en");
url.searchParams.append("sentence_timestamps", "true");

const ws = new WebSocket(url.toString(), {
  headers: {
    Authorization: `Bearer ${API_KEY}`,
  },
});

Output format

Each utterances entry contains text, start, end, and optional speaker fields (when diarization is enabled). Use these sentence-level timestamps when you need to display readable captions, synchronize larger chunks of audio, or store structured call summaries.

Sample response

Pre-Recorded API

{
  "status": "success",
  "transcription": "Hello world. How are you?",
  "words": {...}
  "utterances": [
    { "text": "Hello world.", "start": 0.0, "end": 0.9, "speaker": "speaker_0" },
    { "text": "How are you?", "start": 1.0, "end": 2.1, "speaker": "speaker_1" }
  ]
}

This response has the speaker field due to diarize being enabled in the query.

Real-Time API (WebSocket)

{
  "session_id": "sess_12345abcde",
  "transcript": "Hello world. How are you?",
  "is_final": true,
  "is_last": false,
  "language": "en",
  "utterances": [
    { "text": "Hello world.", "start": 0.0, "end": 0.9 },
    { "text": "How are you?", "start": 1.0, "end": 2.1 }
  ]
}

When diarize=true is enabled, the utterances array also includes a speaker field (integer ID) for real-time API responses. For example: { "text": "Hello world.", "start": 0.0, "end": 0.9, "speaker": 0 }

Getting Started

Text to Speech (Lightning)

Speech to Text (Pulse)

Cookbooks

Voice Cloning

Integrations

Best Practices

Sentence-level timestamps

Enabling sentence-level timestamps

Pre-Recorded API

Real-Time API (WebSocket)

Output format

Sample response

Pre-Recorded API

Real-Time API (WebSocket)

Getting Started

Text to Speech (Lightning)

Speech to Text (Pulse)

Cookbooks

Voice Cloning

Integrations

Best Practices

​Enabling sentence-level timestamps

​Pre-Recorded API

​Real-Time API (WebSocket)

​Output format

​Sample response

​Pre-Recorded API

​Real-Time API (WebSocket)

Enabling sentence-level timestamps

Pre-Recorded API

Real-Time API (WebSocket)

Output format

Sample response

Pre-Recorded API

Real-Time API (WebSocket)