This guide shows you how to transcribe streaming audio using Smallest AI’s Lightning STT model via the WebSocket API. The Lightning model delivers state-of-the-art low latency, with a 64 ms time to first transcript (TTFT), making it an ideal choice for speech-to-text conversion in live conversations.

Real-Time Audio Transcription

The Real-Time API lets you stream audio data and receive transcription results as the audio is processed. This is ideal for live conversations, voice assistants, and other scenarios where immediate transcription feedback matters. When minimizing latency is critical, stream audio in chunks of a few kilobytes over a live connection.

When to Use Real-Time Transcription

  • Live conversations: Transcribe phone calls, video conferences, or live events.
  • Voice assistants: Build interactive voice applications that respond immediately.
  • Streaming workflows: Process audio as it is being captured or generated.
  • Low-latency requirements: When you need transcription results with minimal delay.

Endpoint

WSS wss://waves-api.smallest.ai/api/v1/lightning/get_text

Authentication

Head over to the Smallest console to generate an API key if you haven’t already, and see the Authentication guide for more information about API keys and their usage. Include your API key in the Authorization header when establishing the WebSocket connection:
Authorization: Bearer SMALLEST_API_KEY

Example Connection

// The `headers` option requires a WebSocket client that supports custom
// headers, such as the Node.js `ws` package (browser WebSocket cannot set them).
import WebSocket from "ws";

const API_KEY = "SMALLEST_API_KEY";

const url = new URL("wss://waves-api.smallest.ai/api/v1/lightning/get_text");
url.searchParams.append("language", "en");
url.searchParams.append("encoding", "linear16");
url.searchParams.append("sample_rate", "16000");
url.searchParams.append("word_timestamps", "true");

const ws = new WebSocket(url.toString(), {
  headers: {
    Authorization: `Bearer ${API_KEY}`,
  },
});

ws.onopen = () => {
  console.log("Connected to STT WebSocket");
  // Start streaming audio chunks
};

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  console.log("Transcript:", data.transcript);
  console.log("Full transcript:", data.full_transcript);
  console.log("Is final:", data.is_final);
};

Example Response

The server responds with JSON messages containing transcription results:
{
  "session_id": "sess_12345abcde",
  "transcript": "Hello, how are you?",
  "is_final": true,
  "is_last": false,
  "language": "en"
}
For detailed information about response fields, see the response format documentation.
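To reconstruct the complete transcript on the client, one option is to accumulate segments whose `is_final` flag is set, since finalized text will not change. A minimal sketch, assuming the response shape shown above (the `handleMessage` helper is illustrative, not part of the API):

```javascript
// Accumulated finalized transcript segments.
const finalSegments = [];

// Parse one server message and return the running transcript so far.
function handleMessage(raw) {
  const data = JSON.parse(raw);
  if (data.is_final) {
    // Finalized segments are stable; append them to the running transcript.
    finalSegments.push(data.transcript);
  }
  return finalSegments.join(" ");
}
```

You would call `handleMessage(event.data)` from the `ws.onmessage` handler; interim (non-final) segments can still be displayed separately for live feedback.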

Streaming Audio

Send raw audio bytes as binary WebSocket messages. The recommended chunk size is 4096 bytes:
// 4096 bytes of raw linear16 PCM (a zero-filled chunk shown for illustration)
const audioChunk = new Uint8Array(4096);
ws.send(audioChunk);
When you’re done streaming, send an end signal as a JSON text message:
{
  "type": "end"
}

Next Steps