Real-Time Audio Transcription
The Real-Time API lets you stream audio data and receive transcription results as the audio is processed. This is ideal for live conversations, voice assistants, and other scenarios that need immediate transcription feedback. Because minimizing latency is critical in these scenarios, stream audio in chunks of a few kilobytes over a live connection.
When to Use Real-Time Transcription
- Live conversations: Transcribe phone calls, video conferences, or live events.
- Voice assistants: Build interactive voice applications that respond immediately.
- Streaming workflows: Process audio as it is being captured or generated.
- Low-latency requirements: Get transcription results with minimal delay.
Endpoint
Authentication
If you have not already done so, generate an API key in the smallest console. See the Authentication guide for more information about API keys and their usage. Include your API key in the Authorization header when establishing the WebSocket connection.
Example Connection
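As a hedged illustration of the Authorization header requirement, the Python sketch below builds the headers to pass during the WebSocket handshake. The `Bearer` scheme, the `SMALLEST_API_KEY` environment variable, and the helper name `build_auth_headers` are assumptions made for this example and are not confirmed by this page; check the Authentication guide for the exact header format.

```python
import os


def build_auth_headers(api_key: str) -> dict[str, str]:
    """Return handshake headers carrying the API key.

    Assumption: the key is sent as a Bearer token. This page only states
    that the key goes in the Authorization header.
    """
    key = api_key.strip()
    if not key:
        raise ValueError("API key is empty; generate one in the console first")
    return {"Authorization": f"Bearer {key}"}


if __name__ == "__main__":
    # SMALLEST_API_KEY is a hypothetical variable name chosen for this sketch.
    headers = build_auth_headers(os.environ.get("SMALLEST_API_KEY", ""))
    # Print only the header names so the key itself never reaches the logs.
    print(sorted(headers))
```

Most WebSocket client libraries accept a dictionary like this through a handshake-headers option when opening the connection; the option name varies by library and version.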
Example Response
The server responds with JSON messages containing transcription results.
Streaming Audio
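A minimal Python sketch of chunked sending, assuming the 4096-byte chunk size recommended in this section. The names `iter_chunks` and `stream_audio` are hypothetical, and the `send` callable stands in for whatever binary-send method your WebSocket client provides.

```python
from typing import Callable, Iterator

CHUNK_SIZE = 4096  # recommended chunk size in bytes


def iter_chunks(audio: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield consecutive slices of `audio`, each at most `chunk_size` bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(audio), chunk_size):
        yield audio[start:start + chunk_size]


def stream_audio(audio: bytes, send: Callable[[bytes], None],
                 chunk_size: int = CHUNK_SIZE) -> int:
    """Pass each chunk to `send` in order and return the number of chunks sent."""
    count = 0
    for chunk in iter_chunks(audio, chunk_size):
        send(chunk)  # hypothetical stand-in for the client's binary send
        count += 1
    return count


if __name__ == "__main__":
    sent: list[bytes] = []
    total = stream_audio(bytes(10_000), sent.append)
    print(total, [len(c) for c in sent])  # prints: 3 [4096, 4096, 1808]
```

In a live capture loop the same slicing applies to each buffer read from the microphone or file. An asynchronous client would await its send call instead of calling it directly.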
Send raw audio bytes as binary WebSocket messages. The recommended chunk size is 4096 bytes.
Next Steps
- Learn about supported audio formats for WebSocket streaming.
- Review complete code examples for Python, Node.js, and Browser JavaScript.
- Follow best practices for optimal streaming performance.
- Troubleshoot common issues in the troubleshooting guide.

