Get Streaming Speech ⚠️
This API converts text to speech in real time over WebSockets: you send a text message and receive audio data back.
⚠️ Deprecated: This endpoint is deprecated and will be removed in a future version. Please use the Waves API instead.
Endpoint: wss://call-dev.smallest.ai/invocations_streaming?token={authorization_token}
Contact the team at info@smallest.ai for your authorization_token.
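As a minimal sketch of how the endpoint URL above could be assembled, the helper below substitutes a token into the query string. The function name `build_endpoint` is hypothetical, and percent-encoding the token is a defensive assumption rather than a documented requirement.

```python
from urllib.parse import quote

# Base URL taken from the endpoint line above.
BASE_URL = "wss://call-dev.smallest.ai/invocations_streaming"


def build_endpoint(token: str) -> str:
    """Return the WebSocket URL with the token as a query parameter.

    build_endpoint is a hypothetical helper, not part of any SDK.
    """
    if not token:
        raise ValueError("authorization token must not be empty")
    # Percent-encode the token so reserved characters cannot break the query string.
    return f"{BASE_URL}?token={quote(token, safe='')}"
```

The resulting string can be handed to whichever WebSocket client you already use.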
Protocol
The WebSocket uses a half-duplex protocol: the client sends a JSON object and receives audio bytes in return. The default output audio sample rate is 24000 Hz.
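The two directions of this exchange can be sketched without any network code: one helper serializes the outgoing JSON text frame, the other joins the binary frames that come back. Both function names are hypothetical, and treating every non-binary incoming frame as "not audio" is an assumption.

```python
import json
from typing import Iterable, Union

Frame = Union[str, bytes, bytearray]


def encode_request(fields: dict) -> str:
    """Serialize the request fields into the JSON text the client sends."""
    # ensure_ascii=False keeps Devanagari input readable in the payload.
    return json.dumps(fields, ensure_ascii=False)


def join_audio(frames: Iterable[Frame]) -> bytes:
    """Concatenate binary frames in arrival order, skipping text frames."""
    return b"".join(bytes(f) for f in frames if isinstance(f, (bytes, bytearray)))
```

In a real client, `encode_request` would feed the send call and `join_audio` would consume the frames yielded by the receive loop.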
How to use
The client can send messages with text input to the server. The messages can contain the following fields:
The text that needs to be converted to speech.
The name of the voice to be used. The voice_id should be provided as a string. The allowed values are:
- amar_indian_male
- anuja_indian_male
- saaira_indian_female
- sanu_indian_female
- muskan_indian_female
- monika_indian_female
- vijay_indian_male
- janhvi_professional_female
- govind_indian_male
- prabhu_indian_male
- sarika_north_indian_female
- rakesh_north_indian_male
- chandralekha_bengal_female
- rustom_bengal_male
- ishwara_south_indian_male
- suresh_indian_male
- anamika_whisper_female
- joy_sleepy_male
- willian_angry_male
- mehul_cheerful_male
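Because the voice must be one of the listed strings, a client can reject a bad value before opening a connection. The sketch below copies the list above verbatim; `check_voice_id` is a hypothetical helper, and how the server itself responds to an unknown voice is not described here.

```python
# Allowed values copied from the list above.
ALLOWED_VOICE_IDS = frozenset({
    "amar_indian_male", "anuja_indian_male", "saaira_indian_female",
    "sanu_indian_female", "muskan_indian_female", "monika_indian_female",
    "vijay_indian_male", "janhvi_professional_female", "govind_indian_male",
    "prabhu_indian_male", "sarika_north_indian_female", "rakesh_north_indian_male",
    "chandralekha_bengal_female", "rustom_bengal_male", "ishwara_south_indian_male",
    "suresh_indian_male", "anamika_whisper_female", "joy_sleepy_male",
    "willian_angry_male", "mehul_cheerful_male",
})


def check_voice_id(voice_id: str) -> str:
    """Return voice_id unchanged if it is allowed, otherwise raise."""
    if not isinstance(voice_id, str):
        raise TypeError("voice_id must be a string")
    if voice_id not in ALLOWED_VOICE_IDS:
        raise ValueError(f"unknown voice_id: {voice_id!r}")
    return voice_id
```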
Indicates whether a WAV header is needed in the output. Default is true.
The sample rate for the output. Default is 24000.
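If the WAV header is switched off, the received bytes need a container before most players will open them. The sketch below wraps raw samples using Python's standard `wave` module at the default 24000 Hz. Mono, 16-bit PCM is an assumption; the sample format is not stated in this document, so verify it before relying on this.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 24000,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM samples in a WAV container.

    channels=1 and sample_width=2 (16-bit) are assumptions, not documented values.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as writer:
        writer.setnchannels(channels)
        writer.setsampwidth(sample_width)
        writer.setframerate(sample_rate)
        writer.writeframes(pcm)
    return buf.getvalue()
```

With the header left at its default of true, this step should be unnecessary.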
The language of the text that needs to be converted to speech. Allowed values are:
- ‘en’ - for English
- ‘hi’ - for Hindi
Providing the correct language code helps improve the quality of the audio output. Note that both ‘hi’ and ‘en’ can work for Hinglish, but specifying the language code is beneficial.
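One simple way to pick a language code automatically is to look at the script of the input. The heuristic below is an illustration only, not part of the API: it returns ‘hi’ when the text contains any character from the Unicode Devanagari block (U+0900 to U+097F) and ‘en’ otherwise, so Hinglish written in Latin letters falls through to ‘en’.

```python
def guess_language(text: str) -> str:
    """Guess 'hi' or 'en' from the script used in the text.

    Hypothetical heuristic: any Devanagari character selects 'hi'.
    """
    has_devanagari = any("\u0900" <= ch <= "\u097f" for ch in text)
    return "hi" if has_devanagari else "en"
```

When you already know the language of the text, passing it explicitly is more reliable than guessing.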
The speed of the audio output. Adjusts the speed based on the characteristics of the voice.
Determines whether to keep the WebSocket connection open. When set to true, the server keeps the connection open with a timeout of 600 seconds. This is useful for making requests in quick succession and reducing connection overhead.
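A client that reuses a kept-open connection may want to reconnect before the 600-second timeout is reached. The sketch below tracks this on the client side. It assumes the timeout is counted from the most recent request, which this document does not specify; the class name and the 30-second safety margin are hypothetical.

```python
import time

SERVER_TIMEOUT_S = 600  # timeout stated in the field description above


class ConnectionAge:
    """Track when a kept-open connection was last used (hypothetical helper)."""

    def __init__(self, clock=time.monotonic, margin_s: float = 30.0):
        self._clock = clock          # injectable so the logic can be tested
        self._margin_s = margin_s    # assumed safety margin before the timeout
        self._last_used = None

    def mark_used(self) -> None:
        """Record that a request was just sent on the connection."""
        self._last_used = self._clock()

    def should_reconnect(self) -> bool:
        """True if the connection was never used or is close to timing out."""
        if self._last_used is None:
            return True
        idle = self._clock() - self._last_used
        return idle >= SERVER_TIMEOUT_S - self._margin_s
```

Call `mark_used()` after each request and check `should_reconnect()` before the next one.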
Indicates whether to remove extra silences in the audio output. This is useful for streaming input text and removing unnecessary silences from the final audio output.
Specifies whether you wish to receive an end-of-response token. The end-of-response token is represented as <END>. The WebSocket will send this string as a response once all audio data is sent.
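With the end-of-response token enabled, a receive loop can stop as soon as <END> arrives instead of waiting for the socket to close. The sketch below works on any iterable of frames. Whether the marker arrives as a text frame or as bytes is not stated, so both are accepted as an assumption; `read_until_end` is a hypothetical name.

```python
from typing import Iterable, Tuple, Union

END_TOKEN = "<END>"
END_BYTES = END_TOKEN.encode("ascii")


def read_until_end(frames: Iterable[Union[str, bytes, bytearray]]) -> Tuple[bytes, bool]:
    """Collect audio bytes until the end-of-response token is seen.

    Returns (audio, saw_end); saw_end is False if the frames ran out first.
    """
    audio = bytearray()
    for frame in frames:
        if isinstance(frame, str):
            if frame == END_TOKEN:
                return bytes(audio), True
            continue  # ignore any other text frame
        if bytes(frame) == END_BYTES:
            return bytes(audio), True
        audio.extend(frame)
    return bytes(audio), False
```

Frames that arrive after the marker are left unread, which keeps them available for the next request on a kept-open connection.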
Specifies whether to transliterate text from Hinglish to Hindi. Since the model does not currently support Hinglish, setting this parameter to true will convert Hinglish text to Devanagari script before processing. For example, ‘Awaaz abhi hinglish support nahi karta’ will be transliterated to ‘आवाज़ अभी हिंगलिश सपोर्ट नहीं करता’.
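Putting the fields together, a request message might be assembled as in the sketch below. Only `voice_id` is named in this document. Every other JSON key used here (`text`, `language`, `sample_rate`, `transliterate`) is a placeholder assumption and should be checked against the demo notebooks before use. Optional fields left as `None` are omitted so that the server defaults apply.

```python
import json
from typing import Optional


def build_message(text: str, voice_id: str, *,
                  language: Optional[str] = None,
                  sample_rate: Optional[int] = None,
                  transliterate: Optional[bool] = None) -> str:
    """Build the JSON request body.

    Key names other than "voice_id" are assumptions, not confirmed by the docs.
    """
    if language is not None and language not in ("en", "hi"):
        raise ValueError("language must be 'en' or 'hi'")
    message = {"text": text, "voice_id": voice_id}
    optional = {
        "language": language,
        "sample_rate": sample_rate,
        "transliterate": transliterate,
    }
    # Send only the fields the caller actually set.
    message.update({k: v for k, v in optional.items() if v is not None})
    return json.dumps(message, ensure_ascii=False)
```

For example, `build_message("Hello", "vijay_indian_male", language="en")` yields a three-key JSON object with no sample rate or transliteration entry.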
Quick developer demo - Python
- Replit - https://replit.com/@akshat34/Smallestai-Awaaz-Streaming-Demo
- Google Colab - https://colab.research.google.com/drive/1zeuC7GdRn3Xw7ZsDH-9dSQPnKh-5lh2u?usp=sharing