⚠️ Deprecated: This endpoint is deprecated and will be removed in a future version. Please use Waves API instead.

Endpoint: wss://call-dev.smallest.ai/invocations_streaming?token= {authorization_token}

Contact the team at info@smallest.ai for your authorization_token.

Protocol

The WebSocket uses a half-duplex protocol wherein the client sends a json object but receives bytes in return. The default output audio sample rate is 24000.

How to use

The client can send messages with text input to the server. The messages can contain the following fields:

{
  "text": "आवाज़ एक बहुत ही बढ़िया text to speech मॉडल है",
  "voice_id": "amar_indian_male",
  "add_wav_header": false,
  "sample_rate": 24000,
  "language": "hi",
  "speed": 1.3,
  "keep_ws_open": true,
  "remove_extra_silence": false,
  "transliterate": false,
  "get_end_of_response_token": true
}
text
string

The text that needs to be converted to speech.

voice_id
string

The name of the voice to be used. The voice_id should be provided as a string. The allowed values are:

  • amar_indian_male
  • anuja_indian_male
  • saaira_indian_female
  • sanu_indian_female
  • muskan_indian_female
  • monika_indian_female
  • vijay_indian_male
  • janhvi_professional_female
  • govind_indian_male
  • prabhu_indian_male
  • sarika_north_indian_female
  • rakesh_north_indian_male
  • chandralekha_bengal_female
  • rustom_bengal_male
  • ishwara_south_indian_male
  • suresh_indian_male
  • anamika_whisper_female
  • joy_sleepy_male
  • willian_angry_male
  • mehul_cheerful_male
add_wav_header
bool

Indicates whether a WAV header is needed in the output. Default is true.

sample_rate
int

The sample rate for the output. Default is 24000.

language
string

The language of the text that needs to be converted to speech. Allowed values are:

  • ‘en’ - for English
  • ’hi’ - for Hindi

Providing the correct language code helps improve the quality of the audio output. Note that both ‘hi’ and ‘en’ can work for Hinglish, but specifying the language code is beneficial.

speed
float

The speed of the audio output. Adjusts the speed based on the characteristics of the voice.

keep_ws_open
bool

Determines whether to keep the WebSocket connection open. When set to true, the connection will remain open from the server side with a timeout of 600 seconds. Useful for making requests in quick succession and reducing connection overload.

remove_extra_silence
bool

Indicates whether to remove extra silences in the audio output. This is useful for streaming input text and removing unnecessary silences from the final audio output.

get_end_of_response_token
bool

Specifies whether you wish to receive an end-of-response token. The end-of-response token is represented as <END>. The WebSocket will send this string as a response once all audio data is sent.

transliterate
bool

Specifies whether to transliterate text from Hinglish to Hindi. Since the model does not currently support Hinglish, setting this parameter to true will convert Hinglish text to Devanagari script before processing. For example, ‘Awaaz abhi hinglish support nahi karta’ will be transliterated to ‘आवाज़ अभी हिंगलिश सपोर्ट नहीं करता.‘

Quick developer demo - Python

  1. Replit - https://replit.com/@akshat34/Smallestai-Awaaz-Streaming-Demo
  2. Google Colab - https://colab.research.google.com/drive/1zeuC7GdRn3Xw7ZsDH-9dSQPnKh-5lh2u?usp=sharing

Example Code to Use - Python

################## Environment setup #######################
# 1. conda setup
#     - conda create -n awaaz_venv python=3.10 -y
#     - conda activate awaaz_venv
#     - pip install websockets pydub

# 2. pipenv setup
#     - python -m venv awaaz_venv
#     - pip install websockets pydub
############################################################

import asyncio
import io
import json
import time
from datetime import datetime, timezone

import websockets
from pydub import AudioSegment

###################### Do your changes here ###########################

TOKEN = "ADD YOUR TOKEN HERE"  # Your token ID
SAMPLE_RATE = 24000  # Sample rate of the audio that you wish to generate
SPEED = 1.3  # Speed of the audio that you wish to generate
VOICE_ID = "amar_indian_male"  # Voice id that you wish to use
TIMEOUT = 2  # Timeout in seconds for receiving a message
CLOSE_CONNECTION_TIMEOUT = 500  # Timeout for your websocket connection
SENTENCES = ["  ", "भारत एक विशाल और विविधता से भरा देश है,","जिसमें विभिन्न भाषाएँ संस्कृतियाँ", "और परंपराएँ समाहित हैं।", " "]

############## Do not change anything below this ######################

SAMPLE_WIDTH = 2
CHANNELS = 1

def add_wav_header(frame_input, sample_rate=24000, sample_width=2, channels=1):
    """Ensure the WAV file is encoded with standard settings.

    Args:
        frame_input (bytes): Audio bytes.
        sample_rate (int, optional): Sample rate needed. Defaults to 24000.
        sample_width (int, optional): Sample width of audio. Defaults to 2.
        channels (int, optional): Channels of the audio. Defaults to 1.

    Returns:
        bytes: Audio data with proper header.
    """
    audio = AudioSegment(data=frame_input, sample_width=sample_width, frame_rate=sample_rate, channels=channels)
    wav_buf = io.BytesIO()
    audio.export(wav_buf, format="wav")
    wav_buf.seek(0)
    return wav_buf.read()


awaaz_websocket_url = f"wss://call-dev.smallest.ai/invocations_streaming?token={TOKEN}"
awaaz_payloads = []
for sentence in SENTENCES:
    payload = {
        "text": sentence,
        "voice_id": VOICE_ID,
        "add_wav_header": False,
        "language": "hi",
        "sample_rate": SAMPLE_RATE,
        "speed": SPEED,
        "keep_ws_open": True,
        "remove_extra_silence": False,
        "transliterate": False,
        "get_end_of_response_token": True
    }
    awaaz_payloads.append(payload)


async def waves_streaming(url: str, payloads: list):
    """Awaaz streaming demo function.

    Args:
        url (str): URL
        payloads (list): List of dictionaries of payloads that are sent to API.

    Returns:
        bytes: Audio bytes generated for the sentences concatenated together.
    """
    first = False
    wav_audio_bytes = b""
    print(f"Start_time: {datetime.now(timezone.utc)}")
    start_time = time.time()

    try:
        async with websockets.connect(url) as websocket:
             for payload in payloads:
                print(f"Connected_time: {datetime.now(timezone.utc)}")
                data = json.dumps(payload)
                await websocket.send(data)

                # Collect responses until no more data or timeout
                response = b""
                while True:
                    response_part = await asyncio.wait_for(websocket.recv(),
                                                           timeout=TIMEOUT)
                    if response_part == "<START>":
                        print("Connection started, Now you will get superfast speeds")
                        break
                    elif response_part == "<END>":
                        print("End of response received")
                        print(f"last response time: {datetime.now(timezone.utc)}")
                        break
                    else:
                        response += response_part
                        print(f"Responses are being read: {datetime.now(timezone.utc)}")
                        if not first:
                            print(f"First response: {datetime.now(timezone.utc)}")
                            first = True

                # Handle the accumulated response if needed
                wav_audio_bytes += response
                time.sleep(1)

                # Optionally close the connection if needed
                if (time.time() - start_time) > CLOSE_CONNECTION_TIMEOUT:
                    await websocket.close()
                    break  # Exit the loop if the connection is closed

    except websockets.exceptions.ConnectionClosed as e:
        if e.code == 1000:
            print("Connection closed normally (code 1000).")
        else:
            print(f"Connection closed with code {e.code}: {e.reason}")
        return None
    except Exception as e:
        print(f"Exception occurred: {e}")

    finally:
        print(f"Connection closed: {datetime.now(timezone.utc)}")
        return wav_audio_bytes


async def post_process(audio_data):
    """Post-process the provided audio data by ensuring the total length of the WAV audio bytes
    aligns with the required sample width and number of channels.

    Args:
        audio_data (bytes): The audio data in WAV format that needs to be processed.

    Returns:
        bytes: Audio bytes with post processing
    """
    total_length = len(audio_data)
    alignment = SAMPLE_WIDTH * CHANNELS
    padding_length = (alignment - (total_length % alignment)) % alignment
    audio_data += b"\x00" * padding_length
    return audio_data


# Example usage with async event loop
async def main():
    """Main function to generate, post-process, and save audio data as a WAV file."""
    # Generate audio data
    wav_audio_bytes = await waves_streaming(awaaz_websocket_url, awaaz_payloads)
    if wav_audio_bytes is None:
        print("Some error occured, Did you check your token?")
    else:
        ################ Optional Post processing for saving the file ###################
        wav_audio_bytes = await post_process(wav_audio_bytes)
        #################################################################################

        # Use audio_data as needed
        print(f"Saving audio file: {datetime.now(timezone.utc)}")
        wav_audio_bytes = add_wav_header(wav_audio_bytes, sample_rate=SAMPLE_RATE)
        with open("waves_demo.wav", "wb") as f:
            f.write(wav_audio_bytes)

if __name__ == "__main__":
    asyncio.run(main())

Example Code to Use - Node

const WebSocket = require("websocket").w3cwebsocket;
const fs = require("fs");
const WaveFile = require("wavefile").WaveFile;

// WebSocket URL and payload
const url =
  "wss://call-dev.smallest.ai/invocations_streaming?token=<authorization_token>";
const payload = {
  text: "आवाज़ एक बहुत ही बढ़िया text to speech मॉडल है",
  voice_id: "sarika_north_indian_female",
  add_wav_header: false
};

// Function to encode PCM data into a WAV file
function encodeAudioCommonWav(
  frameInput,
  sampleRate = 24000,
  sampleWidth = 2,
  channels = 1
) {
  let wav = new WaveFile();
  let bitDepth = sampleWidth * 8;
  wav.fromScratch(channels, sampleRate, bitDepth, frameInput);
  return Buffer.from(wav.toBuffer());
}

// Function to handle streaming and saving the audio file
async function awaazStreaming() {
  let first = false;
  let wavAudioBytes = Buffer.alloc(0);

  if (!payload.add_wav_header) {
    wavAudioBytes = Buffer.concat([
      wavAudioBytes,
      encodeAudioCommonWav(wavAudioBytes)
    ]);
  }
  console.log(`start_time: ${new Date().toISOString()}`);

  try {
    const client = new WebSocket(url);

    client.onopen = () => {
      const data = JSON.stringify(payload);
      client.send(data);
      console.log(`> ${data}`);
    };

    client.onmessage = (message) => {
      wavAudioBytes = Buffer.concat([
        wavAudioBytes,
        Buffer.from(message.data)
      ]);
      if (!first) {
        console.log(`first response: ${new Date().toISOString()}`);
        first = true;
      }
    };

    client.onclose = (event) => {
      if (event.code === 1000) {
        console.log("Connection closed normally (code 1000).");
      } else {
        console.log(
          `Connection closed with code ${event.code}: ${event.reason}`
        );
      }
      saveAudioFile(wavAudioBytes);
      console.log("Connection closed");
    };

    client.onerror = (error) => {
      console.log(`WebSocket error: ${error.message}`);
    };
  } catch (error) {
    console.log(`Failed to establish connection: ${error}`);
  }
}

// Function to save the audio file
function saveAudioFile(wavAudioBytes) {
  console.log("Saving audio file");
  if (!payload.add_wav_header) {
    wavAudioBytes = encodeAudioCommonWav(wavAudioBytes);
  }
  fs.writeFileSync(`awaaz_${payload.voice_id}.wav`, wavAudioBytes);
}

// Run the streaming function
awaazStreaming();