Skip to main content

System Architecture

Components

Routes requests to Lightning ASR/TTS workers, manages WebSocket connections, and provides a unified REST API interface.Resources: 0.5-2 CPU cores, 512 MB - 2 GB RAM, no GPU
GPU-accelerated speech-to-text engine with 0.05-0.15x real-time factor. Supports real-time and batch transcription.Resources: 4-8 CPU cores, 12-16 GB RAM, 1x NVIDIA GPU (16+ GB VRAM)
GPU-accelerated text-to-speech engine for natural voice synthesis. Supports streaming and batch generation.Resources: 4-8 CPU cores, 12-16 GB RAM, 1x NVIDIA GPU (16+ GB VRAM)
Validates license keys and reports usage metadata. Supports offline grace periods.Resources: 0.25-1 CPU core, 256-512 MB RAM, no GPU
Request queuing, session state, and caching. Can use embedded or external (ElastiCache).Resources: 0.5-1 CPU core, 512 MB - 2 GB RAM, no GPU

Data Flow

  1. Client Request — Your application sends audio (STT) or text (TTS) via HTTP or WebSocket
  2. API Server — Routes the request to the appropriate worker and validates the license
  3. Worker Processing — Lightning ASR or TTS processes the request on GPU
  4. Response — Results stream back through the API server to your application
All processing happens within your infrastructure. Only license validation metadata is sent to Smallest Cloud.

What’s Next?

Prerequisites

License key, credentials, and infrastructure requirements

Why Self-Host?

Benefits of self-hosting for your use case