Architecture Overview

System Architecture

Components

API Server

Routes requests to Lightning ASR/TTS workers, manages WebSocket connections, and provides a unified REST API interface.

Resources: 0.5-2 CPU cores, 512 MB - 2 GB RAM, no GPU

Lightning ASR

GPU-accelerated speech-to-text engine with a real-time factor of 0.05-0.15x (processing time as a fraction of audio duration). Supports real-time and batch transcription.

Resources: 4-8 CPU cores, 12-16 GB RAM, 1x NVIDIA GPU (16+ GB VRAM)
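
To make the real-time factor concrete, here is a minimal sketch that assumes the usual definition (processing time divided by audio duration). The function name and the 60-second clip are illustrative only and are not part of the product.

```python
def processing_time_seconds(audio_seconds: float, rtf: float) -> float:
    """Estimate compute time as real-time factor times audio duration."""
    if audio_seconds < 0 or rtf <= 0:
        raise ValueError("audio_seconds must be >= 0 and rtf must be > 0")
    return audio_seconds * rtf


# Real-time factor range quoted for Lightning ASR above.
RTF_BEST, RTF_WORST = 0.05, 0.15

audio = 60.0  # one minute of audio, chosen only for illustration
fastest = processing_time_seconds(audio, RTF_BEST)
slowest = processing_time_seconds(audio, RTF_WORST)
print(f"{audio:.0f} s of audio: about {fastest:.0f} to {slowest:.0f} s of processing")
```

Under that assumption, one minute of audio takes roughly 3 to 9 seconds of GPU time, before any queuing or network overhead.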

Lightning TTS

GPU-accelerated text-to-speech engine for natural voice synthesis. Supports streaming and batch generation.

Resources: 4-8 CPU cores, 12-16 GB RAM, 1x NVIDIA GPU (16+ GB VRAM)

License Proxy

Validates license keys and reports usage metadata. Supports offline grace periods.

Resources: 0.25-1 CPU core, 256-512 MB RAM, no GPU

Redis

Handles request queuing, session state, and caching. Can run as an embedded instance or as an external service (for example, ElastiCache).

Resources: 0.5-1 CPU core, 512 MB - 2 GB RAM, no GPU
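
For capacity planning, the per-component figures above can be added up. The sketch below assumes one instance of every component (both Lightning ASR and Lightning TTS deployed, Redis counted as its own component) and treats 512 MB as 0.5 GB. The `Resources` class and `total` helper are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Resources:
    cpu_min: float
    cpu_max: float
    ram_min_gb: float
    ram_max_gb: float
    gpus: int = 0


# Figures copied from the component descriptions above.
COMPONENTS: Dict[str, Resources] = {
    "API Server": Resources(0.5, 2, 0.5, 2),
    "Lightning ASR": Resources(4, 8, 12, 16, gpus=1),
    "Lightning TTS": Resources(4, 8, 12, 16, gpus=1),
    "License Proxy": Resources(0.25, 1, 0.25, 0.5),
    "Redis": Resources(0.5, 1, 0.5, 2),
}


def total(components: Dict[str, Resources]) -> Resources:
    """Sum the ranges, assuming one instance of each component."""
    parts = list(components.values())
    return Resources(
        cpu_min=sum(p.cpu_min for p in parts),
        cpu_max=sum(p.cpu_max for p in parts),
        ram_min_gb=sum(p.ram_min_gb for p in parts),
        ram_max_gb=sum(p.ram_max_gb for p in parts),
        gpus=sum(p.gpus for p in parts),
    )


summary = total(COMPONENTS)
print(f"CPU cores: {summary.cpu_min}-{summary.cpu_max}")
print(f"RAM (GB):  {summary.ram_min_gb}-{summary.ram_max_gb}")
print(f"GPUs:      {summary.gpus}")
```

With those assumptions, a full single-instance deployment needs about 9.25-20 CPU cores, 25.25-36.5 GB of RAM, and 2 GPUs. Running only ASR or only TTS, or pointing at an external Redis, lowers the totals accordingly.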

Data Flow

1. Client Request: Your application sends audio (STT) or text (TTS) via HTTP or WebSocket
2. API Server: Routes the request to the appropriate worker and validates the license
3. Worker Processing: Lightning ASR or TTS processes the request on GPU
4. Response: Results stream back through the API server to your application
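
The four steps above can be sketched as a small dispatch function. This is a simplified illustration, not the actual API server: the `Request` type, the worker stubs, and the `license_is_valid` callback are all hypothetical stand-ins, and the real workers run on GPU and stream their results.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Request:
    kind: str                    # "stt" (audio in) or "tts" (text in)
    payload: Union[bytes, str]
    license_key: str


def asr_worker(audio: bytes) -> str:
    """Stand-in for Lightning ASR: returns a placeholder transcript."""
    return f"<transcript of {len(audio)} bytes>"


def tts_worker(text: str) -> bytes:
    """Stand-in for Lightning TTS: returns placeholder audio bytes."""
    return text.encode("utf-8")


WORKERS = {"stt": asr_worker, "tts": tts_worker}


def handle(request: Request, license_is_valid: Callable[[str], bool]):
    """Validate the license, then route the request to the matching worker."""
    if not license_is_valid(request.license_key):
        raise PermissionError("license validation failed")
    try:
        worker = WORKERS[request.kind]
    except KeyError:
        raise ValueError(f"unknown request kind: {request.kind!r}") from None
    return worker(request.payload)


# Example: a license check that accepts every key (illustration only).
result = handle(Request("stt", b"\x00" * 4, "demo-key"), lambda key: True)
print(result)
```

Keeping the license check and the routing in one place mirrors the description of the API server as the single entry point between clients and workers.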

All processing happens within your infrastructure. Only license validation metadata is sent to Smallest Cloud.

What’s Next?

Getting Started
Docker Setup
Kubernetes Setup
Troubleshooting
