What is Smallest Self-Host?
Smallest Self-Host enables you to deploy state-of-the-art speech-to-text (STT) models in your own infrastructure, whether in the cloud or on-premises. Built for enterprises with stringent performance, security, or compliance requirements, it provides the same powerful AI capabilities as Smallest’s cloud service while keeping your data under your complete control.

Why Self-Host?
Using Smallest as a managed service has many benefits: it’s fast to start developing with, requires no infrastructure setup, and eliminates hardware, installation, configuration, backup, and maintenance costs. However, there are situations where a self-hosted deployment makes more sense.

Performance Requirements
Certain use cases have very strict latency and load requirements. If you need ultra-low latency, with voice AI services colocated alongside your other services, self-hosting can meet them.

Ideal for:
- Real-time AI voicebots requiring <100ms response times
- Live transcription systems for broadcasts or conferences
- High-volume processing with predictable costs
- Edge deployments with limited internet connectivity

Key benefits:
- Colocate speech services with your application infrastructure
- Scale independently based on your specific workload patterns
- No network latency to external APIs
- Consistent performance regardless of internet conditions
Security & Data Privacy
One of the most common reasons to self-host Smallest is to satisfy security or data privacy requirements. In a typical self-hosted deployment, no audio, transcripts, or other identifying markers of the request content are sent to Smallest servers.

Ideal for:
- Healthcare applications requiring HIPAA compliance
- Financial services with strict data governance
- Government and defense applications
- Enterprise environments with air-gapped networks

Key benefits:
- Your audio data never leaves your infrastructure
- Transcripts remain entirely within your control
- No data is stored beyond the duration of the API request; self-hosted deployments do not persist request/response data

Usage reporting:
- Only metadata such as audio duration, character count, features requested, and success response codes
- No audio content, transcripts, or personally identifiable information

Note: In a typical self-hosted deployment, no audio or transcript data is sent to Smallest servers. Only usage metadata (duration, feature flags, response codes) is reported to the license server for validation and billing purposes.
Cost Optimization
For high-volume or predictable workloads, self-hosting can be more cost-effective:
- Predictable costs based on infrastructure, not usage
- No per-minute charges for audio processing
- Efficient resource utilization with autoscaling
- Long-term savings for sustained high volumes
Customization & Control
Self-hosting provides complete control over your deployment:
- Custom resource allocation optimized for your workload
- Version control - upgrade on your schedule
- Network isolation - deploy in private networks
- Integration flexibility - direct database access, custom monitoring
Components
Before you deploy Smallest, you’ll need to understand the components of your system, their relationships, and the interactions between them. A well-designed architecture will meet your business needs, optimize both performance and security, and provide a strong technical foundation for future growth.

Architecture Diagram

In brief: clients call the API Server, which queues and routes work to Lightning ASR workers, coordinating through Redis; every component validates its license with the Smallest License Server, either directly or via the License Proxy.
Component Details
API Server
Purpose: The API server interfaces with Lightning ASR to expose endpoints for your requests.

Key Features:
- Routes incoming API requests to available Lightning ASR workers
- Manages WebSocket connections for streaming transcription
- Handles request queuing and load balancing across workers
- Provides a unified REST API interface (see the example below)

Resource Requirements:
- CPU: 0.5-2 cores
- Memory: 512 MB - 2 GB
- No GPU required
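Once the stack is running, a quick way to exercise the REST interface is a direct HTTP request. This is a minimal sketch only: the host, port, and endpoint path are assumptions for illustration, so consult the API Reference for the routes your deployment actually exposes.

```bash
# Hypothetical request; host, port, and path are placeholders --
# check the API Reference for your deployment's real routes.
curl -X POST "http://localhost:8080/v1/transcribe" \
  -F "file=@sample.wav"
```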
Lightning ASR
Purpose: The Lightning ASR engine performs the computationally intensive task of speech recognition. It manages GPU devices and responds to requests from the API layer.

Key Features:
- GPU-accelerated speech recognition (0.05-0.15x real-time factor, i.e., a 10-minute recording transcribes in roughly 30-90 seconds)
- Real-time and batch audio transcription
- Automatic model loading and optimization
- Horizontal scaling support

Resource Requirements:
- CPU: 4-8 cores
- Memory: 12-16 GB RAM
- GPU: 1x NVIDIA GPU (16+ GB VRAM required)
- Storage: 50+ GB for models

Note: Because Lightning ASR is decoupled from the API Server, you can scale it independently based on your transcription load.
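Because the tiers are decoupled, scaling the ASR workers on Kubernetes is a single operation. A minimal sketch, assuming a Deployment named lightning-asr (match your actual resource name):

```bash
# Scale only the ASR tier; the API Server deployment is untouched.
# "lightning-asr" is a placeholder; use your actual Deployment name.
kubectl scale deployment lightning-asr --replicas=3
```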
License Proxy
Purpose: Components register with the Smallest License Server to verify licensing and report usage. API and Engine containers can be configured to connect directly to the licensing server, or to proxy their communication through the License Proxy.

Key Features:
- License key validation on startup
- Usage metadata reporting (no audio/transcript data)
- Grace period support for offline operation
- Secure communication with the Smallest License Server

Resource Requirements:
- CPU: 0.25-1 core
- Memory: 256-512 MB
- No GPU required

Network: Requires outbound HTTPS to https://console-api.smallest.ai
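To illustrate the two topologies, here is a sketch of how the licensing endpoint might be switched between direct and proxied modes. The variable name and proxy address are hypothetical, not the product’s actual configuration keys:

```bash
# Both values here are illustrative, shown only to contrast the two
# topologies; use the configuration keys from your deployment docs.

# Direct mode: each container reaches the license server itself.
export LICENSE_SERVER_URL="https://console-api.smallest.ai"

# Proxied mode: containers talk to a local License Proxy, which then
# becomes the only component that needs outbound HTTPS.
export LICENSE_SERVER_URL="http://license-proxy:9000"
```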
Redis
Purpose: Provides caching and state management for the system.

Key Features:
- Request queuing and coordination between API and ASR workers
- Session state for streaming connections
- Performance optimization through caching
- Can be embedded or external (AWS ElastiCache, etc.; see the sketch after this list)

Resource Requirements:
- CPU: 0.5-1 core
- Memory: 512 MB - 2 GB
- No GPU required
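If you opt for an external Redis, configuration typically amounts to a connection string. A sketch, assuming a hypothetical REDIS_URL setting and a made-up ElastiCache endpoint:

```bash
# REDIS_URL is a hypothetical key and the endpoint below is a made-up
# ElastiCache address; point it at your actual managed instance.
export REDIS_URL="redis://my-cache.abc123.use1.cache.amazonaws.com:6379"
```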
Common Setup Path
All deployments follow the same initial setup path through environment preparation. Here’s what to expect:

1. Choose Your Deployment Method
Docker/Podman
Best for: Development, testing, small-scale production
Timeline: 15-30 minutes
Complexity: Low
Kubernetes
Best for: Production deployments with autoscaling
Timeline: 1-2 hours
Complexity: Medium-High
2. Prepare Infrastructure
Steps:
- Obtain credentials from Smallest.ai (license key, registry access, model URLs); see the registry login sketch after this list
- Prepare infrastructure (Docker host or Kubernetes cluster)
- Set up GPU support (NVIDIA drivers, device plugins)
- Deploy components (API Server, Lightning ASR, License Proxy, Redis)
- Configure autoscaling (optional, Kubernetes only)
- Set up monitoring (optional, Prometheus & Grafana)
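The credential step usually amounts to authenticating Docker against Smallest’s private registry and pulling the component images. A sketch with placeholder registry host and image names; use the values that come with your credentials:

```bash
# Registry host and image names are placeholders.
echo "$REGISTRY_PASSWORD" | docker login registry.smallest.ai \
  --username "$REGISTRY_USERNAME" --password-stdin
docker pull registry.smallest.ai/api-server:latest
docker pull registry.smallest.ai/lightning-asr:latest
```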
What You’ll Need
Before starting, ensure you have:

From Smallest.ai
- License key
- Container registry credentials
- Model download URLs
Technical Requirements
- GPU infrastructure (NVIDIA A10, T4, or better)
- Kubernetes cluster or Docker host
- Basic DevOps knowledge
- Network connectivity for license validation
Deployment Options
Smallest Self-Host supports two primary deployment methods, each suited to different operational requirements:

Docker Deployment
Best for development, testing, or small-scale production deployments.

Pros:
- Fastest setup (under 15 minutes)
- Minimal infrastructure requirements
- Single-machine deployment
- Easy configuration with docker-compose (see the sketch after this list)

Use cases:
- Development and testing
- Proof of concept
- Small-scale production
- Edge deployments
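To give a feel for the shape of a single-machine deployment, here is a skeletal compose file written from the shell. Every image name, port, and environment key below is an illustrative placeholder, not the product’s real values:

```bash
# Write a skeletal docker-compose.yml; all names are placeholders.
cat > docker-compose.yml <<'EOF'
services:
  redis:
    image: redis:7
  api-server:
    image: registry.smallest.ai/api-server:latest
    ports:
      - "8080:8080"
    depends_on:
      - redis
  lightning-asr:
    image: registry.smallest.ai/lightning-asr:latest
    environment:
      - LICENSE_KEY=${SMALLEST_LICENSE_KEY}   # hypothetical key name
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
EOF
docker compose up -d
```

The GPU reservation block is standard Docker Compose syntax for exposing an NVIDIA device to a service; it requires the NVIDIA Container Toolkit on the host.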
Kubernetes Deployment
Production-grade deployment with enterprise features.

Pros:
- Auto-scaling based on load
- High availability and fault tolerance
- Advanced monitoring with Grafana
- Shared model storage

Use cases:
- Production workloads
- High-traffic applications
- Multi-region deployments
- Enterprise infrastructure
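A Kubernetes install typically reduces to a Helm release. The chart repository, chart name, and values keys below are hypothetical; Smallest provides the actual chart details along with your credentials:

```bash
# Hypothetical chart repo, chart name, and values keys.
helm repo add smallest https://charts.smallest.ai
helm repo update
helm install smallest-selfhost smallest/self-host \
  --namespace smallest --create-namespace \
  --set licenseKey="$SMALLEST_LICENSE_KEY"
```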
Prerequisites
Before deploying Smallest Self-Host, ensure you have:

1. License Key
Contact [email protected] or your Smallest representative to obtain:
- License key for validation
- Container registry credentials

2. Infrastructure
Provision compute resources:
- For Docker: Single machine with NVIDIA GPU
- For Kubernetes: Cluster with GPU node pool

3. GPU Drivers
Install NVIDIA drivers and container runtime (example below):
- NVIDIA Driver 525+ (for A10, A100, L4)
- NVIDIA Driver 470+ (for T4, V100)
- NVIDIA Container Toolkit
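For the container runtime piece, NVIDIA Container Toolkit setup on an Ubuntu/Debian Docker host generally looks like this (assuming NVIDIA’s package repository is already configured; see NVIDIA’s install guide otherwise):

```bash
# Install the toolkit and wire it into Docker (Ubuntu/Debian example;
# assumes NVIDIA's apt repository is already configured).
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Verify that containers can see the GPU:
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```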
What’s Next?
Choose your deployment path based on your needs:

For Quick Start & Testing
Start with Docker
Fastest path to get running (15-30 minutes). Perfect if you’re:
- Evaluating Smallest Self-Host for the first time
- Building a proof-of-concept
- Setting up a development environment
- Running on a single GPU server

Go to Docker Setup →
For Production Deployment
Kubernetes on AWS
Full-featured production setup
- Auto-scaling (HPA + Cluster Autoscaler)
- High availability across zones
- Grafana monitoring dashboards
- Shared model storage with EFS
Kubernetes (Generic)
For any Kubernetes cluster
- Works on GCP, Azure, on-prem
- Full autoscaling support
- Advanced monitoring
- Production-ready
Quick Links by Role
I'm a DevOps Engineer
Start here:
- Kubernetes Prerequisites - Check cluster requirements
- AWS EKS Setup - Create EKS cluster (if on AWS)
- Quick Start - Deploy with Helm
- Autoscaling - Configure HPA
- Monitoring - Set up Grafana
I'm a Developer
Start here:
1. Docker Prerequisites - Set up local environment
2. Docker Quick Start - Get running in 15 minutes
3. API Reference - Integrate with your app
4. Examples - See code examples
I'm Evaluating the Product
Start here:
1. Docker Quick Start - Fastest way to test
2. API Reference - See what you can do
3. Common Issues - Get help if stuck
4. Then move to Kubernetes for production
I Need Help
Resources:
- Common Issues - Quick fixes
- Debugging Guide - Advanced troubleshooting
- Logs Analysis - Interpret error messages
- Support: [email protected]
Recommendation: Start with Docker to familiarize yourself with the components and API. Once you’re comfortable, move to Kubernetes for production deployments with autoscaling and high availability.

