Docker Troubleshooting

Common Issues

GPU Not Accessible

Symptoms:

Error: could not select device driver "nvidia"
Error: no NVIDIA GPU devices found
Lightning ASR fails to start

Diagnosis:

docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

Solution 1: Restart Docker

sudo systemctl restart docker
docker compose up -d

Solution 2: Reinstall NVIDIA Container Toolkit

sudo apt-get remove nvidia-container-toolkit
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

sudo systemctl restart docker

Solution 3: Update NVIDIA Driver

Check the installed driver version:

nvidia-smi

If the driver version is below 470, update it:

sudo ubuntu-drivers autoinstall
sudo reboot
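The version comparison in Solution 3 can be scripted. The sketch below assumes the driver version is a dotted string such as 535.104.05, as printed by nvidia-smi; the function name driver_needs_update is illustrative and not part of any tool.

```shell
# Succeed (exit 0) when the driver's major version is below the minimum.
# The default threshold of 470 comes from the text above.
# driver_needs_update is a hypothetical helper name.
driver_needs_update() {
  local version="$1" minimum="${2:-470}"
  local major="${version%%.*}"            # "465.19.01" -> "465"
  case "$major" in
    ''|*[!0-9]*) return 2 ;;              # not a number: cannot decide
  esac
  [ "$major" -lt "$minimum" ]
}
```

Example use: driver_needs_update "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)" && echo "driver update needed"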

Solution 4: Check Docker Daemon Configuration

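Verify that /etc/docker/daemon.json contains the following runtime entry (the NVIDIA Container Toolkit's nvidia-ctk runtime configure --runtime=docker command can generate it):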

{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}

Restart Docker after changes:

sudo systemctl restart docker

License Validation Failed

Symptoms:

Error: License validation failed
Error: Invalid license key
Services fail to start

Diagnosis: Check license-proxy logs:

docker compose logs license-proxy

Solution 1: Verify License Key

Check .env file:

cat .env | grep LICENSE_KEY

Ensure there are no:

Extra spaces
Quotes around the key
Line breaks

Correct format:

LICENSE_KEY=abc123def456
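The formatting rules above can be checked automatically. This is a minimal sketch that assumes the key lives on a single LICENSE_KEY=value line in .env; check_license_key is a hypothetical helper, and it cannot detect a key that was split across two lines.

```shell
# Inspect the LICENSE_KEY line of an env file for the problems listed above.
# Prints "ok" or a short description of the first problem found.
# check_license_key is a hypothetical helper name.
check_license_key() {
  local env_file="${1:-.env}" line value
  line="$(grep -m1 '^[[:space:]]*LICENSE_KEY[[:space:]]*=' "$env_file")" || {
    echo "no LICENSE_KEY line found"
    return 1
  }
  case "$line" in
    LICENSE_KEY=*) ;;   # name starts the line with no space before "="
    *) echo "extra spaces around the key name"; return 1 ;;
  esac
  value="${line#*=}"    # everything after the first "="
  case "$value" in
    '')            echo "empty value"; return 1 ;;
    *[[:space:]]*) echo "extra spaces in the value"; return 1 ;;
    *\"*|*\'*)     echo "quotes around the key"; return 1 ;;
  esac
  echo "ok"
}
```

Run it as check_license_key .env from the directory that holds docker-compose.yml. A trailing carriage return from Windows line endings is reported as extra spaces.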

Solution 2: Check Network Connectivity

Test connection to license server:

curl -v https://console-api.smallest.ai

If this fails, check:

Firewall rules
Proxy settings
DNS resolution

Solution 3: Contact Support

If the key appears correct and the network is accessible, your license may be:

Expired
Revoked
Invalid

Contact support@smallest.ai with:

Your license key
License-proxy logs
Error messages

Model Download Failed

Symptoms:

Lightning ASR stuck at startup
Error: Failed to download model
Error: Connection timeout

Diagnosis: Check Lightning ASR logs:

docker compose logs lightning-asr

Solution 1: Verify Model URL

Check .env file:

cat .env | grep MODEL_URL

Test URL accessibility (MODEL_URL must be set in your current shell; values in .env are not exported automatically):

curl -I "${MODEL_URL}"

Solution 2: Check Disk Space

Models require ~20-30 GB:

df -h

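Free up space if needed (this removes all stopped containers, unused networks, build cache, and every image not used by a container):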

docker system prune -a
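The disk-space check in Solution 2 can be turned into a pass/fail test. The sketch below assumes a POSIX df; has_free_space_gb is a hypothetical helper, and the 30 GB figure in the usage line is the upper end of the estimate above.

```shell
# Succeed when the filesystem holding <path> has at least <required-gb> GB free.
# Usage: has_free_space_gb <path> <required-gb>
# has_free_space_gb is a hypothetical helper name.
has_free_space_gb() {
  local path="$1" required_gb="$2" available_kb
  # -P prints one line per filesystem; column 4 is the available space in KB.
  available_kb="$(df -Pk "$path" | awk 'NR == 2 { print $4 }')"
  [ "$available_kb" -ge $((required_gb * 1024 * 1024)) ]
}
```

Example use: has_free_space_gb /var/lib/docker 30 || echo "less than 30 GB free". Point it at whichever directory actually stores the models; /var/lib/docker is only the usual default for Docker data on Linux.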

Solution 3: Manual Download

Download the model manually and mount it as a volume:

mkdir -p ~/models
cd ~/models
wget "${MODEL_URL}" -O model.bin

Update docker-compose.yml:

lightning-asr:
  volumes:
    - ~/models:/app/models

Solution 4: Increase Timeout

For slow connections, increase download timeout:

lightning-asr:
  environment:
    - DOWNLOAD_TIMEOUT=3600

Port Already in Use

Symptoms:

Error: port is already allocated
Error: bind: address already in use

Diagnosis: Find what’s using the port:

sudo lsof -i :7100
sudo netstat -tulpn | grep 7100

Solution 1: Stop Conflicting Service

If another service is using the port:

sudo systemctl stop [service-name]

Or kill the process:

sudo kill -9 [PID]

Solution 2: Change Port

Modify docker-compose.yml to map a different host port:

api-server:
  ports:
    - "8080:7100"

The API is then available at http://localhost:8080 instead.

Solution 3: Remove Old Containers

Old containers may still be bound:

docker compose down
docker container prune -f
docker compose up -d

Out of Memory

Symptoms:

Container killed unexpectedly
Error: OOMKilled
System becomes unresponsive

Diagnosis: Check container status:

docker compose ps
docker inspect [container-name] | grep OOMKilled

Solution 1: Increase System Memory

Lightning ASR requires a minimum of 16 GB of RAM.

Check current memory:

free -h

Solution 2: Add Memory Limits

Prevent one service from consuming all memory:

services:
  lightning-asr:
    deploy:
      resources:
        limits:
          memory: 14G
        reservations:
          memory: 12G

Solution 3: Enable Swap

Add swap space (temporary solution):

sudo fallocate -l 16G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

Solution 4: Optimize Model Loading

Use a smaller model or reduce the batch size:

lightning-asr:
  environment:
    - BATCH_SIZE=1
    - MODEL_PRECISION=fp16

Container Keeps Restarting

Symptoms:

Container status shows Restarting
Logs show crash loop

Diagnosis: View recent logs:

docker compose logs --tail=100 [service-name]

Solution 1: Check Exit Code

docker inspect [container-name] --format='{{.State.ExitCode}}'

Common exit codes:

137: Killed by SIGKILL (128 + 9), most often out of memory (OOMKilled)
139: Segmentation fault
1: General error
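The exit codes above follow a general shell convention: a code above 128 means the process was ended by signal number (code - 128). A small sketch that decodes them, with explain_exit_code as a hypothetical helper name:

```shell
# Translate a container exit code into a short explanation.
# explain_exit_code is a hypothetical helper name.
explain_exit_code() {
  local code="$1" signal
  if [ "$code" -gt 128 ]; then
    # Codes above 128 mean the process died from signal (code - 128).
    signal=$((code - 128))
    case "$signal" in
      9)  echo "killed by SIGKILL (often the OOM killer)" ;;
      11) echo "segmentation fault (SIGSEGV)" ;;
      15) echo "terminated by SIGTERM" ;;
      *)  echo "terminated by signal $signal" ;;
    esac
  elif [ "$code" -eq 0 ]; then
    echo "clean exit"
  else
    echo "application error (exit code $code)"
  fi
}
```

Example use: explain_exit_code "$(docker inspect [container-name] --format='{{.State.ExitCode}}')"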

Solution 2: Disable Auto-Restart

Temporarily disable restart to debug:

lightning-asr:
  restart: "no"

Start manually and watch logs:

docker compose up lightning-asr

Solution 3: Check Dependencies

Ensure required services are healthy:

docker compose ps

All services should show Up (healthy) or Up.

Slow Performance

Symptoms:

High latency (>500ms)
Low throughput
GPU underutilized

Diagnosis: Monitor GPU usage:

watch -n 1 nvidia-smi

Check container resources:

docker stats

Solution 1: Optimize GPU Usage

Ensure GPU is not throttling:

nvidia-smi -q -d PERFORMANCE

Enable persistence mode:

sudo nvidia-smi -pm 1

Solution 2: Increase CPU Allocation

lightning-asr:
  deploy:
    resources:
      limits:
        cpus: '8'

Solution 3: Use Host Network

For maximum network performance, at the cost of network isolation (port mappings are ignored in host mode):

api-server:
  network_mode: host

Solution 4: Optimize Redis

Run Redis with snapshot persistence disabled for speed (data held in Redis is lost when it restarts):

redis:
  command: redis-server --save ""

Solution 5: Add More Workers

Scale Lightning ASR workers:

docker compose up -d --scale lightning-asr=2

Performance Optimization

Best Practices

Use Persistent Volumes

Cache models to avoid re-downloading:

volumes:
  - model-cache:/app/models

Enable GPU Persistence Mode

Reduces GPU initialization time:

sudo nvidia-smi -pm 1

Optimize Container Resources

Allocate appropriate CPU/memory:

deploy:
  resources:
    limits:
      cpus: '8'
      memory: 14G

Monitor and Tune

Use monitoring tools:

docker stats
nvidia-smi dmon

Benchmark Your Deployment

Test transcription performance:

time curl -X POST http://localhost:7100/v1/listen \
  -H "Authorization: Token ${LICENSE_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/test-audio-60s.wav"
  }'

Expected performance:

Cold start: First request after container start (5-10 seconds)
Warm requests: Subsequent requests (50-200ms)
Real-time factor: 0.05-0.15x (60s audio in 3-9 seconds)
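The real-time factor is the processing time divided by the audio duration, so 60 seconds of audio transcribed in 3 to 9 seconds gives 0.05 to 0.15. A minimal sketch of that arithmetic, with real_time_factor as a hypothetical helper name:

```shell
# Real-time factor = processing time / audio duration (both in seconds).
# Usage: real_time_factor <processing-seconds> <audio-seconds>
# real_time_factor is a hypothetical helper name.
real_time_factor() {
  awk -v p="$1" -v a="$2" 'BEGIN {
    if (a <= 0) exit 1          # avoid dividing by zero
    printf "%.2f\n", p / a
  }'
}
```

Feed it the "real" time reported by the time command above together with the length of the test audio, for example real_time_factor 4.2 60.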

Debugging Tools

View All Logs

docker compose logs -f

Follow Specific Service

docker compose logs -f lightning-asr

Last N Lines

docker compose logs --tail=100 api-server

Save Logs to File

docker compose logs > deployment-logs.txt

Execute Commands in Container

docker compose exec lightning-asr bash

Check Container Configuration

docker inspect lightning-asr-1

Network Debugging

Test connectivity between containers:

docker compose exec api-server ping lightning-asr
docker compose exec api-server curl http://lightning-asr:2233/health

Health Checks

API Server

curl http://localhost:7100/health

Expected: {"status": "healthy"}

Lightning ASR

curl http://localhost:2233/health

Expected: {"status": "ready", "gpu": "NVIDIA A10"}

License Proxy

docker compose exec license-proxy wget -q -O- http://localhost:6699/health

Expected: {"status": "valid"}

Redis

docker compose exec redis redis-cli ping

Expected: PONG
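Because services can take several seconds to become ready after a start, a single health check may fail spuriously. The sketch below retries any command until it succeeds; wait_for is a hypothetical helper, and the attempt count and delay in the usage line are arbitrary choices.

```shell
# Retry a command until it succeeds or the attempts run out.
# Usage: wait_for <attempts> <delay-seconds> <command> [args...]
# wait_for is a hypothetical helper name.
wait_for() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0                  # command succeeded
    fi
    i=$((i + 1))
    # Sleep only when another attempt will follow.
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1                      # every attempt failed
}
```

Example use: wait_for 30 2 curl -fsS http://localhost:7100/health && echo "API server is healthy"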

Log Analysis

Common Log Patterns

Successful Startup

redis-1              | Ready to accept connections
license-proxy        | License validated successfully
lightning-asr-1      | Model loaded successfully
lightning-asr-1      | GPU: NVIDIA A10 (24GB)
lightning-asr-1      | Server ready on port 2233
api-server           | Connected to Lightning ASR
api-server           | API server listening on port 7100

License Issues

license-proxy        | ERROR: License validation failed
license-proxy        | ERROR: Invalid license key
license-proxy        | ERROR: Connection to license server failed

GPU Issues

lightning-asr-1      | ERROR: No CUDA-capable device detected
lightning-asr-1      | ERROR: CUDA out of memory
lightning-asr-1      | ERROR: GPU not accessible

Network Issues

api-server           | ERROR: Connection refused: lightning-asr:2233
api-server           | ERROR: Timeout connecting to license-proxy
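To see at a glance which category of error dominates a log, the patterns above can be counted. This sketch matches only the sample messages shown in this section, so real log wording may need different patterns; classify_log_errors is a hypothetical helper name.

```shell
# Count ERROR lines on stdin by category, using the sample patterns above.
# GPU and network rules run first so that a timeout reaching license-proxy
# is counted as a network problem, not a license problem.
# classify_log_errors is a hypothetical helper name.
classify_log_errors() {
  awk '
    /ERROR/ && /CUDA|GPU/                   { gpu++; next }
    /ERROR/ && /Connection refused|Timeout/ { network++; next }
    /ERROR/ && /[Ll]icense/                 { license++; next }
    END { printf "license=%d gpu=%d network=%d\n", license, gpu, network }
  '
}
```

Example use: docker compose logs --no-color | classify_log_errors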

Getting Help

Before Contacting Support

Collect the following information:

System Information

docker version
docker compose version
nvidia-smi
uname -a

Container Status

docker compose ps > status.txt
docker stats --no-stream > resources.txt

Logs

docker compose logs > all-logs.txt

Configuration

Sanitize and include:

docker-compose.yml
.env (remove license key)
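Removing the license key from .env can be done with a filter instead of by hand. The sketch below assumes the key sits on a single LICENSE_KEY=value line; sanitize_env is a hypothetical helper, and other values such as MODEL_URL are passed through unchanged, so review them before sending if they embed credentials.

```shell
# Print an env file with the LICENSE_KEY value replaced by a placeholder.
# Usage: sanitize_env [env-file]   (defaults to .env)
# sanitize_env is a hypothetical helper name.
sanitize_env() {
  sed -E 's/^([[:space:]]*LICENSE_KEY[[:space:]]*=).*/\1<redacted>/' "${1:-.env}"
}
```

Example use: sanitize_env .env > env-sanitized.txt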

Contact Support

Email: support@smallest.ai

Include:

Description of the issue
Steps to reproduce
System information
Logs and configuration
License key (via secure channel)

What’s Next?

STT Configuration

API Reference
