
Common Issues

GPU Not Accessible

Symptoms:
  • Error: could not select device driver "nvidia"
  • Error: no NVIDIA GPU devices found
  • Lightning ASR fails to start
Diagnosis: Test GPU access from a container:
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
Restart Docker and bring the stack back up:
sudo systemctl restart docker
docker compose up -d
Reinstall the NVIDIA Container Toolkit:
sudo apt-get remove nvidia-container-toolkit
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

sudo systemctl restart docker
Check the driver version on the host:
nvidia-smi
If the driver version is below 470, update:
sudo ubuntu-drivers autoinstall
sudo reboot
Verify /etc/docker/daemon.json contains:
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
Restart Docker after changes:
sudo systemctl restart docker
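The driver version check can be scripted. The following is a minimal sketch, assuming the 470 threshold stated above; the function name driver_is_supported and the variable MIN_DRIVER_MAJOR are illustrative and not part of the product.

```shell
# Minimum driver major version, taken from the threshold stated in this guide.
MIN_DRIVER_MAJOR=470

# driver_is_supported VERSION
# Returns 0 when the major component of VERSION (e.g. "535.104.05")
# is at least MIN_DRIVER_MAJOR, and 1 otherwise.
driver_is_supported() {
  major=${1%%.*}
  case "$major" in
    ''|*[!0-9]*) return 1 ;;   # empty or non-numeric: treat as unsupported
  esac
  [ "$major" -ge "$MIN_DRIVER_MAJOR" ]
}

# Example (requires an NVIDIA driver on the host):
#   version=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   driver_is_supported "$version" || echo "Driver $version is below $MIN_DRIVER_MAJOR"
```

Only the major version is compared, which is enough for the threshold given here.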

License Validation Failed

Symptoms:
  • Error: License validation failed
  • Error: Invalid license key
  • Services fail to start
Diagnosis: Check license-proxy logs:
docker compose logs license-proxy
Check .env file:
cat .env | grep LICENSE_KEY
Ensure there are no:
  • Extra spaces
  • Quotes around the key
  • Line breaks
Correct format:
LICENSE_KEY=abc123def456
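The format rules above can be checked automatically. This is a rough sketch, not an official validator: check_license_line is a hypothetical helper that only inspects the first LICENSE_KEY line and cannot detect a key that has been split across lines.

```shell
# check_license_line FILE
# Returns 0 if the first LICENSE_KEY line in FILE is exactly KEY=value with
# no whitespace, quotes, or trailing carriage return; returns 1 otherwise.
check_license_line() {
  # Print the first matching line and stop reading.
  line=$(sed -n '/^[[:space:]]*LICENSE_KEY/{p;q;}' "$1")
  if [ -z "$line" ]; then
    echo "LICENSE_KEY not found"
    return 1
  fi
  if printf '%s\n' "$line" | grep -Eq "^LICENSE_KEY=[^[:space:]\"']+\$"; then
    echo "LICENSE_KEY format looks clean"
  else
    echo "LICENSE_KEY line contains spaces, quotes, or stray characters"
    return 1
  fi
}

# Example:
#   check_license_line .env
```

A trailing carriage return usually means the file was saved with Windows line endings.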
Test connection to license server:
curl -v https://console-api.smallest.ai
If this fails, check:
  • Firewall rules
  • Proxy settings
  • DNS resolution
If the key appears correct and the network is accessible, your license may be:
  • Expired
  • Revoked
  • Invalid
Contact support with:
  • Your license key
  • License-proxy logs
  • Error messages

Model Download Failed

Symptoms:
  • Lightning ASR stuck at startup
  • Error: Failed to download model
  • Error: Connection timeout
Diagnosis: Check Lightning ASR logs:
docker compose logs lightning-asr
Check .env file:
cat .env | grep MODEL_URL
Test URL accessibility:
curl -I "${MODEL_URL}"
Models require ~20-30 GB of free disk space. Check available space:
df -h
Free up space if needed (this removes all stopped containers and all unused images, networks, and build cache):
docker system prune -a
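The disk space check can be turned into a pass/fail test before starting a download. A minimal sketch, assuming a POSIX df; the helper names free_gb and has_free_gb are illustrative.

```shell
# free_gb DIR
# Prints the free space on the filesystem holding DIR, in whole GB (rounded down).
free_gb() {
  # -P forces the stable POSIX column layout; -k reports 1024-byte blocks.
  df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# has_free_gb DIR REQUIRED_GB
# Returns 0 if DIR has at least REQUIRED_GB gigabytes free.
has_free_gb() {
  [ "$(free_gb "$1")" -ge "$2" ]
}

# Example, using the upper end of the 20-30 GB estimate:
#   has_free_gb "$HOME" 30 || echo "Not enough free space for the model"
```

Point the check at the directory where models are actually stored, since Docker data and the home directory may live on different filesystems.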
Download the model manually and mount it as a volume:
mkdir -p ~/models
cd ~/models
wget "${MODEL_URL}" -O model.bin
Update docker-compose.yml:
lightning-asr:
  volumes:
    - ~/models:/app/models
For slow connections, increase download timeout:
lightning-asr:
  environment:
    - DOWNLOAD_TIMEOUT=3600

Port Already in Use

Symptoms:
  • Error: port is already allocated
  • Error: bind: address already in use
Diagnosis: Find what’s using the port:
sudo lsof -i :7100
sudo netstat -tulpn | grep 7100
If another service is using the port:
sudo systemctl stop [service-name]
Or kill the process:
sudo kill -9 [PID]
Modify docker-compose.yml to publish a different host port:
api-server:
  ports:
    - "8080:7100"
Then access the API at http://localhost:8080 instead.
Old containers may still hold the port. Remove them and restart:
docker compose down
docker container prune -f
docker compose up -d
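To check a port from a script before starting the stack, one option is to parse the output of ss -ltn. This is a sketch under assumptions: ss comes from iproute2 and is not mentioned elsewhere in this guide, and port_listening is a hypothetical helper that only sees TCP listeners.

```shell
# port_listening PORT
# Reads `ss -ltn` output on stdin and returns 0 if a TCP listener is bound
# to PORT, 1 otherwise. The fourth column is "Local Address:Port".
port_listening() {
  awk -v port="$1" '
    NR > 1 && $4 ~ (":" port "$") { found = 1 }   # skip the header row
    END { exit !found }
  '
}

# Example:
#   if ss -ltn | port_listening 7100; then
#     echo "Port 7100 is already in use"
#   fi
```

The match is anchored on ":PORT" at the end of the column, so checking 7100 does not match 17100 or 71000.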

Out of Memory

Symptoms:
  • Container killed unexpectedly
  • Error: OOMKilled
  • System becomes unresponsive
Diagnosis: Check container status:
docker compose ps
docker inspect [container-name] | grep OOMKilled
Lightning ASR requires a minimum of 16 GB RAM. Check current memory:
free -h
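The same check can be done non-interactively by reading /proc/meminfo. A minimal sketch for Linux hosts; mem_total_gb and meets_min_ram are illustrative names. The value is rounded to the nearest GB because the kernel reports slightly less than the installed amount.

```shell
# mem_total_gb [MEMINFO_FILE]
# Prints total RAM rounded to the nearest GB. Reads /proc/meminfo by default.
mem_total_gb() {
  # MemTotal is reported in kB.
  awk '/^MemTotal:/ { printf "%.0f\n", $2 / 1024 / 1024 }' "${1:-/proc/meminfo}"
}

# meets_min_ram REQUIRED_GB [MEMINFO_FILE]
# Returns 0 if total RAM is at least REQUIRED_GB.
meets_min_ram() {
  [ "$(mem_total_gb "${2:-/proc/meminfo}")" -ge "$1" ]
}

# Example, using the 16 GB minimum stated above:
#   meets_min_ram 16 || echo "Host has only $(mem_total_gb) GB of RAM"
```

This reports installed memory, not memory currently free, so free -h is still needed to see what other processes are using.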
Prevent one service from consuming all memory:
services:
  lightning-asr:
    deploy:
      resources:
        limits:
          memory: 14G
        reservations:
          memory: 12G
Add swap space (temporary solution):
sudo fallocate -l 16G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
Use a smaller model, or reduce batch size and precision:
lightning-asr:
  environment:
    - BATCH_SIZE=1
    - MODEL_PRECISION=fp16

Container Keeps Restarting

Symptoms:
  • Container status shows Restarting
  • Logs show crash loop
Diagnosis: View recent logs:
docker compose logs --tail=100 [service-name]
docker inspect [container-name] --format='{{.State.ExitCode}}'
Common exit codes:
  • 137: Killed by SIGKILL, usually out of memory (OOMKilled)
  • 139: Segmentation fault
  • 1: General error
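Exit codes above 128 follow the convention 128 + signal number, which is why 137 corresponds to SIGKILL (9) and 139 to SIGSEGV (11). The helper below is a small illustrative sketch of that mapping; explain_exit_code is a hypothetical name.

```shell
# explain_exit_code CODE
# Prints a short description of a container exit code.
explain_exit_code() {
  code=$1
  case "$code" in
    0)   echo "exited cleanly" ;;
    1)   echo "general error" ;;
    137) echo "killed by SIGKILL (signal 9), often the OOM killer" ;;
    139) echo "segmentation fault (SIGSEGV, signal 11)" ;;
    *)
      # Codes above 128 mean the process was terminated by a signal.
      if [ "$code" -gt 128 ] 2>/dev/null; then
        echo "terminated by signal $((code - 128))"
      else
        echo "application-specific error"
      fi
      ;;
  esac
}

# Example:
#   explain_exit_code "$(docker inspect [container-name] --format='{{.State.ExitCode}}')"
```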
Temporarily disable restart to debug:
lightning-asr:
  restart: "no"
Start manually and watch logs:
docker compose up lightning-asr
Ensure required services are healthy:
docker compose ps
All should show Up (healthy) or Up

Slow Performance

Symptoms:
  • High latency (>500ms)
  • Low throughput
  • GPU underutilized
Diagnosis: Monitor GPU usage:
watch -n 1 nvidia-smi
Check container resources:
docker stats
Ensure GPU is not throttling:
nvidia-smi -q -d PERFORMANCE
Enable persistence mode:
sudo nvidia-smi -pm 1
Raise the CPU limit if the container is CPU-bound:
lightning-asr:
  deploy:
    resources:
      limits:
        cpus: '8'
For maximum network performance, use host networking (loses network isolation):
api-server:
  network_mode: host
Use Redis with persistence disabled for speed:
redis:
  command: redis-server --save ""
Scale Lightning ASR workers:
docker compose up -d --scale lightning-asr=2

Performance Optimization

Best Practices

1. Use Persistent Volumes

Cache models to avoid re-downloading:
volumes:
  - model-cache:/app/models
2. Enable GPU Persistence Mode

Reduces GPU initialization time:
sudo nvidia-smi -pm 1
3. Optimize Container Resources

Allocate appropriate CPU/memory:
deploy:
  resources:
    limits:
      cpus: '8'
      memory: 14G
4. Monitor and Tune

Use monitoring tools:
docker stats
nvidia-smi dmon

Benchmark Your Deployment

Test transcription performance:
time curl -X POST http://localhost:7100/v1/listen \
  -H "Authorization: Token ${LICENSE_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/test-audio-60s.wav"
  }'
Expected performance:
  • Cold start: First request after container start (5-10 seconds)
  • Warm requests: Subsequent requests (50-200ms)
  • Real-time factor: 0.05-0.15x (60s audio in 3-9 seconds)
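The real-time factor is processing time divided by audio duration, so 60 s of audio processed in 6 s gives 6 / 60 = 0.1. A small sketch of the arithmetic; rtf is an illustrative helper name.

```shell
# rtf PROCESSING_SECONDS AUDIO_SECONDS
# Prints the real-time factor (processing time / audio duration) to 3 decimals.
rtf() {
  awk -v p="$1" -v a="$2" 'BEGIN { printf "%.3f\n", p / a }'
}

# Example: time a request with curl's built-in timer, then compute the factor
# for a 60-second test file:
#   elapsed=$(curl -s -o /dev/null -w '%{time_total}' -X POST \
#     http://localhost:7100/v1/listen \
#     -H "Authorization: Token ${LICENSE_KEY}" \
#     -H "Content-Type: application/json" \
#     -d '{"url": "https://example.com/test-audio-60s.wav"}')
#   rtf "$elapsed" 60
```

Values between 0.050 and 0.150 fall within the expected range listed above. Measure a warm request, since the first request after startup includes model initialization.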

Debugging Tools

View All Logs

docker compose logs -f

Follow Specific Service

docker compose logs -f lightning-asr

Last N Lines

docker compose logs --tail=100 api-server

Save Logs to File

docker compose logs > deployment-logs.txt

Execute Commands in Container

docker compose exec lightning-asr bash

Check Container Configuration

docker inspect lightning-asr-1

Network Debugging

Test connectivity between containers:
docker compose exec api-server ping lightning-asr
docker compose exec api-server curl http://lightning-asr:2233/health

Health Checks

API Server

curl http://localhost:7100/health
Expected: {"status": "healthy"}

Lightning ASR

curl http://localhost:2233/health
Expected: {"status": "ready", "gpu": "NVIDIA A10"}

License Proxy

docker compose exec license-proxy wget -q -O- http://localhost:6699/health
Expected: {"status": "valid"}

Redis

docker compose exec redis redis-cli ping
Expected: PONG
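Because Lightning ASR can take a while to load its model, a health check run right after startup may fail even when nothing is wrong. One way to handle this is a simple retry loop. The sketch below is an illustration only; wait_until is a hypothetical helper, and the retry counts in the examples are arbitrary.

```shell
# wait_until TRIES DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND up to TRIES times, sleeping DELAY_SECONDS between attempts.
# Returns 0 as soon as COMMAND succeeds, or 1 if every attempt fails.
wait_until() {
  tries=$1
  delay=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$tries" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    attempt=$((attempt + 1))
    # Do not sleep after the final attempt.
    [ "$attempt" -le "$tries" ] && sleep "$delay"
  done
  return 1
}

# Examples (curl -f makes HTTP error statuses count as failures):
#   wait_until 30 2 curl -fsS http://localhost:7100/health || echo "API server not healthy"
#   wait_until 60 5 curl -fsS http://localhost:2233/health || echo "Lightning ASR not ready"
```

This only checks that the endpoint responds successfully; it does not inspect the JSON body.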

Log Analysis

Common Log Patterns

redis-1              | Ready to accept connections
license-proxy        | License validated successfully
lightning-asr-1      | Model loaded successfully
lightning-asr-1      | GPU: NVIDIA A10 (24GB)
lightning-asr-1      | Server ready on port 2233
api-server           | Connected to Lightning ASR
api-server           | API server listening on port 7100
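A saved log file can be scanned for these startup messages to see which service never came up. This is a rough sketch that assumes the message text matches the patterns listed above exactly; missing_startup_lines is a hypothetical helper.

```shell
# missing_startup_lines LOG_FILE
# Prints each expected startup message that does not appear in LOG_FILE.
# Prints nothing when all messages are present.
missing_startup_lines() {
  while IFS= read -r pattern; do
    # -F matches the pattern as a fixed string, not a regular expression.
    grep -Fq -- "$pattern" "$1" || echo "$pattern"
  done <<'EOF'
Ready to accept connections
License validated successfully
Model loaded successfully
Server ready on port 2233
Connected to Lightning ASR
API server listening on port 7100
EOF
}

# Example:
#   docker compose logs > all-logs.txt
#   missing_startup_lines all-logs.txt
```

The first message reported as missing is usually the best place to start reading the logs.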

Getting Help

Before Contacting Support

Collect the following information:
1. System Information

docker version
docker compose version
nvidia-smi
uname -a
2. Container Status

docker compose ps > status.txt
docker stats --no-stream > resources.txt
3. Logs

docker compose logs > all-logs.txt
4. Configuration

Sanitize and include:
  • docker-compose.yml
  • .env (remove license key)
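Removing the license key from .env can be done with sed rather than by hand. A minimal sketch; sanitize_env is an illustrative name, and it only redacts LICENSE_KEY, so review the output for other sensitive values (for example credentials embedded in MODEL_URL) before sending it.

```shell
# sanitize_env FILE
# Prints FILE to stdout with the value of LICENSE_KEY replaced by REDACTED.
# The original file is not modified.
sanitize_env() {
  sed 's/^\([[:space:]]*LICENSE_KEY[[:space:]]*=\).*/\1REDACTED/' "$1"
}

# Example:
#   sanitize_env .env > env-sanitized.txt
```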

Contact Support

Email support and include:
  • Description of the issue
  • Steps to reproduce
  • System information
  • Logs and configuration
  • License key (via secure channel)
