
Overview

AI models for Lightning ASR are large files (20-30 GB) that significantly impact startup time. This guide covers strategies for efficient model storage and caching to minimize download time and enable fast scaling.

Storage Strategies

Strategy 1: Shared Storage with EFS

Best for production with autoscaling. Advantages:
  • Models downloaded once, shared across all pods
  • New pods start in 30-60 seconds
  • No storage duplication
  • Enables horizontal scaling
Implementation:
values.yaml
models:
  asrModelUrl: "https://example.com/model.bin"
  volumes:
    aws:
      efs:
        enabled: true
        fileSystemId: "fs-0123456789abcdef"
        namePrefix: "models"

scaling:
  auto:
    enabled: true
    lightningAsr:
      hpa:
        enabled: true
        maxReplicas: 10
See EFS Configuration for setup.

Strategy 2: Container Image with Baked Model

Best for fixed deployments with infrequent updates. Advantages:
  • Fastest startup (model pre-loaded)
  • No external download required
  • Works offline
Disadvantages:
  • Very large container images (20+ GB)
  • Slow image pulls
  • Updates require new image build
Implementation: Build custom image:
Dockerfile
FROM quay.io/smallestinc/lightning-asr:latest

RUN mkdir -p /app/models && wget -O /app/models/model.bin https://example.com/model.bin

ENV MODEL_PATH=/app/models/model.bin
Build and push:
docker build -t myregistry/lightning-asr:with-model .
docker push myregistry/lightning-asr:with-model
Update values:
values.yaml
lightningAsr:
  image: "myregistry/lightning-asr:with-model"

models:
  asrModelUrl: ""

Strategy 3: EmptyDir Volume

Best for development/testing only. Advantages:
  • Simple configuration
  • No external storage required
Disadvantages:
  • Model downloaded on every pod start
  • Cannot scale beyond single node
  • Data lost on pod restart
Implementation:
values.yaml
models:
  asrModelUrl: "https://example.com/model.bin"
  volumes:
    aws:
      efs:
        enabled: false

lightningAsr:
  persistence:
    enabled: false
Each pod downloads the model independently.

Strategy 4: Init Container with S3

Best for AWS deployments without EFS. Advantages:
  • Fast downloads from S3 within AWS
  • No EFS cost
  • Works with ReadWriteOnce volumes
Disadvantages:
  • Each pod downloads independently
  • Slower scaling than EFS
  • Requires S3 bucket
Implementation: Upload model to S3:
aws s3 cp model.bin s3://my-bucket/models/model.bin
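The init container pulls from this bucket using the pod's AWS identity (node instance role or IRSA); a minimal IAM policy sketch granting the required read access, with the bucket and prefix matching the example above:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::my-bucket/models/*"
    }
  ]
}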
Create custom deployment with init container:
initContainers:
  - name: download-model
    image: amazon/aws-cli
    command:
      - sh
      - -c
      - |
        if [ ! -f /models/model.bin ]; then
          aws s3 cp s3://my-bucket/models/model.bin /models/model.bin
        fi
    volumeMounts:
      - name: model-cache
        mountPath: /models
    env:
      - name: AWS_REGION
        value: us-east-1
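The model-cache volume referenced by the init container must also be defined in the pod spec and mounted into the main Lightning ASR container; a minimal sketch, assuming a ReadWriteOnce PVC named model-cache-pvc (the PVC name is illustrative):
containers:
  - name: lightning-asr
    # ...existing container spec...
    volumeMounts:
      - name: model-cache
        mountPath: /app/models
volumes:
  - name: model-cache
    persistentVolumeClaim:
      claimName: model-cache-pvc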

Model Download Optimization

Parallel Downloads

For multiple model files, download in parallel:
lightningAsr:
  env:
    - name: MODEL_DOWNLOAD_WORKERS
      value: "4"

Resume on Failure

Enable download resume for interrupted downloads:
lightningAsr:
  env:
    - name: MODEL_DOWNLOAD_RESUME
      value: "true"

CDN Acceleration

Use CloudFront for faster downloads:
models:
  asrModelUrl: "https://d111111abcdef8.cloudfront.net/model.bin"

Model Versioning

Multiple Models

Support multiple model versions:
values.yaml
models:
  asrModelUrl: "https://example.com/model-v1.bin"
  
lightningAsr:
  env:
    - name: MODEL_VERSION
      value: "v1"
    - name: MODEL_CACHE_DIR
      value: "/app/models/v1"

Blue-Green Deployments

Deploy new model version alongside old:
helm install smallest-v2 smallest-self-host/smallest-self-host \
  -f values.yaml \
  --set models.asrModelUrl="https://example.com/model-v2.bin" \
  --set lightningAsr.namePrefix="lightning-asr-v2" \
  --namespace smallest
Test v2, then switch traffic:
apiServer:
  env:
    - name: LIGHTNING_ASR_BASE_URL
      value: "http://lightning-asr-v2:2269"

Storage Quotas

Limit Model Cache Size

Prevent unbounded growth:
lightningAsr:
  persistence:
    enabled: true
    size: 100Gi

  env:
    - name: MODEL_CACHE_MAX_SIZE
      value: "50GB"
    - name: MODEL_CACHE_EVICTION
      value: "lru"

Monitor Storage Usage

Check PVC usage:
kubectl get pvc -n smallest
kubectl describe pvc models-aws-efs-pvc -n smallest
Check actual usage in pod:
kubectl exec -it <lightning-asr-pod> -n smallest -- df -h /app/models
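To see which cached files account for that usage, a per-entry breakdown helps:
kubectl exec -it <lightning-asr-pod> -n smallest -- sh -c "du -sh /app/models/*"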

Pre-warming Models

Pre-download Before Scaling

Download models before peak traffic:
kubectl create job model-preload \
  --image=quay.io/smallestinc/lightning-asr:latest \
  --namespace=smallest \
  -- sh -c "wget -O /app/models/model.bin $MODEL_URL && exit 0"

Scheduled Pre-warming

Use CronJob for regular pre-warming:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: model-preload
  namespace: smallest
spec:
  schedule: "0 8 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: preload
            image: quay.io/smallestinc/lightning-asr:latest
            command:
              - sh
              - -c
              - wget -O /app/models/model.bin $MODEL_URL || true
            env:
              - name: MODEL_URL
                value: "https://example.com/model.bin"
            volumeMounts:
              - name: models
                mountPath: /app/models
          volumes:
            - name: models
              persistentVolumeClaim:
                claimName: models-aws-efs-pvc
          restartPolicy: OnFailure
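To verify the CronJob without waiting for the schedule, trigger a one-off run from it and watch its logs:
kubectl create job model-preload-manual \
  --from=cronjob/model-preload \
  --namespace smallest
kubectl logs -f job/model-preload-manual -n smallest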

Model Integrity

Checksum Validation

Verify model integrity after download:
lightningAsr:
  env:
    - name: MODEL_CHECKSUM
      value: "sha256:abc123..."
    - name: MODEL_VALIDATE
      value: "true"

Automatic Retry

Retry failed downloads:
lightningAsr:
  env:
    - name: MODEL_DOWNLOAD_RETRIES
      value: "3"
    - name: MODEL_DOWNLOAD_TIMEOUT
      value: "3600"

Performance Comparison

Strategy        First Start    Subsequent Starts    Scaling Speed    Cost
EFS Shared      5-10 min       30-60 sec            Fast             Medium
Baked Image     3-5 min        3-5 min              Medium           Low
EmptyDir        5-10 min       5-10 min             Slow             Low
S3 Init         2-5 min        2-5 min              Medium           Low

Best Practices

Always use shared storage (EFS) for production deployments with autoscaling. The cost savings from reduced download time and faster scaling far outweigh EFS costs.
Watch logs during first deployment:
kubectl logs -f -l app=lightning-asr -n smallest
Look for download progress indicators.
Ensure sufficient storage for models:
models:
  volumes:
    aws:
      efs:
        enabled: true

lightningAsr:
  resources:
    ephemeral-storage: "50Gi"
Test new models in separate deployment before updating production:
helm install test smallest-self-host/smallest-self-host \
  --set models.asrModelUrl="new-model-url" \
  --namespace smallest-test

Troubleshooting

Model Download Stalled

Check pod logs:
kubectl logs -l app=lightning-asr -n smallest --tail=100
Check network connectivity:
kubectl exec -it <pod> -n smallest -- wget --spider $MODEL_URL

Insufficient Storage

Check available space:
kubectl exec -it <pod> -n smallest -- df -h
Increase PVC size:
models:
  volumes:
    aws:
      efs:
        enabled: true

lightningAsr:
  persistence:
    size: 200Gi

Model Corruption

Delete cached model and restart:
kubectl exec -it <pod> -n smallest -- rm -rf /app/models/*
kubectl delete pod <pod> -n smallest
