
Overview

AI models for Lightning ASR are large files (20-30 GB) that significantly impact startup time. This guide covers strategies for efficient model storage and caching to minimize download time and enable fast scaling.

Storage Strategies

Strategy 1: Shared Storage with EFS

Best for production with autoscaling. Advantages:
  • Models downloaded once, shared across all pods
  • New pods start in 30-60 seconds
  • No storage duplication
  • Enables horizontal scaling
Implementation:
values.yaml
models:
  asrModelUrl: "https://example.com/model.bin"
  volumes:
    aws:
      efs:
        enabled: true
        fileSystemId: "fs-0123456789abcdef"
        namePrefix: "models"

scaling:
  auto:
    enabled: true
    lightningAsr:
      hpa:
        enabled: true
        maxReplicas: 10
See EFS Configuration for setup.

Strategy 2: Container Image with Baked Model

Best for fixed deployments with infrequent updates. Advantages:
  • Fastest startup (model pre-loaded)
  • No external download required
  • Works offline
Disadvantages:
  • Very large container images (20+ GB)
  • Slow image pulls
  • Updates require new image build
Implementation: Build custom image:
Dockerfile
FROM quay.io/smallestinc/lightning-asr:latest

RUN mkdir -p /app/models && wget -O /app/models/model.bin https://example.com/model.bin

ENV MODEL_PATH=/app/models/model.bin
Build and push:
docker build -t myregistry/lightning-asr:with-model .
docker push myregistry/lightning-asr:with-model
Update values:
values.yaml
lightningAsr:
  image: "myregistry/lightning-asr:with-model"

models:
  asrModelUrl: ""

Strategy 3: EmptyDir Volume

Best for development/testing only. Advantages:
  • Simple configuration
  • No external storage required
Disadvantages:
  • Model downloaded on every pod start
  • Cannot scale beyond single node
  • Data lost on pod restart
Implementation:
values.yaml
models:
  asrModelUrl: "https://example.com/model.bin"
  volumes:
    aws:
      efs:
        enabled: false

lightningAsr:
  persistence:
    enabled: false
Each pod downloads the model independently.

Strategy 4: Init Container with S3

Best for AWS deployments without EFS. Advantages:
  • Fast downloads from S3 within AWS
  • No EFS cost
  • Works with ReadWriteOnce volumes
Disadvantages:
  • Each pod downloads independently
  • Slower scaling than EFS
  • Requires S3 bucket
Implementation: Upload model to S3:
aws s3 cp model.bin s3://my-bucket/models/model.bin
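The init container pulls from this bucket using the pod's AWS identity (node instance role or IRSA); a minimal IAM policy sketch granting the required read access, with the bucket and prefix matching the example above:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::my-bucket/models/*"
    }
  ]
}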
Create custom deployment with init container:
initContainers:
  - name: download-model
    image: amazon/aws-cli
    command:
      - sh
      - -c
      - |
        if [ ! -f /models/model.bin ]; then
          aws s3 cp s3://my-bucket/models/model.bin /models/model.bin
        fi
    volumeMounts:
      - name: model-cache
        mountPath: /models
    env:
      - name: AWS_REGION
        value: us-east-1
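The model-cache volume referenced by the init container must also be defined in the pod spec and mounted into the main Lightning ASR container; a minimal sketch, assuming a ReadWriteOnce PVC named model-cache-pvc (the PVC name is illustrative):
containers:
  - name: lightning-asr
    # ...existing container spec...
    volumeMounts:
      - name: model-cache
        mountPath: /app/models
volumes:
  - name: model-cache
    persistentVolumeClaim:
      claimName: model-cache-pvc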

Model Download Optimization

Parallel Downloads

For multiple model files, download in parallel:
lightningAsr:
  env:
    - name: MODEL_DOWNLOAD_WORKERS
      value: "4"

Resume on Failure

Enable download resume for interrupted downloads:
lightningAsr:
  env:
    - name: MODEL_DOWNLOAD_RESUME
      value: "true"

CDN Acceleration

Use CloudFront for faster downloads:
models:
  asrModelUrl: "https://d111111abcdef8.cloudfront.net/model.bin"

Model Versioning

Multiple Models

Support multiple model versions:
values.yaml
models:
  asrModelUrl: "https://example.com/model-v1.bin"
  
lightningAsr:
  env:
    - name: MODEL_VERSION
      value: "v1"
    - name: MODEL_CACHE_DIR
      value: "/app/models/v1"

Blue-Green Deployments

Deploy new model version alongside old:
helm install smallest-v2 smallest-self-host/smallest-self-host \
  -f values.yaml \
  --set models.asrModelUrl="https://example.com/model-v2.bin" \
  --set lightningAsr.namePrefix="lightning-asr-v2" \
  --namespace smallest
Test v2, then switch traffic:
apiServer:
  env:
    - name: LIGHTNING_ASR_BASE_URL
      value: "http://lightning-asr-v2:2269"

Storage Quotas

Limit Model Cache Size

Prevent unbounded growth:
lightningAsr:
  persistence:
    enabled: true
    size: 100Gi

  env:
    - name: MODEL_CACHE_MAX_SIZE
      value: "50GB"
    - name: MODEL_CACHE_EVICTION
      value: "lru"

Monitor Storage Usage

Check PVC usage:
kubectl get pvc -n smallest
kubectl describe pvc models-aws-efs-pvc -n smallest
Check actual usage in pod:
kubectl exec -it <lightning-asr-pod> -n smallest -- df -h /app/models
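To see which cached files account for that usage, a per-entry breakdown helps:
kubectl exec -it <lightning-asr-pod> -n smallest -- sh -c "du -sh /app/models/*"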

Pre-warming Models

Pre-download Before Scaling

Download models before peak traffic:
kubectl create job model-preload \
  --image=quay.io/smallestinc/lightning-asr:latest \
  --namespace=smallest \
  -- sh -c "wget -O /app/models/model.bin $MODEL_URL && exit 0"

Scheduled Pre-warming

Use CronJob for regular pre-warming:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: model-preload
  namespace: smallest
spec:
  schedule: "0 8 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: preload
            image: quay.io/smallestinc/lightning-asr:latest
            command:
              - sh
              - -c
              - wget -O /app/models/model.bin $MODEL_URL || true
            env:
              - name: MODEL_URL
                value: "https://example.com/model.bin"
            volumeMounts:
              - name: models
                mountPath: /app/models
          volumes:
            - name: models
              persistentVolumeClaim:
                claimName: models-aws-efs-pvc
          restartPolicy: OnFailure
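To verify the CronJob without waiting for the schedule, trigger a one-off run from it and watch its logs:
kubectl create job model-preload-manual \
  --from=cronjob/model-preload \
  --namespace smallest
kubectl logs -f job/model-preload-manual -n smallest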

Model Integrity

Checksum Validation

Verify model integrity after download:
lightningAsr:
  env:
    - name: MODEL_CHECKSUM
      value: "sha256:abc123..."
    - name: MODEL_VALIDATE
      value: "true"

Automatic Retry

Retry failed downloads:
lightningAsr:
  env:
    - name: MODEL_DOWNLOAD_RETRIES
      value: "3"
    - name: MODEL_DOWNLOAD_TIMEOUT
      value: "3600"

Performance Comparison

Strategy        First Start    Subsequent Starts    Scaling Speed    Cost
EFS Shared      5-10 min       30-60 sec            Fast             Medium
Baked Image     3-5 min        3-5 min              Medium           Low
EmptyDir        5-10 min       5-10 min             Slow             Low
S3 Init         2-5 min        2-5 min              Medium           Low

Best Practices

Always use shared storage (EFS) for production deployments with autoscaling. The cost savings from reduced download time and faster scaling far outweigh EFS costs.
Watch logs during first deployment:
kubectl logs -f -l app=lightning-asr -n smallest
Look for download progress indicators.
Ensure sufficient storage for models:
models:
  volumes:
    aws:
      efs:
        enabled: true

lightningAsr:
  resources:
    ephemeral-storage: "50Gi"
Test new models in separate deployment before updating production:
helm install test smallest-self-host/smallest-self-host \
  --set models.asrModelUrl="new-model-url" \
  --namespace smallest-test

Troubleshooting

Model Download Stalled

Check pod logs:
kubectl logs -l app=lightning-asr -n smallest --tail=100
Check network connectivity:
kubectl exec -it <pod> -n smallest -- wget --spider $MODEL_URL

Insufficient Storage

Check available space:
kubectl exec -it <pod> -n smallest -- df -h
Increase PVC size:
models:
  volumes:
    aws:
      efs:
        enabled: true

lightningAsr:
  persistence:
    size: 200Gi

Model Corruption

Delete cached model and restart:
kubectl exec -it <pod> -n smallest -- rm -rf /app/models/*
kubectl delete pod <pod> -n smallest
