Overview
AI models for Lightning ASR are large files (20-30 GB) that significantly impact startup time. This guide covers strategies for efficient model storage and caching to minimize download time and enable fast scaling.
Storage Strategies
Strategy 1: Shared EFS Volume (Recommended)
Best for production with autoscaling.
Advantages:
- Models downloaded once, shared across all pods
- New pods start in 30-60 seconds
- No storage duplication
- Enables horizontal scaling
values.yaml
Strategy 2: Container Image with Baked Model
Best for fixed deployments with infrequent updates.
Advantages:
- Fastest startup (model pre-loaded)
- No external download required
- Works offline
Disadvantages:
- Very large container images (20+ GB)
- Slow image pulls
- Updates require a new image build
Dockerfile
values.yaml
Strategy 3: EmptyDir Volume
Best for development/testing only.
Advantages:
- Simple configuration
- No external storage required
Disadvantages:
- Model downloaded on every pod start
- Cannot scale beyond single node
- Data lost on pod restart
values.yaml
Strategy 4: Init Container with S3
Best for AWS deployments without EFS.
Advantages:
- Fast downloads from S3 within AWS
- No EFS cost
- Works with ReadWriteOnce volumes
Disadvantages:
- Each pod downloads independently
- Slower scaling than EFS
- Requires an S3 bucket
Model Download Optimization
Parallel Downloads
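One way to fetch several model files at once is a small thread pool, since downloads are I/O-bound. The sketch below is a minimal illustration using only the Python standard library; `download_one`, `download_all`, and the `workers` default are hypothetical names and values, not part of Lightning ASR or its Helm chart.

```python
import concurrent.futures
import shutil
import urllib.request
from pathlib import Path


def download_one(url: str, dest_dir: Path) -> Path:
    """Stream one URL into dest_dir, named after the last URL path segment."""
    dest = dest_dir / url.rsplit("/", 1)[-1]
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        # Copy in 1 MiB chunks so a multi-GB file never sits in memory.
        shutil.copyfileobj(resp, out, length=1024 * 1024)
    return dest


def download_all(urls: list[str], dest_dir: Path, workers: int = 4) -> list[Path]:
    """Download every URL concurrently; raises if any single download fails."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        # map() keeps results in input order and re-raises worker exceptions.
        return list(pool.map(lambda u: download_one(u, dest_dir), urls))
```

Calling `download_all` with the list of model file URLs and the cache directory returns the local paths in the same order as the input.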
When there are multiple model files, download them in parallel.
Resume on Failure
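Resuming usually means asking the server for only the bytes that are still missing, via an HTTP `Range` header. The following is a minimal sketch under the assumption that the model is served over HTTP(S) by a server that honours range requests; `build_resume_request` and `download_with_resume` are hypothetical helper names.

```python
import urllib.request
from pathlib import Path


def build_resume_request(url: str, dest: Path) -> tuple[urllib.request.Request, int]:
    """Return a request that asks only for the bytes missing from dest."""
    offset = dest.stat().st_size if dest.exists() else 0
    req = urllib.request.Request(url)
    if offset:
        req.add_header("Range", f"bytes={offset}-")
    return req, offset


def download_with_resume(url: str, dest: Path, chunk: int = 1024 * 1024) -> None:
    """Continue a partial download when the server honours Range requests."""
    req, offset = build_resume_request(url, dest)
    with urllib.request.urlopen(req) as resp:
        # 206 means the server sent only the requested tail. Anything else
        # means it sent the whole file, so the partial data is overwritten.
        resuming = offset > 0 and getattr(resp, "status", None) == 206
        with open(dest, "ab" if resuming else "wb") as out:
            while data := resp.read(chunk):
                out.write(data)
```

If the file on disk is already complete, a range-aware server typically answers with status 416, which `urlopen` raises as `urllib.error.HTTPError`; a caller would need to handle that case.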
Enable download resume so that interrupted downloads continue instead of starting from zero.
CDN Acceleration
Use CloudFront for faster downloads.
Model Versioning
Multiple Models
Support multiple model versions:
values.yaml
Blue-Green Deployments
Deploy the new model version alongside the old one.
Storage Quotas
Limit Model Cache Size
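One possible way to cap a cache that holds several model versions is to delete the oldest version directories once a byte budget is exceeded. This is an illustrative sketch, not the mechanism the chart uses; it assumes one subdirectory per model version, and `dir_size`, `prune_cache`, and the `keep` parameter are hypothetical.

```python
import shutil
from pathlib import Path
from typing import Iterable


def dir_size(path: Path) -> int:
    """Total size in bytes of all regular files under path."""
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())


def prune_cache(cache_dir: Path, max_bytes: int, keep: Iterable[str] = ()) -> list[str]:
    """Delete the oldest model directories until the cache fits max_bytes.

    Directories named in keep (for example the version currently being
    served) are never removed, even if the budget is still exceeded.
    """
    protected = set(keep)
    models = [d for d in cache_dir.iterdir() if d.is_dir()]
    sizes = {d: dir_size(d) for d in models}
    total = sum(sizes.values())
    removed = []
    # Visit directories from oldest to newest modification time.
    for d in sorted(models, key=lambda d: d.stat().st_mtime):
        if total <= max_bytes:
            break
        if d.name in protected:
            continue
        shutil.rmtree(d)
        total -= sizes[d]
        removed.append(d.name)
    return removed
```

The function returns the names of the directories it removed, which makes it easy to log what was evicted.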
Prevent unbounded growth of the model cache.
Monitor Storage Usage
Check PVC usage.
Pre-warming Models
Pre-download Before Scaling
Download models before peak traffic.
Scheduled Pre-warming
Use a CronJob for regular pre-warming.
Model Integrity
Checksum Validation
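A common way to validate a downloaded file is to compare its SHA-256 digest with a published value. The sketch below assumes such a digest is available for the model; `sha256_of` and `verify_model` are hypothetical helper names, and the file is hashed in chunks so a 20-30 GB model is never loaded into memory at once.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk: int = 1024 * 1024) -> str:
    """Hex SHA-256 digest of a file, read in 1 MiB chunks by default."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_model(path: Path, expected_sha256: str) -> bool:
    """True if the file's digest matches the expected hex digest."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

A caller would typically delete the file and download it again when `verify_model` returns `False`.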
Verify model integrity after download.
Automatic Retry
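Retries are often implemented with exponential backoff so that a struggling server is not hit again immediately. The following is a minimal sketch of that pattern; `retry`, its defaults, and the injectable `sleep` parameter are hypothetical, and it retries only on `OSError`, the base class of `urllib.error.URLError` and of socket timeouts.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    action: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call action until it succeeds, doubling the wait after each failure."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except OSError:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")
```

Wrapping a download call in a zero-argument function or `lambda` and passing it as `action` is enough to reuse this with any download routine.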
Retry failed downloads.
Performance Comparison
| Strategy | First Start | Subsequent Starts | Scaling Speed | Cost |
|---|---|---|---|---|
| EFS Shared | 5-10 min | 30-60 sec | Fast | Medium |
| Baked Image | 3-5 min | 3-5 min | Medium | Low |
| EmptyDir | 5-10 min | 5-10 min | Slow | Low |
| S3 Init | 2-5 min | 2-5 min | Medium | Low |
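Download-bound start times scale with model size and sustained throughput. As a rough sanity check (not a measurement of any strategy in the table), the ideal transfer time can be estimated as size divided by throughput; `transfer_seconds` is a hypothetical helper and the 1 Gbit/s figure is an illustrative assumption.

```python
def transfer_seconds(size_gb: float, throughput_gbit_s: float) -> float:
    """Ideal transfer time in seconds.

    size_gb is in gigabytes (10**9 bytes) and throughput_gbit_s in gigabits
    per second. Real downloads are slower: this ignores protocol overhead,
    disk write speed, and throttling.
    """
    if throughput_gbit_s <= 0:
        raise ValueError("throughput must be positive")
    return size_gb * 8 / throughput_gbit_s


if __name__ == "__main__":
    for size_gb in (20, 30):
        secs = transfer_seconds(size_gb, throughput_gbit_s=1.0)
        print(f"{size_gb} GB at 1 Gbit/s: at least {secs / 60:.1f} min")
```

For example, a 30 GB model at a sustained 1 Gbit/s needs at least 240 seconds, before any loading or initialization time.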
Best Practices
Use EFS for Production
Always use shared storage (EFS) for production deployments with autoscaling. The cost savings from reduced download time and faster scaling far outweigh EFS costs.
Monitor Download Progress
Watch the logs during the first deployment and look for download progress indicators.
Set Resource Limits
Ensure sufficient storage for models:
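Independently of how the volume is sized, a pre-flight check before downloading can confirm that the target mount actually has room. This is a minimal sketch using `shutil.disk_usage`; `has_room_for`, the 20% headroom default, and the `/models` path in the usage note are assumptions for illustration.

```python
import shutil
from pathlib import Path

GB = 10**9  # decimal gigabyte, matching the "20-30 GB" sizing in this guide


def has_room_for(model_bytes: int, mount: Path, headroom: float = 0.2) -> bool:
    """True if mount has free space for the model plus a safety margin.

    headroom is a fraction of the model size kept free for temporary files
    and partial downloads (0.2 means 20% extra).
    """
    free = shutil.disk_usage(mount).free
    return free >= model_bytes * (1 + headroom)
```

For instance, `has_room_for(30 * GB, Path("/models"))` would return `True` only if at least 36 GB is free on that mount.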
Test Model Updates
Test new models in a separate deployment before updating production.

