Ensure you’ve completed all prerequisites before starting.
Add Helm Repository
Create Namespace
Configure Values
Create a values.yaml file:
values.yaml
Install
| Component | Startup Time | Ready Indicator |
|---|---|---|
| Redis | ~30s | 1/1 Running |
| License Proxy | ~1m | 1/1 Running |
| Lightning ASR | 2-10m | 1/1 Running (model download on first run) |
| API Server | ~30s | 1/1 Running |
Verify Installation
All pods should reach Running status, with the following services available:
| Service | Port | Description |
|---|---|---|
| api-server | 7100 | REST API endpoint |
| lightning-asr-internal | 2269 | ASR inference service |
| license-proxy | 3369 | License validation |
| redis-master | 6379 | Request queue |
Test the API
Port forward and send a health check:
Autoscaling
Enable automatic scaling based on real-time inference load:
values.yaml
| Component | Metric | Default Target | Behavior |
|---|---|---|---|
| Lightning ASR | asr_active_requests | 4 per pod | Scales GPU workers based on inference queue depth |
| API Server | lightning_asr_replica_count | 2:1 ratio | Maintains API capacity proportional to ASR workers |
How It Works
- Lightning ASR exposes the asr_active_requests metric on port 9090
- Prometheus scrapes this metric via ServiceMonitor
- Prometheus Adapter makes it available to the Kubernetes metrics API
- HPA scales pods when average requests per pod exceeds target
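The scaling decision in the last step can be sketched with the standard Kubernetes HPA formula, desired = ceil(current replicas × current metric / target metric). The Python below is an illustrative model only, not code from the chart: the function name and the min_replicas, max_replicas, and tolerance defaults are assumptions chosen for the example, while the target of 4 requests per pod comes from the table above.

```python
import math


def desired_replicas(current_replicas, avg_requests_per_pod, target_per_pod=4.0,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Model the HPA replica calculation for an average-per-pod metric.

    min_replicas, max_replicas and tolerance are hypothetical defaults for
    this sketch; the real values come from the HPA spec and cluster settings.
    """
    if current_replicas <= 0 or target_per_pod <= 0:
        raise ValueError("current_replicas and target_per_pod must be positive")

    # Ratio of observed load to the per-pod target (1.0 means exactly on target).
    ratio = avg_requests_per_pod / target_per_pod

    # HPA skips scaling when the ratio is close enough to 1.0.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas

    # Round up so capacity is never below what the target requires.
    desired = math.ceil(current_replicas * ratio)

    # Clamp to the configured replica bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Two ASR pods averaging 6 active requests each against a target of 4.
    print(desired_replicas(2, 6.0))
```

Under these assumptions, two pods averaging 6 active requests each scale to three pods, while an average of 4.2 stays at two pods because it falls inside the tolerance band.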
Configuration
values.yaml
Verify Autoscaling
The TARGETS column shows current/target values. When the current value exceeds the target, pods scale up.
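For scripted checks, a TARGETS cell can be read programmatically. The sketch below assumes a single-metric cell in the form current/target, where kubectl may print fractional values in milli-units (for example 3500m for 3.5) and shows <unknown> until the metric is available. The helper names are hypothetical, and multi-metric or percentage cells are not handled.

```python
def parse_quantity(text):
    """Convert a plain number or a milli-unit value such as '3500m' to a float."""
    text = text.strip()
    if text.endswith("m"):
        return float(text[:-1]) / 1000.0
    return float(text)


def parse_targets(cell):
    """Split a 'current/target' cell into (current, target).

    current is None when the metric has not been reported yet.
    """
    current_text, target_text = cell.strip().split("/", 1)
    target = parse_quantity(target_text)
    if current_text.strip() == "<unknown>":
        return None, target
    return parse_quantity(current_text), target


def exceeds_target(cell):
    """Return True when the current value is above the target."""
    current, target = parse_targets(cell)
    return current is not None and current > target


if __name__ == "__main__":
    for cell in ("6/4", "3500m/4", "<unknown>/4"):
        print(cell, exceeds_target(cell))
```

A cell of <unknown>/4 usually means the metric is not yet reaching the Kubernetes metrics API, so the Prometheus and Prometheus Adapter steps are worth checking first.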
Autoscaling requires the Prometheus stack. It’s included as a dependency and enabled by default.
Helm Operations
Troubleshooting
| Issue | Cause | Resolution |
|---|---|---|
| Pods Pending | Insufficient resources or missing GPU nodes | Check kubectl describe pod <name> for scheduling errors |
| ImagePullBackOff | Invalid registry credentials | Verify imageCredentials in values.yaml |
| CrashLoopBackOff | Invalid license or insufficient memory | Check logs with kubectl logs <pod> --previous |
| Slow model download | Large model size (~20GB) | Use shared storage (EFS) for caching |

