## Overview

This page focuses on collecting and validating Lightning ASR metrics with Prometheus and exposing them through the Prometheus Adapter. Autoscaling documentation is currently under active development; use this page as a metrics reference. If you need autoscaling now, configure your own HPA/KEDA rules using these metrics.
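As a starting point, a minimal HPA that scales on the per-pod `asr_active_requests` custom metric might look like the sketch below. The Deployment name, replica bounds, and target value are assumptions; adjust them for your install.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: lightning-asr
  namespace: smallest
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: lightning-asr   # assumed Deployment name
  minReplicas: 1
  maxReplicas: 8
  metrics:
    - type: Pods
      pods:
        metric:
          name: asr_active_requests
        target:
          type: AverageValue
          averageValue: "10"   # illustrative target; tune for your workload
```

Any of the four metrics listed below can be substituted for `asr_active_requests` in the `pods.metric.name` field.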
## Architecture

Lightning ASR pods expose metrics that Prometheus scrapes via a ServiceMonitor; the Prometheus Adapter then serves those metrics through the Kubernetes custom metrics API.
## Components

### Prometheus

Collects and stores metrics from Lightning ASR pods. Included in the chart:

```yaml
scaling:
  auto:
    enabled: true

kube-prometheus-stack:
  prometheus:
    prometheusSpec:
      serviceMonitorSelectorNilUsesHelmValues: false
      retention: 7d
      resources:
        requests:
          memory: 2Gi
```
### ServiceMonitor

A CRD that tells Prometheus which services to scrape. Enabled for Lightning ASR:

```yaml
scaling:
  auto:
    lightningAsr:
      servicemonitor:
        enabled: true
```
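For reference, the rendered ServiceMonitor looks roughly like the sketch below. The field values follow the `kubectl describe servicemonitor` output shown in the verification section of this page; treat this as an approximation of the chart's output, not the exact manifest.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: lightning-asr
  namespace: smallest
spec:
  selector:
    matchLabels:
      app: lightning-asr
  endpoints:
    - port: metrics   # named port on the lightning-asr Service
      path: /metrics
```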
### Prometheus Adapter

Converts Prometheus metrics into the Kubernetes custom metrics API. Configuration:

```yaml
prometheus-adapter:
  prometheus:
    url: http://smallest-prometheus-stack-prometheus.default.svc
    port: 9090
  rules:
    custom:
      - seriesQuery: "asr_active_requests"
        resources:
          overrides:
            namespace: { resource: "namespace" }
            pod: { resource: "pod" }
        name:
          matches: "^(.*)$"
          as: "${1}"
        metricsQuery: "asr_active_requests{<<.LabelMatchers>>}"
      - seriesQuery: "asr_batch_queue_depth"
        resources:
          overrides:
            namespace: { resource: "namespace" }
            pod: { resource: "pod" }
        name:
          matches: "^(.*)$"
          as: "${1}"
        metricsQuery: "asr_batch_queue_depth{<<.LabelMatchers>>}"
      - seriesQuery: "asr_active_streams"
        resources:
          overrides:
            namespace: { resource: "namespace" }
            pod: { resource: "pod" }
        name:
          matches: "^(.*)$"
          as: "${1}"
        metricsQuery: "asr_active_streams{<<.LabelMatchers>>}"
      - seriesQuery: "asr_stream_queue_depth"
        resources:
          overrides:
            namespace: { resource: "namespace" }
            pod: { resource: "pod" }
        name:
          matches: "^(.*)$"
          as: "${1}"
        metricsQuery: "asr_stream_queue_depth{<<.LabelMatchers>>}"
```
## Available Metrics

Lightning ASR exposes the following metrics:

| Metric | Type | Description |
|---|---|---|
| `asr_active_requests` | Gauge | Active batch requests currently being processed on GPU |
| `asr_batch_queue_depth` | Gauge | Requests waiting in the batch queue |
| `asr_active_streams` | Gauge | Active streaming sessions |
| `asr_stream_queue_depth` | Gauge | Pending sessions in the streaming Redis queue |
## Verify Metrics Setup

### Check Prometheus

Forward the Prometheus port:

```shell
kubectl port-forward -n default svc/smallest-prometheus-stack-prometheus 9090:9090
```

Open http://localhost:9090 and verify:

- **Status → Targets**: Lightning ASR endpoints should be "UP"
- **Graph**: queries for `asr_active_requests` or `asr_batch_queue_depth` should return data
- **Status → Service Discovery**: should show the ServiceMonitor
### Check ServiceMonitor

```shell
kubectl get servicemonitor -n smallest
```

Expected output:

```
NAME            AGE
lightning-asr   5m
```

Describe the ServiceMonitor:

```shell
kubectl describe servicemonitor lightning-asr -n smallest
```

The output should include:

```
Spec:
  Endpoints:
    Port: metrics
    Path: /metrics
  Selector:
    Match Labels:
      app: lightning-asr
```
### Check Prometheus Adapter

Verify custom metrics are available:

```shell
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq -r '.resources[].name' | grep asr
```

Expected output:

```
pods/asr_active_requests
pods/asr_batch_queue_depth
pods/asr_active_streams
pods/asr_stream_queue_depth
```

Query a specific metric:

```shell
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/smallest/pods/*/asr_active_requests" | jq .
```
## Custom Metric Configuration

### Add New Custom Metrics

The chart configures one adapter rule per built-in metric. To expose additional metrics for your own autoscaling setup, add rules that follow the same pattern:

```yaml
prometheus-adapter:
  rules:
    custom:
      - seriesQuery: "asr_active_requests"
        resources:
          overrides:
            namespace: { resource: "namespace" }
            pod: { resource: "pod" }
        name:
          matches: "^(.*)$"
          as: "${1}"
        metricsQuery: "asr_active_requests{<<.LabelMatchers>>}"
      - seriesQuery: "asr_batch_queue_depth"
        resources:
          overrides:
            namespace: { resource: "namespace" }
            pod: { resource: "pod" }
        name:
          matches: "^(.*)$"
          as: "${1}"
        metricsQuery: "asr_batch_queue_depth{<<.LabelMatchers>>}"
      - seriesQuery: "asr_active_streams"
        resources:
          overrides:
            namespace: { resource: "namespace" }
            pod: { resource: "pod" }
        name:
          matches: "^(.*)$"
          as: "${1}"
        metricsQuery: "asr_active_streams{<<.LabelMatchers>>}"
      - seriesQuery: "asr_stream_queue_depth"
        resources:
          overrides:
            namespace: { resource: "namespace" }
            pod: { resource: "pod" }
        name:
          matches: "^(.*)$"
          as: "${1}"
        metricsQuery: "asr_stream_queue_depth{<<.LabelMatchers>>}"
```
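For example, a rule for a hypothetical `asr_gpu_utilization` gauge would follow the same shape. The metric name here is purely illustrative; substitute whatever your pods actually export.

```yaml
prometheus-adapter:
  rules:
    custom:
      # asr_gpu_utilization is a placeholder metric name, not shipped by the chart
      - seriesQuery: "asr_gpu_utilization"
        resources:
          overrides:
            namespace: { resource: "namespace" }
            pod: { resource: "pod" }
        name:
          matches: "^(.*)$"
          as: "${1}"
        metricsQuery: "asr_gpu_utilization{<<.LabelMatchers>>}"
```

After a Helm upgrade, the new metric should appear under `pods/` in the custom metrics API discovery output shown above.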
## Prometheus Configuration

### Retention Policy

Configure how long metrics are stored:

```yaml
kube-prometheus-stack:
  prometheus:
    prometheusSpec:
      retention: 15d
      retentionSize: "50GB"
```
### Storage

Persist Prometheus data:

```yaml
kube-prometheus-stack:
  prometheus:
    prometheusSpec:
      storageSpec:
        volumeClaimTemplate:
          spec:
            storageClassName: gp3
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 100Gi
```
### Scrape Interval

Adjust how frequently metrics are collected:

```yaml
kube-prometheus-stack:
  prometheus:
    prometheusSpec:
      scrapeInterval: 30s
      evaluationInterval: 30s
```

Lower intervals (e.g., 15s) give faster metric updates but increase storage usage.
### Recording Rules

Pre-compute expensive queries. Note that `additionalPrometheusRulesMap` is a top-level key of the kube-prometheus-stack chart, not part of `prometheusSpec`:

```yaml
kube-prometheus-stack:
  prometheus:
    prometheusSpec:
      additionalScrapeConfigs:
        - job_name: 'lightning-asr-aggregated'
          scrape_interval: 15s
          static_configs:
            - targets: ['lightning-asr:2269']
  additionalPrometheusRulesMap:
    asr-rules:
      groups:
        - name: asr_aggregations
          interval: 30s
          rules:
            - record: asr:requests:active_avg
              expr: avg(asr_active_requests) by (namespace)
            - record: asr:batch_queue:depth_avg
              expr: avg(asr_batch_queue_depth) by (namespace)
            - record: asr:streams:active_avg
              expr: avg(asr_active_streams) by (namespace)
            - record: asr:stream_queue:depth_avg
              expr: avg(asr_stream_queue_depth) by (namespace)
```

Use recording rules in your autoscaling queries for better performance.
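Since the recording rules aggregate by namespace rather than by pod, they fit the external metrics API better than the pods API. A sketch, assuming the prometheus-adapter Helm chart's `rules.external` list; the renamed metric (`asr_batch_queue_depth_avg`) is a choice made here to avoid colons in the API metric name:

```yaml
prometheus-adapter:
  rules:
    external:
      - seriesQuery: 'asr:batch_queue:depth_avg'
        resources:
          overrides:
            namespace: { resource: "namespace" }
        name:
          matches: "^.*$"
          as: "asr_batch_queue_depth_avg"   # colon-free name for the metrics API
        metricsQuery: 'asr:batch_queue:depth_avg{<<.LabelMatchers>>}'
```

An HPA can then reference this via a `type: External` metric instead of a per-pod metric.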
### Alerting Rules

Create alerts for anomalies (again via the chart's top-level `additionalPrometheusRulesMap`):

```yaml
kube-prometheus-stack:
  additionalPrometheusRulesMap:
    asr-alerts:
      groups:
        - name: asr_alerts
          rules:
            - alert: HighBatchQueueDepth
              expr: asr_batch_queue_depth > 20
              for: 5m
              labels:
                severity: warning
              annotations:
                summary: "ASR batch queue depth is high"
                description: "{{ $value }} requests are waiting in the batch queue"
            - alert: HighStreamQueueDepth
              expr: asr_stream_queue_depth > 30
              for: 2m
              labels:
                severity: warning
              annotations:
                summary: "ASR stream queue depth is high"
                description: "{{ $value }} streaming sessions are waiting in Redis"
            - alert: HighActiveStreams
              expr: asr_active_streams > 100
              for: 5m
              labels:
                severity: warning
              annotations:
                summary: "ASR active stream count is high"
                description: "{{ $value }} active streaming sessions"
```
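For these alerts to reach anyone, Alertmanager needs a matching route and receiver. A minimal sketch via the kube-prometheus-stack `alertmanager.config` values; the Slack webhook URL and channel are placeholders:

```yaml
kube-prometheus-stack:
  alertmanager:
    config:
      route:
        receiver: asr-oncall
        group_by: ['alertname', 'namespace']
      receivers:
        - name: asr-oncall
          slack_configs:
            - api_url: https://hooks.slack.com/services/REPLACE_ME  # placeholder
              channel: '#asr-alerts'                                # placeholder
```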
## Debugging Metrics

### Check Metrics Endpoint

Query Lightning ASR metrics directly:

```shell
kubectl port-forward -n smallest svc/lightning-asr 2269:2269
curl http://localhost:2269/metrics
```

Expected output:

```
# HELP asr_active_requests Current active requests
# TYPE asr_active_requests gauge
asr_active_requests{pod="lightning-asr-xxx"} 3
# HELP asr_batch_queue_depth Requests waiting in the batch queue
# TYPE asr_batch_queue_depth gauge
asr_batch_queue_depth{pod="lightning-asr-xxx"} 2
# HELP asr_active_streams Active streaming sessions
# TYPE asr_active_streams gauge
asr_active_streams{pod="lightning-asr-xxx"} 14
# HELP asr_stream_queue_depth Pending sessions in the streaming Redis queue
# TYPE asr_stream_queue_depth gauge
asr_stream_queue_depth{pod="lightning-asr-xxx"} 1
...
```
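When a value looks suspicious, it can help to aggregate a saved `/metrics` dump offline. A small sketch; the sample values below are made up:

```shell
# Build a sample /metrics dump (made-up data, not live output)
cat > /tmp/asr_metrics.txt <<'EOF'
asr_active_requests{pod="lightning-asr-abc"} 3
asr_active_requests{pod="lightning-asr-def"} 5
asr_batch_queue_depth{pod="lightning-asr-abc"} 2
EOF

# Sum one gauge across pods: match the metric name at line start,
# accumulate the value column (field 2), print the total
awk '/^asr_active_requests/ { sum += $2 } END { print sum }' /tmp/asr_metrics.txt
# prints 8
```

The same one-liner works against a real dump captured with `curl http://localhost:2269/metrics > dump.txt`.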
### Test Prometheus Queries

Open the Prometheus UI and test the queries:

```
asr_active_requests
asr_batch_queue_depth
asr_active_streams
asr_stream_queue_depth
```

### Check Prometheus Targets

```shell
kubectl port-forward -n default svc/smallest-prometheus-stack-prometheus 9090:9090
```

Navigate to http://localhost:9090/targets and verify that the Lightning ASR targets are "UP".

### View Prometheus Logs

```shell
kubectl logs -n default -l app.kubernetes.io/name=prometheus --tail=100
```

Look for scrape errors.
## Troubleshooting

### Metrics Not Appearing

1. Check that the ServiceMonitor exists:

   ```shell
   kubectl get servicemonitor -n smallest
   ```

2. Check that Prometheus is discovering it:

   ```shell
   kubectl logs -n default -l app.kubernetes.io/name=prometheus | grep lightning-asr
   ```

3. Check that the service exposes a metrics port:

   ```shell
   kubectl get svc lightning-asr -n smallest -o yaml
   ```

   The output should include:

   ```yaml
   ports:
     - name: metrics
       port: 2269
   ```
### Custom Metrics Not Available

1. Check the Prometheus Adapter logs:

   ```shell
   kubectl logs -n kube-system -l app.kubernetes.io/name=prometheus-adapter
   ```

2. Verify the adapter configuration:

   ```shell
   kubectl get configmap prometheus-adapter -n kube-system -o yaml
   ```

3. Test the API manually:

   ```shell
   kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .
   ```
### High Cardinality Issues

If Prometheus is using too much memory:

- Reduce label cardinality
- Lower retention (`retention` / `retentionSize`)
- Use recording rules for complex queries

You can also raise the Prometheus resource requests and limits:

```yaml
kube-prometheus-stack:
  prometheus:
    prometheusSpec:
      resources:
        requests:
          memory: 4Gi
        limits:
          memory: 8Gi
```
## Best Practices

### Use Recording Rules

Pre-compute expensive queries:

```yaml
- record: asr:batch_queue:depth_avg
  expr: avg(asr_batch_queue_depth) by (namespace)
```

Then use the recorded metric in your autoscaling logic instead of the raw query.
### Set Appropriate Scrape Intervals

Balance responsiveness against storage:

- Fast autoscaling: 15s
- Normal: 30s
- Cost-optimized: 60s
### Persist Prometheus Data

Always persist Prometheus data:

```yaml
storageSpec:
  volumeClaimTemplate:
    spec:
      resources:
        requests:
          storage: 100Gi
```
### Monitor Prometheus Itself

Track Prometheus performance:

- Query duration
- Scrape duration
- Memory usage
- TSDB size
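These can be watched with queries against Prometheus' own self-instrumentation metrics, for example:

```
# Ingestion rate (samples per second)
rate(prometheus_tsdb_head_samples_appended_total[5m])

# Active series in the TSDB head block (a proxy for cardinality)
prometheus_tsdb_head_series

# 90th-percentile scrape duration per job
quantile by (job) (0.9, scrape_duration_seconds)
```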
### Use Grafana for Visualization

Grafana dashboards make trends in these metrics far easier to spot than ad-hoc queries in the Prometheus UI.

## What's Next?

- **Grafana Dashboards**: visualize metrics