
Overview

This page focuses on collecting and validating Lightning ASR metrics with Prometheus and exposing them through the Prometheus Adapter.
Autoscaling documentation is currently under active development. Use this page as a metrics reference. If you need autoscaling now, configure your own HPA/KEDA rules using these metrics.
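For illustration only, a minimal HorizontalPodAutoscaler driven by asr_active_requests through the custom metrics API could look like the sketch below; the Deployment name, namespace, replica bounds, and threshold are assumptions you should adapt to your deployment.
# Hypothetical example - not shipped with the chart
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: lightning-asr
  namespace: smallest
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: lightning-asr
  minReplicas: 1
  maxReplicas: 8
  metrics:
    - type: Pods
      pods:
        metric:
          name: asr_active_requests
        target:
          type: AverageValue
          averageValue: "10"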

Architecture

Lightning ASR pods expose metrics on a /metrics endpoint. A ServiceMonitor tells Prometheus which services to scrape; Prometheus collects and stores the metrics, and the Prometheus Adapter exposes them through the Kubernetes custom metrics API, where your own HPA/KEDA rules can consume them.
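A rough Mermaid sketch of this flow (node names are illustrative):
graph LR
    ASR[Lightning ASR Pods] -- "metrics scraped from /metrics (port 2269)" --> Prom[Prometheus]
    SM[ServiceMonitor] -. "tells Prometheus what to scrape" .-> Prom
    Prom -- "PromQL" --> Adapter[Prometheus Adapter]
    Adapter -- "custom.metrics.k8s.io" --> HPA[HPA / KEDA rules]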

Components

Prometheus

Collects and stores metrics from Lightning ASR pods. Included in the chart:
values.yaml
scaling:
  auto:
    enabled: true

kube-prometheus-stack:
  prometheus:
    prometheusSpec:
      serviceMonitorSelectorNilUsesHelmValues: false
      retention: 7d
      resources:
        requests:
          memory: 2Gi
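After applying the values, you can confirm the Prometheus server pod is running (this page's examples assume the release lives in the default namespace):
kubectl get pods -n default -l app.kubernetes.io/name=prometheus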

ServiceMonitor

A Prometheus Operator custom resource that tells Prometheus which services to scrape. Enabled for Lightning ASR:
values.yaml
scaling:
  auto:
    lightningAsr:
      servicemonitor:
        enabled: true
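For reference, the rendered ServiceMonitor looks roughly like the sketch below; only the fields shown in the verification section later on this page are known, so anything else may differ in your release.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: lightning-asr
  namespace: smallest
spec:
  selector:
    matchLabels:
      app: lightning-asr
  endpoints:
    - port: metrics
      path: /metrics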

Prometheus Adapter

Exposes Prometheus metrics through the Kubernetes custom metrics API. Configuration:
values.yaml
prometheus-adapter:
  prometheus:
    url: http://smallest-prometheus-stack-prometheus.default.svc
    port: 9090
  rules:
    custom:
      - seriesQuery: "asr_active_requests"
        resources:
          overrides:
            namespace: {resource: "namespace"}
            pod: {resource: "pod"}
        name:
          matches: "^(.*)$"
          as: "${1}"
        metricsQuery: "asr_active_requests{<<.LabelMatchers>>}"
      - seriesQuery: "asr_batch_queue_depth"
        resources:
          overrides:
            namespace: {resource: "namespace"}
            pod: {resource: "pod"}
        name:
          matches: "^(.*)$"
          as: "${1}"
        metricsQuery: "asr_batch_queue_depth{<<.LabelMatchers>>}"
      - seriesQuery: "asr_active_streams"
        resources:
          overrides:
            namespace: {resource: "namespace"}
            pod: {resource: "pod"}
        name:
          matches: "^(.*)$"
          as: "${1}"
        metricsQuery: "asr_active_streams{<<.LabelMatchers>>}"
      - seriesQuery: "asr_stream_queue_depth"
        resources:
          overrides:
            namespace: {resource: "namespace"}
            pod: {resource: "pod"}
        name:
          matches: "^(.*)$"
          as: "${1}"
        metricsQuery: "asr_stream_queue_depth{<<.LabelMatchers>>}"

Available Metrics

Lightning ASR exposes the following metrics:
Metric | Type | Description
--- | --- | ---
asr_active_requests | Gauge | Active batch requests currently being processed on GPU
asr_batch_queue_depth | Gauge | Requests waiting in the batch queue
asr_active_streams | Gauge | Active streaming sessions
asr_stream_queue_depth | Gauge | Pending sessions in the streaming Redis queue

Verify Metrics Setup

Check Prometheus

Forward Prometheus port:
kubectl port-forward -n default svc/smallest-prometheus-stack-prometheus 9090:9090
Open http://localhost:9090 and verify:
  1. Status → Targets: Lightning ASR endpoints should be “UP”
  2. Graph: query asr_active_requests or asr_batch_queue_depth; both should return data (a command-line alternative is shown below)
  3. Status → Service Discovery: Should show ServiceMonitor
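If you prefer the command line, the same queries can be run against the Prometheus HTTP API while the port-forward above is active:
curl -s 'http://localhost:9090/api/v1/query?query=asr_active_requests' | jq '.data.result'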

Check ServiceMonitor

kubectl get servicemonitor -n smallest
Expected output:
NAME            AGE
lightning-asr   5m
Describe ServiceMonitor:
kubectl describe servicemonitor lightning-asr -n smallest
Should show:
Spec:
  Endpoints:
    Port: metrics
    Path: /metrics
  Selector:
    Match Labels:
      app: lightning-asr

Check Prometheus Adapter

Verify custom metrics are available:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq -r '.resources[].name' | grep asr
Expected output:
pods/asr_active_requests
pods/asr_batch_queue_depth
pods/asr_active_streams
pods/asr_stream_queue_depth
Query specific metric:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/smallest/pods/*/asr_active_requests" | jq .
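The response should look roughly like this (pod names and values are illustrative):
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "items": [
    {
      "describedObject": {
        "kind": "Pod",
        "namespace": "smallest",
        "name": "lightning-asr-xxx"
      },
      "metricName": "asr_active_requests",
      "value": "3"
    }
  ]
}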

Custom Metric Configuration

Add New Custom Metrics

To expose additional metrics for your own autoscaling setup, extend the prometheus-adapter rules list (keep the existing entries and append your own):
values.yaml
prometheus-adapter:
  rules:
    custom:
      - seriesQuery: "asr_active_requests"
        resources:
          overrides:
            namespace: {resource: "namespace"}
            pod: {resource: "pod"}
        name:
          matches: "^(.*)$"
          as: "${1}"
        metricsQuery: "asr_active_requests{<<.LabelMatchers>>}"
      
      - seriesQuery: "asr_batch_queue_depth"
        resources:
          overrides:
            namespace: {resource: "namespace"}
            pod: {resource: "pod"}
        name:
          matches: "^(.*)$"
          as: "${1}"
        metricsQuery: "asr_batch_queue_depth{<<.LabelMatchers>>}"

      - seriesQuery: "asr_active_streams"
        resources:
          overrides:
            namespace: {resource: "namespace"}
            pod: {resource: "pod"}
        name:
          matches: "^(.*)$"
          as: "${1}"
        metricsQuery: "asr_active_streams{<<.LabelMatchers>>}"

      - seriesQuery: "asr_stream_queue_depth"
        resources:
          overrides:
            namespace: {resource: "namespace"}
            pod: {resource: "pod"}
        name:
          matches: "^(.*)$"
          as: "${1}"
        metricsQuery: "asr_stream_queue_depth{<<.LabelMatchers>>}"

Prometheus Configuration

Retention Policy

Configure how long metrics are stored:
values.yaml
kube-prometheus-stack:
  prometheus:
    prometheusSpec:
      retention: 15d
      retentionSize: "50GB"

Storage

Persist Prometheus data:
values.yaml
kube-prometheus-stack:
  prometheus:
    prometheusSpec:
      storageSpec:
        volumeClaimTemplate:
          spec:
            storageClassName: gp3
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 100Gi

Scrape Interval

Adjust how frequently metrics are collected:
values.yaml
kube-prometheus-stack:
  prometheus:
    prometheusSpec:
      scrapeInterval: 30s
      evaluationInterval: 30s
Shorter intervals (e.g., 15s) make metrics more responsive but increase storage use and scrape load.

Recording Rules

Pre-compute expensive queries:
kube-prometheus-stack:
  prometheus:
    prometheusSpec:
      additionalScrapeConfigs:
        - job_name: 'lightning-asr-aggregated'
          scrape_interval: 15s
          static_configs:
            - targets: ['lightning-asr:2269']

  # additionalPrometheusRulesMap is a chart-level key, not part of prometheusSpec
  additionalPrometheusRulesMap:
    asr-rules:
      groups:
        - name: asr_aggregations
          interval: 30s
          rules:
            - record: asr:requests:active_avg
              expr: avg(asr_active_requests) by (namespace)

            - record: asr:batch_queue:depth_avg
              expr: avg(asr_batch_queue_depth) by (namespace)

            - record: asr:streams:active_avg
              expr: avg(asr_active_streams) by (namespace)

            - record: asr:stream_queue:depth_avg
              expr: avg(asr_stream_queue_depth) by (namespace)
Use recording rules in your autoscaling queries for better performance.
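For example (a sketch, assuming KEDA is installed in the cluster; the names, namespace, and threshold are placeholders to adapt), a ScaledObject can query the recorded series directly:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: lightning-asr
  namespace: smallest
spec:
  scaleTargetRef:
    name: lightning-asr   # assumed Deployment name
  minReplicaCount: 1
  maxReplicaCount: 8
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://smallest-prometheus-stack-prometheus.default.svc:9090
        query: asr:batch_queue:depth_avg{namespace="smallest"}
        threshold: "10"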

Alerting Rules

Create alerts for anomalies:
kube-prometheus-stack:
  additionalPrometheusRulesMap:
    asr-alerts:
      groups:
        - name: asr_alerts
          rules:
            - alert: HighBatchQueueDepth
              expr: asr_batch_queue_depth > 20
              for: 5m
              labels:
                severity: warning
              annotations:
                summary: "ASR batch queue depth is high"
                description: "{{ $value }} requests are waiting in the batch queue"

            - alert: HighStreamQueueDepth
              expr: asr_stream_queue_depth > 30
              for: 2m
              labels:
                severity: warning
              annotations:
                summary: "ASR stream queue depth is high"
                description: "{{ $value }} streaming sessions are waiting in Redis"

            - alert: HighActiveStreams
              expr: asr_active_streams > 100
              for: 5m
              labels:
                severity: warning
              annotations:
                summary: "ASR active streams are high"
                description: "{{ $value }} active streaming sessions"

Debugging Metrics

Check Metrics Endpoint

Directly query Lightning ASR metrics:
kubectl port-forward -n smallest svc/lightning-asr 2269:2269
curl http://localhost:2269/metrics
Expected output:
# HELP asr_active_requests Current active requests
# TYPE asr_active_requests gauge
asr_active_requests{pod="lightning-asr-xxx"} 3

# HELP asr_batch_queue_depth Requests waiting in the batch queue
# TYPE asr_batch_queue_depth gauge
asr_batch_queue_depth{pod="lightning-asr-xxx"} 2

# HELP asr_active_streams Active streaming sessions
# TYPE asr_active_streams gauge
asr_active_streams{pod="lightning-asr-xxx"} 14

# HELP asr_stream_queue_depth Pending sessions in the streaming Redis queue
# TYPE asr_stream_queue_depth gauge
asr_stream_queue_depth{pod="lightning-asr-xxx"} 1

...

Test Prometheus Query

Access Prometheus UI and test queries:
asr_active_requests
asr_batch_queue_depth
asr_active_streams
asr_stream_queue_depth
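Each should return one series per pod. You can also test aggregations across pods, for example:
sum(asr_active_requests)
avg(asr_batch_queue_depth) by (namespace)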

Check Prometheus Targets

kubectl port-forward -n default svc/smallest-prometheus-stack-prometheus 9090:9090
Navigate to http://localhost:9090/targets and verify that the Lightning ASR targets are “UP”.

View Prometheus Logs

kubectl logs -n default -l app.kubernetes.io/name=prometheus --tail=100
Look for scrape errors.

Troubleshooting

Metrics Not Appearing

Check ServiceMonitor is created:
kubectl get servicemonitor -n smallest
Check that Prometheus is discovering the targets:
kubectl logs -n default -l app.kubernetes.io/name=prometheus | grep lightning-asr
Check that the service exposes the metrics port:
kubectl get svc lightning-asr -n smallest -o yaml
Should show:
ports:
  - name: metrics
    port: 2269

Custom Metrics Not Available

Check Prometheus Adapter logs:
kubectl logs -n kube-system -l app.kubernetes.io/name=prometheus-adapter
Verify adapter configuration:
kubectl get configmap prometheus-adapter -n kube-system -o yaml
Test API manually:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .

High Cardinality Issues

If Prometheus is using too much memory:
  1. Reduce label cardinality
  2. Tighten retention (lower retention or set retentionSize)
  3. Use recording rules for complex queries
  4. Increase Prometheus memory requests and limits:
kube-prometheus-stack:
  prometheus:
    prometheusSpec:
      resources:
        requests:
          memory: 4Gi
        limits:
          memory: 8Gi

Best Practices

Pre-compute expensive queries:
- record: asr:batch_queue:depth_avg
  expr: avg(asr_batch_queue_depth) by (namespace)
Then use this recorded series in your autoscaling logic instead of the raw query.
Balance responsiveness vs. storage when choosing the scrape interval:
  • Fast autoscaling: 15s
  • Normal: 30s
  • Cost-optimized: 60s
Always persist Prometheus data:
storageSpec:
  volumeClaimTemplate:
    spec:
      resources:
        requests:
          storage: 100Gi
Track Prometheus performance:
  • Query duration
  • Scrape duration
  • Memory usage
  • TSDB size
Don’t rely on the Prometheus UI for day-to-day operations; use Grafana dashboards instead (see Grafana Dashboards).

What’s Next?

Grafana Dashboards

Visualize metrics