
Overview

The metrics setup enables autoscaling by collecting Lightning ASR metrics with Prometheus and exposing them to the Kubernetes HPA through the Prometheus Adapter.

Architecture

Lightning ASR pods expose metrics on /metrics. Prometheus scrapes them through a ServiceMonitor, the Prometheus Adapter publishes them on the Kubernetes custom metrics API, and the HPA reads those metrics to scale the Lightning ASR deployment.
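
At the end of that pipeline, the HPA consumes the exposed metrics through the custom metrics API. As a rough sketch (the chart's own HPA template may differ; the namespace, replica bounds, and target value here are illustrative), an HPA scaling on asr_active_requests could look like this:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: lightning-asr
  namespace: smallest
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: lightning-asr
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: asr_active_requests
        target:
          type: AverageValue
          averageValue: "10"   # illustrative target: scale so pods average 10 active requests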

Components

Prometheus

Collects and stores metrics from the Lightning ASR pods. Included in the chart:
values.yaml
scaling:
  auto:
    enabled: true

kube-prometheus-stack:
  prometheus:
    prometheusSpec:
      serviceMonitorSelectorNilUsesHelmValues: false
      retention: 7d
      resources:
        requests:
          memory: 2Gi
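
After installation, confirm the Prometheus pod is running (the label and namespace below match the defaults used elsewhere in this guide):
kubectl get pods -n default -l app.kubernetes.io/name=prometheus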

ServiceMonitor

A CRD that tells Prometheus which services to scrape. It is enabled for Lightning ASR:
values.yaml
scaling:
  auto:
    lightningAsr:
      servicemonitor:
        enabled: true
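
When enabled, the chart renders a ServiceMonitor roughly equivalent to the sketch below (fields taken from the describe output shown later on this page; any extra labels the chart adds are omitted):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: lightning-asr
  namespace: smallest
spec:
  selector:
    matchLabels:
      app: lightning-asr
  endpoints:
    - port: metrics      # named port on the lightning-asr Service
      path: /metrics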

Prometheus Adapter

Translates Prometheus metrics into the Kubernetes custom metrics API so the HPA can read them. Configuration:
values.yaml
prometheus-adapter:
  prometheus:
    url: http://smallest-prometheus-stack-prometheus.default.svc
    port: 9090
  rules:
    custom:
      - seriesQuery: "asr_active_requests"
        resources:
          overrides:
            namespace: {resource: "namespace"}
            pod: {resource: "pod"}
        name:
          matches: "^(.*)$"
          as: "${1}"
        metricsQuery: "asr_active_requests{<<.LabelMatchers>>}"
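
The <<.LabelMatchers>> placeholder is filled in by the adapter with label matchers derived from the resource overrides, so a request for a single pod's metric effectively runs a query like this (pod name illustrative):
asr_active_requests{namespace="smallest",pod="lightning-asr-xxx"}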

Available Metrics

Lightning ASR exposes the following metrics:
Metric                       | Type      | Description
asr_active_requests          | Gauge     | Current number of active transcription requests
asr_total_requests           | Counter   | Total requests processed
asr_failed_requests          | Counter   | Total failed requests
asr_request_duration_seconds | Histogram | Request processing time
asr_model_load_time_seconds  | Gauge     | Time to load model on startup
asr_gpu_utilization          | Gauge     | GPU utilization percentage
asr_gpu_memory_used_bytes    | Gauge     | GPU memory used
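
Gauges can be queried and scaled on directly, while counters are only meaningful as rates. For example, an error-ratio query built from the counters above:
rate(asr_failed_requests[5m]) / rate(asr_total_requests[5m])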

Verify Metrics Setup

Check Prometheus

Forward Prometheus port:
kubectl port-forward -n default svc/smallest-prometheus-stack-prometheus 9090:9090
Open http://localhost:9090 and verify:
  1. Status → Targets: Lightning ASR endpoints should be “UP”
  2. Graph: query asr_active_requests; it should return data
  3. Status → Service Discovery: Should show ServiceMonitor
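
The same check can be scripted against the Prometheus HTTP API while the port-forward is active:
curl -s 'http://localhost:9090/api/v1/query?query=asr_active_requests' | jq '.data.result'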

Check ServiceMonitor

kubectl get servicemonitor -n smallest
Expected output:
NAME            AGE
lightning-asr   5m
Describe ServiceMonitor:
kubectl describe servicemonitor lightning-asr -n smallest
Should show:
Spec:
  Endpoints:
    Port: metrics
    Path: /metrics
  Selector:
    Match Labels:
      app: lightning-asr

Check Prometheus Adapter

Verify custom metrics are available:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq -r '.resources[].name' | grep asr
Expected output:
pods/asr_active_requests
pods/asr_total_requests
pods/asr_failed_requests
Query a specific metric:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/smallest/pods/*/asr_active_requests" | jq .
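
The response is a MetricValueList; expect roughly this shape (values illustrative):
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "items": [
    {
      "describedObject": {
        "kind": "Pod",
        "namespace": "smallest",
        "name": "lightning-asr-xxx"
      },
      "metricName": "asr_active_requests",
      "value": "3"
    }
  ]
}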

Custom Metric Configuration

Add New Custom Metrics

To expose additional metrics to HPA:
values.yaml
prometheus-adapter:
  rules:
    custom:
      - seriesQuery: "asr_active_requests"
        resources:
          overrides:
            namespace: {resource: "namespace"}
            pod: {resource: "pod"}
        name:
          matches: "^(.*)$"
          as: "${1}"
        metricsQuery: "asr_active_requests{<<.LabelMatchers>>}"
      
      - seriesQuery: "asr_gpu_utilization"
        resources:
          overrides:
            namespace: {resource: "namespace"}
            pod: {resource: "pod"}
        name:
          as: "gpu_utilization"
        metricsQuery: "avg_over_time(asr_gpu_utilization{<<.LabelMatchers>>}[2m])"
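
The renamed metric can then be referenced in the HPA as a Pods metric (target value illustrative):
spec:
  metrics:
    - type: Pods
      pods:
        metric:
          name: gpu_utilization
        target:
          type: AverageValue
          averageValue: "70"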

External Metrics

For cluster-wide metrics:
values.yaml
prometheus-adapter:
  rules:
    external:
      - seriesQuery: 'kube_deployment_status_replicas{deployment="lightning-asr"}'
        metricsQuery: 'sum(kube_deployment_status_replicas{deployment="lightning-asr"})'
        name:
          as: "lightning_asr_replica_count"
        resources:
          overrides:
            namespace: {resource: "namespace"}
Use in HPA:
spec:
  metrics:
    - type: External
      external:
        metric:
          name: lightning_asr_replica_count
        target:
          type: Value
          value: "5"

Prometheus Configuration

Retention Policy

Configure how long metrics are stored:
values.yaml
kube-prometheus-stack:
  prometheus:
    prometheusSpec:
      retention: 15d
      retentionSize: "50GB"

Storage

Persist Prometheus data:
values.yaml
kube-prometheus-stack:
  prometheus:
    prometheusSpec:
      storageSpec:
        volumeClaimTemplate:
          spec:
            storageClassName: gp3
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 100Gi
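
After upgrading, verify that the persistent volume claim was created and bound:
kubectl get pvc -n default | grep prometheus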

Scrape Interval

Adjust how frequently metrics are collected:
values.yaml
kube-prometheus-stack:
  prometheus:
    prometheusSpec:
      scrapeInterval: 30s
      evaluationInterval: 30s
Lower intervals (e.g., 15s) provide faster HPA response but increase storage.

Recording Rules

Pre-compute expensive queries:
kube-prometheus-stack:
  prometheus:
    prometheusSpec:
      additionalScrapeConfigs:
        - job_name: 'lightning-asr-aggregated'
          scrape_interval: 15s
          static_configs:
            - targets: ['lightning-asr:2269']
      
      additionalPrometheusRulesMap:
        asr-rules:
          groups:
            - name: asr_aggregations
              interval: 30s
              rules:
                - record: asr:requests:rate5m
                  expr: rate(asr_total_requests[5m])
                
                - record: asr:requests:active_avg
                  expr: avg(asr_active_requests) by (namespace)
                
                - record: asr:gpu:utilization_avg
                  expr: avg(asr_gpu_utilization) by (namespace)
Use recording rules in HPA for better performance.
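To scale on a recorded series, add an adapter rule for it. The sketch below exposes asr:requests:rate5m per pod and renames it to avoid colons in the exposed metric name (the renamed name is an assumption, not something the chart defines):

prometheus-adapter:
  rules:
    custom:
      - seriesQuery: "asr:requests:rate5m"
        resources:
          overrides:
            namespace: {resource: "namespace"}
            pod: {resource: "pod"}
        name:
          as: "asr_requests_rate5m"
        metricsQuery: "asr:requests:rate5m{<<.LabelMatchers>>}"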

Alerting Rules

Create alerts for anomalies:
kube-prometheus-stack:
  prometheus:
    prometheusSpec:
      additionalPrometheusRulesMap:
        asr-alerts:
          groups:
            - name: asr_alerts
              rules:
                - alert: HighErrorRate
                  expr: rate(asr_failed_requests[5m]) > 0.1
                  for: 5m
                  labels:
                    severity: warning
                  annotations:
                    summary: "High ASR error rate"
                    description: "Error rate is {{ $value }} errors/sec"
                
                - alert: HighQueueLength
                  expr: asr_active_requests > 50
                  for: 2m
                  labels:
                    severity: warning
                  annotations:
                    summary: "ASR queue backing up"
                    description: "{{ $value }} requests queued"
                
                - alert: GPUMemoryHigh
                  expr: asr_gpu_memory_used_bytes / 24000000000 > 0.9  # threshold assumes a 24 GB GPU
                  for: 5m
                  labels:
                    severity: warning
                  annotations:
                    summary: "GPU memory usage high"
                    description: "GPU memory at {{ $value | humanizePercentage }}"
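
After deploying, confirm the rule groups were loaded:
kubectl port-forward -n default svc/smallest-prometheus-stack-prometheus 9090:9090
curl -s http://localhost:9090/api/v1/rules | jq -r '.data.groups[].name'
The output should include asr_alerts (and asr_aggregations if the recording rules above are enabled).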

Debugging Metrics

Check Metrics Endpoint

Directly query Lightning ASR metrics:
kubectl port-forward -n smallest svc/lightning-asr 2269:2269
curl http://localhost:2269/metrics
Expected output:
# HELP asr_active_requests Current active requests
# TYPE asr_active_requests gauge
asr_active_requests{pod="lightning-asr-xxx"} 3

# HELP asr_total_requests Total requests processed
# TYPE asr_total_requests counter
asr_total_requests{pod="lightning-asr-xxx"} 1523

...

Test Prometheus Query

Access Prometheus UI and test queries:
asr_active_requests
rate(asr_total_requests[5m])
histogram_quantile(0.95, sum(rate(asr_request_duration_seconds_bucket[5m])) by (le))

Check Prometheus Targets

kubectl port-forward -n default svc/smallest-prometheus-stack-prometheus 9090:9090
Navigate to http://localhost:9090/targets and verify the Lightning ASR targets are “UP”.

View Prometheus Logs

kubectl logs -n default -l app.kubernetes.io/name=prometheus --tail=100
Look for scrape errors.

Troubleshooting

Metrics Not Appearing

Check ServiceMonitor is created:
kubectl get servicemonitor -n smallest
Check Prometheus is discovering:
kubectl logs -n default -l app.kubernetes.io/name=prometheus | grep lightning-asr
Check service has metrics port:
kubectl get svc lightning-asr -n smallest -o yaml
Should show:
ports:
  - name: metrics
    port: 2269
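
Also confirm the Service labels match the ServiceMonitor selector (app: lightning-asr):
kubectl get svc lightning-asr -n smallest --show-labels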

Custom Metrics Not Available

Check Prometheus Adapter logs:
kubectl logs -n kube-system -l app.kubernetes.io/name=prometheus-adapter
Verify adapter configuration:
kubectl get configmap prometheus-adapter -n kube-system -o yaml
Test API manually:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .
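
If you changed the adapter configuration, restart the adapter so it reloads the ConfigMap (the deployment name below assumes the chart default):
kubectl rollout restart deployment prometheus-adapter -n kube-system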

High Cardinality Issues

If Prometheus is using too much memory:
  1. Reduce label cardinality
  2. Reduce retention, or raise the memory requests and limits (as below)
  3. Use recording rules for complex queries
kube-prometheus-stack:
  prometheus:
    prometheusSpec:
      resources:
        requests:
          memory: 4Gi
        limits:
          memory: 8Gi

Best Practices

Pre-compute expensive queries:
- record: asr:requests:rate5m
  expr: rate(asr_total_requests[5m])
Then reference the recorded series in the HPA instead of the raw query.
Balance responsiveness against storage cost when choosing the scrape interval:
  • Fast autoscaling: 15s
  • Normal: 30s
  • Cost-optimized: 60s
Always persist Prometheus data:
storageSpec:
  volumeClaimTemplate:
    spec:
      resources:
        requests:
          storage: 100Gi
Track Prometheus performance:
  • Query duration
  • Scrape duration
  • Memory usage
  • TSDB size
Don’t rely on the Prometheus UI for day-to-day operations; use Grafana dashboards instead. See Grafana Dashboards.

What’s Next?