Overview
Grafana visualizes Lightning ASR metrics, autoscaling behavior, and system performance. This guide covers accessing Grafana, importing dashboards, and creating custom visualizations.
Access Grafana
Enable Grafana
Ensure Grafana is enabled in your Helm values file (values.yaml).
Port Forward
Access Grafana locally by port-forwarding the Grafana service.
Default Credentials
- Username: admin
- Password: prom-operator (or the custom password set in adminPassword)
Expose Externally
For permanent access, expose Grafana through a LoadBalancer service or an Ingress, configured in values.yaml.
Import ASR Dashboard
The Smallest Self-Host repository includes a pre-built ASR dashboard.
Import from File
1
Get Dashboard JSON
The dashboard is available at grafana/dashboards/asr-dashboard.json in the repository.
2
Open Grafana
Navigate to Grafana → Dashboards → Import
3
Upload JSON
- Click “Upload JSON file”
- Select asr-dashboard.json
- Click “Load”
4
Configure Data Source
- Select the Prometheus data source: Prometheus
- Click “Import”
Import via ConfigMap
To load the dashboard automatically on Grafana startup, provide it as a ConfigMap configured in values.yaml.
ASR Dashboard Overview
The pre-built dashboard includes the following panels:
Active Requests
Shows the requests currently being processed:
- Metric: asr_active_requests
- Visualization: Stat panel with thresholds
- Colors:
- Green: 0-5 requests
- Yellow: 5-10 requests
- Orange: 10-20 requests
- Red: 20+ requests
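In Grafana's dashboard JSON, a stat panel stores its color bands as a list of threshold steps, where each step gives the color that applies from a lower bound upward. The sketch below builds such a panel in Python for the bands listed above. The function names, the panel title, and the choice to treat each boundary value (5, 10, 20) as the start of the next band are assumptions for illustration; the shipped asr-dashboard.json may be laid out differently.

```python
import json

def active_requests_stat_panel():
    """Build a Grafana stat panel dict for asr_active_requests (illustrative)."""
    # Each step applies from its value upward. The base step uses None
    # (serialized as JSON null) to mean "no lower bound".
    steps = [
        {"color": "green", "value": None},
        {"color": "yellow", "value": 5},
        {"color": "orange", "value": 10},
        {"color": "red", "value": 20},
    ]
    return {
        "type": "stat",
        "title": "Active Requests",  # hypothetical title
        "targets": [{"refId": "A", "expr": "asr_active_requests"}],
        "fieldConfig": {
            "defaults": {"thresholds": {"mode": "absolute", "steps": steps}},
            "overrides": [],
        },
    }

def color_for(value, steps):
    """Return the color of the highest step whose lower bound is <= value."""
    color = steps[0]["color"]
    for step in steps[1:]:
        if value >= step["value"]:
            color = step["color"]
    return color

if __name__ == "__main__":
    print(json.dumps(active_requests_stat_panel(), indent=2))
```

The helper color_for is only there to make the band boundaries explicit; Grafana performs the equivalent lookup itself when rendering the panel.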
Request Rate
Requests per second over time:
- Metric: rate(asr_total_requests[5m])
- Visualization: Time series graph
- Use: Track traffic patterns
Error Rate
Percentage of failed requests:
- Metric: rate(asr_failed_requests[5m]) / rate(asr_total_requests[5m]) * 100
- Visualization: Stat panel + time series
- Alert: Warning if > 5%
Response Time
Request duration percentiles:
- Metrics:
- P50: histogram_quantile(0.50, asr_request_duration_seconds_bucket)
- P95: histogram_quantile(0.95, asr_request_duration_seconds_bucket)
- P99: histogram_quantile(0.99, asr_request_duration_seconds_bucket)
- Visualization: Time series graph
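To make the percentile panels easier to interpret, the following Python sketch shows roughly how a quantile is estimated from cumulative histogram buckets: find the bucket containing the target rank, then interpolate linearly inside it. It is a simplified model of the idea behind histogram_quantile, not Prometheus's implementation, and the bucket boundaries and counts in the usage note are invented. Note also that the Prometheus documentation typically applies histogram_quantile to a rate() of the bucket series aggregated by the le label, so percentiles reflect a recent window instead of all observations since process start.

```python
import math

def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    `buckets` must be sorted by upper bound and end with the +Inf bucket,
    which is the shape of a Prometheus *_bucket series.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    if len(buckets) < 2 or buckets[-1][1] == 0:
        return math.nan  # not enough data to estimate anything
    total = buckets[-1][1]
    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    idx = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    upper, count = buckets[idx]
    if math.isinf(upper):
        # Rank falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    lower = buckets[idx - 1][0] if idx > 0 else 0.0
    below = buckets[idx - 1][1] if idx > 0 else 0
    if count == below:
        return lower  # empty bucket (only reachable when rank is 0)
    # Linear interpolation inside the bucket.
    return lower + (upper - lower) * (rank - below) / (count - below)
```

For example, with hypothetical buckets [(0.1, 10), (0.5, 60), (1.0, 90), (math.inf, 100)], the median lands in the 0.1 to 0.5 second bucket and is estimated at about 0.42 seconds, while the 95th percentile falls in the +Inf bucket and is reported as 1.0, the highest finite bound. This is why bucket boundaries should extend past the latencies you care about.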
Pod Count
Number of Lightning ASR replicas:
- Metric: count(asr_active_requests)
- Visualization: Stat panel
- Use: Monitor autoscaling
GPU Utilization
GPU usage per pod:
- Metric: asr_gpu_utilization
- Visualization: Time series graph
- Use: Ensure GPUs are utilized
GPU Memory
GPU memory usage in GiB:
- Metric: asr_gpu_memory_used_bytes / 1024 / 1024 / 1024
- Visualization: Gauge + time series
- Use: Monitor memory leaks
Create Custom Dashboards
Add New Dashboard
1
Create Dashboard
Grafana → Dashboards → New Dashboard
2
Add Panel
Click “Add panel”
3
Configure Query
- Data source: Prometheus
- Metric: asr_active_requests
- Legend: {{pod}}
4
Customize Visualization
- Choose visualization type (Time series, Stat, Gauge, etc.)
- Configure thresholds
- Set units and decimals
5
Save
Click “Save dashboard” and enter a name, for example “Custom ASR Dashboard”.
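The same dashboard can also be described as JSON and loaded through Dashboards → Import. The Python sketch below generates a minimal dashboard with one time series panel that mirrors the steps above. The function name, panel title, grid position, and unit settings are illustrative assumptions, and it assumes a data source named Prometheus; version-specific fields such as schemaVersion are left out, so treat it as a starting fragment and not a complete export.

```python
import json

def build_custom_dashboard(title="Custom ASR Dashboard"):
    """Return a minimal Grafana dashboard dict with one time series panel."""
    panel = {
        "id": 1,
        "type": "timeseries",
        "title": "Active Requests by Pod",  # hypothetical panel title
        "datasource": "Prometheus",  # assumes a data source with this name
        "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0},
        "targets": [
            {
                "refId": "A",
                "expr": "asr_active_requests",
                "legendFormat": "{{pod}}",  # one series label per pod
            }
        ],
        # Units and decimals from the "Customize Visualization" step.
        "fieldConfig": {
            "defaults": {"unit": "short", "decimals": 0},
            "overrides": [],
        },
    }
    return {
        "title": title,
        "panels": [panel],
        "time": {"from": "now-15m", "to": "now"},
    }

if __name__ == "__main__":
    # Write the JSON to stdout so it can be redirected to a file.
    print(json.dumps(build_custom_dashboard(), indent=2))
```

Generating the JSON from code keeps panel definitions consistent when the same panel is reused across several dashboards.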
Useful Queries
Average Active Requests
Total Throughput (requests/hour)
Pod Resource Usage
Autoscaling Events
GPU Temperature
Dashboard Variables
Add variables for dynamic filtering.
Namespace Variable
1
Dashboard Settings
Click gear icon → Variables → Add variable
2
Configure Variable
- Name: namespace
- Type: Query
- Data source: Prometheus
- Query: label_values(asr_active_requests, namespace)
- Multi-value: Enabled
3
Use in Queries
Update panel queries to reference the variable, for example as $namespace in a label matcher.
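As a sketch of how the variable and a query that uses it fit together, the Python below builds the variable definition in the shape Grafana stores under a dashboard's templating list, plus a PromQL string that filters on it. The function names are hypothetical. The regex matcher (=~) is used because a multi-value variable expands to an alternation such as (a|b) for Prometheus data sources, which an equality matcher would not handle.

```python
def namespace_variable():
    """Variable definition matching the settings in the steps above."""
    return {
        "name": "namespace",
        "type": "query",
        "datasource": "Prometheus",  # assumes a data source with this name
        "query": "label_values(asr_active_requests, namespace)",
        "multi": True,  # "Multi-value: Enabled"
    }

def scoped_query(metric, variable="namespace"):
    """Return a PromQL selector filtered by a Grafana dashboard variable."""
    # Doubled braces are literal braces inside an f-string.
    return f'{metric}{{{variable}=~"${variable}"}}'

if __name__ == "__main__":
    print(scoped_query("asr_active_requests"))
```

Running the script prints the selector with the variable reference left in place; Grafana substitutes the selected namespaces when the panel query runs.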
Pod Variable
Time Range Variable
Alerting
Configure Alert Rules
1
Edit Panel
Open panel → Alert tab
2
Create Alert
- Name: High Active Requests
- Evaluate every: 1m
- For: 5m
3
Conditions
4
Notification
- Choose notification channel
- Add message template
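The interaction between “Evaluate every” and “For” is easy to misread: the alert does not fire on the first breach, only after the condition has held continuously for the full “For” duration. The Python sketch below is a simplified model of that behavior, not Grafana's actual state machine, which differs between versions. The threshold of 20 used in the usage note is a hypothetical value, since the condition itself is not specified here.

```python
def alert_states(samples, threshold, eval_interval_s=60, for_s=300):
    """Return 'normal', 'pending', or 'firing' for each evaluation.

    `samples` holds one metric value per evaluation, spaced
    `eval_interval_s` seconds apart.
    """
    states = []
    breach_started = None  # index of the evaluation where the breach began
    for i, value in enumerate(samples):
        if value > threshold:
            if breach_started is None:
                breach_started = i
            held_s = (i - breach_started) * eval_interval_s
            states.append("firing" if held_s >= for_s else "pending")
        else:
            # Any evaluation below the threshold resets the timer.
            breach_started = None
            states.append("normal")
    return states
```

With a 1m evaluation interval and a 5m “For”, the samples [5, 25, 25, 25, 25, 25, 25, 5] against a threshold of 20 produce one normal state, five pending states, one firing state, and a return to normal. A short spike that clears within five minutes never leaves the pending state, which is the point of the “For” setting.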
Alert Notification Channels
Configure notifications for Email, Slack, or PagerDuty. For example, for Email:
Grafana → Alerting → Notification channels → Add channel
- Type: Email
- Addresses: one or more recipient email addresses
Pre-Built Dashboard Examples
System Overview Dashboard
Autoscaling Dashboard
Track HPA behavior.
Cost Dashboard
Monitor resource costs.
Best Practices
Use Dashboard Folders
Organize dashboards by category:
- Smallest Overview: High-level metrics
- Lightning ASR: Detailed ASR metrics
- Infrastructure: Node and cluster metrics
- Autoscaling: HPA and scaling behavior
Set Appropriate Time Ranges
Default time ranges for different views:
- Real-time monitoring: Last 15 minutes
- Troubleshooting: Last 1 hour
- Analysis: Last 24 hours
- Trends: Last 7 days
Use Annotations
Mark important events:
- Deployments
- Scaling events
- Incidents
- Configuration changes
Template Dashboards
Create template dashboards for:
- Different environments (dev, staging, prod)
- Different namespaces
- Different models
Export and Version Control
Export each dashboard as JSON and commit it to git.
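Exported dashboards produce cleaner diffs if they are normalized before being committed. The Python sketch below takes a response shaped like the one Grafana's HTTP API returns for GET /api/dashboards/uid/<uid> (a "dashboard" object alongside "meta") and produces stable JSON text. The function name is hypothetical, and dropping the id and version fields is a design choice assumed here so that the file is not tied to one Grafana instance; fetching the payload and running git are left to your own tooling.

```python
import json

def dashboard_to_git_json(api_payload):
    """Turn a Grafana dashboard API response into diff-friendly JSON text."""
    # Copy so the caller's payload is left untouched.
    dashboard = dict(api_payload["dashboard"])
    # Drop instance-specific fields that change on every save or import.
    dashboard.pop("id", None)
    dashboard.pop("version", None)
    # Sorted keys and fixed indentation keep diffs limited to real changes.
    return json.dumps(dashboard, indent=2, sort_keys=True) + "\n"
```

Writing the returned string to a file such as dashboards/asr-dashboard.json (a hypothetical path) and committing it gives a reviewable history of panel and query changes.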
Troubleshooting
Grafana Not Showing Data
Check the Prometheus data source under Grafana → Configuration → Data Sources → Prometheus:
- URL: http://smallest-prometheus-stack-prometheus:9090
- Access: Server (default)

