
Overview

Understanding log messages is crucial for diagnosing issues. This guide helps you interpret logs from each component and identify common error patterns.

Log Levels

All components use standard log levels:
Level    | Description                | Example
DEBUG    | Detailed diagnostic info   | Variable values, function calls
INFO     | Normal operation events    | Request received, model loaded
WARNING  | Potential issues           | Slow response, retry attempt
ERROR    | Error that needs attention | Failed request, connection error
CRITICAL | Severe error               | Service crash, unrecoverable error
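To scan a pod's output for anything at WARNING or above, the levels can be filtered directly (a minimal sketch; substitute your pod name and namespace):
kubectl logs <pod> -n smallest | grep -E "WARNING|ERROR|CRITICAL"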

Lightning ASR Logs

Successful Startup

INFO: Starting Lightning ASR v1.0.0
INFO: GPU detected: NVIDIA A10 (24GB)
INFO: Downloading model from URL...
INFO: Model downloaded: 23.5GB
INFO: Loading model into GPU memory...
INFO: Model loaded successfully (5.2GB GPU memory)
INFO: Warmup inference completed in 3.2s
INFO: Server ready on port 2269
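A quick way to confirm a pod reached this state is to grep for the final startup line (sketch; substitute the actual pod name):
kubectl logs <pod> -n smallest | grep "Server ready"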

Request Processing

INFO: Request received: req_abc123
DEBUG: Audio duration: 60.5s, sample_rate: 44100
DEBUG: Preprocessing audio...
DEBUG: Running inference...
INFO: Transcription completed in 3.1s (RTF: 0.05x)
INFO: Confidence: 0.95
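RTF (real-time factor) is processing time divided by audio duration; here 3.1s of inference on 60.5s of audio gives roughly 0.05x. To pull the reported RTF values out of the logs (a sketch, assuming the line format shown above):
kubectl logs <pod> -n smallest | grep -o "RTF: [0-9.]*x"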

Common Errors

ERROR: No CUDA-capable device detected
ERROR: nvidia-smi command not found
CRITICAL: Cannot initialize GPU, exiting
Cause: GPU not available or drivers not installed
Solution:
  • Check that nvidia-smi works on the node
  • Verify the NVIDIA device plugin is installed (Kubernetes)
  • Check the NVIDIA Container Toolkit is configured (Docker)
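A useful first check is whether the GPU is visible from inside the container at all (sketch; assumes nvidia-smi is present in the image):
kubectl exec -it <pod> -n smallest -- nvidia-smi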
ERROR: CUDA out of memory
ERROR: Tried to allocate 2.5GB but only 1.2GB available
WARNING: Reducing batch size
Cause: Not enough GPU memory
Solution:
  • Reduce concurrent requests
  • Use a larger GPU (e.g., A10 instead of T4)
  • Scale horizontally (add more pods)
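Before scaling, current GPU memory usage can be inspected in place (sketch; assumes nvidia-smi is available in the container):
kubectl exec -it <pod> -n smallest -- nvidia-smi --query-gpu=memory.used,memory.total --format=csv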
INFO: Downloading model from https://example.com/model.bin
WARNING: Download attempt 1 failed: Connection timeout
WARNING: Retrying download...
ERROR: Download failed after 3 attempts
Cause: Network issues, invalid URL, or disk full
Solution:
  • Verify MODEL_URL
  • Check disk space: df -h
  • Test URL: curl -I $MODEL_URL
  • Use shared storage (EFS)
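Both checks can be run from inside the pod, where the download actually happens (sketch; assumes curl is available in the image and MODEL_URL is set in the container environment):
kubectl exec -it <pod> -n smallest -- sh -c 'df -h && curl -sI "$MODEL_URL" | head -n 1'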
ERROR: Failed to process audio: req_xyz789
ERROR: Unsupported audio format: audio/webm
ERROR: Audio file corrupted or invalid
Cause: Invalid audio file
Solution:
  • Verify audio format (WAV, MP3, FLAC supported)
  • Check file is not corrupted
  • Ensure proper sample rate (16kHz+)
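Format and sample rate can be verified locally with ffprobe (sketch; any standard audio inspection tool works):
ffprobe -v error -show_entries stream=codec_name,sample_rate -of default=noprint_wrappers=1 audio.wav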

API Server Logs

Successful Startup

INFO: Starting API Server v1.0.0
INFO: Connecting to Lightning ASR at http://lightning-asr:2269
INFO: Connected to Lightning ASR (2 replicas)
INFO: Connecting to License Proxy at http://license-proxy:3369
INFO: License validated
INFO: API server listening on port 7100

Request Handling

INFO: POST /v1/listen from 10.0.1.5
DEBUG: Request ID: req_abc123
DEBUG: Audio URL: https://example.com/audio.wav
DEBUG: Routing to Lightning ASR pod: lightning-asr-0
INFO: Response time: 3.2s
INFO: Status: 200 OK

Common Errors

WARNING: Invalid license key from 10.0.1.5
WARNING: Missing Authorization header
ERROR: License validation failed: expired
Cause: Invalid, missing, or expired license key
Solution:
  • Verify Authorization: Token <key> header
  • Check license key is correct
  • Renew expired license
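A minimal request with the expected header looks like the sketch below; the JSON body field name is an assumption, so check the API reference for the exact request shape:
# the "url" field name below is assumed, not confirmed by this guide
curl -X POST http://<api-server>:7100/v1/listen \
  -H "Authorization: Token <key>" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/audio.wav"}'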
ERROR: No Lightning ASR workers available
WARNING: Request queued: req_abc123
WARNING: Queue size: 15
Cause: All Lightning ASR pods busy or down
Solution:
  • Check Lightning ASR pods: kubectl get pods
  • Scale up replicas
  • Check HPA configuration
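A quick check-and-scale sequence (sketch; the ordinal pod names above suggest a StatefulSet named lightning-asr, so adjust the resource kind and name to your install):
kubectl get pods -n smallest -l app=lightning-asr
# resource kind assumed; use deployment/lightning-asr if that matches your install
kubectl scale statefulset/lightning-asr -n smallest --replicas=4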
ERROR: Request timeout after 300s
ERROR: Lightning ASR pod not responding: lightning-asr-0
WARNING: Retrying with different pod
Cause: Lightning ASR overloaded or crashed
Solution:
  • Check Lightning ASR logs
  • Increase timeout
  • Scale up pods

License Proxy Logs

Successful Validation

INFO: Starting License Proxy v1.0.0
INFO: License key loaded
INFO: Connecting to console-api.smallest.ai
INFO: License validated successfully
INFO: License valid until: 2025-12-31T23:59:59Z
INFO: Grace period: 24 hours
INFO: Server listening on port 3369

Usage Reporting

DEBUG: Reporting usage batch: 150 requests
DEBUG: Total duration: 3600s
DEBUG: Features: [streaming, punctuation]
INFO: Usage reported successfully

Common Errors

ERROR: License validation failed: Invalid license key
ERROR: License server returned 401 Unauthorized
CRITICAL: Cannot start without valid license
Cause: Invalid or expired license
Solution:
  • Verify the license key configured for the License Proxy is correct
  • Renew the license if it has expired
  • Contact support if the key is valid but validation still fails
WARNING: Connection to console-api.smallest.ai failed
WARNING: Connection timeout after 10s
INFO: Using cached validation
INFO: Grace period active (23h remaining)
Cause: Network connectivity issue
Solution:
  • Test: curl https://console-api.smallest.ai
  • Check firewall allows HTTPS
  • Restore connectivity before grace period expires
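Connectivity is best tested from inside the License Proxy pod, since its egress rules may differ from your workstation's (sketch; assumes curl exists in the image):
kubectl exec -it <license-proxy-pod> -n smallest -- curl -sI https://console-api.smallest.ai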
WARNING: Grace period expires in 1 hour
WARNING: Cannot connect to license server
ERROR: Grace period expired
CRITICAL: Service will stop accepting requests
Cause: Extended network outage
Solution:
  • Restore network connectivity immediately
  • Check firewall rules
  • Contact support if persistent

Redis Logs

Normal Operation

Ready to accept connections
Client connected from 10.0.1.5:45678
DB 0: 1523 keys (expires: 0)

Common Errors

WARNING: Memory usage: 95%
ERROR: OOM command not allowed when used memory > 'maxmemory'
Solution:
  • Increase memory limit
  • Enable eviction policy
  • Clear old keys
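Memory usage and the active eviction policy can be inspected and adjusted with redis-cli (sketch; allkeys-lru is one reasonable policy for cache data, pick what suits yours):
kubectl exec -it <redis-pod> -n smallest -- redis-cli INFO memory
kubectl exec -it <redis-pod> -n smallest -- redis-cli CONFIG SET maxmemory-policy allkeys-lru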
ERROR: Failed writing the RDB file
ERROR: Disk is full
Solution:
  • Increase disk space
  • Disable persistence if not needed
  • Clean up old snapshots
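If persistence is genuinely not needed, RDB snapshots can be disabled at runtime (sketch; make the change permanent in the Redis config if the cache-only setup is intentional):
kubectl exec -it <redis-pod> -n smallest -- redis-cli CONFIG SET save ""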

Log Pattern Analysis

Error Rate Analysis

Count errors in last 1000 lines:
kubectl logs <pod> --tail=1000 | grep -c "ERROR"
Group errors by type:
kubectl logs <pod> | grep "ERROR" | sort | uniq -c | sort -rn

Performance Analysis

Extract response times:
kubectl logs <pod> | grep "Response time" | awk '{print $NF}' | sort -n
Calculate average:
kubectl logs <pod> | grep "Response time" | awk '{sum+=$NF; count++} END {print sum/count}'

Request Tracking

Follow a specific request ID:
kubectl logs <pod> | grep "req_abc123"
Across all pods:
kubectl logs -l app=lightning-asr | grep "req_abc123"

Log Aggregation

Using stern

Install stern:
brew install stern
Follow logs from all Lightning ASR pods:
stern lightning-asr -n smallest
Filter by pattern:
stern lightning-asr -n smallest --grep "ERROR"

Using Loki (if installed)

Query logs via LogQL:
{app="lightning-asr"} |= "ERROR"
{app="api-server"} |= "req_abc123"
rate({app="lightning-asr"}[5m])

Structured Logging

Parse JSON Logs

If logs are in JSON format:
kubectl logs <pod> | jq 'select(.level=="ERROR")'
kubectl logs <pod> | jq 'select(.duration_ms > 1000)'
kubectl logs <pod> | jq -r '.message'

Filter by Field

kubectl logs <pod> | jq 'select(.request_id=="req_abc123")'
kubectl logs <pod> | jq 'select(.component=="license_proxy")'

Log Retention

Configure Log Rotation

Docker:
docker-compose.yml
services:
  lightning-asr:
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
Kubernetes:
Log rotation is handled by the kubelet, not the Pod spec. On most managed clusters the defaults are sufficient; where you control the kubelet, the limits can be tuned via the KubeletConfiguration fields containerLogMaxSize and containerLogMaxFiles:
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
containerLogMaxSize: "10Mi"
containerLogMaxFiles: 3

Export Logs

Save logs for analysis:
kubectl logs <pod> > logs.txt
kubectl logs <pod> --since=1h > logs-last-hour.txt
kubectl logs <pod> --since-time=2024-01-15T10:00:00Z > logs-since.txt

Debugging Log Issues

No Logs Appearing

Check pod is running:
kubectl get pods -n smallest
kubectl describe pod <pod-name>
Check stdout/stderr:
kubectl exec -it <pod> -- sh -c "ls -la /proc/1/fd/"
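If the container restarted recently, the current stream may be empty even though the pod looks healthy; the previous container's output is still retrievable:
kubectl logs <pod> --previous -n smallest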

Logs Truncated

Increase log size limits:
kubectl log output is capped by the kubelet rotation limits above. If the application itself truncates long messages, raise its own limit; the LOG_MAX_SIZE variable below is application-specific, not a Kubernetes setting:
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    env:
    - name: LOG_MAX_SIZE
      value: "100M"

Best Practices

Prefer JSON format for easier parsing:
{
  "timestamp": "2024-01-15T10:30:00Z",
  "level": "ERROR",
  "message": "Request failed",
  "request_id": "req_abc123",
  "duration_ms": 3200
}
Always include relevant context in logs:
  • Request ID
  • Component name
  • Timestamp
  • User/session info (if applicable)
Use correct log levels:
  • DEBUG: Development only
  • INFO: Normal operation
  • WARNING: Potential issues
  • ERROR: Actual problems
  • CRITICAL: Service-breaking issues
Use centralized logging:
  • ELK Stack (Elasticsearch, Logstash, Kibana)
  • Loki + Grafana
  • CloudWatch Logs (AWS)
  • Cloud Logging (GCP)

What’s Next?