Overview

This guide covers advanced configuration and optimization for GPU nodes in AWS EKS, including node taints, tolerations, labels, and performance tuning.

GPU Node Configuration

Node Labels

Labels help Kubernetes schedule pods on the correct nodes.

Automatic Labels

EKS automatically adds these labels to GPU nodes:
node.kubernetes.io/instance-type: g5.xlarge
beta.kubernetes.io/instance-type: g5.xlarge (deprecated in favor of node.kubernetes.io/instance-type)
topology.kubernetes.io/zone: us-east-1a
topology.kubernetes.io/region: us-east-1

Custom Labels

Add custom labels when creating node groups:
cluster-config.yaml
managedNodeGroups:
  - name: gpu-nodes
    instanceType: g5.xlarge
    labels:
      workload: gpu
      nvidia.com/gpu: "true"
      gpu-type: a10
      cost-tier: spot
Or add labels to existing nodes:
kubectl label nodes <node-name> workload=gpu
kubectl label nodes <node-name> gpu-type=a10
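To confirm the labels were applied, list only the nodes that match them:
kubectl get nodes -l workload=gpu,gpu-type=a10 --show-labels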

Node Taints

Taints prevent non-GPU workloads from running on expensive GPU nodes.

Add Taints During Node Group Creation

cluster-config.yaml
managedNodeGroups:
  - name: gpu-nodes
    instanceType: g5.xlarge
    taints:
      - key: nvidia.com/gpu
        value: "true"
        effect: NoSchedule

Add Taints to Existing Nodes

kubectl taint nodes <node-name> nvidia.com/gpu=true:NoSchedule
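To remove the taint later (for example, when repurposing the node), append a trailing hyphen to the same taint specification:
kubectl taint nodes <node-name> nvidia.com/gpu=true:NoSchedule-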

Tolerations in Pod Specs

Pods must carry a matching toleration to run on tainted nodes. Either of the tolerations below matches the taint above; only one is required:
values.yaml
lightningAsr:
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule
    - key: nvidia.com/gpu
      operator: Equal
      value: "true"
      effect: NoSchedule

Node Selectors

Using Instance Type

Most common approach for AWS:
values.yaml
lightningAsr:
  nodeSelector:
    node.kubernetes.io/instance-type: g5.xlarge
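To see which instance types are present in the cluster before pinning a selector to one, print the label as its own column:
kubectl get nodes -L node.kubernetes.io/instance-type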

Using Custom Labels

values.yaml
lightningAsr:
  nodeSelector:
    workload: gpu
    gpu-type: a10

Multiple Selectors

Combine multiple selectors for precise placement:
values.yaml
lightningAsr:
  nodeSelector:
    node.kubernetes.io/instance-type: g5.xlarge
    topology.kubernetes.io/zone: us-east-1a
    cost-tier: on-demand
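Putting the pieces together, a minimal standalone Pod that combines a node selector, a matching toleration, and a GPU request looks roughly like this (the pod name, image, and file name are illustrative placeholders, not part of the Lightning ASR chart):
gpu-test-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  nodeSelector:
    workload: gpu
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule
  containers:
    - name: cuda
      image: nvidia/cuda:11.8.0-base-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1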

NVIDIA Device Plugin

The NVIDIA device plugin makes GPUs available to Kubernetes pods.

Installation via GPU Operator

The recommended approach is to use the NVIDIA GPU Operator (included in the Smallest Helm chart):
values.yaml
gpu-operator:
  enabled: true
  driver:
    enabled: true
  toolkit:
    enabled: true
  devicePlugin:
    enabled: true
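After editing values.yaml, roll the change out with a Helm upgrade. The release, chart, and namespace names below are placeholders; substitute the ones from your existing installation:
helm upgrade <release-name> <smallest-chart> \
  --namespace <namespace> \
  -f values.yaml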

Manual Installation

Alternatively, install the device plugin directly:
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.0/nvidia-device-plugin.yml

Verify Device Plugin

kubectl get pods -n kube-system | grep nvidia-device-plugin
kubectl logs -n kube-system -l name=nvidia-device-plugin

Check GPU Availability

kubectl get nodes -o json | \
  jq -r '.items[] | select(.status.capacity."nvidia.com/gpu" != null) | 
  "\(.metadata.name)\t\(.status.capacity."nvidia.com/gpu")"'

GPU Resource Limits

Request GPU in Pod Spec

The Lightning ASR deployment automatically requests GPU:
resources:
  limits:
    nvidia.com/gpu: 1
  requests:
    nvidia.com/gpu: 1

Multiple GPUs

For pods that need multiple GPUs:
resources:
  limits:
    nvidia.com/gpu: 2
  requests:
    nvidia.com/gpu: 2
Smallest Self-Host Lightning ASR is optimized for single GPU per pod. Use multiple pods for scaling rather than multiple GPUs per pod.

GPU Performance Optimization

Enable GPU Persistence Mode

GPU persistence mode keeps the NVIDIA driver loaded, reducing initialization time:
gpu-operator:
  enabled: true
  driver:
    enabled: true
    env:
      - name: NVIDIA_DRIVER_CAPABILITIES
        value: "compute,utility"
      - name: NVIDIA_REQUIRE_CUDA
        value: "cuda>=11.8"
  toolkit:
    enabled: true
    env:
      - name: NVIDIA_MPS_ENABLED
        value: "1"

Use DaemonSet for GPU Configuration

Create a DaemonSet to configure GPU settings on all GPU nodes:
gpu-config-daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: gpu-config
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: gpu-config
  template:
    metadata:
      labels:
        name: gpu-config
    spec:
      hostPID: true
      nodeSelector:
        nvidia.com/gpu: "true"
      tolerations:
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
      containers:
      - name: gpu-config
        image: nvidia/cuda:11.8.0-base-ubuntu22.04
        command:
          - /bin/bash
          - -c
          - |
            nvidia-smi -pm 1
            nvidia-smi --auto-boost-default=DISABLED
            nvidia-smi -ac 1215,1410
            sleep infinity
        securityContext:
          privileged: true
        volumeMounts:
          - name: sys
            mountPath: /sys
      volumes:
        - name: sys
          hostPath:
            path: /sys
Apply:
kubectl apply -f gpu-config-daemonset.yaml
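Confirm the DaemonSet is running on every GPU node and that the nvidia-smi commands succeeded:
kubectl rollout status daemonset/gpu-config -n kube-system
kubectl logs -n kube-system -l name=gpu-config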

Monitor GPU Utilization

Deploy NVIDIA DCGM exporter for Prometheus metrics:
helm repo add gpu-helm-charts https://nvidia.github.io/dcgm-exporter/helm-charts
helm repo update

helm install dcgm-exporter gpu-helm-charts/dcgm-exporter \
  --namespace kube-system \
  --set serviceMonitor.enabled=true
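To spot-check that metrics are flowing before wiring up dashboards, port-forward the exporter service in one terminal and curl it from another (the service name and port follow the chart defaults and may differ in your setup):
kubectl port-forward -n kube-system svc/dcgm-exporter 9400:9400
curl -s localhost:9400/metrics | grep DCGM_FI_DEV_GPU_UTIL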

Multi-GPU Strategies

Strategy 1: One Pod per GPU (Recommended)

Scale horizontally with one pod per GPU:
values.yaml
scaling:
  auto:
    enabled: true
    lightningAsr:
      hpa:
        enabled: true
        minReplicas: 1
        maxReplicas: 10

lightningAsr:
  resources:
    limits:
      nvidia.com/gpu: 1

Strategy 2: GPU Sharing (Time-Slicing)

Allow multiple pods to share a single GPU (reduces isolation):
gpu-operator:
  enabled: true
  devicePlugin:
    config:
      name: time-slicing-config
      default: any
      sharing:
        timeSlicing:
          replicas: 4
GPU sharing reduces isolation and can impact performance. Use only if cost is more critical than performance.
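After the time-slicing config is applied, the node advertises the multiplied GPU count. A quick way to confirm the new capacity:
kubectl describe node <node-name> | grep nvidia.com/gpu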

Strategy 3: Multi-Instance GPU (MIG)

For A100 and A30 GPUs, use MIG to partition a GPU into isolated instances. Enable MIG mode, then create instances that match the resource requested below (1g.5gb applies to A100; A30 uses different profile sizes):
nvidia-smi -mig 1
nvidia-smi mig -cgi 1g.5gb,1g.5gb,1g.5gb,1g.5gb,1g.5gb,1g.5gb,1g.5gb -C
Configure pods to use the matching MIG instances:
resources:
  limits:
    nvidia.com/mig-1g.5gb: 1
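To confirm the partitions exist and are advertised to Kubernetes, list the GPU instances on the node and check the node's extended resources:
nvidia-smi mig -lgi
kubectl describe node <node-name> | grep nvidia.com/mig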

Node Auto-Scaling

Configure Auto-Scaling Groups

When creating node groups, enable auto-scaling:
cluster-config.yaml
managedNodeGroups:
  - name: gpu-nodes
    instanceType: g5.xlarge
    minSize: 0
    maxSize: 10
    desiredCapacity: 1
    tags:
      k8s.io/cluster-autoscaler/enabled: "true"
      k8s.io/cluster-autoscaler/smallest-cluster: "owned"

Install Cluster Autoscaler

See Cluster Autoscaler for full setup. Quick enable:
values.yaml
cluster-autoscaler:
  enabled: true
  autoDiscovery:
    clusterName: smallest-cluster
  awsRegion: us-east-1
  nodeSelector:
    workload: cpu
Run Cluster Autoscaler on CPU nodes, not GPU nodes, to avoid wasting GPU resources.
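To verify the autoscaler discovered the GPU node group, check its logs for scale-up activity (pod naming depends on the chart, so the grep is a loose match):
kubectl get pods -n kube-system | grep cluster-autoscaler
kubectl logs -n kube-system <cluster-autoscaler-pod> | grep -i "scale"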

Cost Optimization

Use Spot Instances

Save up to 70% with Spot instances. For eksctl managed node groups, set spot: true and provide a list of instance types to draw from:
cluster-config.yaml
managedNodeGroups:
  - name: gpu-nodes-spot
    instanceTypes: ["g5.xlarge", "g5.2xlarge"]
    minSize: 0
    maxSize: 10
    desiredCapacity: 1
    spot: true
    labels:
      capacity-type: spot
    taints:
      - key: nvidia.com/gpu
        value: "true"
        effect: NoSchedule

Handle Spot Interruptions

Add pod disruption budget:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: lightning-asr-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: lightning-asr
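Apply the budget and verify it is tracking the Lightning ASR pods (the file name is a placeholder):
kubectl apply -f lightning-asr-pdb.yaml
kubectl get pdb lightning-asr-pdb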
Configure graceful shutdown:
values.yaml
lightningAsr:
  terminationGracePeriodSeconds: 120

Mixed On-Demand and Spot

Combine both for reliability:
cluster-config.yaml
managedNodeGroups:
  - name: gpu-nodes-ondemand
    instanceType: g5.xlarge
    minSize: 1
    maxSize: 3
    labels:
      capacity-type: on-demand
      
  - name: gpu-nodes-spot
    instanceType: g5.xlarge
    minSize: 0
    maxSize: 10
    spot: true
    labels:
      capacity-type: spot
Use node affinity to prefer spot:
values.yaml
lightningAsr:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          preference:
            matchExpressions:
              - key: capacity-type
                operator: In
                values:
                  - spot

Monitoring GPU Nodes

View GPU Node Status

kubectl get nodes -l nvidia.com/gpu=true

Check GPU Allocation

kubectl describe nodes -l nvidia.com/gpu=true | grep -A 5 "Allocated resources"

GPU Utilization

Using NVIDIA SMI:
kubectl run nvidia-smi --rm -it --restart=Never \
  --image=nvidia/cuda:11.8.0-base-ubuntu22.04 \
  --overrides='{"spec":{"nodeSelector":{"nvidia.com/gpu":"true"},"tolerations":[{"key":"nvidia.com/gpu","operator":"Exists"}]}}' \
  -- nvidia-smi

Troubleshooting

GPU Not Detected

Check NVIDIA device plugin:
kubectl get pods -n kube-system | grep nvidia
kubectl logs -n kube-system -l name=nvidia-device-plugin
Verify the driver on the node (in a node debug session, the node's root filesystem is mounted at /host):
kubectl debug node/<node-name> -it --image=ubuntu
chroot /host nvidia-smi

Pods Not Scheduling on GPU Nodes

Check tolerations:
kubectl describe pod <pod-name> | grep -A 5 Tolerations
Check node selector:
kubectl get pod <pod-name> -o jsonpath='{.spec.nodeSelector}'
Check node taints:
kubectl describe node <node-name> | grep Taints

GPU Out of Memory

Check pod resource limits:
kubectl describe pod <pod-name> | grep -A 5 Limits
Monitor GPU memory:
kubectl exec <pod-name> -- nvidia-smi

Best Practices

Always use taints and tolerations to prevent non-GPU workloads from running on GPU nodes:
taints:
  - key: nvidia.com/gpu
    value: "true"
    effect: NoSchedule
Always specify GPU resource requests and limits:
resources:
  limits:
    nvidia.com/gpu: 1
  requests:
    nvidia.com/gpu: 1
Configure auto-scaling to scale GPU nodes to zero during off-hours:
minSize: 0
maxSize: 10
Use DCGM exporter and Grafana to monitor GPU metrics:
  • GPU utilization
  • Memory usage
  • Temperature
  • Power consumption
Regularly test your application’s response to spot interruptions:
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

What’s Next?