Overview
This guide covers advanced configuration and optimization for GPU nodes in AWS EKS, including node taints, tolerations, labels, and performance tuning.
GPU Node Configuration
Node Labels
Labels help Kubernetes schedule pods on the correct nodes.
Automatic Labels
EKS automatically adds these labels to GPU nodes:
node.kubernetes.io/instance-type: g5.xlarge
beta.kubernetes.io/instance-type: g5.xlarge
topology.kubernetes.io/zone: us-east-1a
topology.kubernetes.io/region: us-east-1
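For reference, you can list the labels a node actually carries with kubectl:
kubectl get nodes --show-labels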
Custom Labels
Add custom labels when creating node groups:
managedNodeGroups:
  - name: gpu-nodes
    instanceType: g5.xlarge
    labels:
      workload: gpu
      nvidia.com/gpu: "true"
      gpu-type: a10
      cost-tier: spot
Or add labels to existing nodes:
kubectl label nodes <node-name> workload=gpu
kubectl label nodes <node-name> gpu-type=a10
Node Taints
Taints prevent non-GPU workloads from running on expensive GPU nodes.
Add Taints During Node Group Creation
managedNodeGroups:
  - name: gpu-nodes
    instanceType: g5.xlarge
    taints:
      - key: nvidia.com/gpu
        value: "true"
        effect: NoSchedule
Add Taints to Existing Nodes
kubectl taint nodes <node-name> nvidia.com/gpu=true:NoSchedule
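If you need to undo this later, the same command with a trailing dash removes the taint:
kubectl taint nodes <node-name> nvidia.com/gpu=true:NoSchedule-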
Tolerations in Pod Specs
Pods must have a matching toleration to run on tainted nodes. Either of the forms below (Exists or Equal) matches the taint above:
lightningAsr:
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule
    - key: nvidia.com/gpu
      operator: Equal
      value: "true"
      effect: NoSchedule
Node Selectors
Using Instance Type
Most common approach for AWS:
lightningAsr:
  nodeSelector:
    node.kubernetes.io/instance-type: g5.xlarge
Using Custom Labels
lightningAsr:
  nodeSelector:
    workload: gpu
    gpu-type: a10
Multiple Selectors
Combine multiple selectors for precise placement:
lightningAsr:
  nodeSelector:
    node.kubernetes.io/instance-type: g5.xlarge
    topology.kubernetes.io/zone: us-east-1a
    cost-tier: on-demand
NVIDIA Device Plugin
The NVIDIA device plugin makes GPUs available to Kubernetes pods.
Installation via GPU Operator
The recommended approach is using the NVIDIA GPU Operator (included in the Smallest Helm chart):
gpu-operator:
  enabled: true
  driver:
    enabled: true
  toolkit:
    enabled: true
  devicePlugin:
    enabled: true
Manual Installation
Alternatively, install the device plugin directly:
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.0/nvidia-device-plugin.yml
Verify Device Plugin
kubectl get pods -n kube-system | grep nvidia-device-plugin
kubectl logs -n kube-system -l name=nvidia-device-plugin
Check GPU Availability
kubectl get nodes -o json | \
  jq -r '.items[] | select(.status.capacity."nvidia.com/gpu" != null) |
    "\(.metadata.name)\t\(.status.capacity."nvidia.com/gpu")"'
GPU Resource Limits
Request GPU in Pod Spec
The Lightning ASR deployment automatically requests GPU:
resources:
  limits:
    nvidia.com/gpu: 1
  requests:
    nvidia.com/gpu: 1
Multiple GPUs
For pods that need multiple GPUs:
resources:
  limits:
    nvidia.com/gpu: 2
  requests:
    nvidia.com/gpu: 2
Smallest Self-Host Lightning ASR is optimized for single GPU per pod. Use multiple pods for scaling rather than multiple GPUs per pod.
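To scale out, increase the replica count rather than the GPU count per pod. A quick manual example (assuming the deployment created by the chart is named lightning-asr; adjust to your release name, or let the HPA from the auto-scaling section handle this for you):
kubectl scale deployment lightning-asr --replicas=2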
Enable GPU Persistence Mode
GPU persistence mode keeps the NVIDIA driver loaded, reducing initialization time:
gpu-operator:
  enabled: true
  driver:
    enabled: true
    env:
      - name: NVIDIA_DRIVER_CAPABILITIES
        value: "compute,utility"
      - name: NVIDIA_REQUIRE_CUDA
        value: "cuda>=11.8"
  toolkit:
    enabled: true
    env:
      - name: NVIDIA_MPS_ENABLED
        value: "1"
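To check whether persistence mode is actually enabled on a node, query the driver from any pod or node session with GPU access:
nvidia-smi --query-gpu=name,persistence_mode --format=csv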
Use DaemonSet for GPU Configuration
Create a DaemonSet to configure GPU settings on all GPU nodes:
gpu-config-daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: gpu-config
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: gpu-config
  template:
    metadata:
      labels:
        name: gpu-config
    spec:
      hostPID: true
      nodeSelector:
        nvidia.com/gpu: "true"
      tolerations:
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
      containers:
        - name: gpu-config
          image: nvidia/cuda:11.8.0-base-ubuntu22.04
          command:
            - /bin/bash
            - -c
            - |
              # Enable persistence mode and disable auto boost
              nvidia-smi -pm 1
              nvidia-smi --auto-boost-default=DISABLED
              # Application clocks (memory,graphics MHz) are GPU-model specific;
              # list supported pairs with: nvidia-smi -q -d SUPPORTED_CLOCKS
              nvidia-smi -ac 1215,1410
              sleep infinity
          securityContext:
            privileged: true
          volumeMounts:
            - name: sys
              mountPath: /sys
      volumes:
        - name: sys
          hostPath:
            path: /sys
Apply:
kubectl apply -f gpu-config-daemonset.yaml
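Verify the DaemonSet is running on every GPU node and that the clock settings applied:
kubectl get pods -n kube-system -l name=gpu-config -o wide
kubectl logs -n kube-system -l name=gpu-config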
Monitor GPU Utilization
Deploy NVIDIA DCGM exporter for Prometheus metrics:
helm repo add gpu-helm-charts https://nvidia.github.io/dcgm-exporter/helm-charts
helm repo update
helm install dcgm-exporter gpu-helm-charts/dcgm-exporter \
  --namespace kube-system \
  --set serviceMonitor.enabled=true
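To spot-check that metrics are flowing before wiring up Grafana, port-forward the exporter and look for the GPU utilization counter (the service name and port 9400 assume the dcgm-exporter chart defaults):
kubectl port-forward -n kube-system svc/dcgm-exporter 9400:9400
curl -s localhost:9400/metrics | grep DCGM_FI_DEV_GPU_UTIL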
Multi-GPU Strategies
Strategy 1: One Pod per GPU (Recommended)
Scale horizontally with one pod per GPU:
scaling:
  auto:
    enabled: true
lightningAsr:
  hpa:
    enabled: true
    minReplicas: 1
    maxReplicas: 10
  resources:
    limits:
      nvidia.com/gpu: 1
Strategy 2: GPU Sharing (Time-Slicing)
Allow multiple pods to share a single GPU (reduces isolation):
gpu-operator:
  enabled: true
  devicePlugin:
    config:
      name: time-slicing-config
      default: any
      sharing:
        timeSlicing:
          replicas: 4
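The config name above typically references a ConfigMap that holds the time-slicing settings. A minimal sketch in the upstream NVIDIA GPU Operator format (assuming the operator runs in the gpu-operator namespace; adjust if your chart installs it elsewhere):
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
  namespace: gpu-operator
data:
  any: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
          - name: nvidia.com/gpu
            replicas: 4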
GPU sharing reduces isolation and can impact performance. Use only if cost is more critical than performance.
Strategy 3: Multi-Instance GPU (MIG)
For A100 and A30 GPUs, use Multi-Instance GPU (MIG) to partition a GPU into isolated instances. Enable MIG mode first (nvidia-smi -i 0 -mig 1, followed by a GPU reset), then create the instances, for example seven 1g.5gb instances on an A100:
nvidia-smi mig -cgi 1g.5gb,1g.5gb,1g.5gb,1g.5gb,1g.5gb,1g.5gb,1g.5gb -C
Configure pods to use MIG instances:
resources:
  limits:
    nvidia.com/mig-1g.5gb: 1
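After partitioning (and with the device plugin's MIG strategy set to mixed so per-profile resources are exposed), the node should advertise the MIG resources and nvidia-smi should list the instances:
kubectl describe node <node-name> | grep nvidia.com/mig
nvidia-smi -L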
Node Auto-Scaling
When creating node groups, enable auto-scaling:
managedNodeGroups:
  - name: gpu-nodes
    instanceType: g5.xlarge
    minSize: 0
    maxSize: 10
    desiredCapacity: 1
    tags:
      k8s.io/cluster-autoscaler/enabled: "true"
      k8s.io/cluster-autoscaler/smallest-cluster: "owned"
Install Cluster Autoscaler
See Cluster Autoscaler for full setup.
Quick enable:
cluster-autoscaler:
  enabled: true
  autoDiscovery:
    clusterName: smallest-cluster
  awsRegion: us-east-1
  nodeSelector:
    workload: cpu
Run Cluster Autoscaler on CPU nodes, not GPU nodes, to avoid wasting GPU resources.
Cost Optimization
Use Spot Instances
Save up to 70% with Spot instances:
managedNodeGroups:
  - name: gpu-nodes-spot
    instanceType: g5.xlarge
    minSize: 0
    maxSize: 10
    desiredCapacity: 1
    spot: true
    instancesDistribution:
      maxPrice: 0.50
      instanceTypes: ["g5.xlarge", "g5.2xlarge"]
      onDemandBaseCapacity: 0
      onDemandPercentageAboveBaseCapacity: 0
      spotAllocationStrategy: capacity-optimized
    labels:
      capacity-type: spot
    taints:
      - key: nvidia.com/gpu
        value: "true"
        effect: NoSchedule
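Once Spot nodes join the cluster, you can confirm they carry the capacity-type label defined in the node group:
kubectl get nodes -l capacity-type=spot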
Handle Spot Interruptions
Add pod disruption budget:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: lightning-asr-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: lightning-asr
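Apply the manifest and confirm the budget is tracked (assuming it is saved as lightning-asr-pdb.yaml):
kubectl apply -f lightning-asr-pdb.yaml
kubectl get pdb lightning-asr-pdb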
Configure graceful shutdown:
lightningAsr:
  terminationGracePeriodSeconds: 120
Mixed On-Demand and Spot
Combine both for reliability:
managedNodeGroups:
  - name: gpu-nodes-ondemand
    instanceType: g5.xlarge
    minSize: 1
    maxSize: 3
    labels:
      capacity-type: on-demand
  - name: gpu-nodes-spot
    instanceType: g5.xlarge
    minSize: 0
    maxSize: 10
    spot: true
    labels:
      capacity-type: spot
Use node affinity to prefer Spot nodes:
lightningAsr:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          preference:
            matchExpressions:
              - key: capacity-type
                operator: In
                values:
                  - spot
Monitoring GPU Nodes
View GPU Node Status
kubectl get nodes -l nvidia.com/gpu=true
Check GPU Allocation
kubectl describe nodes -l nvidia.com/gpu=true | grep -A 5 "Allocated resources"
GPU Utilization
Using NVIDIA SMI:
kubectl run nvidia-smi --rm -it --restart=Never \
  --image=nvidia/cuda:11.8.0-base-ubuntu22.04 \
  --overrides='{"spec":{"nodeSelector":{"nvidia.com/gpu":"true"},"tolerations":[{"key":"nvidia.com/gpu","operator":"Exists"}]}}' \
  -- nvidia-smi
Troubleshooting
GPU Not Detected
Check the NVIDIA device plugin:
kubectl get pods -n kube-system | grep nvidia
kubectl logs -n kube-system -l name=nvidia-device-plugin
Verify the driver on the node:
kubectl debug node/<node-name> -it --image=ubuntu
# the host filesystem is mounted at /host inside the debug pod
chroot /host nvidia-smi
Pods Not Scheduling on GPU Nodes
Check tolerations:
kubectl describe pod <pod-name> | grep -A 5 Tolerations
Check the node selector:
kubectl get pod <pod-name> -o jsonpath='{.spec.nodeSelector}'
Check node taints:
kubectl describe node <node-name> | grep Taints
GPU Out of Memory
Check pod resource limits:
kubectl describe pod <pod-name> | grep -A 5 Limits
Monitor GPU memory:
kubectl exec <pod-name> -- nvidia-smi
Best Practices
Always use taints and tolerations to prevent non-GPU workloads from running on GPU nodes:
taints:
  - key: nvidia.com/gpu
    value: "true"
    effect: NoSchedule
Always specify GPU resource requests and limits:
resources:
  limits:
    nvidia.com/gpu: 1
  requests:
    nvidia.com/gpu: 1
Configure auto-scaling so GPU nodes can scale to zero during off-hours (see the sketch below).
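A node group that can scale to zero only needs minSize: 0 together with the Cluster Autoscaler tags shown earlier, for example:
managedNodeGroups:
  - name: gpu-nodes
    instanceType: g5.xlarge
    minSize: 0
    maxSize: 10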
Use the DCGM exporter and Grafana to monitor GPU metrics:
- GPU utilization
- Memory usage
- Temperature
- Power consumption
Regularly test your application’s response to Spot interruptions:
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
What’s Next?