Overview

This guide walks you through creating an Amazon EKS cluster optimized for running Smallest Self-Host with GPU acceleration.

Prerequisites

1. AWS CLI

Install and configure AWS CLI:
aws --version
aws configure
2. eksctl

Install eksctl (EKS cluster management tool):
brew install eksctl
Verify:
eksctl version
3. kubectl

Install kubectl:
brew install kubectl
4. IAM Permissions

Ensure your AWS user/role has permissions to:
  • Create EKS clusters
  • Manage EC2 instances
  • Create IAM roles
  • Manage VPC resources
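
You can sanity-check the configured identity and simulate a few of these actions before creating anything (replace the ARN with your own user or role; the action names below are a representative subset, not an exhaustive list):
aws sts get-caller-identity

aws iam simulate-principal-policy \
  --policy-source-arn arn:aws:iam::YOUR_ACCOUNT_ID:user/YOUR_USER \
  --action-names eks:CreateCluster ec2:RunInstances iam:CreateRole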

Cluster Configuration

Option 1: Quick Start with eksctl

Create a cluster with GPU nodes using a single command:
eksctl create cluster \
  --name smallest-cluster \
  --region us-east-1 \
  --version 1.28 \
  --nodegroup-name cpu-nodes \
  --node-type t3.large \
  --nodes 2 \
  --nodes-min 1 \
  --nodes-max 3 \
  --managed
Then add a GPU node group:
eksctl create nodegroup \
  --cluster smallest-cluster \
  --region us-east-1 \
  --name gpu-nodes \
  --node-type g5.xlarge \
  --nodes 1 \
  --nodes-min 0 \
  --nodes-max 5 \
  --managed \
  --node-labels "workload=gpu,nvidia.com/gpu=true" \
  --node-taints "nvidia.com/gpu=true:NoSchedule"
This creates a cluster with separate CPU and GPU node groups, allowing for cost-effective scaling.
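
Because the GPU nodes are tainted, workloads that need a GPU must tolerate the taint and select the workload=gpu label. A minimal pod spec sketch (the container image is a placeholder):
gpu-workload.yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload
spec:
  nodeSelector:
    workload: gpu
  tolerations:
    - key: nvidia.com/gpu
      operator: Equal
      value: "true"
      effect: NoSchedule
  containers:
    - name: app
      image: your-registry/your-gpu-image:latest  # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1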

Option 2: Using Cluster Config File

Create a cluster configuration file for more control:
cluster-config.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: smallest-cluster
  region: us-east-1
  version: "1.28"

iam:
  withOIDC: true

managedNodeGroups:
  - name: cpu-nodes
    instanceType: t3.large
    minSize: 1
    maxSize: 3
    desiredCapacity: 2
    volumeSize: 50
    ssh:
      allow: false
    labels:
      workload: cpu
    tags:
      Environment: production
      Application: smallest-self-host

  - name: gpu-nodes
    instanceType: g5.xlarge
    minSize: 0
    maxSize: 5
    desiredCapacity: 1
    volumeSize: 100
    ssh:
      allow: false
    labels:
      workload: gpu
      nvidia.com/gpu: "true"
      node.kubernetes.io/instance-type: g5.xlarge
    taints:
      - key: nvidia.com/gpu
        value: "true"
        effect: NoSchedule
    tags:
      Environment: production
      Application: smallest-self-host
      NodeType: gpu
    iam:
      withAddonPolicies:
        autoScaler: true
        ebs: true
        efs: true

addons:
  - name: vpc-cni
  - name: coredns
  - name: kube-proxy
  - name: aws-ebs-csi-driver
Create the cluster:
eksctl create cluster -f cluster-config.yaml
Cluster creation takes 15-20 minutes. Monitor progress in the AWS CloudFormation console.
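
You can also watch the underlying CloudFormation stack from the CLI (eksctl names it eksctl-<cluster-name>-cluster):
aws cloudformation describe-stacks \
  --stack-name eksctl-smallest-cluster-cluster \
  --region us-east-1 \
  --query "Stacks[0].StackStatus"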

GPU Instance Types

Choose the right GPU instance type for your workload:
| Instance Type | GPU | VRAM | vCPUs | RAM | $/hour* | Recommended For |
| --- | --- | --- | --- | --- | --- | --- |
| g5.xlarge | 1x A10G | 24 GB | 4 | 16 GB | $1.00 | Development, testing |
| g5.2xlarge | 1x A10G | 24 GB | 8 | 32 GB | $1.21 | Small production |
| g5.4xlarge | 1x A10G | 24 GB | 16 | 64 GB | $1.63 | Medium production |
| g5.12xlarge | 4x A10G | 96 GB | 48 | 192 GB | $5.67 | High-volume production |
| p3.2xlarge | 1x V100 | 16 GB | 8 | 61 GB | $3.06 | Legacy workloads |

*Approximate on-demand pricing in us-east-1; rates vary by region.
Recommendation: Start with g5.xlarge for development and testing. Scale to g5.2xlarge or higher for production.
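
Not every GPU instance type is offered in every Availability Zone. You can confirm availability in your region before creating the node group:
aws ec2 describe-instance-type-offerings \
  --location-type availability-zone \
  --filters Name=instance-type,Values=g5.xlarge \
  --region us-east-1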

Verify Cluster

Check Cluster Status

eksctl get cluster --name smallest-cluster --region us-east-1

Verify Node Groups

eksctl get nodegroup --cluster smallest-cluster --region us-east-1

Configure kubectl

aws eks update-kubeconfig --name smallest-cluster --region us-east-1
Verify access:
kubectl get nodes
Expected output:
NAME                         STATUS   ROLES    AGE   VERSION
ip-xxx-cpu-1                 Ready    <none>   5m    v1.28.x
ip-xxx-cpu-2                 Ready    <none>   5m    v1.28.x
ip-xxx-gpu-1                 Ready    <none>   5m    v1.28.x

Verify GPU Nodes

Check GPU availability:
kubectl get nodes -l workload=gpu -o json | \
  jq '.items[].status.capacity'
Look for nvidia.com/gpu in the output:
{
  "cpu": "4",
  "memory": "15944904Ki",
  "nvidia.com/gpu": "1",
  "pods": "29"
}

Install NVIDIA Device Plugin

The NVIDIA device plugin enables GPU scheduling in Kubernetes. The Smallest Self-Host chart includes the NVIDIA GPU Operator. Enable it in your values:
values.yaml
gpu-operator:
  enabled: true
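
If you manage the chart with Helm, re-applying your values picks up the change (the release name, chart reference, and namespace below are placeholders for however you installed Smallest Self-Host):
helm upgrade --install smallest-self-host <smallest-chart-ref> \
  --namespace smallest \
  -f values.yaml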

Manual Installation

If installing separately:
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.0/nvidia-device-plugin.yml
Verify:
kubectl get pods -n kube-system | grep nvidia
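
Once the plugin pods are running, a quick smoke test is a short-lived pod that runs nvidia-smi on a GPU node (a sketch; adjust the CUDA image tag if needed):
gpu-smoke-test.yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  nodeSelector:
    workload: gpu
  tolerations:
    - key: nvidia.com/gpu
      operator: Equal
      value: "true"
      effect: NoSchedule
  containers:
    - name: cuda
      image: nvidia/cuda:12.2.0-base-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
Apply it and check the logs:
kubectl apply -f gpu-smoke-test.yaml
kubectl logs gpu-smoke-test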

Install EBS CSI Driver

Required for persistent volumes:

Using eksctl

eksctl create addon \
  --name aws-ebs-csi-driver \
  --cluster smallest-cluster \
  --region us-east-1
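
The driver also needs IAM permissions to create and attach EBS volumes. If you did not enable the ebs add-on policy in your cluster config, one approach is an IAM role for its service account using the AWS-managed policy (a sketch; pass the resulting role to the add-on with --service-account-role-arn):
eksctl create iamserviceaccount \
  --cluster smallest-cluster \
  --region us-east-1 \
  --namespace kube-system \
  --name ebs-csi-controller-sa \
  --role-name AmazonEKS_EBS_CSI_DriverRole \
  --attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
  --role-only \
  --approve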

Using AWS Console

  1. Navigate to EKS → Clusters → smallest-cluster → Add-ons
  2. Click “Add new”
  3. Select “Amazon EBS CSI Driver”
  4. Click “Add”

Verify EBS CSI Driver

kubectl get pods -n kube-system -l app=ebs-csi-controller
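
With the controller running, you can define a StorageClass backed by the driver for persistent volumes (a typical gp3 sketch; adjust parameters as needed):
gp3-storageclass.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: gp3
  encrypted: "true"
Apply it with kubectl apply -f gp3-storageclass.yaml.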

Install EFS CSI Driver (Optional)

Recommended for shared model storage across pods.

Create IAM Policy

curl -o iam-policy.json https://raw.githubusercontent.com/kubernetes-sigs/aws-efs-csi-driver/master/docs/iam-policy-example.json

aws iam create-policy \
  --policy-name AmazonEKS_EFS_CSI_Driver_Policy \
  --policy-document file://iam-policy.json

Create IAM Service Account

eksctl create iamserviceaccount \
  --cluster smallest-cluster \
  --region us-east-1 \
  --namespace kube-system \
  --name efs-csi-controller-sa \
  --attach-policy-arn arn:aws:iam::YOUR_ACCOUNT_ID:policy/AmazonEKS_EFS_CSI_Driver_Policy \
  --approve
Replace YOUR_ACCOUNT_ID with your AWS account ID.
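
You can look up the account ID with:
aws sts get-caller-identity --query Account --output text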

Install EFS CSI Driver

kubectl apply -k "github.com/kubernetes-sigs/aws-efs-csi-driver/deploy/kubernetes/overlays/stable/?ref=release-1.7"
Verify:
kubectl get pods -n kube-system -l app=efs-csi-controller
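
The driver alone does not provision storage; you still need an EFS filesystem with mount targets in the cluster's subnets, plus a StorageClass that points at it. A sketch with a placeholder filesystem ID:
aws efs create-file-system \
  --region us-east-1 \
  --performance-mode generalPurpose \
  --tags Key=Name,Value=smallest-models
efs-storageclass.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap
  fileSystemId: fs-0123456789abcdef0  # replace with your filesystem ID
  directoryPerms: "700"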

Enable Cluster Autoscaler

See the Cluster Autoscaler guide for detailed setup. Quick setup:
eksctl create iamserviceaccount \
  --cluster smallest-cluster \
  --region us-east-1 \
  --namespace kube-system \
  --name cluster-autoscaler \
  --attach-policy-arn arn:aws:iam::aws:policy/AutoScalingFullAccess \
  --approve \
  --override-existing-serviceaccounts

Cost Optimization

Use Spot Instances for GPU Nodes

Reduce costs by up to 70% with Spot instances:
cluster-config.yaml
managedNodeGroups:
  - name: gpu-nodes-spot
    instanceTypes: ["g5.xlarge", "g5.2xlarge"]
    spot: true
    minSize: 0
    maxSize: 5
    desiredCapacity: 1
For managed node groups, eksctl takes a list of instanceTypes together with spot: true; the finer-grained instancesDistribution settings (max price, allocation strategy) apply only to self-managed nodeGroups.
Spot instances can be interrupted with a two-minute warning. Ensure your application handles graceful shutdowns.
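
In practice that means giving pods time to drain: a generous terminationGracePeriodSeconds and, if needed, a preStop hook so in-flight requests can finish (a sketch; the image is a placeholder):
apiVersion: v1
kind: Pod
metadata:
  name: spot-friendly-example
spec:
  terminationGracePeriodSeconds: 90
  containers:
    - name: app
      image: your-registry/your-image:latest  # placeholder image
      lifecycle:
        preStop:
          exec:
            command: ["sh", "-c", "sleep 20"]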

Right-Size Node Groups

Start small and scale based on metrics:
managedNodeGroups:
  - name: gpu-nodes
    minSize: 0
    maxSize: 10
    desiredCapacity: 1
Set minSize: 0 to scale down to zero during off-hours.
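
You can also scale the GPU group down manually, for example outside working hours:
eksctl scale nodegroup \
  --cluster smallest-cluster \
  --region us-east-1 \
  --name gpu-nodes \
  --nodes 0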

Enable Cluster Autoscaler

Automatically adjust node count based on demand:
values.yaml
cluster-autoscaler:
  enabled: true
  autoDiscovery:
    clusterName: smallest-cluster
  awsRegion: us-east-1

Security Best Practices

Enable Private Endpoint

eksctl utils update-cluster-endpoints \
  --cluster smallest-cluster \
  --region us-east-1 \
  --private-access=true \
  --public-access=false \
  --approve
With public access disabled, kubectl must run from inside the cluster VPC (for example from a bastion host or over VPN).

Enable Logging

eksctl utils update-cluster-logging \
  --cluster smallest-cluster \
  --region us-east-1 \
  --enable-types all \
  --approve

Update Security Groups

Restrict inbound access to the API server:
aws ec2 describe-security-groups \
  --filters "Name=tag:aws:eks:cluster-name,Values=smallest-cluster"
Update rules to allow only specific IPs.
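
For example, to replace an open HTTPS rule with one scoped to your own CIDR (the security group ID and CIDR below are placeholders):
aws ec2 revoke-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 443 --cidr 0.0.0.0/0

aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 443 --cidr 203.0.113.0/24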

Troubleshooting

GPU Nodes Not Ready

Check NVIDIA device plugin:
kubectl get pods -n kube-system | grep nvidia
kubectl describe node <gpu-node-name>

Pods Stuck in Pending

Check node capacity:
kubectl describe pod <pod-name>
kubectl get nodes -o json | jq '.items[].status.allocatable'

EBS Volumes Not Mounting

Verify EBS CSI driver:
kubectl get pods -n kube-system -l app=ebs-csi-controller
kubectl logs -n kube-system -l app=ebs-csi-controller

What’s Next?