Overview

Amazon Elastic File System (EFS) provides shared, persistent file storage for Kubernetes pods. This is ideal for storing AI models that can be shared across multiple Lightning ASR pods, eliminating duplicate downloads and reducing startup time.

Benefits of EFS

Shared Storage

Multiple pods can read/write simultaneously (ReadWriteMany)

Automatic Scaling

Storage grows and shrinks automatically

Fast Startup

Models cached once, used by all pods

Cost Effective

Pay only for storage used, no upfront provisioning

Prerequisites

1. EFS CSI Driver

Install the EFS CSI driver (see IAM & IRSA guide)
kubectl get pods -n kube-system -l app=efs-csi-controller
2. VPC and Subnets

Note your EKS cluster’s VPC ID and subnet IDs:
aws eks describe-cluster \
  --name smallest-cluster \
  --region us-east-1 \
  --query 'cluster.resourcesVpcConfig.{vpcId:vpcId,subnetIds:subnetIds}'
3. Security Group

Note your cluster security group ID:
aws eks describe-cluster \
  --name smallest-cluster \
  --region us-east-1 \
  --query 'cluster.resourcesVpcConfig.clusterSecurityGroupId'

Create EFS File System

Using AWS Console

1. Navigate to EFS

Go to AWS Console → EFS → Create file system
2. Configure File System

  • Name: smallest-models
  • VPC: Select your EKS cluster VPC
  • Availability and Durability: Regional (recommended)
  • Click “Customize”
3. File System Settings

  • Performance mode: General Purpose
  • Throughput mode: Bursting (or Elastic for production)
  • Encryption: Enable encryption at rest
  • Click “Next”
4. Network Access

  • Select all subnets where EKS nodes run
  • Security group: Select cluster security group
  • Click “Next”
5. Review and Create

Review settings and click “Create”. Note the File system ID (e.g., fs-0123456789abcdef); you will need it below.

Using AWS CLI

VPC_ID=$(aws eks describe-cluster \
  --name smallest-cluster \
  --region us-east-1 \
  --query 'cluster.resourcesVpcConfig.vpcId' \
  --output text)

SG_ID=$(aws eks describe-cluster \
  --name smallest-cluster \
  --region us-east-1 \
  --query 'cluster.resourcesVpcConfig.clusterSecurityGroupId' \
  --output text)

FILE_SYSTEM_ID=$(aws efs create-file-system \
  --region us-east-1 \
  --performance-mode generalPurpose \
  --throughput-mode bursting \
  --encrypted \
  --tags Key=Name,Value=smallest-models \
  --query 'FileSystemId' \
  --output text)

echo "Created EFS: $FILE_SYSTEM_ID"

SUBNET_IDS=$(aws eks describe-cluster \
  --name smallest-cluster \
  --region us-east-1 \
  --query 'cluster.resourcesVpcConfig.subnetIds[*]' \
  --output text)

for subnet in $SUBNET_IDS; do
  aws efs create-mount-target \
    --file-system-id $FILE_SYSTEM_ID \
    --subnet-id $subnet \
    --security-groups $SG_ID \
    --region us-east-1
done

echo "EFS File System ID: $FILE_SYSTEM_ID"

Configure Security Group

Ensure the security group allows NFS traffic (port 2049) from cluster nodes:
SG_ID=$(aws eks describe-cluster \
  --name smallest-cluster \
  --region us-east-1 \
  --query 'cluster.resourcesVpcConfig.clusterSecurityGroupId' \
  --output text)

aws ec2 authorize-security-group-ingress \
  --group-id $SG_ID \
  --protocol tcp \
  --port 2049 \
  --source-group $SG_ID \
  --region us-east-1
If the rule already exists, the command fails with an InvalidPermission.Duplicate error; this is safe to ignore.

Deploy with EFS in Helm

Update your values.yaml to enable EFS:
values.yaml
models:
  asrModelUrl: "your-model-url-here"
  volumes:
    aws:
      efs:
        enabled: true
        fileSystemId: "fs-0123456789abcdef"
        namePrefix: "models"
Replace fs-0123456789abcdef with your actual EFS file system ID.

Deploy or Upgrade

helm upgrade --install smallest-self-host smallest-self-host/smallest-self-host \
  -f values.yaml \
  --namespace smallest

Verify EFS Configuration

Check Storage Class

kubectl get storageclass
Should show:
NAME                        PROVISIONER        RECLAIMPOLICY   VOLUMEBINDINGMODE   AGE
models-aws-efs-sc           efs.csi.aws.com    Delete          Immediate           1m

Check Persistent Volume

kubectl get pv
Should show:
NAME                   CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM
models-aws-efs-pv      5Gi        RWX            Retain           Bound    smallest/models-aws-efs-pvc

Check Persistent Volume Claim

kubectl get pvc -n smallest
Should show:
NAME                  STATUS   VOLUME              CAPACITY   ACCESS MODES   STORAGECLASS        AGE
models-aws-efs-pvc    Bound    models-aws-efs-pv   5Gi        RWX            models-aws-efs-sc   1m
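
Under the hood, the chart presumably renders a statically provisioned PV/PVC pair bound to the EFS file system. A simplified sketch of equivalent manifests follows — the names match the output above, but the actual chart templates may differ:

```yaml
# Illustrative manifests only; the chart's real templates may differ.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: models-aws-efs-pv
spec:
  capacity:
    storage: 5Gi               # nominal; EFS does not enforce capacity
  accessModes:
    - ReadWriteMany            # shared across all Lightning ASR pods
  persistentVolumeReclaimPolicy: Retain
  storageClassName: models-aws-efs-sc
  mountOptions:
    - tls                      # encrypt NFS traffic in transit
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-0123456789abcdef   # your EFS file system ID
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: models-aws-efs-pvc
  namespace: smallest
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: models-aws-efs-sc
  resources:
    requests:
      storage: 5Gi
```

Because the PV is statically provisioned against a fixed volumeHandle, the 5Gi request is only a placeholder for the binder; actual usage is metered by EFS.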

Verify Mount in Pod

kubectl get pods -l app=lightning-asr -n smallest
kubectl exec -it <lightning-asr-pod> -n smallest -- df -h | grep efs
Should show the EFS mount:
fs-0123456789abcdef.efs.us-east-1.amazonaws.com:/  8.0E   0  8.0E   0% /app/models

Test EFS

Create a test file in one pod and verify it’s visible in another:

Write test file:

kubectl exec -it <lightning-asr-pod-1> -n smallest -- sh -c "echo 'test' > /app/models/test.txt"

Read from another pod:

kubectl exec -it <lightning-asr-pod-2> -n smallest -- cat /app/models/test.txt
Should output: test

How Model Caching Works

With EFS enabled:
  1. First Pod Startup:
    • Pod downloads model from asrModelUrl
    • Saves model to /app/models (EFS mount)
    • Takes 5-10 minutes (one-time download)
  2. Subsequent Pod Startups:
    • Pod checks /app/models for existing model
    • Finds model already downloaded
    • Skips download, loads from EFS
    • Takes 30-60 seconds
This is especially valuable when using autoscaling, as new pods start much faster.
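
The cache check above can be sketched as a small shell function. This is illustrative only — the marker file, function name, and download command are assumptions, not the actual Lightning ASR entrypoint logic:

```shell
#!/bin/sh
# Illustrative sketch of the startup cache check; the marker file and
# download step are assumptions, not the real image entrypoint.
MODEL_DIR="${MODEL_DIR:-/app/models}"

fetch_model() {
  if [ -f "$MODEL_DIR/.download_complete" ]; then
    # Second and later pods land here: model already on the EFS mount.
    echo "cache hit: loading model from $MODEL_DIR"
  else
    # First pod lands here: download once, then mark the cache complete.
    echo "cache miss: downloading model to $MODEL_DIR"
    mkdir -p "$MODEL_DIR"
    # curl -fL "$ASR_MODEL_URL" | tar -xz -C "$MODEL_DIR"   # real download step
    touch "$MODEL_DIR/.download_complete"
  fi
}
```

The marker-file pattern matters on shared storage: checking for a completed-download sentinel (rather than a non-empty directory) avoids a second pod loading a partially written model.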

Performance Tuning

Choose Throughput Mode

Bursting (the default) is best for development, testing, and variable workloads:
  • Baseline throughput scales with storage size (~50 MB/s per TB stored)
  • Can burst to 100 MB/s for short periods
  • Most cost-effective for intermittent access
For sustained production traffic, consider Elastic or Provisioned throughput instead.

Enable Lifecycle Management

Automatically move infrequently accessed files to lower-cost storage:
aws efs put-lifecycle-configuration \
  --file-system-id fs-0123456789abcdef \
  --lifecycle-policies \
    '[{"TransitionToIA":"AFTER_30_DAYS"},{"TransitionToPrimaryStorageClass":"AFTER_1_ACCESS"}]'

Cost Optimization

Monitor EFS Usage

aws efs describe-file-systems \
  --file-system-id fs-0123456789abcdef \
  --query 'FileSystems[0].SizeInBytes'

Estimate Costs

EFS pricing (us-east-1):
  • Standard storage: ~$0.30/GB/month
  • Infrequent Access: ~$0.025/GB/month
  • Data transfer: Free within same AZ
For 50 GB model:
  • Standard: ~$15/month
  • With IA (after 30 days): ~$1.25/month
Use lifecycle policies to automatically move old models to Infrequent Access storage.
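
The estimate above is straightforward to reproduce; a quick awk one-liner using the approximate list prices quoted above:

```shell
# Rough monthly cost for a 50 GB model at approximate us-east-1 list prices.
awk -v gb=50 'BEGIN {
  printf "Standard: $%.2f/month\n", gb * 0.30    # ~$0.30/GB-month
  printf "IA:       $%.2f/month\n", gb * 0.025   # ~$0.025/GB-month
}'
```

For 50 GB this prints $15.00/month for Standard and $1.25/month for Infrequent Access; adjust `gb` and the rates for your region's current pricing.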

Backup and Recovery

Enable AWS Backup

aws backup create-backup-plan \
  --backup-plan '{
    "BackupPlanName": "smallest-efs-backup",
    "Rules": [{
      "RuleName": "daily-backup",
      "TargetBackupVaultName": "Default",
      "ScheduleExpression": "cron(0 2 * * ? *)",
      "Lifecycle": {
        "DeleteAfterDays": 30
      }
    }]
  }'
A backup plan alone does not protect anything; you must also create a backup selection (aws backup create-backup-selection) that targets your EFS file system's ARN.

Automatic Backups

File systems created through the console have automatic daily backups enabled by default (CLI-created file systems do not). View existing backups via AWS Console → EFS → your file system → Backups.

Troubleshooting

Mount Failed

Check EFS CSI driver:
kubectl get pods -n kube-system -l app=efs-csi-controller
kubectl logs -n kube-system -l app=efs-csi-controller
Verify security group rules:
aws ec2 describe-security-groups --group-ids $SG_ID
Ensure port 2049 is open.

Slow Performance

Check throughput mode:
aws efs describe-file-systems \
  --file-system-id fs-0123456789abcdef \
  --query 'FileSystems[0].ThroughputMode'
Consider upgrading to Elastic or Provisioned. Monitor CloudWatch metrics:
  • PermittedThroughput
  • BurstCreditBalance
  • ClientConnections

Permission Denied

Check mount options in PV:
kubectl get pv models-aws-efs-pv -o yaml
Should include:
mountOptions:
  - tls

Alternative: EBS for Single Pod

If you don’t need shared storage (single replica only):
values.yaml
models:
  volumes:
    aws:
      efs:
        enabled: false

scaling:
  replicas:
    lightningAsr: 1

lightningAsr:
  persistence:
    enabled: true
    storageClass: gp3
    size: 100Gi
EBS volumes are ReadWriteOnce: they can only be mounted by pods on a single node at a time, so this configuration prevents horizontal scaling.
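
The gp3 StorageClass referenced above is usually already present on EKS clusters running the EBS CSI driver. If yours lacks one, a minimal definition looks roughly like this (the parameters shown are common defaults, not chart requirements):

```yaml
# Minimal gp3 StorageClass sketch for the EBS CSI driver.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"
volumeBindingMode: WaitForFirstConsumer   # delay binding until the pod is scheduled
```

WaitForFirstConsumer matters for EBS: it ensures the volume is created in the same availability zone as the node the pod lands on.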

What’s Next?