Overview
Amazon Elastic File System (EFS) provides shared, persistent file storage for Kubernetes pods. This is ideal for storing AI models that can be shared across multiple Lightning ASR pods, eliminating duplicate downloads and reducing startup time.
Benefits of EFS
Shared Storage: Multiple pods can read and write simultaneously (ReadWriteMany)
Automatic Scaling: Storage grows and shrinks automatically
Fast Startup: Models are cached once and used by all pods
Cost Effective: Pay only for storage used, with no upfront provisioning
Prerequisites
EFS CSI Driver
Install the EFS CSI driver (see the IAM & IRSA guide), then confirm it is running:
kubectl get pods -n kube-system -l app=efs-csi-controller
VPC and Subnets
Note your EKS cluster's VPC ID and subnet IDs:
aws eks describe-cluster \
--name smallest-cluster \
--region us-east-1 \
--query 'cluster.resourcesVpcConfig.{vpcId:vpcId,subnetIds:subnetIds}'
Security Group
Note your cluster security group ID:
aws eks describe-cluster \
--name smallest-cluster \
--region us-east-1 \
--query 'cluster.resourcesVpcConfig.clusterSecurityGroupId'
Create EFS File System
Using AWS Console
Navigate to EFS
Go to AWS Console → EFS → Create file system
Configure File System
Name: smallest-models
VPC: Select your EKS cluster VPC
Availability and Durability: Regional (recommended)
Click “Customize”
File System Settings
Performance mode: General Purpose
Throughput mode: Bursting (or Elastic for production)
Encryption: Enable encryption at rest
Click “Next”
Network Access
Select all subnets where EKS nodes run
Security group: Select cluster security group
Click “Next”
Review and Create
Review settings and click "Create". Note the File system ID (e.g., fs-0123456789abcdef).
Using AWS CLI
VPC_ID=$(aws eks describe-cluster \
  --name smallest-cluster \
  --region us-east-1 \
  --query 'cluster.resourcesVpcConfig.vpcId' \
  --output text)

SG_ID=$(aws eks describe-cluster \
  --name smallest-cluster \
  --region us-east-1 \
  --query 'cluster.resourcesVpcConfig.clusterSecurityGroupId' \
  --output text)

FILE_SYSTEM_ID=$(aws efs create-file-system \
  --region us-east-1 \
  --performance-mode generalPurpose \
  --throughput-mode bursting \
  --encrypted \
  --tags Key=Name,Value=smallest-models \
  --query 'FileSystemId' \
  --output text)

echo "Created EFS: $FILE_SYSTEM_ID"

SUBNET_IDS=$(aws eks describe-cluster \
  --name smallest-cluster \
  --region us-east-1 \
  --query 'cluster.resourcesVpcConfig.subnetIds[*]' \
  --output text)

for subnet in $SUBNET_IDS; do
  aws efs create-mount-target \
    --file-system-id "$FILE_SYSTEM_ID" \
    --subnet-id "$subnet" \
    --security-groups "$SG_ID" \
    --region us-east-1
done

echo "EFS File System ID: $FILE_SYSTEM_ID"
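Mount targets take a short while to become usable. A small helper like the sketch below (a hypothetical function, not part of any official tooling) can poll until every mount target reports the "available" lifecycle state before you move on:

```shell
# Wait until all EFS mount targets are "available" (illustrative sketch;
# assumes FILE_SYSTEM_ID is set as in the steps above).
wait_for_mount_targets() {
  local fs_id="$1" region="${2:-us-east-1}" pending
  while true; do
    # Count mount targets that are not yet in the "available" state.
    pending=$(aws efs describe-mount-targets \
      --file-system-id "$fs_id" \
      --region "$region" \
      --query 'length(MountTargets[?LifeCycleState!=`available`])' \
      --output text)
    [ "$pending" = "0" ] && break
    echo "Waiting for $pending mount target(s)..."
    sleep 10
  done
  echo "All mount targets available."
}
# wait_for_mount_targets "$FILE_SYSTEM_ID"
```

Pods that mount the volume before the mount targets are available will fail with NFS timeouts, so this check is worth running in automation.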
Ensure the security group allows NFS traffic (port 2049) from cluster nodes:
SG_ID=$(aws eks describe-cluster \
  --name smallest-cluster \
  --region us-east-1 \
  --query 'cluster.resourcesVpcConfig.clusterSecurityGroupId' \
  --output text)
aws ec2 authorize-security-group-ingress \
--group-id $SG_ID \
--protocol tcp \
--port 2049 \
--source-group $SG_ID \
--region us-east-1
If the rule already exists, you’ll see an error. This is safe to ignore.
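If you prefer not to rely on ignoring the duplicate-rule error, an idempotent variant can check for an existing port 2049 rule first. This is a sketch (the `ensure_nfs_rule` helper is hypothetical) that assumes SG_ID is set as in the step above:

```shell
# Add the NFS ingress rule only when it is missing (illustrative sketch).
ensure_nfs_rule() {
  local sg_id="$1" region="${2:-us-east-1}" existing
  # Count rules on this group that already open port 2049.
  existing=$(aws ec2 describe-security-group-rules \
    --filters "Name=group-id,Values=$sg_id" \
    --region "$region" \
    --query 'length(SecurityGroupRules[?FromPort==`2049`])' \
    --output text)
  if [ "$existing" = "0" ]; then
    aws ec2 authorize-security-group-ingress \
      --group-id "$sg_id" --protocol tcp --port 2049 \
      --source-group "$sg_id" --region "$region"
  else
    echo "NFS rule already present"
  fi
}
# ensure_nfs_rule "$SG_ID"
```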
Deploy with EFS in Helm
Update your values.yaml to enable EFS:
models:
  asrModelUrl: "your-model-url-here"
  volumes:
    aws:
      efs:
        enabled: true
        fileSystemId: "fs-0123456789abcdef"
        namePrefix: "models"
Replace fs-0123456789abcdef with your actual EFS file system ID.
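If you scripted the file system creation above, you can substitute the real ID into values.yaml instead of editing it by hand. This is a sketch; `set_efs_id` is a hypothetical helper, not part of the chart:

```shell
# Replace the fileSystemId value in a Helm values file in place,
# keeping a .bak copy of the original.
set_efs_id() {
  # $1 = EFS file system ID, $2 = path to the values file
  sed -i.bak "s/^\([[:space:]]*fileSystemId:\).*/\1 \"$1\"/" "$2"
}
# set_efs_id "$FILE_SYSTEM_ID" values.yaml
```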
Deploy or Upgrade
helm upgrade --install smallest-self-host smallest-self-host/smallest-self-host \
-f values.yaml \
--namespace smallest
Verify EFS Configuration
Check Storage Class
kubectl get storageclass
Should show:
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE AGE
models-aws-efs-sc efs.csi.aws.com Delete Immediate 1m
Check Persistent Volume
kubectl get pv
Should show:
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM
models-aws-efs-pv 5Gi RWX Retain Bound smallest/models-aws-efs-pvc
Check Persistent Volume Claim
kubectl get pvc -n smallest
Should show:
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
models-aws-efs-pvc Bound models-aws-efs-pv 5Gi RWX models-aws-efs-sc 1m
Verify Mount in Pod
kubectl get pods -l app=lightning-asr -n smallest
kubectl exec -it <lightning-asr-pod> -n smallest -- df -h | grep efs
Should show the EFS mount:
fs-0123456789abcdef.efs.us-east-1.amazonaws.com:/ 8.0E 0 8.0E 0% /app/models
Test EFS
Create a test file in one pod and verify it’s visible in another:
Write test file:
kubectl exec -it <lightning-asr-pod-1> -n smallest -- sh -c "echo 'test' > /app/models/test.txt"
Read from another pod:
kubectl exec -it <lightning-asr-pod-2> -n smallest -- cat /app/models/test.txt
Should output: test
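The two-pod check above can also be scripted so pod names are discovered rather than typed by hand. A sketch, assuming the `app=lightning-asr` label shown earlier (`efs_smoke_test` is a hypothetical helper):

```shell
# Write a file from one lightning-asr pod and read it back from another.
efs_smoke_test() {
  local ns="${1:-smallest}" pods out
  pods=$(kubectl get pods -n "$ns" -l app=lightning-asr \
    -o jsonpath='{.items[*].metadata.name}')
  set -- $pods
  if [ "$#" -lt 2 ]; then echo "need at least two pods"; return 1; fi
  kubectl exec "$1" -n "$ns" -- sh -c "echo shared > /app/models/efs-test.txt"
  out=$(kubectl exec "$2" -n "$ns" -- cat /app/models/efs-test.txt)
  [ "$out" = "shared" ] && echo "EFS sharing OK"
}
# efs_smoke_test smallest
```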
How Model Caching Works
With EFS enabled:
First Pod Startup:
Pod downloads the model from asrModelUrl
Saves the model to /app/models (the EFS mount)
Takes 5-10 minutes (one-time download)
Subsequent Pod Startups:
Pod checks /app/models for an existing model
Finds the model already downloaded
Skips the download and loads from EFS
Takes 30-60 seconds
This is especially valuable when using autoscaling, as new pods start much faster.
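The startup behavior described above boils down to a check-then-download step. This is an illustrative sketch only (the chart's real entrypoint may differ, and `download_model` is a hypothetical helper):

```shell
# Download the model only if a completion marker is absent in the shared dir.
check_model_cache() {
  local dir="$1"
  if [ -f "$dir/.download-complete" ]; then
    echo "cached: loading model from EFS"
  else
    echo "downloading model..."
    # download_model "$ASR_MODEL_URL" "$dir"   # hypothetical download step
    touch "$dir/.download-complete"            # marker so later pods skip it
  fi
}
```

Because the marker lives on EFS, the first pod to finish the download makes every later pod take the fast path.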
Choose Throughput Mode
Bursting (Default)
Best for: Development, testing, variable workloads
Throughput scales with storage size
50 MB/s per TB of storage
Bursting up to 100 MB/s
Most cost-effective
Elastic
Best for: Production with unpredictable load
Automatically scales throughput
Up to 3 GB/s for reads
Up to 1 GB/s for writes
Pay for throughput used
Update via console or CLI:
aws efs update-file-system \
  --file-system-id fs-0123456789abcdef \
  --throughput-mode elastic
Provisioned
Best for: Production with consistent high throughput
Fixed throughput independent of storage size
Up to 1 GB/s throughput
Higher cost
aws efs update-file-system \
  --file-system-id fs-0123456789abcdef \
  --throughput-mode provisioned \
  --provisioned-throughput-in-mibps 100
Enable Lifecycle Management
Automatically move infrequently accessed files to lower-cost storage:
aws efs put-lifecycle-configuration \
--file-system-id fs-0123456789abcdef \
--lifecycle-policies \
'[{"TransitionToIA":"AFTER_30_DAYS"},{"TransitionToPrimaryStorageClass":"AFTER_1_ACCESS"}]'
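To confirm the policy took effect, the standard describe-lifecycle-configuration call returns the active policies (wrapped in a small hypothetical helper here):

```shell
# Show the lifecycle policies currently attached to a file system.
lifecycle_status() {
  aws efs describe-lifecycle-configuration \
    --file-system-id "$1" \
    --query 'LifecyclePolicies'
}
# lifecycle_status fs-0123456789abcdef
```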
Cost Optimization
Monitor EFS Usage
aws efs describe-file-systems \
--file-system-id fs-0123456789abcdef \
--query 'FileSystems[0].SizeInBytes'
Estimate Costs
EFS pricing (us-east-1):
Standard storage: ~$0.30/GB/month
Infrequent Access: ~$0.025/GB/month
Data transfer: Free within the same AZ
For 50 GB model:
Standard: ~$15/month
With IA (after 30 days): ~$1.25/month
Use lifecycle policies to automatically move old models to Infrequent Access storage.
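The cost figures above can be turned into a back-of-envelope estimate from the metered size. A sketch, assuming the ~$0.30/GB-month Standard rate quoted above (which is region-dependent) and treating 1 GB as 1024^3 bytes:

```shell
# Estimate monthly Standard-storage cost from a byte count.
estimate_monthly_cost() {
  # $1 = size in bytes, $2 = price per GB-month (default 0.30)
  awk -v bytes="$1" -v rate="${2:-0.30}" \
    'BEGIN { printf "%.2f\n", bytes / (1024*1024*1024) * rate }'
}
# SIZE=$(aws efs describe-file-systems --file-system-id fs-0123456789abcdef \
#   --query 'FileSystems[0].SizeInBytes.Value' --output text)
# estimate_monthly_cost "$SIZE"
```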
Backup and Recovery
Enable AWS Backup
aws backup create-backup-plan \
--backup-plan '{
"BackupPlanName": "smallest-efs-backup",
"Rules": [{
"RuleName": "daily-backup",
"TargetBackupVaultName": "Default",
"ScheduleExpression": "cron(0 2 * * ? *)",
"Lifecycle": {
"DeleteAfterDays": 30
}
}]
}'
Manual Backup
When automatic backups are enabled, EFS creates point-in-time backups. Access them via AWS Console → EFS → Backups.
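You can also trigger an on-demand backup from the CLI. This is a sketch: the vault name and IAM role below are the AWS Backup defaults, so substitute your own if they differ:

```shell
# Start an on-demand AWS Backup job for an EFS file system.
start_efs_backup() {
  local fs_id="$1" region="${2:-us-east-1}" account_id
  account_id=$(aws sts get-caller-identity --query Account --output text)
  aws backup start-backup-job \
    --backup-vault-name Default \
    --resource-arn "arn:aws:elasticfilesystem:${region}:${account_id}:file-system/${fs_id}" \
    --iam-role-arn "arn:aws:iam::${account_id}:role/service-role/AWSBackupDefaultServiceRole" \
    --region "$region"
}
# start_efs_backup "$FILE_SYSTEM_ID"
```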
Troubleshooting
Mount Failed
Check EFS CSI driver:
kubectl get pods -n kube-system -l app=efs-csi-controller
kubectl logs -n kube-system -l app=efs-csi-controller
Verify security group rules:
aws ec2 describe-security-groups --group-ids $SG_ID
Ensure port 2049 is open.
Slow Performance
Check throughput mode:
aws efs describe-file-systems \
--file-system-id fs-0123456789abcdef \
--query 'FileSystems[0].ThroughputMode'
Consider upgrading to Elastic or Provisioned.
Monitor CloudWatch metrics:
PermittedThroughput
BurstCreditBalance
ClientConnections
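The metrics above are published under the AWS/EFS namespace. As a sketch (wrapped in a hypothetical helper), recent BurstCreditBalance samples can be pulled like this; a balance trending toward zero on Bursting mode is a signal to switch to Elastic:

```shell
# Fetch the last hour of BurstCreditBalance averages for a file system.
burst_credits() {
  local fs_id="$1" region="${2:-us-east-1}"
  aws cloudwatch get-metric-statistics \
    --namespace AWS/EFS \
    --metric-name BurstCreditBalance \
    --dimensions Name=FileSystemId,Value="$fs_id" \
    --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
    --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    --period 300 \
    --statistics Average \
    --region "$region"
}
# burst_credits fs-0123456789abcdef
```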
Permission Denied
Check mount options in PV:
kubectl get pv models-aws-efs-pv -o yaml
The spec should reference your EFS file system ID and use access mode ReadWriteMany.
Alternative: EBS for Single Pod
If you don’t need shared storage (single replica only):
models:
  volumes:
    aws:
      efs:
        enabled: false

scaling:
  replicas:
    lightningAsr: 1

lightningAsr:
  persistence:
    enabled: true
    storageClass: gp3
    size: 100Gi
EBS volumes can only be attached to one pod at a time. This prevents horizontal scaling.
What’s Next?