If you’ve been wrestling with slow node scaling and surprise AWS bills from Cluster Autoscaler, you’re not alone. Autoscaling on Amazon Elastic Kubernetes Service (EKS) can feel sluggish when your workloads spike, and those extra nodes add up fast on your monthly invoice.
Karpenter v0.32+, an open source Kubernetes autoscaler, changes the game entirely. Instead of waiting around for nodes to spin up or managing complex autoscaling groups, you get lightning-fast provisioning that actually thinks about cost. This guide walks you through ditching Cluster Autoscaler for Karpenter’s newer NodePool and EC2NodeClass setup, and shows you how to save money while you’re at it.
Important warning: Please don’t try this on production first. Test everything in a safe environment where breaking things won’t wake you up in a cold sweat at 3 a.m.
Why Bother With Karpenter v0.32+?
The latest Karpenter architecture fixes a lot of the headaches that come with traditional autoscaling:
- Things actually make sense now: NodePool handles when and how to scale, while EC2NodeClass deals with the AWS nitty-gritty. This means no more tangled configurations that nobody understands six months later.
- Write once, use everywhere: Create an EC2NodeClass template and share it across multiple NodePools. Your future self will thank you when you’re not copy-pasting configs.
- Your wallet will notice: Karpenter spins up exactly what you need, when you need it, and isn’t shy about using spot instances to slash costs. Plus, scaling happens in seconds.
Prerequisites
Before starting, ensure you have:
- An Amazon EKS cluster running version 1.24 or later.
- AWS CLI configured with sufficient permissions (for instance, eks:DescribeCluster and ec2:*).
- kubectl installed for cluster management.
- Helm (optional) for installing Karpenter.
- Karpenter v0.32.0 or later available via Helm or source.
- Identity and access management (IAM) roles for Karpenter (created in Step 1).
Tip: Verify your EKS cluster version with `aws eks describe-cluster --name your-cluster-name --query "cluster.version" --output text`.
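The checks in this list can be scripted as a small preflight. The following is a minimal sketch, not an official check: the function names `version_at_least` and `missing_tools` are made up for illustration, and the 1.24 minimum simply mirrors the prerequisite above.

```shell
# Preflight helpers (hypothetical names, plain POSIX sh).

# version_at_least CURRENT MINIMUM
# Succeeds when CURRENT ("major.minor[.patch]") is >= MINIMUM,
# comparing only the major and minor parts.
version_at_least() {
  cur_major=${1%%.*}; cur_rest=${1#*.}; cur_minor=${cur_rest%%.*}
  min_major=${2%%.*}; min_rest=${2#*.}; min_minor=${min_rest%%.*}
  if [ "$cur_major" -ne "$min_major" ]; then
    [ "$cur_major" -gt "$min_major" ]
  else
    [ "$cur_minor" -ge "$min_minor" ]
  fi
}

# missing_tools TOOL...
# Prints every named tool that is not found on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}
```

For example, `missing_tools aws kubectl helm eksctl` prints nothing when all four tools are installed, and `version_at_least "$(aws eks describe-cluster --name your-cluster-name --query "cluster.version" --output text)" 1.24` succeeds only when the cluster meets the minimum version.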
Step 1: Install Karpenter
Set up Karpenter in your EKS cluster:
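1. Create IAM service account: Grant Karpenter permissions to manage EC2 instances. Attach a customer-managed controller policy that allows actions such as ec2:RunInstances and ec2:TerminateInstances; the AWS-managed AmazonEKSClusterPolicy is intended for the EKS cluster role rather than for the Karpenter controller. The policy name below assumes the KarpenterControllerPolicy created by Karpenter’s getting-started CloudFormation template, so substitute your own account ID and policy name.
eksctl create iamserviceaccount \
  --cluster=your-cluster-name \
  --namespace=karpenter \
  --name=karpenter \
  --attach-policy-arn=arn:aws:iam::YOUR_ACCOUNT_ID:policy/KarpenterControllerPolicy-your-cluster-name \
  --approve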
2. Verify: Check the IAM role in the AWS Console under IAM > Roles.
3. Install Karpenter via Helm: Deploy Karpenter with your cluster settings.
# Karpenter charts newer than v0.16 are published as OCI artifacts, so no `helm repo add` step is needed.
# serviceAccount.create=false reuses the service account created by eksctl above.
helm install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --namespace karpenter \
  --create-namespace \
  --version v0.32.0 \
  --set serviceAccount.create=false \
  --set serviceAccount.name=karpenter \
  --set settings.clusterName=your-cluster-name \
  --set settings.clusterEndpoint=$(aws eks describe-cluster --name your-cluster-name --query "cluster.endpoint" --output text)
Tip: Replace `your-cluster-name` with your EKS cluster name. Pin the Helm chart version (v0.32.0) for reproducibility.
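Before moving on, it helps to confirm that the controller actually started and that the new resource types exist. This is a rough sketch under a few assumptions: the Helm release is named `karpenter` in the `karpenter` namespace (so the Deployment is also called `karpenter`), the chart installed the `nodepools.karpenter.sh` and `ec2nodeclasses.karpenter.k8s.aws` CRDs, and `karpenter_installed` is a hypothetical helper name.

```shell
# karpenter_installed [NAMESPACE]
# Hypothetical helper: succeeds when the controller Deployment has rolled out
# and both v1beta1 CRDs are registered. Assumes the Deployment is "karpenter".
karpenter_installed() {
  ns=${1:-karpenter}
  kubectl rollout status deployment/karpenter -n "$ns" --timeout=120s || return 1
  for crd in nodepools.karpenter.sh ec2nodeclasses.karpenter.k8s.aws; do
    kubectl get crd "$crd" >/dev/null 2>&1 || {
      echo "missing CRD: $crd" >&2
      return 1
    }
  done
}
```

Running `karpenter_installed && echo "Karpenter is ready"` gives a quick yes/no answer before you apply the manifests in the next steps.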
Step 2: Configure EC2NodeClass
Create an EC2NodeClass to define AWS infrastructure settings:
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2 # Use Amazon Linux 2 for node AMIs
  role: "KarpenterNodeRole-${CLUSTER_NAME}" # IAM role for nodes (replace CLUSTER_NAME)
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}" # Select subnets tagged for the cluster
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}" # Select security groups tagged for the cluster
  tags:
    Environment: production # Custom tags for EC2 instances
  # Bootstrap script to join nodes to the EKS cluster (MIME multipart format)
  userData: |
    MIME-Version: 1.0
    Content-Type: multipart/mixed; boundary="BOUNDARY"

    --BOUNDARY
    Content-Type: text/x-shellscript; charset="us-ascii"

    #!/bin/bash
    /etc/eks/bootstrap.sh ${CLUSTER_NAME}
    --BOUNDARY--
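Note: Replace `${CLUSTER_NAME}` with your EKS cluster name before applying the manifest, because kubectl does not expand shell variables. The userData runs at boot; with the AL2 AMI family, Karpenter also merges in the bootstrap configuration it generates itself so that nodes can join the cluster.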
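The selector terms above only match subnets and security groups that already carry the `karpenter.sh/discovery` tag. If yours are not tagged yet, a sketch like the following can add it. The helper name `tag_for_discovery` and the resource IDs in the usage line are placeholders, not values from your account.

```shell
# tag_for_discovery CLUSTER_NAME RESOURCE_ID...
# Hypothetical helper: adds the karpenter.sh/discovery tag to the given
# subnets and security groups so the EC2NodeClass selectors can find them.
tag_for_discovery() {
  cluster=$1
  shift
  if [ "$#" -eq 0 ]; then
    echo "usage: tag_for_discovery CLUSTER_NAME RESOURCE_ID..." >&2
    return 2
  fi
  aws ec2 create-tags \
    --resources "$@" \
    --tags "Key=karpenter.sh/discovery,Value=$cluster"
}
```

A call would look like `tag_for_discovery your-cluster-name subnet-0123456789abcdef0 sg-0123456789abcdef0`, using the IDs of the subnets and security groups your nodes should use.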
Step 3: Configure NodePool
Define a NodePool to control scheduling and life cycle policies:
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        name: default # Reference the EC2NodeClass from Step 2
      requirements:
        - key: "karpenter.k8s.aws/instance-category"
          operator: In
          values: ["c", "m", "r"] # Compute-optimized, general-purpose, or memory-optimized instances
        - key: "karpenter.k8s.aws/instance-generation"
          operator: Gt
          values: ["5"] # Use instances newer than 5th generation
  limits:
    cpu: 1000 # Maximum CPU allocation
    memory: 1000Gi # Maximum memory allocation
  disruption:
    consolidationPolicy: WhenEmpty # Consolidate nodes when empty to save costs
    consolidateAfter: 30s # Wait 30 seconds before consolidating
Tip: Adjust limits and requirements based on your workload needs to optimize resource usage.
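To get a feel for what the limits mean in practice, you can estimate how many nodes of one size fit under them. This is a back-of-the-envelope sketch only: Karpenter mixes instance sizes and may slightly overshoot a limit, and `max_nodes_within_limits` is a hypothetical helper.

```shell
# max_nodes_within_limits CPU_LIMIT MEM_LIMIT_GI NODE_VCPU NODE_MEM_GI
# Hypothetical helper: prints how many identical nodes fit under both the
# CPU limit and the memory limit (integer division, whichever is smaller).
max_nodes_within_limits() {
  by_cpu=$(( $1 / $3 ))
  by_mem=$(( $2 / $4 ))
  if [ "$by_cpu" -lt "$by_mem" ]; then
    echo "$by_cpu"
  else
    echo "$by_mem"
  fi
}
```

With the limits above and a 16 vCPU / 64 GiB instance (for example, an m6i.4xlarge), `max_nodes_within_limits 1000 1000 16 64` prints 15: the memory limit is reached long before the CPU limit, so those two numbers are worth tuning together.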
Step 4: Gradually Migrate Workloads
Transition workloads to Karpenter-managed nodes:
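1. Label existing nodes: Mark Cluster Autoscaler nodes to distinguish them. Run this before Karpenter has provisioned any nodes, because `--all` labels every node currently in the cluster.
`kubectl label nodes --all cluster-autoscaler=previous`
Purpose: This label marks the existing nodes so the affinity rule in the next step can keep new workloads off them and on Karpenter nodes.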
2. Add node affinity: Update deployments to schedule on Karpenter nodes.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-app
spec:
  selector:
    matchLabels:
      app: sample-app # Example label; must match the pod template labels
  template:
    metadata:
      labels:
        app: sample-app
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: cluster-autoscaler
                    operator: DoesNotExist # Schedule only on nodes without this label
      containers:
        - name: app
          image: nginx
Note: Apply this affinity rule to all deployments to ensure they use Karpenter nodes.
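For Deployments that already exist, editing every manifest by hand gets tedious. One possible shortcut is a JSON merge patch, sketched below. The helper names `affinity_patch` and `add_karpenter_affinity` are invented for this example, and a merge patch replaces any `nodeSelectorTerms` a Deployment already defines, so check for existing affinity rules first.

```shell
# affinity_patch
# Hypothetical helper: prints a JSON merge patch that adds the node affinity
# rule from this step (avoid nodes carrying the cluster-autoscaler label).
affinity_patch() {
  cat <<'EOF'
{"spec":{"template":{"spec":{"affinity":{"nodeAffinity":{
  "requiredDuringSchedulingIgnoredDuringExecution":{"nodeSelectorTerms":[
    {"matchExpressions":[{"key":"cluster-autoscaler","operator":"DoesNotExist"}]}
  ]}
}}}}}}
EOF
}

# add_karpenter_affinity DEPLOYMENT [NAMESPACE]
# Applies the patch; changing the pod template triggers a rolling update.
add_karpenter_affinity() {
  kubectl patch deployment "$1" -n "${2:-default}" --type merge -p "$(affinity_patch)"
}
```

Calling `add_karpenter_affinity sample-app` patches one Deployment in the default namespace; migrating one workload at a time keeps the rollout gradual.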
Step 5: Scale Down Cluster Autoscaler
Once workloads are running on Karpenter nodes, disable Cluster Autoscaler:
1. Reduce Auto Scaling Group (ASG) minimum size:
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name your-asg-name \
  --min-size 0
Caution: Confirm all workloads are on Karpenter nodes before proceeding.
2. Remove Cluster Autoscaler:
`kubectl delete deployment cluster-autoscaler -n kube-system`
Tip: Monitor workloads to ensure they’re unaffected after scaling down.
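The caution in this step is easier to follow with a concrete check. The sketch below lists pods that still run on nodes labeled `cluster-autoscaler=previous` (the label from Step 4), ignoring pods owned by DaemonSets since those run on every node. `pods_on_legacy_nodes` and `safe_to_scale_down` are hypothetical helper names.

```shell
# pods_on_legacy_nodes
# Hypothetical helper: prints "namespace name owner-kind" for each pod on a
# node labeled cluster-autoscaler=previous, skipping DaemonSet-owned pods.
pods_on_legacy_nodes() {
  for node in $(kubectl get nodes -l cluster-autoscaler=previous -o name); do
    kubectl get pods --all-namespaces \
      --field-selector "spec.nodeName=${node#node/}" \
      --no-headers \
      -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,OWNER:.metadata.ownerReferences[0].kind'
  done | awk '$3 != "DaemonSet"'
}

# safe_to_scale_down
# Succeeds only when no such pods remain on the legacy nodes.
safe_to_scale_down() {
  [ -z "$(pods_on_legacy_nodes)" ]
}
```

Something like `safe_to_scale_down || pods_on_legacy_nodes` shows exactly which workloads still need to move before you touch the ASG.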
Step 6: Verify Migration
Confirm Karpenter is working correctly:
1. Check Karpenter Logs: Look for provisioning events or errors.
`kubectl logs -f -n karpenter -l app.kubernetes.io/name=karpenter`
Expected: Logs should show node provisioning events and no errors.
2. Monitor Node Provisioning: Watch for new nodes joining the cluster.
`kubectl get nodes -w`
Expected: New nodes should appear with Karpenter-managed labels (such as `karpenter.sh/nodepool`).
3. Check NodePool Status: `kubectl get nodepools`
Expected: The default NodePool should be listed, and show as Ready on releases that report readiness.
4. Check EC2NodeClass Status: `kubectl get ec2nodeclasses`
Expected: The default EC2NodeClass should be listed, and show as Ready on releases that report readiness.
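For a quick summary of what Karpenter has launched, you can count nodes per NodePool. This is a small sketch that assumes Karpenter-managed nodes carry the `karpenter.sh/nodepool` label applied by the v1beta1 APIs; `nodes_per_nodepool` is a hypothetical helper name.

```shell
# nodes_per_nodepool
# Hypothetical helper: prints "<nodepool> <node count>" for every NodePool
# that currently has nodes, based on the karpenter.sh/nodepool label.
nodes_per_nodepool() {
  kubectl get nodes -l karpenter.sh/nodepool --no-headers \
    -L karpenter.sh/nodepool |
    awk '{ count[$NF]++ } END { for (pool in count) print pool, count[pool] }' |
    sort
}
```

An empty result after workloads have been scheduled suggests that Karpenter has not provisioned anything yet, which is a cue to go back to the controller logs.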
Advanced Configuration Examples
Tailor Karpenter for specific workloads:
1. Compute-optimized NodePool: Ideal for CPU-intensive workloads like machine learning (ML).
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: compute-optimized
spec:
  template:
    spec:
      nodeClassRef:
        name: default
      requirements:
        - key: "karpenter.k8s.aws/instance-category"
          operator: In
          values: ["c"] # Compute-optimized instances
        - key: "karpenter.k8s.aws/instance-generation"
          operator: Gt
          values: ["5"] # Newer than 5th generation
2. Spot instance NodePool: Use for cost-sensitive, fault-tolerant workloads like batch processing.
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: spot
spec:
  template:
    spec:
      nodeClassRef:
        name: default
      requirements:
        - key: "karpenter.k8s.aws/instance-category"
          operator: In
          values: ["c", "m", "r"]
        - key: "karpenter.k8s.aws/instance-generation"
          operator: Gt
          values: ["5"]
        - key: "karpenter.sh/capacity-type"
          operator: In
          values: ["spot"] # Use Spot instances
Best Practices
- Use multiple NodePools: Create separate NodePools for different workloads (compute-optimized for ML, spot for batch jobs).
- Standardize EC2NodeClass: Reuse a single EC2NodeClass for consistent subnets and security groups.
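- Taint and use node affinity: Apply taints to control workload placement. Declaring taints under `spec.template.spec.taints` in the NodePool taints every node it launches, whereas a manual command such as `kubectl taint nodes -l karpenter=spot node-role=spot:NoSchedule` only affects nodes that already exist.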
- Set disruption budgets: Use PodDisruptionBudgets to ensure high availability during node consolidation.
- Monitor costs and resources: Track instance costs with AWS Cost Explorer and cluster metrics with Prometheus.
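As an illustration of the disruption budget advice, the sketch below wraps `kubectl create poddisruptionbudget`. It assumes your pods are labeled `app=<name>`, which is a convention chosen for this example rather than a requirement, and `create_pdb` is a hypothetical helper name.

```shell
# create_pdb APP_NAME [MIN_AVAILABLE]
# Hypothetical helper: creates a PodDisruptionBudget named "<app>-pdb" that
# keeps at least MIN_AVAILABLE pods (default 1) with the label app=<APP_NAME>
# running during voluntary disruptions such as node consolidation.
create_pdb() {
  app=$1
  min_available=${2:-1}
  kubectl create poddisruptionbudget "${app}-pdb" \
    --selector="app=${app}" \
    --min-available="$min_available"
}
```

For instance, `create_pdb sample-app 2` asks Kubernetes to keep two `sample-app` pods available while nodes are drained. A budget only helps when the Deployment runs more replicas than the minimum, otherwise drains will simply block.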
Troubleshooting
Common issues and fixes:
- Karpenter fails to provision nodes:
  - Cause: Missing IAM permissions.
  - Fix: Ensure the Karpenter IAM role has `ec2:RunInstances` and `ec2:TerminateInstances` permissions.
- Nodes not joining cluster:
  - Cause: Incorrect userData or cluster name in EC2NodeClass.
  - Fix: Verify `${CLUSTER_NAME}` matches your EKS cluster name.
- Workloads not scheduling:
  - Cause: Misconfigured node affinity or taints.
  - Fix: Check node labels and affinity rules with `kubectl describe nodes`.
- Monitor metrics: Use Prometheus or CloudWatch to track `karpenter_nodes_created` and other metrics that Karpenter exposes.
- Check logs: Run `kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter` for detailed errors.
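Controller logs can be noisy, so a simple filter helps when hunting for failures. The exact log layout depends on the chart's logging settings, so this sketch just matches the word "error" in any letter case; `karpenter_errors` is a hypothetical helper name.

```shell
# karpenter_errors
# Hypothetical helper: reads log lines on stdin and prints only those that
# mention "error" (case-insensitive). Prints nothing when there are none.
karpenter_errors() {
  grep -i 'error' || true
}
```

A typical use would be `kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter --tail=500 | karpenter_errors`. The `--tail` flag matters here because kubectl returns only the last few lines per pod by default when a label selector is used.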
Conclusion
Migrating to Karpenter v0.32+ transforms EKS node management with faster scaling, cost savings and simplified operations.
By following this guide, you can leverage NodePool and EC2NodeClass for flexible, efficient autoscaling. Test the migration in a nonproduction environment first, and join the Karpenter Slack community to share feedback or get help!