Autoscaling is the process of automatically scaling resources in and out. Kubernetes has three types of autoscalers: the cluster autoscaler, the horizontal pod autoscaler, and the vertical pod autoscaler. This article covers the Horizontal Pod Autoscaler.
A running workload can be scaled manually by changing the replicas field in its manifest file. Manual scaling is fine when you can anticipate load spikes in advance or when the load changes gradually over long periods, but requiring manual intervention to handle sudden, unpredictable traffic increases isn't ideal.
To solve this problem, Kubernetes has a resource called the Horizontal Pod Autoscaler, which monitors pods and scales them automatically when it detects an increase in CPU or memory usage (based on a defined metric). Horizontal pod autoscaling is the process of automatically adjusting the number of pod replicas managed by a controller, based on the usage of the defined metric, so that capacity matches demand.
A HorizontalPodAutoscaler (HPA) in Kubernetes is a tool that automatically adjusts the number of pod replicas in a deployment, replica set, or stateful set based on observed CPU utilization (or other select metrics). Here's a simple breakdown of how it works:
These steps are necessary to use Autoscaling features. By following the below steps, we can start the cluster and deploy the application into the Minikube cluster.
Step 1: Deploy the minikube cluster.
Step 2: Start your cluster.
$ minikube start
Step 3: Enable the metrics-server addon to collect resource metrics.
$ minikube addons enable metrics-server
Step 4: Edit the metrics-server deployment and add the --kubelet-insecure-tls argument.
$ kubectl -n kube-system edit deploy metrics-server
containers:
- args:
  - --cert-dir=/tmp
  - --secure-port=8448
  - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
  - --kubelet-insecure-tls
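If you would rather not open an editor, the same argument can probably be appended with a JSON patch. The sketch below is one way to do it, not the only one; the function name patch_metrics_server is hypothetical, and it assumes the metrics-server container is the first container in the pod template.

```shell
# Hypothetical helper: append --kubelet-insecure-tls to the args of the
# first container in the metrics-server Deployment (assumed to be the
# metrics-server container itself).
patch_metrics_server() {
  kubectl -n kube-system patch deployment metrics-server --type=json \
    -p '[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"}]'
}

# Usage (needs a running cluster):
#   patch_metrics_server
#   kubectl -n kube-system rollout status deployment/metrics-server
```

Note that running the helper twice would append the flag twice, so check the current args first if you are unsure.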
Step 5: Let's create a deployment for our demo purposes. I chose Nginx as our application with 1 replica. This deployment requests 100 millicores of CPU per pod.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webserver
  labels:
    app: backend
spec:
  replicas: 1
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend
    spec:
      containers:
      - name: nginx
        image: nginx:1.23-alpine
        imagePullPolicy: IfNotPresent
        resources:
          limits:
            cpu: 200m
            memory: 20Mi
          requests:
            cpu: 100m
            memory: 10Mi
$ kubectl create -f nginx-deploy.yaml
One of the most important metrics for autoscaling is CPU usage. Say the CPU usage of the processes running inside your pod reaches 100%; they can no longer keep up with demand. To solve this, you can either increase the amount of CPU a pod can use (vertical scaling) or increase the number of pods (horizontal scaling) so that the average CPU usage comes down. Enough talking: let's create a Horizontal Pod Autoscaler resource based on CPU usage and see it in action.
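Before creating the resource, it may help to see what "CPU utilization" means here: the HPA measures usage relative to the pod's CPU request, not its limit. The sketch below illustrates that arithmetic for the 100m request in the Deployment above. The function name and the sample usage figures are made up for illustration, and the sketch simply rounds down with integer division.

```shell
# Hypothetical helper: CPU utilization as a percentage of the requested CPU.
# Both arguments are in millicores; integer division rounds the result down.
cpu_utilization_percent() (
  usage_m=$1
  request_m=$2
  echo $(( usage_m * 100 / request_m ))
)

# Illustrative usage figures against the 100m request:
cpu_utilization_percent 45 100   # prints 45
cpu_utilization_percent 20 100   # prints 20
```

With the 30% target used in the manifest that follows, a pod averaging 45m would be over the target and a pod averaging 20m would be under it.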
Step 1: Create a Horizontal Pod Autoscaler resource for our deployment.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: webserver-cpu-hpa
spec:
  maxReplicas: 5
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webserver
  targetCPUUtilizationPercentage: 30
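Let's understand what these attributes mean:
maxReplicas: the upper limit on the number of pods the autoscaler may create.
minReplicas: the lower limit the autoscaler may scale down to.
scaleTargetRef: the workload to scale, here the webserver Deployment.
targetCPUUtilizationPercentage: the average CPU utilization, as a percentage of the requested CPU, that the autoscaler tries to maintain across pods.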
Now create the resource:
$ kubectl create -f nginx-deploy-cpu-hpa.yaml
Let's put some load on our deployment so that we can see scaling in action.
Step 2: First, expose the application as a NodePort service; otherwise we have no way to reach it for load testing.
$ kubectl expose deploy webserver \
--type=NodePort --port=8080 \
--target-port=80
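To find out which port the service was given, you could read it back from the Service object. This is a minimal sketch; the function name node_port is hypothetical, and it assumes the service exposes a single port.

```shell
# Hypothetical helper: print the node port allocated to a Service.
# Assumes the Service has exactly one port entry.
node_port() {
  kubectl get service "$1" -o jsonpath='{.spec.ports[0].nodePort}'
}

# Usage (needs a running cluster):
#   node_port webserver
```

On Minikube, how you reach that port depends on the driver in use. Running minikube service webserver --url prints a URL you can use directly; with some drivers that command may need to stay running in its own terminal.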
Step 3: Now comes the interesting part: load testing. For load testing, I'm using siege.
Here 250 concurrent users simulate the load for 2 minutes; you can change these values as needed.
$ siege -c 250 -t 2m http://127.0.0.1:58421
(Replace http://127.0.0.1:58421 with the NodePort service address.)
Open another terminal and watch the resources. Keep an eye on the number of pods: as soon as the HPA detects that CPU usage exceeds the target, it will create more pods to handle the load.
$ watch -n 1 kubectl get po,hpa
Since the load crossed the target, the HPA increased the number of replicas from 1 to 2.
When the CPU usage falls back to 0, the HPA scales the replicas back down to the minimum (1) defined in the HPA manifest file.
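The scale-up and scale-down above follow from the formula the HPA uses: desired replicas = ceil(current replicas × current metric value / target metric value), clamped to the configured minimum and maximum. The sketch below reproduces that arithmetic only. The function name and the sample utilization figures are hypothetical, and the real controller also applies a tolerance and a scale-down stabilization window, which this sketch leaves out.

```shell
# Hypothetical helper: desired replica count from the HPA formula.
# Arguments: current replicas, current metric value, target metric value,
# minReplicas, maxReplicas. Integer ceiling is done as (a + b - 1) / b.
desired_replicas() (
  current=$1; value=$2; target=$3; min=$4; max=$5
  desired=$(( (current * value + target - 1) / target ))
  if [ "$desired" -lt "$min" ]; then desired=$min; fi
  if [ "$desired" -gt "$max" ]; then desired=$max; fi
  echo "$desired"
)

# Target 30%, minReplicas 1, maxReplicas 5, as in the manifest above.
desired_replicas 1 55 30 1 5   # prints 2 (one pod at an assumed 55% utilization)
desired_replicas 2 0 30 1 5    # prints 1 (idle pods, clamped to the minimum)
desired_replicas 3 90 30 1 5   # prints 5 (9 wanted, clamped to the maximum)
```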
This time we'll configure HPA based on memory usage
Step 1: Creating a Horizontal Pod Autoscaler resource based on memory usage
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: webserver-mem-hpa
spec:
  maxReplicas: 5
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webserver
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue   # a raw per-pod average, paired with averageValue
        averageValue: 2Mi
Step 2: We set averageValue to 2Mi because the nginx deployment is very lightweight; a low threshold is needed to see memory-based scaling in action.
$ kubectl create -f nginx-deploy-mem-hpa.yaml
Step 3: Run the load test again and watch the resources in another terminal.
Again, memory usage exceeds the target, so the HPA spins up new pod replicas.
The kubectl scale command can be used to manually scale Kubernetes workloads by changing the desired number of replicas in a Deployment or StatefulSet. This gives users direct control over how resources are allocated as workload demands change.
List Deployments: This command displays the current replicas for each deployment in your cluster.
kubectl get deployments
Scale Deployment: Set the --replicas parameter to the desired number of replicas and replace my-deployment with the name of your deployment.
kubectl scale deployment my-deployment --replicas=5
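To confirm that the new replicas actually became available, the scale command can be followed by a rollout status check. This is a sketch under assumptions: the function name scale_and_wait and the 120-second timeout are illustrative choices, not anything prescribed by Kubernetes.

```shell
# Hypothetical helper: scale a Deployment, then wait until the change
# has rolled out (or give up after an assumed 120-second timeout).
scale_and_wait() {
  kubectl scale deployment "$1" --replicas="$2" &&
    kubectl rollout status deployment "$1" --timeout=120s
}

# Usage (needs a running cluster):
#   scale_and_wait my-deployment 5
```

Keep in mind that if an HPA targets the same deployment, it will overwrite a manually set replica count on its next evaluation.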
A Deployment manages its underlying ReplicaSets while performing a rolling update. When autoscaling is set up for a Deployment, a HorizontalPodAutoscaler (HPA) is attached to it. The HPA controls the number of replicas by modifying the Deployment's replicas field based on resource usage.
During a rolling update:
The scenario differs substantially for StatefulSets. StatefulSets manage their pods directly, without a ReplicaSet or similar intermediate resource. When performing a rolling update on an autoscaled StatefulSet:
Kubernetes uses container resource metrics to track and control how much of each resource every container in a cluster uses. These metrics help ensure that resources are used efficiently and that workloads perform well. CPU and memory usage are the key metrics, and they are often used for autoscaling and performance monitoring. A summary of Kubernetes' container resource metrics is given below:
Viewing Metrics: The Kubernetes Dashboard or the kubectl top commands can be used to view metrics.
kubectl top nodes
kubectl top pods
The HorizontalPodAutoscaler (HPA) in Kubernetes automatically scales the number of pods up or down based on resource usage metrics such as CPU or memory. Below is an overview of how it operates: