Configuring HPA (Horizontal Pod Autoscaler) on Kubernetes

Hemanth M Gowda
4 min read · Mar 3, 2021

What is HPA?

Autoscaling is one of the key features of a Kubernetes cluster. With Kubernetes, we can autoscale our workloads natively using the Horizontal Pod Autoscaler.

It is a feature by which the cluster increases the number of pods in a replication controller, deployment, replica set, or stateful set based on observed CPU/memory utilization (or other custom metrics) as the number of requests grows, and decreases the number of pods as the requests drop.

How does HPA work?

HPA scales the number of pods bound to a replication controller, a replica set, a stateful set, or a deployment based on resource metrics. Recent Kubernetes versions support more metric types: per-pod resource metrics such as CPU and memory utilization, object metrics such as HTTP requests, and external metrics.

The Horizontal Pod Autoscaler is implemented as a control loop whose period is controlled by the controller manager's --horizontal-pod-autoscaler-sync-period flag (with a default value of 15 seconds).

During each period, the controller manager queries the resource utilization against the metrics specified in each HorizontalPodAutoscaler definition. The controller manager obtains the metrics either from the resource metrics API (for per-pod resource metrics) or from the custom metrics API (for all other metrics).
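Given the observed and target metric values, the controller computes the new replica count with a simple ratio. A minimal sketch in Python (the function name is ours; the formula is the one documented for the HPA algorithm):

```python
import math

def desired_replicas(current_replicas: int, current_value: float, target_value: float) -> int:
    """HPA scaling rule:
    desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)
    """
    return math.ceil(current_replicas * current_value / target_value)

# 3 pods averaging 100% CPU against a 50% target -> scale out to 6
print(desired_replicas(3, 100, 50))  # 6
```

If the ratio is close to 1 (within a configurable tolerance), the controller skips scaling to avoid thrashing.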

When to use HPA?

The most appropriate use case for HPA is preventing downtime and performance problems caused by unexpected traffic spikes. If you define your pods' container resource requests and limits the right way, HPA is a great fit for these situations because it reacts faster and more reliably than manually scaling deployments.

How to setup HPA?

Prerequisites:

  1. Kubernetes Cluster
  2. Resource limit set in the deployment
  3. Metric Server installed
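If the Metrics Server is not already installed, it can usually be deployed from the upstream project's release manifests (this URL is the one published in the metrics-server README; check compatibility with your cluster version before applying):

```shell
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
```

You can confirm it is serving metrics with `kubectl top nodes`.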

To demonstrate Horizontal Pod Autoscaler we will use a custom docker image based on the php-apache image. The Dockerfile has the following content:

FROM php:5-apache
COPY index.php /var/www/html/index.php
RUN chmod a+rx index.php

It defines an index.php page that performs some CPU intensive computations:

<?php
$x = 0.0001;
for ($i = 0; $i <= 1000000; $i++) {
  $x += sqrt($x);
}
echo "OK!";
?>

Create a deployment running the image and expose it as a service using the following configuration:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  selector:
    matchLabels:
      run: php-apache
  replicas: 1
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - name: php-apache
        image: k8s.gcr.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
  labels:
    run: php-apache
spec:
  ports:
  - port: 80
  selector:
    run: php-apache

Run the following command to create the service:

kubectl apply -f https://k8s.io/examples/application/php-apache.yaml

Now that the server is running, we will create a Horizontal Pod Autoscaler that maintains between 1 and 10 replicas of the Pods controlled by the php-apache deployment we created. HPA will increase and decrease the number of replicas (via the deployment) to maintain an average CPU utilization across all Pods of 50%.

kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
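The same autoscaler can also be written declaratively. A sketch using the autoscaling/v2 API (the field values mirror the command above; older clusters may only support autoscaling/v2beta2 or autoscaling/v1):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```

Applying a manifest like this with `kubectl apply -f` keeps the autoscaler under version control alongside the deployment.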

We may check the current status of autoscaler by running:

kubectl get hpa
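With no load yet, the output should look something like the following (values are illustrative; the TARGETS column shows current versus target CPU utilization):

```
NAME         REFERENCE               TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   0%/50%    1         10        1          18s
```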

We will start a container, and send an infinite loop of queries to the php-apache service to increase the load.

kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"

Now we should see the higher CPU load by executing:

kubectl get hpa

Here, CPU consumption has increased to 305% of the request. As a result, the deployment was resized to 7 replicas.
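With the load generator running, the output might look similar to this (illustrative values):

```
NAME         REFERENCE               TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   305%/50%   1         10        7          3m
```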

We will finish our example by stopping the user load.

In the terminal where we created the container with busybox image, terminate the load generation by typing <Ctrl> + C.

Then we will verify the resulting state (after a minute or so):

kubectl get hpa

Conclusion

In this blog post, we've seen what HPA is and how it behaves. We have seen what HPA does for our scalable resources (which can be deployments, replica sets, replication controllers, and stateful sets) and how it increases or decreases the number of pods by comparing current metrics against given thresholds.
