The Horizontal Pod Autoscaler (HPA) automatically adjusts the number of pod replicas based on CPU usage, memory usage, and other custom metrics so that the overall load of a workload matches a user-defined target value. This document introduces the HPA feature of TKE and describes how to use it to scale pods automatically.
The HPA feature gives TKE flexible self-adaptation capabilities. It can quickly increase the number of pod replicas within a user-defined range to cope with a sudden increase in service load, and then scale in when the load decreases to free up computing resources for other services. The entire process is automatic and requires no manual intervention. It is suitable for scenarios with large load fluctuations, a large number of services, and frequent scaling, such as e-commerce, online education, and financial services.
The HPA feature is implemented as a Kubernetes API resource and a controller. The resource specifies the metrics and targets that determine the behavior of the controller, and the controller periodically adjusts the number of replicas of service pods based on pod resource usage so that the workload level matches the user-defined target value. The following figure shows the scaling process:
Horizontal pod autoscaling does not apply to objects that cannot be scaled, such as DaemonSet resources.
HPA Controller: the control component that controls the HPA scaling logic.
Metrics Aggregator: normally, the controller obtains metric values from a series of aggregation APIs (
The metrics.k8s.io API is usually provided by Metrics Server. The community edition provides the basic CPU and memory metric types. By comparison, the custom Metrics Server used by TKE supports a wider range of HPA trigger metrics, including CPU, memory, disk, network, and GPU metrics. For more information, see TKE Auto-scaling Metrics.
Earlier versions of the controller could also obtain metrics directly from Heapster. Starting from Kubernetes 1.11, the controller can no longer obtain metrics from Heapster.
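As a rough illustration of the HPA API resource described above, the following sketch writes a minimal manifest to a local file. It is not the configuration used in this document: it assumes the upstream autoscaling/v2 API and swaps in the standard CPU utilization metric, because the identifier of the TKE network bandwidth metric is not given here. The HPA name test-hpa, the target Deployment name test, the replica bounds, and the 50% target are all hypothetical values chosen for illustration.

```shell
# Write a minimal HPA manifest to hpa-test.yaml (sketch under the assumptions above).
cat > hpa-test.yaml <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: test-hpa              # hypothetical HPA name
  namespace: default
spec:
  scaleTargetRef:             # the scalable object that the controller adjusts
    apiVersion: apps/v1
    kind: Deployment
    name: test                # hypothetical Deployment name
  minReplicas: 1              # lower bound of the user-defined range (assumed)
  maxReplicas: 5              # upper bound of the user-defined range (assumed)
  metrics:
  - type: Resource
    resource:
      name: cpu               # standard metric used in place of the TKE network metric
      target:
        type: Utilization
        averageUtilization: 50
EOF

# To create the resource in a cluster you can access:
#   kubectl apply -f hpa-test.yaml
```

The scaleTargetRef field is what restricts HPA to scalable objects: it has to point at a resource that supports scaling, such as a Deployment, which is why a DaemonSet cannot be used as a target.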
Here, we use a Deployment-type workload as an example. Create a workload named "test" with a single replica and a web service as the service type. For more information about how to create a Deployment-type workload on the TKE console, see Deployment Management.
The following figure shows the creation result in this example:
On the TKE console, bind an HPA configuration to the test workload. For more information about how to bind an HPA configuration, see HPA Directions. As an example, this document configures a policy that triggers scale-out when the network egress bandwidth reaches 0.15 Mbps (150 Kbps), as shown in the figure below:
Run the following command to launch a temporary pod in the cluster. The pod acts as a simulated client for testing the configured HPA:
kubectl run hpa-test -it --rm --restart=Never --image=alpine -- /bin/sh
Run the following command in the temporary pod to simulate a large number of requests reaching the "hpa-test" service within a short period, which increases the egress bandwidth:
# hpa-test.default.svc.cluster.local is the domain name of the service in the cluster. To stop the script, press Ctrl+C.
while true; do wget -q -O - hpa-test.default.svc.cluster.local; done
After running the request simulation command in the test pod, observe the monitored number of pods of the workload. You will see that the number of replicas of the workload increases to 3 at 18:46, which indicates that an HPA scale-out event was triggered, as shown in the figure below:
The monitoring of the network egress bandwidth of the workload shows that, at about 18:46, the egress bandwidth increased to about 424 Kbps, exceeding the target value set in the HPA configuration. This further indicates that the HPA Scaling Algorithm was triggered to add replicas to meet the target value. As a result, the number of replicas of the workload changed to 3, as shown in the figure below:
The HPA Scaling Algorithm does not just rely on formula calculation to control the scaling logic but takes multiple dimensions into consideration to decide whether scale-out or scale-in is needed. Therefore, the actual implementation may differ slightly from expectations. For more information, see Algorithm Details.
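As a worked example of the formula part of the algorithm, the sketch below applies the upstream Kubernetes calculation, desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue), to the numbers in this document. It assumes that the 424 Kbps reading is the value compared against the 150 Kbps target for the single initial replica. The function name desired_replicas is hypothetical, and the sketch deliberately ignores the other factors the controller considers, such as tolerance, replica bounds, and pod readiness.

```shell
# desired_replicas CURRENT_REPLICAS CURRENT_METRIC TARGET_METRIC
# Prints ceil(CURRENT_REPLICAS * CURRENT_METRIC / TARGET_METRIC) using integer arithmetic.
# Hypothetical helper for illustration; metrics must be integers in the same unit.
desired_replicas() {
  current_replicas=$1
  current_metric=$2
  target_metric=$3
  # Adding (target - 1) before dividing rounds the integer division up.
  echo $(( (current_replicas * current_metric + target_metric - 1) / target_metric ))
}

# Values from this example: 1 replica, about 424 Kbps observed, 150 Kbps target.
desired_replicas 1 424 150   # prints 3
```

Under these assumptions the result is 3, which matches the replica count observed after scale-out.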
To simulate the scale-in process, manually stop the request simulation command at about 18:49. Through monitoring, you can observe that the network egress bandwidth decreases to the level before scale-out. At this point, according to the HPA logic, the conditions for workload scale-in are met, as shown in the figure below:
However, the monitoring of the number of workload pods in the figure below shows that the workload did not trigger HPA scale-in until about 18:55. This is because HPA applies a default 5-minute tolerance window after scaling is triggered, which prevents frequent scaling operations caused by short-term metric fluctuations. For more information, see Cooling/Delay Support. As shown in the figure below, about 5 minutes after the command was stopped, the number of workload replicas decreased back to the initial setting of 1 replica according to the HPA Scaling Algorithm.
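The delay can be illustrated with a small sketch of how upstream Kubernetes stabilizes scale-in: the controller keeps the replica recommendations computed during the window (300 seconds by default upstream) and acts on the highest one. This assumes TKE follows the upstream behavior. The function name stabilized_replicas and the recommendation sequences are hypothetical.

```shell
# stabilized_replicas RECOMMENDATION...
# Prints the highest recommendation among those passed in, which stands in for
# "all recommendations computed within the stabilization window".
# Hypothetical helper for illustration only.
stabilized_replicas() {
  max=$1
  for recommendation in "$@"; do
    if [ "$recommendation" -gt "$max" ]; then
      max=$recommendation
    fi
  done
  echo "$max"
}

# Shortly after the load stops, the window still contains a recommendation of 3,
# so the workload stays at 3 replicas.
stabilized_replicas 3 1 1 1 1   # prints 3

# Once every recommendation in the window is 1, scale-in to 1 replica proceeds.
stabilized_replicas 1 1 1 1 1   # prints 1
```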
When an HPA scaling event occurs in TKE, the event will be displayed in the event list of the corresponding HPA instance. Note that the time on the event notification list includes "First Occurrence Time" and "Last Occurrence Time". "First Occurrence Time" indicates the first time when the same event occurred, while "Last Occurrence Time" indicates the latest time when the same event occurred. Therefore, as you can see in the event list shown in the figure below, the "Last Occurrence Time" field displays 18:46:01 for the scale-out event in this example and 18:54:40 for the scale-in event. The points in time displayed here match those in the workload monitoring.
In addition, the workload event list records the replica additions and deletions that the workload performs when HPA scaling occurs. As shown in the figure below, the points in time of workload scale-out and scale-in match those displayed in the HPA event list. The number of replicas increased at 18:46:01 and decreased at 18:54:40.
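The same events can usually be inspected from the command line as well. The sketch below wraps two standard kubectl commands in helper functions. The function names hpa_events and hpa_status are hypothetical, the namespace defaults to default as an assumption, and running them requires kubectl access to the cluster.

```shell
# hpa_events [NAMESPACE]
# Lists events whose involved object is an HPA, oldest first.
hpa_events() {
  kubectl get events --namespace "${1:-default}" \
    --field-selector involvedObject.kind=HorizontalPodAutoscaler \
    --sort-by=.lastTimestamp
}

# hpa_status HPA_NAME [NAMESPACE]
# Shows the current metrics, replica counts, and recent events of one HPA.
hpa_status() {
  kubectl describe hpa "$1" --namespace "${2:-default}"
}

# Example (the HPA name is a placeholder for the one bound to the workload):
#   hpa_events default
#   hpa_status <hpa-name> default
```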
This example demonstrates the HPA feature of TKE and shows how to use the TKE custom metric type network egress bandwidth as the metric for triggering workload HPA scaling.