Public clouds provide leased rather than purchased services, backed by complete technical support and assurance, which greatly contributes to business stability, scalability, and convenience. However, more work is needed to reduce costs and improve efficiency, for example by adapting application development, architecture design, management, and Ops to the cloud and by using cloud resources reasonably. Migrating from an IDC to the cloud improves resource utilization, but only modestly: the average utilization of containerized resources is only 13%, which leaves a long way to go.
This article details why resource utilization is low and how TKE tools can help improve it.
To figure out why utilization is low, let's look at a few common scenarios of resource waste:
The Request field in Kubernetes controls the CPU and memory reservation mechanism: resources reserved for one container cannot be used by other containers. For more information, see Resource Management for Pods and Containers. If Request is set too low, the container may not get enough resources for the business, especially under high load. Therefore, users tend to set Request to a very high value to ensure service reliability. However, the business load is not that high most of the time. Taking CPU as an example, the following figure shows the relationship between the resource reservation (request) and actual usage (cpu_usage) of a container in a real-world business scenario:
As you can see, the reservation far exceeds the actual usage, and the excess cannot be used by other workloads. Setting Request to a very high value therefore leads to great waste. To address this, set a proper value and cap unbounded requests as needed, so that no single business occupies an excessive share of resources. You can refer to LimitRange, discussed later. In addition, TKE will launch a smart request recommendation product to help you narrow the gap between Request and Usage, effectively improving resource utilization while guaranteeing business stability.
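To make the gap concrete, the Python sketch below compares a container's CPU request with sampled usage. The sample numbers are hypothetical, and naive_request_suggestion (a usage percentile plus headroom) is only an illustrative heuristic, not the algorithm behind TKE's request recommendation product.

```python
from statistics import mean


def utilization_report(request_m, usage_samples_m):
    """Compare a CPU request with sampled usage (both in millicores)."""
    avg_usage = mean(usage_samples_m)
    return {
        "avg_utilization": avg_usage / request_m,
        # Reserved for this container but, on average, never used.
        "avg_idle_reserved_m": request_m - avg_usage,
    }


def naive_request_suggestion(usage_samples_m, percentile=95, headroom_pct=120):
    """Hypothetical heuristic: nearest-rank usage percentile plus a safety margin."""
    ordered = sorted(usage_samples_m)
    rank = max(1, -(-percentile * len(ordered) // 100))  # integer ceiling
    return -(-ordered[rank - 1] * headroom_pct // 100)    # integer ceiling


# Hypothetical samples: a 2000m request whose usage mostly stays around 250m.
samples = [200, 250, 300, 220, 800, 260, 240, 230, 210, 290]
print(utilization_report(2000, samples))  # 15% average utilization, 1700m idle
print(naive_request_suggestion(samples))  # 960
```

Integer arithmetic is used for the ceilings so the result does not depend on floating-point rounding.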
Most businesses see an obvious change pattern in resource utilization. For example, a bus system usually has a high load during the day and a low load at night, and a game often starts to experience a traffic surge on Friday night, which drops on Sunday night.
As you can see, the same business requires different amounts of resources during different time periods. If Request is set to a fixed value, utilization will be low when the load is low. The solution is to dynamically adjust the number of replicas to match the load. For more information, see HPA, HPC, and CA, discussed later.
Online businesses usually have a high load during the day and require low latency, so they must be scheduled and run first. In contrast, offline businesses are generally less sensitive to when they run and to latency, so they can run during the off-peak hours of online workloads. In addition, some businesses are compute-intensive and consume a lot of CPU, while others are memory-intensive and consume a lot of memory.
As shown above, online/offline hybrid deployment helps dynamically schedule offline and online businesses in different time periods to improve resource utilization. For computing-intensive and memory-intensive businesses, affinity scheduling can be used to find the right node. For detailed directions, see online/offline hybrid deployment and affinity scheduling discussed later.
Based on experience with a large number of real-world businesses, TKE has productized a series of tools that help you improve resource utilization easily and effectively. There are two approaches: 1. manual resource allocation and limitation based on native Kubernetes capabilities; 2. automatic solutions based on business characteristics.
Imagine that you are a cluster admin and your cluster is shared by four business departments. You need to allow for on-demand use while ensuring stability. In order to improve the overall utilization, you need to limit the maximum amount of resources available for each business and prevent excessive usage by setting default values.
Request and Limit values are set as needed. Here, Request is the resource reservation, indicating the minimum amount of resources guaranteed to a container, and Limit is the resource cap, indicating the maximum amount of resources a container can use. Setting both properly contributes to healthier container operation and higher resource utilization, yet Request and Limit are often left unspecified. When a cluster is shared by multiple teams or projects, Request and Limit also tend to be set to high values to ensure stability. When you create a workload in the TKE console, the following default values are set for all containers. These defaults are based on the analysis and estimation of real businesses and may deviate from your actual requirements.
To fine-tune resource allocation and management, you can set a namespace-level LimitRange in TKE.
If your cluster hosts four businesses, you can use namespaces and ResourceQuota to isolate them and limit their resources.
ResourceQuota sets a quota on the resources in a namespace, which is an isolated partition of a Kubernetes cluster. A cluster usually contains multiple namespaces that house different businesses. By setting a different ResourceQuota for each namespace, you can limit how much of the cluster's resources each namespace can use, thereby implementing preallocation and limitation.
ResourceQuota applies to the following. For more information, see Resource Quotas.
Limiting the total CPU and memory Request and Limit values of all containers in a namespace.
Setting a ResourceQuota for each namespace to allocate resources.
TKE has productized ResourceQuota, so you can use it directly in the console to limit the resource usage of a namespace. For detailed directions, see Namespace.
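The admission behavior of a ResourceQuota can be sketched in Python as follows. This is a simplified model, assuming only the requests.cpu and limits.cpu keys (real ResourceQuota resource names) with millicore integers; the NamespaceQuota class is hypothetical. As in Kubernetes, a Pod that would push a tracked total over its hard limit is rejected, and so is a Pod that omits a resource the quota tracks, which is one reason LimitRange defaults are useful.

```python
from dataclasses import dataclass, field


@dataclass
class NamespaceQuota:
    """Simplified ResourceQuota: hard limits and running totals per resource name."""
    hard: dict
    used: dict = field(default_factory=dict)

    def admit(self, pod_resources):
        """Admit a Pod only if every tracked total stays within its hard limit."""
        for name, limit in self.hard.items():
            if name not in pod_resources:
                return False, f"must specify {name}"
            projected = self.used.get(name, 0) + pod_resources[name]
            if projected > limit:
                return False, f"exceeded quota: {name}, {projected} > {limit}"
        # Only resources named in the quota are tracked.
        for name in self.hard:
            self.used[name] = self.used.get(name, 0) + pod_resources[name]
        return True, "admitted"


# Hypothetical quota for one business namespace (CPU in millicores).
quota = NamespaceQuota(hard={"requests.cpu": 4000, "limits.cpu": 8000})
print(quota.admit({"requests.cpu": 3000, "limits.cpu": 6000}))  # (True, 'admitted')
print(quota.admit({"requests.cpu": 2000, "limits.cpu": 2000}))  # rejected: 5000 > 4000
print(quota.admit({"limits.cpu": 1000}))                        # rejected: no request
```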
What if Request and Limit are left unspecified or set to high values? As the admin, you can set different default values and ranges for different businesses, limiting excessive resource preemption while keeping workload creation convenient.
Unlike ResourceQuota, which limits the overall resource usage of a namespace, LimitRange applies to individual containers in a namespace. It prevents the creation of containers that request too many or too few resources in the namespace, and it handles containers whose Request and Limit are left unspecified.
LimitRange applies to the following. For more information, see Limit Ranges.
Setting the minimum and maximum Request and Limit of a resource.
Setting default Request and Limit values for all containers. If no custom values are specified for a container, the default values apply.
Properly setting Request and Limit by namespace can improve resource utilization.
TKE has productized LimitRange, so you can manage it by namespace directly in the console. For detailed directions, see Limit Ranges.
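The defaulting and validation that a per-container LimitRange performs can be sketched as follows, under simplifying assumptions: CPU only, values in millicores, and a hypothetical ContainerLimitRange type whose fields loosely mirror the min, max, default, and defaultRequest fields of a LimitRange. Following the Kubernetes documentation, a container that specifies only a limit gets a request equal to that limit.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ContainerLimitRange:
    """Hypothetical per-container CPU bounds, in millicores."""
    min_cpu: int
    max_cpu: int
    default_limit: int    # analogous to LimitRange "default"
    default_request: int  # analogous to LimitRange "defaultRequest"


def apply_limit_range(lr: ContainerLimitRange,
                      request: Optional[int],
                      limit: Optional[int]) -> Tuple[int, int]:
    """Fill in unspecified values, then reject containers outside the range."""
    if request is None:
        # If only a limit is given, the request matches it.
        request = limit if limit is not None else lr.default_request
    if limit is None:
        limit = lr.default_limit
    if request < lr.min_cpu:
        raise ValueError(f"request {request}m is below the minimum {lr.min_cpu}m")
    if limit > lr.max_cpu:
        raise ValueError(f"limit {limit}m exceeds the maximum {lr.max_cpu}m")
    if request > limit:
        raise ValueError(f"request {request}m exceeds limit {limit}m")
    return request, limit


lr = ContainerLimitRange(min_cpu=100, max_cpu=2000, default_limit=1000, default_request=500)
print(apply_limit_range(lr, None, None))  # (500, 1000)
print(apply_limit_range(lr, None, 1500))  # (1500, 1500)
print(apply_limit_range(lr, 200, None))   # (200, 1000)
```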
ResourceQuota and LimitRange, used for resource allocation and limitation respectively, rely on experience and manual operations and mainly address unreasonable resource requests and allocation. This section describes how to improve resource utilization through automated dynamic adjustments from the perspectives of elastic scaling, scheduling, and online/offline hybrid deployment.
In scenario 2 of resource waste, if your business has peak and off-peak hours, a fixed Request value is bound to waste resources during off-peak hours. In this case, you can automatically increase the number of replicas of the workload during peak hours and decrease it during off-peak hours to improve overall utilization.
Horizontal Pod Autoscaler (HPA) automatically increases and decreases the number of Pod replicas in a Deployment or StatefulSet based on metrics such as CPU and memory utilization, keeping workloads stable while achieving truly on-demand usage.
TKE supports many metrics for elastic scaling based on the custom metrics API, covering CPU, memory, disk, network, and GPU in most HPA scenarios. For more information on the list, see HPA Metrics. In complex scenarios such as automatic scaling based on the QPS per replica, the prometheus-adapter can be installed. For detailed directions, see Using Custom Metrics for Auto Scaling in TKE.
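The core scaling rule documented for the Kubernetes HPA is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with no change while the ratio stays within a tolerance (0.1 by default). The Python sketch below applies that rule to average CPU utilization, which HPA measures as a percentage of the Pods' Request. It omits the controller's stabilization windows and its handling of missing metrics and unready Pods, and the min/max bounds are hypothetical parameters.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Apply the HPA scaling rule to an average utilization (percent of Request)."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to the target: do nothing
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(4, 90, 60))  # 6: peak hours, scale out
print(desired_replicas(4, 63, 60))  # 4: within tolerance
print(desired_replicas(4, 15, 60))  # 1: off-peak, scale in
```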
Suppose you are planning 11/11 shopping spree promotions for your business on an e-commerce platform. You may consider using HPA for automatic scaling. However, HPA needs to monitor metrics first before responding, which means it may not be fast enough to scale out to promptly sustain heavy traffic. If you expect a traffic surge, consider adding replicas in advance.
Horizontal Pod Cronscaler (HPC) is a proprietary TKE add-on that adjusts the number of replicas on a schedule, so you can scale out in advance and avoid the impact of insufficient resources during dynamic scale-outs. Compared with the community CronHPA, HPC supports:
In the case of gaming services, the number of players skyrockets from Friday night to Sunday night. You can scale out the game server before Friday night and scale it in to the original size after Sunday night to ensure a better experience. If HPA is used, services may suffer because scaling out is not fast enough.
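The idea behind scheduled scaling can be illustrated with a small Python sketch that picks the replica count of the most recently fired crontab rule. This is not HPC's implementation: the rule format, the scheduled_replicas function, and the example schedule are hypothetical. The cron parser supports only *, integers, and comma lists, and it ANDs all five fields (standard cron ORs day-of-month and day-of-week when both are restricted).

```python
from datetime import datetime, timedelta


def field_matches(expr, value):
    """Match one cron field; only '*', integers, and comma lists are supported."""
    if expr == "*":
        return True
    return value in {int(part) for part in expr.split(",")}


def cron_matches(spec, t):
    """Check 'minute hour day-of-month month day-of-week' against a datetime."""
    minute, hour, dom, month, dow = spec.split()
    cron_dow = (t.weekday() + 1) % 7  # cron uses 0 = Sunday; Python uses 0 = Monday
    return (field_matches(minute, t.minute) and field_matches(hour, t.hour)
            and field_matches(dom, t.day) and field_matches(month, t.month)
            and field_matches(dow, cron_dow))


def scheduled_replicas(rules, now, default):
    """Replica count of the most recent rule that fired within the past week."""
    t = now.replace(second=0, microsecond=0)
    for _ in range(7 * 24 * 60):  # walk back minute by minute
        for spec, replicas in rules:
            if cron_matches(spec, t):
                return replicas
        t -= timedelta(minutes=1)
    return default


# Hypothetical game-server schedule: scale out Friday 18:00, scale in Sunday 23:00.
rules = [("0 18 * * 5", 10), ("0 23 * * 0", 2)]
print(scheduled_replicas(rules, datetime(2024, 3, 2, 12, 0), default=2))  # Saturday: 10
print(scheduled_replicas(rules, datetime(2024, 3, 4, 9, 0), default=2))   # Monday: 2
```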
TKE has productized HPC, which uses the crontab syntax. You need to install it on the Add-On Management page in advance, as follows:
Both HPA and HPC automatically increase and decrease the number of replicas at the workload level to adapt to traffic fluctuations and improve resource utilization. At the cluster level, however, the total amount of resources stays fixed, so HPA and HPC only leave more resources idle during off-peak hours. Is there a way to release some resources during idle hours and scale out the cluster during busy hours? Such cluster-level elastic scaling is very cost-efficient, because the overall resource usage of a cluster determines the bill.
Cluster Autoscaler (CA) automatically adjusts the number of cluster nodes to truly improve resource utilization and save costs. It is the key to cost reduction and efficiency improvement.
CA in TKE is provided through node pools. We recommend using CA together with HPA: CA performs resource-layer (node-layer) scaling, while HPA performs application-layer scaling. When the cluster runs out of resources after an HPA scale-out, the new Pods stay pending, and CA then expands the node pool to increase the cluster's overall resources.
For more information on parameter configuration methods and use cases, see Node Pool Overview.
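To see how pending Pods translate into new nodes, here is a rough Python estimate that uses first-fit-decreasing bin packing on CPU requests. The real Cluster Autoscaler instead simulates scheduling against node-group templates and considers all resources and constraints, so treat nodes_to_add and its parameters as a hypothetical illustration; max_new_nodes stands in for the headroom left below a node pool's maximum size.

```python
def nodes_to_add(pending_pod_cpu_m, node_allocatable_cpu_m, max_new_nodes):
    """Estimate how many identical new nodes would fit all pending Pods."""
    free_per_new_node = []  # remaining CPU on each hypothetical new node
    for cpu in sorted(pending_pod_cpu_m, reverse=True):
        if cpu > node_allocatable_cpu_m:
            continue  # never fits this node type, so adding nodes won't help
        for i, free in enumerate(free_per_new_node):
            if cpu <= free:
                free_per_new_node[i] -= cpu
                break
        else:
            # No new node opened so far has room: open another one.
            free_per_new_node.append(node_allocatable_cpu_m - cpu)
    return min(len(free_per_new_node), max_new_nodes)


# Hypothetical: five Pods left pending after an HPA scale-out, 2-core nodes.
print(nodes_to_add([1500, 1500, 1000, 500, 500], 2000, max_new_nodes=10))  # 3
```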
The Kubernetes scheduling mechanism is a native resource allocation mechanism which is efficient and graceful. Its core feature is to find the right node for each Pod. In TKE scenarios, the scheduling mechanism contributes to the transition from application-layer to resource-layer elastic scaling. A reasonable scheduling policy can be configured based on business characteristics by properly leveraging Kubernetes scheduling capabilities to effectively enhance resource utilization in clusters.
If one of your CPU-intensive businesses is accidentally scheduled by the Kubernetes scheduler to a memory-intensive node, it may take up all of the node's CPU while the node's memory remains barely used, resulting in serious waste. In this case, you can label such nodes as CPU-intensive and, when creating the workload, specify that it must run on a CPU-intensive node. The Kubernetes scheduler will then schedule the workload to a CPU-intensive node. Matching workloads to the right nodes in this way effectively improves resource utilization.
When creating Pods, you can set node affinity to specify nodes to which Pods will be scheduled (these nodes are specified with Kubernetes labels).
Node affinity is ideal for scenarios where workloads with different resource requirements run simultaneously in a cluster. For example, CVM nodes can be CPU-intensive or memory-intensive. If certain businesses require much higher CPU usage than memory usage, using general CVM instances will inevitably cause a huge waste of memory. In this case, you can add a batch of CPU-intensive CVM instances to the cluster and schedule CPU-intensive Pods to them, so as to improve the overall utilization. Similarly, you can manage heterogeneous nodes (such as GPU instances) in the cluster, specify the amount of GPU resources needed in the workloads, and have the scheduling mechanism find the right nodes to run these workloads.
TKE supports node affinity in the same way as native Kubernetes. You can use this feature in the console or by configuring a YAML file. For detailed directions, see Proper Resource Allocation.
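The matching rule behind requiredDuringSchedulingIgnoredDuringExecution node affinity can be sketched in Python: nodeSelectorTerms are ORed, and the matchExpressions within a term are ANDed. The sketch supports only the In, NotIn, Exists, and DoesNotExist operators (Kubernetes also has Gt and Lt), and the workload-type label and node names are hypothetical examples of the CPU-intensive labeling described above.

```python
def expression_matches(expr, labels):
    """Evaluate one matchExpressions entry against a node's labels."""
    key, op = expr["key"], expr["operator"]
    values = expr.get("values", [])
    if op == "In":
        return key in labels and labels[key] in values
    if op == "NotIn":
        return key not in labels or labels[key] not in values
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    raise ValueError(f"operator not supported in this sketch: {op}")


def node_matches(node_selector_terms, labels):
    """Terms are ORed; expressions inside a term are ANDed."""
    return any(
        all(expression_matches(e, labels) for e in term["matchExpressions"])
        for term in node_selector_terms
    )


# Hypothetical affinity: only nodes labeled as CPU-intensive are eligible.
terms = [{"matchExpressions": [
    {"key": "workload-type", "operator": "In", "values": ["cpu-intensive"]},
]}]
candidate_nodes = {
    "node-1": {"workload-type": "cpu-intensive"},
    "node-2": {"workload-type": "memory-intensive"},
    "node-3": {},
}
eligible = [name for name, labels in candidate_nodes.items() if node_matches(terms, labels)]
print(eligible)  # ['node-1']
```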
Native Kubernetes scheduling policies, such as the default LeastRequestedPriority, tend to schedule Pods to nodes with more unrequested resources remaining. However, this kind of allocation is static: Request does not reflect actual usage, which leads to a certain amount of waste. If the scheduler could schedule Pods based on the actual resource utilization of nodes, it would avoid some of this waste.
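The difference between request-based and usage-based scoring can be sketched in Python. free_fraction_score follows the shape of the LeastRequested formula, (allocatable - consumed) × 100 / allocatable averaged over CPU and memory; the real plugin also adds the incoming Pod's requests and supports weights, which this sketch omits. Feeding the same formula with measured usage is a hypothetical stand-in for load-aware scheduling, not the Dynamic Scheduler's actual algorithm, and the node figures are made up.

```python
MAX_NODE_SCORE = 100  # maximum node score in the scheduler framework


def free_fraction_score(allocatable, consumed):
    """Average free fraction of CPU and memory, scaled to 0..MAX_NODE_SCORE."""
    per_resource = [
        (allocatable[r] - consumed[r]) * MAX_NODE_SCORE // allocatable[r]
        for r in ("cpu", "memory")
    ]
    return sum(per_resource) // len(per_resource)


def pick_node(nodes, basis):
    """Return the best node name; basis is "requested" or "used"."""
    best = max(nodes, key=lambda n: free_fraction_score(n["allocatable"], n[basis]))
    return best["name"]


# Hypothetical nodes: CPU in millicores, memory in MiB.
nodes = [
    {"name": "node-a", "allocatable": {"cpu": 4000, "memory": 8192},
     "requested": {"cpu": 3000, "memory": 6144},  # heavily reserved...
     "used": {"cpu": 400, "memory": 1024}},       # ...but nearly idle
    {"name": "node-b", "allocatable": {"cpu": 4000, "memory": 8192},
     "requested": {"cpu": 2000, "memory": 4096},
     "used": {"cpu": 1800, "memory": 3500}},
]
print(pick_node(nodes, "requested"))  # node-b
print(pick_node(nodes, "used"))       # node-a
```

Request-based scoring avoids node-a because of its large reservations, even though node-a is the less loaded node in practice.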
TKE's proprietary Dynamic Scheduler is one solution to this problem. It works as shown below:
Besides reducing resource waste, the Dynamic Scheduler also effectively mitigates scheduling hotspots in a cluster.
You can install and use the Dynamic Scheduler as an extended add-on:
For more information, see DynamicScheduler.
If you have both online web businesses and offline computing businesses, you can use TKE's online/offline hybrid deployment technology to dynamically schedule and run businesses to improve resource utilization.
In the traditional architecture, big data and online businesses are independent and deployed in separate resource clusters. Big data businesses are generally offline computing jobs that peak at night, when online businesses are barely loaded. By leveraging the isolation capabilities of containers (covering CPU, memory, disk I/O, and network I/O) and the strong orchestration and scheduling capabilities of Kubernetes, cloud-native technologies enable hybrid deployment of online and offline businesses, fully utilizing resources during the idle hours of online businesses.
In the Hadoop architecture, offline and online jobs run in separate clusters. Online and streaming jobs have obvious load fluctuations, so a lot of resources sit idle during off-peak hours, causing great waste and higher costs. In clusters with online/offline hybrid deployment, offline tasks are dynamically scheduled to online clusters during off-peak hours, significantly improving resource utilization. However, Hadoop YARN can currently only allocate resources statically, based on the static resource status reported by NodeManager, so it cannot support hybrid deployment well.
Online businesses experience obvious and regular load fluctuations, with a low resource utilization at night. In this case, the big data management platform delivers resource creation requests to Kubernetes clusters to increase the computing power of the big data application.
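A back-of-the-envelope Python sketch shows how a platform might decide when offline jobs can borrow capacity from an online cluster: subtract the online load and a safety reserve from allocatable CPU, then look for time slots with enough room. The function name, the four-hour slots, the reserve, and the load profile are all hypothetical, and real hybrid deployment also depends on the container isolation mechanisms mentioned above, which this arithmetic ignores.

```python
def offline_windows(allocatable_m, online_usage_m, reserve_m, job_cpu_m):
    """Return spare CPU per time slot and the slots that can host the offline job."""
    spare = [max(0, allocatable_m - used - reserve_m) for used in online_usage_m]
    slots = [i for i, free in enumerate(spare) if free >= job_cpu_m]
    return spare, slots


# Hypothetical online load of a 16-core pool in six 4-hour slots starting at 00:00.
online = [3000, 2000, 9000, 13000, 12000, 6000]
spare, slots = offline_windows(16000, online, reserve_m=2000, job_cpu_m=8000)
print(spare)  # [11000, 12000, 5000, 1000, 2000, 8000]
print(slots)  # [0, 1, 5]: 00:00-08:00 and 20:00-24:00
```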
Besides cost, system stability is another metric that weighs heavily in enterprise Ops, and it is challenging to balance the two. On the one hand, higher resource utilization means lower costs; on the other hand, excessively high utilization may overload nodes, causing OOM errors or CPU jitter.
To help enterprises resolve this dilemma, TKE provides the DeScheduler to keep the cluster load under control. It protects nodes with risky loads by gracefully draining businesses from them. The relationship between the DeScheduler and the Dynamic Scheduler is shown below:
You can install and use the DeScheduler in an extended add-on. For detailed directions, see DeScheduler.
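The draining decision can be sketched in Python as a high-watermark rule: once a node's load exceeds a threshold, pick evictable Pods, heaviest first, until the projected load falls to a target. This is a hypothetical illustration, not the DeScheduler's actual policy; the thresholds, the plan_evictions function, and the Pod data are assumptions, and a real implementation must also respect PodDisruptionBudgets and make sure evicted Pods can be rescheduled elsewhere.

```python
def plan_evictions(node_load_pct, pods, high_watermark_pct, target_pct):
    """Choose Pods to drain from an overloaded node.

    pods: list of (name, load_pct_of_node, evictable) tuples.
    """
    if node_load_pct <= high_watermark_pct:
        return []  # node is within the safe range; leave it alone
    evictions, projected = [], node_load_pct
    # Heaviest evictable Pods first, so fewer evictions are needed.
    for name, load, evictable in sorted(pods, key=lambda p: p[1], reverse=True):
        if projected <= target_pct:
            break
        if evictable:
            evictions.append(name)
            projected -= load
    return evictions


# Hypothetical node at 92% CPU; the database Pod must not be evicted.
pods = [("web-1", 15, True), ("db-0", 30, False),
        ("batch-1", 10, True), ("batch-2", 8, True)]
print(plan_evictions(92, pods, high_watermark_pct=80, target_pct=70))  # ['web-1', 'batch-1']
```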