
DynamicScheduler

Last updated: 2024-12-19 20:02:53
Note
TPS was deactivated on May 16, 2022. For more information, see Notice on TPS Discontinuation on May 16, 2022 at 10:00 (UTC +8). The new Prometheus service is provided by TMP.
If your Dynamic Scheduler uses TPS as the data source and you don't change it, the Dynamic Scheduler will stop working. To use TMP as the data source, you need to upgrade the Dynamic Scheduler before associating it with a TMP instance, as TMP adds API authentication capabilities.
If your Dynamic Scheduler uses a self-built Prometheus service, it is not affected by the TPS deactivation, but you need to ensure the stability and reliability of the self-built Prometheus service yourself.

Overview

Add-on description

The Dynamic Scheduler is a scheduler provided by TKE that performs pre-selection (filtering) and preferential selection (scoring) based on actual node loads. It is implemented on top of the native Kubernetes kube-scheduler extender mechanism. After being installed in a TKE cluster, the add-on works with kube-scheduler to effectively prevent the node load imbalance that the native scheduler's request/limit-based scheduling can cause. The add-on relies on the Prometheus add-on and its rule configuration; we recommend you follow the instructions in Deploying dependencies, otherwise the add-on may not work properly.
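To picture how an extender plugs into kube-scheduler, the snippet below is a minimal sketch using the legacy scheduler Policy format. All values here are illustrative assumptions, not the add-on's real settings; the add-on sets up the actual extender configuration for you during installation.

# Illustrative sketch of registering an HTTP scheduler extender with
# kube-scheduler (legacy Policy format). All values are assumptions;
# the add-on configures the real extender settings itself.
apiVersion: v1
kind: Policy
extenders:
- urlPrefix: "http://dynamic-scheduler.kube-system:80"  # hypothetical Service address
  filterVerb: "filter"          # invoked during pre-selection
  prioritizeVerb: "prioritize"  # invoked during preferential selection
  weight: 1                     # how much the extender's scores count
  enableHttps: false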

Kubernetes objects deployed in a cluster

| Kubernetes Object Name | Type | Requested Resource | Namespace |
|---|---|---|---|
| node-annotator | Deployment | 100m CPU and 100 MiB MEM per instance; one instance in total | kube-system |
| dynamic-scheduler | Deployment | 400m CPU and 200 MiB MEM per instance; three instances in total | kube-system |
| dynamic-scheduler | Service | - | kube-system |
| node-annotator | ClusterRole | - | kube-system |
| node-annotator | ClusterRoleBinding | - | kube-system |
| node-annotator | ServiceAccount | - | kube-system |
| dynamic-scheduler-policy | ConfigMap | - | kube-system |
| restart-kube-scheduler | ConfigMap | - | kube-system |
| probe-prometheus | ConfigMap | - | kube-system |

Use Cases

Uneven cluster loads

The native Kubernetes scheduler makes most of its decisions based on Pod resource requests, so it cannot take into account the actual node loads in the current and past periods of time. This can cause the following problem: some nodes in the cluster have a large amount of schedulable resources remaining (calculated from the requests and limits of the Pods running on them) but high actual loads, while other nodes have little remaining schedulable resource but low actual loads. In this case, kube-scheduler preferentially schedules Pods to the nodes with more remaining requestable resources (based on the LeastRequestedPriority policy).
For example, kube-scheduler will schedule a Pod to node 2 (more remaining requestable resources but a higher actual load), despite the fact that node 1 (less remaining but a lower actual load) is the better choice.



Avoiding scheduling hotspots

Because load metrics lag behind scheduling decisions, a low-load node can keep winning preferential selection and become a scheduling hotspot. To avoid continuously scheduling Pods to the same low-load node, the Dynamic Scheduler supports a hotspot-avoidance policy: it counts the number of Pods scheduled to each node in the past few minutes and lowers the node's score during preferential selection accordingly. The current policy is as follows (see the example after the list):
If more than two Pods are scheduled to the node in the past minute, the node's score for preferential selection is decreased by 1.
If more than five Pods are scheduled to the node in the past five minutes, the node's score for preferential selection is decreased by 1.
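For example, under these two rules combined, a node that received three Pods in the past minute and six Pods in the past five minutes has its preferential-selection score lowered by 2.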

Risk Control

This add-on is integrated with TKE's monitoring and alerting system.
We recommend you enable event persistence for the cluster to better monitor the add-on for exceptions and locate problems.
Uninstalling the add-on only removes the scheduling logic of the Dynamic Scheduler; it does not affect the scheduling feature of the native kube-scheduler.

Limits

The TKE cluster is on Kubernetes v1.10.x or later.
If you need to upgrade the Kubernetes master version:
For a managed cluster, you don't need to set the add-on again.
For a self-deployed cluster, a master version upgrade resets the configurations of all the add-ons on the master, which breaks the Dynamic Scheduler's configuration as a scheduler extender. Therefore, you need to uninstall the Dynamic Scheduler and install it again after the upgrade.

How It Works

The Dynamic Scheduler uses the scheduler extender mechanism to obtain node loads from Prometheus data. It adopts a scheduling policy based on actual node loads and intervenes during pre-selection and preferential selection so that Pods are preferentially scheduled to low-load nodes. The add-on consists of node-annotator and the dynamic scheduler.

node-annotator

node-annotator regularly pulls node load metrics from the monitoring data and syncs them to the annotations of each node.
Note:
After the add-on is deleted, the annotations generated by node-annotator are not cleared automatically; you need to clear them manually.
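For illustration, a node that node-annotator has written load data to might look like the sketch below. The annotation keys shown simply mirror the Prometheus metric names and are assumptions; the actual keys are defined by the add-on.

# Illustrative sketch of a node carrying load annotations.
apiVersion: v1
kind: Node
metadata:
  name: 172.16.0.3
  annotations:
    # Hypothetical keys mirroring the metric names; actual keys are set by the add-on.
    cpu_usage_avg_5m: "35.5"
    cpu_usage_max_avg_1h: "70.1"
    mem_usage_avg_5m: "42.0"
    mem_usage_max_avg_1h: "61.8"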




Dynamic scheduler

The dynamic scheduler is a scheduler extender that filters and scores nodes during pre-selection and preferential selection based on the load data in the node annotations.

Pre-selection policy

To avoid scheduling Pods to high-load nodes, some high-load nodes need to be filtered out during pre-selection. You can dynamically configure the filter policies and thresholds as instructed in Add-On Parameter Description. For example, if both node 2's load in the past five minutes and node 3's load in the past hour exceed the configured thresholds, neither node will proceed to preferential selection.
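As an illustration with assumed numbers: if the five-minute average CPU utilization threshold is configured as 80%, a node that averaged 85% CPU over the past five minutes is filtered out during pre-selection, even if it still has plenty of requestable resources.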



Preferential selection policy

To balance the loads across the nodes in the cluster, the dynamic scheduler scores each node based on its load data: the lower the load, the higher the score, and the node with the highest score is preferentially selected for scheduling. You can dynamically configure the scoring policies and weights as instructed in Add-On Parameter Description.
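As a purely illustrative model (the actual scoring formula is internal to the add-on), suppose each metric contributes weight × (100 − utilization) to the score. With a weight of 1 on both the five-minute average CPU and memory metrics, a node at 20% CPU and 30% memory scores (100 − 20) + (100 − 30) = 150, while a node at 70% CPU and 60% memory scores 30 + 40 = 70, so the first node wins preferential selection.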



Add-On Parameter Description

Prometheus data query address

Note:
To ensure that the add-on can pull the required monitoring data and the scheduling policy takes effect, follow the Configuring aggregation rules step in Deploying dependencies to configure the monitoring data collection rules.
Default values have been set for the pre-selection and preferential selection parameters. If you have no special requirements, you can use them as they are.
If you use a self-built Prometheus service, enter the data query URL (HTTP/HTTPS).
If you use the managed Prometheus service, select the managed instance ID; the system automatically parses the data query URL of the instance.

Pre-selection parameters

| Pre-selection Parameter | Description |
|---|---|
| Average CPU utilization threshold in five minutes | If the node's average CPU utilization in the past five minutes exceeds the configured threshold, no Pods will be scheduled to the node. |
| Maximum CPU utilization threshold in an hour | If the node's maximum CPU utilization in the past hour exceeds the configured threshold, no Pods will be scheduled to the node. |
| Average memory utilization threshold in five minutes | If the node's average memory utilization in the past five minutes exceeds the configured threshold, no Pods will be scheduled to the node. |
| Maximum memory utilization threshold in an hour | If the node's maximum memory utilization in the past hour exceeds the configured threshold, no Pods will be scheduled to the node. |

Preferential selection parameters

| Preferential Selection Parameter | Description |
|---|---|
| Average CPU utilization weight in five minutes | The greater the weight, the bigger the impact of the node's average CPU utilization in the past five minutes on its score. |
| Maximum CPU utilization weight in an hour | The greater the weight, the bigger the impact of the node's maximum CPU utilization in the past hour on its score. |
| Maximum CPU utilization weight in a day | The greater the weight, the bigger the impact of the node's maximum CPU utilization in the past day on its score. |
| Average memory utilization weight in five minutes | The greater the weight, the bigger the impact of the node's average memory utilization in the past five minutes on its score. |
| Maximum memory utilization weight in an hour | The greater the weight, the bigger the impact of the node's maximum memory utilization in the past hour on its score. |
| Maximum memory utilization weight in a day | The greater the weight, the bigger the impact of the node's maximum memory utilization in the past day on its score. |

Directions

Deploying dependencies

The Dynamic Scheduler makes scheduling decisions based on the actual node loads in the current and past periods of time, which it obtains through the Prometheus add-on. Before using the Dynamic Scheduler, you therefore need to deploy the Prometheus add-on. In TKE, you can use either a self-built Prometheus monitoring service or the managed Prometheus monitoring service (TMP); both options are described below.

Self-built Prometheus monitoring service

Deploying the Node Exporter and Prometheus

Deploy the Node Exporter and Prometheus as needed, so that node metrics can be collected through the Node Exporter.

Configuring aggregation rules

After the Node Exporter exposes the node monitoring data, Prometheus needs to aggregate the raw Node Exporter metrics. To produce the metrics required by the Dynamic Scheduler, namely cpu_usage_avg_5m, cpu_usage_max_avg_1h, cpu_usage_max_avg_1d, mem_usage_avg_5m, mem_usage_max_avg_1h, and mem_usage_max_avg_1d, configure the following rules in Prometheus:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-record
spec:
  groups:
  - name: cpu_mem_usage_active
    interval: 30s
    rules:
    - record: cpu_usage_active
      expr: 100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[30s])) * 100)
    - record: mem_usage_active
      expr: 100*(1-node_memory_MemAvailable_bytes/node_memory_MemTotal_bytes)
  - name: cpu-usage-5m
    interval: 5m
    rules:
    - record: cpu_usage_max_avg_1h
      expr: max_over_time(cpu_usage_avg_5m[1h])
    - record: cpu_usage_max_avg_1d
      expr: max_over_time(cpu_usage_avg_5m[1d])
  - name: cpu-usage-1m
    interval: 1m
    rules:
    - record: cpu_usage_avg_5m
      expr: avg_over_time(cpu_usage_active[5m])
  - name: mem-usage-5m
    interval: 5m
    rules:
    - record: mem_usage_max_avg_1h
      expr: max_over_time(mem_usage_avg_5m[1h])
    - record: mem_usage_max_avg_1d
      expr: max_over_time(mem_usage_avg_5m[1d])
  - name: mem-usage-1m
    interval: 1m
    rules:
    - record: mem_usage_avg_5m
      expr: avg_over_time(mem_usage_active[5m])
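Each record rule above creates a new queryable time series named by its record field. To verify that the rules have taken effect, query a metric such as cpu_usage_avg_5m or mem_usage_max_avg_1h in the Prometheus expression browser; per-node results indicate that the data the Dynamic Scheduler depends on is available.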

Configuring the Prometheus file

1. The above section defines the rules used to calculate the metrics required by the Dynamic Scheduler. You need to reference these rules from a standard Prometheus configuration file. Below is a sample:
global:
  evaluation_interval: 30s
  scrape_interval: 30s
  external_labels:
rule_files:
- /etc/prometheus/rules/*.yml    # /etc/prometheus/rules/*.yml is the path of the rule files defined above.
2. Copy the rules configuration to a file (such as dynamic-scheduler.yaml) and put the file under /etc/prometheus/rules/ in the above Prometheus container.
3. Reload the Prometheus server configuration (for example, by sending the process a SIGHUP signal, or an HTTP POST to the /-/reload endpoint if the --web.enable-lifecycle flag is enabled); Prometheus will then compute the metrics required by the Dynamic Scheduler.
Note:
In general, the above Prometheus configuration file and rule files are stored in a ConfigMap that is mounted into the Prometheus server's container, so you only need to modify the ConfigMap.
Prometheus monitoring service (TMP)

1. Log in to the TKE console and select Prometheus Monitoring on the left sidebar.
2. Create a Prometheus instance in the same VPC as the cluster and associate it with the cluster.


3. After the instance is associated with the cluster, you can see that the Node Exporter has been installed on each node of the cluster.



4. Set the Prometheus aggregation rules. The rule content is identical to the rules configured in the Self-built Prometheus monitoring service section above.

Installing the add-on

1. Log in to the TKE console and select Cluster on the left sidebar.
2. On the Cluster management page, click the ID of the target cluster to go to the cluster details page.
3. On the left sidebar, click Add-On Management.
4. On the Add-On List page, click Create. On the Create Add-on page, select DynamicScheduler (dynamic scheduler).
5. Click Parameter Configurations and enter the parameters required by the add-on as instructed in Add-On Parameter Description.
6. Click Done. Once installed, the Dynamic Scheduler runs normally without extra configuration.
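To confirm the installation, you can check that the node-annotator and dynamic-scheduler Deployments listed in Kubernetes objects deployed in a cluster are running in the kube-system namespace of the cluster.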
