Principles for Adding Preset Resources When Scale-out Rules Are Executed
Each cluster can be configured with up to 10 scaling specifications. When a scale-out rule is triggered, the scale-out is executed according to the priority of the specifications. If a higher-priority specification does not have enough resources, lower-priority specifications are combined with it to make up the required compute resources (pay-as-you-go and spot instances follow the same order); see the sketch after the examples below.
When resources are sufficient: 1 > 2 > 3 > 4 > 5
Example:
When 5 specifications are preset and resources are sufficient, if the scale-out rule triggers a scale-out of 10 nodes, all 10 nodes are scaled out from specification 1, and the other preset specifications are not selected.
When resources are insufficient: 1+2 > 1+2+3 > 1+2+3+4 > 1+2+3+4+5
Example:
When preset specification 1 has 8 nodes, specification 2 has 4 nodes, and specification 3 has 3 nodes, if the scale-out rule triggers a scale-out of 13 nodes, then 8 nodes are scaled out from specification 1, 4 nodes from specification 2, and 1 node from specification 3, in that order.
When the resource specification is out of stock, assuming specification 2 is unavailable: 1+3 > 1+3+4 > 1+3+4+5.
Example:
When preset specification 1 has 8 nodes, specification 2 is unavailable, and specification 3 has 3 nodes, if the scale-out rule triggers a scale-out of 10 nodes, then 8 nodes are scaled out from specification 1, specification 2 is skipped, and 2 nodes are scaled out from specification 3, in that order.
When preset specification 1 has 8 nodes and all other preset specifications are unavailable, if the scale-out rule triggers a scale-out of 10 nodes, only the 8 nodes from specification 1 are scaled out, and the scale-out is partially successful.
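The allocation behavior above amounts to a greedy pass over the specifications in priority order. The following Python sketch is illustrative only; the function name and data shapes are assumptions, not the service's implementation:

```python
# Greedy allocation across preset specifications in priority order;
# lower priorities are used only when higher priorities run out of stock.
def allocate(target: int, available: dict[int, int]) -> dict[int, int]:
    """available maps specification priority (1 = highest) to nodes in stock."""
    plan: dict[int, int] = {}
    remaining = target
    for priority in sorted(available):
        take = min(remaining, available[priority])
        if take > 0:
            plan[priority] = take
            remaining -= take
        if remaining == 0:
            break
    return plan  # if remaining > 0, the scale-out is only partially successful

# The "resources insufficient" example above: 13 nodes against stocks of 8/4/3.
print(allocate(13, {1: 8, 2: 4, 3: 3}))  # {1: 8, 2: 4, 3: 1}
```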
Scale-out methods: You can scale out by nodes, memory, or cores. All three methods accept only non-zero integer values. When you scale out by cores or memory, the system converts the requested amount into a node count, rounding up so that the added capacity is at least the requested amount; see the sketch after the examples below.
Example:
When you scale out by cores, if the scale-out rule is set to 10 cores but the specification priority is for 8-core nodes, the rule will trigger the scale-out of two 8-core nodes.
When you scale out by memory, if the scale-out rule is set to 20 GB but the specification priority is for 16 GB nodes, the rule will trigger the scale-out of two 16 GB nodes.
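The conversion in both examples is a ceiling division: the requested cores or memory are divided by the per-node amount of the selected specification and rounded up. A minimal sketch, assuming the rounding described above:

```python
import math

def nodes_needed(requested: int, per_node: int) -> int:
    """Round up so the added capacity is at least the requested amount."""
    return math.ceil(requested / per_node)

print(nodes_needed(10, 8))   # 2 nodes for a 10-core target on 8-core nodes
print(nodes_needed(20, 16))  # 2 nodes for a 20 GB target on 16 GB nodes
```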
Principles for Scaling out Elastic Nodes When “Achieve Targets as Possible” Is Enabled
During peak periods, resource contention may cause the actual number of scaled-out nodes to fall short of the elastic target quantity. When the “Achieve Targets as Possible” policy is enabled and the configured scaling specifications have sufficient resources, the system automatically retries the resource requests until the target quantity is reached or approached.
Note:
The maximum retry count is 5. Once this limit is reached, no further retries are made, regardless of whether the scale-out has reached the target quantity.
The priority of the preset resource specifications is 1 > 2 > 3 > 4 > 5. Assume that preset specification 1 has 8 nodes, specification 2 has 2 nodes, specification 3 has 6 nodes, and specification 4 has 6 nodes.
Example:
Achieve Targets as Possible is not enabled: If the scale-out rule triggers a scale-out of 15 nodes, the plan is 8 nodes from specification 1, 2 nodes from specification 2, and 5 nodes from specification 3, in that order. Due to resource contention, the actual allocation is 8 nodes from specification 1, 0 nodes from specification 2, and 5 nodes from specification 3. In total, 13 nodes are scaled out, so the scale-out is partially successful.
Achieve Targets as Possible is enabled: With the same rule and the same contention, the initial allocation is again 8 nodes from specification 1, 0 nodes from specification 2, and 5 nodes from specification 3, for a total of 13 nodes. A supplementary retry is then triggered, polling specifications 1, 2, 3, 4, and 5 in order, and allocates 1 more node from specification 2 and 1 more node from specification 3. Across the two rounds, the allocation is 8 nodes from specification 1, 1 node from specification 2, and 6 nodes from specification 3, for a total of 15 nodes, so the scale-out is successful.
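The retry behavior can be sketched as a bounded loop that polls the specifications in priority order after the initial attempt. Everything here (the function name and the callback shape) is a hypothetical illustration of the description above, not the service's API:

```python
MAX_RETRIES = 5  # after 5 retries, no further attempts are made

def scale_out_with_retries(target: int, request_nodes) -> int:
    """request_nodes(priority, count) returns the nodes actually granted (<= count)."""
    acquired = 0
    for _attempt in range(1 + MAX_RETRIES):  # initial attempt plus up to 5 retries
        for priority in (1, 2, 3, 4, 5):
            if acquired >= target:
                return acquired              # target achieved
            acquired += request_nodes(priority, target - acquired)
    return acquired                          # may stop short of the target
```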
Principles for Scaling in Elastic Nodes When Scale-in Rules Are Executed
Elastic nodes added through the auto-scaling feature are scaled in idle nodes first, in reverse order of creation time (newest first). If the required scale-in number is not met, nodes that are running containers are then selected. For load-based scale-in, nodes running the service that reports the load metric are prioritized; within that group, idle nodes are scaled in first under the same reverse-creation-time principle, and nodes running containers are selected only if the required number is still not met (see the sketch after the examples below). Non-elastic nodes are not affected by scale-in rules and never trigger scale-in actions; they support only manual scale-in.
Note
Scheduled termination of nodes is not constrained by the reverse-creation-time principle or by the minimum number of cluster nodes. The scale-in is executed once the set time is reached, with a default graceful scale-in period of 30 minutes.
A node is considered idle if no containers have run on it within the last 5 minutes.
Scale in based on load, assuming the node creation time is from earliest to latest: A > B > C > D > E.
Example:
When you set YARN load metric scale-in for 5 nodes, with C, D, and E deployed with YARN components and D and E running containers, the scale-in order when the rule is triggered will be C > E > D > B > A.
When you set Trino load metric scale-in for 5 nodes, with C, D, and E deployed with Trino components and D and B running containers, the scale-in order when the rule is triggered will be E > C > D > A > B.
Scale in based on time, assuming the node creation time is from earliest to latest: A > B > C > D > E.
Example:
When you set node-based scale-in for 5 nodes, with D and E running containers, the scale-in order when the rule is triggered will be C > B > A > E > D.
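The orderings in these examples can be reproduced by a single sort key: nodes running the metric's component come first (for load-based scale-in), idle nodes precede busy ones, and ties break newest-first. The sketch below is inferred from the examples; the class and field names are assumptions:

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    created_at: int          # larger = newer
    has_component: bool      # runs the component that reports the load metric
    running_containers: bool

def scale_in_order(nodes: list[Node]) -> list[str]:
    def key(n: Node):
        return (
            not n.has_component,   # component nodes first (load-based scale-in)
            n.running_containers,  # idle nodes before busy ones
            -n.created_at,         # newest first within each group
        )
    return [n.name for n in sorted(nodes, key=key)]

# The YARN example above: C, D, E run YARN; D and E run containers.
nodes = [
    Node("A", 1, False, False), Node("B", 2, False, False),
    Node("C", 3, True, False),  Node("D", 4, True, True), Node("E", 5, True, True),
]
print(scale_in_order(nodes))  # ['C', 'E', 'D', 'B', 'A']
```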
Scale-in methods: You can scale in by nodes, memory, or cores. All three methods accept only non-zero integer values. When you scale in by cores or memory, the system protects business continuity by scaling in only as many nodes as fit within the requested amount: idle nodes are removed newest first, stopping before the requested cores or memory would be exceeded. At least one node is scaled in when the newest node fits within the requested amount; if even that node exceeds it, the rule is not executed (see the third example and the sketch below).
Example:
When you scale in by cores with the rule set to 20 cores: if the cluster's elastic nodes are three 8-core 16 GB nodes and two 4-core 8 GB nodes (listed newest to oldest), the system scales in the two newest 8-core 16 GB nodes (16 cores; removing a third would exceed 20 cores).
When you scale in by memory with the rule set to 30 GB: with the same elastic nodes (three 8-core 16 GB nodes and two 4-core 8 GB nodes, newest to oldest), the system scales in one 8-core 16 GB node (16 GB; removing a second would exceed 30 GB).
When you scale in by cores with the rule set to 20 cores: if the cluster's elastic nodes are three 32-core 64 GB nodes and two 64-core 128 GB nodes (newest to oldest), even a single node exceeds 20 cores, so no elastic node meets the condition and the scale-in rule is not executed.
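All three examples are consistent with a simple prefix rule: walk the elastic nodes newest-first and keep taking nodes while the running total stays within the requested cores or memory. A sketch of that inferred behavior (names are illustrative, not the service's implementation):

```python
def select_scale_in(target: int, capacities: list[int]) -> list[int]:
    """capacities lists per-node cores (or GB of memory), newest node first."""
    selected, total = [], 0
    for capacity in capacities:
        if total + capacity > target:
            break                  # taking this node would exceed the request
        selected.append(capacity)
        total += capacity
    return selected

print(select_scale_in(20, [8, 8, 8, 4, 4]))       # [8, 8]  -> two 8-core nodes
print(select_scale_in(30, [16, 16, 16, 8, 8]))    # [16]    -> one 16 GB node
print(select_scale_in(20, [32, 32, 32, 64, 64]))  # []      -> rule not executed
```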
Principles for Triggering and Executing Scaling Rules
Elastic scaling rules can be based on time, on load metrics, or on both. Rules follow the first-triggered, first-executed principle; if multiple rules are triggered simultaneously, they are executed in priority order. The rule status indicates whether a rule is active: rules are enabled by default, and you can set the status to disabled to keep a rule's configuration without executing it.
Scaling based on load only:
1. Rules follow the first-triggered, first-executed principle; if multiple rules are triggered simultaneously, they are executed in priority order, such as 1 > 2 > 3 > 4 > 5.
2. A single load-based scaling rule can include multiple metrics. You can choose to trigger the rule either when all metrics meet their conditions or when any metric meets its condition.
3. Load-based scaling can be set to monitor cluster load changes within a specific time period.
Scaling based on time only:
1. Rules follow the first-triggered, first-executed principle; if multiple rules are triggered simultaneously, they are executed in priority order, such as 1 > 2 > 3 > 4 > 5.
2. A rule can be set to execute repeatedly. Once the rule expires, it becomes inactive; alarms are sent before expiration (see the alarm configuration).
Scaling based on both load and time:
Rules follow the first-triggered, first-executed principle; if multiple rules are triggered simultaneously, they are executed in priority order, such as 1 > 2 > 3 > 4 > 5 (see the sketch below).
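A minimal sketch of the stated execution order, assuming each triggered rule carries a trigger time and a priority (lower number = higher priority); the tuple sort below is illustrative, not the scheduler's implementation:

```python
def execution_order(triggered_rules):
    """triggered_rules: iterable of (trigger_time, priority, rule_name) tuples."""
    return [name for _time, _prio, name in sorted(triggered_rules)]

rules = [
    (100, 2, "scale-out-by-load"),  # same trigger time as the next rule
    (100, 1, "scale-out-by-time"),  # same time, higher priority -> runs first
    (90, 3, "scale-in-nightly"),    # triggered earlier -> runs before both
]
print(execution_order(rules))
# ['scale-in-nightly', 'scale-out-by-time', 'scale-out-by-load']
```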
Corresponding Relationships of Queue Load Metrics
Load-based scaling supports the following cluster types, components, and version information:
Hadoop clusters: YARN component and Trino component (EMR-V2.7.0, EMR-V3.40 and later versions).
Ray cluster: Uniffle component.
StarRocks cluster: StarRocks component.
| Component | Metric | Scope | Identifier | Description |
| --- | --- | --- | --- | --- |
| YARN | AvailableVCores | root | AvailableVCores#root | Number of available virtual cores in the root queue |
| | | root.default | AvailableVCores#root.default | Number of available virtual cores in the root.default queue |
| | | Custom sub-queue | For example: AvailableVCores#root.test | Number of available virtual cores in the root.test queue |
| | PendingVCores | root | PendingVCores#root | Number of virtual cores needed for upcoming tasks in the root queue |
| | | root.default | PendingVCores#root.default | Number of virtual cores needed for upcoming tasks in the root.default queue |
| | | Custom sub-queue | For example: PendingVCores#root.test | Number of virtual cores needed for upcoming tasks in the root.test queue |
| | AvailableMB | root | AvailableMB#root | Available memory in the root queue (MB) |
| | | root.default | AvailableMB#root.default | Available memory in the root.default queue (MB) |
| | | Custom sub-queue | For example: AvailableMB#root.test | Available memory in the root.test queue (MB) |
| | PendingMB | root | PendingMB#root | Memory needed for upcoming tasks in the root queue (MB) |
| | | root.default | PendingMB#root.default | Memory needed for upcoming tasks in the root.default queue (MB) |
| | | Custom sub-queue | For example: PendingMB#root.test | Memory needed for upcoming tasks in the root.test queue (MB) |
| | AvailableMemPercentage | Cluster | AvailableMemPercentage | Percentage of available memory in the cluster |
| | ContainerPendingRatio | Cluster | ContainerPendingRatio | Ratio of pending containers to allocated containers |
| | AppsRunning | root | AppsRunning#root | Number of running tasks in the root queue |
| | | root.default | AppsRunning#root.default | Number of running tasks in the root.default queue |
| | | Custom sub-queue | For example: AppsRunning#root.test | Number of running tasks in the root.test queue |
| | AppsPending | root | AppsPending#root | Number of pending tasks in the root queue |
| | | root.default | AppsPending#root.default | Number of pending tasks in the root.default queue |
| | | Custom sub-queue | For example: AppsPending#root.test | Number of pending tasks in the root.test queue |
| | PendingContainers | root | PendingContainers#root | Number of pending containers in the root queue |
| | | root.default | PendingContainers#root.default | Number of pending containers in the root.default queue |
| | | Custom sub-queue | For example: PendingContainers#root.test | Number of pending containers in the root.test queue |
| | AllocatedMB | root | AllocatedMB#root | Allocated memory in the root queue (MB) |
| | | root.default | AllocatedMB#root.default | Allocated memory in the root.default queue (MB) |
| | | Custom sub-queue | For example: AllocatedMB#root.test | Allocated memory in the root.test queue (MB) |
| | AllocatedVCores | root | AllocatedVCores#root | Number of virtual cores allocated in the root queue |
| | | root.default | AllocatedVCores#root.default | Number of virtual cores allocated in the root.default queue |
| | | Custom sub-queue | For example: AllocatedVCores#root.test | Number of virtual cores allocated in the root.test queue |
| | ReservedVCores | root | ReservedVCores#root | Number of virtual cores reserved in the root queue |
| | | root.default | ReservedVCores#root.default | Number of virtual cores reserved in the root.default queue |
| | | Custom sub-queue | For example: ReservedVCores#root.test | Number of virtual cores reserved in the root.test queue |
| | AllocatedContainers | root | AllocatedContainers#root | Number of containers allocated in the root queue |
| | | root.default | AllocatedContainers#root.default | Number of containers allocated in the root.default queue |
| | | Custom sub-queue | For example: AllocatedContainers#root.test | Number of containers allocated in the root.test queue |
| | ReservedMB | root | ReservedMB#root | Memory reserved in the root queue (MB) |
| | | root.default | ReservedMB#root.default | Memory reserved in the root.default queue (MB) |
| | | Custom sub-queue | For example: ReservedMB#root.test | Memory reserved in the root.test queue (MB) |
| | AppsKilled | root | AppsKilled#root | Number of terminated tasks in the root queue |
| | | root.default | AppsKilled#root.default | Number of terminated tasks in the root.default queue |
| | | Custom sub-queue | For example: AppsKilled#root.test | Number of terminated tasks in the root.test queue |
| | AppsFailed | root | AppsFailed#root | Number of failed tasks in the root queue |
| | | root.default | AppsFailed#root.default | Number of failed tasks in the root.default queue |
| | | Custom sub-queue | For example: AppsFailed#root.test | Number of failed tasks in the root.test queue |
| | AppsCompleted | root | AppsCompleted#root | Number of completed tasks in the root queue |
| | | root.default | AppsCompleted#root.default | Number of completed tasks in the root.default queue |
| | | Custom sub-queue | For example: AppsCompleted#root.test | Number of completed tasks in the root.test queue |
| | AppsSubmitted | root | AppsSubmitted#root | Number of tasks submitted to the root queue |
| | | root.default | AppsSubmitted#root.default | Number of tasks submitted to the root.default queue |
| | | Custom sub-queue | For example: AppsSubmitted#root.test | Number of tasks submitted to the root.test queue |
| | AvailableVCoresPercentage | Cluster | AvailableVCoresPercentage | Percentage of available virtual cores in the cluster |
| | MemPendingRatio | root | MemPendingRatio#root | Percentage of pending memory in the root queue |
| | | root.default | MemPendingRatio#root.default | Percentage of pending memory in the root.default queue |
| | | Custom sub-queue | For example: MemPendingRatio#root.test | Percentage of pending memory in the root.test queue |
| | usageRatioVCores | root | usageRatioVCores#root | CPU resource usage rate of the root queue |
| | | root.default | usageRatioVCores#root.default | CPU resource usage rate of the root.default queue |
| | | Custom sub-queue | For example: usageRatioVCores#root.test | CPU resource usage rate of the root.test queue |
| | usageRatioMem | root | usageRatioMem#root | Memory resource usage rate of the root queue |
| | | root.default | usageRatioMem#root.default | Memory resource usage rate of the root.default queue |
| | | Custom sub-queue | For example: usageRatioMem#root.test | Memory resource usage rate of the root.test queue |
| Trino | FreeDistributed | Cluster | FreeDistributed | Available distributed memory in the cluster |
| | QueuedQueries | Cluster | QueuedQueries | Total number of queries waiting to be executed in the queue |
| StarRocks | CNUsedVCores | Cluster | CNUsedVCores | Number of virtual cores used by all CN nodes in the cluster |
| | | Compute group | | Number of virtual cores used by all CN nodes in the specified compute group |
| | CNUsedMB | Cluster | CNUsedMB | Used memory on all CN nodes in the cluster (MB) |
| | | Compute group | | Used memory on all CN nodes in the specified compute group (MB) |
| | CNUsedVCoresPercentage | Cluster | CNUsedVCoresPercentage | CPU utilization of all CN nodes in the cluster |
| | | Compute group | | CPU utilization of all CN nodes in the specified compute group |
| | CNUsedMBPercentage | Cluster | CNUsedMBPercentage | Memory usage of all CN nodes in the cluster |
| | | Compute group | | Memory usage of all CN nodes in the specified compute group |
| Uniffle | DataDiskUsedPercentage | Cluster | DataDiskUsedPercentage | Average usage rate of shuffle server data disk space |
| | DataDiskIOPercentage | Cluster | DataDiskIOPercentage | Average I/O utilization of shuffle server data disks |
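As the table shows, queue-scoped YARN metrics are written as `<MetricName>#<queue>`, while cluster-scoped metrics use the bare metric name. A hypothetical helper illustrating the naming pattern (the helper itself is an assumption, not part of the service's API):

```python
def metric_id(metric: str, queue: str | None = None) -> str:
    """Compose a load-metric identifier in the MetricName#queue format."""
    return f"{metric}#{queue}" if queue else metric

print(metric_id("AvailableVCores", "root.default"))  # AvailableVCores#root.default
print(metric_id("ContainerPendingRatio"))            # ContainerPendingRatio
```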