Information | Description | |
Basic Info | Rule Name | Alarm rule name, 1-128 characters. Only Chinese characters, English letters, digits, and underscores are allowed. |
| Rule description | Alarm rule description, optional, no more than 500 characters. |
Monitoring Object | | Select the object that the alarm rule monitors. Tasks and projects are currently supported. When a task is the monitored object, configure alarm rules in one of three ways: Configure by task: Click Add, then in the pop-up dialog box select the submitted operational computing task nodes in the orchestration space to configure alarm rules for them. Configure by workflow: Click Add, then in the pop-up dialog box select the submitted operational workflows in the orchestration space to configure alarm rules for the computing tasks within those workflows. An allow list is available: computing tasks in the workflow that are added to the allow list are not monitored by the alarm rule. Configure by project: Click Add, then in the pop-up dialog box select the submitted operational computing tasks in the orchestration space of the current project to configure alarm rules for them. An allow list is available: computing tasks added to the allow list are not monitored by the alarm rule. When a project is the monitored object, configure alarm rules as follows: Configure by project: Click Add, then in the pop-up dialog box select the projects that the current user has joined (as a non-visitor) to configure alarm rules. Multiple projects can be selected for monitoring; the subsequent alarm conditions are then evaluated against the sum of the corresponding monitored values of the selected projects. |
Alarm condition (monitored object: project) | Failed instance count upward fluctuation rate (compared to the 7-day average) | The fluctuation rate and the cumulative failed instance count for the current day are used as conditions. Whether an alarm is triggered depends on how many times the conditions are met, consecutively or cumulatively, within the specified number of periods. The fluctuation rate is calculated as: (number of failed instances on the current day as of the current time - average number of failed instances at the same time of day over the past 7 days) / average number of failed instances at the same time of day over the past 7 days. The alarm condition is met when both of the following hold: 1. The value is positive. 2. The value exceeds the set threshold. |
| Failed instance count downward fluctuation rate (compared to the 7-day average) | The fluctuation rate and the cumulative failed instance count for the current day are used as conditions. Whether an alarm is triggered depends on how many times the conditions are met, consecutively or cumulatively, within the specified number of periods. The fluctuation rate is calculated as: (number of failed instances on the current day as of the current time - average number of failed instances at the same time of day over the past 7 days) / average number of failed instances at the same time of day over the past 7 days. The alarm condition is met when both of the following hold: 1. The value is negative. 2. The absolute value exceeds the set threshold. |
Alarm condition (monitored object: offline task) | Execution Failed | An alarm is triggered when an instance of the monitored task fails to run. The rule can be configured for periodic execution, data replenishment, or rerun execution. The trigger condition can be either "All retries still fail after completion" or "The First Execution Failed". All retries still fail after completion: According to the computing task's scheduling strategy, if failure retries are configured, the alarm rule is triggered when all retries have failed and the instance fails. The First Execution Failed: According to the computing task's scheduling strategy, the alarm rule is triggered when the instance generated by the first run fails. |
| Running Timeout | An alarm is triggered when the scheduling or running of a monitored task's instance exceeds the preset time. The rule can be configured for periodic execution, data replenishment, or rerun execution. Trigger conditions can be set on four time metrics: "task running time", "task completion time", "total task waiting time", and "task not completed within the period". Task running time: With task running time (periodic), timing starts from the start time of the task's cycle. With task running time (rerun, replenishment), timing starts from the start time of the replenishment or rerun. In both cases, an alarm is triggered if the threshold is exceeded and the instance has not completed. A "fixed value" or "historical average" can be used as the alarm condition. Fixed value: The alarm rule is triggered if the instance does not complete within the specified duration (hours and minutes). Historical average: Take the instance running times of the computing task's 10 most recent successful executions, remove the maximum and minimum values, and average the rest. If there are fewer than 10 executions, the setting is invalid. Task completion time (periodic): Timing starts from the start time of the task instance run. An alarm is triggered if the instance has not completed by the specified time point. A "fixed value" or "historical average" can be used as the alarm condition. Fixed value: The alarm rule is triggered if the instance has not completed by the specified time point (hour and minute). Historical average: Take the instance running times of the 10 most recent successful executions, remove the maximum and minimum values, and average the rest. If there are fewer than 10 executions, the setting is invalid. Total task waiting time: With total task waiting time (periodic), the waiting time is the interval from the task's scheduled time to the time it starts running. With total task waiting time (rerun, replenishment), it is the interval from the submission time to the time the rerun or replenishment instance starts running. In both cases, an alarm is triggered if the threshold is exceeded and the instance has not started running. A "fixed value" or "historical average" can be used as the alarm condition. Fixed value: The alarm rule is triggered if the instance does not start running within the specified duration (HH:MM). Historical average: Take the waiting times of the 10 most recent successful executions, remove the maximum and minimum values, and average the rest. If there are fewer than 10 executions, the setting is invalid. Task not completed within the period (periodic): An alarm is triggered when a task instance does not complete within its current run period. Period = interval × period unit. For example: Minute-based task: With a 15-minute interval, the period is 15 minutes; an alarm is triggered if the task runs for more than 15 minutes without completing. Hourly task: If the specified hour or interval is 1, the period is 1 hour; if the interval is 2, the period is 2 hours, and so on. Daily, weekly, monthly, and yearly tasks: The period is 1 day. |
| Run Successful | An alarm is triggered when the monitored task's instance runs successfully. Configuration can be made for periodic execution, data replenishment, or rerun execution. |
Alarm condition (monitored object: real-time task) | Data ingestion/egress interval | Real-time task alarms are supported only on the latest version of the standard Spark engine. To upgrade the engine, submit a DLC ticket. When the data inflow/outflow of a real-time task (DLC Spark Streaming) stays at 0 throughout the preset time, the system triggers an alarm. The minimum interval between alarm notifications follows the configured alarm trigger frequency, which can be set from 5 to 1440 minutes. |
| Number of retries | Real-time task alarms are supported only on the latest version of the standard Spark engine. To upgrade the engine, submit a DLC ticket. When the number of retries of a real-time task (DLC Spark Streaming) exceeds the preset retry count within the preset time, the system triggers an alarm. The minimum interval between alarm notifications follows the configured alarm trigger frequency, which can be set from 5 to 1440 minutes. |
Alarm Notification | Alarm Severity | Alarm message content is differentiated by alarm severity. The available severities are ordinary, important, and critical. |
| Alert Method | The channels through which alarm information is sent after an alarm rule is triggered. Supported push methods currently include mail, SMS, WeChat, call, WeCom, HTTP, WeCom group, Lark Group, DingTalk Group, Slack Group, and Teams Group. Mobile phone, WeChat, and mail accounts can be configured in the Tencent Cloud Personal Center > Cloud Access Management > User module, while WeCom accounts are configured in the Tencent Cloud Personal Center > Cloud Access Management > Associated Account module. Only one of WeCom group, Lark Group, DingTalk Group, Slack Group, or Teams Group can be selected. |
| Alarm Recipient | After an alarm rule is triggered, alarm messages are sent to the recipients. Three methods of setting recipients are currently supported: "specified personnel", "task owner", and "duty schedule". Specified personnel: Specify one or more users as alarm message recipients. Task owner: Set the owner of the computing task as the alarm message recipient. Duty schedule: Set a configured duty schedule as the recipient so that alarm messages are sent to the users on duty. Once a user in the duty schedule clicks to confirm an alarm message, other users in the same duty schedule no longer receive messages for that alarm. |
| Alarm escalation recipient | When the alarm recipient is set to specified personnel or task owner, you can add alarm escalation recipients. Escalation recipients are selected from project members through a drop-down list, and up to 5 can be added. If the alarm recipient or the previous-level escalation recipient does not confirm the alarm in the alarm information of the Ops center within the alarm interval, the system sends the alarm to the next-level escalation recipient. The escalation hierarchy follows the order in which the recipients are configured. |
| Notification Frequency | Defines the number of times an alarm is sent and the interval between messages. If at least one alarm escalation recipient is configured, the notification frequency cannot be configured and only the alarm interval is displayed. |
| Do Not Disturb | Sets a do not disturb period during which alarms are not sent. Users can still view the alarm records in the alarm information. Do Not Disturb can be configured by week and time, and multiple do not disturb periods are supported. |
| Add alarm reception configuration | Multiple groups of alarm reception configurations can be added so that different recipients use different alert methods. Click Add alarm reception configuration at the bottom of the alarm notification configuration to add a new group. A single rule supports up to 10 groups of alarm reception configurations. |
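
The fluctuation rate used by the project-level alarm conditions above can be sketched in a few lines of Python. This is an illustrative sketch only, not the product's implementation: the function names are hypothetical, the threshold is assumed to be a fraction (0.5 meaning 50%), and returning `None` when the 7-day average is 0 is an assumption, since the table does not say how that case is handled.

```python
from typing import Optional, Sequence


def fluctuation_rate(current_failed: float,
                     past_failed: Sequence[float]) -> Optional[float]:
    """(current count - 7-day average) / 7-day average.

    past_failed holds the failed instance counts at the same time of day
    on each of the past 7 days.
    """
    average = sum(past_failed) / len(past_failed)
    if average == 0:
        # Assumption: the rate is undefined when the average is 0.
        return None
    return (current_failed - average) / average


def meets_upward(rate: Optional[float], threshold: float) -> bool:
    """Upward condition: the value is positive and exceeds the threshold."""
    return rate is not None and rate > 0 and rate > threshold


def meets_downward(rate: Optional[float], threshold: float) -> bool:
    """Downward condition: the value is negative and its absolute value
    exceeds the threshold."""
    return rate is not None and rate < 0 and abs(rate) > threshold


rate = fluctuation_rate(15, [10, 10, 10, 10, 10, 10, 10])
print(rate, meets_upward(rate, 0.3), meets_downward(rate, 0.3))
```

The sketch covers a single evaluation only. Counting how often the condition is met, consecutively or cumulatively, across the configured number of periods would sit on top of it.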

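The "historical average" option of the Running Timeout condition (10 most recent successful executions, maximum and minimum removed, remainder averaged) can be illustrated with the following sketch. The function name and input layout are hypothetical: the list is assumed to contain only successful executions, ordered oldest to newest, and `None` stands in for "the setting is invalid".

```python
from typing import Optional, Sequence


def historical_average(durations: Sequence[float]) -> Optional[float]:
    """Trimmed average over the 10 most recent successful executions.

    Returns None when fewer than 10 executions are available, which
    corresponds to the setting being invalid.
    """
    recent = list(durations)[-10:]  # 10 most recent entries
    if len(recent) < 10:
        return None
    trimmed = sorted(recent)[1:-1]  # drop one minimum and one maximum
    return sum(trimmed) / len(trimmed)


# Durations in minutes; only the last 10 values are considered.
print(historical_average([100, 200, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))
print(historical_average([5, 6, 7]))
```

The same calculation applies whether the values are running times or waiting times; only the measured quantity differs.
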
Information | Description |
Rule Name | Display the alarm rule name and ID. |
Monitoring Object | Display the tasks, workflows, and projects to which the alarm rule applies. The computing tasks covered by the rule can be viewed under its monitored objects. |
Alarm Type | Display the monitoring type of alarm rules: fail, timeout, success. |
Alarm Severity | Display the alarm level of alarm rules: ordinary, important, critical. |
Alarm On-Off | Display the current running state of the alarm rule, which can be manually enabled or disabled. When disabled, the alarm rule does not take effect and no alarm information is generated. |
Alert Method | Display the sending channel for alarm information of alarm rules. |
Alarm Recipient | Display the alarm message recipients and alarm escalation recipients configured in the alarm rule. |
Created by | Display the creator of the current alarm rule. |

Information | Description |
Rule Details | View rule details to check parameters configured for the alarm rule, including rule name, monitored object, monitoring task, alarm condition, and alarm notification. |
Alert Information | Go to the list of alarm information generated by this rule to view the details of each time the rule was triggered. |
Delete | Delete this alarm rule. |


Information | Description |
Alarm time | Display the alarm information generation time. |
Alarm Entity | Display the task instance name and instance ID that triggered the alarm. Click the instance name to navigate to the corresponding instance management page. |
Alarm Cause | Display the triggering cause of the current alarm. |
Alarm Severity | Display the alarm level of alarm information: ordinary, important, critical. |
Rule Name | Display the alarm rule that triggered this item of alarm information. Click the rule name to navigate to the corresponding alarm rule management page. |
Alert Method | Display the notification channel for alarm information. |
Alarm Recipient | Display the alarm message recipient. |


Information | Description |
Alarm object | Display the task instance name and instance ID that triggered the alarm. Task name: Display the name of the computing task that triggered the alarm. Click the task name to go to the page where the instance is located. Instance ID: Display the ID of the instance that triggered the alarm. Click View Logs to go to the log information page of the corresponding instance. |
Alarm Cause | Display the cause of the current alarm, derived from the trigger condition configured in the alarm rule. For example, if the alarm condition is set to execution timeout > estimated completion time, the alarm cause displayed after the rule is triggered is "expected completion time timeout". |
Send Status | Display the send time, recipient, and notification channel of the current alarm. The channel status shows whether the message was sent successfully on each channel. Send time: The time when alarm messages were sent to recipients after the rule was triggered. Recipient: The alarm message recipient. Notification channel: Different icons indicate the message sending status of each channel. |

Operation | Description (Optional) |
Alarm time | Filter by time range: today, yesterday, last 7 days, last 30 days, all, or a custom date range. |
Alarm Cause | Filter alarms by cause: failed, successful, timed out, or instance count fluctuation. |
Task name/ID | Filter by task name/ID. Separate multiple task names/IDs with |. |
Rule name/ID | Filter by rule name/ID. Separate multiple rule names/IDs with |. |