Alarm management is the mechanism that automatically checks, according to preset alarm policies, whether monitored metrics have reached their abnormal thresholds, and pushes alarm notifications to the designated personnel in the specified manner. It enables real-time monitoring of the system operating status and rapid response to abnormal situations, which improves Ops efficiency, reduces manual monitoring costs, and allows potential issues to be detected and addressed promptly.
Function Entries
Alarm management provides two access levels: enterprise-level and workspace-level. Enterprise-level alarm management allows configuration of all alarm policies under the current enterprise, while workspace-level alarm management allows configuration of alarm policies within a specific workspace.
1. Enterprise-Level Alarm Management
1.2 Select Ops Management to go to the enterprise-level alarm management page and configure enterprise-level alarm policies.
Note:
Only space administrators can access the Enterprise Management > Ops Management module.
2. Workspace-Level Alarm Management
2.1 After you go to the specific workspace, locate Ops Management in the left sidebar.
2.2 Click Alarm Management to go to the alarm management page for the current workspace.
Note:
1. To ensure data security, only users with the corresponding feature permissions can view alarm policies.
2. To configure the space-level alarm management feature permissions, go to "Platform-side user permissions".
Creating an Alarm Policy
Click Create Alarm Policy to go to the configuration page. Complete the settings in the three sections: Basic Information, Trigger Conditions, and Alarm Notifications. After configuration, click Confirm to create the policy.
Basic Information Configuration
Configuration Item | Description |
Alarm Policy Name | A custom policy name used to distinguish between alarm rules. |
Alarm Severity | The severity level of the alarm. Supported levels are "Warning", "Severe", and "Critical". The default is "Warning". |
Effective Time | The time range during which the policy is in effect. Within this period, abnormal conditions are monitored and alarms are triggered. |
Trigger Condition Configuration
Trigger conditions are the core evaluation rules for alarms. A policy can contain a single condition or multiple conditions, up to a maximum of 10. Conditions are in an "OR" relationship, meaning the policy matches when any one condition is met. For any alarm policy, an alarm is triggered only when both the trigger conditions and the frequency are satisfied.
Configuration Item | Description |
Condition | The core metric to monitor. Set the "Monitoring Object", "Logical Condition", and "Threshold". The condition is met when the monitoring object satisfies the logical condition against the threshold. |
Frequency | Set the "Time Range", "Calculation Method", and "Count". The frequency is satisfied when the specified number of abnormal events occurs within the time range, counted either cumulatively or consecutively. |
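The evaluation rule described above can be sketched in Python as follows. This is a minimal illustration, not the platform's implementation: the class names, the metric keys (`qpm`, `success_rate`), the operator symbols, and the `"cumulative"` / `"consecutive"` labels are all assumptions made for the example. The `samples` argument stands for the metric readings that fall inside the configured time range.

```python
import operator
from dataclasses import dataclass
from typing import Dict, List, Sequence

# Assumed operator symbols for the "Logical Condition" setting.
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}

MAX_CONDITIONS = 10  # documented upper limit of conditions per policy


@dataclass
class Condition:
    monitoring_object: str  # hypothetical metric key, e.g. "qpm"
    logical_condition: str  # one of the keys in OPERATORS
    threshold: float

    def is_met(self, sample: Dict[str, float]) -> bool:
        value = sample.get(self.monitoring_object)
        if value is None:
            return False  # metric absent from this sample
        return OPERATORS[self.logical_condition](value, self.threshold)


@dataclass
class Frequency:
    calculation_method: str  # "cumulative" or "consecutive" (assumed labels)
    count: int


def should_trigger(
    conditions: Sequence[Condition],
    frequency: Frequency,
    samples: Sequence[Dict[str, float]],
) -> bool:
    """Return True when both the conditions and the frequency are satisfied."""
    if not 1 <= len(conditions) <= MAX_CONDITIONS:
        raise ValueError("a policy needs between 1 and 10 conditions")

    # "OR" relationship: a sample is abnormal when any condition is met.
    abnormal: List[bool] = [
        any(c.is_met(sample) for c in conditions) for sample in samples
    ]

    if frequency.calculation_method == "cumulative":
        return sum(abnormal) >= frequency.count
    if frequency.calculation_method == "consecutive":
        run = 0
        for flag in abnormal:
            run = run + 1 if flag else 0
            if run >= frequency.count:
                return True
        return False
    raise ValueError("unknown calculation method")
```

With three samples of which the first and third are abnormal, a cumulative count of 2 is satisfied, while a consecutive count of 2 is not, because the abnormal samples are not adjacent.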
Condition List
In the condition configuration, the following conditions can be configured:
|
Daily Token Usage of a Large Language Model | Yes | Yes |
Knowledge Base Remaining Capacity | No | Yes |
TPM of a Large Model | Yes | No |
QPM of an application | Yes | No |
Call success rate of an application | Yes | No |
Alarm Notification
Alarm notifications are the alert messages that the platform pushes to Ops personnel through the specified channels when it detects that a metric has reached the threshold preset in the alarm policy, so that abnormalities are reported promptly. Currently supported notification methods include email, WeCom, DingTalk, Lark, and Webhook.
Before configuring alarm notifications, users need to authorize the relevant products/services on Tencent Cloud.
Integrate Email
Users need to activate the Tencent Cloud SES feature and configure the sending domain, sending address, and sending template. For detailed procedures, see Email Configuration.
Configuration Item | Description |
Notification Templates | The email body template used for alarm notifications. |
Sender's email address | The email address from which alarm notifications are sent. |
Recipient's email address | The email addresses of notification targets. When multiple email addresses are added, separate them with commas or semicolons. |
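As an illustration of the separator rule for recipients, the following Python sketch splits a recipient string on commas or semicolons. The function name `split_recipients` is hypothetical, and the trimming of whitespace and empty entries is an assumption about reasonable handling, not documented platform behavior.

```python
import re
from typing import List


def split_recipients(raw: str) -> List[str]:
    """Split a recipient string on commas or semicolons.

    Surrounding whitespace is trimmed and empty entries are dropped
    (an assumption; the platform's own parsing is not documented here).
    """
    return [part.strip() for part in re.split(r"[,;]", raw) if part.strip()]
```

For example, `split_recipients("a@example.com, b@example.com;c@example.com")` yields three separate addresses.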
Integrate WeCom
Configuration Item | Description |
URL address | Required. The Webhook URL obtained after enabling WeCom group bot permissions. It must start with http:// or https://. |
Integrate DingTalk
Configuration Item | Description |
URL address | Required. The Webhook URL obtained after enabling DingTalk group bot permissions. It must start with http:// or https://. |
Integrate Lark
Alarm information can be sent to Lark via group bots. For the procedure to obtain Lark group Webhooks, see Get the Webhook for Lark.
Configuration Item | Description |
URL address | Required. The Webhook URL obtained after enabling Lark group bot permissions. It must start with http:// or https://. |
Connect Webhook
To set up a Webhook, the client must provide a unique URL to the server API and specify the events of interest. Users can freely customize the Webhook source.
Configuration Item | Description |
URL address | Required. The Webhook URL. It must start with http:// or https://. |
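The URL requirement shared by the WeCom, DingTalk, Lark, and Webhook channels can be checked before saving a policy. The sketch below is one possible client-side check using Python's standard library; the function name `is_valid_webhook_url` is hypothetical, and the additional requirement of a non-empty host is an assumption beyond the documented prefix rule.

```python
from urllib.parse import urlparse


def is_valid_webhook_url(url: str) -> bool:
    """Check that a URL uses the http or https scheme and names a host."""
    try:
        parsed = urlparse(url.strip())
    except ValueError:
        # urlparse raises ValueError for some malformed inputs.
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

A value such as `example.com/hook` fails this check because it lacks the scheme prefix.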
The alarm service sends fixed-format HTTP POST requests to the Webhook URL you provide. The specific format is as follows:
Text type request format:
{
    "msgtype": "text",
    "text": {
        "content": "Specific message content"
    }
}
Rich text type request format:
{
    "msgtype": "markdown",
    "markdown": {
        "content": "Specific message content"
    }
}
Parameter | Type | Required | Description |
msgtype | string | Yes | Message type. Supports text and markdown. |
text.content | string | No | Message content in text format. |
markdown.content | string | No | Message content in rich text format. |
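On the receiving side, a custom Webhook endpoint needs to accept these POST requests and read the fields listed above. The following Python sketch uses only the standard library. The names `extract_alarm_message`, `AlarmHandler`, and `serve`, the port 8080, and the 200/400 response codes are assumptions for illustration; the response the alarm service expects is not specified here.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple


def extract_alarm_message(body: bytes) -> Tuple[str, str]:
    """Return (msgtype, content) from a request body, or raise ValueError."""
    payload = json.loads(body)  # raises a ValueError subclass on bad JSON
    if not isinstance(payload, dict):
        raise ValueError("request body must be a JSON object")
    msgtype = payload.get("msgtype")
    if msgtype not in ("text", "markdown"):
        raise ValueError(f"unsupported msgtype: {msgtype!r}")
    # The content lives under a key named after the message type.
    section = payload.get(msgtype)
    content = section.get("content", "") if isinstance(section, dict) else ""
    return msgtype, content


class AlarmHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            msgtype, content = extract_alarm_message(self.rfile.read(length))
        except ValueError:
            self.send_response(400)  # assumed response for malformed requests
            self.end_headers()
            return
        print(f"[{msgtype}] {content}")  # replace with real handling
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Start a blocking HTTP server that receives alarm notifications."""
    HTTPServer(("0.0.0.0", port), AlarmHandler).serve_forever()
```

Calling `serve()` starts the receiver; the URL of the machine running it, including the port, is then what would be entered in the URL address field.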