tencent cloud


Configuring Alarm Policies

Last updated: 2024-01-20 17:59:36

    Overview

    This document describes how to configure an alarm policy based on logs so that alarms can be sent when certain conditions are met, such as when there are too many error logs or the API response time is too long.

    Prerequisites

    You have uploaded logs to a log topic and configured an index for the log topic.
    The log topic does not use STANDARD_IA storage, because alarm policies cannot be configured for log topics in STANDARD_IA storage. Because an alarm policy requires SQL statements, we recommend that you structure your logs as instructed in Collection Overview.
    You have logged in to the CLS console and entered the Alarm Policy page.

    Directions

    On the Alarm Policy page, click Create and configure the following items.

    Configuring the monitoring object and monitoring task

    Monitoring Object: Select the target log topic(s). The trigger condition is evaluated separately for each selected log topic. You can select up to 20 log topics in the same region. If multiple log topics meet the trigger condition at the same time, a separate alarm is generated for each of them.
    Monitoring Task
    Query Statement: The statement that is run against the selected log topics. It must contain an analysis statement (an SQL statement, as described in Overview and Syntax Rules).
    Example 1: To count logs with errors, enter `status:error | select count(*) as ErrCount`.
    Example 2: To calculate the average response time of the domain aaa.com, enter `domain:"aaa.com" | select avg(request_time) as Latency`.
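Both examples follow the pattern "search condition | analysis statement". The following Python sketch splits a statement at the first pipe that is not inside double quotes. You can use it to check locally that a statement contains an analysis statement before pasting it into the console. It is a simplification for illustration only and does not reproduce how CLS parses queries. `split_query` and `has_analysis_statement` are hypothetical helpers, and escaped quotes are not handled.

```python
from __future__ import annotations


def split_query(statement: str) -> tuple[str, str]:
    """Split a query statement into (search condition, analysis statement).

    Simplified sketch: splits at the first '|' that is not inside double quotes.
    Escaped quotes are not handled.
    """
    in_quotes = False
    for i, ch in enumerate(statement):
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == "|" and not in_quotes:
            return statement[:i].strip(), statement[i + 1:].strip()
    # No unquoted pipe: the statement is only a search condition.
    return statement.strip(), ""


def has_analysis_statement(statement: str) -> bool:
    """True if the statement has a non-empty part after the first unquoted pipe."""
    return bool(split_query(statement)[1])


for q in ['status:error | select count(*) as ErrCount',
          'domain:"aaa.com" | select avg(request_time) as Latency']:
    print(split_query(q))
```

A statement such as `status:error` alone has no analysis statement, so `has_analysis_statement` returns `False` for it.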
    Query Time Range: The time range of the data searched by the query statement. It can cover up to the last 24 hours.
    Trigger Condition: An alarm is triggered when the trigger condition is met. In the condition expression, `$N.keyname` references a query statement result: `$N` indicates the Nth query statement in the current alarm policy, and `keyname` indicates a field name in its result. For more information on the expression syntax, see Trigger Condition Expression.
    Example 1: To trigger an alarm when the number of logs with errors exceeds 10, enter `$1.ErrCount > 10`. Here, `$1` indicates the first query statement, and `ErrCount` indicates the ErrCount field in its result.
    Example 2: To trigger an alarm when the domain aaa.com takes more than 5 seconds on average to respond, enter `$2.Latency > 5`. Here, `$2` indicates the second query statement, and `Latency` indicates the Latency field in its result.
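The Python sketch below makes the `$N.keyname` referencing concrete by evaluating a simple condition against query results held in memory. It assumes a deliberately reduced grammar: a single comparison of one field against a number. The real expression syntax described in Trigger Condition Expression is richer. `matching_rows` is a hypothetical helper, not a CLS API.

```python
from __future__ import annotations

import operator
import re

# Hypothetical, reduced grammar: "$<N>.<keyname> <op> <number>".
_CONDITION = re.compile(r"^\$(\d+)\.(\w+)\s*(>=|<=|==|!=|>|<)\s*(-?\d+(?:\.\d+)?)$")
_OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
        "<=": operator.le, "==": operator.eq, "!=": operator.ne}


def matching_rows(condition: str, results: list[list[dict]]) -> list[dict]:
    """Return the rows of query $N whose `keyname` field satisfies the condition.

    results[0] holds the rows returned by query statement 1 ($1),
    results[1] those of query statement 2 ($2), and so on.
    """
    m = _CONDITION.match(condition.strip())
    if m is None:
        raise ValueError(f"unsupported condition: {condition!r}")
    n, key, op, threshold = int(m.group(1)), m.group(2), m.group(3), float(m.group(4))
    rows = results[n - 1]  # $N is 1-based
    return [row for row in rows if _OPS[op](float(row[key]), threshold)]


results = [
    [{"ErrCount": 37}],  # rows returned by query statement 1
    [{"Latency": 3.2}],  # rows returned by query statement 2
]
print(bool(matching_rows("$1.ErrCount > 10", results)))  # True
print(bool(matching_rows("$2.Latency > 5", results)))    # False
```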
    Trigger by Group: Specifies whether alarms are triggered per group. When it is enabled and multiple results of the query statement meet the trigger condition, the results are grouped by the group field, and an alarm is triggered for each group. For example, suppose query statement 2 is `* | select avg(request_time) as Latency,domain group by domain order by Latency desc limit 5` and returns the following results:

    | Latency | domain |
    | --- | --- |
    | 12.56 | aaa.com |
    | 9.45 | bbb.com |
    | 7.23 | ccc.com |
    | 5.21 | ddd.com |
    | 4.78 | eee.com |

    If the trigger condition is `$2.Latency > 5`, then it is met by four results.
    
    If triggering by group is disabled, only one alarm is triggered, even though more than one of the above results meets the trigger condition.

    If it is enabled and the results are grouped by the `domain` field, four alarms are triggered, one for each matching domain (aaa.com, bbb.com, ccc.com, and ddd.com).
    Note:
    When triggering by group is enabled, many results may meet the trigger condition at once and trigger a large number of alarms, leading to an alarm storm. Therefore, configure the group field and trigger condition appropriately.
    The group field can divide execution results into at most 1,000 groups. No alarms are triggered for groups beyond this limit.
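The difference between the two modes can be modeled in a few lines of Python, using the example results above. This is a hypothetical sketch of the behavior described in this section, not CLS code. In particular, it assumes that the groups counting toward the 1,000-group limit are the first 1,000 distinct values in result order.

```python
from __future__ import annotations

from typing import Callable

MAX_GROUPS = 1000  # group limit stated in the Note above


def alarms_to_send(rows: list[dict], predicate: Callable[[dict], bool],
                   group_field: str | None = None) -> list[dict]:
    """Return one alarm per triggered unit (hypothetical model of Trigger by Group)."""
    if group_field is None:
        # Grouping disabled: a single alarm if any row meets the condition.
        hits = [r for r in rows if predicate(r)]
        return [{"rows": hits}] if hits else []

    # Grouping enabled: split rows into at most MAX_GROUPS groups by group_field.
    groups: dict[object, list[dict]] = {}
    for row in rows:
        key = row[group_field]
        if key not in groups and len(groups) >= MAX_GROUPS:
            continue  # excess group: no alarm is triggered for it
        groups.setdefault(key, []).append(row)

    # One alarm for every group with at least one row meeting the condition.
    alarms = []
    for key, members in groups.items():
        hits = [r for r in members if predicate(r)]
        if hits:
            alarms.append({"group": key, "rows": hits})
    return alarms


def latency_over_5(row: dict) -> bool:
    """Equivalent of the trigger condition $2.Latency > 5."""
    return row["Latency"] > 5


results = [
    {"Latency": 12.56, "domain": "aaa.com"},
    {"Latency": 9.45, "domain": "bbb.com"},
    {"Latency": 7.23, "domain": "ccc.com"},
    {"Latency": 5.21, "domain": "ddd.com"},
    {"Latency": 4.78, "domain": "eee.com"},
]
print(len(alarms_to_send(results, latency_over_5)))            # 1
print(len(alarms_to_send(results, latency_over_5, "domain")))  # 4
```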
    Execution Cycle: How often the monitoring task runs. It can be configured in either of the following two ways:

    | Configuration Method | Description | Example |
    | --- | --- | --- |
    | Fixed frequency | The monitoring task runs at a fixed interval. Interval: 1–1,440 minutes. Granularity: minute. | The monitoring task runs once every 5 minutes. |
    | Fixed time | The monitoring task runs once at a fixed point in time. Time point range: 00:00–23:59. Granularity: minute. | The monitoring task runs once at 02:00 every day. |
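The two configuration methods schedule runs differently. The Python sketch below computes the next run time for each method. It assumes that a fixed-time task repeats daily (as in the example above) and that all times are in a single time zone. The function names are hypothetical.

```python
from __future__ import annotations

from datetime import datetime, time, timedelta


def next_fixed_frequency_run(last_run: datetime, interval_minutes: int) -> datetime:
    """Fixed frequency: the next run is one interval after the previous one."""
    if not 1 <= interval_minutes <= 1440:
        raise ValueError("interval must be between 1 and 1,440 minutes")
    return last_run + timedelta(minutes=interval_minutes)


def next_fixed_time_run(now: datetime, at: time) -> datetime:
    """Fixed time (assumed daily): the next occurrence of `at` after `now`."""
    if at.second or at.microsecond:
        raise ValueError("fixed time has minute granularity")
    candidate = datetime.combine(now.date(), at, tzinfo=now.tzinfo)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's time point has passed; run tomorrow
    return candidate


now = datetime(2024, 1, 20, 17, 59)
print(next_fixed_frequency_run(now, 5))      # 2024-01-20 18:04:00
print(next_fixed_time_run(now, time(2, 0)))  # 2024-01-21 02:00:00
```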

    Configuring multi-dimensional analysis

    When an alarm is triggered, raw logs can be further analyzed through multi-dimensional analysis, and the analysis result can be added to the alarm notification to facilitate root cause discovery. The multi-dimensional analysis doesn't affect the alarm trigger condition.
    The following multi-dimensional analysis types are available:
    Related raw logs: Get the raw logs that meet the search condition of the query statement. The log fields, quantity, and display format can be configured. For example, when an alarm is triggered by too many error logs, you can view the detailed logs in the alarm.
    Top 5 field values by occurrence and their percentages: For all the logs within the time range in which the alarm is triggered, group the logs by the specified field and get the top 5 field values and their percentages. For example, when an alarm is triggered by too many error logs, you can get the top 5 URLs and the top 5 response status codes.
    Custom search and analysis: Execute a custom search and analysis statement on all the logs within the time range in which the alarm is triggered.
    Example 1: `*
    Note:
    The "Related raw logs" and "Top 5 field values by occurrence and their percentages" types can be automatically associated with the search condition of a specified query statement, so that multi-dimensional analysis runs only on the raw logs that meet that condition. Only the search condition is used. The analysis statement, including any SQL filter conditions, is excluded.
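The "Top 5 field values by occurrence and their percentages" type amounts to a group-and-count over the logs in the alarm's time range. Here is a minimal Python sketch using `collections.Counter`. It assumes logs are dictionaries and computes percentages over the logs that contain the field; the exact base that CLS uses for percentages is not stated in this document. `top5_with_percentages` is a hypothetical helper.

```python
from __future__ import annotations

from collections import Counter


def top5_with_percentages(logs: list[dict], field: str) -> list[tuple[object, int, float]]:
    """Group logs by `field`; return the 5 most frequent values, their counts, and shares (%)."""
    values = [log[field] for log in logs if field in log]  # assumption: skip logs without the field
    total = len(values)
    if total == 0:
        return []
    return [(value, count, round(100 * count / total, 2))
            for value, count in Counter(values).most_common(5)]


logs = [{"status": s} for s in ["200"] * 6 + ["500"] * 3 + ["404"]]
print(top5_with_percentages(logs, "status"))
# [('200', 6, 60.0), ('500', 3, 30.0), ('404', 1, 10.0)]
```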

    Configuring an alarm notification

    Alarm Frequency:
    Duration: A notification is sent only after the trigger condition has been met a specified number of consecutive times (1–10; default: 1).
    Interval: No notification is sent within the specified interval after the last notification. For example, the "an alarm will be triggered every 15 minutes" option means that at most one alarm is sent within 15 minutes.
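The Duration and Interval settings act together as a filter in front of notifications. The following Python class is a hypothetical model, not CLS's implementation. It assumes that the consecutive-hit counter resets whenever a run does not meet the condition, and that it does not reset after a notification is sent. `AlarmThrottle` and `observe` are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class AlarmThrottle:
    """Hypothetical model of the Duration and Interval settings."""
    duration: int = 1                            # consecutive hits required (1-10)
    interval: timedelta = timedelta(minutes=15)  # silence period after a notification
    _consecutive: int = field(default=0, init=False)
    _last_sent: datetime | None = field(default=None, init=False)

    def __post_init__(self) -> None:
        if not 1 <= self.duration <= 10:
            raise ValueError("duration must be between 1 and 10")

    def observe(self, now: datetime, condition_met: bool) -> bool:
        """Record one monitoring-task run; return True if a notification is sent."""
        if not condition_met:
            self._consecutive = 0  # assumption: a miss resets the streak
            return False
        self._consecutive += 1
        if self._consecutive < self.duration:
            return False
        if self._last_sent is not None and now - self._last_sent < self.interval:
            return False  # still inside the interval after the last notification
        self._last_sent = now
        return True


start = datetime(2024, 1, 20, 2, 0)
throttle = AlarmThrottle(duration=2, interval=timedelta(minutes=15))
# The condition is met on six consecutive runs, 5 minutes apart.
sent = [throttle.observe(start + timedelta(minutes=5 * i), True) for i in range(6)]
print(sent)  # [False, True, False, False, True, False]
```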
    Notification Group: The notification channels and objects can be set by associating a notification channel group. Notifications can be sent by SMS, email, phone call, Weixin, WeCom, and custom callback API (webhook). For more information, please see Managing Notification Groups.
    Notification Content: By adding preset variables to the notification content, you can add specified information to the alarm notification. For more information on variables, see Alarm Notification Variable.
    Custom Webhook Configuration: If the selected notification group contains a custom webhook, the custom webhook input box will be displayed. You can customize the request header and request body there, which will be used by CLS to call the specified API when an alarm is triggered. In the request header and body, you can use notification content variables to send relevant data to the specified API.
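To try out a custom webhook, you need an HTTP endpoint that accepts the request CLS sends. The Python sketch below uses the standard library's `http.server` and makes two assumptions. First, the alarm is delivered as an HTTP POST. Second, you configured a JSON request body with the hypothetical fields `level`, `topic`, and `condition`; substitute whatever body you actually define with notification content variables. The handler, function names, and port are also illustrative.

```python
from __future__ import annotations

import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_alarm(body: bytes) -> str:
    """Turn the webhook body into a one-line summary (field names are hypothetical)."""
    payload = json.loads(body)
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    return f"[{payload.get('level', 'unknown')}] {payload.get('topic', '?')}: {payload.get('condition', '')}"


class AlarmWebhookHandler(BaseHTTPRequestHandler):
    # Assumption: the alarm is delivered as an HTTP POST.
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            print(summarize_alarm(self.rfile.read(length)))
        except ValueError:  # also covers json.JSONDecodeError
            self.send_response(400)
        else:
            self.send_response(200)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Block and handle webhook calls on the given port."""
    HTTPServer(("", port), AlarmWebhookHandler).serve_forever()
```

Call `serve(8080)` on a host that CLS can reach, and point the custom callback in your notification group at that host. Returning 400 for unparsable bodies makes a mismatch between the configured body and the receiver easy to spot.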
