Alerting Rule Description

Last updated: 2024-01-29 16:01:55
    You can set alert conditions based on Prometheus expressions to monitor service status in real time and receive prompt notifications when a service behaves abnormally.

    Defining an Alerting Rule

    Defining an alerting rule in TMP is very similar to defining a recording rule. Below is a sample alerting rule:
    groups:
    - name: example
      rules:
      - alert: HighRequestLatency
        expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5
        for: 10m
        labels:
          severity: page
        annotations:
          summary: High request latency
    In an alerting rule file, you can define a set of related rules in the same group, and each group can contain multiple alerting rules. A rule mainly consists of the following parts:
    alert: alerting rule name.
    expr: alert trigger condition, written as a PromQL expression. It is evaluated to determine whether any time series meets the condition.
    for: assessment wait time, which is optional. It indicates how long the trigger condition must hold continuously before an alert is sent. Alerts whose condition is met but whose wait time has not yet elapsed are in "Pending" status.
    labels: custom labels, which are a set of specified labels to be added to alerts.
    annotations: a set of additional information, such as text that describes alert details. It is sent to Alertmanager as a parameter when an alert is generated.
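As a rough illustration of how the optional for field changes behavior, the sketch below defines the same condition twice. The group name and alert names are hypothetical, and the behavior described in the comments is that of open-source Prometheus, which TMP is assumed to follow here.

```yaml
groups:
  - name: for-clause-sketch          # hypothetical group name
    rules:
      # No `for`: the alert fires as soon as a single evaluation
      # finds a series matching the expression.
      - alert: InstanceDownImmediate   # hypothetical alert name
        expr: up == 0

      # With `for`: the alert stays Pending once the condition is first
      # met and fires only if the condition keeps holding for 5 minutes.
      # If the condition clears earlier, the Pending alert is dropped.
      - alert: InstanceDownSustained   # hypothetical alert name
        expr: up == 0
        for: 5m
```

A wait time of this kind is a common way to avoid notifications for short-lived blips, at the cost of a delay before a real outage is reported.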


    Generally, the annotations in an alerting rule file use summary to give a brief overview of the alert and description to provide alert details. The Alertmanager UI also displays alert information based on these two annotation values. To make the alert information more readable, TMP lets you use templates in the values of labels and annotations.
    You can use the $labels.<labelname> variable to access the value of the specified label on the current alert instance, and $value to get the sample value calculated by the current PromQL expression.
    # To insert a firing element's label values:
    {{ $labels.<labelname> }}
    # To insert the numeric expression value of the firing element:
    {{ $value }}
    For example, you can use a template to optimize the readability of the content of summary and description:
    groups:
    - name: example
      rules:
      # Alert for any instance that is unreachable for >5 minutes.
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
      # Alert for any instance that has a median request latency >1s.
      - alert: APIHighRequestLatency
        expr: api_http_request_latencies_second{quantile="0.5"} > 1
        for: 10m
        annotations:
          summary: "High request latency on {{ $labels.instance }}"
          description: "{{ $labels.instance }} has a median request latency above 1s (current value: {{ $value }}s)"
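Raw values inserted with $value can carry many decimal places. In open-source Prometheus, annotation templates are Go templates and provide helper functions such as printf and humanizeDuration. The sketch below assumes TMP exposes the same functions, which is worth verifying in your instance before relying on it. The group name is hypothetical.

```yaml
groups:
  - name: template-formatting-sketch   # hypothetical group name
    rules:
      - alert: APIHighRequestLatency
        expr: api_http_request_latencies_second{quantile="0.5"} > 1
        for: 10m
        annotations:
          # humanizeDuration renders a number of seconds as a readable duration.
          summary: 'High request latency on {{ $labels.instance }}: {{ humanizeDuration $value }}'
          # printf formats the sample value with two decimal places.
          description: '{{ $labels.instance }} has a median request latency of {{ printf "%.2f" $value }}s (threshold: 1s)'
```

Single-quoted YAML strings are used here so that the double quotes inside the template do not need escaping.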