tencent cloud

Tencent Cloud Observability Platform

Release Notes and Announcements
Release Notes
Product Introduction
Overview
Strengths
Basic Features
Basic Concepts
Use Cases
Use Limits
Purchase Guide
Tencent Cloud Product Monitoring
Application Performance Management
Mobile App Performance Monitoring
Real User Monitoring
Cloud Automated Testing
Prometheus Monitoring
Grafana
EventBridge
PTS
Quick Start
Monitoring Overview
Instance Group
Tencent Cloud Product Monitoring
Application Performance Management
Real User Monitoring
Cloud Automated Testing
Performance Testing Service
Prometheus Getting Started
Grafana
Dashboard Creation
EventBridge
Alarm Service
Cloud Product Monitoring
Tencent Cloud Service Metrics
Operation Guide
CVM Agents
Cloud Product Monitoring Integration with Grafana
Troubleshooting
Practical Tutorial
Application Performance Management
Product Introduction
Access Guide
Operation Guide
Practical Tutorial
Parameter Information
FAQs
Mobile App Performance Monitoring
Overview
Operation Guide
Access Guide
Practical Tutorial
Tencent Cloud Real User Monitoring
Product Introduction
Operation Guide
Connection Guide
FAQs
Cloud Automated Testing
Product Introduction
Operation Guide
FAQs
Performance Testing Service
Overview
Operation Guide
Practice Tutorial
JavaScript API List
FAQs
Prometheus Monitoring
Product Introduction
Access Guide
Operation Guide
Practical Tutorial
Terraform
FAQs
Grafana
Product Introduction
Operation Guide
Guide on Grafana Common Features
FAQs
Dashboard
Overview
Operation Guide
Alarm Management
Console Operation Guide
Troubleshooting
FAQs
EventBridge
Product Introduction
Operation Guide
Practical Tutorial
FAQs
Report Management
FAQs
General
Alarm Service
Concepts
Monitoring Charts
CVM Agents
Dynamic Alarm Threshold
CM Connection to Grafana
Documentation Guide
Related Agreements
Application Performance Management Service Level Agreement
APM Privacy Policy
APM Data Processing And Security Agreement
RUM Service Level Agreement
Mobile Performance Monitoring Service Level Agreement
Cloud Automated Testing Service Level Agreement
Prometheus Service Level Agreement
TCMG Service Level Agreements
PTS Service Level Agreement
PTS Use Limits
Cloud Monitor Service Level Agreement
API Documentation
History
Introduction
API Category
Making API Requests
Monitoring Data Query APIs
Alarm APIs
Legacy Alert APIs
Notification Template APIs
TMP APIs
Grafana Service APIs
Event Center APIs
TencentCloud Managed Service for Prometheus APIs
Monitoring APIs
Data Types
Error Codes
Glossary

Alerting Rule Description

PDF
Focus Mode
Font Size
Last updated: 2024-08-07 22:05:17
You can set alert conditions based on Prometheus expressions to monitor the service status in real time and receive prompt notifications when the service is exceptional.

Defining Alerting Rule

Defining an alerting rule in TMP is very similar to defining a recording rule. Below is a sample alerting rule:
groups:
- name: example
rules:
- alert: HighRequestLatency
expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5
for: 10m
labels:
severity: page
annotations:
summary: High request latency
In an alerting rule file, you can define a set of relevant rules in the same group. In each group, you can define multiple alerting rules. A rule mainly consists of the following parts:
alert: alerting rule name.
expr: alert trigger condition based on a PromQL expression, which is used to calculate whether there is time series data meeting the condition.
for: assessment wait time, which is optional. It indicates how long a trigger condition can last before an alert is sent. New alerts generated during the wait time are in "Pending" status.
labels: custom labels, which are a set of specified labels to be added to alerts.
annotations: it is used to specify a set of additional information, such as text that describes alert details. It will be sent to Alertmanager as a parameter when an alert is generated.

Template

Generally, annotations in an alerting rule file uses summary to describe the summary of alerts and description to describe alert details. In addition, Alertmanager UI will also display the alert information based on the two label values. To make the alert information more readable, TMP allows you to convert label values in labels and annotations into a template.
You can use the $labels.<labelname> variable to access the value of the specified label on the current alert instance and use $value to get the sample value calculated through the current PromQL expression.
# To insert a firing element's label values:
{{ $labels.<labelname> }}
# To insert the numeric expression value of the firing element:
{{ $value }}
For example, you can use a template to optimize the readability of the content of summary and description:
groups:
- name: example
rules:

# Alert for any instance that is unreachable for >5 minutes.
- alert: InstanceDown
expr: up == 0
for: 5m
labels:
severity: page
annotations:
summary: "Instance {{ $labels.instance }} down"
description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."

# Alert for any instance that has a median request latency >1s.
- alert: APIHighRequestLatency
expr: api_http_request_latencies_second{quantile="0.5"} > 1
for: 10m
annotations:
summary: "High request latency on {{ $labels.instance }}"
description: "{{ $labels.instance }} has a median request latency above 1s (current value: {{ $value }}s)"


Help and Support

Was this page helpful?

Help us improve! Rate your documentation experience in 5 mins.

Feedback