Alerting Rule Description

Last updated: 2024-01-29 16:01:55
You can set alert conditions based on Prometheus expressions to monitor the service status in real time and receive prompt notifications when a service exception occurs.

Defining Alerting Rules

Defining an alerting rule in TMP is very similar to defining a recording rule. Below is a sample alerting rule:
groups:
- name: example
  rules:
  - alert: HighRequestLatency
    expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5
    for: 10m
    labels:
      severity: page
    annotations:
      summary: High request latency
In an alerting rule file, you can define a set of related rules in the same group, and each group can contain multiple alerting rules. A rule mainly consists of the following parts:
alert: the alerting rule name.
expr: the alert trigger condition, a PromQL expression used to compute whether any time series currently meet the condition (the sample above references a metric produced by a recording rule; see the sketch after this list).
for: optional evaluation wait time, indicating how long the trigger condition must hold before an alert is sent. Alerts generated during the wait time are in the "Pending" status.
labels: custom labels, a set of specified labels to be attached to the alert.
annotations: a set of additional information, such as text describing alert details. Annotations are sent to Alertmanager as parameters when an alert fires.
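
The expr in the sample above references job:request_latency_seconds:mean5m, the kind of metric typically produced by a recording rule. Below is a minimal sketch of such a rule, assuming the underlying request_latency_seconds metric is a summary or histogram exposing _sum and _count series; the metric name and grouping are illustrative assumptions, not part of TMP:

groups:
- name: example-recording
  rules:
  # Hypothetical recording rule: 5-minute mean request latency per job.
  # Assumes request_latency_seconds_sum and request_latency_seconds_count exist.
  - record: job:request_latency_seconds:mean5m
    expr: sum by (job) (rate(request_latency_seconds_sum[5m])) / sum by (job) (rate(request_latency_seconds_count[5m]))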

Template

Generally, the annotations in an alerting rule file use summary to give a brief overview of the alert and description to provide alert details, and the Alertmanager UI displays alert information based on these two label values. To make alert information more readable, TMP allows you to template the label values in labels and annotations.
You can use the {{ $labels.<labelname> }} variable to access the value of a specified label on the current alert instance, and {{ $value }} to get the sample value computed by the current PromQL expression.
# To insert a firing element's label values:
{{ $labels.<labelname> }}
# To insert the numeric expression value of the firing element:
{{ $value }}
For example, you can use templates to make the content of summary and description more readable:
groups:
- name: example
  rules:

  # Alert for any instance that is unreachable for >5 minutes.
  - alert: InstanceDown
    expr: up == 0
    for: 5m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."

  # Alert for any instance that has a median request latency >1s.
  - alert: APIHighRequestLatency
    expr: api_http_request_latencies_second{quantile="0.5"} > 1
    for: 10m
    annotations:
      summary: "High request latency on {{ $labels.instance }}"
      description: "{{ $labels.instance }} has a median request latency above 1s (current value: {{ $value }}s)"
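
Before uploading a rule file, you can check its syntax locally with promtool, the rule-checking tool shipped with Prometheus (the file name alerting-rules.yml is an illustrative assumption):

promtool check rules alerting-rules.yml

If the file is valid, promtool reports the number of rules found; otherwise, it prints the parsing error and exits with a non-zero code.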
