
Alarm Suppression

Last updated: 2024-08-22 16:20:00

Foreword

A single underlying issue can trigger hundreds of similar alarms and create unnecessary Ops workload. The alarm suppression feature addresses this: when an alarm of a certain type is triggered, other related alarms are suppressed. For example, if an alarm reports that a certain cluster is inaccessible, you can configure inhibition rules to silence all other alarms related to that cluster.

Directions

1. Log in to the TMP console.
2. In the Prometheus instance list, click the ID/name of the target instance.
3. In the Prometheus Management Center, click Alarm Management > Inhibit Rules > Create in the top navigation bar.



4. On the Create page, configure the inhibition rule as prompted, then click Save.




Parameter Description

Parameter | Description
Source Matcher | Matches the triggered (source) alarm. Select Label name, Condition, and Label value.
Target Matcher | Matches the alarms to be silenced (targets). Select Label name, Condition, and Label value.
Equal | The label name whose value must be identical in the source and target alarms for suppression to take effect. Select Label name.
Note:
How inhibition rules work: when an alarm (the source) matching the Source Matcher is firing, the rule silences any other alarm (the target) that matches the Target Matcher, provided that the target and source alarms have the same value for each label name listed under Equal.
To prevent an alarm from suppressing itself, an alarm that matches both the target and source rules cannot be suppressed by alarms that also match both rules (including itself). We therefore recommend designing the source and target rules so that no alarm matches both at the same time.
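
The matching behavior described in the note can be sketched in Python. This is a minimal illustration under stated assumptions, not the TMP implementation: the InhibitRule class, the matches and is_inhibited helpers, and the label sets are hypothetical names introduced here, and only the "equals" condition is modeled.

```python
from dataclasses import dataclass, field


@dataclass
class InhibitRule:
    """Hypothetical in-memory form of one inhibition rule."""
    source: dict                                 # Source Matcher: label -> required value
    target: dict                                 # Target Matcher: label -> required value
    equal: list = field(default_factory=list)    # Equal: label names that must agree


def matches(alarm, matcher):
    """True if the alarm carries every label=value pair in the matcher."""
    return all(alarm.get(name) == value for name, value in matcher.items())


def is_inhibited(alarm, firing, rule):
    """Decide whether `alarm` is silenced by any alarm in `firing` under `rule`."""
    if not matches(alarm, rule.target):
        return False
    alarm_matches_both = matches(alarm, rule.source)
    for src in firing:
        if not matches(src, rule.source):
            continue
        # Self-suppression guard: an alarm matching both sides cannot be
        # silenced by another alarm that also matches both sides.
        if alarm_matches_both and matches(src, rule.target):
            continue
        # Every label listed under Equal must have the same value on both alarms.
        if all(src.get(name) == alarm.get(name) for name in rule.equal):
            return True
    return False
```

For instance, with a rule whose source is alert=ClusterDown, whose target is severity=warning, and whose Equal label is cluster, a warning alarm on cluster c1 is silenced while ClusterDown is firing for c1, but a warning alarm on cluster c2 is not.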

Example

Use Case: Alarm on High Server CPU Load

Scenario Description:

In a monitoring system, two alarms are configured:
Alarm A: CPU load exceeds 90%.
Alarm B: System response time exceeds 500 ms.
Both alarms are triggered by the same cause: high CPU load on the server, leading to degraded system performance.
The policy rules for Alarm A are as follows:
    alert: HighCPUUsage
    expr: avg(rate(cpu_usage_seconds_total[5m])) by (instance) > 0.9
The policy rules for Alarm B are as follows:
    alert: HighResponseTime
    expr: avg(response_time_seconds) by (instance) > 0.5
The Inhibition rule configuration is as follows:
Source: alert=HighCPUUsage
Target: alert=HighResponseTime
Matching criteria: instance

Overall Effect:

Suppose the 5-minute average rate of the cpu_usage_seconds_total metric is 0.95 (95%) for the series labeled instance=instanceX. Alarm A is triggered, and an alarm notification is sent.
Suppose the average value of the response_time_seconds metric is 0.8 s for the same label instance=instanceX. Alarm B is triggered, but no alarm notification is sent, because the Inhibition rule is matched: Alarm A is firing with the same instance label value.
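
The outcome above can be reproduced with a short Python sketch. It is an illustration only: the notifications helper is a hypothetical name, the alarms are written as plain label sets using the alert label as it appears in this example (in stock Prometheus the alert name is carried in the alertname label), and the instanceY alarm is an assumption added to show what happens when the instance values differ.

```python
# Inhibition rule from the example, written as plain label sets.
SOURCE = {"alert": "HighCPUUsage"}
TARGET = {"alert": "HighResponseTime"}
EQUAL = ["instance"]

# Firing alarms. The instanceY entry is an assumption added for contrast.
firing = [
    {"alert": "HighCPUUsage", "instance": "instanceX"},
    {"alert": "HighResponseTime", "instance": "instanceX"},
    {"alert": "HighResponseTime", "instance": "instanceY"},
]


def matches(alarm, matcher):
    """True if the alarm carries every label=value pair in the matcher."""
    return all(alarm.get(name) == value for name, value in matcher.items())


def notifications(alarms):
    """Return (alert, instance) pairs for alarms that still send a notification.

    SOURCE and TARGET never match the same alarm here, so the
    self-suppression guard from the note is not needed in this sketch.
    """
    sources = [a for a in alarms if matches(a, SOURCE)]
    sent = []
    for alarm in alarms:
        suppressed = matches(alarm, TARGET) and any(
            all(src.get(name) == alarm.get(name) for name in EQUAL)
            for src in sources
        )
        if not suppressed:
            sent.append((alarm["alert"], alarm["instance"]))
    return sent


print(notifications(firing))
```

Under these assumptions, HighCPUUsage on instanceX and HighResponseTime on instanceY still notify, while HighResponseTime on instanceX is silenced because a source alarm is firing with the same instance value.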

