Default Alarm Policy

Last updated: 2024-01-27 17:35:59


Overview

Currently, the default alarm policy is supported only for CVM (basic monitoring), TencentDB for MongoDB (server monitoring), TencentDB for MySQL (server monitoring), TencentDB for Redis, TDSQL for MySQL, TDSQL for PostgreSQL, CKafka (instance monitoring), ES, DTS, EMR, and CLB.
The first time you purchase a Tencent Cloud service that supports the default policy, Tencent Cloud Observability Platform automatically creates the default alarm policy for you. For the metrics, events, and alarm rules covered by the default policy, see Default Metric Description.
You can also manually create an alarm policy and set it as the default alarm policy. Once a default policy is set, newly purchased instances are automatically associated with it, so you do not need to add them manually.



Default Metric Description

| Product Name | Alarm Type | Metric/Event Name | Alarm Rule |
| --- | --- | --- | --- |
| CVM | Metric alarm | CPU utilization | The statistical period is 1 minute, the threshold is >95%, and the continuous monitoring duration is 5 monitoring data points |
| CVM | Metric alarm | Memory utilization | The statistical period is 1 minute, the threshold is >95%, and the continuous monitoring duration is 5 monitoring data points |
| CVM | Metric alarm | Disk utilization | The statistical period is 1 minute, the threshold is >95%, and the continuous monitoring duration is 5 monitoring data points |
| CVM | Metric alarm | Public network bandwidth utilization | The statistical period is 1 minute, the threshold is >95%, and the continuous monitoring duration is 5 monitoring data points |
| CVM | Event alarm | Read-only disk | - |
| TencentDB for MySQL (server monitoring) | Metric alarm | Disk utilization | The statistical period is 1 minute, the threshold is >80%, and the continuous monitoring duration is 5 monitoring data points |
| TencentDB for MySQL (server monitoring) | Metric alarm | CPU utilization | The statistical period is 1 minute, the threshold is >80%, and the continuous monitoring duration is 5 monitoring data points |
| TencentDB for MySQL (server monitoring) | Event alarm | OOM | - |
| TencentDB for MongoDB | Metric alarm | Disk utilization | The statistical period is 1 minute, the threshold is >80%, and the continuous monitoring duration is 5 monitoring data points |
| TencentDB for MongoDB | Metric alarm | Connection utilization | The statistical period is 1 minute, the threshold is >80%, and the continuous monitoring duration is 5 monitoring data points |
| TencentDB for Redis - CKV version/community version | Metric alarm | Capacity utilization | The statistical period is 1 minute, the threshold is >80%, and the continuous monitoring duration is 5 monitoring data points |
| TDSQL for MySQL | Event alarm | OOM | - |
| TDSQL for MySQL | Event alarm | Instance read-only status (disk overrun) | - |
| TDSQL for PostgreSQL | Event alarm | Insufficient memory | - |
| TDSQL for PostgreSQL | Event alarm | OOM | - |
| CKafka - instance | Metric alarm | Disk utilization | The statistical period is 1 minute, the threshold is >85%, and the continuous monitoring duration is 5 monitoring data points |
| ES | Metric alarm | Average disk utilization | The statistical period is 1 minute, the threshold is >80%, and the continuous monitoring duration is 5 monitoring data points |
| ES | Metric alarm | Average CPU utilization | The statistical period is 1 minute, the threshold is >90%, and the continuous monitoring duration is 5 monitoring data points |
| ES | Metric alarm | Average JVM memory utilization | The statistical period is 1 minute, the threshold is >85%, and the continuous monitoring duration is 5 monitoring data points |
| ES | Metric alarm | Cluster health | The statistical period is 1 minute, the threshold is >=1, and the continuous monitoring duration is 5 monitoring data points |
| DTS | Event alarm | Data migration task interruption | - |
| DTS | Event alarm | Data sync task interruption | - |
| DTS | Event alarm | Data subscription task interruption | - |
| EMR (server monitoring - disk) | Metric alarm | Disk utilization (used_all) | The statistical period is 1 minute, the threshold is >80%, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| EMR (server monitoring - disk) | Metric alarm | inode utilization | The statistical period is 1 minute, the threshold is >50%, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| EMR (server monitoring - CPU) | Metric alarm | CPU utilization (idle) | The statistical period is 1 minute, the threshold is <2%, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| EMR (server monitoring - memory) | Metric alarm | Memory utilization (used_percent) | The statistical period is 1 minute, the threshold is >95%, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| EMR (server monitoring - network) | Event alarm | Metadatabase ping failure | - |
| EMR (cluster monitoring) | Event alarm | Elastic scaling failure | - |
| EMR (HBase - overview) | Metric alarm | Number of cluster RSs (numDeadRegionServers) | The statistical period is 1 minute, the threshold is >0, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| EMR (HBase - overview) | Metric alarm | Number of cluster regions in RIT state (ritCountOverThreshold) | The statistical period is 1 minute, the threshold is >0, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| EMR (HBase - HMaster) | Metric alarm | GC time (FGCT) | The statistical period is 1 minute, the threshold is >5s, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| EMR (HBase - RegionServer) | Metric alarm | GC time (FGCT) | The statistical period is 1 minute, the threshold is >5s, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| EMR (HBase - RegionServer) | Metric alarm | Number of regions (regionCount) | The statistical period is 1 minute, the threshold is >600, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| EMR (HBase - RegionServer) | Metric alarm | Number of requests in operation queue (compactionQueueLength) | The statistical period is 1 minute, the threshold is >500, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| EMR (HDFS - NameNode) | Metric alarm | GC time (FGCT) | The statistical period is 1 minute, the threshold is >5s, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| EMR (HDFS - NameNode) | Metric alarm | Number of missing blocks (NumberOfMissingBlocks) | The statistical period is 1 minute, the threshold is >0, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| EMR (HDFS - NameNode) | Event alarm | NameNode master/slave switch | - |
| EMR (HDFS - DataNode) | Metric alarm | Number of XCeivers (XceiverCount) | The statistical period is 1 minute, the threshold is >1,000, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| EMR (HDFS - DataNode) | Metric alarm | GC time (FGCT) | The statistical period is 1 minute, the threshold is >5s, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| EMR (HDFS - overview) | Metric alarm | Disk failure | The statistical period is 1 minute, the threshold is >0, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| EMR (HDFS - overview) | Metric alarm | Number of cluster DataNodes (NumDeadDataNodes) | The statistical period is 1 minute, the threshold is >0, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| EMR (HDFS - overview) | Metric alarm | Number of cluster DataNodes (NumStaleDataNodes) | The statistical period is 1 minute, the threshold is >0, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| EMR (HDFS - overview) | Metric alarm | HDFS storage space utilization (capacityusedrate) | The statistical period is 1 minute, the threshold is 90%, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| EMR (Presto - Presto_Coordinator) | Metric alarm | GC time (FGCT) | The statistical period is 1 minute, the threshold is >5s, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| EMR (Presto - Presto_Worker) | Metric alarm | GC time (FGCT) | The statistical period is 1 minute, the threshold is >5s, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| EMR (Presto - overview) | Metric alarm | Number of nodes (Failed) | The statistical period is 1 minute, the threshold is >0, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| EMR (ClickHouse - server) | Metric alarm | Number of largest active data blocks in partition | The statistical period is 1 minute, the threshold is >250, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| EMR (Hive - HiveMetaStore) | Metric alarm | GC time (FGCT) | The statistical period is 1 minute, the threshold is >5s, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| EMR (Hive - HiveMetaStore) | Metric alarm | DaemonThreadCount | The statistical period is 1 minute, the threshold is >2,000, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| EMR (Hive - HiveMetaStore) | Metric alarm | ThreadCount | The statistical period is 1 minute, the threshold is >2,000, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| EMR (Hive - HiveServer2) | Metric alarm | GC time (FGCT) | The statistical period is 1 minute, the threshold is >5s, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| EMR (Hive - HiveServer2) | Metric alarm | DaemonThreadCount | The statistical period is 1 minute, the threshold is >2,000, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| EMR (Hive - HiveServer2) | Metric alarm | ThreadCount | The statistical period is 1 minute, the threshold is >2,000, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| EMR (YARN - overview) | Metric alarm | Number of nodes (NumUnhealthyNMs) | The statistical period is 1 minute, the threshold is >0, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| EMR (YARN - overview) | Metric alarm | Number of nodes (NumLostNMs) | The statistical period is 1 minute, the threshold is >0, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| EMR (YARN - NodeManager) | Metric alarm | GC time (FGCT) | The statistical period is 1 minute, the threshold is >5s, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| EMR (YARN - ResourceManager) | Metric alarm | GC time (FGCT) | The statistical period is 1 minute, the threshold is >5s, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| EMR (YARN - ResourceManager) | Event alarm | ResourceManager master/slave switch | - |
| EMR (ZooKeeper - ZooKeeper) | Metric alarm | GC time (FGCT) | The statistical period is 1 minute, the threshold is >5s, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| EMR (ZooKeeper - ZooKeeper) | Metric alarm | Number of Znodes (zk_znode_count) | The statistical period is 1 minute, the threshold is >100,000, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| EMR (ZooKeeper - ZooKeeper) | Metric alarm | Number of queuing requests (zk_outstanding_requests) | The statistical period is 1 minute, the threshold is >50, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| CLB (public network CLB instance) | Metric alarm | Discarded connections | The statistical period is 1 minute, the threshold is >10, and an alarm will be triggered once every 3 consecutive times the conditions are met |
| CLB (public network CLB instance) | Metric alarm | Discarded inbound data packets | The statistical period is 1 minute, the threshold is >10, and an alarm will be triggered once every 3 consecutive times the conditions are met |
| CLB (public network CLB instance) | Metric alarm | Discarded inbound bandwidth | The statistical period is 1 minute, the threshold is >10 MB, and an alarm will be triggered once every 3 consecutive times the conditions are met |
| CLB (public network CLB instance) | Metric alarm | Discarded outbound bandwidth | The statistical period is 1 minute, the threshold is >10 MB, and an alarm will be triggered once every 3 consecutive times the conditions are met |
| CLB (public network CLB instance) | Metric alarm | Inbound bandwidth utilization | The statistical period is 1 minute, the threshold is >80%, and an alarm will be triggered once every 3 consecutive times the conditions are met |
| CLB (public network CLB instance) | Metric alarm | Outbound bandwidth utilization | The statistical period is 1 minute, the threshold is >80%, and an alarm will be triggered once every 3 consecutive times the conditions are met |
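
To make the metric alarm rules above more concrete, the following Python sketch shows one plausible reading of them: a rule fires once the threshold condition has held for N data points in a row (for example, CPU utilization >95% for 5 consecutive 1-minute points). The names `MetricRule` and `first_alarm_index` are hypothetical and are not part of any Tencent Cloud SDK or API. How the platform actually evaluates rules, including how often it repeats an alarm, is not specified here, so treat this as an illustration only.

```python
import operator
from dataclasses import dataclass
from typing import Optional, Sequence

# Comparison operators that appear in the rule descriptions above.
_OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


@dataclass(frozen=True)
class MetricRule:
    """Hypothetical model of one metric alarm rule (not a Tencent Cloud type)."""
    metric: str        # e.g. "CPU utilization"
    op: str            # one of ">", ">=", "<", "<="
    threshold: float   # e.g. 95.0 for ">95%"
    consecutive: int   # data points in a row that must meet the condition


def first_alarm_index(rule: MetricRule, points: Sequence[float]) -> Optional[int]:
    """Return the index of the data point at which the rule would first fire.

    Assumption: one data point per statistical period (1 minute), and the
    rule fires when `rule.consecutive` points in a row meet the condition.
    Returns None if that never happens.
    """
    compare = _OPERATORS[rule.op]
    streak = 0
    for index, value in enumerate(points):
        # A point that does not meet the condition resets the streak.
        streak = streak + 1 if compare(value, rule.threshold) else 0
        if streak >= rule.consecutive:
            return index
    return None


# Example rules taken from the table: CVM CPU utilization and EMR CPU idle.
cvm_cpu = MetricRule("CPU utilization", ">", 95.0, 5)
emr_cpu_idle = MetricRule("CPU utilization (idle)", "<", 2.0, 5)
```

With this model, a single data point that falls back below the threshold resets the count, so a brief spike shorter than the consecutive-point requirement does not raise an alarm.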

