Default Alarm Policy

Last updated: 2024-01-27 17:35:59

Overview
Currently, the default alarm policy is only supported for CVM (basic monitoring), TencentDB for MongoDB (server monitoring), TencentDB for MySQL (server monitoring), TencentDB for Redis, TDSQL for MySQL, TDSQL for PostgreSQL, CKafka (instance monitoring), ES, DTS, EMR, and CLB.
When you purchase a Tencent Cloud service that supports the default policy for the first time, Tencent Cloud Observability Platform automatically creates the default alarm policy for you. For the metrics, events, and alarm rules covered by the default policy, see Default Metric Description.
You can also manually create an alarm policy and set it as the default alarm policy. Once a default policy is set, newly purchased instances are automatically associated with it; no manual addition is required.
Default Metric Description
Product Name	Alarm Type	Metric/Event Name	Alarm Rule
CVM	Metric alarm	CPU utilization	The statistical period is 1 minute, the threshold is >95%, and the continuous monitoring duration is 5 monitoring data points
CVM	Metric alarm	Memory utilization	The statistical period is 1 minute, the threshold is >95%, and the continuous monitoring duration is 5 monitoring data points
CVM	Metric alarm	Disk utilization	The statistical period is 1 minute, the threshold is >95%, and the continuous monitoring duration is 5 monitoring data points
CVM	Metric alarm	Public network bandwidth utilization	The statistical period is 1 minute, the threshold is >95%, and the continuous monitoring duration is 5 monitoring data points
CVM	Event alarm	Read-only disk	-
TencentDB for MySQL (server monitoring)	Metric alarm	Disk utilization	The statistical period is 1 minute, the threshold is >80%, and the continuous monitoring duration is 5 monitoring data points
TencentDB for MySQL (server monitoring)	Metric alarm	CPU utilization	The statistical period is 1 minute, the threshold is >80%, and the continuous monitoring duration is 5 monitoring data points
TencentDB for MySQL (server monitoring)	Event alarm	OOM	-
TencentDB for MongoDB	Metric alarm	Disk utilization	The statistical period is 1 minute, the threshold is >80%, and the continuous monitoring duration is 5 monitoring data points
TencentDB for MongoDB	Metric alarm	Connection utilization	The statistical period is 1 minute, the threshold is >80%, and the continuous monitoring duration is 5 monitoring data points
TencentDB for Redis - CKV version/community version	Metric alarm	Capacity utilization	The statistical period is 1 minute, the threshold is >80%, and the continuous monitoring duration is 5 monitoring data points
TDSQL for MySQL	Event alarm	OOM	-
TDSQL for MySQL	Event alarm	Instance read-only status (disk overrun)	-
TDSQL for PostgreSQL	Event alarm	Insufficient memory	-
TDSQL for PostgreSQL	Event alarm	OOM	-
CKafka - instance	Metric alarm	Disk utilization	The statistical period is 1 minute, the threshold is >85%, and the continuous monitoring duration is 5 monitoring data points
ES	Metric alarm	Average disk utilization	The statistical period is 1 minute, the threshold is >80%, and the continuous monitoring duration is 5 monitoring data points
ES	Metric alarm	Average CPU utilization	The statistical period is 1 minute, the threshold is >90%, and the continuous monitoring duration is 5 monitoring data points
ES	Metric alarm	Average JVM memory utilization	The statistical period is 1 minute, the threshold is >85%, and the continuous monitoring duration is 5 monitoring data points
ES	Metric alarm	Cluster health	The statistical period is 1 minute, the threshold is >=1, and the continuous monitoring duration is 5 monitoring data points
DTS	Event alarm	Data migration task interruption	-
DTS	Event alarm	Data sync task interruption	-
DTS	Event alarm	Data subscription task interruption	-
EMR (server monitoring - disk)	Metric alarm	Disk utilization (used_all)	The statistical period is 1 minute, the threshold is >80%, and an alarm will be triggered once every 5 consecutive times the conditions are met
EMR (server monitoring - disk)	Metric alarm	inode utilization	The statistical period is 1 minute, the threshold is >50%, and an alarm will be triggered once every 5 consecutive times the conditions are met
EMR (server monitoring - CPU)	Metric alarm	CPU utilization (idle)	The statistical period is 1 minute, the threshold is <2%, and an alarm will be triggered once every 5 consecutive times the conditions are met
EMR (server monitoring - memory)	Metric alarm	Memory utilization (used_percent)	The statistical period is 1 minute, the threshold is >95%, and an alarm will be triggered once every 5 consecutive times the conditions are met
EMR (server monitoring - network)	Event alarm	Metadatabase ping failure	-
EMR (cluster monitoring)	Event alarm	Elastic scaling failure	-
EMR (HBase - overview)	Metric alarm	Number of cluster RSs (numDeadRegionServers)	The statistical period is 1 minute, the threshold is >0, and an alarm will be triggered once every 5 consecutive times the conditions are met
EMR (HBase - overview)	Metric alarm	Number of cluster regions in RIT state (ritCountOverThreshold)	The statistical period is 1 minute, the threshold is >0, and an alarm will be triggered once every 5 consecutive times the conditions are met
EMR (HBase - HMaster)	Metric alarm	GC time (FGCT)	The statistical period is 1 minute, the threshold is >5s, and an alarm will be triggered once every 5 consecutive times the conditions are met
EMR (HBase - RegionServer)	Metric alarm	GC time (FGCT)	The statistical period is 1 minute, the threshold is >5s, and an alarm will be triggered once every 5 consecutive times the conditions are met
EMR (HBase - RegionServer)	Metric alarm	Number of regions (regionCount)	The statistical period is 1 minute, the threshold is >600, and an alarm will be triggered once every 5 consecutive times the conditions are met
EMR (HBase - RegionServer)	Metric alarm	Number of requests in operation queue (compactionQueueLength)	The statistical period is 1 minute, the threshold is >500, and an alarm will be triggered once every 5 consecutive times the conditions are met
EMR (HDFS - NameNode)	Metric alarm	GC time (FGCT)	The statistical period is 1 minute, the threshold is >5s, and an alarm will be triggered once every 5 consecutive times the conditions are met
EMR (HDFS - NameNode)	Metric alarm	Number of missing blocks (NumberOfMissingBlocks)	The statistical period is 1 minute, the threshold is >0, and an alarm will be triggered once every 5 consecutive times the conditions are met
EMR (HDFS - NameNode)	Event alarm	NameNode master/slave switch	-
EMR (HDFS - DataNode)	Metric alarm	Number of XCeivers (XceiverCount)	The statistical period is 1 minute, the threshold is >1,000, and an alarm will be triggered once every 5 consecutive times the conditions are met
EMR (HDFS - DataNode)	Metric alarm	GC time (FGCT)	The statistical period is 1 minute, the threshold is >5s, and an alarm will be triggered once every 5 consecutive times the conditions are met
EMR (HDFS - overview)	Metric alarm	Disk failure	The statistical period is 1 minute, the threshold is >0, and an alarm will be triggered once every 5 consecutive times the conditions are met
EMR (HDFS - overview)	Metric alarm	Number of cluster DataNodes (NumDeadDataNodes)	The statistical period is 1 minute, the threshold is >0, and an alarm will be triggered once every 5 consecutive times the conditions are met
EMR (HDFS - overview)	Metric alarm	Number of cluster DataNodes (NumStaleDataNodes)	The statistical period is 1 minute, the threshold is >0, and an alarm will be triggered once every 5 consecutive times the conditions are met
EMR (HDFS - overview)	Metric alarm	HDFS storage space utilization (capacityusedrate)	The statistical period is 1 minute, the threshold is 90%, and an alarm will be triggered once every 5 consecutive times the conditions are met
EMR (Presto - Presto_Coordinator)	Metric alarm	GC time (FGCT)	The statistical period is 1 minute, the threshold is >5s, and an alarm will be triggered once every 5 consecutive times the conditions are met
EMR (Presto - Presto_Worker)	Metric alarm	GC time (FGCT)	The statistical period is 1 minute, the threshold is >5s, and an alarm will be triggered once every 5 consecutive times the conditions are met
EMR (Presto - overview)	Metric alarm	Number of nodes (Failed)	The statistical period is 1 minute, the threshold is >0, and an alarm will be triggered once every 5 consecutive times the conditions are met
EMR (ClickHouse - server)	Metric alarm	Number of largest active data blocks in partition	The statistical period is 1 minute, the threshold is >250, and an alarm will be triggered once every 5 consecutive times the conditions are met
EMR (Hive - HiveMetaStore)	Metric alarm	GC time (FGCT)	The statistical period is 1 minute, the threshold is >5s, and an alarm will be triggered once every 5 consecutive times the conditions are met
EMR (Hive - HiveMetaStore)	Metric alarm	DaemonThreadCount	The statistical period is 1 minute, the threshold is >2,000, and an alarm will be triggered once every 5 consecutive times the conditions are met
EMR (Hive - HiveMetaStore)	Metric alarm	ThreadCount	The statistical period is 1 minute, the threshold is >2,000, and an alarm will be triggered once every 5 consecutive times the conditions are met
EMR (Hive - HiveServer2)	Metric alarm	GC time (FGCT)	The statistical period is 1 minute, the threshold is >5s, and an alarm will be triggered once every 5 consecutive times the conditions are met
EMR (Hive - HiveServer2)	Metric alarm	DaemonThreadCount	The statistical period is 1 minute, the threshold is >2,000, and an alarm will be triggered once every 5 consecutive times the conditions are met
EMR (Hive - HiveServer2)	Metric alarm	ThreadCount	The statistical period is 1 minute, the threshold is >2,000, and an alarm will be triggered once every 5 consecutive times the conditions are met
EMR (YARN - overview)	Metric alarm	Number of nodes (NumUnhealthyNMs)	The statistical period is 1 minute, the threshold is >0, and an alarm will be triggered once every 5 consecutive times the conditions are met
EMR (YARN - overview)	Metric alarm	Number of nodes (NumLostNMs)	The statistical period is 1 minute, the threshold is >0, and an alarm will be triggered once every 5 consecutive times the conditions are met
EMR (YARN - NodeManager)	Metric alarm	GC time (FGCT)	The statistical period is 1 minute, the threshold is >5s, and an alarm will be triggered once every 5 consecutive times the conditions are met
EMR (YARN - ResourceManager)	Metric alarm	GC time (FGCT)	The statistical period is 1 minute, the threshold is >5s, and an alarm will be triggered once every 5 consecutive times the conditions are met
EMR (YARN - ResourceManager)	Event alarm	ResourceManager master/slave switch	-
EMR (ZooKeeper - ZooKeeper)	Metric alarm	GC time (FGCT)	The statistical period is 1 minute, the threshold is >5s, and an alarm will be triggered once every 5 consecutive times the conditions are met
EMR (ZooKeeper - ZooKeeper)	Metric alarm	Number of Znodes (zk_znode_count)	The statistical period is 1 minute, the threshold is >100,000, and an alarm will be triggered once every 5 consecutive times the conditions are met
EMR (ZooKeeper - ZooKeeper)	Metric alarm	Number of queuing requests (zk_outstanding_requests)	The statistical period is 1 minute, the threshold is >50, and an alarm will be triggered once every 5 consecutive times the conditions are met
CLB (public network CLB instance)	Metric alarm	Discarded connections	The statistical period is 1 minute, the threshold is >10, and an alarm will be triggered once every 3 consecutive times the conditions are met
CLB (public network CLB instance)	Metric alarm	Discarded inbound data packets	The statistical period is 1 minute, the threshold is >10, and an alarm will be triggered once every 3 consecutive times the conditions are met
CLB (public network CLB instance)	Metric alarm	Discarded inbound bandwidth	The statistical period is 1 minute, the threshold is >10 MB, and an alarm will be triggered once every 3 consecutive times the conditions are met
CLB (public network CLB instance)	Metric alarm	Discarded outbound bandwidth	The statistical period is 1 minute, the threshold is >10 MB, and an alarm will be triggered once every 3 consecutive times the conditions are met
CLB (public network CLB instance)	Metric alarm	Inbound bandwidth utilization	The statistical period is 1 minute, the threshold is >80%, and an alarm will be triggered once every 3 consecutive times the conditions are met
CLB (public network CLB instance)	Metric alarm	Outbound bandwidth utilization	The statistical period is 1 minute, the threshold is >80%, and an alarm will be triggered once every 3 consecutive times the conditions are met
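
Each metric alarm rule above combines a 1-minute statistical period, a threshold comparison, and a number of consecutive data points. The following is a minimal sketch of how such a rule could be evaluated locally, for example when testing thresholds against exported monitoring data. It assumes one sample per statistical period and reads "5 consecutive data points" as "the most recent 5 samples all breach the threshold". The names `AlarmRule` and `is_triggered` are hypothetical and are not part of any Tencent Cloud API; the platform's actual evaluation and notification behavior may differ.

```python
import operator
from dataclasses import dataclass
from typing import Sequence

# Comparison operators that appear in the thresholds of the table.
_OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


@dataclass(frozen=True)
class AlarmRule:
    """Hypothetical model of one metric alarm rule (illustration only)."""
    comparison: str          # for example ">" in ">95%"
    threshold: float         # for example 95.0 in ">95%"
    consecutive_points: int  # for example 5 data points

    def __post_init__(self) -> None:
        if self.comparison not in _OPERATORS:
            raise ValueError(f"unsupported comparison: {self.comparison}")
        if self.consecutive_points < 1:
            raise ValueError("consecutive_points must be at least 1")


def is_triggered(rule: AlarmRule, samples: Sequence[float]) -> bool:
    """Return True if the most recent samples all breach the threshold.

    `samples` is assumed to hold one value per statistical period,
    ordered from oldest to newest.
    """
    if len(samples) < rule.consecutive_points:
        return False  # not enough data points to decide yet
    compare = _OPERATORS[rule.comparison]
    recent = samples[-rule.consecutive_points:]
    return all(compare(value, rule.threshold) for value in recent)


# CVM CPU utilization row: threshold >95% for 5 data points.
cvm_cpu = AlarmRule(comparison=">", threshold=95.0, consecutive_points=5)
print(is_triggered(cvm_cpu, [96.0, 97.0, 98.0, 99.0, 96.5]))
print(is_triggered(cvm_cpu, [96.0, 97.0, 94.0, 99.0, 96.5]))
```

In this sketch a single sample that falls back under the threshold resets the streak, so the second call reports no alarm. A rule with a "less than" threshold, such as EMR CPU utilization (idle) <2%, would be modeled as `AlarmRule("<", 2.0, 5)`.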
﻿


