Monitoring and Alerting Capabilities Overview
Last updated: 2025-09-19 15:59:28
TDMQ for CKafka provides an observability system made up of monitoring and alarms, event records, and one-click diagnosis. Together these help you detect, troubleshoot, and resolve issues quickly and keep your business running stably.

Monitoring and Alarms

Monitoring Capabilities

CKafka's monitoring is built on Tencent Cloud Observability Platform (TCOP). It monitors the resources created under your account in real time, including instances, topics, and consumer groups. The metrics show cluster resource usage, connection counts, and message backlog, so you can assess cluster capacity and detect risks early.
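
As a rough illustration of what a message backlog metric represents, the sketch below computes a consumer group's backlog as the log end offset minus the committed offset for each partition, which is the usual Kafka definition of consumer lag. The offset values and the function name are hypothetical, and treating a partition with no committed offset as starting from 0 is an assumption made for this example, not documented CKafka behavior.

```python
from typing import Dict


def consumer_group_backlog(log_end_offsets: Dict[int, int],
                           committed_offsets: Dict[int, int]) -> Dict[int, int]:
    """Return the per-partition backlog (log end offset minus committed offset)."""
    backlog = {}
    for partition, end_offset in log_end_offsets.items():
        # Assumption: a partition with no committed offset counts from offset 0.
        committed = committed_offsets.get(partition, 0)
        # Clamp at 0 so a stale end offset never yields a negative backlog.
        backlog[partition] = max(end_offset - committed, 0)
    return backlog


# Hypothetical offsets for a three-partition topic.
end_offsets = {0: 1200, 1: 950, 2: 400}
committed = {0: 1150, 1: 950}

per_partition = consumer_group_backlog(end_offsets, committed)
total_backlog = sum(per_partition.values())
print(per_partition, total_backlog)
```

A backlog that keeps growing over time usually means consumers are not keeping up with producers, which is the kind of risk the monitoring metrics are meant to surface early.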

The monitoring capabilities available depend on the edition of the instance you purchased:

| Type | Applicable Version | Capability Description | Use Cases |
| --- | --- | --- | --- |
| Basic Monitoring | All editions | View monitoring metrics in three dimensions: instance, topic, and consumer group. | Cluster-level metrics for detecting abnormalities, planning cluster capacity, and basic operations. |
| Advanced Monitoring | Professional Edition | View node-level metrics of an instance, such as core services, production, consumption, instance resources, and Broker GC. | Node-level metrics for locating problems, stream analysis, duration analysis, and business troubleshooting. |
| Dashboard | Professional Edition | View all TCP connections on the Broker, details of unsynced replicas and the node distribution of topics, and Top rankings of key metrics such as topic traffic, disk usage, and consumer group consumption speed. | Top rankings of key metrics for analyzing production and consumption hot spots, disk usage, and business optimization. |
| Prometheus Monitoring | Professional Edition | Provides access through a Prometheus exporter based on open-source standards, covering instance-level metrics, node-level metrics, and the metrics commonly monitored in open-source Kafka. | An open-source-compatible monitoring integration that can connect to your own operations platform. |
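
To give a feel for what integrating exporter output into a self-owned operations platform involves, the sketch below parses a small sample of the Prometheus text exposition format and sums one metric for a topic. The metric names (`kafka_topic_bytes_in_total`, `kafka_broker_connections`) and label names are hypothetical placeholders, not the names CKafka actually exports, and the parser handles only simple sample lines; a production setup would normally let a Prometheus server do the scraping.

```python
import re
from typing import Dict, List, Tuple

# One sample line: metric name, optional {labels}, value, optional timestamp.
SAMPLE_LINE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$'
)
LABEL_PAIR = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

Sample = Tuple[str, Dict[str, str], float]


def parse_exposition(text: str) -> List[Sample]:
    """Parse Prometheus text-format samples, skipping comments and blank lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # HELP/TYPE comments carry no sample values
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue  # ignore lines this simplified parser cannot read
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_PAIR.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Hypothetical exporter output; the metric names are illustrative only.
SAMPLE_TEXT = """
# HELP kafka_topic_bytes_in_total Hypothetical metric, for illustration only.
# TYPE kafka_topic_bytes_in_total counter
kafka_topic_bytes_in_total{topic="orders",broker="0"} 1024
kafka_topic_bytes_in_total{topic="orders",broker="1"} 2048
kafka_broker_connections{broker="0"} 37
"""

samples = parse_exposition(SAMPLE_TEXT)
orders_bytes_in = sum(
    value for name, labels, value in samples
    if name == "kafka_topic_bytes_in_total" and labels.get("topic") == "orders"
)
print(orders_bytes_in)
```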

Alarm Capabilities

TDMQ for CKafka provides alarm capabilities based on Tencent Cloud Observability Platform. You can configure alarm rules for monitoring metrics on the platform. When a metric reaches the configured alarm threshold, you are notified by email, SMS, WeChat, or phone call so that you can take preventive or remedial measures in time. Well-configured alarm rules help improve the robustness and reliability of your applications.
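
The idea of a threshold-based alarm rule can be sketched as follows. This is a simplified local model for illustration, not the platform's rule engine or API: the `AlarmRule` fields, the metric name `disk_usage_percent`, and the requirement that the threshold be crossed for several consecutive periods are all assumptions introduced here.

```python
import operator
from dataclasses import dataclass
from typing import List

COMPARATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


@dataclass
class AlarmRule:
    metric: str               # hypothetical metric name
    comparator: str           # one of the keys in COMPARATORS
    threshold: float
    consecutive_periods: int  # assumed to be at least 1


def should_alarm(rule: AlarmRule, datapoints: List[float]) -> bool:
    """Return True if the most recent datapoints all breach the threshold."""
    if len(datapoints) < rule.consecutive_periods:
        return False  # not enough data collected yet
    compare = COMPARATORS[rule.comparator]
    recent = datapoints[-rule.consecutive_periods:]
    return all(compare(value, rule.threshold) for value in recent)


# Hypothetical rule: disk usage above 80% for three periods in a row.
rule = AlarmRule(metric="disk_usage_percent", comparator=">",
                 threshold=80.0, consecutive_periods=3)

sustained = should_alarm(rule, [70.0, 82.0, 85.0, 91.0])
brief_spike = should_alarm(rule, [82.0, 85.0, 79.0, 91.0])
print(sustained, brief_spike)
```

Requiring several consecutive breaching periods is one common way to avoid notifications for brief spikes; whether and how to apply it depends on the metric being watched.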

Event Record

The event center of TDMQ for CKafka centrally manages, stores, analyzes, and visualizes the operational events, diagnostic events, and Broker change events that occur while an instance is running, which makes later queries, audits, and tracebacks easier. It also supports event alarms: you can configure alarm rules for key events (such as node decommissioning and disk scale-out failures) on Tencent Cloud Observability Platform so that O&M personnel can handle them promptly.

One-Click Diagnosis

TDMQ for CKafka Professional Edition supports one-click diagnosis. The feature proactively checks the cluster for existing and potential risks, provides resolution suggestions based on the experience of Tencent Cloud experts, and automatically summarizes the health check results into a diagnostic report. It extracts key information, locates issues, and offers resolution recommendations, closing the loop on operations and maintenance.
