TDMQ for CKafka provides a comprehensive observability system that includes monitoring and alarms, event records, and one-click diagnosis. It helps you quickly detect, troubleshoot, and resolve issues so that your business runs stably.
Monitoring and Alarms
Monitoring Capabilities
Built on Tencent Cloud Observability Platform (TCOP), CKafka monitors the resources created under your account, such as instances, topics, and consumer groups, in real time. The monitoring metrics show cluster resource usage, the number of connections, and message backlog, helping you assess cluster capacity and detect risks in advance.
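To make the message backlog metric concrete, here is a minimal sketch of how backlog (consumer lag) is usually derived in Kafka: for each partition, the latest log end offset minus the offset the consumer group has committed. The function name and the sample offsets are hypothetical, and treating a partition with no committed offset as starting from 0 is a simplifying assumption; this is not a description of how CKafka computes its metric internally.

```python
def consumer_lag(end_offsets, committed_offsets):
    """Return per-partition backlog for one consumer group on one topic.

    end_offsets: {partition: latest log end offset}
    committed_offsets: {partition: last offset committed by the group}
    """
    lag = {}
    for partition, end in end_offsets.items():
        # Assumption: a partition without a committed offset counts from 0.
        committed = committed_offsets.get(partition, 0)
        # Clamp at 0 so a stale end offset never yields a negative backlog.
        lag[partition] = max(end - committed, 0)
    return lag


# Hypothetical offsets for a three-partition topic.
end_offsets = {0: 100, 1: 250, 2: 40}
committed_offsets = {0: 90, 1: 250}

per_partition = consumer_lag(end_offsets, committed_offsets)
total_backlog = sum(per_partition.values())
```

A backlog that keeps growing usually means consumers are slower than producers, which is the kind of capacity risk these metrics are meant to surface.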
The monitoring capabilities that CKafka supports depend on the edition of the instance you purchased:
| Monitoring Capability | Supported Edition | Description | Use Case |
| --- | --- | --- | --- |
| Basic Monitoring | All editions | View monitoring metrics in three dimensions: instance, topic, and consumer group. | Cluster-level metric observation for detecting abnormal issues, planning cluster capacity, and meeting basic O&M requirements. |
| Advanced Monitoring | Professional Edition | View node-level monitoring metrics of an instance, such as core services, production, consumption, instance resources, and Broker GC. | Node-level metric observation for problem localization, stream analysis, duration analysis, and business troubleshooting. |
| Dashboard | Professional Edition | View all TCP connections on the Broker, unsynced replica details and node distribution of topics, and Top Ranking data for key metrics such as topic traffic, disk usage, and consumer group consumption speed. | Top Ranking of key metrics for analyzing production and consumption hot spots, analyzing disk usage, and optimizing business workloads. |
| Prometheus Monitoring | Professional Edition | Provides access through a standard open-source Prometheus exporter, including instance-level metrics, node-level metrics, and a range of metrics available in open-source Kafka. | An open-source compatible monitoring solution that integrates with your self-built O&M platform. |
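For integration with a self-built O&M platform, the sketch below shows one way to read metrics in the Prometheus text exposition format and aggregate them by label. It uses only the Python standard library. The metric name `demo_topic_bytes_in_total` and its labels are hypothetical placeholders, not actual CKafka metric names, and the parser handles only simple sample lines; a production setup would normally let a Prometheus server scrape the exporter instead.

```python
import re

# Hypothetical sample in the Prometheus text exposition format.
# Metric and label names are illustrative, not CKafka's actual names.
SAMPLE = """\
# HELP demo_topic_bytes_in_total Bytes produced to a topic (hypothetical).
# TYPE demo_topic_bytes_in_total counter
demo_topic_bytes_in_total{topic="orders",broker="1"} 1024
demo_topic_bytes_in_total{topic="orders",broker="2"} 2048
demo_topic_bytes_in_total{topic="logs",broker="1"} 512
"""

# name, optional {labels}, value, optional integer timestamp
LINE_RE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse sample lines into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # ignore lines this simplified parser cannot read
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def sum_by_label(samples, metric, label):
    """Sum the values of one metric, grouped by the given label."""
    totals = {}
    for name, labels, value in samples:
        if name == metric and label in labels:
            key = labels[label]
            totals[key] = totals.get(key, 0.0) + value
    return totals


per_topic = sum_by_label(parse_metrics(SAMPLE), "demo_topic_bytes_in_total", "topic")
```

Grouping node-level samples by topic in this way turns per-Broker values into a per-topic total, which is a common first step before feeding the data into your own dashboards.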
Alarm Capabilities
TDMQ for CKafka provides alarm capabilities based on TCOP. You can configure alarm rules for monitoring metrics on the platform. When a metric reaches the configured alarm threshold, you are notified by email, SMS, WeChat, or phone call, so that you can take preventive or remedial measures promptly. Properly configured alarm rules help improve the robustness and reliability of your applications.
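The idea behind a threshold-based alarm rule can be sketched as follows. This is an illustrative model only: the `AlarmRule` fields, the "consecutive periods" condition, and the metric name `disk_usage_percent` are assumptions made for the example, not the actual TCOP rule schema or API.

```python
import operator
from dataclasses import dataclass

# Supported comparison operators for the hypothetical rule model.
_OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


@dataclass
class AlarmRule:
    metric: str        # hypothetical metric name
    comparison: str    # one of the keys in _OPS
    threshold: float
    periods: int       # consecutive data points that must breach the threshold


def should_alarm(rule, datapoints):
    """Return True if the last `rule.periods` data points all breach the threshold."""
    if len(datapoints) < rule.periods:
        return False  # not enough data to evaluate the rule yet
    breaches = _OPS[rule.comparison]
    recent = datapoints[-rule.periods:]
    return all(breaches(value, rule.threshold) for value in recent)


# Hypothetical rule: disk usage at or above 80% for 3 consecutive periods.
rule = AlarmRule(metric="disk_usage_percent", comparison=">=", threshold=80.0, periods=3)

sustained = should_alarm(rule, [70.0, 82.0, 85.0, 90.0])
recovered = should_alarm(rule, [82.0, 85.0, 79.0])
```

Requiring several consecutive breaches instead of a single one is a common way to avoid notifications for short spikes.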
Event Records
The event center of TDMQ for CKafka supports centralized management, storage, analysis, and visualization of the operational events, diagnostic events, and Broker change events that occur while an instance is running, which simplifies later queries, audits, and tracing. It also supports event alarms: you can configure alarm rules for key events (such as node decommissioning and disk scale-out failures) on TCOP so that O&M personnel can respond promptly.
One-Click Diagnosis
TDMQ for CKafka Professional Edition supports one-click diagnosis. The feature proactively checks the cluster for existing and potential risks, provides resolution suggestions based on the experience of Tencent Cloud experts, and automatically summarizes the health check results into a diagnostic report. It extracts key information, locates issues, and offers professional recommendations, giving you a closed-loop O&M experience.