TDSQL-C for MySQL

Monitoring Alarm Best Practices

Last updated: 2025-11-26 11:17:44
The health and performance of a database directly affect application stability and user experience. A comprehensive monitoring system not only enables real-time detection and resolution of potential issues but also provides early warning of risks, offering solid data support for system optimization and resource planning. This document describes how to build a monitoring system for the read-only analysis engine and explains the key metrics to watch in three areas: performance, capacity, and synchronization.

Common Performance Evaluation Metrics

Metric 1: Analysis Engine Average Response Time

Average response time is a core metric for measuring engine performance, reflecting the average execution time of all SQL queries during the monitoring period. Abnormal fluctuations in this metric usually stem from the following scenarios:
Newly introduced resource-intensive SQL queries prolong the overall execution time.
Business traffic growth and a rising queries-per-second (QPS) rate increase processing latency.
The database system itself encounters an exception.
Monitoring recommendations:
Set the alarm threshold based on the highest execution latency observed during stable business operations (static threshold), or use a dynamic threshold tuned to your actual latency requirements (medium sensitivity is recommended, triggering on two or more consecutive abnormal data points).
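The dynamic-threshold recommendation above can be sketched as follows. This is a hypothetical illustration, not a TDSQL-C API: the window size, sensitivity multiplier, and the two-consecutive-point rule are assumptions mirroring the "medium sensitivity, two or more abnormal data points" advice.

```python
from statistics import mean, stdev

def anomalous_points(samples, window=10, sensitivity_k=2.0, min_consecutive=2):
    """Return True if >= min_consecutive consecutive samples exceed the
    rolling mean + sensitivity_k * stddev of the preceding window."""
    streak = 0
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        threshold = mean(baseline) + sensitivity_k * stdev(baseline)
        streak = streak + 1 if samples[i] > threshold else 0
        if streak >= min_consecutive:
            return True
    return False

# Stable latency (ms) followed by a sustained spike triggers the alarm.
latencies = [12, 11, 13, 12, 11, 12, 13, 11, 12, 12, 40, 45]
print(anomalous_points(latencies))  # True
```

A single outlier point does not fire the alarm; only a sustained breach does, which matches the intent of medium sensitivity.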

Metric 2: Analysis Engine QPS

QPS directly reflects the pressure scale of business requests and is a key metric to assess the instance processing capacity.
Monitoring recommendations:
Assess in advance the QPS capacity of the instance specification you use, and set alarms against that benchmark.
Correlate QPS with average response time: if QPS increases but response time remains stable, the current load is manageable; if both rise simultaneously, scale-out or query optimization may be required.
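The correlation rule above can be expressed as a small decision helper. This is a hypothetical sketch: the baseline values and the 20% growth threshold are illustrative assumptions, not product defaults.

```python
def assess_load(qps_now, rt_now_ms, qps_base, rt_base_ms, growth=0.2):
    """Classify load state by comparing current QPS and average response
    time against a baseline captured during stable operation."""
    qps_up = qps_now > qps_base * (1 + growth)
    rt_up = rt_now_ms > rt_base_ms * (1 + growth)
    if qps_up and rt_up:
        return "scale-out or SQL optimization may be required"
    if qps_up:
        return "load is growing but still manageable"
    if rt_up:
        return "latency regression without traffic growth: check slow SQL"
    return "normal"

print(assess_load(qps_now=1500, rt_now_ms=12, qps_base=1000, rt_base_ms=11))
# load is growing but still manageable
```

The fourth branch also catches the case where latency degrades without any traffic growth, which usually points at newly introduced slow SQL rather than capacity.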

Metric 3: Analysis Engine CPU Utilization

The analysis engine usually executes queries in a multi-threaded, parallel mode, so its CPU utilization is naturally high. This metric is therefore not recommended as a core performance indicator.
Monitoring recommendations:
In multi-node instances, monitor per-node CPU utilization to verify that load is balanced across nodes.
If CPU utilization stays above 90% across multiple data points, the system may be approaching its limit; watch for slow queries and response-time degradation.
If an instance has high CPU utilization despite no query load, data synchronization pressure is high and most resources are being consumed by synchronization. Consider traffic throttling or scale-out.
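The two CPU alarm conditions above (sustained >90%, and high CPU with no query load) can be sketched together. Metric shapes and the three-point window are illustrative assumptions, not monitoring-service defaults.

```python
def cpu_alert(cpu_points, qps_points, high=90.0, min_points=3):
    """Evaluate the last min_points samples of CPU utilization (%) and QPS."""
    sustained = all(c > high for c in cpu_points[-min_points:])
    idle = all(q == 0 for q in qps_points[-min_points:])
    if sustained and idle:
        return "high sync pressure: consider throttling or scale-out"
    if sustained:
        return "near capacity: watch slow queries and response time"
    return "ok"

print(cpu_alert([95, 96, 97], [0, 0, 0]))        # high sync pressure: consider throttling or scale-out
print(cpu_alert([95, 96, 97], [800, 900, 850]))  # near capacity: watch slow queries and response time
```

Requiring every point in the window to breach the threshold filters out the short CPU bursts that are normal for a parallel analysis engine.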

Metric 4: Size of the Result Set Returned by the Analysis Engine

This metric reflects the data volume returned per query. An excessively large result set may cause the client to experience receive latency or even out-of-memory (OOM) issues.
Monitoring recommendations:
If the result set size increases abnormally, check whether pagination is missing or the query logic is unoptimized.
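One common fix for oversized result sets is paginating large scans instead of fetching everything at once. The sketch below generates keyset-paginated SQL (id-based rather than deep OFFSET, which stays cheap on large tables); the table and column names are illustrative assumptions.

```python
def keyset_pages(last_id=0, page_size=1000):
    """Yield SQL statements that walk a table in fixed-size pages,
    keyed on a monotonically increasing id column."""
    while True:
        yield (f"SELECT id, payload FROM events "
               f"WHERE id > {last_id} ORDER BY id LIMIT {page_size}")
        # In a real client, last_id would be taken from the last row
        # actually returned; advancing by page_size is a simplification.
        last_id += page_size

pages = keyset_pages()
print(next(pages))
# SELECT id, payload FROM events WHERE id > 0 ORDER BY id LIMIT 1000
```

Each page then returns a bounded result set, so the client never has to buffer an unbounded response.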

Metric 5: Analysis Engine Memory Utilization

Memory usage consists mainly of the Block Cache (configurable) and runtime memory (Runtime Mem).
Monitoring recommendations:
Block Cache usage is usually stable. A sudden increase in Runtime Mem indicates an oversized intermediate result set for some SQL query; optimize the query or adjust the cache policy.
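Because Block Cache is stable, the signal worth alarming on is a sudden jump in Runtime Mem rather than its absolute value. A hypothetical spike detector, with the 50% jump ratio as an illustrative assumption:

```python
def runtime_mem_spike(samples_mb, jump_ratio=0.5):
    """Return the index of the first sample that jumps more than jump_ratio
    over its predecessor, or -1 if memory growth stays gradual."""
    for i in range(1, len(samples_mb)):
        if samples_mb[i] > samples_mb[i - 1] * (1 + jump_ratio):
            return i
    return -1

print(runtime_mem_spike([800, 820, 810, 1900, 1950]))  # 3
```

A hit points at the SQL running around that sample; gradual growth, by contrast, is usually just the cache warming up.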

Capacity Evaluation Metrics

Metric 1: Analysis Engine Storage Utilization/Usage

Disk space is pre-allocated. When utilization exceeds 90%, a protection mechanism is triggered: data synchronization is disabled and only read operations are allowed.
Monitoring recommendations:
Set the utilization alarm threshold at 80% and plan scale-out in advance to avoid business interruption.
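The two thresholds above stage naturally: alarm at 80% so that scale-out can be planned before the 90% protection mechanism makes the engine read-only. A minimal sketch of that staging:

```python
def storage_action(used_gb, allocated_gb):
    """Map storage utilization to the staged responses described above."""
    utilization = used_gb / allocated_gb * 100
    if utilization >= 90:
        return "protection triggered: sync disabled, read-only"
    if utilization >= 80:
        return "alarm: plan scale-out now"
    return "ok"

print(storage_action(used_gb=850, allocated_gb=1000))  # alarm: plan scale-out now
```

The 10-point gap between the alarm and the protection threshold is the budget for completing a scale-out without interruption.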

Synchronization Evaluation Metrics

Metric 1: Analysis Engine Data Latency

This metric monitors data synchronization latency between row-based nodes and columnar nodes and is critical for ensuring real-time data consistency.
Monitoring recommendations:
If latency becomes abnormal, promptly check for network, load, or synchronization-link failures.
