Scenarios
TDMQ for RabbitMQ provides multiple monitoring metrics that show the running status of cluster resources, and it lets you configure alarm rules for key metrics. Based on the alarm rules you create, the system periodically compares each monitored metric against its specified threshold. If a metric reaches the preset alarm threshold, Tencent Cloud Observability Platform (TCOP) notifies you by email, Short Message Service (SMS), WeChat, or phone call so that you can take preventive or remedial action promptly. Properly configured alarm rules help improve application robustness and reliability.
Alarm Configuration Recommendations
This section introduces key metrics that require special attention when you use TDMQ for RabbitMQ. It also provides alarm configuration recommendations. You can configure alarm rules based on your business requirements.
Metric | Dimension | Recommended Alarm Configuration | Alarm Handling Suggestion |
Cluster Consumption TPS | Cluster | Set the statistical granularity to 1 minute. If the cluster consumption TPS exceeds 80% of the consumption traffic throttling value for 5 consecutive data points, trigger an alarm every 30 minutes. | Check whether the cluster consumption TPS is about to exceed the upper limit of the currently purchased/configured specification. If the upper limit is exceeded, traffic throttling is triggered. If the TPS is close to the limit, consider upgrading the specification. |
Cluster Production TPS | Cluster | Set the statistical granularity to 1 minute. If the cluster production TPS exceeds 80% of the production traffic throttling value for 5 consecutive data points, trigger an alarm every 30 minutes. | Check whether the cluster production TPS is about to exceed the upper limit of the currently purchased/configured specification. If the upper limit is exceeded, traffic throttling is triggered. If the TPS is close to the limit, consider upgrading the specification. |
Number of Online Connections | Cluster | Set the statistical granularity to 1 minute. If the number of online connections exceeds 8,000 for 5 consecutive data points, trigger an alarm every 30 minutes. | The number of connections between clients and servers reflects the stability and performance of the cluster. If the number of connections is too large, reduce the number of clients or the number of connections each client opens. |
Number of Backlogged Messages | Cluster | Set the statistical granularity to 1 minute. If the value of Message Backlog Count exceeds the expected number of backlogged messages for 5 consecutive data points, trigger an alarm every 30 minutes. | The number of backlogged messages exceeding the expected value may indicate potential business risks. When the number of backlogged messages becomes excessive, it is recommended to scale out consumer instances, increase consumption concurrency, and check for failed instances or delayed message consumption. |
node_rabbitmq_disk_usage (%) | Node | Set the statistical granularity to 1 minute. If disk utilization exceeds 80% for 5 consecutive data points, trigger an alarm every 30 minutes. | If disk utilization is too high, nodes may lack sufficient disk space to handle assigned messages, causing message persistence to fail. It is recommended to clean up data or scale out the cluster promptly when average disk utilization exceeds 80%. |
node_rabbitmq_mem_usage (%) | Node | Set the statistical granularity to 1 minute. If memory utilization exceeds 50% for 5 consecutive data points, trigger an alarm every 30 minutes. | High memory utilization can block message production. When memory utilization exceeds 50%, speed up consumption, apply traffic control to production, or scale out clusters. |
node_rabbitmq_cpu_usage (%) | Node | Set the statistical granularity to 1 minute. If CPU utilization exceeds 70% for 5 consecutive data points, trigger an alarm every 30 minutes. | High CPU utilization can affect the message production speed. When CPU utilization exceeds 70%, scale out clusters promptly. |
Number of Backlogged Messages | Node | Set the statistical granularity to 1 minute. If the value of Message Backlog Count exceeds the expected number of backlogged messages for 5 consecutive data points, trigger an alarm every 30 minutes. | Excessive backlogged messages cause rapid disk utilization increases on broker nodes, preventing them from accepting more messages. Scaling out the cluster is required. |
Node Liveness Status | Node | Set the statistical granularity to 1 minute. If the value of Node Liveness Status is 1 for 3 consecutive data points, trigger an alarm every 15 minutes. | Node liveness exceptions or downtime can cause message loss, especially when persistence or mirrored queue features are disabled. This also increases the load on remaining nodes, potentially degrading overall cluster performance. It is recommended to check the cause in combination with other metrics and alarm information. |
Number of Consumers | Queue | Set the statistical granularity to 5 minutes. If the number of consumers decreases by 10% or more compared to the previous five minutes for 1 data point, trigger an alarm every 30 minutes. | A sudden drop in the number of consumers may indicate that business clients disconnected and failed to reconnect. It is recommended to check for failed consumer instances. |
Number of Backlogged Messages | Queue | Set the statistical granularity to 1 minute. If the value of Message Backlog Count exceeds the expected number of backlogged messages for 5 consecutive data points, trigger an alarm every 30 minutes. | The number of backlogged messages exceeding the expected value may indicate potential business risks. When the number of backlogged messages becomes excessive, it is recommended to scale out consumer instances, increase consumption concurrency, and check for failed instances or delayed message consumption. |
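The threshold arithmetic behind the TPS rows above can be sketched in a few lines of Python. This is an illustration only: the function name tps_alarm_threshold, the FIXED_THRESHOLDS dictionary, and the sample throttling value of 1,000 TPS are assumptions made for this example and are not part of any TDMQ or TCOP API. Use the throttling value of your own purchased specification.

```python
# Hypothetical helper for illustration; not part of any TDMQ or TCOP SDK.

# Fixed thresholds taken from the recommendation table above.
FIXED_THRESHOLDS = {
    "online_connections": 8000,      # count
    "node_rabbitmq_disk_usage": 80,  # percent
    "node_rabbitmq_mem_usage": 50,   # percent
    "node_rabbitmq_cpu_usage": 70,   # percent
}


def tps_alarm_threshold(throttle_tps: float, percent: int = 80) -> float:
    """Return the TPS value above which the recommended alarm should fire.

    throttle_tps is the traffic throttling value of the purchased
    specification; percent defaults to the 80% recommended in the table.
    """
    if throttle_tps <= 0:
        raise ValueError("throttle_tps must be positive")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range (0, 100]")
    return throttle_tps * percent / 100


if __name__ == "__main__":
    # Assumed example: a specification throttled at 1,000 TPS.
    print(tps_alarm_threshold(1000))
```

With an assumed throttling value of 1,000 TPS, the alarm threshold to enter in the console would be 800 TPS.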
Going to the Alarm Configuration Page
Entry 1: Log in to the TDMQ for RabbitMQ console. In the Cluster List, choose More > Configure Alarm in the Operation column of the target cluster. You are redirected to the alarm configuration page, where the alarm object is set to the current cluster by default.
Entry 2: Log in to the TDMQ for RabbitMQ console. On the Monitoring Dashboard page, click Configure Alarms in the upper-right corner of the monitoring chart. You are redirected to the alarm configuration page, where the alarm object is set to the monitoring metric displayed in the monitoring chart before redirection.
Entry 3: Log in to the Tencent Cloud Observability Platform (TCOP) console and go to the Alarm Configuration page to view all preset and custom policies. Click Create Policy, set the monitoring type to Cloud Product Monitoring, and set the policy type to TDMQ. For Managed Edition, select RabbitMQ_PRO; for Serverless Edition, select RabbitMQ Serverless.
Configuring Alarm Policies
TDMQ for RabbitMQ Managed Edition has preconfigured comprehensive default alarm policies in the node dimension. You can view and configure these policies in the TCOP console, and flexibly adjust them based on actual needs. Currently, Serverless Edition does not support default alarm policies. Stay tuned for updates.
1. After going to the alarm policy configuration page, set the policy type to TDMQ/RabbitMQ Dedicated Edition/Node.
2. Select the object for which you want to configure the alarm.
Tag: You can batch filter resources by tag to configure alarm policies for them.
Alarm Object: Select the RabbitMQ resources for which you want to configure alarm policies.
3. For Trigger Condition, keep the default setting, Configure manually. Select Use Preset Trigger Conditions to display the system-preset alarm trigger conditions (as shown in the figure below). You can quickly configure alarm rules based on these default policies.
4. Click Next step: Configure Alarm Notification to configure alarm recipients.
1. After you select the policy type on the alarm policy configuration page, the policies available for the current resource type are automatically displayed. You can customize alarm policies based on default alarm policies and your actual business requirements.
Note:
The monitoring policies of Managed Edition clusters cover multiple dimensions: cluster, node, vhost, exchange, and queue.
Due to optimizations in the underlying technical architecture, legacy alarm metrics for Managed Edition will be deprecated. We recommend that you use the new metrics to configure alarm rules. Switching to the new metrics does not affect your business.
The monitoring policies of Serverless Edition clusters cover multiple dimensions: cluster, vhost, queue, and exchange.
2. Select the object for which you want to configure the alarm.
Tag: You can batch filter resources by tag to configure alarm policies for them.
Alarm Object: Select the RabbitMQ resources for which you want to configure alarm policies.
3. Set alarm trigger conditions. Two options are supported: Select Template and Configure manually. By default, Configure manually is selected.
Metric: For example, disk utilization. If you set the statistical granularity to 1 minute, disk utilization is evaluated once per minute, and an alarm is triggered when it exceeds the threshold for N consecutive data points.
Alarm Frequency: For example, "alarm triggered once every 30 minutes" means that if a metric exceeds the threshold in multiple consecutive statistical periods, only one alarm is sent within a 30-minute window. Another alarm is sent only if the metric exceeds the threshold again in the next 30-minute window.
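The interaction between statistical granularity, consecutive data points, and alarm frequency can be sketched as a small simulation. This is a simplified model written for illustration, not TCOP's actual evaluation logic: the function name alarm_minutes and its parameters are hypothetical, each list element stands for one 1-minute data point, and recovery notifications are not modeled.

```python
from typing import List, Optional


def alarm_minutes(
    samples: List[float],
    threshold: float,
    consecutive: int = 5,
    repeat_minutes: int = 30,
) -> List[int]:
    """Return the minute indexes at which an alarm would be sent.

    samples[i] is the metric value for minute i (1-minute granularity).
    An alarm fires once `consecutive` data points in a row exceed
    `threshold`, and is repeated at most once every `repeat_minutes`
    while the condition keeps holding.
    """
    fired: List[int] = []
    streak = 0                        # consecutive points above threshold
    last_fired: Optional[int] = None  # minute of the most recent alarm

    for minute, value in enumerate(samples):
        # Extend the streak on a breach, reset it on a normal point.
        streak = streak + 1 if value > threshold else 0

        if streak >= consecutive:
            # Suppress repeats inside the alarm-frequency window.
            if last_fired is None or minute - last_fired >= repeat_minutes:
                fired.append(minute)
                last_fired = minute

    return fired


if __name__ == "__main__":
    # Assumed example: disk utilization stays at 85% for 40 minutes,
    # with an 80% threshold, 5 consecutive points, 30-minute frequency.
    print(alarm_minutes([85.0] * 40, threshold=80.0))
```

In this model, a metric that stays above the threshold for 40 minutes produces alarms at minute indexes 4 and 34: the first after five consecutive breaching data points, the second 30 minutes later.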
1. Check Select Template. Then, click New Trigger Condition Template to redirect to the trigger condition template setup page.
2. In the upper-left corner, click New Trigger Condition Template. On the new template page, configure the alarm policy.
Policy Type: Select TDMQ, then select RabbitMQ_PRO for Managed Edition, and select RabbitMQServerless for Serverless Edition.
Trigger Condition: Set the alarm policy according to the alarm policy recommendations or your actual business requirements.
3. After confirmation, click Save. Return to the Create Alarm Policy page and click Refresh. The alarm policy template that you configured will be displayed.
4. Click Next: Configure Alarm Notification to configure alarm recipients.
Configuring an Alarm Notification
On the alarm notification configuration page, you can select a system preset notification template. In normal cases, the alarm recipient of a preset template is the owner of the root account. If you need to notify the instance owner or other personnel, you can also click Add Notification Template to create a notification template and set alarm recipients and alarm receiving channels.