Resource Dimension | Metric | Recommended Alarm Configuration | Description |
Cluster | rocketmq_namespace_consumer_lag_messages(Count) | Set the statistical period to 1 minute. If the number of backlogged messages in 1 minute is greater than 1000 for 3 consecutive data points, trigger an alarm. | Excessive backlogged messages cause a rapid disk utilization increase. As a result, no more messages can be received, and the service stops. Scale-out is required. |
| rocketmq_namespace_expense_pull_limit_tps(Count/s) | Set the statistical period to 1 minute. If the throttled consumption TPS in 1 minute is greater than 0 for 3 consecutive data points, trigger an alarm. | A throttled consumption TPS above 0 indicates that the cluster TPS has exceeded the purchased traffic upper limit. Use this to decide whether a specification upgrade is needed. |
| rocketmq_namespace_expense_send_limit_tps(Count/s) | Set the statistical period to 1 minute. If the throttled sending TPS in 1 minute is greater than 0 for 3 consecutive data points, trigger an alarm. | A throttled sending TPS above 0 indicates that the cluster TPS has exceeded the purchased traffic upper limit. Use this to decide whether a specification upgrade is needed. |
| rocketmq5_public_network_in_drop_bits(Bit/s) | Set the statistical period to 1 minute. If the discarded public network inbound bandwidth in 1 minute is greater than 0 bit/s for 3 consecutive data points, trigger an alarm. | When the inbound traffic exceeds the public network bandwidth upper limit of the cluster, the excess traffic is discarded. This indicates that the public network bandwidth may not meet business requirements, and scale-out is required. |
| rocketmq5_public_network_out_drop_bits(Bit/s) | Set the statistical period to 1 minute. If the discarded public network outbound bandwidth in 1 minute is greater than 0 bit/s for 3 consecutive data points, trigger an alarm. | When the outbound traffic exceeds the public network bandwidth upper limit of the cluster, the excess traffic is discarded. This indicates that the public network bandwidth may not meet business requirements, and scale-out is required. |
Topic | rocketmq_msg_backlog | Set the statistical period to 1 minute. If the number of backlogged messages in 1 minute is greater than 1000 for 3 consecutive data points, trigger an alarm. | Excessive backlogged messages cause a rapid disk utilization increase. As a result, no more messages can be received, and the service stops. Scale-out is required. |
Group | rocketmq_group_consumer_lag_messages | Set the statistical period to 1 minute. If the number of backlogged messages in 1 minute is greater than 1000 for 3 consecutive data points, trigger an alarm. | Excessive backlogged messages cause a rapid disk utilization increase. As a result, no more messages can be received, and the service stops. Scale-out is required. |
| rocketmq_topic_group_group_diff | Set the statistical period to 1 minute. If the consumption processing lag time in 1 minute is greater than 1 s for 3 consecutive data points, trigger an alarm. | The consumption processing lag time reflects how promptly consumer clients consume messages. An excessive lag time indicates that consumers are failing to consume messages or have hit a performance bottleneck. |
| Dead letter message TPS | Set the statistical period to 1 minute. If the dead letter message TPS in 1 minute is greater than 0 for 3 consecutive data points, trigger an alarm. | Number of new dead letter messages per second. Dead letter messages are messages that still fail to be consumed after the maximum number of retries is reached. A value above 0 indicates that consumers cannot consume messages or that the business logic is failing. |
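
Every rule in the table follows the same pattern: sample the metric once per statistical period and trigger when the value stays above a threshold for 3 consecutive data points. The following is a minimal sketch of that evaluation logic for local reasoning or testing. The function name `should_alarm` and the sample values are hypothetical and are not part of any monitoring API; the actual alarm evaluation is performed by the monitoring service.

```python
from typing import Iterable


def should_alarm(points: Iterable[float], threshold: float, consecutive: int = 3) -> bool:
    """Return True once `consecutive` data points in a row exceed `threshold`.

    `points` is assumed to hold one value per statistical period (1 minute
    in the table above), ordered from oldest to newest.
    """
    streak = 0
    for value in points:
        # A data point at or below the threshold resets the streak.
        streak = streak + 1 if value > threshold else 0
        if streak >= consecutive:
            return True
    return False


# Hypothetical per-minute backlog samples, checked against the 1000-message rule.
backlog_per_minute = [400, 1200, 1500, 900, 1100, 1300, 1600]
print(should_alarm(backlog_per_minute, threshold=1000))
```

Requiring several consecutive data points, rather than a single one, keeps a brief spike from triggering an alarm. For the throttling and dead letter rules, the same check applies with `threshold=0`.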




