High CPU Load on CKafka Broker Nodes

Last updated: 2026-03-31 23:00:18

Background

Message middleware often plays a critical role in distributed systems. However, in actual production environments, various factors can lead to high CPU load on Broker nodes. Here are some common scenarios:
High message throughput: If a topic or partition in a CKafka cluster receives very high message throughput, the Broker nodes need to handle a large number of read and write operations.
Large number of consumer groups: If a large number of consumer groups subscribe to the same topic or partition, the Broker nodes need to handle message distribution and management for each consumer group.
Replication and synchronization: If the data replication and synchronization feature is enabled in the CKafka cluster, the Broker nodes need to handle replicated read and write operations and synchronize with other Broker nodes.
Compression and decompression: If messages are stored in compressed format, the Broker nodes need to compress and decompress them, which may consume a significant amount of CPU resources.
Index and log compression: CKafka uses indexes to accelerate message lookup. If the index volume is too large or needs to be compressed, the Broker nodes need to maintain and compress the indexes.
High concurrent connections: If a large number of producers and consumers connect to the Broker nodes at the same time, the Broker nodes need to establish and maintain each connection, which increases CPU load.
When Broker nodes are under high CPU load, the following issues may occur:
Increased latency: High CPU load may slow down message processing, increasing message transmission and processing latency. Consumers then read messages from CKafka more slowly and may not receive the latest messages in a timely manner.
Decreased throughput: Because CPU resources are consumed by high-load tasks, CKafka Broker nodes may be unable to process additional messages, resulting in a decrease in overall throughput. This slows both message sending by producers and message consumption by consumers.
Network congestion: High CPU load may prevent CKafka Broker nodes from processing network requests promptly, leading to network congestion. This affects data replication and synchronization with other Broker nodes, potentially causing increased replication latency or untimely data synchronization.
Increased response time: Due to high CPU load, CKafka Broker nodes may fail to respond promptly to client requests, resulting in increased wait time for clients. This affects the performance and response time of applications accessing the CKafka cluster.
To help prevent these issues, TSA-Chaotic Fault Generator (TSA-CFG) provides a high CPU load experiment action for CKafka Broker nodes. The action tests how business systems respond to and recover from unexpected situations, such as latency caused by high load on CKafka Broker nodes, thereby enhancing the security and stability of the business.
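The latency impact described above can be quantified on the client side by comparing latency percentiles from before and during fault injection. The following is a minimal sketch and is not part of TSA-CFG: the function names and the sample values are hypothetical, and the latency samples are assumed to come from your own producer or consumer instrumentation.

```python
import math


def percentile(samples, pct):
    """Return the nearest-rank percentile (0 < pct <= 100) of a non-empty list."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # nearest-rank method, 1-based
    return ordered[rank - 1]


def latency_increase_ratio(baseline_ms, fault_ms, pct=99):
    """Ratio of the fault-window percentile to the baseline percentile."""
    base = percentile(baseline_ms, pct)
    if base <= 0:
        raise ValueError("baseline percentile must be positive")
    return percentile(fault_ms, pct) / base


# Hypothetical end-to-end latencies in milliseconds (illustrative values only).
baseline = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]
during_fault = [30, 45, 28, 60, 35, 52, 41, 38, 47, 33]
ratio = latency_increase_ratio(baseline, during_fault, pct=90)
print(f"p90 latency grew {ratio:.2f}x under injected CPU load")
```

A ratio well above 1 during the experiment indicates that the business system is sensitive to Broker CPU pressure; what threshold counts as acceptable depends on your own service-level targets.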

Must-Knows

Instance type: This action supports fault injection only on CKafka Professional Edition instances. CKafka Standard Edition instances are not currently supported for experiments.
Instance status (optional): It is recommended that instances undergoing experiments have active message production and consumption traffic and more than 3 topic partitions. This makes the impact of the fault on the business easier to observe.
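The two points above can be expressed as a small pre-check to run before creating an experiment. This is an illustrative sketch only: the `InstanceInfo` class and its fields are hypothetical and do not correspond to any TSA or CKafka API, so the values would have to be filled in from your own inventory or console data.

```python
from dataclasses import dataclass


@dataclass
class InstanceInfo:
    """Hypothetical summary of a CKafka instance; field names are illustrative."""
    edition: str              # e.g. "professional" or "standard"
    topic_partitions: int     # total topic partitions on the instance
    has_active_traffic: bool  # True if messages are being produced and consumed


def precheck(instance):
    """Return (ok, messages); ok is False only when the hard requirement fails."""
    messages = []
    ok = instance.edition.lower() == "professional"
    if not ok:
        messages.append("Only Professional Edition instances support this action.")
    # The remaining checks are recommendations, so they do not change `ok`.
    if instance.topic_partitions <= 3:
        messages.append("Recommended: more than 3 topic partitions.")
    if not instance.has_active_traffic:
        messages.append("Recommended: active production and consumption traffic.")
    return ok, messages


ok, messages = precheck(InstanceInfo("professional", 3, False))
print(ok, messages)
```

The edition check is treated as blocking because the action is unavailable on Standard Edition, while the traffic and partition checks only produce advisory messages.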

Experiment Preparation

Prepare a CKafka Professional Edition instance available for experiments.

Step 1: Creating an Experiment

1. Log in to the Tencent Cloud Smart Advisor (TSA) console, choose Architecture Governance, select Governance Mode, and click CFG. (For details about how to create an experiment, see Using TSA to Execute a Chaos Experiment on CFG.)
2. Click Create Experiment, enter the basic information about the experiment, and click Next.
3. Choose Middleware > Ckafka from the Experiment Instance drop-down list, click Add via Search, and add instance resources. Alternatively, click Add via Architecture Diagram, click the Ckafka resources on the architecture diagram, select the required instance, and add it.
4. After the instance is added, click Add Action, select Broker High CPU Load as the experiment action, and click Next.
5. Set action parameters. For example, select a CPU load rate of 80% and a duration of 200s, and then click OK.
6. After completing the parameter configuration, set Execution Mode and Guardrail Policy, and add metrics under Observability Metrics in the Global Configuration section. Then click Submit to create the experiment.
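If you keep experiment plans under version control alongside the console configuration, the action parameters from step 5 can be recorded and sanity-checked in code. The sketch below is an assumption-laden illustration: the `BrokerCpuLoadParams` class is hypothetical, and the validation bounds are generic sanity checks rather than the limits enforced by the console.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BrokerCpuLoadParams:
    """Hypothetical record of the Broker High CPU Load action parameters."""
    cpu_load_percent: int   # target CPU load rate, e.g. 80
    duration_seconds: int   # how long the load is applied, e.g. 200

    def __post_init__(self):
        # Generic sanity bounds (assumed); the console may enforce other limits.
        if not 0 < self.cpu_load_percent <= 100:
            raise ValueError("cpu_load_percent must be in (0, 100]")
        if self.duration_seconds <= 0:
            raise ValueError("duration_seconds must be positive")


# The example values used in step 5.
params = BrokerCpuLoadParams(cpu_load_percent=80, duration_seconds=200)
print(params)
```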

Step 2: Executing the Experiment

1. Observe the instance monitoring data before the experiment. You can go to the TDMQ for CKafka console to view the monitoring metrics in Advanced Monitoring.
2. Go to the experiment details panel, and click Execute in the fault action group or Start Experiment in the lower part of the panel to inject a fault.
3. During fault injection, you can click the link in the log to go to Advanced Monitoring for observation.
4. Observe that the CPU utilization has reached the set value.
5. After fault injection is complete, click Recovery Action to recover from the injected fault.
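The observations in steps 4 and 5 can also be checked against CPU utilization samples exported from monitoring. The following is a minimal sketch under stated assumptions: the function names, the 5-point tolerance, the three-sample streak, and the sample values are all hypothetical, and the samples are assumed to be CPU utilization percentages that you have collected yourself.

```python
def reached_target(samples, target_percent, tolerance=5.0, min_consecutive=3):
    """True if min_consecutive samples in a row are at or above target - tolerance."""
    streak = 0
    for value in samples:
        if value >= target_percent - tolerance:
            streak += 1
            if streak >= min_consecutive:
                return True
        else:
            streak = 0  # a dip below the threshold resets the streak
    return False


def recovered(samples, baseline_percent, tolerance=5.0):
    """True if the most recent sample is back within tolerance of the baseline."""
    return bool(samples) and samples[-1] <= baseline_percent + tolerance


# Hypothetical CPU utilization samples (percent), for illustration only.
cpu_during_fault = [22, 41, 68, 79, 81, 80, 82]
cpu_after_recovery = [80, 55, 30, 24]

print(reached_target(cpu_during_fault, target_percent=80))
print(recovered(cpu_after_recovery, baseline_percent=22))
```

Requiring several consecutive samples avoids treating a single spike as proof that the injected load took effect.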

