
High Latency in Client Message Production
Last updated: 2025-09-29 14:51:53

Issue Description

The client observes increased latency when the Producer sends messages:
Message write speed slows down and sending latency increases.
CPU utilization is high.
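To confirm the symptom on the client side, here is a minimal sketch (assuming a placeholder broker address and a hypothetical topic named test-topic) that times each send through the producer callback; steadily rising elapsed times indicate growing production latency.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class SendLatencyProbe {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092"); // placeholder address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 100; i++) {
                final long start = System.currentTimeMillis();
                producer.send(new ProducerRecord<>("test-topic", "msg-" + i), (metadata, exception) -> {
                    // The callback fires when the broker acknowledges the write (or the send fails),
                    // so the elapsed time approximates end-to-end production latency.
                    long elapsedMs = System.currentTimeMillis() - start;
                    if (exception == null) {
                        System.out.printf("partition=%d offset=%d latency=%dms%n",
                                metadata.partition(), metadata.offset(), elapsedMs);
                    } else {
                        exception.printStackTrace();
                    }
                });
            }
            producer.flush();
        }
    }
}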

Troubleshooting Steps

Step 1: Traffic Throttling Check

Check whether a single Topic is rate limited: if a throttling value is configured for the Topic, it can cap the message sending rate.
Optimization suggestion: increase the Topic throttling value (based on actual business needs).

Check whether the instance itself is throttled. If instance-level throttling occurs, increase the instance bandwidth.
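As a client-side cross-check, the sketch below (a hypothetical helper, assuming throttling is applied through the standard Kafka quota mechanism and is therefore visible in the producer's throttle-time metrics) scans the producer's built-in metrics; persistently non-zero values suggest produce requests are being throttled. If throttling is enforced only at the platform level, rely on the console monitoring instead.

import java.util.Map;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;

public final class ThrottleCheck {
    // Prints any throttle-related metrics exposed by an existing producer instance.
    public static void printThrottleMetrics(KafkaProducer<?, ?> producer) {
        for (Map.Entry<MetricName, ? extends Metric> entry : producer.metrics().entrySet()) {
            MetricName name = entry.getKey();
            if (name.name().contains("throttle-time")) {
                System.out.println(name.group() + "/" + name.name() + " = " + entry.getValue().metricValue());
            }
        }
    }
}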


Step 2: Cluster Performance Check

Check the cluster CPU load. If the cluster as a whole is under high load, sending latency rises as a direct result.
Optimization suggestion: scale out the cluster.

Step 3: Client Parameter Optimization

Set batch parameters reasonably to reduce fragmented requests and improve delivery performance. Small batches cause the client to issue requests frequently, which increases server-side queue pressure and further raises latency and CPU consumption.
Optimization Suggestions:
Set acks, batch.size, and linger.ms reasonably and adjust them according to business needs. Recommended values (a configuration sketch follows the detailed explanation below):
acks=1
batch.size=16384
linger.ms=1000
Detailed explanation is as follows:
1. acks parameter adjustment
The acks parameter controls how the Producer waits for acknowledgment after sending messages:
acks=0: No need for server confirmation, high performance, but high risk of data loss.
acks=1: The primary node returns confirmation once written successfully, medium performance, suitable for most scenarios.
acks=all (or -1): Confirmation is returned only after the primary node and all in-sync replicas have been written successfully, which gives the highest data reliability but the lowest performance.
Optimization Suggestions:
To improve delivery performance, setting acks=1 is recommended.
2. Adjust batch sending parameters
Batch sending reduces the number of requests and improves network throughput.
batch.size: the total size of messages cached per partition (unit: bytes). When the cached size reaches this value, a batch send is triggered.
linger.ms: the maximum time a message waits in the cache. Once this time is exceeded, the batch is sent immediately even if batch.size has not been reached.
Recommended Configuration:
batch.size=16384 (default value: 16KB).
linger.ms=1000 (default value: 0, meaning send immediately without waiting).
3. Adjust cache size
buffer.memory: controls the total size of memory used to cache messages. When the cache limit is exceeded, sending is forced, ignoring the batch.size and linger.ms limits.
The default value is 32 MB, which generally ensures sufficient performance for a single Producer.
Recommended configuration: buffer.memory ≥ batch.size * number of partitions * 2.
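For example, with the recommended batch.size=16384 and a Topic of 100 partitions (an assumed figure for illustration), buffer.memory should be at least 16384 × 100 × 2 ≈ 3.2 MB, well within the 32 MB default.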
Note: If you start multiple Producers in the same JVM, set buffer.memory with caution, because the buffers add up and can cause OOM issues.
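Putting the recommendations above together, the following is a minimal sketch of a producer configuration in Java; the broker address, topic name, and serializers are placeholders for illustration.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class TunedProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Recommended values from this document; adjust to actual business needs.
        props.put(ProducerConfig.ACKS_CONFIG, "1");                 // leader acknowledgment only
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384);         // 16 KB batch per partition
        props.put(ProducerConfig.LINGER_MS_CONFIG, 1000);           // wait up to 1 s to fill a batch
        props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 33554432L);  // 32 MB total buffer (default)

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("test-topic", "key", "value")); // hypothetical topic
            producer.flush();
        }
    }
}

Note that linger.ms=1000 deliberately lets records wait briefly so that batches fill up; this trades a small per-record wait for fewer, larger requests, which reduces broker queue pressure and CPU consumption and therefore overall latency under load.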

Key Issues

1. Causes of high CPU load
High Kafka CPU load usually comes from the following aspects:
Production and consumption request processing:
Producers and consumers with high I/O throughput can occupy significant CPU.
Data replication:
The replica synchronization process performs data copy operations.
High-frequency control requests:
High-frequency operations such as Offset management and Metadata queries occupy Broker CPU resources.
2. How to handle high cluster load
Common symptoms of high load include:
Message sending latency increases.
Request queue depth grows and remains persistently high.
Broker CPU or disk I/O remains under high load.
Solution:
2.1 Scale-out:
Increase the number of Broker nodes to spread partition load and reduce the pressure on individual brokers.
Optimize partition distribution to ensure partitions are evenly distributed across all brokers.
2.2 Traffic throttling:
Set a reasonable production traffic throttling value to avoid traffic surge on the production side causing Broker overload.
3. How to optimize the client for low latency when the cluster is under high load
When the Kafka cluster cannot be scaled out and the workload cannot be effectively reduced, you can optimize the client configuration to keep latency as low as possible.
Client parameter optimization
Set acks, batch.size, and linger.ms reasonably and adjust them according to business needs (see the configuration sketch in Step 3 above). Recommended values:
acks=1
batch.size=16384
linger.ms=1000

Summary

With the troubleshooting steps and parameter optimizations above, high Kafka cluster load and message production latency can be effectively resolved.
For high cluster load, scaling out is the most direct solution.
Optimizing client parameters, batch sending, and buffer size can significantly improve performance.
If the problem remains unresolved, you can contact Online Customer Service.



