How do I check whether data is backing up during a dump?
CKafka data dump refers to the process of delivering CKafka data to downstream systems such as ES and ClickHouse.
The sync service consumes messages from the CKafka instance, so a corresponding consumer group is created, which you can view on the consumer group management page in the console. This consumer group is generally named datahub-task-xxx. After the sync service consumes messages, it writes them to the dump target and then commits the offset corresponding to the number of messages written.
Therefore, to determine whether dumped data is backing up, check whether the number of unconsumed messages in this consumer group keeps increasing.
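The check described above can be sketched in a few lines of Python. This is only an illustration, not part of the CKafka console or API: the function names total_lag and is_backlog_growing are hypothetical, the offsets are assumed to have been collected by you (for example, by reading them off the consumer group management page at regular intervals), and a partition with no committed offset is assumed to start from offset 0.

```python
from typing import Dict, List


def total_lag(end_offsets: Dict[int, int], committed_offsets: Dict[int, int]) -> int:
    """Return the number of unconsumed messages summed over all partitions.

    end_offsets maps partition ID to the latest offset in that partition.
    committed_offsets maps partition ID to the offset committed by the group.
    Assumption: a partition with no committed offset is treated as offset 0.
    """
    lag = 0
    for partition, end in end_offsets.items():
        committed = committed_offsets.get(partition, 0)
        # Clamp at zero in case the two readings were taken at slightly different times.
        lag += max(end - committed, 0)
    return lag


def is_backlog_growing(lag_samples: List[int]) -> bool:
    """Return True if each lag sample is strictly larger than the previous one."""
    if len(lag_samples) < 2:
        return False
    return all(later > earlier for earlier, later in zip(lag_samples, lag_samples[1:]))


# Three readings of a two-partition topic, taken at regular intervals.
samples = [
    total_lag({0: 1000, 1: 1000}, {0: 900, 1: 950}),
    total_lag({0: 2000, 1: 2000}, {0: 1500, 1: 1600}),
    total_lag({0: 3000, 1: 3000}, {0: 1800, 1: 1900}),
]
print(samples, is_backlog_growing(samples))
```

A single large lag value is not necessarily a problem; what matters is the trend across several samples, which is why the second function compares consecutive readings rather than applying a fixed threshold.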
What should I do if data backs up?
A backlog generally has one of two causes:
If the consumption capacity of the sync service is the limit, you can increase task concurrency so that the sync service adds more consumers on the backend. You can also increase the number of topic partitions as needed to improve consumer throughput. If the consumption traffic of the instance reaches its limit and is throttled, you also need to upgrade the bandwidth specification of the instance.
If the backlog persists after you increase the CKafka consumption capacity, the write rate of the target may be the limit, which prevents the sync service from quickly writing data and committing the offset. For example, when a large number of writes hit a bottleneck in ES, ES may apply a lock to protect the service, which can reject external writes or even cause sync task exceptions. Similarly, if the number of writes per second reaches the upper limit in TDW, writes are blocked. In this case, identify the write bottleneck of the target and increase its write rate.
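The troubleshooting order above can be summarized as a small decision helper. This is a minimal sketch for illustration only: suggest_actions and its three boolean inputs are hypothetical names, and you would determine the input values yourself from the console and from the monitoring of the dump target.

```python
from typing import List


def suggest_actions(
    backlog_growing: bool,
    consumption_capacity_raised: bool,
    consumption_throttled: bool,
) -> List[str]:
    """Map the observed situation to the next steps described in this FAQ."""
    if not backlog_growing:
        return []

    actions: List[str] = []
    if not consumption_capacity_raised:
        # First cause: the sync service cannot consume fast enough.
        actions.append("increase task concurrency")
        actions.append("increase the number of topic partitions as needed")
    if consumption_throttled:
        # Instance consumption traffic has hit its limit.
        actions.append("upgrade the bandwidth specification of the instance")
    if consumption_capacity_raised:
        # Second cause: the backlog persists, so the target's write rate is suspect.
        actions.append("identify the write bottleneck of the target and increase its write rate")
    return actions


print(suggest_actions(True, False, True))
```

The helper checks consumption capacity first because it is the cheaper thing to adjust; the write rate of the target is examined only when the backlog survives that change.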