DataHub offers data distribution capabilities. You can distribute CKafka data to ClickHouse for further storage, query, and analysis.
On the settings page, set the following CKafka configuration items:
After completing the above settings, click Preview Data, and the first message from the specified Source Topic will be obtained and parsed.
Currently, message parsing must meet the following requirements:
- The message is a JSON string.
- The message after parsing is a single-level JSON string. Currently, JSON strings with a nested structure cannot be parsed.
If the message is not a single-level JSON string, we recommend you use data processing for message format conversion first.
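The two parsing requirements above can be checked locally before a topic is connected. The following is a minimal sketch, not the service's own parser: the function names `is_flat_json` and `flatten`, the `_` key separator, and the choice to treat arrays as nested values are all assumptions made for illustration.

```python
import json


def is_flat_json(raw: str) -> bool:
    """Return True if raw parses to a JSON object whose values are all scalars."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(obj, dict):
        return False
    # Assumption: arrays count as nested structure, just like objects.
    return not any(isinstance(v, (dict, list)) for v in obj.values())


def flatten(obj: dict, prefix: str = "", sep: str = "_") -> dict:
    """Collapse nested objects into one level by joining keys with sep."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        elif isinstance(value, list):
            # Assumption: keep arrays as a JSON-encoded string so the result stays flat.
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat
```

A message such as `{"user": {"id": 7}}` fails the check, while `flatten` turns it into `{"user_id": 7}`, which passes. In production, the conversion itself would be done with the data processing feature mentioned above.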
Click Preview Topic Message, and the parsed message fields will be displayed in the console. You can modify the type attribute in the preview result to set the type of the target column for data delivery.
When you select DateTime as the type, an integer source value is parsed as a Unix timestamp, and a string source value is parsed with a common time format pattern string.
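The DateTime rule can be sketched as a small dispatch on the source value's type. This is an illustration only: the pattern list `ASSUMED_PATTERNS`, the function name `to_datetime`, the use of seconds as the timestamp unit, and UTC as the time zone are assumptions, since the patterns and units the service actually accepts are not listed here.

```python
from datetime import datetime, timezone

# Hypothetical patterns; the set the service really accepts is not documented here.
ASSUMED_PATTERNS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%dT%H:%M:%S", "%Y-%m-%d")


def to_datetime(value):
    """Parse an integer as a Unix timestamp and a string with a known pattern."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; reject it explicitly.
        raise TypeError("boolean values are not timestamps")
    if isinstance(value, int):
        # Assumption: the integer counts seconds since the epoch, in UTC.
        return datetime.fromtimestamp(value, tz=timezone.utc)
    if isinstance(value, str):
        for pattern in ASSUMED_PATTERNS:
            try:
                return datetime.strptime(value, pattern).replace(tzinfo=timezone.utc)
            except ValueError:
                continue
        raise ValueError(f"no assumed pattern matches {value!r}")
    raise TypeError(f"unsupported source type: {type(value).__name__}")
```

If the source field carries milliseconds rather than seconds, the integer branch would need to divide by 1000 first; check a previewed message to see which unit your producer writes.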
Supported ClickHouse database types for data distribution include CDWCH and self-built ClickHouse databases.
Because a CDWCH instance is set up with a private connection when it is created, you can directly select the corresponding CDWCH instance in the console, and the data distribution feature will automatically connect to the instance's VPC.
As the CKafka instance is a managed instance and EMR ClickHouse creates a public network route on the purchased CVM instance directly, you need to manually create a CLB instance to connect to the VPC. The following steps use EMR ClickHouse as an example to create a CLB instance:
1. Go to the EMR console, select the target cluster, click Cluster Resource > Node Status, and find the ClickHouse node IP on the status page.
2. Go to the CLB console, create a CLB instance, click Listener Management at the top, click TCP/UDP/TCP SSL Listener on the page, and enter the port to be used for data distribution as the listening port.
3. After creating the listener, click Bind Backend Service and enter the TCP port of ClickHouse, which is 9000 by default.
4. After binding, go to the data distribution page in the CKafka console, select the created CLB instance, and enter the port that the CLB instance listens on.
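Before creating the task, it can help to confirm that the CLB listener actually forwards TCP traffic to ClickHouse. The sketch below only tests whether a TCP connection can be opened; it does not speak the ClickHouse protocol. The function name `can_connect` is hypothetical, and the check must be run from a host that can reach the CLB address, which is an assumption about your network layout.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Call it with the CLB address and the listener port you entered in the console, for example `can_connect("<CLB VIP>", <listener port>)`. A `False` result usually points to the listener port, the backend binding, or a security group rule.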
Currently, a task that distributes data to ClickHouse can be created only in the same region as the CLB instance.
After the network is connected, you need to set the following configuration items of the data distribution target instance:
For security reasons, the ClickHouse password is required for data distribution.
Currently, the password may be empty right after the instance is created. In this case, you need to set a password in the user.xml configuration file. For detailed directions, see User Settings.
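ClickHouse user settings accept a password either in plain text or as a SHA-256 hex digest (the `password_sha256_hex` element). If you prefer not to store the plain-text password in the configuration file, the digest can be computed as in the sketch below. The helper name `sha256_hex` is hypothetical, and whether your deployment is set up to use the hashed form is an assumption you should verify against User Settings.

```python
import hashlib


def sha256_hex(password: str) -> str:
    """Return the SHA-256 digest of the password as 64 lowercase hex characters."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()
```

The value returned by `sha256_hex("<your password>")` goes into the configuration file, while the data distribution page still takes the original password.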
The dump speed is capped by the peak bandwidth of the CKafka instance. If consumption is too slow, check the peak bandwidth settings or increase the number of CKafka partitions.