Data Distribution to ClickHouse

Last updated: 2022-05-18 18:58:23

    Overview

    DataHub offers data distribution capabilities. You can distribute CKafka data to ClickHouse for further storage, query, and analysis.

    Prerequisites

    • To distribute data to CDWCH, activate the service in advance. Data distribution to a self-built ClickHouse database is also supported.
    • Create the target table in ClickHouse, and specify its columns and their types during table creation.
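
    As the target table must exist before a task is created, a helper like the one below could generate the CREATE TABLE statement from a flat column-to-type mapping. This is only a sketch: the database, table, and column names are hypothetical, and the MergeTree engine with a single ORDER BY column is an assumption rather than something this document prescribes.

```python
def build_create_table(database, table, columns, order_by):
    """Return a ClickHouse CREATE TABLE statement for a flat message schema.

    `columns` maps each column name to a ClickHouse type name.
    The MergeTree engine is an assumption; choose the engine your workload needs.
    """
    if order_by not in columns:
        raise ValueError(f"ORDER BY column {order_by!r} is not in the schema")
    column_lines = ",\n".join(
        f"    `{name}` {ch_type}" for name, ch_type in columns.items()
    )
    return (
        f"CREATE TABLE IF NOT EXISTS `{database}`.`{table}`\n"
        f"(\n{column_lines}\n)\n"
        f"ENGINE = MergeTree()\n"
        f"ORDER BY `{order_by}`"
    )


# Hypothetical schema that mirrors the fields of a single-level JSON message.
ddl = build_create_table(
    "demo_db",
    "events",
    {"user_id": "UInt64", "action": "String", "event_time": "DateTime"},
    order_by="event_time",
)
print(ddl)
```

    The generated statement can then be run with any ClickHouse client before the distribution task is created.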

    Directions

    Creating task

    1. Log in to the CKafka console.
    2. Click Data Distribution on the left sidebar, select the region, and click Create Task.
    3. Select ClickHouse as the Target Type.

    Configuring CKafka data source

    On the settings page, set the following CKafka configuration items:

    1. Task Name: It can contain only letters, digits, underscores (_), hyphens (-), and periods (.).
    2. CKafka Instance: Select the source CKafka instance.
    3. Source Topic: Select a topic under the instance. A data distribution task supports up to five source topics. Data can be dumped successfully only if the messages in all selected topics are in the same format.
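
    The naming rule and the source topic limit above can be checked locally before you fill in the form. The sketch below is not part of the console; it assumes that "letters" means ASCII letters, and the function names are hypothetical.

```python
import re

# Letters, digits, underscores, hyphens, and periods only (ASCII letters assumed).
TASK_NAME_PATTERN = re.compile(r"[A-Za-z0-9_.-]+")
MAX_SOURCE_TOPICS = 5  # limit stated in this document


def is_valid_task_name(name: str) -> bool:
    """Return True if the whole name uses only the allowed characters."""
    return TASK_NAME_PATTERN.fullmatch(name) is not None


def check_source_topics(topics: list) -> None:
    """Raise ValueError if the topic list is empty or exceeds the limit."""
    if not topics:
        raise ValueError("at least one source topic is required")
    if len(topics) > MAX_SOURCE_TOPICS:
        raise ValueError(f"a task supports up to {MAX_SOURCE_TOPICS} source topics")


print(is_valid_task_name("ckafka-to-ch_01.v2"))
print(is_valid_task_name("bad name!"))
```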

    Parsing message

    After completing the above settings, click Preview Data, and the first message from the specified Source Topic will be obtained and parsed.

    Note:

    Currently, message parsing must meet the following requirements:

    • The message is a JSON string.
    • The message after parsing is a single-level JSON string. Currently, JSON strings with a nested structure cannot be parsed.

    If the message is not a single-level JSON string, we recommend you use data processing for message format conversion first.
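
    To illustrate the single-level requirement, the sketch below checks whether a message would qualify and shows one way a nested message could be flattened beforehand. It is a local illustration, not the data processing feature itself; treating arrays as nested values and joining keys with an underscore are both assumptions.

```python
import json


def is_single_level(message: str) -> bool:
    """Return True if the message is a JSON object whose values are all scalars."""
    try:
        parsed = json.loads(message)
    except json.JSONDecodeError:
        return False
    if not isinstance(parsed, dict):
        return False
    # Assumption: arrays count as nested structures, like objects.
    return not any(isinstance(v, (dict, list)) for v in parsed.values())


def flatten(obj: dict, parent: str = "", sep: str = "_") -> dict:
    """Flatten nested objects into one level, joining keys with `sep`."""
    flat = {}
    for key, value in obj.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


nested = '{"user": {"id": 7, "geo": {"city": "SZ"}}, "action": "click"}'
print(is_single_level(nested))
flat_message = json.dumps(flatten(json.loads(nested)))
print(flat_message)
print(is_single_level(flat_message))
```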

    Click Preview Topic Message, and the parsed message fields will be displayed in the console. You can modify the type attribute in the preview result to set the type of the target column for data delivery.
    When you select Date or DateTime as the type, the parsing method depends on the type of the source field: an integer is parsed as a Unix timestamp, and a string is parsed against common time format patterns.
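
    The rule for Date and DateTime fields can be pictured with the sketch below. The exact patterns the service accepts are not listed in this document, so the pattern list, the use of seconds for integer timestamps, and the UTC time zone are all assumptions.

```python
from datetime import datetime, timezone

# Assumed "common" patterns; the service's actual list is not documented here.
ASSUMED_PATTERNS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%dT%H:%M:%S", "%Y-%m-%d")


def parse_time_field(value):
    """Parse a source field into a UTC datetime based on its JSON type."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python, so reject it explicitly.
        raise TypeError("boolean values are not time values")
    if isinstance(value, int):
        # Integer source: treated as a Unix timestamp in seconds (assumption).
        return datetime.fromtimestamp(value, tz=timezone.utc)
    if isinstance(value, str):
        # String source: try each assumed pattern in turn.
        for pattern in ASSUMED_PATTERNS:
            try:
                parsed = datetime.strptime(value, pattern)
            except ValueError:
                continue
            return parsed.replace(tzinfo=timezone.utc)
        raise ValueError(f"no assumed pattern matches {value!r}")
    raise TypeError(f"unsupported source type: {type(value).__name__}")


print(parse_time_field(0))
print(parse_time_field("2022-05-18 18:58:23"))
```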

    Configuring data distribution

    Supported ClickHouse database types for data distribution include CDWCH and self-built ClickHouse databases.

    Because a CDWCH instance is set up with a private connection when it is created, you can select the instance directly in the console, and the data distribution feature will automatically connect to the instance's VPC.

    After the network is connected, you need to set the following configuration items of the data distribution target instance:

    • Username: Target ClickHouse username. The default value is "default".
    • Password: Target ClickHouse password.
      Note:

      For security reasons, data distribution requires a ClickHouse password.
      Currently, the password may be empty after an instance is created. In this case, set a password in the users.xml configuration file. For detailed directions, see User Settings.

    • Cluster: ClickHouse cluster name, which is default_cluster by default.
    • Database: Database name set in ClickHouse.
    • Table: Name of the table created in the database. Currently, data distribution to ClickHouse does not create tables automatically, so you need to create the target table in ClickHouse manually.
    • Discard Message with Parsing Failure: Message parsing may fail if the type of a message field differs from the type of the corresponding column in the target table. If messages that fail to parse are not discarded, exceptions may occur and data dumping will stop.
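
    The effect of the Discard Message with Parsing Failure option can be pictured with the simplified model below. It is not the service's implementation: the converter table, the function names, and the sample schema are hypothetical, and only three ClickHouse types are modeled.

```python
# Hypothetical mapping from a few ClickHouse types to Python converters.
CONVERTERS = {"Int64": int, "Float64": float, "String": str}


def convert_row(message: dict, schema: dict) -> dict:
    """Convert each field to the type of its target column, or raise."""
    return {col: CONVERTERS[ch_type](message[col]) for col, ch_type in schema.items()}


def dump(messages, schema, discard_on_failure):
    """Return (rows, discarded_count); stop with an exception if not discarding."""
    rows, discarded = [], 0
    for message in messages:
        try:
            rows.append(convert_row(message, schema))
        except (KeyError, TypeError, ValueError):
            if not discard_on_failure:
                raise  # models the task stopping on the first bad message
            discarded += 1
    return rows, discarded


schema = {"id": "Int64", "score": "Float64"}
messages = [
    {"id": "1", "score": "2.5"},
    {"id": "abc", "score": "1"},  # "abc" cannot be converted to Int64
    {"id": 3, "score": 4},
]
rows, discarded = dump(messages, schema, discard_on_failure=True)
print(rows, discarded)
```

    With discard_on_failure set to False, the same input raises on the second message and no later message is processed, which corresponds to the dump stopping.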

    Click Submit.

    Configuring monitoring

    1. Log in to the CKafka console.
    2. Click Data Distribution on the left sidebar and click the ID of the target task to enter its basic information page.
    3. At the top of the task details page, click Monitoring, select the resource to be viewed, and set the time range to view the corresponding monitoring data.

    Restrictions and Billing

    The dump speed is limited by the peak bandwidth of the CKafka instance. If consumption is too slow, check the peak bandwidth setting or increase the number of partitions in the source topic.
