| Data Synchronization | Instance-Level Data Synchronization | Topic-Level Data Synchronization |
| --- | --- | --- |
| Data source | Private network connection: CKafka clusters. Public network connection: CKafka clusters or self-built Kafka clusters. Cross-network connection: self-built Kafka clusters or clusters of other cloud vendors. | CKafka clusters. |
| Data target | Private network connection: CKafka clusters. Public network connection: CKafka clusters or self-built Kafka clusters. | CKafka clusters. |
| Operation steps | 1. Create a data source connection. 2. Create a data target connection. 3. Create a data synchronization task. 4. Configure the data source. 5. Configure the data target. 6. View the data synchronization progress. | 1. Create a data replication task. 2. Configure the data source. 3. Process data. 4. Configure the data target. 5. View the data synchronization progress. |

| Type | Item | Limit |
| --- | --- | --- |
| Connection dimension | Number of connections per UIN | 150 |
| Task dimension | Number of tasks per UIN | 150 |
| Task dimension | Concurrency per task | min(total number of partitions across the data source topics, 20) |
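
To make the concurrency limit above concrete, here is a minimal plain-Python sketch; the function name and the example partition counts are hypothetical.

```python
# Per-task concurrency is capped at 20 and otherwise equals the total number
# of partitions across the task's data source topics.
PER_TASK_CONCURRENCY_CAP = 20

def effective_concurrency(partitions_per_topic):
    """partitions_per_topic: iterable of partition counts, one per source topic."""
    return min(sum(partitions_per_topic), PER_TASK_CONCURRENCY_CAP)

# Three source topics with 6, 12, and 24 partitions -> min(42, 20) = 20.
print(effective_concurrency([6, 12, 24]))
```
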
| Data Type | Rule Description |
| --- | --- |
| Synchronizing metadata | The process consists of two phases: initialization synchronization and scheduled synchronization (see the sketch after this table). Initialization synchronization: when a task starts, it checks whether each topic of the upstream instance has a corresponding topic in the downstream instance. If not, it creates the topic in the downstream instance with configurations matching the upstream instance as closely as possible; if a corresponding topic already exists, initialization synchronization is not triggered. Scheduled synchronization: after the task starts, it periodically (every 3 minutes) synchronizes certain metadata configurations from the upstream instance to the downstream instance. Note: scheduled synchronization does not synchronize the number of replicas. During scheduled synchronization, the number of partitions can only be increased, never decreased; if the downstream instance already has more partitions than the upstream instance, the partition count is not synchronized. For stability reasons, the retention.ms and retention.bytes settings of the target topic are synchronized only when their value is -1 (an internal Kafka value meaning unlimited retention); in all other cases these two settings are not synchronized periodically. For topic-level configurations, also for stability reasons, the metadata of newly added topics is fully synchronized only once during task initialization by default, and subsequent changes to topic configurations are not synchronized to the downstream instance. The reason is that if a user modifies an upstream topic configuration (for example, shortening the message retention period) without being aware of an active data synchronization task, synchronizing this change to the downstream instance could cause significant data loss before messages are consumed. To propagate a configuration change for topics in an existing task, manually modify the configurations of both the upstream and downstream topics to avoid data loss or stability issues. |
| Synchronizing message data | Message data stored in the upstream Kafka instance is synchronized to the corresponding topic in the downstream Kafka instance. If synchronization to the same partition is enabled, messages are consistently synchronized to the corresponding partition in the downstream instance. |
| Synchronizing consumption offsets | When message data stored in the upstream Kafka instance is synchronized to the corresponding topic in the downstream Kafka instance, the system also synchronizes the relevant consumer groups and their committed offsets for that topic. Note that the synchronized offset is a mapped value: the committed offset in the downstream instance corresponds to the same message position as in the upstream instance rather than the same numeric offset. |
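
The initialization check described above can be approximated from a client. The sketch below assumes the kafka-python package and uses placeholder broker addresses, a hypothetical topic name, and an arbitrary replica count; it illustrates the "create the downstream topic if it is missing" step and is not the connector's actual implementation.

```python
from kafka.admin import KafkaAdminClient, NewTopic

# Placeholder broker addresses for the upstream (source) and downstream (target) clusters.
src = KafkaAdminClient(bootstrap_servers="source-broker:9092")
dst = KafkaAdminClient(bootstrap_servers="target-broker:9092")

topic = "example-topic"  # hypothetical topic name

if topic not in dst.list_topics():
    # Mirror the source partition count; other configurations would be copied
    # to match the upstream topic as closely as possible.
    src_meta = src.describe_topics([topic])[0]
    dst.create_topics([
        NewTopic(name=topic,
                 num_partitions=len(src_meta["partitions"]),
                 replication_factor=2)  # pick a replica count valid for the target cluster
    ])
```
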
| Data Synchronization Type | Parameter | Limit | Default Value |
| --- | --- | --- | --- |
| Metadata | Number of Partitions | 1. When metadata needs to be synchronized, a parameter that does not meet the following criteria will not be synchronized (see the validation sketch after this table): if the number of partitions in the target topic is greater than that in the source topic, the number of partitions cannot be synchronized; if the number of replicas in the target topic differs from that in the source topic, the number of replicas cannot be synchronized. 2. When the topic name in the source instance exceeds 128 characters, the target topic uses the first 128 characters as its name. | / |
| Metadata | Number of Replicas | | / |
| Metadata | retention.ms | | 604800000 (7 days) |
| Metadata | cleanup.policy | | delete |
| Metadata | min.insync.replicas | | 1 |
| Metadata | unclean.leader.election.enable | | false |
| Metadata | segment.ms | | 604800000 |
| Metadata | retention.bytes | | The default value depends on the Kafka configuration. |
| Metadata | max.message.bytes | | 1048588 |
| Metadata | Consumer Group | If automatic creation of consumer groups is disabled for the target instance, consumer groups cannot be synchronized. | / |
| Message data | / | Message data can be synchronized to the same partition. | / |
| Consumption offset | / | 1. When consumption offset synchronization is required, the following situations may cause inaccurate offset alignment: the source and target instances have a topic with the same name and the target topic has other message writers; the source and target instances have a topic with the same name and the task is re-created (each time a task is created, only data from the latest position read when the new task starts is synchronized to the downstream instance; historical data is not synchronized and is discarded in this case). 2. Instances of version 0.10.2.1 and earlier do not support consumption offset synchronization. If either the upstream or downstream instance of a data synchronization task runs such a version, a task with consumption offset synchronization cannot be created. | / |
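
As a worked illustration of the partition, replica, and topic-name limits above, the plain-Python helpers below (hypothetical function names, not part of the product) express the checks and the 128-character truncation.

```python
def target_topic_name(source_name: str) -> str:
    # Source topic names longer than 128 characters are truncated to the first 128.
    return source_name[:128]

def can_sync_partition_count(source_partitions: int, target_partitions: int) -> bool:
    # The partition count is not synchronized when the target already has more
    # partitions than the source (partitions are only ever increased).
    return target_partitions <= source_partitions

def can_sync_replica_count(source_replicas: int, target_replicas: int) -> bool:
    # The replica count is not synchronized when it differs between source and target.
    return source_replicas == target_replicas
```
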
| Kafka Type | Description |
| --- | --- |
| CKafka | If the client and the CKafka cluster are deployed in the same Virtual Private Cloud (VPC), the network is connected by default. You can directly select the pre-created CKafka instance from the drop-down list for the corresponding region. |
| Public network connection | If the client and the Kafka cluster are deployed in different network environments, you can use the public network for cross-network production and consumption. Public network connections support both self-built Kafka clusters and CKafka clusters. When you use a public network connection, it is recommended to configure security policies to ensure data transmission security. Broker Address: enter the broker address of your Kafka cluster. If multiple brokers exist, you only need to enter the IP address and port of one broker, for example, 127.0.0.1:5664; the connector establishes network connectivity for all brokers. The entered IP address must remain accessible during data synchronization. ACL Configuration: if the source cluster has the access control list (ACL) enabled, configure the corresponding access information (ACL username and password) for this parameter. ACL configuration only supports statically configured users (PLAIN mechanism) and does not support dynamically configured users (SCRAM mechanism); see the connectivity check after this table. |
| Cross-network connection | Cross-network connection enables synchronizing data and metadata from Kafka clusters of other cloud vendors or self-built Kafka clusters to CKafka clusters. VPC Network: select the VPC ID of your self-built Kafka cluster or the ID of the VPC established for cross-cloud connectivity. Subnet: select the VPC subnet of your self-built Kafka cluster or the VPC subnet established for cross-cloud connectivity. Cloud Connect Network ID: cross-cloud synchronization usually requires a dedicated connection established via Cloud Connect Network (CCN). Cross-Cloud Resource ID: typically the upstream instance ID of the connector, identifying a unique resource in the cross-cloud synchronization linkage. When you create a connection, the system automatically detects the node information under this resource ID, establishes network connectivity, and associates the relevant routing rules; when you delete the connection, the automatically created routing rules under this resource ID are also deleted. Broker Address: enter the broker address of your Kafka cluster. If multiple brokers exist, you only need to enter the IP address and port of one broker, for example, 127.x.x.1:5664; the connector establishes network connectivity for all brokers. The entered IP address must remain accessible during data synchronization. ACL Configuration: if the source cluster has the access control list (ACL) enabled, configure the corresponding access information (ACL username and password) for this parameter. ACL configuration only supports statically configured users (PLAIN mechanism) and does not support dynamically configured users (SCRAM mechanism). |
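
Before saving a public-network or cross-network connection, it can help to confirm from your own environment that the broker address is reachable and the PLAIN credentials are accepted. A minimal check with the kafka-python client is sketched below; the broker address, topic name, username, and password are placeholders.

```python
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="127.0.0.1:5664",   # any one reachable broker address
    security_protocol="SASL_PLAINTEXT",
    sasl_mechanism="PLAIN",               # statically configured (PLAIN) users only; SCRAM is not supported
    sasl_plain_username="acl-user",
    sasl_plain_password="acl-password",
)
# A successful send confirms the address is reachable and the ACL credentials work.
producer.send("connectivity-check", b"ping")
producer.flush()
producer.close()
```
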
| Parameter | Description |
| --- | --- |
| Task Name | Enter a task name to distinguish different data synchronization tasks. The task name must comply with the naming rule: it can contain only letters, digits, underscores (_), hyphens (-), and periods (.). |
| Data Source Type | Select Full Kafka Instance. |
| Connect to region | Select the region of the pre-configured data source connection from the drop-down list. |
| Kafka Connection | Select the pre-configured data source connection from the drop-down list. |
| Type of Synced Data | Sync metadata only: synchronizes the metadata of topic and consumer group structures within the source instance. Sync metadata and message data: synchronizes the metadata of topic and consumer group structures within the source instance, along with the message data in topics. Sync metadata, message data, and consumption offset: synchronizes the metadata of topics and consumer groups, the message data in topics, and the consumption offsets of consumer groups in the source instance. Updates to the consumption offsets of the source instance's consumer groups are synchronized to the consumer groups with the same name in the target instance. |
| Start Offset | If you select Sync metadata and message data or Sync metadata, message data, and consumption offset, you need to configure the topic offset to set the processing policy for historical messages during synchronization. Two methods are supported. Start consumption from the latest offset: the maximum offset; consumption starts from the latest data (skipping historical messages). Start consumption from the start offset: the minimum offset; consumption starts from the earliest data (synchronizing all historical messages). |
| Topic sync range | If you select Sync metadata and message data or Sync metadata, message data, and consumption offset, you need to configure the topic scope for data synchronization. Sync metadata and message data: you can select All Topics or specify certain topics. If you choose to specify certain topics, you need to match topics using a regular expression (see the example after this table); after the expression passes validation, you can proceed to the next step. Note: when you use a regular expression to synchronize certain topics, the consumer groups corresponding to these topics are not synchronized. Sync metadata, message data, and consumption offset: you can select All Topics only. |
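
When specifying certain topics with a regular expression, the expression is matched against topic names. The plain-Python sketch below uses a hypothetical pattern and topic names to show the kind of matching involved.

```python
import re

# Hypothetical expression: synchronize every topic whose name starts with "order-".
topic_pattern = re.compile(r"^order-.*$")

topics = ["order-created", "order-paid", "payment-events"]
matched = [t for t in topics if topic_pattern.match(t)]
print(matched)   # ['order-created', 'order-paid']
```
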
| Parameter | Description |
| --- | --- |
| Task Name | Enter a task name to distinguish different data synchronization tasks. The task name must comply with the naming rule: it can contain only letters, digits, underscores (_), hyphens (-), and periods (.). |
| Data Source Type | Select Topic in the CKafka Instance. |
| Data Source Region | Select the region where the data source instance resides from the drop-down list. |
| CKafka Instance | Select the pre-configured data source CKafka instance from the drop-down list. |
| Source Topic | Select the pre-configured data source topic from the drop-down list. If an ACL policy is configured for the data source instance, ensure that you have read/write permissions on the selected source topic. |
| Start Offset | Configure the topic offset to set the processing policy for historical messages during synchronization. Three methods are supported (see the sketch after this table). Start consumption from the latest offset: the maximum offset; consumption starts from the latest data (skipping historical messages). Start consumption from the start offset: the minimum offset; consumption starts from the earliest data (processing all historical messages). Start consumption from the specified time point: consumption starts from a user-defined point in time. |
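
The three start-offset options correspond to familiar Kafka consumer concepts. The kafka-python sketch below mirrors their semantics client-side; the broker address, topic, and timestamp are placeholders, and this is not the connector's own configuration.

```python
import time
from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(
    bootstrap_servers="ckafka-broker:9092",  # placeholder address
    auto_offset_reset="latest",              # "latest offset" skips history; "earliest" replays it all
    enable_auto_commit=False,
)
tp = TopicPartition("source-topic", 0)       # placeholder topic
consumer.assign([tp])

# "Start consumption from the specified time point": resolve a timestamp to an offset, then seek.
one_hour_ago_ms = int((time.time() - 3600) * 1000)
offsets = consumer.offsets_for_times({tp: one_hour_ago_ms})
if offsets.get(tp) is not None:
    consumer.seek(tp, offsets[tp].offset)
```
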
| Parsing Mode | Description |
| --- | --- |
| JSON | Parses data in standard JSON format, supports nested fields, and outputs key-value pairs. |
| Separator | Parses unstructured text based on the specified delimiter. Supported delimiters include Space, Tab, comma (,), semicolon (;), vertical bar (\|), colon (:), and Custom. |
| Regex | Suitable for extracting specific fields from long array-type messages. You can manually enter a regular expression or use the regular expression auto-generation feature. For more information, see Regular Expression Extraction. Note: when the input regular expression contains capture groups such as `(?<name>expr)` or `(?P<name>expr)`, it is treated as a pattern string for matching; when a message matches the pattern string, the capture group content is parsed. Otherwise, the entire input regular expression is treated as one capture group, and all content it matches is extracted from the message (see the example after this table). |
| JSON object array - single-row output | Each object in the array has a consistent format. Only the first object is parsed, and the output is a single JSON object of map type. |
| JSON object array - multi-row output | Each object in the array has a consistent format. Only the first object is parsed, and the output is of array type. |
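
The capture-group note for the Regex mode above can be illustrated with plain Python regular expressions (the pattern and sample message are made up): with named groups only the group contents are extracted, while an expression without groups acts as one capture group and everything it matches is extracted.

```python
import re

message = "user=alice latency=42ms"

# Named capture groups: only the group contents are parsed out.
named = re.compile(r"user=(?P<user>\w+) latency=(?P<latency>\d+)ms")
print(named.match(message).groupdict())        # {'user': 'alice', 'latency': '42'}

# No capture groups: the whole expression behaves like a single capture group,
# so everything it matches is extracted as one value.
plain = re.compile(r"latency=\d+ms")
print(plain.search(message).group(0))          # 'latency=42ms'
```
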

| Operation | Description |
| --- | --- |
| Mapping | You can select an existing key; the final output value is mapped from the specified key. |
| JSONPATH | Parses multi-level nested JSON data. An expression starts with the $ symbol and uses the . symbol to locate a specific field in the nested JSON (see the sketch after this table). For more information, see JSONPath. |
| Current system time | You can select a system-preset value. DATE (timestamp) is supported. |
| Custom | You can enter a custom value. |
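
For the JSONPATH operation above, a minimal resolver for the `$.` dotted-path subset is sketched in plain Python (the function name and sample event are hypothetical); full JSONPath supports much more syntax than this.

```python
def jsonpath_get(data, path):
    """Resolve a simple JSONPath such as '$.payload.user.id' against nested dicts."""
    value = data
    for key in path.lstrip("$").strip(".").split("."):
        value = value[key]
    return value

event = {"payload": {"user": {"id": 1024, "name": "alice"}}}
print(jsonpath_get(event, "$.payload.user.id"))   # 1024
```
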

| Output Row Content | Description |
| --- | --- |
| VALUE | Only the values in the test results above are output, separated by a delimiter. The delimiter between values defaults to the None option. |
| KEY&VALUE | Both the keys and the values in the test results above are output. Neither the delimiter between a key and its value nor the delimiter between values can be None. |
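
To make the two output formats concrete, the plain-Python sketch below builds a VALUE row and a KEY&VALUE row from a parsed result, using `|` and `:` purely as example delimiters.

```python
parsed = {"user": "alice", "latency": "42"}

# VALUE: values only, joined by the between-values delimiter (which may also be None).
value_row = "|".join(parsed.values())                            # 'alice|42'

# KEY&VALUE: each key joined to its value by one delimiter, pairs joined by the
# between-values delimiter; neither delimiter may be None.
key_value_row = "|".join(f"{k}:{v}" for k, v in parsed.items())  # 'user:alice|latency:42'

print(value_row, key_value_row)
```
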
| Handling Method | Description |
| --- | --- |
| Discard | Suitable for production environments. When a task fails to run, the current failed message is ignored. It is recommended to use the Retain mode during testing until no errors are detected, and then edit the task to use the Discard mode in production. |
| Retain | Suitable for test environments. When a task fails to run, it is terminated without retries, and the failure reason is recorded in Event Center. |
| Put to dead letter queue | Suitable for strict production environments; you need to specify a topic for the dead letter queue. When a task fails to run, the failed messages, along with their metadata and failure reasons, are delivered to the specified CKafka topic (see the sketch after this table). |
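
The Put to dead letter queue option delivers failed messages with metadata and the failure reason to a CKafka topic you specify. The kafka-python sketch below shows the general pattern from a client's point of view; the broker address, dead letter topic, and envelope fields are placeholders, and this is not the connector's internal implementation.

```python
import json
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="ckafka-broker:9092")  # placeholder address

def send_to_dead_letter(original_value: bytes, source_topic: str,
                        partition: int, offset: int, reason: str) -> None:
    # Wrap the failed message together with its metadata and failure reason,
    # then deliver it to the dedicated dead letter topic.
    envelope = {
        "source_topic": source_topic,
        "partition": partition,
        "offset": offset,
        "failure_reason": reason,
        "payload": original_value.decode("utf-8", errors="replace"),
    }
    producer.send("my-dead-letter-topic", json.dumps(envelope).encode("utf-8"))
    producer.flush()
```
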