Overview
DLC Native Table (TC-Iceberg) is a batch-stream integrated table format that Tencent Cloud provides on top of Iceberg. It is compatible with Apache Iceberg, retains all of its strengths, and adds performance enhancements and near-real-time lakehouse construction capabilities. Compared with Apache Iceberg, TC-Iceberg has the following characteristics:
Compatibility with Apache Iceberg: TC-Iceberg is a non-intrusive extension based on the Iceberg format. It supports all features of Apache Iceberg V2 tables, including time travel queries and upsert operations.
Near-real-time lakehouse capability extension: Compared with Apache Iceberg, where updated data from streaming writes cannot be consumed downstream in a streaming manner, TC-Iceberg supports streaming writes while allowing for streaming incremental data reads in CDC (Change Data Capture) format. Additionally, it provides a scalable merge process to handle scenarios such as specified column updates.
Performance enhancement: TC-Iceberg improves the merge-on-read performance in update scenarios through an automated bucketing mechanism.
Intelligent data optimization: TC-Iceberg supports real-time monitoring of write and query operations on tables. Based on the monitoring information, it automatically triggers optimization tasks as needed, schedules optimization resources, and adjusts the priority of optimization tasks for reasonable and intelligent scheduling, thereby enhancing the quality and efficiency of optimization.
Note:
Currently, the TC-Iceberg format is in the public beta stage. It only supports the primary key table type and has the following constraints and limitations:
1. Currently, Data Lake Compute (DLC) only supports using the standard engine Standard-S 1.1 (Standard-S 1.1 Native is not supported) to perform DDL, DML, and merge queries on TC-Iceberg tables. Other versions of the engine support querying existing TC-Iceberg data (data in BaseStore).
2. Only the standard engine Standard-S is supported for data optimization. After data optimization is enabled for TC-Iceberg tables, the optimization engine consumes resident monitoring resources (1 CU by default), and the execution resources for optimization tasks scale automatically according to actual needs.
3. Currently, the test version does not support Java SDK writes and InLong data lake ingestion.
How Native Tables (TC-Iceberg) Work
TC-Iceberg was originally designed to provide a unified storage format with complete batch and streaming capabilities for modern data lakes, while remaining fully compatible with the open-source Iceberg table format.
The TC-Iceberg primary key table consists of two Iceberg tables:
1. ChangeStore: An independent Iceberg table that stores incremental data of the table. All operations on the table are written into ChangeStore in append mode. The written data will be automatically merged into BaseStore. The data in ChangeStore is generally retained for a specified period and the expired data will be automatically deleted.
2. BaseStore: An independent Iceberg table that stores the existing data of the table and is compatible with native reads and writes on Iceberg tables. Because BaseStore consists only of Insert Files and Position Delete Files, it offers better data analysis performance, but its data lags somewhat behind the latest writes.
Meanwhile, there are also two important processes in the TC-Iceberg format:
1. Merge-On-Read: The data in ChangeStore and BaseStore is merged at read time. This keeps data latency low in data analysis scenarios, because a query sees changes that have not yet been merged into BaseStore.
2. Auto Compaction: The intelligent optimization service will also automatically merge the data in ChangeStore into BaseStore periodically to ensure the performance of the merge-on-read process.
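As a rough illustration of the merge-on-read step, the following minimal sketch overlays change records on base rows by primary key. The row shapes, the `id` key column, and the function name `merge_on_read` are hypothetical and chosen only for this example; the action names are the `_change_action` values that ChangeStore records. It is a conceptual model, not the engine's actual implementation.

```python
def merge_on_read(base_rows, change_rows, key="id"):
    """Overlay change records on base rows, keyed by primary key.

    base_rows:   iterable of dicts (hypothetical BaseStore rows)
    change_rows: iterable of (action, row) pairs in write order
                 (hypothetical ChangeStore records)
    """
    merged = {row[key]: dict(row) for row in base_rows}
    for action, row in change_rows:
        if action in ("INSERT", "UPDATE_AFTER"):
            # The new row image replaces whatever was stored for this key.
            merged[row[key]] = dict(row)
        elif action == "DELETE":
            merged.pop(row[key], None)
        # UPDATE_BEFORE carries the old row image; in this sketch the
        # UPDATE_AFTER record that follows it performs the replacement.
    return sorted(merged.values(), key=lambda r: r[key])


base = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
changes = [
    ("UPDATE_BEFORE", {"id": 1, "v": "a"}),
    ("UPDATE_AFTER", {"id": 1, "v": "a2"}),
    ("DELETE", {"id": 2, "v": "b"}),
    ("INSERT", {"id": 3, "v": "c"}),
]
current_view = merge_on_read(base, changes)
```

Auto compaction can be thought of as persisting the result of the same overlay into BaseStore, so that later reads have fewer change records to apply.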
Both BaseStore and ChangeStore use the Apache Iceberg format and remain consistent with Iceberg in Schema, data format, data type, partition usage, and other aspects.
Native Table (TC-Iceberg) Creation Attributes
To manage and use DLC Native Table (TC-Iceberg) effectively, you need to specify the following attributes for this type of table. You can set these attribute values when creating a table or modify them on an existing table. For detailed operations, see native table (TC-Iceberg) operation configurations.
Attribute | Description | Configuration guidance |
base.file-index.hash-bucket | Number of buckets used for hash file indexing in BaseStore | The default value is 4. The value must be a power of 2. Choose the value according to the amount of data in a partition. Generally, it is recommended that each bucket stores 1 GB - 2 GB of data. |
change.file-index.hash-bucket | Number of buckets used for hash file indexing in ChangeStore | The default value is 4. The value must be a power of 2. Choose the value according to the amount of incremental data written in a partition. Generally, it is recommended that 1 MB - 2 MB of incremental data is written into each bucket each time. |
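The sizing guidance above can be turned into a small calculation. The sketch below is only an estimate helper under stated assumptions: the function name `suggest_bucket_count` and its parameters are hypothetical, and it simply picks the smallest power of 2 that keeps each bucket at or below a target size. It does not read or set any table attribute.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def suggest_bucket_count(data_bytes, target_bytes_per_bucket):
    """Smallest power of 2 such that each bucket holds at most the target size."""
    if data_bytes < 0 or target_bytes_per_bucket <= 0:
        raise ValueError("sizes must be non-negative and the target must be positive")
    # Ceiling division: how many buckets are needed at the target size.
    needed = max(1, -(-data_bytes // target_bytes_per_bucket))
    count = 1
    while count < needed:
        count *= 2  # bucket counts must be a power of 2
    return count


# BaseStore: a 10 GiB partition with a 2 GiB target per bucket.
base_buckets = suggest_bucket_count(10 * GIB, 2 * GIB)    # 8 buckets, 1.25 GiB each

# ChangeStore: about 5 MiB of incremental data per write, 2 MiB target per bucket.
change_buckets = suggest_bucket_count(5 * MIB, 2 * MIB)   # 4 buckets, 1.25 MiB each
```

With an upper target of 2 GB (or 2 MB for ChangeStore), rounding up to a power of 2 keeps the per-bucket size between half the target and the target whenever the data exceeds one target-sized bucket, which matches the recommended 1 GB - 2 GB and 1 MB - 2 MB ranges.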
Core Capabilities of Native Tables (TC-Iceberg)
TC-Iceberg has the same capabilities as Iceberg in ACID transactions, hidden partitioning, metadata queries, and stored procedures. For details, see native table (Iceberg) format description. Additionally, TC-Iceberg provides the following core capabilities:
CDC Streaming Consumption
In a TC-Iceberg table with a primary key, ChangeStore is dedicated to storing the table's CDC data. Stream computing engines such as Apache Flink periodically write the CDC data generated upstream to ChangeStore in append mode. Downstream Flink tasks continuously pick up the newly added snapshots of the table and consume its CDC data in a streaming manner. Note that three additional metadata fields are added to the table structure of ChangeStore.
1. _change_action: tags the operation type of this row of data. Possible values include INSERT, UPDATE_BEFORE, UPDATE_AFTER, and DELETE.
2. _transaction_id: tags the transaction ID generated for this row of data. Transaction ID is a global ID that starts from 1 and increments. The larger the transaction ID, the later the data was written.
3. _file_offset: tags the position of this row of data within its transaction. The larger the offset, the later the data was written.
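Taken together, `_transaction_id` and `_file_offset` give a total write order for change rows. The following minimal sketch assumes change rows are available as plain dictionaries carrying the three metadata fields; the `id` column, the sample values, and both function names are hypothetical and exist only for illustration.

```python
from operator import itemgetter


def order_change_rows(rows):
    """Sort change rows by (_transaction_id, _file_offset), earliest write first."""
    return sorted(rows, key=itemgetter("_transaction_id", "_file_offset"))


def latest_change_per_key(rows, key="id"):
    """Return, for each primary key, the most recently written change row."""
    latest = {}
    for row in order_change_rows(rows):
        latest[row[key]] = row  # later rows overwrite earlier ones
    return latest


# Hypothetical change rows, deliberately out of order.
sample_rows = [
    {"id": 1, "_change_action": "UPDATE_AFTER", "_transaction_id": 2, "_file_offset": 2},
    {"id": 1, "_change_action": "INSERT", "_transaction_id": 1, "_file_offset": 1},
    {"id": 1, "_change_action": "UPDATE_BEFORE", "_transaction_id": 2, "_file_offset": 1},
]
ordered_actions = [r["_change_action"] for r in order_change_rows(sample_rows)]
```

A downstream consumer that replays rows in this order sees the insert first, then the old and new images of the update.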
Auto Bucketing for Accelerated Data Merging
Native tables (TC-Iceberg) provide an auto bucketing mechanism to enhance the performance of merge-on-read and auto compaction simultaneously. Auto bucketing specifically refers to splitting data into different buckets according to the primary key in BaseStore and ChangeStore. In this way, only data in the same bucket needs to be merged. This not only narrows the range of data merging but also enhances the degree of parallelism of data merging.
The number of buckets split in BaseStore and ChangeStore does not have to be the same. BaseStore generally splits buckets according to the total amount of data in the table. Generally, a bucket stores 1 GB to 2 GB of data. ChangeStore splits buckets according to the amount of incremental data written. Generally, ensure that the data written into a bucket at one time is no less than 1 MB.
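To make the bucketing idea concrete, here is a minimal sketch that assigns rows to buckets by hashing the primary key. The hash function (CRC32), the mask-based assignment, and the names `bucket_of` and `split_into_buckets` are assumptions made for this example; the source does not describe the hash that TC-Iceberg actually uses.

```python
import zlib
from collections import defaultdict


def bucket_of(primary_key, num_buckets):
    """Map a primary key to a bucket index in [0, num_buckets)."""
    if num_buckets <= 0 or num_buckets & (num_buckets - 1):
        raise ValueError("num_buckets must be a power of 2")
    # Assumed hash: CRC32 of the key's string form, masked to the bucket count.
    return zlib.crc32(str(primary_key).encode("utf-8")) & (num_buckets - 1)


def split_into_buckets(rows, num_buckets, key="id"):
    """Group rows by bucket so that each bucket can be merged independently."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_of(row[key], num_buckets)].append(row)
    return dict(buckets)


rows = [{"id": i, "v": f"value-{i}"} for i in range(100)]
base_buckets = split_into_buckets(rows, 4)     # e.g. BaseStore with 4 buckets
change_buckets = split_into_buckets(rows, 8)   # e.g. ChangeStore with 8 buckets
```

Because rows with the same primary key always land in the same bucket, a merge only has to compare data within matching buckets, and separate buckets can be merged in parallel. Under the mask-based assignment assumed here, power-of-2 bucket counts also line up when the two stores use different counts: a key in bucket `b` of 8 always belongs to bucket `b & 3` of 4.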
Scalable Merge Process
The process of merging data in ChangeStore into BaseStore in native tables (TC-Iceberg) is scalable. Different merge processes can be specified by setting different merge-function parameters on the table. The currently supported merge processes include:
1. replace: The content in ChangeStore overwrites the content in BaseStore according to the primary key. This is the most common and default merge method.
2. partial-update: When the content in ChangeStore overwrites the content in BaseStore according to the primary key, only some fields are updated instead of the whole row. This merge process is typically used to build a wide table in real time: different upstream write tasks each write the primary key plus a subset of the fields, and during the merge the native table (TC-Iceberg) combines these partial rows by primary key and writes the widened rows into BaseStore.
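The difference between the two merge processes can be sketched as two small functions. This is an illustrative model only: the function names, the `id` key column, and the rule that a field missing from a change row (or set to `None`) means "not written by this task" are assumptions of the sketch, not documented TC-Iceberg behavior.

```python
def merge_replace(base_row, change_row):
    """replace: the change row fully overwrites the base row."""
    return dict(change_row)


def merge_partial_update(base_row, change_row):
    """partial-update: only fields present in the change row are updated."""
    merged = dict(base_row or {})
    # Assumption: a missing or None field was not written by this task.
    merged.update({k: v for k, v in change_row.items() if v is not None})
    return merged


MERGE_FUNCTIONS = {"replace": merge_replace, "partial-update": merge_partial_update}


def apply_changes(base, changes, merge_function="replace", key="id"):
    """Merge change rows into base rows by primary key using the chosen function."""
    merge = MERGE_FUNCTIONS[merge_function]
    result = {row[key]: dict(row) for row in base}
    for change in changes:
        result[change[key]] = merge(result.get(change[key]), change)
    return result


# Two hypothetical upstream tasks each write the key plus some of the fields.
changes = [
    {"id": 1, "name": "alice"},   # written by task A
    {"id": 1, "amount": 30},      # written by task B
]
widened = apply_changes([], changes, merge_function="partial-update")
replaced = apply_changes([], changes, merge_function="replace")
```

With `partial-update`, the two partial rows are combined into one wide row; with `replace`, the later row simply supersedes the earlier one.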
Intelligent Data Optimization
In TC-Iceberg, the process of automatically merging the data in ChangeStore into BaseStore is completed by the automatic optimization service. The automatic optimization service includes the following components:
1. Metrics Reporter: installed as a plugin in the computing engine (such as Spark/Flink). It generates metric events and reports them to the Optimizing Service when there are write/read operations on the table.
2. Optimizing Service: a management component of the merge service. It receives monitoring events reported by the Metrics Reporter, intelligently schedules optimization tasks based on the monitoring events, and hands them over to the Optimizer for execution.
3. Optimizer: the execution node of optimization tasks. It is dynamically scheduled by the Optimizing Service based on the load. It pulls optimization tasks from the Optimizing Service, executes them, and reports the execution results.
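The interaction between the three components can be pictured with a toy model. Everything in the sketch below is hypothetical: the class and method names, the use of a pending change-file count as the reported metric, and the "most pending files first" priority rule are assumptions chosen to illustrate the report, schedule, and pull flow, not the service's real interfaces or scheduling policy.

```python
class OptimizingService:
    """Toy scheduler: tracks pending change files per table and hands out tasks."""

    def __init__(self):
        self._pending = {}

    def report(self, table, new_change_files):
        """Called by a metrics reporter when a write adds change files to a table."""
        self._pending[table] = self._pending.get(table, 0) + new_change_files

    def poll(self):
        """Called by an optimizer: returns (table, files) for the busiest table."""
        if not self._pending:
            return None
        # Assumed priority rule: the table with the most pending files goes first.
        table = max(self._pending, key=self._pending.get)
        return table, self._pending.pop(table)


def run_optimizer(service):
    """Pull tasks until none remain and collect a result line for each."""
    results = []
    task = service.poll()
    while task is not None:
        table, files = task
        results.append(f"merged {files} change files of {table} into BaseStore")
        task = service.poll()
    return results


service = OptimizingService()
service.report("orders", 3)
service.report("users", 10)
service.report("orders", 4)
results = run_optimizer(service)
```

In this model the optimizer processes `users` (10 pending files) before `orders` (7 pending files), which mirrors the idea of adjusting task priority based on the reported metrics.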
Currently, the optimization tasks executed on the automatic optimization service include:
1. File merging tasks: include merging data from ChangeStore into BaseStore, merging small files in BaseStore, and merging Delete File into Insert File.
2. Storage cleanup tasks: include automatic snapshot expiration, orphan file cleanup, data lifecycle management, and invalid Delete File cleanup.
3. Analysis acceleration tasks: include automatic data sorting, Bloom Filter/Bitmap secondary index creation, and Table Statistics/Partition Statistics creation.