To achieve optimal performance with the bulk-load import mode and make full use of system resources, we have summarized below the tuning experience accumulated in practice, for your reference.
Sequential Import
Sequential import refers to importing data in ascending primary-key order, which allows the database to process the data more efficiently. Compared with unordered import, sequential import improves performance by 20% to 300%.
This ordering has two levels:
Ordered within a bulk-load transaction: within a single bulk-load transaction, the data must be arranged in ascending primary-key order.
Orderable across bulk-load transactions: taking the minimum and maximum primary keys written by a transaction as its range, transactions whose ranges do not overlap are considered orderable. In other words, two transactions are orderable when the data they write can be arranged in ascending primary-key order without any interleaving.
When both of the above ordering conditions are met simultaneously, the data import is considered fully ordered, achieving optimal performance.
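The two conditions can be sketched as a quick pre-flight check. This is illustrative only: the function names and the representation of a transaction as a plain list of primary keys are assumptions for the sketch, not part of TDSQL.

```python
# Sketch: checking the two ordering conditions before a bulk-load import.
# A "transaction" here is just the list of primary keys it writes.

def is_internally_ordered(txn_keys):
    """Condition 1: keys within one transaction ascend by primary key."""
    return all(a <= b for a, b in zip(txn_keys, txn_keys[1:]))

def are_orderable(transactions):
    """Condition 2: the [min, max] primary-key ranges of the transactions
    do not overlap, so the transactions can be lined up end to end."""
    ranges = sorted((min(t), max(t)) for t in transactions)
    return all(prev_hi < next_lo
               for (_, prev_hi), (next_lo, _) in zip(ranges, ranges[1:]))

# Fully ordered import: both conditions hold.
txns = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
assert all(is_internally_ordered(t) for t in txns) and are_orderable(txns)

# Overlapping ranges ([1, 5] and [4, 9]) violate condition 2.
assert not are_orderable([[1, 2, 5], [4, 6, 9]])
```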
If the first condition is not met, meaning the individual records within a transaction are not sorted by primary key, the parameter tdsql_bulk_load_allow_unsorted must be enabled before import; otherwise, the operation will fail with an error. When the parameter tdsql_bulk_load_allow_unsorted is enabled, data within each transaction is first staged in temporary files. During the commit phase, an external merge sort is performed before sequentially writing to external SST files. If the data within a transaction is already ordered, this step is skipped, and data is written directly to external SST files. Consequently, unordered transactions consume more CPU and I/O resources than ordered ones.
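Conceptually, the commit-phase handling of unsorted transactions is a classic external merge sort. The sketch below stands in sorted Python lists for the temporary files and omits all file I/O; it shows the shape of the algorithm, not the actual TDStore implementation.

```python
import heapq
import itertools

# Stage unsorted transaction data as sorted runs (in the real system,
# temporary files), then k-way merge them into one ordered stream that
# can be written sequentially to the external SST file.

def external_merge_sort(records, run_size):
    it = iter(records)
    runs = []
    # Stage 1: cut the incoming stream into runs and sort each run.
    while chunk := list(itertools.islice(it, run_size)):
        runs.append(sorted(chunk))
    # Stage 2: merge the sorted runs in ascending primary-key order.
    return list(heapq.merge(*runs))

assert external_merge_sort([5, 1, 4, 2, 8, 3], run_size=2) == [1, 2, 3, 4, 5, 8]
```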
If the second condition is not met, the external SST files generated between transactions will have overlapping ranges. When they are inserted into the LSM-tree, this overlap prevents finding suitable insertion positions in lower levels, causing more SST files to be ultimately inserted into the top level (level-0). This triggers compaction operations to lower levels, consuming additional I/O resources during bulk-load data import and adversely affecting overall performance.
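A toy model can show why overlapping SSTs pile up at level-0: an ingested file can only sink to a level whose existing files it does not overlap. This is a deliberate simplification for illustration, not the actual TDStore ingestion logic.

```python
# Simplified LSM ingestion model: place each external SST at the deepest
# level where it overlaps nothing; level-0 always accepts as a fallback.

def overlaps(a, b):
    return a[0] <= b[1] and b[0] <= a[1]

def ingest_level(levels, sst_range):
    """levels[0] is the top; return the level index the SST lands on."""
    for depth in range(len(levels) - 1, 0, -1):
        if all(not overlaps(sst_range, existing) for existing in levels[depth]):
            levels[depth].append(sst_range)
            return depth
    levels[0].append(sst_range)   # overlap everywhere below: stuck at level-0
    return 0

levels = [[], [(10, 20)]]                    # one file at level-1, keys 10..20
assert ingest_level(levels, (30, 40)) == 1   # disjoint range sinks below level-0
assert ingest_level(levels, (35, 50)) == 0   # overlaps (30, 40): stuck at level-0
```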
Concurrent Import
When disk performance is good (for example, on NVMe disks), bulk-load performance is more likely to be CPU-bound than I/O-bound. You can try moderately increasing the number of concurrent import threads (50 to 100) to improve import efficiency.
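A minimal sketch of a concurrent import driver follows. The import_batch function is a hypothetical stand-in for whatever executes one bulk-load transaction against the database; the 50-100 range above is the suggested concurrency for NVMe-class disks.

```python
from concurrent.futures import ThreadPoolExecutor

def import_batch(batch):
    # Placeholder: in a real driver this would run one bulk-load
    # transaction for the batch and return the rows written.
    return len(batch)

def parallel_import(batches, concurrency=64):
    # Fan the batches out across a pool of import worker threads.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return sum(pool.map(import_batch, batches))

assert parallel_import([[1, 2], [3, 4, 5]], concurrency=2) == 5
```

Note that condition 2 from the Sequential Import section still applies: concurrent workers perform best when each is fed a disjoint primary-key range.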
The compression algorithm used for the external SST files and temporary data files generated during bulk-load import can be adjusted via the parameters tdstore_bulk_load_sorted_data_sst_compression, tdstore_bulk_load_unsorted_data_sst_compression, and tdstore_bulk_load_temp_file_compression (setting them requires super administrator privileges), balancing CPU and I/O consumption against disk space. For example:
SET GLOBAL tdstore_bulk_load_sorted_data_sst_compression = zstd; -- supported values include zstd, snappy, lz4, auto, nocompression, ...
Control of Transaction Size
Unlike regular transactions, which stage uncommitted data in an in-memory write batch, bulk-load transactions write uncommitted data directly to external SST files. This prevents large transactions from causing excessive memory consumption and OOM (Out Of Memory) issues.
Conversely, bulk-load transactions should not be too small either: on one hand they would generate smaller SST files, and on the other, frequent bulk-load transaction commits would also hurt import performance.
Typically, a bulk-load transaction carrying 200 MB or more of data delivers optimal performance. Note that compression is enabled by default for external SST files; with ZSTD, for example, the compression ratio relative to raw data can reach 10:1 or higher. It is therefore recommended that a single bulk-load transaction write more than 2 GB of raw data.
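The 2 GB guideline is just the two numbers above multiplied together; as a quick check (function name illustrative):

```python
# ~200 MB of compressed SST data per transaction at a ~10:1 ZSTD ratio
# works out to ~2000 MB (about 2 GB) of raw data per transaction.

def recommended_raw_mb(target_sst_mb=200, compression_ratio=10):
    return target_sst_mb * compression_ratio

assert recommended_raw_mb() == 2000  # about 2 GB of raw data
```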
Automatic Transaction Batching and Committing
For large-scale data migration with high performance requirements and no read requests during the migration phase, consider enabling automatic batching and committing of bulk-load transactions:
SET GLOBAL tdsql_bulk_load_allow_auto_organize_txn = ON;
This lets the database automatically accumulate bulk-load transaction data to an optimal size before initiating commits, sustaining high import performance. Note that when this option is enabled, data from a commit request that has already returned success to the client may still be in the batching phase, not yet having reached the auto-commit threshold, and therefore may not be immediately visible. It is recommended to verify row counts after the import completes.
Secondary Indexes
For tables with secondary indexes, bulk-load performance degrades as the number of secondary indexes increases. This occurs because secondary index data is inherently unordered, requiring additional sorting operations that introduce computational overhead during the import process. To optimize scenarios involving secondary indexes, it is recommended to appropriately increase the size of bulk-load transactions.
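A tiny illustration of why index data needs that extra sort, even when the base rows arrive in primary-key order (toy data, illustrative only):

```python
# Rows arrive sorted by primary key, but the derived secondary-index
# entries (indexed_column, primary_key) are generally out of order.

rows = [(1, "cherry"), (2, "apple"), (3, "banana")]  # (pk, indexed column)

pk_keys = [pk for pk, _ in rows]
index_keys = [(col, pk) for pk, col in rows]

assert pk_keys == sorted(pk_keys)        # primary data is already ordered
assert index_keys != sorted(index_keys)  # index entries need a sort pass
```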
In version 21.x and later, the recommended approach is to create the tables without secondary indexes when importing data via bulk load, then, after the import completes, create the secondary indexes with fast online DDL optimization enabled (requires manual activation). This delivers better overall performance.
Partitioned Tables
From the compute layer's perspective, each partition of a partitioned table is assigned a unique t_index_id and is processed essentially like a regular table. What needs extra consideration is the case where a bulk-load transaction spans multiple partitions.
For range partitioning with sequential data import, the data within a bulk-load transaction typically falls into a single range partition or a few adjacent ones. A transaction therefore touches only a small number of partitions, much as with a regular table, and no additional handling is required.
For hash partitioning with sequential data import, the data within a bulk-load transaction is spread evenly across all hash partitions, so a single transaction touches nearly every partition. In this scenario, further increase the bulk-load transaction size so that the data volume falling to each hash partition is not too small: with N hash partitions, it is recommended to scale the transaction size by a factor of N relative to the regular-table guideline. Currently, with 16 hash partitions, performance is roughly 15% lower than for non-partitioned (regular) tables.
If sub-partitions exist, also consider how many sub-partitions the data within a bulk-load transaction will touch, ensuring that the data volume allocated to each sub-partition is not too small.
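The sizing rule above reduces to simple arithmetic; the sketch below is illustrative (the function name and the 2000 MB base figure from the transaction-size section are assumptions of the example):

```python
# With hash partitioning, a transaction's data spreads across all N
# partitions, so scale the per-transaction volume by N (and by the
# sub-partition count, if any) to keep each partition's share at the
# single-table recommendation.

def partitioned_txn_mb(base_txn_mb, hash_partitions=1, sub_partitions=1):
    return base_txn_mb * hash_partitions * sub_partitions

# 2000 MB raw per transaction on a plain table, 16 hash partitions:
assert partitioned_txn_mb(2000, hash_partitions=16) == 32000  # ~32 GB raw
```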
In summary, as the number of partitions involved in a bulk-load transaction increases, performance gradually degrades. From the storage layer's perspective, this is because a single bulk-load transaction enlists too many RGs as transaction participants; as with regular transactions, performance deteriorates as the number of RG-level participants grows.