V18.1.0
Version Release Notes
The following table lists some key kernel features for upgrading from TDSQL V18 to V18.1.0:
| Category | Feature | Description |
| --- | --- | --- |
| Scalability and Performance | Enable CompactOnDelete to improve batch deletion performance. | TDSQL V18.1.0 introduces RocksDB's CompactOnDelete feature, solving the performance degradation caused by the accumulation of Delete Marks during DELETE operations in older versions. With this feature enabled, batch deletion time drops to one-tenth of the original, effectively improving system performance. |
| Scalability and Performance | Supports separate deployment of Raft Log and data. | TDSQL V18.1.0 supports separate disk storage for Raft Log and user data, placing Raft Log on SSDs to enhance write performance while retaining user data on HDDs to reduce costs, balancing performance and cost. Currently, this feature is only supported on newly deployed instances. |
| Scalability and Performance | Optimize the instance initialization process. | TDSQL V18.1.0 optimizes the instance initialization process: all HyperNode nodes start simultaneously, and MC selects one node to initialize the data dictionary. This eliminates the single-node initialization and restart-for-scale-out steps, significantly reducing startup time. |
| Scalability and Performance | Add two views: compaction_history and active_compaction_state. | TDSQL V18.1.0 introduces compaction-related views to track the history of compaction execution, facilitating issue identification. |
| Database management | Simplify the concurrency control policy for node AS. | TDSQL V18.1.0 simplifies concurrency control for data migration with new system variables: migrate-node-rep-group-replicas-parallel-limit, tdstore_max_install_snapshot_task, and tdstore_max_install_snapshot_rate. They dynamically adjust the concurrency and speed of migration tasks to keep high I/O from impacting business operations. The legacy parameter td_basic_opt.max_num_install_snapshot_task is deprecated. |
Scalability and Performance
Enable CompactOnDelete to improve batch deletion performance
In versions prior to TDSQL V18.1.0, batch-cleaning large volumes of historical data by cyclically executing DELETE FROM tbl WHERE ... LIMIT k became progressively slower. Deleted rows are marked with deletion flags (Delete Marks) in RocksDB, so as more data was deleted, the number of delete marks grew, and each iteration had to traverse these flagged entries before reaching rows eligible for deletion, ultimately degrading performance.
To address this, RocksDB's CompactOnDelete feature is enabled starting from TDSQL V18.1.0. When creating new SST files, the engine records the proportion of Delete Marks; once this proportion exceeds a preset threshold, the SST file is flagged for higher-priority Compaction, clearing invalid MVCC historical versions from storage sooner.
By enabling this feature, the time required for batch deletion in sysbench repeated-deletion scenarios is reduced to 1/10 of the original duration.
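The deletion pattern described above can be sketched as follows; the table name, filter, and batch size are illustrative:

```sql
-- Illustrative batch-cleanup loop (table, filter, and batch size are
-- hypothetical; an external script repeats this statement until it
-- affects 0 rows).
DELETE FROM order_history
WHERE created_at < '2024-01-01 00:00:00'
LIMIT 10000;
```

Before V18.1.0, each repetition had to scan past the delete marks left by earlier batches; with CompactOnDelete enabled, SST files dominated by delete marks are compacted earlier, so later batches stay fast.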
Supports separate deployment of Raft Log and data
In scenarios with massive data volumes and less demanding read/write performance requirements, users typically opt for cost-effective HDDs to store data. However, the poor I/O performance of HDDs may lead to degraded overall system performance. To address this issue, starting from TDSQL V18.1.0, Raft Logs and user data can be deployed on separate disks. Raft Logs are placed on SSDs to leverage their high I/O performance for frequent append operations on small data volumes, while user data remains on HDDs to meet large-capacity storage needs and cost efficiency. This approach achieves an optimal balance between performance and cost. This feature is only supported on newly deployed instances.
Optimize the instance initialization process
In versions prior to TDSQL V18.1.0, initializing an instance required three time-consuming steps: starting the MC node, launching a single HyperNode node to initialize the data dictionary, and then restarting the HyperNode node to scale out and reset the number of replicas. To accelerate the instance initialization process, starting from TDSQL V18.1.0, all HyperNode nodes now start simultaneously, after which the MC selects a node to initialize the data dictionary. This optimization significantly reduces the time previously spent on node scaling during startup.
Stability
Storage Stability Enhancement
Optimize TDStore RG job executor.
Replace the task queue with a scheme where each task is assigned a separate coroutine to resolve potential deadlock issues caused by an excessive number of idle coroutines and insufficient queue capacity.
Idempotency Guarantee for RG demote-promote Tasks
RG demote-promote jobs are now idempotent, so identical promote-demote jobs can no longer be re-executed after an MC abort. This avoids inconsistencies between the RG state in TDStore and the metadata recorded by MC, and prevents subsequent jobs issued by MC from being blocked by TDStore's state checks.
Enhanced MC stability
Optimize Limitation on Failed Split Region Tasks.
After the optimization, when a Region split fails, only that Region is temporarily blocked from issuing new split tasks; other Regions are unaffected.
Isolate Server nodes during node unloading and await DDL completion.
Optimize the node unloading logic by first deactivating the node and awaiting the release of all its object locks.
Add Target Replica Status Check Logic Before Master Switchover Tasks Are Issued in Certain Scenarios
Added a check of the target new leader's replica status before initiating a leader switchover, to avoid repeatedly issuing transfer-leader jobs to a failed target. Exponential backoff was also implemented for transfer-leader jobs to prevent tasks from being issued too aggressively.
MC Time Service Lease Check
Optimize the MC Time Service to prevent dual masters from simultaneously providing timestamp services.
MC Data Scheduling Policy DDL Awareness
When performing scheduling operations such as hotspot management or automatic merging, MC checks whether any objects within the selected RG are currently executing DDL. If so, it refrains from issuing switchover, split, or merge tasks to the RG.
MC supports the API for issuing automatic Merge RG.
Single RG Mode Phase 3 mainly provides APIs for manually issuing multiple automatic Merge RG tasks.
Database management
Enhanced DDL Feature
Offline Index Creation Acceleration
In daily Ops, table structures frequently need adjusting, such as adding columns or indexes. With large datasets, however, adding an index can take considerable time and block business changes. Starting from V18.1.0, TDSQL introduces the FastOnlineDDL mechanism, accelerating offline index creation (including unique indexes) by an order of magnitude.
Usage: tdsql_ddl_fillback_mode defaults to 'ThomasWrite'. Setting tdsql_ddl_fillback_mode = 'BulkLoadLock' enables the accelerated path but locks the table for the duration of the DDL.
CREATE TABLE sbtest1 (a INT AUTO_INCREMENT PRIMARY KEY, b INT, c INT);
INSERT INTO sbtest1 VALUES (1,1,1),(2,2,2),(3,3,3);
SET SESSION tdsql_ddl_fillback_mode = 'BulkLoadLock';
ALTER TABLE sbtest1 ADD INDEX idx_b(b);
Usage recommendation: DDL operations based on BulkLoad require MC scheduling to be disabled. If MC is executing scheduling tasks at the time, the DDL fails immediately with an error, and a long-running replica migration task on the instance will cause DDL initiation to keep failing. You can proactively disable all MC scheduling in advance:
sql > call dbms_admin.meta_cluster_pause_schedule("new-ddl-enabled");
sql > call dbms_admin.meta_cluster_resume_schedule();
Efficiency Improvement: In the TPC-H benchmark test, adding an index to 100G of offline data (600 million rows) took 4 hours in ThomasWrite mode, while BulkLoadLock mode completed the task in only 302 seconds.
DDL Operations Support Immediate Response to KILL Statements.
Starting from TDSQL V18.1.0, during parallel data backfilling for DDL operations, the task processing logic has been optimized to support immediate response to KILL statements. This ensures quick termination and rollback of DDL operations, enhancing system stability.
FastOnlineDDL and BulkLoad Data Writing Features Are Mutually Exclusive.
To ensure data writing accuracy, TDSQL V18.1.0 enforces mutual exclusivity between the FastOnlineDDL mechanism and the BulkLoad data writing feature. While FastOnlineDDL is enabled, multi-row INSERT statements are not supported; conversely, while multi-row INSERT statements are executing, FastOnlineDDL cannot be enabled.
Simplify the concurrency control policy for node AS
TDSQL V18.1.0 simplifies the concurrency control policy for data migration tasks between instance nodes, which is managed via the following system variables:
- migrate-node-rep-group-replicas-parallel-limit: an MC parameter that controls the total concurrency of data migration tasks across the cluster.
- tdstore_max_install_snapshot_task: a TDStore system variable that limits how many Install Snapshot tasks may execute concurrently on a single node while it consumes data migration tasks.
- tdstore_max_install_snapshot_rate: a TDStore system variable that limits the rate, in bytes/second, at which a single node downloads snapshots from the Leader while consuming data migration tasks.
Through these parameter controls, excessively high I/O requests during data migration can be prevented from affecting business read/write operations. The original concurrency control parameter td_basic_opt.max_num_install_snapshot_task has been deprecated starting from version V18.1.0.
The newly introduced system variables can be dynamically modified via SQL without requiring node restarts. For example, the following SQL statement sets the concurrency level for executing Install Snapshot tasks on a single node to 4 and the maximum allowed download speed to 150 MB/s:
set persist tdstore_max_install_snapshot_task = 4;
set persist tdstore_max_install_snapshot_rate = 157286400;
TDBR Backup and Restore Tool Enhancement
Optimizing fetch Logic When Incremental Backup Encounters purge Intervals
When an incremental backup encounters a purge interval, the tool now obtains the first_log_index directly from the DB instead of probing each index individually.
Incremental Recovery Performance Optimization
Incremental recovery performance is improved by aggregating RPCs to reduce the number of RPC round trips, yielding a performance improvement of over 200%.
Security Enhancement
None.
Data Migration
BulkLoad Stability Enhancement
TDSQL V18.1.0 continues to enhance the stability and compatibility of BulkLoad. The main improvements are as follows:
BulkLoad supports transmitting BulkLoadTransInfo to Followers.
Prior to TDSQL V18.1.0, Followers could occasionally crash while replaying BulkLoadCommitLog during Install Snapshot. Specifically, Followers acting as migration replicas might lack the corresponding BulkLoad External SST files, and the resulting replay failure triggered self-termination. Starting with TDSQL V18.1.0, the Leader transmits BulkLoadTransInfo to Followers; upon receiving it, Followers no longer need to replay BulkLoadCommitLog, preventing node crashes due to missing External SST files.
BulkLoad now stores transmitted SST files using relative paths.
To further adapt to diverse deployment environments and support single-machine multi-instance deployments and Pod deployments, starting from TDSQL V18.1.0, BulkLoad has switched from using absolute paths to relative paths for storing transmitted SST files.
BulkLoad aligns with regular transactions in error handling for duplicate keys.
In MySQL, when data is inserted using multi-row or single-row inserts, encountering duplicate keys such as primary key or unique index conflicts will return the error ERROR 1062 (23000): Duplicate entry 'duplicate_key_value' for key 'key_name'. Starting from TDSQL V18.1.0, TDSQL ensures that errors returned during BulkLoad transactions for duplicate keys align with MySQL client behavior, thereby enhancing compatibility.
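As a reference, the standard MySQL behavior that BulkLoad now matches looks like this (table name is illustrative; the exact key name in the message varies by server version):

```sql
CREATE TABLE t_dup (id INT PRIMARY KEY);
INSERT INTO t_dup VALUES (1);
-- A second insert of the same key now fails the same way under a
-- BulkLoad transaction as under a regular transaction:
INSERT INTO t_dup VALUES (1);
-- ERROR 1062 (23000): Duplicate entry '1' for key 'PRIMARY'
```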
BulkLoad optimizes memory allocation for partitioned tables to prevent OOM
Prior to TDSQL V18.1.0, during BulkLoad imports of extremely large transactions or partitioned tables, the system could experience excessive memory consumption due to external sorting of unsorted data, leading to memory thrashing and OOM issues. To address this problem, TDSQL V18.1.0 has optimized BulkLoad's memory usage. Now, each BulkLoad transaction no longer allocates separate memory for external sorting per partitioned table but instead utilizes a shared memory pool, significantly reducing memory consumption. Additionally, TDSQL V18.1.0 introduces the new system variable tdstore_bulk_load_use_unsorted_data_pool to control whether this optimization is enabled. The parameter defaults to true, meaning it is enabled by default. This optimization effectively manages and reduces memory consumption, enhances system stability, and decreases the occurrence of OOM issues during data imports.
Operations
Batch KILL Session Enhancement
Terminating sessions in batches is now more efficient, and errors caused by session IDs that have already expired are ignored. Batch KILL operations therefore complete faster, without interruption or a flood of error messages from partially invalid session IDs.
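One common way to drive such a batch, sketched against the standard information_schema.PROCESSLIST view (the user name and thresholds are illustrative):

```sql
-- Generate KILL statements for long-idle sessions of one user; feed the
-- result back to the client to execute. With this release, IDs that
-- expire between generation and execution are skipped rather than
-- aborting the batch.
SELECT CONCAT('KILL ', id, ';') AS kill_stmt
FROM information_schema.PROCESSLIST
WHERE user = 'app_user'
  AND command = 'Sleep'
  AND time > 600;
```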
Collect Metrics by Node Type Under Storage-Compute Separation Architecture
Under the storage-compute separation architecture, the monitoring and alarm page supports collecting metrics for four node types: Hyper, Engine, Storage, and CDC.
MC Schedule Config Parameters Support SQL Querying and Modification.
Ops personnel can directly modify MC parameters and perform other operations via SQL in MySQL Client.
Scheduling SQL: TRANSFER LEADER, MIGRATE, SPLIT, MERGE
Four scheduling SQL commands—TRANSFER LEADER, MIGRATE, SPLIT, and MERGE—have been added to TDSQL as a SQL-based complement to HTTP and RPC interfaces.
The master scheduling switch controls all task behaviors of MC.
MC encapsulates a master scheduling switch. During database upgrades, you only need to call one interface to transition MC into a "quiescent" state, and re-enable it after the upgrade completes.
Reports Relevant Metrics for Transaction Recording
Four new metrics have been added: number of distributed transactions, number of incomplete distributed transactions, number of rolled back distributed transactions, and number of transaction rollbacks.
SHOW PROCESSLIST data collected by hyper-agent is stored separately in individual files.
processlist data collected from hyper-agent is stored separately in individual files, facilitating online Ops.
Save heap profile when hyper node memory usage exceeds a certain threshold.
Supports automatic dumping of a heap file when the SQLEngine process memory reaches a certain threshold, facilitating problem analysis.
information_schema added a compaction information view.
Supports viewing active and historical compaction task information through SQL.
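Assuming the views are exposed under information_schema as described above, they can be queried like any other system view (column lists are not documented here; inspect them with DESC first):

```sql
-- Currently running compaction tasks
SELECT * FROM information_schema.active_compaction_state;
-- Compaction execution history
SELECT * FROM information_schema.compaction_history LIMIT 10;
```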
By default, pessimistic locks and participant information are displayed using broadcast queries.
The tables for automatic broadcasting are controlled by the parameter auto_broadcast_tables. Single-table queries against these tables will aggregate information from all nodes.
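Assuming auto_broadcast_tables is exposed as a regular system variable, its current value can be inspected with:

```sql
SHOW GLOBAL VARIABLES LIKE 'auto_broadcast_tables';
```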
MC provides an API to query the modification records of SQLEngine SET GLOBAL/PERSIST.
MC provides a query API to retrieve historical modification records of SQLEngine SET GLOBAL/SET PERSIST.
HTTP API usage:
curl -XGET http://127.0.0.1:62379/meta-cluster/api/get-global-variables-modify-history/{cluster_id}?limit=5
Parameter description:
- limit: Specifies the number of recent modification records to obtain; if not specified, a maximum of 256 latest modification records are returned by default.
curl-tool.sh usage:
./sbin/curl-tool.sh ggvmh
mc-ctl usage:
./bin/mc-ctl
>>cluster gv history --limit 5
Bug fixes
Fixed an issue where MC failed to issue ActivateNode tasks to reactivate hyper/engine nodes reporting SQL_INACTIVATE status via heartbeat during normal instance operation.
Fixed an issue where unintended operations on the MC switch during instance specification changes triggered unexpected data scheduling behavior.
Fixed an issue where coroutine leaks caused the old master MC to continue sending completed tasks to TDStore after a master switchover.
Fixed an issue where, during a leader switchover after scale-out, MC returned its old member version to TDStore, leaving TDStore unaware of the new MC node list after the switchover completed.
Fixed an issue where MC excessively issued migration tasks across multiple nodes during data import operations immediately after cluster creation.
Fixed an issue where after Merge RG tasks are completed, MC's internal data object routing information might be updated incorrectly. This could prevent the system from locating the correct routing when deleting data objects in vanished RGs, causing delete Region tasks to get stuck.
Fixed an issue where during BulkLoad tasks, the MC failed to properly stop scheduling tasks, causing TDStore to trigger KillSelf due to missing SST files. The MC has enhanced both the termination and initiation conditions for tasks.
Fixed the issue where hotspot scheduling could not split data objects out from within the RG.
Fixed an issue where the WaitAllEngineSyncTableSchema call for Online Add Partition was incorrectly positioned, failing to take effect and causing concurrent DML operations to fail.
Fixed an issue where during background statistical updates via RPC, a preceding error caused thd->killed to be set but not reset. When re-initiating the RPC, the operation failed because the thd had been killed, resulting in statistics being permanently unable to update.
Fixed an issue where LogService might fail to open partitioned tables with auto-increment columns during parsing, causing replay tasks to hang.
Fixed an issue where, if users had performed a table-copy DDL (for example, an ALTER TABLE t1 ... CHARACTER SET = utf8mb4 statement that rebuilds the table) after an instance had been running for some time, a LogService created afterwards would fail to synchronize the t1 table.
Fixed the issue where DML operations persistently reported the EC_TDS_SCHEMA_CHECK_VERSION_MISMATCH error after Add Partition.
Added a variable max_dd_cache_size to control the allowable cache capacity size for the data dictionary.
Fixed an issue where after a sub-task failure in Online DDL, the system failed to promptly notify the entire task to exit.
Fixed an issue where during incremental backups, if Raft Log had been purged, the backup would fail to start from the first_log_index.
Fixed the issue where BulkLoad did not support using relative paths for file transmission.
Fixed the issue where BulkLoad reported inconsistent errors compared to regular transactions when encountering duplicate primary keys.
Fixed the issue where BulkLoad data import and FastOnlineDDL failed to properly enforce mutual exclusion.
Fixed the issue where inconsistent sst_file_info.json files were computed by different commits during backup and recovery.
Made RG demote-promote tasks idempotent so that identical jobs cannot be re-executed after an MC abort.
Fixed an issue where TDStore, after entering readonly mode due to a full disk, failed to initiate necessary compaction upon exiting readonly mode.
Fixed the issue where when the Region is small, the approximate_size and approximate_keys reported to MC for the Region are inaccurate, causing the Split Region job issued by MC to fail with the error region find split_key not between with start_key and end_key.
Fixed the issue where modifications to the keep_db_log_file_num parameter did not take effect immediately.
Fixed an issue where CDC LogService could cause OOM due to scanning excessive logs during startup.
Fixed the issue where SegmentWriteBufWrapper::write_buf_to_disk entered an infinite loop after the environment disk became full, causing the LogManager's lock to remain unreleased.
Fixed the issue where RG jobs may remain uncompleted when the number of RGs exceeds 4096.