Overview
Online DDL (Online Data Definition Language) modifies table structures while the table keeps serving concurrent reads and writes. This minimizes table lock time and keeps the database available. In a distributed environment, a DDL operation must be carried out across multiple nodes at once. The implementation therefore has to keep concurrent DDL and DML operations safe, and it must also account for performance, execution efficiency, and crash safety.
How It Works
Online DDL Capability
TDSQL Boundless supports Online DDL in most scenarios. For details, see Online DDL Notes.
Usage Examples
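-- Create a test table and insert a few rows
CREATE TABLE sbtest1 (id INT PRIMARY KEY, v1 INT);
INSERT INTO sbtest1 (id, v1) VALUES (1, 1), (2, 2), (3, 3);
-- Add a secondary index online
ALTER TABLE sbtest1 ADD INDEX idx_v1(v1);
-- Add a column online
ALTER TABLE sbtest1 ADD COLUMN v2 INT;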
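With Online DDL, other sessions can keep reading and writing a table while a DDL statement runs against it. The following minimal sketch reuses sbtest1 and assumes two separate client connections. The index name idx_v1_v2 is hypothetical.

```sql
-- Session 1: run a DDL statement on sbtest1 (idx_v1_v2 is a hypothetical index name)
ALTER TABLE sbtest1 ADD INDEX idx_v1_v2(v1, v2);

-- Session 2: issued from a second connection while the ALTER above is still running;
-- with Online DDL these reads and writes are expected to proceed rather than block
SELECT id, v1, v2 FROM sbtest1 WHERE id = 1;
UPDATE sbtest1 SET v1 = 10 WHERE id = 2;
INSERT INTO sbtest1 (id, v1, v2) VALUES (4, 4, 4);

-- After the ALTER completes: confirm that the new index exists
SHOW INDEX FROM sbtest1;
```

On a three-row table, the ALTER finishes almost immediately. You can only observe the overlap on a table large enough for the backfill to take noticeable time.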
Recommendations for Creating Indexes on Large-Volume Tables
Fast Online DDL in TDSQL Boundless combines parallel processing with bypass writing, which makes DDL operations faster and more convenient.
However, Fast Online DDL can become much slower if large and small tables are not distinguished, or if tables are not partitioned to match their data volume. When a large table is not properly partitioned, its data tends to concentrate on a single node. The DDL operation then runs serially on that node instead of in parallel across multiple nodes, which greatly reduces execution efficiency.
Fast Online DDL can fully exploit its distributed scalability only when partitioned tables are used appropriately for the data volume.
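As a minimal sketch of that advice, a large table can be created with first-level HASH partitioning so that the index backfill work is spread across nodes. The table orders, its columns, and the 4-node cluster size are hypothetical. The 8 partitions were chosen because 8 is a multiple of 4.

```sql
-- Hypothetical large table; assumes a 4-node cluster, so 8 partitions (a multiple of 4)
CREATE TABLE orders (
  order_id    BIGINT NOT NULL,
  customer_id INT NOT NULL,
  amount      DECIMAL(12, 2),
  created_at  DATETIME,
  PRIMARY KEY (order_id)
)
PARTITION BY HASH(order_id) PARTITIONS 8;

-- Rows are spread over the 8 partitions by a hash of order_id, so the backfill
-- for this index can run on several nodes in parallel instead of serially on one
ALTER TABLE orders ADD INDEX idx_customer_id(customer_id);
```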
Partitioning Recommendations:
1. TDSQL Boundless is 100% compatible with native MySQL partitioned table syntax and supports both first-level partitioning and second-level partitioning (subpartitioning). Partitioning primarily addresses two problems: (1) the capacity of large tables; (2) performance under high-concurrency access.
2. Large table capacity: If a single table is expected to eventually exceed the data disk capacity of a single node, create first-level HASH or KEY partitioning to distribute the data evenly across multiple nodes. If the data volume keeps growing, use elastic scaling to progressively reduce disk usage on each node.
3. Performance under high-concurrency access: For transaction processing (TP) workloads with high concurrency, a single node may not be able to handle the read/write pressure. In that case, also create first-level HASH or KEY partitioning to distribute the read/write load evenly across multiple nodes.
4. For the partitioned tables described in points 2 and 3, base the choice of partition key on your business characteristics: use a column that most core business queries rely on. Also make the number of partitions a multiple of the number of instance nodes.
5. If old data needs to be purged, create a RANGE-partitioned table and use TRUNCATE PARTITION to clean up data quickly. To distribute the data across nodes as well, add second-level HASH subpartitioning under the RANGE partitions.
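Point 5 can be sketched with standard MySQL subpartitioning syntax, which the document states is supported. The table access_log, its columns, the monthly partition boundaries, and the 4-node cluster size (hence 4 HASH subpartitions) are all hypothetical.

```sql
-- Hypothetical log table: first-level RANGE partitions by month for cleanup,
-- second-level HASH subpartitions (4, assuming a 4-node cluster) for distribution.
-- MySQL requires every column used in the partitioning expressions to be part of the primary key.
CREATE TABLE access_log (
  id       BIGINT NOT NULL,
  user_id  INT NOT NULL,
  log_date DATE NOT NULL,
  message  VARCHAR(255),
  PRIMARY KEY (id, log_date, user_id)
)
PARTITION BY RANGE (TO_DAYS(log_date))
SUBPARTITION BY HASH(user_id) SUBPARTITIONS 4 (
  PARTITION p202401 VALUES LESS THAN (TO_DAYS('2024-02-01')),
  PARTITION p202402 VALUES LESS THAN (TO_DAYS('2024-03-01')),
  PARTITION pmax    VALUES LESS THAN (MAXVALUE)
);

-- Remove all January 2024 rows (including all of their HASH subpartitions) in one statement
ALTER TABLE access_log TRUNCATE PARTITION p202401;
```

TRUNCATE PARTITION drops the partition's data as a whole instead of deleting rows one by one, which is why it suits periodic cleanup.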
Relevant Parameters
| Parameter | Default Value | Description |
| --- | --- | --- |
| tdsql_ddl_fillback_mode | ThomasWrite | Algorithm used in the data backfill phase of Online DDL. ThomasWrite (default): ignores stale writes. IngestBehind: a backfill algorithm based on bulk-load writing. |
| max_parallel_ddl_degree | 8 | Maximum degree of parallelism in the data backfill phase of Online DDL. Increasing this value can speed up backfilling but consumes additional CPU, I/O, and memory resources. Adjust it based on server hardware and business workload. |
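The following sketch shows how these parameters might be inspected and adjusted before a large backfill. It assumes that both parameters are exposed as MySQL-style system variables and can be set at session scope. Check the scope and the accepted values for your version. The value 16 is an arbitrary example, and idx_v2 is a hypothetical index name.

```sql
-- Inspect current values (assumes both parameters are exposed as system variables)
SHOW VARIABLES LIKE 'tdsql_ddl_fillback_mode';
SHOW VARIABLES LIKE 'max_parallel_ddl_degree';

-- Assumption: both can be changed at session scope; verify scope and allowed values for your version
SET SESSION max_parallel_ddl_degree = 16;
SET SESSION tdsql_ddl_fillback_mode = 'IngestBehind';

-- Subsequent Online DDL in this session would use the settings above (idx_v2 is a hypothetical name)
ALTER TABLE sbtest1 ADD INDEX idx_v2(v2);
```

Raise max_parallel_ddl_degree only when the server has spare CPU, I/O, and memory. A higher degree competes with the business workload for those resources.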