Feature Overview
TDSQL Boundless provides the preemptive DDL feature to resolve the issue where DDL operations in distributed scenarios experience prolonged waits or even timeout failures due to MDL-S locks being held by long-running transactions on other nodes.
Enabling preemptive DDL ensures that when a DDL operation waits for an MDL-X lock on the target node beyond the specified duration, the system proactively terminates sessions blocking MDL-X lock acquisition, guaranteeing successful completion of the DDL operation.
Background Information
DDL Lock Coordination in TDSQL Boundless Distributed Architecture
TDSQL Boundless adopts a distributed architecture where each SQLEngine node possesses independent read and write capabilities. When a node performs a DDL operation, it needs to acquire locks at the following two levels:
1. Local Node MDL Lock: The DDL operation requires acquiring an MDL-X lock on the target table at the initiating node, which is mutually exclusive with active transactions on the local node.
2. Global Object Lock: DDL needs to obtain the global object lock via the metadata service (TDMC) to ensure that only one node can modify the table structure at a time, preventing cross-node DDL conflicts.
Symptom
On the same node, if long-running transactions or large queries exist on the target table, these sessions hold MDL-S locks on the table. The DDL operation requires an MDL-X lock, which is mutually exclusive with the MDL-S lock, causing the DDL to be blocked. If the long-running transaction does not complete within the lock wait timeout period, the DDL operation will fail.
Typical Error Message:
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
Without preemptive DDL enabled, handling such issues typically requires DBAs to manually locate and terminate lock-holding sessions, a process that is cumbersome and often delayed.
The solution approach for preemptive DDL
Preemptive DDL provides an automated solution: when a DDL operation waits for an MDL lock beyond the specified duration, the system automatically terminates sessions blocking the DDL, releasing MDL lock resources to allow the DDL operation to proceed. Compared with manual intervention, this approach offers faster response times and more reliable operations.
Limitations
Business Impact: After preemptive DDL is enabled, sessions blocking DDL will be forcibly terminated by the system, and uncommitted transactions in those sessions will be rolled back. Please evaluate your business's tolerance for transaction interruption.
Recommendation for off-peak usage: Although preemptive DDL can automatically handle lock conflicts, it is still recommended to perform DDL changes during off-peak business hours to minimize impact on normal operations.
Parameter Description
The behavior of Preemptive DDL can be controlled through the following parameters, which can be configured either on the parameter configuration page in the console or via SQL commands.
|
tdsql_ddl_block_mode | Session | Controls the lock acquisition behavior for normal-thread (normal) DDL; enables preemptive DDL logic when set to "preemptive". | preemptive nonblock default | preemptive |
tdsql_ddl_recovery_block_mode | Global | Controls the lock acquisition behavior for recovery-thread (recovery) DDL; enables preemptive DDL logic when set to "preemptive". Only GLOBAL-level settings take effect; SESSION-level settings are invalid. | preemptive nonblock default | preemptive |
tdsql_ddl_preempt_after_wait_seconds | Session | The tolerance duration for Preemptive DDL to wait for an MDL-X lock; automatically triggers the lock preemption process upon exceeding this duration. Unit: seconds. | 1 ~ 31536000 | 50 |
Usage
1. Enable Preemptive DDL.
Enable the feature in the session executing DDL:
SET tdsql_ddl_block_mode = 'preemptive';
2. (Optional) Adjust the preemption wait time.
For example, set it to 5 seconds:
SET tdsql_ddl_preempt_after_wait_seconds = 5;
3. (Optional) Configure recovery-thread Preemptive DDL.
SET GLOBAL tdsql_ddl_recovery_block_mode = 'preemptive';
4. Execute DDL operations.
ALTER TABLE orders ADD COLUMN remark VARCHAR(255);
If a long-running transaction on the current node blocks the MDL lock of the table, the system will automatically terminate the blocking session after waiting for the specified duration, and the DDL operation will proceed until completion.
Usage Examples
Preparations
Create a test table and insert data:
CREATE TABLE t1 (id INT PRIMARY KEY, name VARCHAR(50));
INSERT INTO t1 VALUES (1, 'Alice'), (2, 'Bob');
Preemptive DDL is not enabled
Session 1 (simulating a long transaction):
Session 2 (executing DDL):
ALTER TABLE t1 ADD COLUMN age INT;
Result: DDL execution failed, requiring manual intervention by DBAs to locate the session holding the lock via performance_schema.metadata_locks.
Enable Preemptive DDL
Session 1 (simulating a long transaction):
Session 2 (enabling preemptive DDL and executing):
SET tdsql_ddl_block_mode = 'preemptive';
SET tdsql_ddl_preempt_after_wait_seconds = 5;
ALTER TABLE t1 ADD COLUMN age INT;
Query OK, 0 rows affected (5.067 sec)
Records: 0 Duplicates: 0 Warnings: 0
Changes to Session 1:
SELECT * FROM t1;
ERROR 2013 (HY000): Lost connection to MySQL server during query
Result: DDL executed successfully. Session 1 was terminated, the transaction was rolled back, and subsequent operations need to be reinitiated.
Troubleshoot and diagnose
When DDL operations encounter MDL lock blocking, you can also manually troubleshoot using the following methods as a complementary approach to preemptive DDL:
1. Check MDL lock status:
SELECT * FROM performance_schema.metadata_locks WHERE OBJECT_NAME = 't1';
2. Locate the session holding the lock and refer to PROCESSLIST_ID Manually terminate the session holding the lock:
SELECT
ml.OBJECT_NAME,
ml.LOCK_TYPE,
ml.LOCK_STATUS,
t.PROCESSLIST_ID,
t.PROCESSLIST_INFO
FROM performance_schema.metadata_locks ml
JOIN performance_schema.threads t
ON ml.OWNER_THREAD_ID = t.THREAD_ID
WHERE ml.OBJECT_NAME = 't1' AND ml.LOCK_STATUS = 'GRANTED';
Note:
In distributed scenarios, if a DDL operation reports the error ERROR 8542 ... Acquire object lock ... wait timeout, it indicates that the DDL is blocked by another node's DDL through a global object lock. This scenario involves DDL-DDL conflict, requiring waiting for the other DDL to complete. Preemptive DDL does not apply to this scenario.
References