tencent cloud

Lock Mechanism Analysis and Troubleshooting Practices
Last updated:2026-03-06 18:50:08
Lock Mechanism Analysis and Troubleshooting Practices
Last updated: 2026-03-06 18:50:08

1. Introduction

Lock mechanisms are one of the core mechanisms for implementing concurrent access control in relational databases. Understanding their working principles is a key entry point for troubleshooting access conflicts. For example, common lock conflict errors in high-concurrency or complex transaction processing scenarios, (such as "Lock wait timeout exceeded"), essentially reflect that the data resources (such as rows and tables) requested by the current transaction have been locked by other transactions. When the current transaction continuously fails to obtain the lock and reaches the threshold for lock wait timeout, it is actively interrupted.
To resolve lock conflicts, the session holding the lock must release it. The best way to have the session release the lock is to identify the initiator of the long-term lock-holding session and contact the user to complete the transaction (commit or rollback). In emergency situations, the DBA may terminate the session holding the lock.
This article introduces TencentDB TDSQL's latest product—TDSQL Boundless. Using typical case studies, it analyzes the core role of lock mechanisms in transaction concurrency, revealing how they balance data consistency assurance with system throughput enhancement.

2. TDSQL Boundless Architecture Introduction


TDSQL Boundless is a high-performance, highly available enterprise-grade distributed database solution developed by Tencent for financial-grade application scenarios. It adopts a containerized cloud-native architecture, providing high-performance computing capabilities and cost-effective massive storage at the cluster level.
TDSQL Boundless architecture and features: fully distributed + storage-compute integration/storage-compute separation + data plane/control plane separation + high scalability + global consistency + high compression ratio.

3. Lock in TDSQL Boundless

TDSQL Boundless, as a typical distributed system, requires not only mechanisms for concurrent access control within individual nodes but also ensures (Mutual Exclusion) across multiple nodes to prevent data inconsistencies caused by concurrent modifications of the same resource by multiple nodes.

Core Lock Types and Implementation Levels

1. Table-level locks (computing layer) are coarse-grained locks that adopt MySQL's native Metadata Lock (MDL) to resolve concurrency conflicts between DDL/DML operations within a single node.
2. Row-level locks & range locks (storage layer) are fine-grained locks that prevent multiple sessions from concurrently modifying the same record, enabling precise concurrency control.
3. Global object locks (applied by the computing layer and persisted in the TDMC layer) are table-level locks. While table-level locks in the computing layer block concurrent DDL operations within a single node, global object locks coordinate DDL operations across multiple nodes.
Table-level locks act on the computing layer, and their conflict scenarios can be categorized into two types: intra-node and inter-node. The following sections will analyze the specific conflict situations in both scenarios respectively.

Table-Level Lock Conflict

1. Intra-node DDL-DML Conflict
In the following example, both Transaction 1 and Transaction 2 are connected to the same peer node hybrid-1. Transaction 1 explicitly starts a transaction to query table sbtest1 and holds an MDL shared read lock [table-level lock] on this table before the transaction ends. When Transaction 1 is still ongoing, Transaction 2 executing a DDL statement will be blocked, thereby protecting the metadata of table sbtest1.

2. Intra-node DDL-DDL Conflict
Similarly, it is still assumed that both Transaction 1 and Transaction 2 are connected to the same peer node hybrid-1. Transaction 1 is performing a DDL operation on table sbtest1, holding the MDL shared lock and exclusive lock [table-level lock] on this table in phases, preventing the subsequent DDL operation of Transaction 2 from compromising the metadata of table sbtest1.

On the same peer node, DDL and DML operations are treated equally—whichever acquires the MDL lock first executes, with no inherent priority, as the MDL lock exists as an in-memory state within the process. However, TDSQL Boundless is a distributed database where sessions are distributed across all nodes. What would happen if sessions connected to two different peer nodes perform operations on the same table?
3. Inter-node DDL-DML Conflict
Let's look at the following example: Transaction 1 and Transaction 2 are connected to peer nodes hybrid-1 and hybrid-2 respectively.
Transaction 1 explicitly starts a transaction to query table sbtest1. Subsequently, Transaction 2 successfully executes a DDL statement on another node hybrid-2 (since no other session accesses table sbtest1 on this node, no lock conflict occurs), pushing up the schema version of table sbtest1.
Subsequently, Transaction 1 continues to execute the query on table sbtest1 and reports an error. For the Repeatable Read isolation level, a Consistent Read View is generated only during the first query of a transaction. Returning records under a new version of the table structure would violate this principle, resulting in ERROR 1412 and prompting transaction retry (when catching this Exception, the business program should perform rollback and retry the transaction).

4. Inter-node DDL-DDL Conflict
Cross-node DDL operations rely on the global object lock mechanism in the TDMC layer to achieve mutual exclusion, ensuring data consistency and operational orderliness in a distributed environment.
The following example demonstrates mutual blocking of DDL operations on the same table. The difference from previous cases is that these operations are executed on different nodes. Here, mutual exclusion is not ensured by node-level MDL locks but by the global object lock in TDMC.

Row-level locks & range locks act on the storage layer. When two sessions encounter row lock or range conflicts, it indicates that both ultimately accessed the primary replica of the same storage node. Thus, regardless of whether the sessions are connected to the same SQLEngine node, the outcome remains identical. The following analysis will examine specific conflict scenarios involving range locks, row-level locks, and the particularly distinct case of row-level deadlocks.

Range Lock Conflict

When operations are performed on a range of a table, the TDSQL Boundless storage layer applies a range lock (range lock). The lock covers a left-closed, right-open key interval, behaving similarly to InnoDB's next-key lock under the Repeatable Read isolation level.
The following example demonstrates that under the Repeatable Read isolation level, updating the id range [5, 11] will lock the gaps in that range to prevent phantom reads caused by new record insertions.


Row-Level Lock Conflict

When an operation is performed on a single key of a table, a row-level lock is applied. TDSQL Boundless maximizes concurrency capability of the database through its refined locking mechanism.
Specific examples will not be listed here. Compared to range locks, this approach only applies locks to individual rows, which can be analyzed independently.

Deadlock Conflict

Deadlock is a special case of row-level lock conflicts. It occurs when two or more sessions are waiting for data locked by the other party. Since both parties are waiting for the other, neither can complete the transaction to resolve the conflict.
TDSQL Boundless features automatic deadlock detection capability. By default, it rolls back the transaction with fewer data writes and returns an error, thereby releasing all other locks held by that session so that another session can proceed with its transaction.
In the following example, after Transaction 1 and Transaction 2 enter a deadlock, the deadlock detection mechanism takes effect. Transaction 2 is rolled back, allowing Transaction 1 to proceed.


Is There a Table-Level Lock at the Storage Layer?

InnoDB provides table-level S-locks and X-locks, but they offer "little practical value". These locks do not provide additional protection and only reduce concurrency capability. Therefore, the TDStore storage layer does not implement table-level locks. It only supports syntax parsing without actual functionality:
locktables sbtest1 read;
Query OK,0rows affected,1 warning (0.02 sec)

showwarnings;
+---------+------+-----------------------------------------------------------------------------------+
|Level| Code | Message |
+---------+------+-----------------------------------------------------------------------------------+
| Warning |8533|LOCK/UNLOCKoptionis used for compatibility only,and it does not actually work.|
+---------+------+-----------------------------------------------------------------------------------+


4. Best Practices

Lock timeouts and deadlocks are common issues in high-concurrency database systems. Effective lock management requires systematic optimization across three dimensions: business logic, database configuration, and troubleshooting.

Business Logic

Transaction design is the primary step in preventing lock issues. Adhering to the three principles of "short transactions, lightweight operations, sequential access" can effectively reduce the probability of lock conflicts.
1. Transaction execution time should be minimized. For instance, keep the number of rows operated per transaction under 2000. The more SQL statements included in a transaction and the larger the number of rows operated, the longer the lock holding time becomes, and the higher the probability of conflicts with other transactions.
2. Avoid performing user interactions during transactions; keep only essential data operations within the transaction to ensure rapid completion and resource release.
3. Ensure consistent order of resource access within transactions. When multiple transactions need to access multiple resources, they must access these resources in a standardized sequence, which is crucial for preventing deadlocks. For example, in transactions involving deducting inventory and creating orders, all transactions should either lock the inventory table first followed by the order table, or vice versa. However, it must be avoided that some transactions lock inventory first while others lock orders first, preventing circular wait scenarios.
4. TDSQL Boundless already supports Online DDL capabilities for most scenarios. However, it is still recommended to review the OnlineDDL documentation before DDL operations are initiated. Additionally, performing such operations during off-peak business hours is advised.

Database configuration

Secondly, when lock wait timeouts occur in production systems, priority should be given to verifying whether the configuration of relevant parameters is reasonable. For instances upgraded from previous versions, special attention must be paid to whether newly added lock control parameters in subsequent versions remain disabled for compatibility requirements. Enable them manually if necessary.
Database-related parameters:
1. tdsql_lock_wait_timeout: Controls the maximum lock wait time, with a default value of 50 seconds. In the scenario demonstrated in the diagram above, if a blocked session still cannot acquire the lock within 50 seconds, it will report the error "Lock wait timeout exceeded". Generally, there is no need to adjust the default value. If severe lock conflicts occur in the business that cannot be quickly resolved, this value can be appropriately decreased as a temporary measure.
2. tdstore_deadlock_detect (requires superuser privileges): Automatic deadlock detection feature. Default value is ON for newly purchased instances, and it is recommended to keep it enabled. For instances upgraded from older versions, this feature is disabled by default and can be manually enabled if required.
3. tdstore_deadlock_victim (requires superuser privileges): Determines which transaction to roll back when a deadlock occurs [This parameter takes effect only when the tdstore_deadlock_detect deadlock detection feature is enabled]. The default value is "WRITE_LEAST", aligning with InnoDB behavior. Generally, no adjustment is needed. When set to "WRITE_LEAST", it prioritizes rolling back the transaction with fewer writes; when set to "START_LATEST", it prioritizes rolling back the transaction started later.

troubleshooting

Chapter 3 presents a comparative case analysis to intuitively demonstrate TDSQL Boundless' DDL/DML conflict handling mechanisms in both intra-node and cross-node scenarios. Errors encountered in actual business scenarios fall into two categories: DDL timeout failures or DML timeout failures. Follow the steps below for troubleshooting and resolution.
Note:
Refer to the definitions of dictionary tables in System Tables and System Views for the following content.
1. DDL Timeout Failure:
If DDL execution reports the error "Lock wait timeout exceeded", it indicates that the session executing DDL is blocked by DML or DDL on the same node; if DDL execution reports the error ERROR 8542 (HY000): Acquire object lock 'test.sbtest1' from MC wait timeout, sql-node: node-tdsql3-xxxxxxxx-xxx, it indicates that the session executing DDL is blocked by DDL on another node.
You can query performance_schema.metadata_locks to view the usage status of SQLEngine MDL locks on the current node [Note: In TDSQL Boundless, there's no need to set the performance_schema system variable to ON].
# Verify that session1 and session2 are connected to the same node.
# If not connected to the same node, the ddl can be executed successfully. Refer to "Cross-node ddl-DML Conflict".

show variables like'hostname';

#session1
BEGIN;
UPDATE sbtest1 SET k =0WHERE id =999;

#session2
ALTERTABLE sbtest1 ADDCOLUMN new_column VARCHAR(255);

#View metadata_locks to see:
#The first row with LOCK_STATUS=GRANTED corresponds to session1, indicating that the MDL lock has been acquired;
#The second row with LOCK_STATUS=PENDING corresponds to session2, indicating that obtaining the MDL lock is pending.
#The broadcast HINT needs to be added at the beginning to specify broadcasting queries on all nodes.

/*#broadcast*/select*from performance_schema.metadata_locks where OBJECT_NAME='sbtest1' \\G
***************************1.row***************************
OBJECT_TYPE: TABLE
OBJECT_SCHEMA: test
OBJECT_NAME: sbtest1
COLUMN_NAME: NULL
OBJECT_INSTANCE_BEGIN: 140384374661472
LOCK_TYPE: SHARED_WRITE
LOCK_DURATION: TRANSACTION
LOCK_STATUS: GRANTED
SOURCE: sql_parse.cc:6373
OWNER_THREAD_ID: 4879164
OWNER_EVENT_ID: 1
***************************2.row***************************
OBJECT_TYPE: TABLE
OBJECT_SCHEMA: test
OBJECT_NAME: sbtest1
COLUMN_NAME: NULL
OBJECT_INSTANCE_BEGIN: 140375267009376
LOCK_TYPE: SHARED
LOCK_DURATION: EXPLICIT
LOCK_STATUS: PENDING
SOURCE: ddl_executer.cc:245
OWNER_THREAD_ID: 4879122
OWNER_EVENT_ID: 1
2rowsinset(0.02 sec)

#When repeated queries are performed and show no change in the session with LOCK_STATUS: GRANTED, locate the SESSION ID corresponding to the thread holding the lock via OWNER_THREAD_ID. After confirming the session can be safely terminated, first end it with the KILL command, then reinitiate the DDL operation.
/*#broadcast*/select*from performance_schema.threads where THREAD_ID=4879164\\G
***************************1.row***************************
THREAD_ID: 4879164
NAME: thread/sql/one_connection
TYPE: FOREGROUND
PROCESSLIST_ID: 2346946
PROCESSLIST_USER: xxxxx
PROCESSLIST_HOST: xxx.xxx.xxx.xxx
PROCESSLIST_DB: test
PROCESSLIST_COMMAND: Sleep
PROCESSLIST_TIME: 1330
PROCESSLIST_STATE:
PROCESSLIST_INFO:
PARENT_THREAD_ID:
ROLE:
INSTRUMENTED: YES
HISTORY: YES
CONNECTION_TYPE: TCP/IP
THREAD_OS_ID: 45448
RESOURCE_GROUP:
SQLEngine_id: node-tdsql3-db38679b-002
1rowinset(0.02 sec)

#KILL the lock holder (session with LOCK_STATUS: GRANTED).
#session1 was killed.

#session2
ALTERTABLE sbtest1 ADDCOLUMN new_column VARCHAR(255);
Query OK,0rows affected (1.67 sec)
Records: 0 Duplicates: 0 Warnings: 0

2. DML Timeout Failure:
If DML execution reports the error "Lock wait timeout exceeded", it generally indicates a failure to acquire a row-level lock or encountering a deadlock. For legacy instances upgraded from older versions, it is first recommended to check the automatic deadlock detection feature. If it is disabled, enabling it (without requiring a restart) is advised. Subsequently, if deadlocks occur again, the system will automatically resolve them.
If lock timeout issues persist, use the following method to identify the session ID holding the lock (Lock Holder), terminate the blocking session via KILL, release resources occupied by row locks, and allow the waiting session (Lock Waiter) to proceed.
You can query performance_schema.data_locks and performance_schema.data_lock_waits to view session information about locks held and locks waited for in TDStore [Note: In TDStore, there's no need to set the performance_schema system variable to ON].
#Regardless of whether session1 and session2 are connected to the same node, the result remains the same; because row-level locks operate at the storage layer, a conflict between the two sessions indicates they both accessed the primary replica of the same storage node.

#session1
SELECT id FROM sbtest1 ORDERBY id limit10;
+----+
| id |
+----+
|1|
|3|
|5|
|7|
|9|
|11|
|12|
|13|
|14|
|15|
+----+
10rowsinset(9.23 sec)

BEGIN;
UPDATE sbtest1 SET k=50000WHERE id<=11AND id>=5;

#session2
INSERTINTO sbtest1(id)VALUES(8);

#Query the pessimistic lock information of the Lock Holder (Lock Holder) on the current node.
#In TDStore, the range lock is left-closed and right-open, as seen in the ENGINE_LOCK_ID column displaying the interval [5,12).

SELECT data_locks.*FROM performance_schema.data_locks, performance_schema.data_lock_waits WHERE blocking_engine_lock_id = engine_lock_id \\G
***************************1.row***************************
ENGINE: RocksDB
ENGINE_LOCK_ID: 29374591168151726_[00002C7B80000005,00002C7B8000000C)
ENGINE_TRANSACTION_ID: 29374591168151726
THREAD_ID: 1093359
EVENT_ID: NULL
OBJECT_SCHEMA:
OBJECT_NAME:
PARTITION_NAME: NULL
SUBPARTITION_NAME: NULL
INDEX_NAME: NULL
OBJECT_INSTANCE_BEGIN: 140400912025696
LOCK_TYPE: PRE_RANGE
LOCK_MODE: Write
LOCK_STATUS: GRANTED
LOCK_DATA: NULL
START_KEY: 00002C7B80000005
END_KEY: 00002C7B8000000C
EXCLUDE_START_KEY: 0
BLOCKING_TRANSACTION_NUM: 1
BLOCKING_CHECK_READ_TRANSACTION_NUM: 0
READ_LOCKED_NUM: 0
TINDEX_ID: 11387
DATA_SPACE_TYPE: DATA_SPACE_TYPE_USER
REPLICATION_GROUP_ID: 257
KEY_RANGE_REGION_ID: 1302791
1rowinset(0.04 sec)

#View the session information of the Lock Holder (Lock Holder).
# !!! In TDStore's PERFORMANCE_SCHEMA.DATA_LOCKS table, the thread_id column refers to processlist_id; whereas in official mysql, thread_id refers to the thread_id in Performance_Schema.threads. This issue will be fixed in future versions.

select*from information_schema.processlist where id=1093359\\G
***************************1.row***************************
ID: 1093359
USER: tdsql_admin
HOST: xxx.xxx.xxx.xxx:35956
DB: test
COMMAND: Sleep
TIME: 946
STATE: NULL
INFO: NULL
TIME_MS: 945727
ROWS_SENT: 0
ROWS_EXAMINED: 4
1rowinset(0.12 sec)


#After the lock holder (Lock Holder) is KILLed, the blocked session was executed successfully.
#session1 was killed.

#session2
INSERTINTO sbtest1(id)VALUES(8);
Query OK,1row affected (43.78 sec)

#Note: In high-concurrency scenarios, the queue of Lock Waiters may be lengthy, potentially resulting in new Lock Holders emerging. Multiple termination attempts may be required.
Was this page helpful?
You can also Contact Sales or Submit a Ticket for help.
Yes
No

Feedback