tencent cloud

TDSQL Boundless

Release Notes
Product Introduction
Overview
Scenarios
Product Architecture
Instance Types
Compatibility Notes
Usage specification recommendations
Kernel Features
Kernel Overview
Kernel Version Release Notes
Functionality Features
Performance Features
Billing
Billing Overview
Purchase Method
Pricing Details
Renewal
Overdue Payments
Refund
Getting Started
Creating an Instance
Connect to Instances
User Guide
Data Migration
Data Subscription
Instance Management
Parameter Configuration
Account Management
Security Group
Backup and Restoration
Database Auditing
Tag Management
Use Cases
Technical Evolution and Usage Practices of Online DDL
Lock Mechanism Analysis and Troubleshooting Practices
Data Intelligent Scheduling and Related Practices for Performance Optimization
TDSQL Boundless Selection Guide and Practical Tutorial
Developer Guide
Developer Guide (MySQL Compatibility Mode)
Developer Guide (HBase Compatibility Mode)
Performance Tuning
Performance Tuning Overview
SQL Tuning
DDL Tuning
Performance White Paper
Performance Overview
TPC-C Test
Sysbench Test
API Documentation
History
Introduction
API Category
Making API Requests
Instance APIs
Security Group APIs
Task APIs
Backup APIs
Rollback APIs
Parameter APIs
Database APIs
Data Types
Error Codes
General Reference
System Architecture
SQL Reference
Database Parameter Description
TPC-H benchmark data model reference
Error Code Information
Security and Compliance
FAQs
Agreements
Service Level Agreement
Terms of Service
Privacy Policy
Data Processing And Security Agreement
Contact Us
Glossary

Creating a Partitioned Table

PDF
聚焦模式
字号
最后更新时间: 2026-03-06 18:48:24

Overview

Partitioned tables (Partition Table) are the core means for TDSQL Boundless to handle massive data and achieve horizontal scaling. Unlike single tables, partitioned tables distribute data across multiple storage nodes in a cluster through partition keys (Partition Key). This enables the database to leverage the overall computing power and storage space of the cluster, breaking through the performance bottlenecks of a single machine.

When Should Partitioned Tables Be Used?

If your business meets any one of the following characteristics, be sure to choose partitioned tables:
1. Large data volumes or high-concurrency writes: requiring extremely high write throughput (TPS) beyond the capacity of a single machine.
2. Unlimited growth: The business is in a rapid development phase, unable to estimate the future data upper limit.

Select a Partitioned Strategy

TDSQL Boundless supports four main partitioning policies. Selecting the correct policy is the most critical step in table creation.
1. Partition using HASH
Introduction: Performs modulo operation on the partition key value to randomly and evenly scatter the data.
Applicable scenarios: The most recommended default choice. Suitable for scenarios where the partition key is an integer type (INT/BIGINT), such as user IDs or purely numeric order numbers.
Advantages: Excellent load balancing; high write performance.
2. Partition using KEY
Introduction: Similar to HASH partitioning but supports multiple data types except BLOB/TEXT. The system processes data using an internal hash function.
Applicable scenarios: Scenarios where the partition key is a string type (VARCHAR/CHAR), such as UUIDs, ID card numbers, or business serial numbers (String).
Advantages: Solves the issue where strings cannot be directly processed by HASH modulo operations, eliminating the need for conversion at the business layer (application layer).
3. RANGE partitioning
Introduction: Data is divided based on continuous ranges of key values (such as time periods and ID ranges).
Applicable scenarios: Time-series data (logs, transactions, monitoring).
Advantages: Fast range queries, and expired historical data can be instantly deleted via DROP PARTITION.
4. LIST partitioning
Introduction: Divides data based on enumerated values (discrete list).
Applicable scenarios: Data with clear categorical attributes, such as regions (provinces/cities) or tenant IDs.
Advantages: Clear data categorization facilitates isolated management by category.
Risk: Data skew. If the data volume of a specific category (such as "Guangdong Province") far exceeds that of other categories, the corresponding node will become a bottleneck.

Syntax and Example

Scenario A: Standard E-Commerce Orders (Using HASH)

The partition key is a purely numeric ID.
CREATE TABLE `mall`.`orders` (
`order_id` bigint NOT NULL,
`user_id` bigint NOT NULL,
`amount` decimal(10,2),
`create_time` datetime DEFAULT CURRENT_TIMESTAMP,
-- The primary key must include the partition key
PRIMARY KEY (`order_id`)
)
PARTITION BY HASH(`order_id`)
PARTITIONS 64;

Scenario B: User Center (Using KEY)

The partition key is a UUID string, and KEY partitioning automatically handles hashing.
CREATE TABLE `user_center`.`user_profiles` (
`user_uuid` varchar(36) NOT NULL,
`nick_name` varchar(50),
`register_time` datetime,
PRIMARY KEY (`user_uuid`)
)
PARTITION BY KEY(`user_uuid`)
PARTITIONS 64;

Scenario C: Operation Logs (Using RANGE)

Storing logs monthly facilitates regular cleanup of old data.
CREATE TABLE `sys`.`audit_logs` (
`log_id` bigint NOT NULL,
`content` text,
`log_time` datetime NOT NULL,
PRIMARY KEY (`log_id`, `log_time`)
)
PARTITION BY RANGE COLUMNS(`log_time`) (
PARTITION p202301 VALUES LESS THAN ('2023-02-01'),
PARTITION p202302 VALUES LESS THAN ('2023-03-01'),
PARTITION p_future VALUES LESS THAN (MAXVALUE)
);

Scenario D: By Geographic Distribution (Using LIST)

Businesses are explicitly partitioned by city, with data isolation between different cities.
CREATE TABLE `crm`.`customers` (
`cust_id` bigint NOT NULL,
`city_id` int NOT NULL,
`name` varchar(50),
PRIMARY KEY (`cust_id`, `city_id`)
)
PARTITION BY LIST(`city_id`) (
PARTITION p_bj VALUES IN (10),
PARTITION p_sh VALUES IN (20),
PARTITION p_others VALUES IN (30)
);

Critical Constraints

Partitioned tables have a mandatory hard constraint that must be adhered to: violating this constraint will result in table creation failure.
The primary key and unique index must include the partition key.
Error example:
-- Error: The partition key user_id is not included in the primary key
CREATE TABLE `error_table` (
`id` bigint NOT NULL PRIMARY KEY,
`user_id` bigint NOT NULL
) PARTITION BY HASH(`user_id`);
Correct example:
-- Correct: Add user_id to the composite primary key
CREATE TABLE `correct_table` (
`id` bigint NOT NULL,
`user_id` bigint NOT NULL,
PRIMARY KEY (`id`, `user_id`)
) PARTITION BY HASH(`user_id`);

Use Cases

1. the number of partitions
Policy: Pre-sharding should not only consider the current number of physical nodes, but also account for future cluster scale and expected total data volume.
Recommended value: It is recommended to set it to the maximum expected future number of nodes * 2, which ensures sufficient data dispersion.
2. Avoid modifying the partition key.
Try to avoid using UPDATE statements to modify the value of the partition key. This may cause data rows to migrate between physical nodes (Delete + Insert) at a high cost.
3. Query Optimization
Include the partition key: When querying, always include the partition key in the WHERE clause (for example, WHERE user_id = ?). This allows TDSQL Boundless to directly route the query to the specific node, delivering optimal performance.
Avoid full table scans: Queries without the partition key will trigger broadcast scans (Scatter-Gather), querying all nodes and aggregating the results, which consumes significant resources.

Frequently Asked Questions (FAQ)

What Is the Difference Between HASH and KEY Partitioning?

A: HASH primarily handles integers, with a simple and straightforward algorithm; KEY primarily handles strings (also supports integers), internally using a complex hash function. For string IDs, directly use KEY partitioning.

Q: Can I Use Multiple Columns as the Partition Key?

A: Yes, for example PARTITION BY HASH(col1 + col2) or KEY(col1, col2). However, this increases complexity, and it is generally recommended to use a single high-cardinality column as the partition key.

Q: Can the partition key or partitioning type be modified after table creation?

A: Strongly discouraged. Although syntactically supported, this essentially constitutes a "reshuffle" of the entire dataset, consuming massive system resources (IO/CPU) and potentially causing prolonged business disruption.

帮助和支持

本页内容是否解决了您的问题?

填写满意度调查问卷,共创更好文档体验。

文档反馈