To help you plan table structure and optimize performance, this document describes the core rules and recommendations for shard allocation in TcaplusDB.
Basic Concepts and Limits of Shards
TcaplusDB uses sharding to distribute data storage and access. The data of each table is divided into multiple shards, which are distributed across different storage nodes (tcapsvr). A record's shard is determined by hashing the table's splittablekey (the primary key by default if none is specified) and taking the result modulo 10,000; that is, hash(splittablekey) % 10,000 gives the ShardID. Each table can therefore be divided into at most 10,000 shards.
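The mapping rule above can be illustrated with a minimal sketch. The hash function TcaplusDB actually uses is not stated in this document, so `zlib.crc32` is only a stand-in, and the function name `shard_id` and the sample keys are hypothetical.

```python
import zlib

SHARD_SLOTS = 10_000  # modulus from the rule hash(splittablekey) % 10,000


def shard_id(splittable_key: str) -> int:
    """Map a splittable key to a ShardID in the range 0..9999.

    zlib.crc32 is an assumed stand-in; the real TcaplusDB hash
    function is not specified in this document.
    """
    digest = zlib.crc32(splittable_key.encode("utf-8"))
    return digest % SHARD_SLOTS


# The same key always maps to the same ShardID.
for key in ("player:1001", "player:1002", "guild:42"):
    print(key, "->", shard_id(key))
```

Because the ShardID depends only on the splittablekey, all records that share a splittablekey value are placed in the same shard.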
Shard Capacity Planning Suggestions
Reasonable shard capacity planning is key to ensuring performance and stability.
Physical upper limit: Each shard supports a maximum data volume of 256 GB. Exceeding the limit causes write failures.
Operation suggestions: To avoid the performance issues and Ops complexity caused by an oversized shard, keep the data volume of each shard within 30 GB during daily operation. Smaller shards typically have better access performance. You can also use current latency to decide whether a table needs more shards (as a general guideline, the average access latency for a 1 KB request should be within 10 ms).
Memory pre-allocation: In the production environment, each shard is allocated 1 GB of memory by default for caching hot data. When a shard's data volume exceeds this amount, the excess is stored on disk.
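The three thresholds above (256 GB hard limit, 30 GB recommended size, 1 GB of pre-allocated memory) can be combined into a simple check. This is a rough sketch: the function name `assess_shard` and the status labels are hypothetical, and the in-memory share is only an approximation that assumes the full 1 GB holds that shard's data.

```python
HARD_LIMIT_GB = 256   # physical upper limit per shard
RECOMMENDED_GB = 30   # recommended size for daily operation
CACHE_GB = 1          # default memory pre-allocated per shard


def assess_shard(data_gb: float) -> tuple[str, float]:
    """Return (status, approximate share of the shard's data held in memory)."""
    if data_gb >= HARD_LIMIT_GB:
        status = "at hard limit"
    elif data_gb > RECOMMENDED_GB:
        status = "above recommended size"
    else:
        status = "ok"
    # Data beyond the pre-allocated memory is stored on disk.
    cached = 1.0 if data_gb <= CACHE_GB else CACHE_GB / data_gb
    return status, cached


for size_gb in (0.5, 30, 120, 256):
    status, cached = assess_shard(size_gb)
    print(f"{size_gb:>6} GB  {status:<24} in-memory share: {cached:.1%}")
```

Under these assumptions, a shard at the recommended 30 GB keeps roughly 3% of its data in memory, which is one reason smaller shards tend to perform better for hot data.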
Principles for Planning the Number of Shards
Because each group has a limited number of shards, calculate the total number of shards for a table from its total data volume, access volume, and cost to arrive at a reasonable value.
Minimum number of shards for a single table: one. Each table has at least one shard, and one shard is generally assigned by default when a table is added.
Calculation formula reference: You can estimate the required number of shards by dividing the expected total data volume by the recommended capacity for a single shard (such as 30 GB) and rounding up. For example, a table with an expected total data volume of 3 TB theoretically requires (3 x 1024) / 30 = 102.4, rounded up to 103 shards.
Access performance consideration: A single shard can support tens of thousands of QPS (the specific value depends on factors such as record size and request type). For tables with large expected access volumes, it is recommended to set more shards to improve overall throughput.
Memory correlation: The number of shards is also closely related to memory planning. A common baseline for server configuration is to allocate approximately 112 shards for a storage node machine with 128 GB of physical memory. This ensures that each shard has about 1 GB of pre-allocated memory space.
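The planning rules in this section can be sketched as three small helpers. The per-shard throughput of 10,000 QPS is an assumed conservative value at the low end of "tens of thousands", the 2,000,000 peak QPS is made-up sample input, and all function names are hypothetical. Measure your own workload before relying on these numbers.

```python
import math


def shards_for_data(total_data_gb: float, per_shard_gb: float = 30) -> int:
    """Shards needed so that each shard stays within the recommended size."""
    return max(1, math.ceil(total_data_gb / per_shard_gb))


def shards_for_qps(peak_qps: float, qps_per_shard: float = 10_000) -> int:
    """Shards needed for throughput; qps_per_shard is an assumed figure."""
    return max(1, math.ceil(peak_qps / qps_per_shard))


def storage_nodes(shards: int, shards_per_node: int = 112) -> int:
    """128 GB machines needed at the baseline of about 112 shards per machine."""
    return math.ceil(shards / shards_per_node)


by_data = shards_for_data(3 * 1024)   # 3 TB table -> 103
by_qps = shards_for_qps(2_000_000)    # assumed peak load -> 200
shards = max(by_data, by_qps)         # satisfy both constraints -> 200
print(by_data, by_qps, shards, storage_nodes(shards))  # 103 200 200 2
```

Taking the larger of the two estimates reflects the idea that data volume sets the floor, while a high access volume can push the shard count above it.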
Summary and Suggestions
| Item | Value | Description |
| --- | --- | --- |
| Maximum number of shards for a single table | 10,000 | Upper limit of table shards. |
| Upper limit of capacity for a single shard | 256 GB | Physical hard limit. |
| Recommended capacity for the operation of a single shard | ≤ 30 GB | To ensure performance and ease of Ops, this standard is recommended; however, for tables with small access volume, exceeding 30 GB is generally acceptable. |
| Memory pre-allocation for a single shard | 1 GB | The excess part is stored on the disk. |
| Determining factors of the number of shards | Total data volume & access volume (QPS) | Data volume determines the foundation, while access volume determines the distribution. For tables with high QPS, it is recommended to use more shards with smaller individual shard sizes. |
| Processing capability for a single shard | About tens of thousands of QPS | Specific performance varies by record size and request type. |
| Memory configuration reference | 128 GB memory machine ≈ 112 shards | Ensure that each shard has about 1 GB of memory. |
Note:
The values provided above are for general reference, and they may vary in actual deployment based on specific configurations and business scenarios.
In summary, shard allocation requires a balance between data volume, performance, and cost. It is generally recommended to first determine the total number of shards from the expected data volume per shard (such as 30 GB), and then assess whether that number can meet your access performance (QPS) requirements. Also ensure that the total number of shards does not exceed what you have purchased. Within that total, allocate more shards to the core, high-frequency access tables of your business.
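The workflow in the summary can be sketched as a budget check across several tables. This is one possible heuristic, not a TcaplusDB feature: each table first gets a baseline from its data volume, and any spare purchased shards are then handed out in proportion to peak QPS. The function name `allocate_shards` and the sample tables are hypothetical.

```python
import math

MAX_SHARDS_PER_TABLE = 10_000  # upper limit of shards for a single table


def allocate_shards(tables: dict[str, tuple[int, int]], purchased: int,
                    per_shard_gb: int = 30) -> dict[str, int]:
    """tables maps name -> (data_gb, peak_qps); returns name -> shard count."""
    # Step 1: baseline from data volume, at least one shard per table.
    plan = {name: max(1, math.ceil(data_gb / per_shard_gb))
            for name, (data_gb, _) in tables.items()}
    spare = purchased - sum(plan.values())
    if spare < 0:
        raise ValueError(f"baseline needs {-spare} more shards than purchased")
    # Step 2: give spare shards to tables in proportion to their peak QPS.
    total_qps = sum(qps for _, qps in tables.values())
    if total_qps > 0:
        for name, (_, qps) in tables.items():
            extra = spare * qps // total_qps  # floor keeps the sum within budget
            plan[name] = min(MAX_SHARDS_PER_TABLE, plan[name] + extra)
    return plan


tables = {"player": (3072, 900_000), "mail": (300, 90_000), "log": (600, 10_000)}
plan = allocate_shards(tables, purchased=200)
print(plan)  # {'player': 163, 'mail': 16, 'log': 20}
```

Here the high-frequency `player` table receives most of the spare shards, while the low-traffic `log` table stays at its data-volume baseline.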