A sharded cluster is distributed MongoDB database architecture. Compared with replica sets, sharded clusters evenly distribute data across shards, which not only greatly increases the capacity, but also distributes the read/write workload across shards, effectively solving the performance bottleneck of replica sets. The trade off is increased complexity in architecture. This document lists some issues you should take note of when using TencentDB for MongoDB sharded clusters.
Sharded Cluster Components
A MongoDB sharded cluster consists of the following components:
shard: each shard contains a subset of the sharded data, and can be deployed as a replica set.
mongos: the mongos acts as a query router, providing an interface between client applications and the sharded cluster.
config servers: config servers store metadata and configuration settings for the cluster, including permission and authentication configurations.
Sharding Strategies and Performance Impact
A MongoDB sharded cluster supports three Sharding (data distribution) methods: range-based, hash-based, and zone/tag-based. Different Sharding methods suit different business scenarios and have varying impacts on performance.
Ranged sharding
Advantages: good performance in shard key range-based query and read
Disadvantages: possibly uneven data distribution with hot spots
Hashed sharding
Advantages: even data distribution, good write performance, and suitable for high concurrency use cases such as logging and Internet of Things
Disadvantages: low range-based query efficiency
Zone/tag-based sharding
Data which has a natural distinction, such as geographical or time distinction, can be distinguished by tags.
Advantages: good data distribution
Choosing Shard Key
A shard key is a field in a document that is used for routing queries.
Choosing an appropriate shard key significantly impacts sharding efficiency, primarily based on the following four factors:
Cardinality
It is recommended to maximize the cardinality of the shard key. If a shard key with low cardinality is used, the limited number of distinct values results in a limited total number of chunks. As data volume grows, the size of each chunk will increase, making chunk migration during horizontal scaling extremely difficult.
For example: if you choose age as a shard key with a cardinality of only up to 100 distinct values, the limited number of chunks will cause each chunk to grow excessively as data volume increases. This leads to chunk growth beyond the configured chunk size, resulting in jumbo chunks. Consequently, these chunks cannot be migrated, causing uneven data distribution and a performance bottleneck.
Value Distribution
It is recommended to maintain a uniform value distribution. A shard key with an uneven distribution can cause certain chunks to hold excessively large amounts of data. This leads to the same issues mentioned above: uneven data distribution and a performance bottleneck.
Query with Sharding
It is recommended to include the shard key in queries. When a conditional query uses the shard key, mongos can directly route the query to the specific shard. Otherwise, mongos must broadcast the query to all shards and then wait for responses.
Avoid Monotonically Increasing or Decreasing Values
A monotonically increasing sharding key results in minimal data file movement. However, write operations become concentrated, causing the data volume in the last shard to continuously increase and triggering frequent migrations. The same issues apply to monotonically decreasing keys.
In summary, when selecting a shard key, consider the four conditions mentioned above and strive to meet as many of them as possible. This approach minimizes the performance impact of MoveChunks operations, thereby achieving optimal performance.
Modifying a Shard Key
In versions prior to MongoDB 4.2, the value of a document's shard key field is immutable.
Starting from MongoDB version 4.2, you can update a document's shard key value unless the shard key field is the immutable _id field. To perform the update, use the following method to update the document's shard key value:
|
| |
| |
| If the shard key modification causes the document to be moved to another shard, you cannot specify multiple shard keys for batch modification; that is, the batch size must be 1. Otherwise, you can specify multiple shard keys for batch modification. |
Note the following when modifying a shard key:
You must perform it in a transaction or on mongos in a retryable write mode. Do not perform it directly on shards.
You must include an equality condition in the complete shard key of the query filter. For example, if you use {country:1, userid:1} as the shard key in a shard collection, to update the document shard key, you must include country:, userid: in the query filter. You can also include other fields in the query as needed.
Balancing and Related Parameters
MongoDB sharded cluster partitions data into chunks. The background process balancer monitors the number of chunks on each shard and migrates the chunks between shards to balance the load on each shard server.
Note:
The system creates an initial chunk, and the chunk size is 64 MB by default.
Because chunk migrations have an impact on cluster read/write performance, you can set the balancing window to avoid the impact during business peak, or run commands to disable balancing.
Commands to manage balancing are described as follows. If you do not have the permission to run the commands, submit a ticket for further assistance. Checking whether balancing is enabled for MongoDB sharded cluster
mongos> sh.getBalancerState()
true
You can also run sh.status() to check the balancing status.
Check whether data is being migrated.
The following tests are performed using MongoDB shell v4.2.23. The returned information may differ for other versions.
mongos> sh.isBalancerRunning()
false
Set the balancer window
Modify the balancer window time:
db.settings.update(
{ _id: "balancer" },
{ $set: { activeWindow : { start : "<start-time>", stop : "<stop-time>" } } },
{ upsert: true }
)
Delete the balancer window:
use config
db.settings.update({ _id : "balancer" }, { $unset : { activeWindow : true } })
Disable the balancer
By default, the balancer can run at any time, and migration only requires the chunks to be migrated. If you need to disable balancing, execute the following commands:
sh.stopBalancer()
sh.getBalancerState()
After the balancer is stopped, check whether any migration processes are running. You can execute the following commands:
use config
while( sh.isBalancerRunning() ) {
print("waiting...");
sleep(1000);
}
Enable the balancer
If you need to re-enable the balancer, execute the following commands:
sh.setBalancerState(true)
If the driver version does not support sh.startBalancer(), execute the following commands to re-enable the balancer:
use config
db.settings.update( { _id: "balancer" }, { $set : { stopped: false } } , { upsert: true } )
Collection balancer
Disable the balancer for a specific collection:
sh.disableBalancing("students.grades")
Enable the balancer for a specific collection:
sh.enableBalancing("students.grades")
Check whether the balancer is enabled for a specific collection:
db.getSiblingDB("config").collections.findOne({_id : "students.grades"}).noBalance