tencent cloud

Shard Removal Task: Guide for Confirming the Progress and Troubleshooting Issues
Last updated: 2026-02-25 11:24:22
This document describes how to check the progress of a shard removal task, identify why the task is blocked, and resolve the issue.

Step 1: Confirming the Balancer Status

Data migration is entirely dependent on the balancer. If the balancer is not enabled, the data on a shard to be removed cannot be migrated out, resulting in the removal task being blocked.

Check Methods

Method 1: On the Parameter Settings page in the console, view the current running value of the openBalance.window parameter. A value of false means the balancer is disabled; true means it is enabled.
Method 2: Execute sh.getBalancerState(). A return value of false means the balancer is disabled; true means it is enabled.
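The check in Method 2 can be wrapped in a small helper for use in a script. This is only a sketch: describeBalancerState is a hypothetical name, and the guarded call at the end runs only where the shell's sh object is available.

```javascript
// Hypothetical helper: turn the boolean returned by sh.getBalancerState()
// into a message about the shard removal task.
function describeBalancerState(enabled) {
  if (enabled === true) {
    return "Balancer is enabled: chunks can be migrated off the shard.";
  }
  if (enabled === false) {
    return "Balancer is disabled: the shard removal task cannot progress.";
  }
  return "Balancer state is unknown.";
}

// Runs only inside the mongo shell, where `sh` and `print` are defined.
if (typeof sh !== "undefined" && typeof print === "function") {
  print(describeBalancerState(sh.getBalancerState()));
}
```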

Solution

On the Parameter Settings page in the console, click Modify Running Value and set the openBalance.window parameter to true to enable the balancer. For detailed operations, see Database Parameter Adjustment.

Step 2: Checking the Balancing Window

The balancer migrates data only during the configured balancing window. If the window is too short, migration progress slows down significantly.

Check Methods

Method 1: On the Parameter Settings page in the console, view the current running value of the balance.window parameter to obtain the current balancing window.
Method 2: Execute sh.getBalancerWindow(), which returns the start and end times of the window period, for example, { "start" : "00:30", "stop" : "02:30" }.
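To judge whether the window is too short, it can help to convert the start and stop times into a duration. The following is a minimal sketch; toMinutes and windowMinutes are hypothetical helper names, and the code assumes the window has the { start, stop } shape shown above with 24-hour "HH:MM" strings.

```javascript
// Convert an "HH:MM" string to minutes since midnight.
function toMinutes(hhmm) {
  const parts = hhmm.split(":").map(Number);
  return parts[0] * 60 + parts[1];
}

// Length of a balancing window in minutes. A stop time earlier than the
// start time is treated as a window that crosses midnight. Identical start
// and stop times yield 0 here, which is an assumption of this sketch.
function windowMinutes(win) {
  const start = toMinutes(win.start);
  const stop = toMinutes(win.stop);
  return (stop - start + 1440) % 1440;
}

// Shell usage (assumes a window is configured):
//   windowMinutes(sh.getBalancerWindow())
```

For the example window above, the helper returns 120, that is, the balancer can migrate chunks for only 2 of every 24 hours.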

Solution

On the Parameter Settings page in the console, click Modify Running Value and extend the balancing window defined by the balance.window parameter. For detailed operations, see Database Parameter Adjustment.

Step 3: Obtaining the Task Progress Information

Check how many chunks remain on the shard to be removed and how quickly they are being migrated.
1. Check the number of chunks that still need to be migrated off the shard to be removed.
Note:
The shard name format is cmgo-xxxxxxxx_n, where xxxxxxxx is the 8-character unique instance identifier and n is the shard sequence number starting from 0. For example, a 5-shard instance has shard names from cmgo-xxxxxxxx_0 to cmgo-xxxxxxxx_4. To remove the last shard, specify cmgo-xxxxxxxx_4.
It is recommended to run these queries on a secondary node to avoid affecting the performance of the primary node.
mongos> db.getSiblingDB("config").chunks.aggregate([{$match: {shard: "cmgo-xxxxxxxx_3"}},{$group: {_id: "$ns", count: {$sum: 1}}}])
{ "_id" : "config.system.sessions", "count" : 1960 }
If the shard holds a large number of chunks, it is recommended to query the collections of interest separately.
db.getSiblingDB("config").chunks.aggregate([
  { $match: {
      shard: "cmgo-xxxxxxxx_3",
      ns: { $in: [ "config.system.sessions" ] }
    }
  },
  { $group: {
      _id: "$ns",
      count: { $sum: 1 }
    }
  }
])
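The two queries above can be generated from one helper so that the shard name and collection list are not hard-coded. This is a sketch only: chunkCountPipeline and chunkCountPipelineByUuid are hypothetical names, and the added $sort stage is a convenience that is not part of the original queries. The second function rests on an assumption that should be verified on your instance: on MongoDB 5.0 and later, documents in config.chunks are understood to reference their collection through a uuid field rather than ns, so the namespace is looked up in config.collections.

```javascript
// Pipeline that counts chunks per namespace on one shard.
// `namespaces` is optional; when given, only those collections are counted.
function chunkCountPipeline(shard, namespaces) {
  const match = { shard: shard };
  if (namespaces && namespaces.length > 0) {
    match.ns = { $in: namespaces };
  }
  return [
    { $match: match },
    { $group: { _id: "$ns", count: { $sum: 1 } } },
    { $sort: { count: -1 } } // largest collections first
  ];
}

// Assumed variant for versions where config.chunks has `uuid` instead of `ns`:
// group by uuid, then resolve the namespace from config.collections.
function chunkCountPipelineByUuid(shard) {
  return [
    { $match: { shard: shard } },
    { $group: { _id: "$uuid", count: { $sum: 1 } } },
    { $lookup: { from: "collections", localField: "_id", foreignField: "uuid", as: "coll" } },
    { $project: { _id: 0, ns: { $arrayElemAt: ["$coll._id", 0] }, count: 1 } },
    { $sort: { count: -1 } }
  ];
}

// Shell usage:
//   db.getSiblingDB("config").chunks.aggregate(chunkCountPipeline("cmgo-xxxxxxxx_3"))
```

If the first pipeline returns a single group whose _id is null, that suggests the ns field is absent and the uuid-based variant is the one to try.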
2. Check the migration rate by querying the number of chunks migrated off the shard in the past 24 hours.
// details.from is the source shard, that is, the shard to be removed, whose chunks are migrated out.
// Use the time field with ISODate values to specify the time range.
mongos> db.getSiblingDB("config").changelog.find({"what": "moveChunk.commit","details.from": "cmgo-xxxxxxxx_3","time": {$gte: ISODate("2025-11-14T00:00:00Z"),$lt: ISODate("2025-11-15T00:00:00Z")}}).count()
256
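The query above uses a fixed date range. To always look at the trailing 24 hours, the filter can be built relative to the current time. The sketch below assumes the same changelog fields as the query above; migratedSinceFilter is a hypothetical helper name.

```javascript
// Build a config.changelog filter for chunk migrations committed from
// `shard` during the `hours` before `now` (defaults: current time, 24 hours).
function migratedSinceFilter(shard, now, hours) {
  const end = now || new Date();
  const start = new Date(end.getTime() - (hours || 24) * 60 * 60 * 1000);
  return {
    what: "moveChunk.commit",
    "details.from": shard,
    time: { $gte: start, $lt: end }
  };
}

// Shell usage:
//   db.getSiblingDB("config").changelog.find(migratedSinceFilter("cmgo-xxxxxxxx_3")).count()
```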

Step 4: Estimating the Task Completion Time

Under ideal conditions (the service load is stable and the total number of chunks is unchanged), you can estimate the remaining time from the two values obtained in Step 3:
Estimated completion time (days) ≈ Number of chunks to be migrated / Number of chunks migrated in the past 24 hours
For example, if 1960 chunks remain to be migrated and 256 chunks were migrated in the past 24 hours, the estimated completion time is 1960/256 ≈ 7.7, that is, about 8 days.
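The estimate can be expressed as a small function. This is a sketch under the same ideal-conditions assumption; estimateDaysRemaining is a hypothetical name, and returning Infinity for a zero migration rate is a choice made here to flag a task that is not progressing.

```javascript
// Estimate the days remaining for a shard removal task.
// chunksRemaining: chunks still on the shard to be removed.
// chunksPerDay: chunks migrated off that shard in the past 24 hours.
function estimateDaysRemaining(chunksRemaining, chunksPerDay) {
  if (chunksRemaining === 0) {
    return 0; // nothing left to migrate
  }
  if (chunksPerDay <= 0) {
    return Infinity; // no progress: the task may be blocked
  }
  return chunksRemaining / chunksPerDay;
}

// With the figures from the example: 1960 / 256 = 7.65625, rounded up to 8.
const days = Math.ceil(estimateDaysRemaining(1960, 256));
```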

Step 5: Diagnosing Task Blocking

If no chunks have been successfully migrated for an extended period and some chunks remain on the shard to be removed, the task has likely been blocked.

Most Common Cause: Jumbo Chunks

A poorly designed shard key (for example, one with hotspot key values) may cause certain chunks to exceed the maximum chunk size. Such chunks are marked as jumbo and cannot be migrated, which blocks the entire removal task.

Run the Following Command to Query Jumbo Chunks on the Shard to Be Removed

db.getSiblingDB("config").chunks.aggregate([{$match: {shard: "cmgo-xxxxxxxx_n", jumbo:true}},{$group: {_id: "$ns", count: {$sum: 1}}}])

Solutions

1. Optimize the shard key.
TencentDB for MongoDB 4.4: Use the refineCollectionShardKey command to add a suffix to the original shard key and increase its cardinality.
TencentDB for MongoDB 5.0+: Use the reshardCollection command to set a new shard key for the collection.
2. Delete data. If the business logic permits, you can delete some data within the jumbo chunks to reduce their size.
3. Modify parameters. You can increase the value of the chunkSize parameter, which raises the size threshold above which a chunk is treated as a jumbo chunk.
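For solution 1, the two commands take a namespace and a key document. The sketch below only builds the command documents; the helper names, the namespace app.orders, and the fields customerId and orderId are hypothetical. For refineCollectionShardKey, the new key must keep the existing key as its prefix, and an index supporting the new key must already exist.

```javascript
// Command document for refineCollectionShardKey (MongoDB 4.4 and later).
// newKey must start with the fields of the current shard key.
function refineShardKeyCommand(ns, newKey) {
  return { refineCollectionShardKey: ns, key: newKey };
}

// Command document for reshardCollection (MongoDB 5.0 and later).
// newKey may be an entirely different shard key.
function reshardCommand(ns, newKey) {
  return { reshardCollection: ns, key: newKey };
}

// Shell usage with hypothetical names:
//   db.adminCommand(refineShardKeyCommand("app.orders", { customerId: 1, orderId: 1 }))
//   db.adminCommand(reshardCommand("app.orders", { orderId: "hashed" }))
```

Resharding rewrites the collection's data and adds load of its own, so it is worth evaluating on a test instance before running it in production.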
Note:
Removing a shard triggers data migration, which may increase instance load for an extended period. If the goal is to reduce costs, it is recommended to first evaluate downgrading the mongod node specifications directly. For specific operations, see Adjusting Mongod Node Specification.