This document explains how to troubleshoot a blocked shard removal task and provides a solution for each cause.
Step 1: Confirming the Balancer Status
Data migration is entirely dependent on the balancer. If the balancer is not enabled, the data on a shard to be removed cannot be migrated out, resulting in the removal task being blocked.
Check Methods
Method 1: On the Parameter Settings page in the console, view the current running value of the openBalance.window parameter. If it is false, the balancer is not enabled. If it is true, the balancer is enabled.
Method 2: Execute sh.getBalancerState(). If it returns false, the balancer is not enabled. If it returns true, the balancer is enabled.
Solution
On the Parameter Settings page in the console, click Modify Running Value and set the openBalance.window parameter to true to enable the balancer. For detailed operations, see Database Parameter Adjustment.
Step 2: Checking the Balancing Window
The balancer migrates data only during the configured balancing window. If the window is too short, migration slows down significantly.
Check Methods
Method 1: On the Parameter Settings page in the console, view the current running value of the balance.window parameter to obtain the current balancing window.
Method 2: Execute sh.getBalancerWindow(), which returns the start and end times of the window period, for example, { "start" : "00:30", "stop" : "02:30" }.
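To judge whether the window is long enough, it can help to convert the returned start and stop times into a duration. The following is a minimal sketch, assuming the { start, stop } shape shown above with times in HH:MM format. The helper names toMinutes and windowMinutes are hypothetical and not part of the mongo shell.

```javascript
// Hypothetical helper: converts "HH:MM" into minutes since midnight.
function toMinutes(hhmm) {
  var parts = hhmm.split(":");
  return Number(parts[0]) * 60 + Number(parts[1]);
}

// Hypothetical helper: length of a balancing window in minutes.
// Assumption: a stop time that is not later than the start time means
// the window crosses midnight.
function windowMinutes(win) {
  var start = toMinutes(win.start);
  var stop = toMinutes(win.stop);
  return stop > start ? stop - start : stop - start + 24 * 60;
}

// Shell usage: windowMinutes(sh.getBalancerWindow())
windowMinutes({ start: "00:30", stop: "02:30" }); // 120 minutes
```

A window of 120 minutes means the balancer can migrate chunks for only 2 of every 24 hours.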
Solution
On the Parameter Settings page in the console, click Modify Running Value and lengthen the window period of the balance.window parameter as needed. For detailed operations, see Database Parameter Adjustment.
Step 3: Obtaining the Task Progress Information
Check how many chunks remain on the shard to be removed and how quickly they are being migrated.
1. Check the number of chunks that need to be migrated on the shard to be removed.
Note:
The shard name format is cmgo-xxxxxxxx_n. xxxxxxxx is the unique instance identifier (8 characters) and n is the shard sequence number starting from 0. For example, a 5-shard instance would have shard names ranging from cmgo-xxxxxxxx_0 to cmgo-xxxxxxxx_4. If you need to remove the last shard, you should specify cmgo-xxxxxxxx_4.
It is recommended to perform query operations on a secondary node to avoid causing performance impacts on the primary node.
mongos> db.getSiblingDB("config").chunks.aggregate([{$match: {shard: "cmgo-xxxxxxxx_3"}},{$group: {_id: "$ns", count: {$sum: 1}}}])
{ "_id" : "config.system.sessions", "count" : 1960 }
If the number of chunks is large, it is recommended to query by collection separately.
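db.getSiblingDB("config").chunks.aggregate([
  // Limit the count to one shard and to the listed collections.
  { $match: {
      shard: "cmgo-xxxxxxxx_3",
      ns: { $in: [ "config.system.sessions" ] }
    }
  },
  // Count the matching chunks per collection.
  { $group: {
      _id: "$ns",
      count: { $sum: 1 }
    }
  }
])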
2. Check the migration rate. Query the number of chunks migrated in the past 24 hours.
mongos> db.getSiblingDB("config").changelog.find({"what": "moveChunk.commit","details.from": "cmgo-xxxxxxxx_3","time": {$gte: ISODate("2025-11-14T00:00:00Z"),$lt: ISODate("2025-11-15T00:00:00Z")}}).count()
256
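The query above hard-codes the time range. As a sketch, the same filter could be built for the 24 hours before any given moment. The helper name last24hFilter is hypothetical, and the field names are the ones used in the query above.

```javascript
// Hypothetical helper: builds the changelog filter for chunks that were
// moved off `shard` during the 24 hours before `now`.
function last24hFilter(shard, now) {
  var end = now || new Date();
  var start = new Date(end.getTime() - 24 * 60 * 60 * 1000);
  return {
    what: "moveChunk.commit",
    "details.from": shard,
    time: { $gte: start, $lt: end }
  };
}

// Shell usage:
// db.getSiblingDB("config").changelog.find(last24hFilter("cmgo-xxxxxxxx_3")).count()
```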
Step 4: Estimating the Task Completion Time
Under ideal conditions (the service load is stable and the total number of chunks is unchanged), you can estimate the task completion time based on the above information:
Estimated completion time (days) ≈ Number of chunks to be migrated / Number of chunks migrated in the past 24 hours
For example, if the number of chunks to be migrated is 1960 and the number of chunks migrated in the past 24 hours is 256, the estimated completion time is 1960/256 ≈ 7.7, that is, about 8 days.
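The estimate can be expressed as a small function. This is a minimal sketch under the same ideal-condition assumptions. The name estimateDays is hypothetical, and returning Infinity when nothing was migrated is an illustrative choice to signal that no finite estimate exists.

```javascript
// Hypothetical helper: estimated days left for the removal task.
function estimateDays(remainingChunks, migratedLast24h) {
  if (remainingChunks <= 0) return 0;            // nothing left to migrate
  if (migratedLast24h <= 0) return Infinity;     // no progress, no finite estimate
  return remainingChunks / migratedLast24h;
}

var days = estimateDays(1960, 256);   // 7.65625
var roundedUp = Math.ceil(days);      // 8
```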
Step 5: Diagnosing Task Blocking
If no chunks have been successfully migrated for an extended period and some chunks remain on the shard to be removed, the task has likely been blocked.
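The condition above can be sketched as a check on the remaining chunk count and the time of the most recent successful migration. The helper name isLikelyBlocked and its thresholdHours parameter are hypothetical. This document does not define how long "an extended period" is, so the threshold is left to the caller.

```javascript
// Hypothetical helper: flags a removal task as likely blocked.
// thresholdHours is an assumed cut-off for "an extended period".
function isLikelyBlocked(remainingChunks, lastCommitTime, now, thresholdHours) {
  if (remainingChunks <= 0) return false; // nothing left to migrate
  if (!lastCommitTime) return true;       // chunks remain, no commit found in the changelog
  var idleHours = (now.getTime() - lastCommitTime.getTime()) / (60 * 60 * 1000);
  return idleHours >= thresholdHours;
}

// Shell usage (an assumption built on the changelog fields used in Step 3):
// var last = db.getSiblingDB("config").changelog
//   .find({ what: "moveChunk.commit", "details.from": "cmgo-xxxxxxxx_3" })
//   .sort({ time: -1 }).limit(1).toArray()[0];
// isLikelyBlocked(1960, last && last.time, new Date(), 24);
```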
Most Common Reason: Jumbo Chunks
A poorly designed shard key (for example, one with hotspot keys) can cause certain chunks to exceed the maximum chunk size. Such chunks cannot be migrated, which blocks the entire removal task.
Run the following command to query jumbo chunks on the shard to be removed:
db.getSiblingDB("config").chunks.aggregate([{$match: {shard: "cmgo-xxxxxxxx_n", jumbo:true}},{$group: {_id: "$ns", count: {$sum: 1}}}])
Solutions
1. Optimize the shard key.
TencentDB for MongoDB 4.4: Use the refineCollectionShardKey command to add a suffix to the original shard key and increase its cardinality.
TencentDB for MongoDB 5.0+: Use the reshardCollection command to set a new shard key for the collection.
2. Delete data. If the business logic permits, you can delete some data within the jumbo chunks to reduce their size.
3. Modify parameters. Increase the value of the chunkSize parameter to raise the size threshold above which a chunk is identified as jumbo.
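For solution 1, the two commands take a namespace and a key document. The following sketch only builds the command documents. The namespace app.orders and the fields customerId and orderId are hypothetical, as are the helper names. Note that refineCollectionShardKey expects an index that supports the refined key to exist before the command is run.

```javascript
// Hypothetical namespace, for illustration only.
var NS = "app.orders";

// MongoDB 4.4: keep the existing key as a prefix and append a suffix field.
function refineCommand(ns, currentKey, suffixField) {
  var key = Object.assign({}, currentKey);
  key[suffixField] = 1;
  return { refineCollectionShardKey: ns, key: key };
}

// MongoDB 5.0+: replace the shard key with a new one.
function reshardCommand(ns, newKey) {
  return { reshardCollection: ns, key: newKey };
}

// Shell usage:
// db.adminCommand(refineCommand(NS, { customerId: 1 }, "orderId"))
// db.adminCommand(reshardCommand(NS, { customerId: 1, orderId: 1 }))
```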
Removing shards triggers data migration, which may affect instance load for a long time. If the goal is cost reduction, first evaluate whether directly downgrading the mongod node specifications would achieve it. For specific operations, see Adjusting Mongod Node Specification.