Hello,
We've been using MongoDB 6.0.4 with ranged sharding.
We noticed that the storage attached to those shards is imbalanced, which has become a concern for storage planning.
In this case we have a 3-shard cluster that looks balanced in terms of shard data size, but the actual size of the data on disk is very different, possibly because of compression/deduplication. When our system sees that a disk is about to be fully used, it automatically starts a new shard with a new replica set, disks and everything, even though the other shards may only be at 50% of their capacity.
We tried setting the chunkSize to 128 MB on a different system but didn't see any effect; the chunks seem to grow well beyond that (right now past 1 GB), so I'm not sure how to use it to solve this issue. I also saw this thread about it: Chunk size many times bigger than configure chunksize (128 MB)
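For reference, this is roughly what we tried on that other system (a sketch in mongosh against a mongos; the zios.collection_name namespace is just the one from the distribution output below). As far as I know, the cluster-wide value lives in config.settings, and 6.0 also allows a per-collection value via configureCollectionBalancing:

// Cluster-wide default chunk size, in MB, stored in the config database:
db.getSiblingDB("config").settings.updateOne(
  { _id: "chunksize" },
  { $set: { value: 128 } },
  { upsert: true }
)

// Per-collection chunk size (available since 6.0), value in MiB:
db.adminCommand({
  configureCollectionBalancing: "zios.collection_name",
  chunkSize: 128
})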
Here is the first system, which has 3 shards (no chunkSize restriction); each volume is 512 GB:
shard1:
- node1: volume11: 317GB
- node2: volume12: 326GB
- node3: volume13: 325GB
shard2:
- node1: volume21: 258GB
- node2: volume22: 258GB
- node3: volume23: 258GB
shard3:
- node1: volume31: 186GB
- node2: volume32: 194GB
- node3: volume33: 194GB
[direct: mongos] zios> db.collection_name.getShardDistribution()
Shard shard3 at shard3/***IPs***
{
data: '428.79GiB',
docs: 522351318,
chunks: 4752,
'estimated data per chunk': '92.4MiB',
'estimated docs per chunk': 109922
}
Shard shard1 at shard1/***IPs***
{
data: '429.1GiB',
docs: 300330555,
chunks: 975,
'estimated data per chunk': '450.67MiB',
'estimated docs per chunk': 308031
}
Shard shard2 at shard2/***IPs***
{
data: '428.68GiB',
docs: 290760604,
chunks: 2720,
'estimated data per chunk': '161.38MiB',
'estimated docs per chunk': 106897
}
Totals
{
data: '1286.58GiB',
docs: 1113442477,
chunks: 8447,
'Shard shard3': [
'33.32 % data',
'46.91 % docs in cluster',
'881B avg obj size on shard'
],
'Shard shard1': [
'33.35 % data',
'26.97 % docs in cluster',
'1KiB avg obj size on shard'
],
'Shard shard2': [
'33.31 % data',
'26.11 % docs in cluster',
'1KiB avg obj size on shard'
]
}
As you can see, the data distribution between the shards looks fine, but the resulting disk usage is not balanced at all.
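In case it helps with the diagnosis, this is roughly how we compare the logical data size against the on-disk (compressed) size per shard; a minimal sketch in mongosh, assuming the $collStats aggregation stage is available through the mongos:

// storageStats.size        = uncompressed data size in bytes
// storageStats.storageSize = compressed size on disk in bytes
// Returns one document per shard for a sharded collection.
db.collection_name.aggregate([
  { $collStats: { storageStats: {} } },
  { $project: {
      shard: 1,
      dataSizeGB:    { $divide: ["$storageStats.size", 1024 * 1024 * 1024] },
      storageSizeGB: { $divide: ["$storageStats.storageSize", 1024 * 1024 * 1024] },
      indexSizeGB:   { $divide: ["$storageStats.totalIndexSize", 1024 * 1024 * 1024] }
  } }
])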
Are there any suggestions we could follow to balance the disk capacity usage better?
Thanks for your support.