Hello,
Does anybody have any concerns or recommendations about the `compact` command in version 4.4?
Note: my database exceeds 3.5 TB.
Thanks a lot
Hi @Abdelrahman_N_A,
If you were using a server release of MongoDB earlier than 4.4, I'd definitely have serious concerns about the blocking side effects of a `compact` operation in production. Removing the blocking behaviour was one of the improvements included in the MongoDB 4.4 release, so that is a positive change from previous releases.
However, although `compact` no longer blocks CRUD operations for the database containing the collection being compacted, there could still be a significant impact on your working set if you are compacting a large collection.
Considerations before you compact
Before running compaction, I would check whether it is likely to be useful based on:
- The *file bytes available for reuse* metric (via `db["collectionname"].stats().wiredTiger["block-manager"]` in the `mongo` shell)
- The likelihood that you won't be inserting much data into the collection in the near future
It is normal to have some reusable space for a collection with active updates. Excessive reusable space is typically the result of deleting a large amount of data, but can sometimes be related to your workload or the provenance of your data files.
The outcome of a `compact` operation is dependent on the storage contents, so I would draw your attention to the note on Disk Space in the `compact` documentation:
> On WiredTiger, compact attempts to reduce the required storage space for data and indexes in a collection, releasing unneeded disk space to the operating system. The effectiveness of this operation is workload dependent and no disk space may be recovered. This command is useful if you have removed a large amount of data from the collection, and do not plan to replace it.
Running `compact` in production
If this is a production environment, I would hope you have a replica set or sharded cluster deployment so you can minimise the operational impact.
If you have many large collections to compact (or want a more likely outcome of freeing up disk space), re-syncing a secondary member of a replica set via initial sync will rebuild all of the data files by copying over the data from another member. If `compact` doesn't end up freeing enough space, this would be the next procedure to run.
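At a high level, an initial sync of a secondary looks something like the following. This is only a sketch: the dbPath `/var/lib/mongodb` and the `mongod` service name are placeholders for your own environment, and you should follow the official resync tutorial rather than treat this as a recipe.

```shell
# Sketch: re-sync a secondary via initial sync (run ON that secondary).
# /var/lib/mongodb and the systemd unit name are placeholders.
sudo systemctl stop mongod       # stop the secondary
sudo rm -rf /var/lib/mongodb/*   # empty its dbPath; data will be rebuilt from another member
sudo systemctl start mongod      # on restart, the member performs an initial sync
```

Only do this one member at a time, and confirm the member returns to `SECONDARY` state before touching the next one.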
If you do decide to run `compact` in a production environment, I would minimise the operational impact by:
- Always having a replica set deployment (ideally a minimum of three data-bearing members, no arbiters)
- Running `compact` operations on one secondary at a time
- Configuring a secondary as `hidden` during the compact operation so the only competing traffic will be basic replication
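The hidden-member step might look something like this in the `mongo` shell. It is a sketch, not an exact recipe: the member index `1` and the collection name `"collectionname"` are placeholders for your own deployment.

```javascript
// Sketch: hide one secondary, then compact against it directly.
// Connected to the PRIMARY; member index 1 is a placeholder.
cfg = rs.conf()
cfg.members[1].hidden = true
cfg.members[1].priority = 0   // a hidden member must have priority 0
rs.reconfig(cfg)

// Then, connected directly to that secondary:
db.runCommand({ compact: "collectionname" })

// Afterwards, restore the member's original hidden/priority settings
// with another rs.reconfig() before moving on to the next secondary.
```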
Regards,
Stennie