Compact command concerns

Hi @Abdelrahman_N_A,

If you were using a server release of MongoDB earlier than 4.4, I’d definitely have serious concerns about blocking side effects of a compact operation in production. Removing the blocking behaviour was one of the improvements included in the MongoDB 4.4 release, so that is a positive change from previous releases.

However, although compact will no longer block CRUD operations for the database containing the collection being compacted, there could still be a significant impact on your working set if you are compacting a large collection.

Considerations before you compact

Before running compact, I would check whether it is likely to be useful based on:

  • The file bytes available for reuse metric (via db["collectionname"].stats().wiredTiger["block-manager"] in the mongo shell)
  • Whether you expect to insert a comparable amount of data into the collection in the near future (if so, the reusable space will be filled again without compacting)

It is normal to have some reusable space for a collection with active updates. Excessive reusable space is typically the result of deleting a large amount of data, but can sometimes be related to your workload or the provenance of your data files.
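As a rough illustration of the first check, here is a minimal sketch that reads the reusable-space figure from a collection's stats output and compares it with the file size. The helper names (reusableSpace, worthCompacting) and the 0.5 threshold are my own assumptions for illustration, not MongoDB recommendations. The sketch also assumes the block-manager section exposes the "file bytes available for reuse" and "file size in bytes" fields, which is what I'd expect from WiredTiger collection stats.

```javascript
// Sketch only: estimate how much of a collection's data file is reusable.
// `stats` is expected to be the document returned by
// db["collectionname"].stats() on a WiredTiger deployment.
function reusableSpace(stats) {
  const bm = stats.wiredTiger["block-manager"];
  // Number() is used in case the shell returns a numeric wrapper type.
  const reusable = Number(bm["file bytes available for reuse"]);
  const fileSize = Number(bm["file size in bytes"]);
  return {
    reusableBytes: reusable,
    fileSizeBytes: fileSize,
    // Guard against an empty file to avoid dividing by zero.
    reusableRatio: fileSize > 0 ? reusable / fileSize : 0,
  };
}

// Hypothetical rule of thumb: only consider compact when at least
// `minRatio` of the file is reusable. Pick a threshold that suits you.
function worthCompacting(stats, minRatio = 0.5) {
  return reusableSpace(stats).reusableRatio >= minRatio;
}

// In the mongo shell this could be called as:
//   worthCompacting(db["collectionname"].stats())
```

A high ratio only tells you that space is reusable. It does not guarantee that compact will be able to release it to the operating system.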

The outcome of a compact operation is dependent on the storage contents, so I would draw your attention to the note on Disk Space in the compact documentation:

On WiredTiger, compact attempts to reduce the required storage space for data and indexes in a collection, releasing unneeded disk space to the operating system. The effectiveness of this operation is workload dependent and no disk space may be recovered. This command is useful if you have removed a large amount of data from the collection, and do not plan to replace it.

Running compact in production

If this is a production environment, I would hope you have a replica set or sharded cluster deployment so you can minimise the operational impact.

If you have many large collections to compact (or want a more likely outcome of freeing up disk space), re-syncing a secondary member of a replica set via initial sync will rebuild all of the data files by copying over the data from another member. If compact doesn’t end up freeing up enough space, this would be the next procedure to try.

If you do decide to run compact in a production environment, I would minimise the operational impact by:

  • Always having a replica set deployment (ideally a minimum of three data-bearing members, no arbiters)
  • Running compact operations on one secondary at a time
  • Configuring a secondary as hidden during the compact operation so the only competing traffic will be basic replication
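To make the hidden-secondary step more concrete, here is a minimal sketch of a helper that builds a replica set config with one member marked hidden. The function name withHiddenMember and the hostname in the comments are hypothetical. Hidden members must have priority 0, so the sketch sets both fields, and you would want to keep the original config so you can restore it afterwards.

```javascript
// Sketch only: return a copy of a replica set config in which the member
// with the given host is hidden. The input config is not modified, so it
// can be reapplied later to undo the change.
function withHiddenMember(config, host) {
  if (!config.members.some((m) => m.host === host)) {
    throw new Error("No replica set member with host " + host);
  }
  const members = config.members.map((m) =>
    m.host === host
      ? Object.assign({}, m, { hidden: true, priority: 0 })
      : Object.assign({}, m)
  );
  return Object.assign({}, config, { members: members });
}

// Possible usage in the mongo shell (hostname is a placeholder):
//   On the primary:
//     const original = rs.conf();
//     rs.reconfig(withHiddenMember(original, "mongodb2.example.net:27017"));
//   On the hidden secondary:
//     db.runCommand({ compact: "collectionname" });
//   Back on the primary, once compact has finished, fetch the current
//   config with rs.conf(), set the member's hidden and priority fields
//   back to the values saved in `original`, and call rs.reconfig() again.
```

Repeat this for one secondary at a time, and check that the member has caught up on replication before moving on to the next one.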

Regards,
Stennie