We’re managing a MongoDB database that has reached 10TB in size and continues to grow daily. We’re using the community edition, version 4.2.x. To control the database size, we’re planning to run a continuous purge job to delete old documents.
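For context, our purge job looks roughly like the sketch below (pymongo); the database, collection, `created_at` field, retention window, and batch size are all illustrative, not our actual values.

```python
from datetime import datetime, timedelta, timezone
from pymongo import MongoClient

# Placeholder connection string and names, for illustration only.
client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
coll = client["appdb"]["events"]
cutoff = datetime.now(timezone.utc) - timedelta(days=90)

# Delete in bounded batches so no single pass issues one huge
# multi-document delete.
while True:
    ids = [d["_id"] for d in coll.find({"created_at": {"$lt": cutoff}},
                                       {"_id": 1}).limit(5000)]
    if not ids:
        break
    coll.delete_many({"_id": {"$in": ids}})
```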
However, we’re encountering issues with the compact operation. It has proven unpredictable: compact times for the same collection and similar volumes of deleted data vary significantly between runs, which makes it difficult for us to reliably schedule or plan around it.
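This is roughly how we invoke compact today (a pymongo sketch; the host, database, and collection names are placeholders), connecting to one replica-set member at a time:

```python
from pymongo import MongoClient

# Connect directly to a single member; compact is node-local, so we
# issue it against each node in turn. Host and names are placeholders.
node = MongoClient("mongodb://node2.example.com:27017",
                   directConnection=True)
result = node["appdb"].command({"compact": "events"})
print(result)
```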
Given that we’re deleting large amounts of data, we’re concerned about the potential performance impact over time if we skip running compact. Has anyone experienced performance degradation in MongoDB under similar conditions without regularly compacting?
If the compact operation is optional, how does MongoDB behave under long-term disk fragmentation, and what are the implications, if any? Are there later versions where compact is more reliable and can handle large-scale fragmentation and defragment the space?
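For reference, this is how we’ve been estimating how much space WiredTiger could reuse inside the existing files without a compact (pymongo sketch; names are placeholders):

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
stats = client["appdb"].command("collStats", "events")

# storageSize is the on-disk size of the collection's data file;
# "file bytes available for reuse" is space WiredTiger can hand back
# to new writes without growing the file.
storage = stats["storageSize"]
reusable = stats["wiredTiger"]["block-manager"]["file bytes available for reuse"]
print(f"storageSize: {storage / 1e9:.1f} GB, reusable: {reusable / 1e9:.1f} GB")
```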
Because our oplog window is short, we cannot use an initial sync to reclaim the space, and given how our applications work, we can’t afford to increase the oplog size.
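(We measure the oplog window with something like the following pymongo sketch; the connection string is a placeholder.)

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
oplog = client["local"]["oplog.rs"]

# 'ts' is a BSON Timestamp; .time is seconds since the epoch.
first = oplog.find_one(sort=[("$natural", 1)])
last = oplog.find_one(sort=[("$natural", -1)])
print(f"oplog window: {(last['ts'].time - first['ts'].time) / 3600:.1f} hours")
```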
We’re running a 3-node replica set, but we’re not using sharding.
Any insights or suggestions would be greatly appreciated.