Hi,
In one of our collections, we are using a wildcard index, and when we insert the data (around 1 million documents) with the index in place, the index size grows to more than 300 MB.
But if I create the index after inserting the data, the index size ends up at less than 80 MB.
Now my question is: since there is a big difference in index size, and I think it also affects performance, especially with more documents and less available memory, is this something we need to do once in a while, rebuilding the indexes to keep them optimized? Or does MongoDB itself handle it automatically in the background so the indexes stay compact?
And another question: since inserting data with indexes in place takes longer, do you recommend that whenever we do a bulk insert (millions of records), we first drop the indexes and then recreate them after the insertion? Or is there another way to handle that?
Hi Babak… Great questions… You’re absolutely right: the size of an index differs depending on when it’s created, or more accurately, how much data is in the collection when it’s built. Let’s dive in…
Wildcard Index Size and Rebuilding:
When you insert documents while an index already exists (like a wildcard index), MongoDB has to update that index incrementally for each document, which can lead to inefficiency and even bloat. The database is building the index piecemeal as it inserts, so there’s potential for more fragmentation and less optimal disk use. That’s why, when you create the index after inserting the data, MongoDB can build it more efficiently in a single pass over the entire collection. The resulting index is tighter and more compact.
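If you want to compare the two cases yourself, the per-index on-disk sizes are reported by the `collStats` command. Here’s a minimal Python sketch assuming a pymongo-style `db.command` handle; the `_FakeDB` class is just an in-memory stand-in (with made-up sizes) so the snippet runs without a server:

```python
def index_sizes_mb(db, coll_name):
    """Return each index's on-disk size in MB, via the collStats command."""
    stats = db.command("collStats", coll_name)
    return {name: size / (1024 * 1024)
            for name, size in stats["indexSizes"].items()}


class _FakeDB:
    """In-memory stand-in for a real database handle, only to illustrate
    the shape of the collStats response (sizes are reported in bytes)."""
    def command(self, cmd, coll_name):
        return {"indexSizes": {"_id_": 4 * 1024 * 1024,
                               "$**_1": 300 * 1024 * 1024}}


sizes = index_sizes_mb(_FakeDB(), "mycollection")
```

Run it once right after the bulk insert and once after rebuilding the index, and the difference you observed (300 MB vs. 80 MB) should show up in the `$**_1` entry.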
Should You Rebuild Indexes?
I don’t recommend rebuilding indexes on a fixed schedule… MongoDB handles a lot of optimization under the hood, like balancing data across the collection, but it doesn’t automatically compact or rebuild indexes. If you find that index bloat is affecting your performance, manually rebuilding the index from time to time could help. Perhaps schedule periodic maintenance where you drop and recreate the index if space and performance become issues over time.
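If you do schedule that maintenance, the rebuild itself is just a drop followed by a create. A hedged sketch with pymongo-style calls (`$**_1` is the default name MongoDB gives a `{"$**": 1}` wildcard index; the `_FakeColl` class is only an in-memory stand-in so the snippet runs without a server):

```python
def rebuild_wildcard_index(coll):
    """Drop and recreate a wildcard index so it is built in one pass
    over the existing data rather than incrementally."""
    index_name = "$**_1"  # default name for a {"$**": 1} index
    existing = [ix["name"] for ix in coll.list_indexes()]
    if index_name in existing:
        coll.drop_index(index_name)
    return coll.create_index([("$**", 1)])


class _FakeColl:
    """Records calls in memory; mimics the small slice of the
    pymongo Collection API the function above touches."""
    def __init__(self):
        self.indexes = [{"name": "_id_"}, {"name": "$**_1"}]
        self.dropped = []

    def list_indexes(self):
        return list(self.indexes)

    def drop_index(self, name):
        self.dropped.append(name)
        self.indexes = [ix for ix in self.indexes if ix["name"] != name]

    def create_index(self, keys):
        name = "_".join(f"{k}_{v}" for k, v in keys)
        self.indexes.append({"name": name})
        return name


coll = _FakeColl()
new_name = rebuild_wildcard_index(coll)
```

Keep in mind the index is unavailable between the drop and the finished rebuild, so queries that depend on it will fall back to collection scans during that window.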
Bulk Inserts and Indexing:
You’re right to assume that inserting data with indexes in place takes longer, since MongoDB updates the index for each document inserted… If you’re inserting millions of records, it can be more efficient to drop the index, insert the data, and then recreate the index afterward. However, dropping indexes also temporarily affects reads and queries that depend on them, so it’s a trade-off… and depends on how active the app is. In most cases:
- If you’re doing a one-time or infrequent bulk insert (like your million-record scenario), dropping indexes first and recreating them after the insert is a good strategy.
- For frequent bulk inserts, you might use MongoDB’s `ordered: false` option on the bulk write operation. It can speed up inserts by allowing MongoDB to proceed with the remaining inserts even if some fail, plus it reduces overhead.
See the docs for insert…

> **ordered**: boolean. Optional. A boolean specifying whether the mongod instance should perform an ordered or unordered insert. Defaults to `true`.
### so in short...
* **Yes**, rebuilding indexes occasionally can help reduce size and improve performance, but it's manual.
* **For large bulk inserts**, dropping indexes beforehand and recreating them after the insert is a solid approach. MongoDB won’t handle that automatically.
Let us know if you want more details on the index-rebuild process or how to optimize bulk inserts... Happy to dive (even) deeper!