Hello MongoDB experts! We’re using MongoDB for a use case that involves deduplicating data. For each incoming key, the app effectively issues a “get” to check whether the key already exists; if it does, the app discards the key. We expect 99% of these queries to miss, since most of the data coming through the system is unique.
Most LSM-based key-value stores maintain Bloom filters, which make this query pattern extremely efficient: a lookup for a key that isn't present can usually be answered without hitting disk at all. For a B-tree without Bloom filters, however, this is a degenerate case that results in worst-case performance: a lookup for a missing key has to go to disk, because the cache misses and the search then finds no result.
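For context, here is a minimal Python sketch of the Bloom-filter mechanism described above, done on the application side rather than inside MongoDB (this is not a MongoDB or Atlas feature). An in-memory filter answers "definitely new" for most keys, so the expensive lookup only happens when the filter reports a possible match. `BloomFilter` and `Deduplicator` are hypothetical names, and a Python `set` stands in for the MongoDB collection; the sizing formulas and double hashing are standard Bloom filter choices, assumed here for illustration.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter: a bit array plus k hash positions per key.

    might_contain() never gives a false negative for keys that were add()ed,
    but may give a false positive with probability roughly `fp_rate`.
    """

    def __init__(self, capacity: int, fp_rate: float = 0.01) -> None:
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.num_bits = max(1, math.ceil(-capacity * math.log(fp_rate) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key: str):
        # Double hashing: derive k bit indexes from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


class Deduplicator:
    """Hypothetical app-side dedup step that skips the store read when the filter says 'new'."""

    def __init__(self, capacity: int) -> None:
        self.bloom = BloomFilter(capacity)
        self.store = set()  # stand-in for the MongoDB collection (assumption)
        self.lookups = 0    # counts the "expensive" store reads actually issued

    def accept(self, key: str) -> bool:
        if self.bloom.might_contain(key):
            # Possible duplicate (or a false positive): confirm against the store.
            self.lookups += 1
            if key in self.store:
                return False  # duplicate: discard
        # Definitely new, or a false positive that the store lookup ruled out.
        self.store.add(key)
        self.bloom.add(key)
        return True


d = Deduplicator(capacity=10_000)
keys = [f"key-{i}" for i in range(1000)] + ["key-0", "key-1"]
accepted = [k for k in keys if d.accept(k)]
print(len(accepted))  # 1000
print(d.lookups)      # the 2 duplicates plus any false positives
```

With mostly unique traffic, nearly every call takes the "definitely new" path and never reads the store. One caveat of doing this in the app: the filter lives in a single process's memory, so it would have to be seeded with the keys already in the collection at startup and shared or partitioned across app instances; otherwise it can answer "definitely new" for keys that do exist.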
Two questions:
- Is there any way to configure Atlas/MongoDB to optimize for this kind of query?
- It seems that WiredTiger supports LSM trees. Is it possible to configure Atlas to use an LSM tree for storage, or are we stuck with the B-tree index?
Thanks!