Optimizing for Negative Lookups (Queries with 0 Results)

barbershop · December 28, 2023, 12:27am

Hello MongoDB Experts! We’re using MongoDB for a use case that involves de-duplicating data. The app effectively issues a “get” to check whether a key already exists or not, and if it does, it discards the key. We expect 99% of the queries to miss as most of the data coming through the system is unique.

Most LSM-based KV stores have bloom filters which make this query pattern extremely effective (you can query without hitting disk at all), but for B-Trees without bloom filters this is a degenerate case that results in worst case performance (you must always go to disk since the cache will miss and find no results).
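To illustrate why a bloom filter makes the miss path cheap, here is a minimal in-memory sketch in Python. It is not how any particular LSM store or WiredTiger implements its filters; the class name, the sizing defaults, and the double-hashing scheme built on SHA-256 are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Toy bloom filter (hypothetical helper, for illustration only)."""

    def __init__(self, num_bits: int = 1 << 20, num_hashes: int = 7):
        # num_bits and num_hashes are arbitrary illustrative defaults.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive num_hashes bit positions from one digest via double hashing.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos >> 3] & (1 << (pos & 7))
            for pos in self._positions(key)
        )
```

In a de-duplication flow like ours, a `False` from `might_contain` would let the app skip the lookup entirely, and only a `True` (a real hit or a false positive) would need to go to the store.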

Two questions:

1. Is there any way to configure Atlas/MongoDB to optimize for this kind of query?
2. It seems that WiredTiger supports LSM trees. Is it possible to configure Atlas to use that as its storage engine, or are we stuck with the B-Tree index?

Thanks!

Andrew_Davidson · January 5, 2024, 10:01pm

Check out Unique Indexes: these allow you to maintain a uniqueness constraint through an optimized index data structure. https://www.mongodb.com/docs/manual/core/index-unique/

barbershop · January 5, 2024, 10:50pm

Thanks Andrew! We are using unique indexes for these fields, and empirically the performance is pretty good. I was wondering if you had any description of what the underlying WiredTiger index format for unique indexes is? I tried to search for a blog post, ticket, or design doc that describes it, and I couldn’t find anything outside of a few tickets like https://jira.mongodb.org/browse/SERVER-34489 that mention the UniqueIndexV2 but don’t describe how it works…