A Decisioning Framework for MongoDB $regex and $text vs Atlas Search
Are you using $text or $regex to provide search-like functionality in your application? If so, the $search aggregation stage in MongoDB Atlas offers several advantages over $text and $regex, such as faster and more efficient search results, natural language queries, built-in relevance ranking, and better scalability. Getting started is easy because $search is embedded as an aggregation stage right into MongoDB Atlas, providing you with full-text search capabilities on all of your operational data.
The $text and $regex operators are your only options for on-premises or local deployments, and they provide basic text matching and pattern searching. Atlas users, however, will find that $search provides a more comprehensive and performant solution for implementing advanced search functionality in their applications. Features like fuzzy matching, partial word matching, synonyms search, More Like This, and faceting, as well as the capability to search through large data sets, are only available with Atlas Search.
Migrating from $text or $regex to $search doesn't necessarily mean rewriting your entire codebase. It can be a gradual process where you start incorporating the $search operator in new features or refactoring existing search functionality in stages.
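As a rough illustration of what such a staged migration might involve, the sketch below builds the same keyword lookup three ways in Python: as a $regex filter, as a $text filter, and as a $search aggregation pipeline. The field name `title`, the index name `default`, and the helper function names are assumptions made for this example, not part of any existing codebase. The helpers only return plain dictionaries and lists, which could then be handed to a driver such as PyMongo via `collection.find(...)` or `collection.aggregate(...)`.

```python
import re


def regex_filter(term: str, field: str = "title") -> dict:
    """Case-insensitive substring match using $regex (field name is hypothetical)."""
    # re.escape keeps user input from being interpreted as a regex pattern.
    return {field: {"$regex": re.escape(term), "$options": "i"}}


def text_filter(term: str) -> dict:
    """Keyword match using $text; assumes a text index exists on the collection."""
    return {"$text": {"$search": term}}


def search_pipeline(term: str, field: str = "title", index: str = "default") -> list:
    """A comparable lookup expressed as an Atlas Search aggregation pipeline."""
    return [
        # $search must be the first stage of the pipeline.
        {"$search": {"index": index, "text": {"query": term, "path": field}}},
        {"$limit": 10},
    ]
```

Because the first two helpers produce `find` filters while the third produces an `aggregate` pipeline, one way to migrate gradually would be to switch individual call sites from one helper to another, leaving the rest of the application untouched.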
The table below explores the benefits of using Atlas Search compared to $regex and $text for searching data. Follow along and experience the power of Atlas Search firsthand.
Note: $text and $regex have had no major updates since 2015, and all future enhancements in relevance-based search will be delivered via Atlas Search.
App Requirements | $regex | $text | $search | Reasoning |
---|---|---|---|---|
The datastore must respect write concerns | ✅ | 🚫 | 🚫 | If you have a datastore that must respect write concerns for use cases like transactions with heavy reads after writes, $regex is a better choice. For search use cases, reads after writes should be rare. |
Language awareness (Spanish, Chinese, English, etc.) | 🚫 | 🚫 | ✅ | Atlas Search natively supports over 40 languages so that you can better tokenize languages, remove stopwords, and interpret diacritics to support improved search relevance. |
Case-insensitive text search | 🚫 | 🚫 | ✅ | Case-insensitive text search using $regex is one of the biggest sources of problems among our customer base, and $search offers far more capabilities than $text. |
Highlighting result text | 🚫 | 🚫 | ✅ | The ability to highlight text fragments in result documents helps end users contextualize why some documents are returned compared to others. It's essential for user experiences powered by natural language queries. While developers could implement a crude version of highlighting with the other options, the $search aggregation stage provides an easy-to-consume API and a core engine that handles topics like tokenization and offsets. |
Geospatial-aware search queries | ✅ | 🚫 | ✅ | Both $regex and $search have geospatial capabilities. The two differ in how they treat geospatial parameters. For instance, Lucene draws a straight line from one query coordinate to another, whereas MongoDB lines are spherical. Spherical queries are best for flights, whereas flat map queries might be better for short distances. |
On-premises or local deployment | ✅ | ✅ | 🚫 | Atlas Search is not available on-premises or for local deployment. The single deployment target enables our team to move fast and innovate at a more rapid pace than if we targeted many deployment models. For that reason, $regex and $text are the only options for people who do not have access to Atlas. |
Autocomplete of characters (nGrams) | 🚫 | 🚫 | ✅ | End users typing in a search box have grown accustomed to an experience where their search queries are completed for them. Atlas Search offers edgeGram for left-to-right autocomplete, nGram for autocomplete with languages that do not have whitespace, and rightEdgeGram for languages that are written and read right-to-left. |
Autocomplete of words (wordGrams) | 🚫 | 🚫 | ✅ | If you have a field with more than two words and want to offer word-based autocomplete as a feature of your application, then a shingle token filter with custom analyzers could be best for you. Custom analyzers offer developers a flexible way to index and modify how their data is stored. |
Fuzzy matching on text input | 🚫 | 🚫 | ✅ | If you would like to filter on user-generated input, Atlas Search's fuzzy option offers flexibility. Issues like misspelled words are handled best by $search. |
Filtering based on more than 10 strings | 🚫 | 🚫 | ✅ | It's tricky to filter on more than 10 strings in MongoDB due to the limitations of compound text indexes. The filter clause of the Atlas Search compound operator is the right way to go here. |
Relevance score sorted search | 🚫 | 🚫 | ✅ | Atlas Search uses the state-of-the-art BM25 algorithm for determining the search relevance score of documents and allows for advanced configuration through boost expressions like multiply and Gaussian decay, as well as analyzers, search operators, and synonyms. |
Cluster needs to be optimized for write performance | 🚫 | 🚫 | ✅ | When you add a database index in MongoDB, you should consider tradeoffs to write performance in cases where database write performance is important. Search Indexes don’t degrade cluster write performance. |
Searching through large data sets | 🚫 | 🚫 | ✅ | If you have lots of documents, $regex and $text queries will get linearly slower as the collection grows. In Atlas Search, the inverted index enables fast document retrieval at very large scales. |
Partial indexes for simple text matching | ✅ | 🚫 | 🚫 | Atlas Search does not yet support partial indexing. Today, $regex takes the cake. |
Single compound index on arrays | 🚫 | 🚫 | ✅ | Atlas Search is designed in part for this use case, where term indexes are intersected in a single Search index to eliminate the need for compound indexes for filtering on arrays. |
Synonyms search | 🚫 | 🚫 | ✅ | The only option for robust synonyms search is Atlas Search, where synonyms are defined in a collection, and that collection is referenced in your search index. |
Fast faceting for counts | 🚫 | 🚫 | ✅ | If you are looking for faceted navigation, or fast counts of documents based on text criteria, let Atlas Search do the bucketing. In our internal testing, it's 100x faster and also supports number and date buckets. |
Custom analyzers (stopwords, email/URL token, etc.) | 🚫 | 🚫 | ✅ | Using Atlas Search, you can define a custom analyzer to suit your specific indexing needs. |
Partial match | 🚫 | 🚫 | ✅ | Atlas Search has a number of partial match options, ranging from the wildcard operator to autocomplete, which can be useful for some partial match use cases. |
Phrase queries | 🚫 | 🚫 | ✅ | Phrase queries are supported natively in Atlas Search via the phrase operator. |
Note: In some rows, an option lacks a green check mark even though it may be able to satisfy the app requirement. In those cases, it's because one of the other options (i.e., $search) is far superior for that use case.
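To make a few of the $search rows above more concrete, here is a minimal Python sketch that combines fuzzy matching, a compound filter clause, highlighting, and relevance scoring in one pipeline, plus a separate $searchMeta pipeline for facet counts. The field names (`plot`, `genres`), the index name `default`, the facet name `genresFacet`, and the function names are all assumptions for illustration, and the facet example presumes the search index maps `genres` as a facetable string field. The functions only build pipeline definitions; running them would require an Atlas cluster with a matching search index.

```python
def fuzzy_search_pipeline(query, path="plot", genres=None, index="default", limit=10):
    """Build a $search pipeline with fuzzy text matching, highlights, and scores."""
    # Tolerate one-character typos in the user's input.
    text_clause = {"text": {"query": query, "path": path, "fuzzy": {"maxEdits": 1}}}
    compound = {"must": [text_clause]}
    if genres:
        # filter clauses narrow the result set without changing the score.
        compound["filter"] = [{"text": {"query": list(genres), "path": "genres"}}]
    return [
        {"$search": {"index": index, "compound": compound, "highlight": {"path": path}}},
        {"$limit": limit},
        {
            "$project": {
                "_id": 0,
                path: 1,
                "score": {"$meta": "searchScore"},
                "highlights": {"$meta": "searchHighlights"},
            }
        },
    ]


def facet_counts_pipeline(query, path="plot", facet_path="genres", index="default"):
    """Build a $searchMeta pipeline that returns bucket counts instead of documents."""
    return [
        {
            "$searchMeta": {
                "index": index,
                "facet": {
                    "operator": {"text": {"query": query, "path": path}},
                    "facets": {
                        "genresFacet": {"type": "string", "path": facet_path, "numBuckets": 10}
                    },
                },
            }
        }
    ]
```

With PyMongo, for example, either result could be passed to `collection.aggregate(...)`. Documents come back in relevance order by default, so the projected `score` field is there mainly for inspection.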
If we’ve whetted your appetite to learn more about Atlas Search, we have some resources to get you started:
The Atlas Search documentation provides reference materials and tutorials, while the MongoDB Developer Hub provides sample apps and code. You can spin up Atlas Search at no cost on the Atlas Free Tier and follow along with the tutorials using our sample data sets, or load your own data for experimentation within your own sandbox.