Mastering Indexing for Perfect Query Matches

7 min • Published Nov 08, 2023


Video Summary

00:00:00 Introduction to Indexing

00:00:57 Index Configuration and Search Requirements

00:01:42 Document Model and Index Structure

00:02:38 Atlas Search Index Synchronization

00:03:31 Understanding the Inverted Index

00:04:38 Analyzers and Index Configuration

00:05:47 Demonstration of Inverted Index and Analyzers

00:07:00 Effects of Basic Analyzers in Atlas Search

00:08:31 Conclusion and Significance of Index Configuration

00:09:52 Preview of Next Topic: Searching

00:10:27 Closing Remarks and Call to Subscribe

The main theme of the video is the significance of index configuration in MongoDB's Atlas Search and how it impacts the searchability and effectiveness of queries.
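To make "index configuration" concrete, here is a minimal sketch of a per-field Atlas Search index definition, written as a Python dict and printed as JSON. The field names (`title`, `sku`, `name`) and the gram sizes are hypothetical choices for illustration. The analyzer names and the `autocomplete` options (`tokenization`, `minGrams`, `maxGrams`) are the ones Atlas Search uses, but treat the exact layout as a starting point to check against the Atlas documentation rather than a definitive configuration.

```python
import json

# Hypothetical index definition: field names and gram sizes are illustrative assumptions.
index_definition = {
    "mappings": {
        # Index only the fields listed below instead of mapping every field dynamically.
        "dynamic": False,
        "fields": {
            # Full-text field analyzed with English-specific rules (possessives, stemming).
            "title": {"type": "string", "analyzer": "lucene.english"},
            # Identifier kept intact as a single term for exact-match lookups.
            "sku": {"type": "string", "analyzer": "lucene.keyword"},
            # Type-ahead field: every 2- to 5-character substring of each token is indexed.
            "name": {
                "type": "autocomplete",
                "tokenization": "nGram",
                "minGrams": 2,
                "maxGrams": 5,
            },
        },
    }
}

print(json.dumps(index_definition, indent=2))
```

Because each field carries its own analyzer, changing one of them alters what that field can match; per the video, any such change also triggers a full reindex.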

🔑 Key Points

Index configuration determines what your queries can find and how they find it.
The `$search` aggregation pipeline stage is central to MongoDB's search capabilities.
Understanding the types of data and queries is essential for effective indexing.
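MongoDB documents map naturally onto the documents and fields of the underlying Lucene index, so configuration can be specified per field.
Any change to the index configuration triggers a full reindex of all content, built in parallel while the active index stays searchable.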
Atlas Search uses an inverted index for full-text content, with various analyzers affecting searchability and relevancy.
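The inverted index and its posting lists can be sketched in a few lines of plain Python. This is a toy model under stated assumptions, not Lucene: the `analyze` function is a rough stand-in for `lucene.standard` (lowercase, keep an internal apostrophe), and the three document texts are approximated from the narration of the demo. The query is analyzed the same way as the documents, which is why the analyzer choice governs what can match.

```python
import re
from collections import defaultdict


def analyze(text):
    """Rough stand-in for lucene.standard (an assumption, not Lucene itself):
    lowercase, then keep runs of letters/digits that may contain an internal
    apostrophe, so "Here's" becomes the single term "here's"."""
    return re.findall(r"[^\W_]+(?:'[^\W_]+)*", text.lower())


def build_inverted_index(docs):
    """Map each term to its posting list: sorted IDs of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


def search(index, query):
    """Analyze the query the same way, then AND the posting lists of its terms."""
    terms = analyze(query)
    if not terms:
        return []
    matches = set(index.get(terms[0], []))
    for term in terms[1:]:
        matches &= set(index.get(term, []))
    return sorted(matches)


# Document texts approximated from the narration of the demo.
docs = {
    1: "Here's some text",
    2: "And more text...",
    3: "And some more text",
}

index = build_inverted_index(docs)
for term, doc_ids in index.items():
    print(f"{term:8} -> {doc_ids}")

print(search(index, "Some TEXT"))  # -> [1, 3]
```

The overlapping words ("and", "more", "some", "text") end up with posting lists that point at several documents, while "here's" points only at the first.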


Full Video Transcript

This is episode three of season 1: indexing. Findability is intimately tied to your index configuration with the `$search` aggregation pipeline stage, so it's very important to dig into the configuration you have and make sure it meets the demands of your search requirements: things like language and its variations, the type of fuzziness you want in terms of matchability, whether you have domain synonyms, and all of those sorts of things that relate to the type of data you have and the types of queries you are going to issue against the `$search` aggregation stage.

Very nicely, the document model lends itself well to index structure. When you search with `$search`, what you retrieve, the matches, are the documents from your original collection. In the Lucene index under the covers of Atlas Search, those entries are called documents as well, and they have fields just like the documents in your Atlas database. Because of that very nice mapping, you can specify the configuration you want per field and say how you want those fields to be searched, faceted, or sometimes sorted.

When you create an Atlas Search index, as we said at the beginning in the Quick Start episode, an automatic synchronization process runs so that any changes to your content are reflected in the index itself. Also important to note: any change to the index configuration itself causes a full reindex of all of your content. That happens in parallel to your active index, so things stay searchable, but everything gets reindexed with the new index structure and then points at the new nodes when that happens.

So, given all that, let's talk about the index structure and what it takes to configure it. Under the covers of Atlas Search is what's called an inverted index. That is the data structure for the full-text, textual content that is mapped from your database into Atlas Search. Again, under the covers this is a Lucene index, and on textual content it produces a data structure called an inverted index.

Here's an illustration of an inverted index. Consider a document with the text "Here's some text": capital H, apostrophe s. I'm using the standard analyzer configuration for this inverted index demonstration. Then we get the next document, "and more text...". The standard analyzer, which is the default, breaks the text into tokens and terms, and these become the searchable units of the inverted index. So the analyzer configuration is key to how the text in your content is searchable: which words are extracted from your content, how special characters are handled, and how case is handled. Capital H becomes a lowercase h here, and the apostrophe s remained with this particular analyzer configuration.

Now we're going to index one more document and see how the inverted index structure looks: the third document, "and some more text". There are some overlapping words: "and", "more", "some", and "text". Over here in the inverted index, the piece called the posting list holds the documents associated with each term that was extracted during the analysis process. Again, these are the searchable units, and changing the analyzer you use for your particular fields lets you adjust the searchability, and sometimes even the relevancy, of the results that are returned.

Now let's look at the effects of the basic analyzers that are built into Atlas Search. We're going to walk through a number of analyzers, and I'm going to zoom in a little so we can see the terms that are extracted. Using the standard analyzer of Atlas Search, this is how the text gets broken down, and in this example we'll keep using "Here's some text". Next is "Here's some text" using the simple analyzer. The first one is called `lucene.standard` and the second one `lucene.simple`. Then there is the `lucene.whitespace` analyzer, which breaks the text at whitespace characters but leaves everything else intact, capitalization and so on. You can see that with the simple analyzer, the apostrophe was used as a delimiter, and the s was kept as a separate token.

The English analyzer applies some other heuristics. That's not very apparent on these particular terms, other than that the apostrophe s got removed: plural endings and contractions like this get removed, so "here" is a searchable unit. Then there is the keyword analyzer. This one's important if you know you are going to keep the text exactly as is and use it for an exact-match or perhaps prefix-type search. The keyword analyzer keeps the text exactly as is, and it's indexed in the inverted index as a single term.

Finally, there is an autocomplete field type in Atlas Search. When you turn that on there are some options, and this is just one configuration of it, where it n-grams the text after tokenization. "Here's some text" gets tokenized into three separate tokens, and then n-grams are used to index all the sub-pieces of each token between configurable minimum and maximum gram sizes.

With that inverted index and analyzer demonstration, the significance of index configuration becomes clear: what you index is what you can find, and how you can find it. From basic word separation and case-insensitive normalization to language-specific stemming and character-level n-grams, index configuration provides the pieces needed for the next topic in our series: searching. So stay tuned, and subscribe to stay notified of our video releases.
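The analyzer walkthrough above can be approximated in plain Python to see how each choice changes the extracted terms for "Here's some text". These functions are simplified stand-ins written for illustration, not the Lucene analyzers themselves: the English-style one models only possessive removal (no stop words or stemming), and `ngrams` assumes tokenize-then-n-gram behavior with hypothetical gram sizes of 2 to 3.

```python
import re

TEXT = "Here's some text"


def standard(text):
    # Approximates lucene.standard: lowercase; a word may keep an internal apostrophe.
    return re.findall(r"[^\W_]+(?:'[^\W_]+)*", text.lower())


def simple(text):
    # Approximates lucene.simple: lowercase, then split at every non-letter character.
    return re.findall(r"[^\W\d_]+", text.lower())


def whitespace(text):
    # Approximates lucene.whitespace: split at whitespace only; case and punctuation kept.
    return text.split()


def keyword(text):
    # Approximates lucene.keyword: the whole input becomes one single term.
    return [text]


def english_possessive_only(text):
    # Partial approximation of lucene.english: standard tokens with a trailing "'s"
    # removed. Stop-word removal and stemming are NOT modeled here.
    return [re.sub(r"'s$", "", tok) for tok in standard(text)]


def ngrams(text, min_grams=2, max_grams=3):
    # Approximates autocomplete with nGram tokenization: tokenize first, then emit
    # every substring of each token whose length is in [min_grams, max_grams].
    grams = []
    for tok in standard(text):
        for n in range(min_grams, max_grams + 1):
            grams.extend(tok[i:i + n] for i in range(len(tok) - n + 1))
    return grams


analyzers = {
    "lucene.standard": standard,
    "lucene.simple": simple,
    "lucene.whitespace": whitespace,
    "lucene.english (partial)": english_possessive_only,
    "lucene.keyword": keyword,
}
for name, analyze in analyzers.items():
    print(f"{name:26} {analyze(TEXT)}")
print(f"{'autocomplete (nGram 2-3)':26} {ngrams(TEXT)}")
```

In this sketch, a lookup for the term "here" would hit the simple and English-style outputs but not the standard or whitespace ones, and only the exact full string would hit the keyword output.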


This is part of a series

The Atlas Search 'cene!

Up Next

Query Operators & Relevancy Controls for Precision Searches


The Atlas Search 'cene: Season 1

Sep 11, 2024 | 2 min

Video

Mastering Indexing for Perfect Query Matches
