Atlas Search Overview
MongoDB's Atlas Search allows fine-grained text indexing and querying of data on your Atlas cluster. It enables advanced search functionality for your applications without any additional management or separate search system alongside your database. Atlas Search provides options for several kinds of text analyzers, a rich query language that uses Atlas Search aggregation pipeline stages like $search and $searchMeta in conjunction with other MongoDB aggregation pipeline stages, and score-based results ranking.
Tip
Quickly try Atlas Search without needing an Atlas account, cluster, or collection, with the Atlas Search Playground. To learn more, see the documentation.
Atlas Search Fundamentals
The following concepts form the basis of Atlas Search and are essential to optimize your application.
Indexing
In the context of search, an index is a data structure that categorizes data in an easily searchable format. Search indexes enable faster retrieval of documents that contain a given term without having to scan the entire collection. While both Atlas Search indexes and MongoDB Indexes make data retrieval faster, note that they are not the same. Like the index in the back of a book, a search index is a mapping between terms and the documents that contain those terms. Search indexes also contain other relevant metadata, such as the positions of terms in documents.
Creating at least one search index is usually required in any search application. For more information, see Create and Manage Atlas Search Indexes.
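The book-index analogy can be made concrete with a small inverted index. The following is a minimal Python sketch, not how Atlas Search stores its indexes; the function name build_inverted_index and the sample documents are made up for illustration. It maps each term to the documents that contain it and records term positions as metadata.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to {doc_id: [positions of the term in that document]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Naive analysis for illustration: lowercase, split on whitespace.
        for position, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(position)
    return index

# Hypothetical sample documents keyed by ID.
docs = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "quick quick dog",
}

index = build_inverted_index(docs)
print(sorted(index["quick"]))  # [1, 3]: IDs of documents containing "quick"
print(index["quick"][3])       # [0, 1]: positions of "quick" in document 3
```

Looking up a term is a single dictionary access, which is why a search index can find matching documents without scanning the whole collection.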
Tokenization
When creating a search index, data must first be transformed into a sequence of tokens or terms. An analyzer facilitates this process through steps including:
Tokenization: Breaking up words in a string into indexable tokens. For example, dividing a sentence by whitespace and punctuation.
Normalization: Organizing data so that it is consistently represented and easier to analyze. For example, transforming text to lower case or removing unwanted words called stop words.
Stemming: Reducing words to their root form. For example, ignoring suffixes, prefixes, and plural word forms.
The specifics of tokenization are language-specific and can require making additional choices. Which analyzer to use depends on your data and application. For more information, see Process Data with Analyzers.
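The three steps above can be sketched as a toy analyzer in Python. This is an illustration only and does not reproduce any Atlas Search analyzer: the stop-word list, the suffix rules, and the function names are assumptions chosen to keep the example short.

```python
import re

# Illustrative stop words only; not the list any Atlas Search analyzer uses.
STOP_WORDS = {"the", "a", "an", "and", "of", "in"}

def tokenize(text):
    """Tokenization: split on whitespace and punctuation."""
    return re.findall(r"[A-Za-z0-9]+", text)

def normalize(tokens):
    """Normalization: lowercase the tokens and drop stop words."""
    lowered = [token.lower() for token in tokens]
    return [token for token in lowered if token not in STOP_WORDS]

def stem(token):
    """Stemming: naive suffix stripping, far cruder than a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def analyze(text):
    """Run the three steps in order and return the indexable terms."""
    return [stem(token) for token in normalize(tokenize(text))]

print(analyze("Quick foxes jumping in the gardens!"))
# ['quick', 'fox', 'jump', 'garden']
```

Because the same analysis is normally applied to query text, a query for "jump" would match a document that contains "jumping" under this toy analyzer.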
Querying
Search queries consult the index to return a set of results. Search queries differ from traditional database queries because they are intended to meet more general information needs. Where a database query must follow a strict syntax, a search query can perform simple text matching, look for similar phrases, match number or date ranges, or use regular expressions or wildcards.
For more information, see Create and Run Atlas Search Queries.
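As a rough illustration of how a $search stage combines with other aggregation stages, the following Python sketch builds a pipeline as plain dictionaries. The helper name build_search_pipeline, the field name plot, and the database and collection names in the comment are assumptions for this example; "default" is used as the index name on the assumption that the index was created without a custom name.

```python
def build_search_pipeline(query, path, index_name="default", limit=5):
    """Build an aggregation pipeline that starts with a $search stage."""
    return [
        # Simple text matching on one field using the text operator.
        {"$search": {"index": index_name, "text": {"query": query, "path": path}}},
        # Ordinary aggregation stages can follow the $search stage.
        {"$limit": limit},
        {"$project": {"_id": 0, path: 1, "score": {"$meta": "searchScore"}}},
    ]

pipeline = build_search_pipeline("baseball", "plot")

# With PyMongo and an Atlas cluster that has a search index on the
# collection, the pipeline could be run like this (not executed here;
# the client, database, and collection names are hypothetical):
#   results = client["sample_mflix"]["movies"].aggregate(pipeline)
```

The sketch only constructs the pipeline, so it runs without a database connection; the $search stage must be the first stage in the pipeline.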
Scoring
Each document receives a relevancy score that enables query results to be returned in order from the highest relevance to the lowest. In the simplest form of scoring, documents score higher if the query term appears frequently in a document and lower if the query term appears across many documents in the collection. Scoring can also be customized. Tailoring search to a specific domain often means customizing the relevance-based default score by boosting, decaying, or modifying it in other ways.
For more information, see Score Documents.
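The "simplest form of scoring" described above resembles a TF-IDF weighting. The sketch below is a simplified Python illustration of that idea under assumed smoothing constants; it is not the formula Atlas Search uses, and the function name score and the sample corpus are made up.

```python
import math

def score(term, doc_tokens, corpus):
    """Toy TF-IDF: higher when the term is frequent in the document,
    lower when the term appears in many documents of the corpus."""
    term_frequency = doc_tokens.count(term) / len(doc_tokens)
    docs_with_term = sum(1 for doc in corpus if term in doc)
    # Smoothed inverse document frequency (the +1 terms are an assumption).
    inverse_doc_frequency = math.log((1 + len(corpus)) / (1 + docs_with_term)) + 1
    return term_frequency * inverse_doc_frequency

# Hypothetical corpus of already-analyzed documents.
corpus = [
    ["search", "index", "search"],
    ["database", "index", "query"],
    ["vector", "index", "store"],
]

# "search" is frequent in document 0 and rare across the corpus, while
# "index" appears in every document, so it contributes less.
rare = score("search", corpus[0], corpus)
common = score("index", corpus[0], corpus)
print(rare > common)  # True

# Rank document positions by their score for the query term "search".
ranked = sorted(
    range(len(corpus)),
    key=lambda i: score("search", corpus[i], corpus),
    reverse=True,
)
print(ranked[0])  # 0
```

Boosting or decaying a score, as mentioned above, would amount to multiplying or otherwise adjusting the value this function returns.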
Atlas Search Architecture
The Atlas Search mongot process, built on Apache Lucene, interfaces with the mongod database process to create and manage your full-text and vector search indexes and queries.
About the mongot Process
The mongot process performs the following tasks:
Creates Atlas Search indexes based on the rules in the index definition for the collection.
Monitors change streams for the current state of the documents and indexes for the collections for which you defined Atlas Search indexes.
Processes Atlas Search queries and returns the document IDs and other search metadata for the matching documents to mongod, which then does a full document lookup and returns the results to the client.
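The query flow in the last task can be pictured with a toy simulation. Everything in the following Python sketch is a stand-in: the dictionaries, the fixed score, and the functions mongot_search and mongod_lookup are hypothetical and do not correspond to real MongoDB interfaces. It only illustrates the division of labor, where the search process returns IDs and metadata and the database process fetches the full documents.

```python
# Toy in-memory stand-ins; these are not real MongoDB interfaces.
COLLECTION = {
    "a1": {"_id": "a1", "title": "Intro to search"},
    "b2": {"_id": "b2", "title": "Database tuning"},
    "c3": {"_id": "c3", "title": "Search relevance"},
}
SEARCH_INDEX = {"search": ["a1", "c3"], "database": ["b2"]}

def mongot_search(term):
    """Stand-in for mongot: return matching IDs and metadata, not documents."""
    # A fixed score keeps the example short; real scores vary per document.
    return [{"_id": doc_id, "score": 1.0} for doc_id in SEARCH_INDEX.get(term, [])]

def mongod_lookup(hits):
    """Stand-in for mongod: fetch the full document for each returned ID."""
    return [dict(COLLECTION[hit["_id"]], score=hit["score"]) for hit in hits]

results = mongod_lookup(mongot_search("search"))
print([doc["title"] for doc in results])
# ['Intro to search', 'Search relevance']
```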
You can choose a deployment model where the Atlas Search mongot process runs alongside the mongod process on each node in the Atlas cluster or where the mongot process runs on separate search nodes. For testing your search queries and prototyping your application, you can choose the default deployment model where both the mongot and mongod processes run on the same node. However, for production-ready applications, deploy mongot on separate search nodes to avoid any resource contention between the mongot and mongod processes in your production environment.
For guidance on choosing a deployment type for pre-production and production environments, see Atlas Search Deployment Options and Atlas Vector Search Deployment Options.