
Get Started with the LangChain Integration


This tutorial uses LangChain's Python library. For a tutorial that uses the JavaScript library, see Get Started with the LangChain JS/TS Integration.

You can integrate Atlas Vector Search with LangChain to build LLM applications and implement retrieval-augmented generation (RAG). This tutorial demonstrates how to start using Atlas Vector Search with LangChain to perform semantic search on your data and build a RAG implementation. Specifically, you perform the following actions:

  1. Set up the environment.

  2. Store custom data on Atlas.

  3. Create an Atlas Vector Search index on your data.

  4. Run the following vector search queries:

    • Semantic search.

    • Semantic search with score.

    • Semantic search with metadata pre-filtering.

  5. Implement RAG by using Atlas Vector Search to answer questions on your data.

Work with a runnable version of this tutorial as a Python notebook.

LangChain is an open-source framework that simplifies the creation of LLM applications through the use of "chains." Chains are LangChain-specific components that can be combined for a variety of AI use cases, including RAG.

By integrating Atlas Vector Search with LangChain, you can use Atlas as a vector database and use Atlas Vector Search to implement RAG by retrieving semantically similar documents from your data. To learn more about RAG, see Retrieval-Augmented Generation (RAG) with Atlas Vector Search.

To complete this tutorial, you must have the following:

  • An Atlas account with a cluster running MongoDB version 6.0.11, 7.0.2, or later (including RCs). Ensure that your IP address is included in your Atlas project's access list. To learn more, see Create a Cluster.

  • An OpenAI API Key. You must have an OpenAI account with credits available for API requests. To learn more about registering an OpenAI account, see the OpenAI API website.

  • An environment to run interactive Python notebooks such as Colab.

Set up the environment for this tutorial. Create an interactive Python notebook by saving a file with the .ipynb extension. This notebook allows you to run Python code snippets individually, and you'll use it to run the code in this tutorial.

To set up your notebook environment:


Run the following command:

pip install --quiet --upgrade langchain langchain-community langchain-core langchain-mongodb langchain-openai pymongo pypdf

Then, run the following code to import the required packages:

import os, pymongo, pprint
from langchain_community.document_loaders import PyPDFLoader
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_mongodb import MongoDBAtlasVectorSearch
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain.prompts import PromptTemplate
from langchain.text_splitter import RecursiveCharacterTextSplitter
from pymongo import MongoClient
from pymongo.operations import SearchIndexModel

Run the following code, replacing the placeholders with your OpenAI API key and the SRV connection string for your Atlas cluster:

os.environ["OPENAI_API_KEY"] = "<api-key>"
ATLAS_CONNECTION_STRING = "<connection-string>"


Your connection string should use the following format:

mongodb+srv://<db_username>:<db_password>@<clusterName>.<hostname>.mongodb.net


Then, load custom data into Atlas and instantiate Atlas as a vector database, also called a vector store. Copy and paste the following code snippets into your notebook.


Run the following code to establish a connection to your Atlas cluster. It specifies the following:

  • langchain_db.test as the name of the collection into which to load the data.

  • vector_index as the name of the Atlas Vector Search index to use for querying the data.

# Connect to your Atlas cluster
client = MongoClient(ATLAS_CONNECTION_STRING)
# Define collection and index name
db_name = "langchain_db"
collection_name = "test"
atlas_collection = client[db_name][collection_name]
vector_search_index = "vector_index"

For this tutorial, you use a publicly accessible PDF document titled MongoDB Atlas Best Practices as the data source for your vector store. This document describes various recommendations and core concepts for managing your Atlas deployments.

To load the sample data, run the following code snippet. It does the following:

  • Retrieves the PDF from the specified URL and loads the raw text data.

  • Uses a text splitter to split the data into smaller documents.

  • Specifies chunk parameters, which determine the number of characters in each document and the number of characters that should overlap between two consecutive documents.

# Load the PDF
loader = PyPDFLoader("")
data = loader.load()
# Split PDF into documents
text_splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=20)
docs = text_splitter.split_documents(data)
# Print the first document
docs[0]
Document(page_content='Mong oDB Atlas Best P racticesJanuary 20 19A MongoD B White P aper', metadata={'source': '', 'page': 0})
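To make the chunk parameters concrete, the following sketch splits a string into fixed-size character windows that overlap. It is a simplified stand-in and not how RecursiveCharacterTextSplitter works internally, since that splitter also tries to break on separators such as paragraphs and spaces. The function name split_with_overlap is hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Each window starts chunk_size - chunk_overlap characters after the
    previous one, so consecutive windows share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=2)
print(chunks)
```

With chunk_size=200 and chunk_overlap=20, as in this tutorial, each window would start 180 characters after the previous one under this simplified scheme.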

Run the following code to create a vector store instance named vector_store from the sample documents. This snippet uses the from_documents method to create the MongoDBAtlasVectorSearch vector store and specifies the following parameters:

  • The sample documents to store in the vector database.

  • An OpenAI embedding model as the model used to convert text into vector embeddings for the embedding field. By default, this model is text-embedding-ada-002.

  • langchain_db.test as the Atlas collection to store the documents.

  • vector_index as the index to use for querying the vector store.

# Create the vector store
vector_store = MongoDBAtlasVectorSearch.from_documents(
   documents = docs,
   embedding = OpenAIEmbeddings(disallowed_special=()),
   collection = atlas_collection,
   index_name = vector_search_index
)

After running the sample code, you can view your vector embeddings in the Atlas UI by navigating to the langchain_db.test collection in your cluster.


To create an Atlas Vector Search index, you must have Project Data Access Admin or higher access to the Atlas project.

To enable vector search queries on your vector store, create an Atlas Vector Search index on the langchain_db.test collection by using the LangChain helper method or the PyMongo driver method.

Run the following code in your notebook for your preferred method. The index definition specifies indexing the following fields:

  • embedding field as the vector type. The embedding field contains the embeddings created using OpenAI's text-embedding-ada-002 embedding model. The index definition specifies 1536 vector dimensions and measures similarity using cosine.

  • page field as the filter type for pre-filtering data by the page number in the PDF.

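# LangChain helper method: use helper method to create the vector search index
vector_store.create_vector_search_index(
   dimensions = 1536, # The dimensions of the vector embeddings to be indexed
   filters = [ "page" ]
)
# PyMongo driver method: create your index model, then create the search index
search_index_model = SearchIndexModel(
   definition = {
      "fields": [
         {
            "type": "vector",
            "path": "embedding",
            "numDimensions": 1536,
            "similarity": "cosine"
         },
         { "type": "filter", "path": "page" }
      ]
   },
   name = "vector_index",
   type = "vectorSearch"
)
atlas_collection.create_search_index(model = search_index_model)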

The index should take about one minute to build. While it builds, the index is in an initial sync state. When it finishes building, you can start querying the data in your collection.
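Rather than waiting a fixed amount of time, you could poll until the index is ready. The following is a minimal sketch, assuming PyMongo 4.5 or later (for list_search_indexes) and assuming that each returned index document carries a boolean queryable field. The helper name wait_until_queryable and its timeout defaults are hypothetical.

```python
import time


def wait_until_queryable(collection, index_name, timeout_s=120.0, poll_s=5.0):
    """Poll list_search_indexes until the named index reports queryable.

    Returns True once the index is queryable, or False if timeout_s
    elapses first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        # Assumption: the index document exposes a boolean "queryable" field
        indexes = list(collection.list_search_indexes(index_name))
        if indexes and indexes[0].get("queryable") is True:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


# Example call in this tutorial's notebook (requires a live Atlas connection):
# wait_until_queryable(atlas_collection, vector_search_index)
```

The function accepts any object with a list_search_indexes method, so it can be exercised against a stub before you point it at a real collection.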

Once Atlas builds your index, run vector search queries on your data. The following examples demonstrate various queries that you can run on your vectorized data.

The following query uses the similarity_search method to perform a basic semantic search for the string MongoDB Atlas security. It returns a list of documents ranked by relevance.

query = "MongoDB Atlas security"
results = vector_store.similarity_search(query)
pprint.pprint(results)
[Document(page_content='To ensure a secure system right out of the b ox,\nauthentication and I P Address whitelisting are\nautomatically enabled.\nReview the security section of the MongoD B Atlas', metadata={'_id': ObjectId('65c2e8f480f26794dedad8d5'), 'source': '', 'page': 17}),
Document(page_content='MongoD B Atlas team are also monitoring the underlying\ninfrastructure, ensuring that it is always in a healthy state.\nApplication L ogs And Database L ogs', metadata={'_id': ObjectId('65c2e8f480f26794dedad8a0'), 'source': '', 'page': 15}),
Document(page_content='MongoD B.\nMongoD B Atlas incorporates best practices to help keep\nmanaged databases healthy and optimized. T hey ensure\noperational continuity by converting comple x manual tasks', metadata={'_id': ObjectId('65c2e8f380f26794dedad883'), 'source': '', 'page': 13}),
Document(page_content='Atlas provides encryption of data at rest with encrypted\nstorage volumes.\nOptionally , Atlas users can configure an additional layer of\nencryption on their data at rest using the MongoD B', metadata={'_id': ObjectId('65c2e8f480f26794dedad8e3'), 'source': '', 'page': 18})]

The following query uses the similarity_search_with_score method to perform a semantic search for the string MongoDB Atlas security and specifies the k parameter to limit the number of documents to return to 3.


The k parameter in this example refers to the similarity_search_with_score method option, not the knnBeta operator option of the same name.

It returns the three most relevant documents and a relevance score between 0 and 1.

query = "MongoDB Atlas security"
results = vector_store.similarity_search_with_score(
   query = query, k = 3
)
pprint.pprint(results)
[(Document(page_content='To ensure a secure system right out of the b ox,\nauthentication and I P Address whitelisting are\nautomatically enabled.\nReview the security section of the MongoD B Atlas', metadata={'_id': ObjectId('65c2e8f480f26794dedad8d5'), 'source': '', 'page': 17}),
(Document(page_content='MongoD B Atlas team are also monitoring the underlying\ninfrastructure, ensuring that it is always in a healthy state.\nApplication L ogs And Database L ogs', metadata={'_id': ObjectId('65c2e8f480f26794dedad8a0'), 'source': '', 'page': 15}),
(Document(page_content='MongoD B.\nMongoD B Atlas incorporates best practices to help keep\nmanaged databases healthy and optimized. T hey ensure\noperational continuity by converting comple x manual tasks', metadata={'_id': ObjectId('65c2e8f380f26794dedad883'), 'source': '', 'page': 13}),
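To see why a cosine-based score can fall between 0 and 1, the following sketch computes cosine similarity in plain Python and rescales it. It assumes the normalization score = (1 + cosine) / 2, which is how Atlas Vector Search is understood to normalize cosine scores. Treat that formula as an assumption to verify against the Atlas documentation. The function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def normalized_score(a, b):
    """Map cosine similarity from [-1, 1] onto [0, 1] (assumed normalization)."""
    return (1 + cosine_similarity(a, b)) / 2


print(normalized_score([1.0, 0.0], [1.0, 0.0]))   # same direction
print(normalized_score([1.0, 0.0], [0.0, 1.0]))   # orthogonal
print(normalized_score([1.0, 0.0], [-1.0, 0.0]))  # opposite direction
```

Under this mapping, vectors that point the same way score highest and vectors that point in opposite directions score lowest.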

You can pre-filter your data by using an MQL match expression that compares the indexed field with boolean, number, or string values. You must index any metadata fields that you want to filter by as the filter type. To learn more, see How to Index Fields for Vector Search.


You specified the page field as a filter when you created the index for this tutorial.

The following query uses the similarity_search_with_score method to perform a semantic search for the string MongoDB Atlas security. It also specifies the following:

  • The k parameter to limit the number of documents to return to 3.

  • A pre-filter on the page field that uses the $eq operator to match documents appearing on page 17 only.

It returns the three most relevant documents from page 17 and a relevance score between 0 and 1.

query = "MongoDB Atlas security"
results = vector_store.similarity_search_with_score(
   query = query,
   k = 3,
   pre_filter = { "page": { "$eq": 17 } }
)
pprint.pprint(results)
[(Document(page_content='To ensure a secure system right out of the b ox,\nauthentication and I P Address whitelisting are\nautomatically enabled.\nReview the security section of the MongoD B Atlas', metadata={'_id': ObjectId('65c2e8f480f26794dedad8d5'), 'source': '', 'page': 17}),
(Document(page_content='Security\nAs with all software, MongoD B administrators must\nconsider security and risk e xposure for a MongoD B\ndeployment. T here are no magic solutions for risk', metadata={'_id': ObjectId('65c2e8f480f26794dedad8d0'), 'source': '', 'page': 17}),
(Document(page_content='number of diff erent methods for managing risk and\nreducing risk e xposure.\nMongoD B Atlas f eatures e xtensive capabilities to def end,\ndetect, and control access to MongoD B, off ering among', metadata={'_id': ObjectId('65c2e8f480f26794dedad8d2'), 'source': '', 'page': 17}),
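The effect of the pre-filter in the query above can be pictured with a toy in-memory version: first discard documents whose metadata does not match, then keep the k best of what remains. This is only an illustration and not how Atlas evaluates the filter. The function names, sample documents, and scores are all made up, and the matcher supports only the $eq operator.

```python
def matches_eq_filter(metadata, pre_filter):
    """Return True if metadata satisfies every {field: {"$eq": value}} clause.

    Toy matcher: supports only the $eq operator used in this tutorial.
    """
    for field, condition in pre_filter.items():
        if set(condition) != {"$eq"}:
            raise ValueError("this sketch supports only $eq")
        if field not in metadata or metadata[field] != condition["$eq"]:
            return False
    return True


def filtered_top_k(scored_docs, pre_filter, k):
    """Keep docs whose metadata passes the filter, then return the k best by score."""
    candidates = [d for d in scored_docs if matches_eq_filter(d["metadata"], pre_filter)]
    return sorted(candidates, key=lambda d: d["score"], reverse=True)[:k]


# Hypothetical scored documents; the scores are invented for illustration
scored_docs = [
    {"text": "a", "metadata": {"page": 17}, "score": 0.91},
    {"text": "b", "metadata": {"page": 15}, "score": 0.95},
    {"text": "c", "metadata": {"page": 17}, "score": 0.89},
    {"text": "d", "metadata": {"page": 17}, "score": 0.93},
]
top = filtered_top_k(scored_docs, {"page": {"$eq": 17}}, k=2)
print([d["text"] for d in top])
```

Document "b" has the highest score but is excluded because it is not on page 17, which mirrors why every result in the filtered query comes from that page.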


For a full list of semantic search methods, refer to the API reference.

This section demonstrates how to implement RAG in your application with Atlas Vector Search and LangChain. Now that you've used Atlas Vector Search to retrieve semantically similar documents, run the following code examples to prompt the LLM to answer questions based on those documents.

This example does the following:

  • Instantiates Atlas Vector Search as a retriever to query for similar documents, including the optional k parameter to search for only the 10 most relevant documents.

  • Defines a LangChain prompt template to instruct the LLM to use these documents as context for your query. LangChain passes these documents to the {context} input variable and your query to the {question} variable.

  • Constructs a chain that specifies the following:

    • Atlas Vector Search as the retriever to search for documents to use as context.

    • The prompt template that you defined.

    • An LLM from OpenAI to generate a context-aware response. By default, this is the gpt-3.5-turbo model.

  • Prompts the chain with a sample query about Atlas security recommendations.

  • Returns the LLM's response and the documents used as context. The generated response might vary.

# Instantiate Atlas Vector Search as a retriever
retriever = vector_store.as_retriever(
   search_type = "similarity",
   search_kwargs = { "k": 10 }
)
# Define a prompt template
template = """
Use the following pieces of context to answer the question at the end.
{context}
Question: {question}
"""
prompt = PromptTemplate.from_template(template)
model = ChatOpenAI()
# Construct a chain to answer questions on your data
chain = (
   { "context": retriever, "question": RunnablePassthrough()}
   | prompt
   | model
   | StrOutputParser()
)
# Prompt the chain
question = "How can I secure my MongoDB Atlas cluster?"
answer = chain.invoke(question)
print("Question: " + question)
print("Answer: " + answer)
# Return source documents
documents = retriever.invoke(question)
print("\nSource documents:")
pprint.pprint(documents)
Question: How can I secure my MongoDB Atlas cluster?
Answer: To secure your MongoDB Atlas cluster, you can enable
authentication and IP address whitelisting, review the security section
in the MongoDB Atlas dashboard, encrypt data at rest with encrypted storage
volumes, optionally configure an additional layer of encryption on your
data, set up global clusters on Amazon Web Services, Microsoft Azure,
and Google Cloud Platform, and ensure operational continuity by choosing
appropriate instance size, storage size, and storage speed options.
Additionally, consider setting up a larger number of replica nodes for
increased protection against database downtime.
Source documents:
[Document(page_content='To ensure a secure system right out of the b ox,\nauthentication and I P Address whitelisting are\nautomatically enabled.\nReview the security section of the MongoD B Atlas', metadata={'_id': ObjectId('65fb4f056979cf7cbbfe0436'), 'source': '', 'page': 17}),
Document(page_content='MongoD B Atlas team are also monitoring the underlying\ninfrastructure, ensuring that it is always in a healthy state.\nApplication L ogs And Database L ogs', metadata={'_id': ObjectId('65fb4f056979cf7cbbfe0401'), 'source': '', 'page': 15}),
Document(page_content='All the user needs to do in order for MongoD B Atlas to\nautomatically deploy the cluster is to select a handful of\noptions:\n•Instance size\n•Storage size (optional)\n•Storage speed (optional)', metadata={'_id': ObjectId('65fb4f046979cf7cbbfe03ef'), 'source': '', 'page': 14}),
Document(page_content='MongoD B.\nMongoD B Atlas incorporates best practices to help keep\nmanaged databases healthy and optimized. T hey ensure\noperational continuity by converting comple x manual tasks', metadata={'_id': ObjectId('65fb4f046979cf7cbbfe03e4'), 'source': '', 'page': 13}),
Document(page_content='You can set up global clusters — available on Amazon W eb\nServices, Microsoft Azure, and Google Cloud Platform —\nwith just a f ew clic ks in the MongoD B Atlas U I. MongoD B', metadata={'_id': ObjectId('65fb4f046979cf7cbbfe03bb'), 'source': '', 'page': 12}),
Document(page_content='Table of Contents\n1 Introduction\n2 Preparing for a MongoD B Deployment\n9 Scaling a MongoD B Atlas Cluster\n11 Continuous A vailability & Data Consistency\n12 Managing MongoD B\n16 Security', metadata={'_id': ObjectId('65fb4f026979cf7cbbfe02d6'), 'source': '', 'page': 1}),
Document(page_content='Atlas provides encryption of data at rest with encrypted\nstorage volumes.\nOptionally , Atlas users can configure an additional layer of\nencryption on their data at rest using the MongoD B', metadata={'_id': ObjectId('65fb4f056979cf7cbbfe0444'), 'source': '', 'page': 18}),
Document(page_content='Disaster Recovery\nCreated by the engineers who develop the database,\nMongoD B Atlas is the simplest way to run MongoD B,\nmaking it easy to deploy , monitor , backup, and scale\nMongoD B.', metadata={'_id': ObjectId('65fb4f046979cf7cbbfe03e3'), 'source': '', 'page': 13}),
Document(page_content='Security\nAs with all software, MongoD B administrators must\nconsider security and risk e xposure for a MongoD B\ndeployment. T here are no magic solutions for risk', metadata={'_id': ObjectId('65fb4f056979cf7cbbfe0431'), 'source': '', 'page': 17}),
Document(page_content='A larger number of replica nodes provides increased\nprotection against database downtime in case of multiple\nmachine failures.\nMongoD B Atlas replica sets have a minimum of 3 nodes', metadata={'_id': ObjectId('65fb4f046979cf7cbbfe03ca'), 'source': '', 'page': 12})]
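The chain above hands the retrieved documents to the {context} variable as they are. One possible refinement is to pass only the text of each document. The following sketch shows the idea in plain Python, using a small stand-in class instead of LangChain's Document. The names Doc, format_docs, and build_prompt, and the sample texts, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


TEMPLATE = """Use the following pieces of context to answer the question at the end.
{context}
Question: {question}
"""


def format_docs(docs):
    """Join the text of each retrieved document, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


def build_prompt(docs, question):
    """Fill the template's {context} and {question} variables."""
    return TEMPLATE.format(context=format_docs(docs), question=question)


# Hypothetical retrieved documents, invented for illustration
sample_docs = [
    Doc("Authentication is enabled by default.", {"page": 17}),
    Doc("Data at rest is encrypted.", {"page": 18}),
]
print(build_prompt(sample_docs, "How can I secure my cluster?"))
```

In a LangChain chain, a function like format_docs can typically be piped after the retriever, for example "context": retriever | format_docs, so that the prompt receives plain text without metadata.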

This example does the following:

  • Instantiates Atlas Vector Search as a retriever to query for similar documents, including the following optional parameters:

    • k to search for only the 10 most relevant documents.

    • score_threshold to use only documents with a relevance score above 0.75.


      This parameter refers to a relevance score that LangChain uses to normalize your results, and not the relevance score used in Atlas Search queries. To use Atlas Search scores in your RAG implementation, define a custom retriever that uses the similarity_search_with_score method and filters by the Atlas Search score.

    • pre_filter to filter on the page field for documents that appear on page 17 only.

  • Defines a LangChain prompt template to instruct the LLM to use these documents as context for your query. LangChain passes these documents to the {context} input variable and your query to the {question} variable.

  • Constructs a chain that specifies the following:

    • Atlas Vector Search as the retriever to search for documents to use as context.

    • The prompt template that you defined.

    • An LLM from OpenAI to generate a context-aware response. By default, this is the gpt-3.5-turbo model.

  • Prompts the chain with a sample query about Atlas security recommendations.

  • Returns the LLM's response and the documents used as context. The generated response might vary.

# Instantiate Atlas Vector Search as a retriever
retriever = vector_store.as_retriever(
   search_type = "similarity",
   search_kwargs = {
      "k": 10,
      "score_threshold": 0.75,
      "pre_filter": { "page": { "$eq": 17 } }
   }
)
# Define a prompt template
template = """
Use the following pieces of context to answer the question at the end.
{context}
Question: {question}
"""
prompt = PromptTemplate.from_template(template)
model = ChatOpenAI()
# Construct a chain to answer questions on your data
chain = (
   { "context": retriever, "question": RunnablePassthrough()}
   | prompt
   | model
   | StrOutputParser()
)
# Prompt the chain
question = "How can I secure my MongoDB Atlas cluster?"
answer = chain.invoke(question)
print("Question: " + question)
print("Answer: " + answer)
# Return source documents
documents = retriever.invoke(question)
print("\nSource documents:")
pprint.pprint(documents)
Question: How can I secure my MongoDB Atlas cluster?
Answer: To secure your MongoDB Atlas cluster, you can enable
authentication and IP Address whitelisting, define permissions
for users and applications, use VPC Peering for secure connectivity,
implement a Defense in Depth approach for securing deployments, and
consider using LDAP integration for centralized authorization
management. It is important to regularly review the security section
of MongoDB Atlas and continuously monitor and update security measures
to mitigate risk and maintain a secure deployment.
Source documents:
[Document(page_content='To ensure a secure system right out of the b ox,\nauthentication and I P Address whitelisting are\nautomatically enabled.\nReview the security section of the MongoD B Atlas', metadata={'_id': ObjectId('65fb4f056979cf7cbbfe0436'), 'source': '', 'page': 17}),
Document(page_content='Security\nAs with all software, MongoD B administrators must\nconsider security and risk e xposure for a MongoD B\ndeployment. T here are no magic solutions for risk', metadata={'_id': ObjectId('65fb4f056979cf7cbbfe0431'), 'source': '', 'page': 17}),
Document(page_content='number of diff erent methods for managing risk and\nreducing risk e xposure.\nMongoD B Atlas f eatures e xtensive capabilities to def end,\ndetect, and control access to MongoD B, off ering among', metadata={'_id': ObjectId('65fb4f056979cf7cbbfe0433'), 'source': '', 'page': 17}),
Document(page_content='permissions for a user or application, and what data it can\naccess when querying MongoD B. MongoD B Atlas provides\nthe ability to provision users with roles specific to a', metadata={'_id': ObjectId('65fb4f056979cf7cbbfe043b'), 'source': '', 'page': 17}),
Document(page_content='connectivity without using public I P addresses, and without\nneeding to whitelist every client in your MongoD B Atlas\ngroup.\nAuthorization\nMongoD B Atlas allows administrators to define', metadata={'_id': ObjectId('65fb4f056979cf7cbbfe043a'), 'source': '', 'page': 17}),
Document(page_content='mitigation, and maintaining a secure MongoD B deployment\nis an ongoing process.\nDefense in Depth\nA Def ense in Depth approac h is recommended for\nsecuring MongoD B deployments, and it addresses a', metadata={'_id': ObjectId('65fb4f056979cf7cbbfe0432'), 'source': '', 'page': 17}),
Document(page_content='optimization.\nIn addition, MongoD B Atlas provides pac kaged integration\nwith the New Relic platform. K ey metrics from MongoD B\nAtlas are accessible to the AP M for visualization, enabling', metadata={'_id': ObjectId('65fb4f056979cf7cbbfe042e'), 'source': '', 'page': 17}),
Document(page_content='their I P address (or a C IDR covering their I P address) has\nbeen added to the IP whitelist for your MongoD B Atlas\ngroup.\nVPC P eering\nVirtual P rivate Cloud (VPC) P eering allows users to create', metadata={'_id': ObjectId('65fb4f056979cf7cbbfe0438'), 'source': '', 'page': 17}),
Document(page_content='dedicated A tlas clusters using credentials that are verified\nby a centralized L DAP server . Authorization management is\nsimplified by allowing control at the L DAP group level.', metadata={'_id': ObjectId('65fb4f056979cf7cbbfe043d'), 'source': '', 'page': 17}),
Document(page_content='database, making it possible to realize a separation of\nduties between diff erent entities accessing and managing\nthe data.\nAtlas supports L DAP integration, allowing users to login to', metadata={'_id': ObjectId('65fb4f056979cf7cbbfe043c'), 'source': '', 'page': 17})]
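The note on score_threshold suggests filtering by the Atlas Search score through similarity_search_with_score. The following is a minimal sketch of that idea, assuming the method returns (document, score) pairs and accepts the query, k, and pre_filter arguments used earlier in this tutorial. The function name search_above_score and its default values are hypothetical.

```python
def search_above_score(vector_store, query, k=10, min_score=0.75, pre_filter=None):
    """Return documents whose similarity_search_with_score score is at least min_score.

    vector_store is any object that exposes similarity_search_with_score and
    returns a list of (document, score) pairs.
    """
    results = vector_store.similarity_search_with_score(
        query=query, k=k, pre_filter=pre_filter
    )
    # Keep only the documents that meet the score cutoff
    return [doc for doc, score in results if score >= min_score]


# Example call in this tutorial's notebook (requires a live Atlas connection):
# search_above_score(vector_store, "How can I secure my MongoDB Atlas cluster?",
#                    pre_filter={"page": {"$eq": 17}})
```

To use a function like this in place of the retriever in a chain, one option is to wrap it in a RunnableLambda from langchain_core.runnables that takes the question as its input.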