Building RAG Pipelines With Haystack and MongoDB Atlas

Pavel Duchovny • 4 min read • Published Sep 18, 2024 • Updated Sep 18, 2024

AI Vector Search Python Atlas

Integrating Haystack with MongoDB Atlas allows you to build powerful retrieval-augmented generation (RAG) pipelines. This introductory article guides you through setting up a Haystack-based RAG pipeline that uses MongoDB Atlas for vector search. The code uses a grocery product dataset: for a user's cooking request, the pipeline retrieves the relevant products and passes them to the LLM, which generates a detailed guide.

All code presented in this tutorial is available in the GitHub repository.

Step 1: Install dependencies

First, install the necessary dependencies. The getpass and re modules used later are part of the Python standard library and do not need to be installed:

pip install haystack-ai mongodb-atlas-haystack tiktoken datasets

Step 2: Set up MongoDB Atlas connection and OpenAI API key

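If you have not created an Atlas cluster yet, follow our guide. To obtain an OpenAI API key, follow the guide on the OpenAI website. The snippet below stores the MongoDB connection string and the OpenAI API key in the environment variables that the Haystack components read by default:

import os
import getpass
import re

# Prompt for secrets instead of hard-coding them.
conn_str = getpass.getpass("Enter your MongoDB connection string:")

# Tag the connection with an appName, replacing an existing value if present.
conn_str = (re.sub(r'appName=[^&\s]*', 'appName=devrel.content.python', conn_str)
            if 'appName=' in conn_str
            else conn_str + ('&' if '?' in conn_str else '?') + 'appName=devrel.content.python')
os.environ['MONGO_CONNECTION_STRING'] = conn_str

# The OpenAI embedders and generator read the key from OPENAI_API_KEY.
os.environ['OPENAI_API_KEY'] = getpass.getpass("Enter your OpenAI API key:")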
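
The appName handling above works on the raw string. As an alternative, here is a minimal sketch that parses the connection string with urllib.parse from the standard library instead. The helper name with_app_name and the example URI are hypothetical and not part of the tutorial's repository.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_app_name(uri: str, app_name: str) -> str:
    """Return `uri` with its appName query parameter set to `app_name`."""
    parts = urlsplit(uri)
    # Keep every existing option except a previous appName.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "appName"
    ]
    params.append(("appName", app_name))
    # Keep a "/" between the host list and the options; some driver versions require it.
    path = parts.path or "/"
    return urlunsplit(
        (parts.scheme, parts.netloc, path, urlencode(params), parts.fragment)
    )


# Hypothetical connection string, used only for illustration.
example = (
    "mongodb+srv://user:secret@cluster0.example.mongodb.net/"
    "?retryWrites=true&appName=Old&w=majority"
)
updated = with_app_name(example, "devrel.content.python")
print(updated)
```

Because the options are decoded and re-encoded, values containing reserved characters may come back percent-encoded. Treat this as a sketch and check the result against your own connection string before relying on it.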

Step 3: Create a Vector Search index on collection

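Create a vector search index on your database and collection in MongoDB Atlas. For more information and guidance, visit our Atlas Vector Search index docs. In this tutorial, the database is “ai_shop” and the collection is “test_collection”. Name the index vector_index and use the definition below. The path must point to the field that stores the embeddings, and numDimensions must match the output size of the embedding model (1536 is the size of the vectors produced by OpenAI's text-embedding-ada-002 model):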

{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 1536,
      "similarity": "cosine"
    }
  ]
}
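
To make the "similarity": "cosine" setting more concrete, here is a small pure-Python sketch of cosine similarity, the measure the index uses to compare a query vector with the stored embeddings. The function name and the three-dimensional toy vectors are illustrative assumptions; Atlas performs the comparison server-side on the 1536-dimensional embeddings, and this is not its implementation.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy vectors (assumptions for illustration only).
query = [1.0, 0.0, 1.0]
same_direction = [2.0, 0.0, 2.0]   # points the same way as the query
orthogonal = [0.0, 3.0, 0.0]       # shares no direction with the query

score_same = cosine_similarity(query, same_direction)
score_orthogonal = cosine_similarity(query, orthogonal)
print(score_same, score_orthogonal)
```

Vectors that point in the same direction score close to 1 regardless of their length, while unrelated (orthogonal) vectors score 0. The length check mirrors the reason numDimensions has to match the embedding model.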

Step 4: Set up vector store and load documents

Load documents into MongoDB Atlas using the Haystack framework:

from haystack import Pipeline, Document
from haystack.document_stores.types import DuplicatePolicy
from haystack.components.writers import DocumentWriter
from haystack.components.embedders import OpenAIDocumentEmbedder
from haystack_integrations.document_stores.mongodb_atlas import MongoDBAtlasDocumentStore
from bson import json_util

# Example dataset
dataset = {
    "train": [
        {"title": "Spinach Lasagna Sheets", "price": "$3.50", "description": "Infused with spinach, these sheets add a pop of color and extra nutrients.", "category": "Pasta", "emoji": "📗"},
        {"title": "Gluten-Free Lasagna Sheets", "price": "$4.00", "description": "Perfect for those with gluten intolerance, made with a blend of rice and corn flour.", "category": "Pasta", "emoji": "🍚🌽"},
        # Add more documents here...
    ]
}

# Convert each product into a Haystack Document: the title is the text that
# gets embedded, and the full product record is kept as metadata.
insert_data = []
for product in dataset['train']:
    doc_product = json_util.loads(json_util.dumps(product))
    haystack_doc = Document(content=doc_product['title'], meta=doc_product)
    insert_data.append(haystack_doc)

# The document store reads the connection string from MONGO_CONNECTION_STRING.
document_store = MongoDBAtlasDocumentStore(
    database_name="ai_shop",
    collection_name="test_collection",
    vector_search_index="vector_index",
)

# Indexing pipeline: embed the documents, then write them to Atlas,
# skipping any document that is already stored.
doc_writer = DocumentWriter(document_store=document_store, policy=DuplicatePolicy.SKIP)
doc_embedder = OpenAIDocumentEmbedder()
indexing_pipe = Pipeline()
indexing_pipe.add_component(instance=doc_embedder, name="doc_embedder")
indexing_pipe.add_component(instance=doc_writer, name="doc_writer")
indexing_pipe.connect("doc_embedder.documents", "doc_writer.documents")
indexing_pipe.run({"doc_embedder": {"documents": insert_data}})
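
The DuplicatePolicy.SKIP setting makes it safe to run the indexing step more than once. The following standard-library sketch mimics that behavior with a plain dictionary; it is not Haystack's implementation. The names doc_id and write_skipping_duplicates are hypothetical, and the sketch assumes that a document's ID is derived from its content and metadata.

```python
import hashlib
import json


def doc_id(content: str, meta: dict) -> str:
    """Derive a stable ID from content and metadata (assumption for this sketch)."""
    payload = json.dumps({"content": content, "meta": meta}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def write_skipping_duplicates(store: dict, docs: list[dict]) -> int:
    """Write docs into `store`, skipping IDs that already exist; return the number written."""
    written = 0
    for doc in docs:
        key = doc_id(doc["content"], doc["meta"])
        if key in store:
            continue  # same ID already stored: skip instead of overwriting or failing
        store[key] = doc
        written += 1
    return written


store: dict = {}
batch = [
    {"content": "Spinach Lasagna Sheets", "meta": {"category": "Pasta"}},
    {"content": "Gluten-Free Lasagna Sheets", "meta": {"category": "Pasta"}},
]
first_run = write_skipping_duplicates(store, batch)
second_run = write_skipping_duplicates(store, batch)
print(first_run, second_run, len(store))  # prints: 2 0 2
```

The second run writes nothing because both IDs are already present, which is the effect you want when re-running a notebook cell against the same collection.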

Step 5: Build a RAG pipeline

Create a pipeline that will retrieve, augment, and generate a response to user questions:

from haystack.components.generators import OpenAIGenerator
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.components.embedders import OpenAITextEmbedder
from haystack_integrations.components.retrievers.mongodb_atlas import MongoDBAtlasEmbeddingRetriever

# Prompt template: the retrieved documents are listed as context, followed by the user query
prompt_template = """
    You are a chef assistant allowed to use the following context documents and only those.\nDocuments:
    {% for doc in documents %}
        {{ doc.content }}
    {% endfor %}
    \nQuery: {{query}}
    \nAnswer:
"""

# Initialize the pipeline
rag_pipeline = Pipeline()

# Add the query embedder and the retriever, and connect them
rag_pipeline.add_component("text_embedder", OpenAITextEmbedder())
rag_pipeline.add_component(instance=MongoDBAtlasEmbeddingRetriever(document_store=document_store, top_k=50), name="retriever")
rag_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

# Add the prompt builder and the LLM: retrieved documents -> prompt -> LLM
rag_pipeline.add_component(instance=PromptBuilder(template=prompt_template), name="prompt_builder")
rag_pipeline.add_component(instance=OpenAIGenerator(model="gpt-4o"), name="llm")
rag_pipeline.connect("retriever", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder", "llm")
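
To see roughly what the LLM receives, the sketch below assembles a prompt in plain Python. It approximates what the prompt template renders to and does not use Jinja, so whitespace will differ from the real output. The function name build_prompt is hypothetical, and the two document strings stand in for the retriever's results.

```python
def build_prompt(documents: list[str], query: str) -> str:
    """Approximate the rendered prompt: instructions, context documents, query."""
    lines = [
        "You are a chef assistant allowed to use the following context documents and only those.",
        "Documents:",
    ]
    lines.extend(documents)           # one retrieved document per line
    lines.append(f"Query: {query}")   # the user's question
    lines.append("Answer:")           # cue for the model to start its reply
    return "\n".join(lines)


# Stand-ins for documents returned by the retriever (assumption for illustration).
retrieved = ["Spinach Lasagna Sheets", "Gluten-Free Lasagna Sheets"]
prompt = build_prompt(retrieved, "How can I cook a lasagne?")
print(prompt)
```

Only the document content (the product title in this tutorial) reaches the prompt. With top_k=50, up to 50 such lines are added as context.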

Step 6: Test the pipeline

Test the pipeline with a sample query:

query = "How can I cook a lasagne?"
result = rag_pipeline.run(
    {
        "text_embedder": {"text": query},
        "prompt_builder": {"query": query},
    }
)
print(result['llm']['replies'][0])

Example output (the generated text will vary between runs):

To cook a lasagne, you can follow this classic recipe:
### Ingredients:
#### For the meat sauce:
- 2 tablespoons olive oil
- 1 onion, finely chopped
- 2 cloves garlic, minced
- 500g ground beef
- 800g canned tomatoes, crushed
- 2 tablespoons tomato paste
- 1 teaspoon dried basil
- 1 teaspoon dried oregano
- Salt and pepper to taste
#### For the béchamel sauce:
- 4 tablespoons butter
- 4 tablespoons all-purpose flour
- 500ml milk
- A pinch of nutmeg
- Salt and pepper to taste
#### For assembly:
- 250g lasagne sheets
- 200g mozzarella cheese, shredded
- 1 cup grated Parmesan cheese
- Fresh basil leaves for garnish (optional)
### Instructions:
1. **Preheat the oven** to 375°F (190°C).
2. **Prepare the meat sauce:**
   - Heat the olive oil in a large skillet over medium heat.
   - Add the chopped onion and cook until soft and translucent, about 5 minutes.
   - Stir in the minced garlic and cook for another minute.
   - Add the ground beef and cook until browned, breaking it up with a spoon as it cooks.
   - Stir in the crushed tomatoes, tomato paste, dried basil, and dried oregano.
   - Season with salt and pepper, then reduce the heat to low.
   - Let the sauce simmer for 30 minutes, stirring occasionally.
3. **Prepare the béchamel sauce:**
   - In a medium saucepan, melt the butter over medium heat.
   - Add the flour and whisk continuously for about 2 minutes to create a roux.
   - Gradually add the milk while whisking to prevent lumps from forming.
   - Cook the mixture, whisking constantly, until it thickens, about 5-7 minutes.
   - Season with a pinch of nutmeg, salt, and pepper.
4. **Assemble the lasagne:**
   - Spread a thin layer of the meat sauce on the bottom of a 9x13 inch baking dish.
   - Place a layer of lasagne sheets over the sauce.
   - Spread another layer of meat sauce over the lasagne sheets, followed by a layer of béchamel sauce.
   - Sprinkle some shredded mozzarella cheese over the béchamel sauce.
   - Repeat the layers until all the ingredients are used, finishing with a layer of béchamel sauce and a generous topping of mozzarella and Parmesan cheese.
5. **Bake the lasagne:**
   - Cover the baking dish with aluminum foil.
   - Bake in the preheated oven for 30 minutes.
   - Remove the foil and bake for an additional 15 minutes, or until the top is golden brown and bubbling.
6. **Rest and serve:**
   - Remove the lasagne from the oven and let it rest for 10-15 minutes before slicing.
   - Garnish with fresh basil leaves if desired, and serve.
Enjoy your delicious homemade lasagne!

Conclusion

In this article, you learned how to integrate Haystack with MongoDB Atlas to build a RAG pipeline. This powerful combination allows you to leverage vector search and retrieval-augmented generation to create sophisticated and responsive applications.

To explore more topics on RAG, have a look at the following tutorials:

If you have questions or want to connect with other developers, join us in the MongoDB Developer Community. Thanks for reading.
