Building RAG Pipelines With Haystack and MongoDB Atlas

Pavel Duchovny • 4 min read • Published Sep 18, 2024 • Updated Sep 18, 2024

AI • Vector Search • Python • Atlas

Integrating Haystack with MongoDB Atlas lets you build powerful retrieval-augmented generation (RAG) pipelines. This introductory article walks you through setting up a Haystack-based RAG pipeline that uses MongoDB Atlas for vector search. Our code uses a dataset of grocery products, and the RAG pipeline retrieves the products that are relevant to a user's cooking request. The relevant groceries are then passed to the LLM, which generates a detailed guide.

All the code presented in this tutorial is available in the GitHub repository.

Step 1: Install dependencies

First, install the required dependencies:

pip install haystack-ai mongodb-atlas-haystack tiktoken datasets

Step 2: Configure the MongoDB Atlas connection and the OpenAI API key

If you have not created an Atlas cluster yet, follow our guide. Then set the MongoDB connection string and your OpenAI API key, which you can obtain by following the guide on the OpenAI website.

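import os
import getpass
import re

# Prompt for the connection string without echoing it to the terminal
conn_str = getpass.getpass("Enter your MongoDB connection string:")

# Set (or replace) the appName query parameter, keeping any other parameters
conn_str = (re.sub(r'appName=[^&\s]*', 'appName=devrel.content.python', conn_str)
            if 'appName=' in conn_str
            else conn_str + ('&' if '?' in conn_str else '?') + 'appName=devrel.content.python')
os.environ['MONGO_CONNECTION_STRING'] = conn_str
# Confirm the value (note that this prints your credentials)
print(os.environ['MONGO_CONNECTION_STRING'])

# The Haystack OpenAI components read the key from OPENAI_API_KEY
os.environ['OPENAI_API_KEY'] = getpass.getpass("Enter your OpenAI API key:")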
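
The appName handling above can also be pulled into a small function, which makes it easy to check without typing a real connection string. The following is a minimal sketch: the function name with_app_name and the host cluster0.example.mongodb.net are hypothetical and only serve the illustration. The pattern stops at `&` so that any parameters after appName are preserved.

```python
import re

APP_NAME = "devrel.content.python"


def with_app_name(conn_str: str, app_name: str = APP_NAME) -> str:
    """Return conn_str with its appName query parameter set to app_name."""
    if "appName=" in conn_str:
        # Replace only the appName value; stop at '&' to keep later parameters.
        return re.sub(r"appName=[^&\s]*", "appName=" + app_name, conn_str)
    # No appName yet: append it, starting the query string if there is none.
    separator = "&" if "?" in conn_str else "?"
    return conn_str + separator + "appName=" + app_name


# Hypothetical connection string, used only to exercise the three cases.
base = "mongodb+srv://user:secret@cluster0.example.mongodb.net/"
print(with_app_name(base))
print(with_app_name(base + "?retryWrites=true"))
print(with_app_name(base + "?appName=old&w=majority"))
```

The three calls cover a string with no query parameters, one with other parameters, and one that already carries an appName.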

Step 3: Create a MongoDB Atlas Vector Search index on the collection

Create a vector index on your database and collection in MongoDB Atlas. For more information and guidance, visit our Atlas Vector Search docs. In this tutorial, the database is "ai_shop" and the collection is "test_collection". Make sure the index is named vector_index and uses the following definition:

{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 1536,
      "similarity": "cosine"
    }
  ]
}
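
If you prefer to generate this definition rather than type it, a small helper can build it and catch obvious mistakes. This is a sketch under assumptions: the function name vector_index_definition is hypothetical, and 1536 is the output size of OpenAI's text-embedding-ada-002 model, which we assume is the model behind the embedders used later. If you choose a different embedding model, numDimensions has to match its output size.

```python
import json

# Similarity functions accepted by Atlas Vector Search for vector fields.
VALID_SIMILARITIES = {"cosine", "euclidean", "dotProduct"}


def vector_index_definition(path: str, num_dimensions: int,
                            similarity: str = "cosine") -> dict:
    """Build a vector index definition with a single vector field."""
    if similarity not in VALID_SIMILARITIES:
        raise ValueError(f"unsupported similarity: {similarity!r}")
    if num_dimensions <= 0:
        raise ValueError("num_dimensions must be a positive integer")
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,
                "numDimensions": num_dimensions,
                "similarity": similarity,
            }
        ]
    }


definition = vector_index_definition("embedding", 1536)
# Paste the printed JSON into the Atlas index editor.
print(json.dumps(definition, indent=2))
```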

Step 4: Set up the vector store and load documents

Load documents into MongoDB Atlas using the Haystack framework:

from haystack import Pipeline, Document
from haystack.document_stores.types import DuplicatePolicy
from haystack.components.writers import DocumentWriter
from haystack.components.embedders import OpenAIDocumentEmbedder
from haystack_integrations.document_stores.mongodb_atlas import MongoDBAtlasDocumentStore
from bson import json_util

# Example dataset
dataset = {
    "train": [
        {"title": "Spinach Lasagna Sheets", "price": "$3.50", "description": "Infused with spinach, these sheets add a pop of color and extra nutrients.", "category": "Pasta", "emoji": "📗"},
        {"title": "Gluten-Free Lasagna Sheets", "price": "$4.00", "description": "Perfect for those with gluten intolerance, made with a blend of rice and corn flour.", "category": "Pasta", "emoji": "🍚🌽"},
        # Add more documents here...
    ]
}

# Turn each product into a Haystack Document: the title is the text that
# gets embedded, and the full product record is stored as metadata.
insert_data = []
for product in dataset['train']:
    doc_product = json_util.loads(json_util.dumps(product))
    haystack_doc = Document(content=doc_product['title'], meta=doc_product)
    insert_data.append(haystack_doc)

# The store reads the connection string from MONGO_CONNECTION_STRING (Step 2)
document_store = MongoDBAtlasDocumentStore(
    database_name="ai_shop",
    collection_name="test_collection",
    vector_search_index="vector_index",
)

# Indexing pipeline: embed the documents, then write them to Atlas,
# skipping any document that is already stored
doc_writer = DocumentWriter(document_store=document_store, policy=DuplicatePolicy.SKIP)
doc_embedder = OpenAIDocumentEmbedder()
indexing_pipe = Pipeline()
indexing_pipe.add_component(instance=doc_embedder, name="doc_embedder")
indexing_pipe.add_component(instance=doc_writer, name="doc_writer")
indexing_pipe.connect("doc_embedder.documents", "doc_writer.documents")
indexing_pipe.run({"doc_embedder": {"documents": insert_data}})
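
The loop above embeds only the product title. One possible variation, which is not part of the original code, is to embed a richer text that also includes the category, description, and price, so that queries about ingredients or dietary needs have more to match against. The helper name product_to_text and its output layout are our own assumptions.

```python
def product_to_text(product: dict) -> str:
    """Build one searchable string from a product's descriptive fields."""
    fields = (
        ("Title", "title"),
        ("Category", "category"),
        ("Description", "description"),
        ("Price", "price"),
    )
    lines = []
    for label, key in fields:
        value = str(product.get(key, "")).strip()
        if value:  # skip fields that are missing or empty
            lines.append(f"{label}: {value}")
    return "\n".join(lines)


product = {
    "title": "Spinach Lasagna Sheets",
    "price": "$3.50",
    "description": "Infused with spinach, these sheets add a pop of color and extra nutrients.",
    "category": "Pasta",
    "emoji": "📗",
}
print(product_to_text(product))
```

To try it, pass product_to_text(doc_product) as the content argument of Document instead of doc_product['title']. Whether this improves retrieval depends on your data, so treat it as an experiment.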

Step 5: Create a RAG pipeline

Create a pipeline that retrieves relevant documents, augments the prompt with them, and generates an answer to the user's question:

from haystack.components.generators import OpenAIGenerator
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.components.embedders import OpenAITextEmbedder
from haystack_integrations.components.retrievers.mongodb_atlas import MongoDBAtlasEmbeddingRetriever

# Prompt template (Jinja syntax): lists the retrieved documents, then the query
prompt_template = """
    You are a chef assistant allowed to use the following context documents and only those.\nDocuments:
    {% for doc in documents %}
        {{ doc.content }}
    {% endfor %}
    \nQuery: {{query}}
    \nAnswer:
"""

# Initialize the pipeline
rag_pipeline = Pipeline()

# Add the query embedder and the vector store retriever, then connect them
rag_pipeline.add_component("text_embedder", OpenAITextEmbedder())
rag_pipeline.add_component(instance=MongoDBAtlasEmbeddingRetriever(document_store=document_store, top_k=50), name="retriever")
rag_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

# Add the prompt builder and the LLM: retrieved documents feed the prompt,
# and the rendered prompt feeds the LLM
rag_pipeline.add_component(instance=PromptBuilder(template=prompt_template), name="prompt_builder")
rag_pipeline.add_component(instance=OpenAIGenerator(model="gpt-4o"), name="llm")
rag_pipeline.connect("retriever", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder", "llm")
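
To make the retrieve and augment stages concrete without calling any service, here is a plain-Python sketch of what they do conceptually. It is a toy stand-in, not how Haystack or Atlas implement them: the three-dimensional vectors and the "Dish Soap" item are made up, the function names are hypothetical, and in the real pipeline the embeddings come from OpenAI and the similarity search runs inside Atlas.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vec, docs, top_k):
    """Return the top_k documents most similar to the query vector."""
    ranked = sorted(
        docs,
        key=lambda d: cosine_similarity(query_vec, d["embedding"]),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(docs, query):
    """Fill the prompt with the retrieved documents and the user query."""
    context = "\n".join(d["content"] for d in docs)
    return (
        "You are a chef assistant allowed to use the following "
        "context documents and only those.\n"
        f"Documents:\n{context}\n"
        f"Query: {query}\n"
        "Answer:"
    )


# Toy data: made-up 3-dimensional "embeddings" for illustration only.
docs = [
    {"content": "Spinach Lasagna Sheets", "embedding": [0.9, 0.1, 0.0]},
    {"content": "Gluten-Free Lasagna Sheets", "embedding": [0.8, 0.2, 0.1]},
    {"content": "Dish Soap", "embedding": [0.0, 0.1, 0.9]},
]
query_vec = [1.0, 0.0, 0.0]  # pretend embedding of the user's question

top_docs = retrieve(query_vec, docs, top_k=2)
prompt = build_prompt(top_docs, "How can I cook a lasagne?")
print(prompt)
```

Only the two lasagna products make it into the prompt, which is the text the generation stage would then send to the LLM.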

Step 6: Test the pipeline

Test the pipeline with a sample query:

query = "How can I cook a lasagne?"
result = rag_pipeline.run(
    {
        "text_embedder": {"text": query},
        "prompt_builder": {"query": query},
    }
)
print(result['llm']['replies'][0])

Expected result:

To cook a lasagne, you can follow this classic recipe:
### Ingredients:
#### For the meat sauce:
- 2 tablespoons olive oil
- 1 onion, finely chopped
- 2 cloves garlic, minced
- 500g ground beef
- 800g canned tomatoes, crushed
- 2 tablespoons tomato paste
- 1 teaspoon dried basil
- 1 teaspoon dried oregano
- Salt and pepper to taste
#### For the béchamel sauce:
- 4 tablespoons butter
- 4 tablespoons all-purpose flour
- 500ml milk
- A pinch of nutmeg
- Salt and pepper to taste
#### For assembly:
- 250g lasagne sheets
- 200g mozzarella cheese, shredded
- 1 cup grated Parmesan cheese
- Fresh basil leaves for garnish (optional)
### Instructions:
1. **Preheat the oven** to 375°F (190°C).
2. **Prepare the meat sauce:**
   - Heat the olive oil in a large skillet over medium heat.
   - Add the chopped onion and cook until soft and translucent, about 5 minutes.
   - Stir in the minced garlic and cook for another minute.
   - Add the ground beef and cook until browned, breaking it up with a spoon as it cooks.
   - Stir in the crushed tomatoes, tomato paste, dried basil, and dried oregano.
   - Season with salt and pepper, then reduce the heat to low.
   - Let the sauce simmer for 30 minutes, stirring occasionally.
3. **Prepare the béchamel sauce:**
   - In a medium saucepan, melt the butter over medium heat.
   - Add the flour and whisk continuously for about 2 minutes to create a roux.
   - Gradually add the milk while whisking to prevent lumps from forming.
   - Cook the mixture, whisking constantly, until it thickens, about 5-7 minutes.
   - Season with a pinch of nutmeg, salt, and pepper.
4. **Assemble the lasagne:**
   - Spread a thin layer of the meat sauce on the bottom of a 9x13 inch baking dish.
   - Place a layer of lasagne sheets over the sauce.
   - Spread another layer of meat sauce over the lasagne sheets, followed by a layer of béchamel sauce.
   - Sprinkle some shredded mozzarella cheese over the béchamel sauce.
   - Repeat the layers until all the ingredients are used, finishing with a layer of béchamel sauce and a generous topping of mozzarella and Parmesan cheese.
5. **Bake the lasagne:**
   - Cover the baking dish with aluminum foil.
   - Bake in the preheated oven for 30 minutes.
   - Remove the foil and bake for an additional 15 minutes, or until the top is golden brown and bubbling.
6. **Rest and serve:**
   - Remove the lasagne from the oven and let it rest for 10-15 minutes before slicing.
   - Garnish with fresh basil leaves if desired, and serve.
Enjoy your delicious homemade lasagne!
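
The test code in Step 6 indexes straight into result['llm']['replies'][0], which raises a KeyError or IndexError if the generator returned nothing. A slightly more defensive accessor might look like the sketch below. The helper name first_reply is hypothetical, and sample_result is a hand-written stand-in that mimics the shape of the pipeline output rather than a real response.

```python
def first_reply(result: dict, component: str = "llm") -> str:
    """Return the first reply of a generator component, or raise clearly."""
    replies = result.get(component, {}).get("replies", [])
    if not replies:
        raise ValueError(f"no replies returned by component {component!r}")
    return replies[0]


# Hand-written stand-in for a pipeline result (not real model output).
sample_result = {"llm": {"replies": ["To cook a lasagne, ..."]}}
print(first_reply(sample_result))
```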

Conclusion

In this article, you learned how to integrate Haystack with MongoDB Atlas to build a RAG pipeline. This combination lets you use vector search and retrieval-augmented generation to build sophisticated, responsive applications.

If you have questions or want to connect with other developers, join us in the MongoDB Developer Community. Thanks for reading.

