Perform Hybrid Search with the LangChain Integration

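You can integrate Atlas Vector Search with LangChain to perform hybrid search. In this tutorial, you complete the following steps:

- Set up the environment.
- Use Atlas as a vector store.
- Create an Atlas Vector Search index and an Atlas Search index on your data.
- Run a hybrid search query.
- Pass the query results to your RAG pipeline.

Tip

Work with a runnable version of this tutorial as a Python notebook.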

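Prerequisites

To complete this tutorial, you must have the following:

- An Atlas account with a cluster running MongoDB version 6.0.11, 7.0.2, or later (including RCs). Ensure that your IP address is included in your Atlas project's access list. To learn more, see Create a Cluster.
- An OpenAI API key. You must have an OpenAI account with credits available for API requests. To learn more about registering an OpenAI account, see the OpenAI API website.
- An environment to run interactive Python notebooks, such as Colab.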

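Set Up the Environment

Set up the environment for this tutorial. Create an interactive Python notebook by saving a file with the .ipynb extension. This notebook lets you run Python code snippets individually, and you use it to run the code in this tutorial.

To set up your notebook environment, do the following:

Install and import dependencies.

Run the following command in your notebook: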

pip install --quiet --upgrade langchain langchain-community langchain-core langchain-mongodb langchain-openai pymongo pypdf

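Set environment variables.

Run the following code to set the environment variables for this tutorial. Replace the placeholders with your OpenAI API key and the SRV connection string for your Atlas cluster.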

import os
os.environ["OPENAI_API_KEY"] = "<api-key>"
ATLAS_CONNECTION_STRING = "<connection-string>"

Note

Your connection string should use the following format:

mongodb+srv://<db_username>:<db_password>@<clusterName>.<hostname>.mongodb.net
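
As a quick sanity check before connecting, the format above can be verified with the standard library. This is only a minimal sketch: the helper name looks_like_srv_uri and the example values are hypothetical, and it checks the shape of the string only, not whether the credentials or cluster exist.

```python
from urllib.parse import urlsplit


def looks_like_srv_uri(uri: str) -> bool:
    """Return True if uri has the mongodb+srv://user:password@host shape."""
    parts = urlsplit(uri)
    return (
        parts.scheme == "mongodb+srv"
        and bool(parts.username)
        and bool(parts.password)
        and bool(parts.hostname)
    )


# Hypothetical example values, not real credentials
print(looks_like_srv_uri("mongodb+srv://user:pass@cluster0.abcde.mongodb.net"))  # True
print(looks_like_srv_uri("<connection-string>"))  # False, placeholder not replaced
```

Running a check like this on ATLAS_CONNECTION_STRING catches an unreplaced placeholder before the driver reports a less obvious connection error.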

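Use Atlas as a Vector Store

You must use Atlas as a vector store for your data. You can instantiate a vector store by using an existing collection in Atlas.

Load the sample data.

If you haven't already, complete the steps to load sample data into your Atlas cluster.

Instantiate the vector store.

Paste and run the following code in your notebook to create a vector store instance named vector_store from the sample_mflix.embedded_movies namespace in Atlas. This code uses the from_connection_string method to create the MongoDBAtlasVectorSearch vector store and specifies the following parameters:

- Your Atlas cluster's connection string.
- An OpenAI embedding model as the model used to convert text into vector embeddings. By default, this model is text-embedding-ada-002.
- sample_mflix.embedded_movies as the namespace to use.
- plot as the field that contains the text.
- plot_embedding as the field that contains the embeddings.
- dotProduct as the relevance score function.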

from langchain_mongodb import MongoDBAtlasVectorSearch
from langchain_openai import OpenAIEmbeddings
# Create the vector store
vector_store = MongoDBAtlasVectorSearch.from_connection_string(
   connection_string = ATLAS_CONNECTION_STRING,
   embedding = OpenAIEmbeddings(disallowed_special=()),
   namespace = "sample_mflix.embedded_movies",
   text_key = "plot",
   embedding_key = "plot_embedding",
   relevance_score_fn = "dotProduct"
)

Tip

MongoDBAtlasVectorSearch API reference

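Create the Indexes

Note

To create an Atlas Vector Search or Atlas Search index, you must have Project Data Access Admin or higher access to the Atlas project.

To enable hybrid search queries on your vector store, create an Atlas Vector Search index and an Atlas Search index on the collection. You can create the indexes by using either the LangChain helper methods or the PyMongo driver methods:

Create the Atlas Vector Search index (LangChain helper method).

Run the following code to create a vector search index that indexes the plot_embedding field in the collection.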

# Use helper method to create the vector search index
vector_store.create_vector_search_index(
   dimensions = 1536 # The dimensions of the vector embeddings to be indexed
)

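Tip

create_vector_search_index API reference

Create the Atlas Search index (LangChain helper method).

Run the following code in your notebook to create a search index that indexes the plot field in the collection.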

from langchain_mongodb.index import create_fulltext_search_index
from pymongo import MongoClient
# Connect to your cluster
client = MongoClient(ATLAS_CONNECTION_STRING)
# Use helper method to create the search index
create_fulltext_search_index(
   collection = client["sample_mflix"]["embedded_movies"],
   field = "plot",
   index_name = "search_index"
)

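Tip

create_fulltext_search_index API reference

Create the Atlas Vector Search index (PyMongo driver method).

Alternatively, run the following PyMongo code to create a vector search index that indexes the plot_embedding field in the collection.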

from pymongo import MongoClient
from pymongo.operations import SearchIndexModel
# Connect to your cluster
client = MongoClient(ATLAS_CONNECTION_STRING)
collection = client["sample_mflix"]["embedded_movies"]
# Create your vector search index model, then create the index
vector_index_model = SearchIndexModel(
   definition={
      "fields": [
         {
         "type": "vector",
         "path": "plot_embedding",
         "numDimensions": 1536,
         "similarity": "dotProduct"
         }
      ]
   },
   name="vector_index",
   type="vectorSearch"
)
collection.create_search_index(model=vector_index_model)

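Create the Atlas Search index (PyMongo driver method).

Run the following code to create a search index that indexes the plot field in the collection.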

# Create your search index model, then create the search index
search_index_model = SearchIndexModel(
   definition={
      "mappings": {
         "dynamic": False,
         "fields": {
            "plot": {
               "type": "string"
            }
         }
      }
   },
   name="search_index"
)
collection.create_search_index(model=search_index_model)

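The indexes should take about one minute to build. While they build, the indexes are in an initial sync state. When they finish building, you can start querying the data in your collection.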
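
Rather than waiting a fixed time, the notebook could poll until both indexes are ready. The following is a minimal sketch, not part of the original tutorial: the helper name wait_until_queryable is hypothetical, and it assumes that PyMongo's collection.list_search_indexes() yields documents with name and queryable fields, which you should confirm for your driver version.

```python
import time


def wait_until_queryable(collection, index_names, timeout_s=120, poll_s=5):
    """Poll the collection's search indexes until all named indexes are queryable.

    Returns True when every index in index_names reports queryable,
    or False if timeout_s elapses first.
    """
    deadline = time.monotonic() + timeout_s
    pending = set(index_names)
    while True:
        # Assumption: each listed index is a dict with "name" and "queryable" keys
        for index in collection.list_search_indexes():
            if index.get("name") in pending and index.get("queryable"):
                pending.discard(index["name"])
        if not pending:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


# Example call with the index names used in this tutorial:
# wait_until_queryable(collection, ["vector_index", "search_index"])
```

The helper only reads index metadata, so it is safe to call repeatedly. A False return means the timeout passed, not that index creation failed.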

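Run a Hybrid Search Query

After Atlas builds your indexes, you can run hybrid search queries on your data. The following code uses the MongoDBAtlasHybridSearchRetriever retriever to perform a hybrid search for the string time travel. It also specifies the following parameters:

- vectorstore: The vector store instance.
- search_index_name: The name of the Atlas Search index.
- top_k: The number of documents to return.
- fulltext_penalty: The penalty for full-text search. The lower the penalty, the higher the full-text search score.
- vector_penalty: The penalty for vector search. The lower the penalty, the higher the vector search score.

The retriever returns a list of documents sorted by the sum of the full-text search score and the vector search score. The final output of the code example includes the title, the plot, and the different scores for each document.

To learn more about hybrid search query results, see About the Query.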

from langchain_mongodb.retrievers.hybrid_search import MongoDBAtlasHybridSearchRetriever
# Initialize the retriever
retriever = MongoDBAtlasHybridSearchRetriever(
    vectorstore = vector_store,
    search_index_name = "search_index",
    top_k = 5,
    fulltext_penalty = 50,
    vector_penalty = 50
)
# Define your query
query = "time travel"
# Print results
documents = retriever.invoke(query)
for doc in documents:
   print("Title: " + doc.metadata["title"])
   print("Plot: " + doc.page_content)
   print("Search score: {}".format(doc.metadata["fulltext_score"]))
   print("Vector Search score: {}".format(doc.metadata["vector_score"]))
   print("Total score: {}\n".format(doc.metadata["fulltext_score"] + doc.metadata["vector_score"]))

Title: Timecop
Plot: An officer for a security agency that regulates time travel, must fend for his life against a shady politician who has a tie to his past.
Search score: 0.019230769230769232
Vector Search score: 0.01818181818181818
Total score: 0.03741258741258741
Title: The Time Traveler's Wife
Plot: A romantic drama about a Chicago librarian with a gene that causes him to involuntarily time travel, and the complications it creates for his marriage.
Search score: 0.0196078431372549
Vector Search score: 0
Total score: 0.0196078431372549
Title: Thrill Seekers
Plot: A reporter, learning of time travelers visiting 20th century disasters, tries to change the history they know by averting upcoming disasters.
Search score: 0
Vector Search score: 0.0196078431372549
Total score: 0.0196078431372549
Title: About Time
Plot: At the age of 21, Tim discovers he can travel in time and change what happens and has happened in his own life. His decision to make his world a better place by getting a girlfriend turns out not to be as easy as you might think.
Search score: 0
Vector Search score: 0.019230769230769232
Total score: 0.019230769230769232
Title: My iz budushchego
Plot: My iz budushchego, or We Are from the Future, is a movie about time travel. Four 21st century treasure seekers are transported back into the middle of a WWII battle in Russia. The movie's ...
Search score: 0.018867924528301886
Vector Search score: 0
Total score: 0.018867924528301886
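
The scores in the output above are consistent with reciprocal rank fusion, where each result list contributes 1 / (rank + penalty + 1) for a zero-based rank. For example, 0.0196078... equals 1/51 and 0.0192307... equals 1/52. This formula is inferred from the sample output rather than taken from the retriever's source, so treat the sketch below as an assumption. The function names rrf_score and fuse are hypothetical.

```python
def rrf_score(rank: int, penalty: float) -> float:
    """Reciprocal-rank score for a zero-based rank (assumed formula)."""
    return 1.0 / (rank + penalty + 1)


def fuse(fulltext_ids, vector_ids, fulltext_penalty=50, vector_penalty=50, top_k=5):
    """Sum per-list reciprocal-rank scores and return the top_k (id, score) pairs."""
    scores = {}
    for rank, doc_id in enumerate(fulltext_ids):
        scores[doc_id] = scores.get(doc_id, 0.0) + rrf_score(rank, fulltext_penalty)
    for rank, doc_id in enumerate(vector_ids):
        scores[doc_id] = scores.get(doc_id, 0.0) + rrf_score(rank, vector_penalty)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# "Timecop" would be rank 1 in full-text results and rank 4 in vector results
timecop_total = rrf_score(1, 50) + rrf_score(4, 50)
print(round(timecop_total, 6))  # 0.037413
```

Under this reading, a document that appears in both result lists outranks documents that top only one list, and raising one penalty reduces the influence of that search type on the final order.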

Tip

MongoDBAtlasHybridSearchRetriever API reference

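Pass Results to a RAG Pipeline

You can pass your hybrid search results into your RAG pipeline to generate responses based on the retrieved documents. The sample code does the following:

- Defines a LangChain prompt template that instructs the LLM to use the retrieved documents as context for your query. LangChain passes these documents to the {context} input variable and your query to the {query} variable.
- Constructs a chain that specifies the following:
  - The hybrid search retriever you defined to retrieve relevant documents.
  - The prompt template that you defined.
  - An LLM from OpenAI to generate a context-aware response. By default, this is the gpt-3.5-turbo model.
- Prompts the chain with a sample query and returns the response. The generated response might vary.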

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import  RunnablePassthrough
from langchain_openai import ChatOpenAI
# Define a prompt template
template = """
   Use the following pieces of context to answer the question at the end.
   {context}
   Question: Can you recommend some movies about {query}?
"""
prompt = PromptTemplate.from_template(template)
model = ChatOpenAI()
# Construct a chain to answer questions on your data
chain = (
   {"context": retriever, "query": RunnablePassthrough()}
   | prompt
   | model
   | StrOutputParser()
)
# Prompt the chain
query = "time travel"
answer = chain.invoke(query)
print(answer)

Based on the pieces of context provided, here are some movies about time travel that you may find interesting:
1. "Timecop" (1994) - A movie about a cop who is part of a law enforcement agency that regulates time travel, seeking justice and dealing with personal loss.
2. "The Time Traveler's Wife" (2009) - A romantic drama about a man with the ability to time travel involuntarily and the impact it has on his relationship with his wife.
3. "Thrill Seekers" (1999) - A movie about two reporters trying to prevent disasters by tracking down a time traveler witnessing major catastrophes.
4. "About Time" (2013) - A film about a man who discovers he can travel through time and uses this ability to improve his life and relationships.
5. "My iz budushchego" (2008) - A Russian movie where four treasure seekers from the 21st century are transported back to a WWII battle, exploring themes of action, drama, fantasy, and romance.
These movies offer a variety of perspectives on time travel and its impact on individuals and society.
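
The chain above hands the retriever's output directly to {context}. A common variation, sketched here under assumptions and not part of the original tutorial, is to format the retrieved documents into plain text first. The helper name format_docs is hypothetical, and the stand-in objects below only mimic the page_content and metadata attributes that the tutorial reads from each document.

```python
from types import SimpleNamespace


def format_docs(docs) -> str:
    """Join retrieved documents into one context string, one movie per block."""
    return "\n\n".join(
        f"Title: {doc.metadata.get('title', 'Unknown')}\nPlot: {doc.page_content}"
        for doc in docs
    )


# Stand-in documents with hypothetical plot text, used only to show the output shape
docs = [
    SimpleNamespace(page_content="A plot about time travel.", metadata={"title": "Timecop"}),
    SimpleNamespace(page_content="Another plot.", metadata={"title": "About Time"}),
]
print(format_docs(docs))

# In the chain, the helper would sit between the retriever and the prompt
# (assumption, not run here):
# {"context": retriever | format_docs, "query": RunnablePassthrough()}
```

Formatting the context this way keeps score metadata and other fields out of the prompt, which may shorten it. Whether that improves the answers depends on the model and the data.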
