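Get Started with the LangChainGo Integration
You can integrate Atlas Vector Search with LangChainGo to build large language model (LLM) applications and implement retrieval-augmented generation (RAG). This tutorial demonstrates how to start using Atlas Vector Search with LangChainGo to perform semantic search on your data and build a RAG implementation. Specifically, you perform the following actions:
Set up the environment.
Store custom data on Atlas.
Create an Atlas Vector Search index on your data.
Run the following vector search queries:
Semantic search.
Semantic search with metadata pre-filtering.
Implement RAG by using Atlas Vector Search to answer questions on your data.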
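Background
LangChainGo is the Go programming language implementation of LangChain. It is a community-driven, third-party port of the LangChain framework.
LangChain is an open-source framework that simplifies the creation of LLM applications through the use of "chains." Chains are LangChain-specific components that can be combined for a variety of AI use cases, including RAG.
By integrating Atlas Vector Search with LangChain, you can use Atlas as a vector database and use Atlas Vector Search to implement RAG by retrieving semantically similar documents from your data. To learn more about RAG, see Retrieval-Augmented Generation (RAG) with Atlas Vector Search.
LangChainGo facilitates LLM orchestration for AI applications, bringing the capabilities of LangChain to the Go ecosystem. It also lets developers connect to their preferred databases, including MongoDB, through vector stores.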
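Prerequisites
To complete this tutorial, you must have the following:
An Atlas account with a cluster running MongoDB version 6.0.11, 7.0.2, or later (including RCs). Ensure that your IP address is included in your Atlas project's access list. To learn more, see Create a Cluster.
An OpenAI API key. You must have an OpenAI account with credits available for API requests. To learn more about registering an OpenAI account, see the OpenAI API website.
A terminal and code editor to run your Go project.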
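Set Up the Environment
You must first set up the environment for this tutorial. At a minimum, the code in this tutorial expects a Go project with a main.go file and a .env file that defines the ATLAS_CONNECTION_STRING and OPENAI_API_KEY variables.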
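The code later in this tutorial reads two settings, ATLAS_CONNECTION_STRING and OPENAI_API_KEY, and exits if either is empty. As a minimal sketch that uses only the standard library, the following program reports which of those variables are unset. The missingVars helper is a hypothetical name introduced here for illustration; it is not part of LangChainGo or the MongoDB driver.

```go
package main

import (
	"fmt"
	"os"
)

// missingVars returns the names in required for which lookup reports
// no value or an empty value. Hypothetical helper for this sketch.
func missingVars(required []string, lookup func(string) (string, bool)) []string {
	var missing []string
	for _, name := range required {
		if v, ok := lookup(name); !ok || v == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	// Variable names taken from the tutorial code in this document.
	required := []string{"ATLAS_CONNECTION_STRING", "OPENAI_API_KEY"}
	if missing := missingVars(required, os.LookupEnv); len(missing) > 0 {
		fmt.Println("Missing environment variables:", missing)
		return
	}
	fmt.Println("Environment looks complete.")
}
```

This sketch checks the process environment only. Values that exist solely in a .env file are not visible to it unless you export them or load the file first, which the tutorial code does with godotenv.Load().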
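Use Atlas as a Vector Store
In this section, you define a function to load custom data into Atlas and instantiate Atlas as a vector database, also called a vector store.
Import the following dependencies.
Add the following imports to the top of your main.go file.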
package main

import (
	"context"
	"log"
	"os"

	"github.com/joho/godotenv"
	"github.com/tmc/langchaingo/embeddings"
	"github.com/tmc/langchaingo/llms/openai"
	"github.com/tmc/langchaingo/schema"
	"github.com/tmc/langchaingo/vectorstores/mongovector"
	"go.mongodb.org/mongo-driver/v2/mongo"
	"go.mongodb.org/mongo-driver/v2/mongo/options"
)
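Define the vector store details.
The following code performs these actions:
Configures Atlas as a vector store by specifying the following:
langchaingo_db.test as the collection in Atlas that stores the documents.
vector_index as the index used to query the vector store.
text as the name of the field that contains the raw text content.
embeddings as the name of the field that contains the vector embeddings (the path passed to mongovector.WithPath in the code).
Prepares your custom data by doing the following:
Defines the text for each document.
Uses LangChainGo's mongovector package to generate embeddings for the texts. This package stores document embeddings in MongoDB and enables searches on the stored embeddings.
Constructs documents that include the text, embeddings, and metadata.
Ingests the constructed documents into Atlas and instantiates the vector store.
Paste the following code into your main.go file: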
// Defines the document structure
type Document struct {
	PageContent string            `bson:"text"`
	Embedding   []float32         `bson:"embedding"`
	Metadata    map[string]string `bson:"metadata"`
}

func main() {
	const (
		openAIEmbeddingModel = "text-embedding-3-small"
		openAIEmbeddingDim   = 1536
		similarityAlgorithm  = "dotProduct"
		indexName            = "vector_index"
		databaseName         = "langchaingo_db"
		collectionName       = "test"
	)

	if err := godotenv.Load(); err != nil {
		log.Fatal("No .env file found")
	}

	// Loads the MongoDB URI from environment
	uri := os.Getenv("ATLAS_CONNECTION_STRING")
	if uri == "" {
		log.Fatal("Set your 'ATLAS_CONNECTION_STRING' environment variable in the .env file")
	}

	// Loads the API key from environment
	apiKey := os.Getenv("OPENAI_API_KEY")
	if apiKey == "" {
		log.Fatal("Set your OPENAI_API_KEY environment variable in the .env file")
	}

	// Connects to MongoDB Atlas
	client, err := mongo.Connect(options.Client().ApplyURI(uri))
	if err != nil {
		log.Fatalf("Failed to connect to server: %v", err)
	}
	defer func() {
		if err := client.Disconnect(context.Background()); err != nil {
			log.Fatalf("Error disconnecting the client: %v", err)
		}
	}()
	log.Println("Connected to MongoDB Atlas.")

	// Selects the database and collection
	coll := client.Database(databaseName).Collection(collectionName)

	// Creates an OpenAI LLM embedder client
	llm, err := openai.New(openai.WithEmbeddingModel(openAIEmbeddingModel))
	if err != nil {
		log.Fatalf("Failed to create an embedder client: %v", err)
	}

	// Creates an embedder from the embedder client
	embedder, err := embeddings.NewEmbedder(llm)
	if err != nil {
		log.Fatalf("Failed to create an embedder: %v", err)
	}

	// Creates a new MongoDB Atlas vector store
	store := mongovector.New(coll, embedder, mongovector.WithIndex(indexName), mongovector.WithPath("embeddings"))

	// Checks if the collection is empty, and if empty, adds documents to the MongoDB Atlas database vector store
	if isCollectionEmpty(coll) {
		documents := []schema.Document{
			{
				PageContent: "Proper tuber planting involves site selection, proper timing, and exceptional care. Choose spots with well-drained soil and adequate sun exposure. Tubers are generally planted in spring, but depending on the plant, timing varies. Always plant with the eyes facing upward at a depth two to three times the tuber's height. Ensure 4 inch spacing between small tubers, expand to 12 inches for large ones. Adequate moisture is needed, yet do not overwater. Mulching can help preserve moisture and prevent weed growth.",
				Metadata: map[string]any{
					"author": "A",
					"type":   "post",
				},
			},
			{
				PageContent: "Successful oil painting necessitates patience, proper equipment, and technique. Begin with a carefully prepared, primed canvas. Sketch your composition lightly before applying paint. Use high-quality brushes and oils to create vibrant, long-lasting artworks. Remember to paint 'fat over lean,' meaning each subsequent layer should contain more oil to prevent cracking. Allow each layer to dry before applying another. Clean your brushes often and avoid solvents that might damage them. Finally, always work in a well-ventilated space.",
				Metadata: map[string]any{
					"author": "B",
					"type":   "post",
				},
			},
			{
				PageContent: "For a natural lawn, selection of the right grass type suitable for your climate is crucial. Balanced watering, generally 1 to 1.5 inches per week, is important; overwatering invites disease. Opt for organic fertilizers over synthetic versions to provide necessary nutrients and improve soil structure. Regular lawn aeration helps root growth and prevents soil compaction. Practice natural pest control and consider overseeding to maintain a dense sward, which naturally combats weeds and pest.",
				Metadata: map[string]any{
					"author": "C",
					"type":   "post",
				},
			},
		}

		_, err := store.AddDocuments(context.Background(), documents)
		if err != nil {
			log.Fatalf("Error adding documents: %v", err)
		}
		log.Printf("Successfully added %d documents to the collection.\n", len(documents))
	} else {
		log.Println("Documents already exist in the collection, skipping document addition.")
	}
}

func isCollectionEmpty(coll *mongo.Collection) bool {
	count, err := coll.EstimatedDocumentCount(context.Background())
	if err != nil {
		log.Fatalf("Failed to count documents in the collection: %v", err)
	}
	return count == 0
}
Run your Go project.
Save the file, then run the following command to load your data into Atlas.
go run main.go
Connected to MongoDB Atlas.
Successfully added 3 documents to the collection.
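Tip
After running main.go, you can view your vector embeddings in the Atlas UI by navigating to the langchaingo_db.test collection in your cluster.
Create the Atlas Vector Search Index
Note
To create an Atlas Vector Search index, you must have Project Data Access Admin or higher access to the Atlas project.
To enable vector search queries on your vector store, create an Atlas Vector Search index on the langchaingo_db.test collection.
Add the following imports to the top of your main.go file: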
import (
	// Other imports...
	"fmt"
	"time"

	"go.mongodb.org/mongo-driver/v2/bson"
)
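Define the following functions outside of the main() function in your main.go file. These functions create and manage a vector search index for your MongoDB collection:
The SearchIndexExists function checks whether a search index with the specified name exists and is queryable.
The CreateVectorSearchIndex function creates a vector search index on the specified collection. This function blocks until the index is created and queryable.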
// Checks if the search index exists
func SearchIndexExists(ctx context.Context, coll *mongo.Collection, idx string) (bool, error) {
	log.Println("Checking if search index exists.")

	view := coll.SearchIndexes()

	siOpts := options.SearchIndexes().SetName(idx).SetType("vectorSearch")
	cursor, err := view.List(ctx, siOpts)
	if err != nil {
		return false, fmt.Errorf("failed to list search indexes: %w", err)
	}

	for cursor.Next(ctx) {
		index := struct {
			Name      string `bson:"name"`
			Queryable bool   `bson:"queryable"`
		}{}

		if err := cursor.Decode(&index); err != nil {
			return false, fmt.Errorf("failed to decode search index: %w", err)
		}

		if index.Name == idx && index.Queryable {
			return true, nil
		}
	}

	if err := cursor.Err(); err != nil {
		return false, fmt.Errorf("cursor error: %w", err)
	}

	return false, nil
}

// Creates a vector search index. This function blocks until the index has been
// created.
func CreateVectorSearchIndex(
	ctx context.Context,
	coll *mongo.Collection,
	idxName string,
	openAIEmbeddingDim int,
	similarityAlgorithm string,
) (string, error) {
	type vectorField struct {
		Type          string `bson:"type,omitempty"`
		Path          string `bson:"path,omitempty"`
		NumDimensions int    `bson:"numDimensions,omitempty"`
		Similarity    string `bson:"similarity,omitempty"`
	}

	fields := []vectorField{
		{
			Type:          "vector",
			Path:          "embeddings",
			NumDimensions: openAIEmbeddingDim,
			Similarity:    similarityAlgorithm,
		},
		{
			Type: "filter",
			Path: "metadata.author",
		},
		{
			Type: "filter",
			Path: "metadata.type",
		},
	}

	def := struct {
		Fields []vectorField `bson:"fields"`
	}{
		Fields: fields,
	}

	log.Println("Creating vector search index...")

	view := coll.SearchIndexes()

	siOpts := options.SearchIndexes().SetName(idxName).SetType("vectorSearch")
	searchName, err := view.CreateOne(ctx, mongo.SearchIndexModel{Definition: def, Options: siOpts})
	if err != nil {
		return "", fmt.Errorf("failed to create the search index: %w", err)
	}

	// Awaits the creation of the index
	var doc bson.Raw
	for doc == nil {
		cursor, err := view.List(ctx, options.SearchIndexes().SetName(searchName))
		if err != nil {
			return "", fmt.Errorf("failed to list search indexes: %w", err)
		}

		if !cursor.Next(ctx) {
			break
		}

		name := cursor.Current.Lookup("name").StringValue()
		queryable := cursor.Current.Lookup("queryable").Boolean()
		if name == searchName && queryable {
			doc = cursor.Current
		} else {
			time.Sleep(5 * time.Second)
		}
	}

	return searchName, nil
}
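Create the vector store collection and index by calling the preceding functions in your main() function. Add the following code to the end of your main() function: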
	// SearchIndexExists will return true if the provided index is defined for the
	// collection. This operation blocks until the search completes.
	if ok, _ := SearchIndexExists(context.Background(), coll, indexName); !ok {
		// Creates the vector store collection
		err = client.Database(databaseName).CreateCollection(context.Background(), collectionName)
		if err != nil {
			log.Fatalf("failed to create vector store collection: %v", err)
		}

		_, err = CreateVectorSearchIndex(context.Background(), coll, indexName, openAIEmbeddingDim, similarityAlgorithm)
		if err != nil {
			log.Fatalf("failed to create index: %v", err)
		}

		log.Println("Successfully created vector search index.")
	} else {
		log.Println("Vector search index already exists.")
	}
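Save the file, then run the following command to create your Atlas Vector Search index.
go run main.go
Checking if search index exists.
Creating vector search index...
Successfully created vector search index.
Tip
After running main.go, you can view your vector search index in the Atlas UI by navigating to the langchaingo_db.test collection in your cluster.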
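Run Vector Search Queries
This section demonstrates various queries that you can run on your vectorized data. Now that you've created the index, you can run vector search queries.
The following examples show a basic semantic search and a semantic search with filtering.
Basic Semantic Search
Add the following code to your main function and save the file.
Semantic search retrieves information that is meaningfully related to your query. The following code uses the SimilaritySearch() method to perform a semantic search for the string "Prevent weeds" and limits the results to the first document.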
	// Performs basic semantic search
	docs, err := store.SimilaritySearch(context.Background(), "Prevent weeds", 1)
	if err != nil {
		fmt.Println("Error performing search:", err)
	}

	fmt.Println("Semantic Search Results:", docs)
Run the following command to execute the query.
go run main.go
Semantic Search Results: [{For a natural lawn, selection of the right grass type suitable for your climate is crucial. Balanced watering, generally 1 to 1.5 inches per week, is important; overwatering invites disease. Opt for organic fertilizers over synthetic versions to provide necessary nutrients and improve soil structure. Regular lawn aeration helps root growth and prevents soil compaction. Practice natural pest control and consider overseeding to maintain a dense sward, which naturally combats weeds and pest. map[author:C type:post] 0.69752026}]
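The last value in the result is the relevance score. As a rough illustration of how dot-product similarity ranks documents, the sketch below scores made-up three-dimensional vectors and picks the best match; real embeddings from text-embedding-3-small have 1536 dimensions. The normalization (1 + dot) / 2 reflects how the Atlas Vector Search documentation describes scores for cosine and dotProduct similarity; treat it here as an assumption and verify it against that documentation. The function names are hypothetical.

```go
package main

import "fmt"

// dot returns the dot product of two equal-length vectors.
func dot(a, b []float64) float64 {
	var sum float64
	for i := range a {
		sum += a[i] * b[i]
	}
	return sum
}

// normalizedScore maps a dot product in [-1, 1] onto [0, 1].
// Assumption: this matches the score normalization Atlas applies.
func normalizedScore(d float64) float64 {
	return (1 + d) / 2
}

// bestMatch returns the index and score of the highest-scoring vector in
// docs, or -1 if docs is empty.
func bestMatch(query []float64, docs [][]float64) (int, float64) {
	best, bestScore := -1, -1.0
	for i, d := range docs {
		if s := normalizedScore(dot(query, d)); s > bestScore {
			best, bestScore = i, s
		}
	}
	return best, bestScore
}

func main() {
	// Made-up unit-length vectors standing in for document embeddings.
	docs := [][]float64{
		{1, 0, 0},
		{0, 1, 0},
		{0.6, 0.8, 0},
	}
	query := []float64{0, 1, 0}

	i, score := bestMatch(query, docs)
	fmt.Printf("best match: document %d, score %.2f\n", i, score)
}
```

Passing 1 as the last argument to SimilaritySearch() corresponds to keeping only the single best match, as bestMatch does here.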
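Semantic Search with Filtering
You can pre-filter your data by using an MQL match expression that compares the indexed field with boolean, number, or string values. You must index any metadata fields that you want to filter by as the filter type. To learn more, see How to Index Fields for Vector Search.
Add the following code to your main function and save the file. The snippet uses the vectorstores package, so also add "github.com/tmc/langchaingo/vectorstores" to your imports. If the basic semantic search snippet is still in main(), change docs, err := to docs, err = because both variables are already declared.
The following code uses the SimilaritySearch() method to perform a semantic search for the string "Tulip care". It specifies the following parameters:
The number of documents to return: 1.
A score threshold of 0.60.
It returns the document that matches the filter metadata.type: post and meets the score threshold.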
	// Performs semantic search with metadata filter
	filter := map[string]interface{}{
		"metadata.type": "post",
	}

	docs, err := store.SimilaritySearch(context.Background(), "Tulip care", 1,
		vectorstores.WithScoreThreshold(0.60),
		vectorstores.WithFilters(filter))
	if err != nil {
		fmt.Println("Error performing search:", err)
	}

	fmt.Println("Filter Search Results:", docs)
Run the following command to execute the query.
go run main.go
Filter Search Results: [{Proper tuber planting involves site selection, proper timing, and exceptional care. Choose spots with well-drained soil and adequate sun exposure. Tubers are generally planted in spring, but depending on the plant, timing varies. Always plant with the eyes facing upward at a depth two to three times the tuber's height. Ensure 4 inch spacing between small tubers, expand to 12 inches for large ones. Adequate moisture is needed, yet do not overwater. Mulching can help preserve moisture and prevent weed growth. map[author:A type:post] 0.64432365}]
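To see how the filter, the score threshold, and the result limit interact, the sketch below applies the same idea to in-memory data: keep only candidates whose metadata matches the filter, drop those that score below the threshold, and return at most k results ordered by score. This is a conceptual model with made-up scores, not a description of how Atlas executes the query; the type and function names are hypothetical, and treating the threshold as inclusive is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// scoredDoc is a hypothetical stand-in for a search candidate.
type scoredDoc struct {
	Text     string
	Metadata map[string]string
	Score    float64
}

// matches reports whether meta contains every key/value pair in filter.
func matches(meta, filter map[string]string) bool {
	for key, want := range filter {
		if meta[key] != want {
			return false
		}
	}
	return true
}

// search keeps candidates that match the filter and reach the threshold,
// then returns the k highest-scoring ones.
func search(cands []scoredDoc, filter map[string]string, threshold float64, k int) []scoredDoc {
	var out []scoredDoc
	for _, c := range cands {
		if !matches(c.Metadata, filter) || c.Score < threshold {
			continue
		}
		out = append(out, c)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Score > out[j].Score })
	if len(out) > k {
		out = out[:k]
	}
	return out
}

func main() {
	// Made-up candidates and scores for illustration only.
	cands := []scoredDoc{
		{Text: "tuber planting", Metadata: map[string]string{"type": "post"}, Score: 0.64},
		{Text: "oil painting", Metadata: map[string]string{"type": "post"}, Score: 0.52},
		{Text: "lawn care", Metadata: map[string]string{"type": "note"}, Score: 0.70},
	}
	for _, d := range search(cands, map[string]string{"type": "post"}, 0.60, 1) {
		fmt.Printf("%s (%.2f)\n", d.Text, d.Score)
	}
}
```

In this model the highest-scoring candidate is excluded because its metadata does not match, which shows why a filtered search can return a document with a lower score than an unfiltered one.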
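Answer Questions on Your Data
This section demonstrates a RAG implementation that uses Atlas Vector Search and LangChainGo. Now that you've used Atlas Vector Search to retrieve semantically similar documents, use the following code example to prompt the LLM to answer questions based on the documents that Atlas Vector Search returns.
Add the following code to the end of your main function and save the file. The code uses the strings package from the standard library and the chains, prompts, and vectorstores packages from github.com/tmc/langchaingo, so add them to your imports.
This code does the following:
Instantiates Atlas Vector Search as a retriever to query for semantically similar documents.
Defines a LangChainGo prompt template that instructs the LLM to use the retrieved documents as context for your query. The code passes these documents in the {{.context}} input variable and your query in the {{.question}} variable.
Constructs a chain that uses OpenAI's chat model to generate context-aware responses based on the provided prompt template.
Sends a sample query about painting for beginners to the chain, using the prompt and the retriever to gather relevant context.
Returns and prints the LLM's response and the documents used as context.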
	// Implements RAG to answer questions on your data
	optionsVector := []vectorstores.Option{
		vectorstores.WithScoreThreshold(0.60),
	}

	retriever := vectorstores.ToRetriever(&store, 1, optionsVector...)

	prompt := prompts.NewPromptTemplate(
		`Answer the question based on the following context:
		{{.context}}
		Question: {{.question}}`,
		[]string{"context", "question"},
	)

	llmChain := chains.NewLLMChain(llm, prompt)

	ctx := context.Background()

	const question = "How do I get started painting?"

	documents, err := retriever.GetRelevantDocuments(ctx, question)
	if err != nil {
		log.Fatalf("Failed to retrieve documents: %v", err)
	}

	var contextBuilder strings.Builder
	for i, document := range documents {
		contextBuilder.WriteString(fmt.Sprintf("Document %d: %s\n", i+1, document.PageContent))
	}

	contextStr := contextBuilder.String()

	inputs := map[string]interface{}{
		"context":  contextStr,
		"question": question,
	}

	out, err := chains.Call(ctx, llmChain, inputs)
	if err != nil {
		log.Fatalf("Failed to run LLM chain: %v", err)
	}

	log.Println("Source documents:")
	for i, doc := range documents {
		log.Printf("Document %d: %s\n", i+1, doc.PageContent)
	}

	responseText, ok := out["text"].(string)
	if !ok {
		log.Println("Unexpected response type")
		return
	}

	log.Println("Question:", question)
	log.Println("Generated Answer:", responseText)
After you save the file, run the following command to execute it. The generated response might vary.
go run main.go
Source documents:
Document 1: "Successful oil painting necessitates patience, proper equipment, and technique. Begin with a carefully prepared, primed canvas. Sketch your composition lightly before applying paint. Use high-quality brushes and oils to create vibrant, long-lasting artworks. Remember to paint 'fat over lean,' meaning each subsequent layer should contain more oil to prevent cracking. Allow each layer to dry before applying another. Clean your brushes often and avoid solvents that might damage them. Finally, always work in a well-ventilated space."
Question: How do I get started painting?
Generated Answer: To get started painting, you should begin with a carefully prepared, primed canvas. Sketch your composition lightly before applying paint. Use high-quality brushes and oils to create vibrant, long-lasting artworks. Remember to paint 'fat over lean,' meaning each subsequent layer should contain more oil to prevent cracking. Allow each layer to dry before applying another. Clean your brushes often and avoid solvents that might damage them. Finally, always work in a well-ventilated space.
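The {{.context}} and {{.question}} placeholders in the prompt use Go template syntax. As a minimal sketch of how such a prompt is filled in, the following program renders similar template text with the standard library's text/template package instead of LangChainGo. The renderPrompt helper is hypothetical, and LangChainGo's own rendering may differ in details.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// promptText follows the wording of the tutorial's prompt template.
const promptText = `Answer the question based on the following context:
{{.context}}
Question: {{.question}}`

// renderPrompt numbers the retrieved documents, joins them into one context
// string, and substitutes both values into the template.
func renderPrompt(docs []string, question string) (string, error) {
	var contextBuilder strings.Builder
	for i, d := range docs {
		fmt.Fprintf(&contextBuilder, "Document %d: %s\n", i+1, d)
	}

	tmpl, err := template.New("prompt").Parse(promptText)
	if err != nil {
		return "", err
	}

	var out strings.Builder
	err = tmpl.Execute(&out, map[string]any{
		"context":  contextBuilder.String(),
		"question": question,
	})
	return out.String(), err
}

func main() {
	prompt, err := renderPrompt(
		[]string{"Begin with a carefully prepared, primed canvas."},
		"How do I get started painting?",
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(prompt)
}
```

The rendered string is what the chain would send to the chat model, which is why the quality of the retrieved documents directly shapes the generated answer.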
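After completing this tutorial, you have successfully integrated Atlas Vector Search with LangChainGo to build a RAG application. You have accomplished the following:
Started and configured the environment needed to support your application
Stored custom data in Atlas and instantiated Atlas as a vector store
Built an Atlas Vector Search index on your data to enable semantic search
Used vector embeddings to retrieve semantically relevant data
Enhanced search results by incorporating metadata filters
Implemented a RAG workflow that uses Atlas Vector Search to provide meaningful answers to questions based on your data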
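Next Steps
To learn more about getting started with Atlas Vector Search, see the Atlas Vector Search Quick Start and select Go from the drop-down menu.
To learn more about vector embeddings, see How to Create Vector Embeddings and select Go from the drop-down menu.
To learn how to integrate LangChainGo and Hugging Face, see Retrieval-Augmented Generation (RAG) with Atlas Vector Search.
To learn how to implement RAG without the need for API keys or credits, see Build a Local RAG Implementation with Atlas Vector Search.
MongoDB also provides the following developer resources:
See also:
To learn more about integrating LangChainGo, OpenAI, and MongoDB, see Using MongoDB Atlas as a Vector Store with OpenAI Embeddings.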