Building a Knowledge Base and Visualization Graphs for RAG With MongoDB

Prasad Pillalamarri, Shounak Acharya · 12 min read · Published Sep 02, 2024 · Updated Sep 02, 2024
Several approaches can supply rich data to improve the performance of retrieval-augmented generation (RAG) systems. Each offers different strengths, and the right choice depends on the specific requirements of the RAG system, such as the type of data being used, the complexity of queries, and the desired quality of the generated text. In practice, combining several of these methods often yields the best results, leveraging their respective advantages to enhance both retrieval and generation.
MongoDB supports vector search for fast retrieval, pre-filters, hybrid search, and knowledge bases, and all of these options can be implemented out of the box. In this article, Shounak and I would like to highlight how MongoDB, and more importantly the JSON-based document model, can easily be used to construct a knowledge base and store entities and the relationships between them in a RAG architecture. We will then take it further and use the JSON document as the base for constructing hierarchical network graphs or MongoDB Charts-based visualizations.

Document-based databases

Document databases store and index documents, allowing for fast retrieval based on complex queries. They are well-suited for storing large collections of text documents, web pages, or articles. The retriever can query these databases to fetch relevant documents based on keywords or semantic similarity, which the generator then uses to produce coherent text. 
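For instance, a minimal sketch of keyword retrieval with PyMongo and Atlas Search might look like the following; the database and collection names, the index name (default), and the field name (page_content) are assumptions for illustration, not part of this article's dataset:

from pymongo import MongoClient

client = MongoClient("YOUR-MONGO-URL")
collection = client["rag"]["documents"]          # assumed database/collection names

# Keyword retrieval via Atlas Search; index and field names are placeholders.
pipeline = [
    {
        "$search": {
            "index": "default",                  # assumed Atlas Search index
            "text": {
                "query": "history of the Python programming language",
                "path": "page_content"           # assumed text field
            }
        }
    },
    {"$limit": 5},
    {"$project": {"page_content": 1, "score": {"$meta": "searchScore"}}}
]
for doc in collection.aggregate(pipeline):
    print(doc["score"], doc["page_content"][:80])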

Atlas Vector Search pre-filter

Filtering your data is useful for narrowing the scope of your semantic search and ensuring that not all vectors are considered for comparison. The $vectorSearch filter option matches only BSON boolean, date, objectId, string, and numeric values.  
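As a minimal sketch (not this article's dataset), a pre-filtered $vectorSearch stage could look like this; the index name vector_index, the embedding field, and the filterable type field are assumptions:

from pymongo import MongoClient

client = MongoClient("YOUR-MONGO-URL")
collection = client["rag"]["documents"]          # assumed database/collection names

query_embedding = [0.01, 0.02, 0.03]             # placeholder; use your real query vector

pipeline = [
    {
        "$vectorSearch": {
            "index": "vector_index",             # assumed Atlas Vector Search index name
            "path": "embedding",                 # assumed field holding document embeddings
            "queryVector": query_embedding,
            "numCandidates": 100,
            "limit": 10,
            # Pre-filter narrows the candidate set before vector comparison;
            # the filtered field must be indexed as a filter field.
            "filter": {"type": "Programming_language"}
        }
    }
]
results = list(collection.aggregate(pipeline))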

Hybrid search systems

Combining full-text search with vector embeddings integrates multiple retrieval methods to leverage their strengths. For instance, keyword-based retrieval can be paired with embedding-based re-ranking: the hybrid system first narrows down candidates using full-text search and then refines the selection with semantic techniques before passing the results to the generator.
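As a rough, engine-agnostic illustration of the merging step (not a specific MongoDB operator), reciprocal rank fusion can combine a full-text result list with a vector-search result list before the top documents are passed to the generator:

# Reciprocal rank fusion: merge two ranked lists of document ids.
# `text_hits` and `vector_hits` are assumed to be ids ordered by relevance.
def reciprocal_rank_fusion(text_hits, vector_hits, k=60):
    scores = {}
    for hits in (text_hits, vector_hits):
        for rank, doc_id in enumerate(hits, start=1):
            scores[doc_id] = scores.get(doc_id, 0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Documents that appear high in both lists rise to the top.
print(reciprocal_rank_fusion(["a", "b", "c"], ["c", "a", "d"]))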

Knowledge bases

Knowledge bases are large repositories of structured information extracted from various sources. They often cover a wide range of entities and relationships. The retriever can query these knowledge bases to fetch relevant facts and relationships, enhancing the context and detail in the generated text.
MongoDB Charts is a data visualization tool specifically designed for MongoDB Atlas, offering a fast, intuitive, and robust way to visualize your data. It supports a wide range of use cases, whether you're working with a dedicated cluster, a serverless instance, leveraging Atlas Data Federation to uncover valuable insights from combined Atlas and S3 data, or visualizing archived data in Online Archive.

Developing a knowledge base

In this article, we will create a dependency graph of the entities in a free-text paragraph using the LLMGraphTransformer class in LangChain, with OpenAI as the LLM. We will pass in a text on the history of the Python programming language and ask it to return the entities and their relationships, as shown in the code snippet below.
All of the code snippets below can be found in the GitHub repo.
from langchain_openai import ChatOpenAI
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_core.documents import Document

llm = ChatOpenAI(temperature=0, model_name="gpt-4-turbo", api_key="YOUR-OPENAI-KEY")

llm_transformer = LLMGraphTransformer(llm=llm)
text = """
Python was invented in the late 1980s by Guido van Rossum at Centrum Wiskunde & Informatica in the Netherlands as a successor to the ABC
programming language, which was inspired by SETL capable of exception handling and interfacing with the Amoeba operating system.
Its implementation began in December 1989. Python 2.0 was released on 16 October 2000, with many major new features such as list comprehensions,
cycle-detecting garbage collection, reference counting, and Unicode support. Python 3.0, released on 3 December 2008,
with many of its major features backported to Python 2.6.x and 2.7.x. Releases of Python 3 include the 2to3 utility,
which automates the translation of Python 2 code to Python 3.
"""
documents = [Document(page_content=text)]
graph_documents = llm_transformer.convert_to_graph_documents(documents)
print(f"Nodes:{graph_documents[0].nodes}")
print(f"Relationships:{graph_documents[0].relationships}")
This generates an output as below, capturing various nodes and their relationships:
Nodes:[Node(id='Python', type='Programming_language'), Node(id='Guido Van Rossum', type='Person'), Node(id='Centrum Wiskunde & Informatica', type='Organization'), Node(id='Netherlands', type='Country'), Node(id='Abc Programming Language', type='Programming_language'), Node(id='Setl', type='Programming_language'), Node(id='Amoeba Operating System', type='Operating_system'), Node(id='Python 2.0', type='Software_version'), Node(id='Python 3.0', type='Software_version'), Node(id='2To3 Utility', type='Software_tool')]
Relationships:[Relationship(source=Node(id='Python', type='Programming_language'), target=Node(id='Guido Van Rossum', type='Person'), type='CREATED_BY'), Relationship(source=Node(id='Python', type='Programming_language'), target=Node(id='Centrum Wiskunde & Informatica', type='Organization'), type='DEVELOPED_AT'), Relationship(source=Node(id='Python', type='Programming_language'), target=Node(id='Netherlands', type='Country'), type='DEVELOPED_IN'), Relationship(source=Node(id='Python', type='Programming_language'), target=Node(id='Abc Programming Language', type='Programming_language'), type='SUCCESSOR_OF'), Relationship(source=Node(id='Abc Programming Language', type='Programming_language'), target=Node(id='Setl', type='Programming_language'), type='INSPIRED_BY'), Relationship(source=Node(id='Python', type='Programming_language'), target=Node(id='Amoeba Operating System', type='Operating_system'), type='INTERFACE_WITH'), Relationship(source=Node(id='Python 3.0', type='Software_version'), target=Node(id='Python 2.0', type='Software_version'), type='BACKPORTED_TO'), Relationship(source=Node(id='Python 3.0', type='Software_version'), target=Node(id='2To3 Utility', type='Software_tool'), type='INCLUDES')]
As can be seen from the output above, the LLMGraphTransformer captured various entities, such as Python, Guido van Rossum, and the Netherlands, and assigned each a type. For example, Python is a programming language, Guido van Rossum is a person, and the Netherlands is a country.
The LLMGraphTransformer not only identifies nodes but also generates relationships between them. For instance, the output above establishes that Guido van Rossum created Python, a programming language. This connection is represented by a relationship object, which consists of a source (Python), a target (Guido van Rossum), and a relationship type (CREATED_BY). The output demonstrates multiple such relationships being captured between the identified node entities.
Now, using these node and relationship data structures, we can create collections to capture the relationship graph inside MongoDB. In this example, we create one collection per node type (for example, Programming_language, Country, and Operating_system), as shown in the code snippet below:
from pymongo import MongoClient

nodes = graph_documents[0].nodes
relationships = graph_documents[0].relationships
collections = set()
for node in nodes:
    collections.add(node.type)
print(collections)
try:
    uri = "MONGO-DB-URL"
    client = MongoClient(uri)
    database = client["generic_graph"]
    for collection in collections:
        database.create_collection(collection)
except Exception as e:
    print(e)
finally:
    client.close()
This creates one collection per node type, as shown in the output below:
{'Programming_language', 'Software_tool', 'Person', 'Software_version', 'Organization', 'Country', 'Operating_system'}
We could instead apply other design patterns, such as the polymorphic pattern, to store multiple object types in a single collection. In that case, however, the code would need to be adapted to the domain knowledge captured in the graph. In our example, we keep the pattern generic so that the same approach can generate collections and their relationships for any knowledge base with little code modification; a sketch of the polymorphic alternative follows.
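For illustration only, a polymorphic version of the same node data could store everything in a single collection (the name entities is an assumption) and use type as the discriminator:

from pymongo import MongoClient

client = MongoClient("YOUR-MONGO-URL")
entities = client["generic_graph"]["entities"]   # single polymorphic collection (assumed name)

# Every node, regardless of type, lands in the same collection.
for node in nodes:
    entities.insert_one({"id": node.id, "type": node.type})

# Queries then filter on the discriminator field.
languages = list(entities.find({"type": "Programming_language"}))
client.close()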
Now, in order to capture relationships between documents across collections, we will use linking. In our case, we iterate through the relationship list and do the following:
  1. For each relationship, we create an array attribute on the source document, named after the relationship type.
  2. The values of that array are the ids of the relationship's targets.
  3. We create one such array on the source for every relationship in which the current document is the source.
For example, in the Programming_language collection, we will have Python as one of the documents. Now, in the Python document, we will have array attributes for DEVELOPED_IN, CREATED_BY, DEVELOPED_AT, SUCCESSOR_OF, and INTERFACE_WITH, as shown in the screenshot below:
Figure: object hierarchy relationships for the Python language document
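For readers following along in text, the same Python document looks like this (matching the insert output shown later in this article; the _id field added by MongoDB is omitted):

{
  'id': 'Python',
  'type': 'Programming_language',
  'DEVELOPED_IN': ['Netherlands'],
  'CREATED_BY': ['Guido Van Rossum'],
  'DEVELOPED_AT': ['Centrum Wiskunde & Informatica'],
  'SUCCESSOR_OF': ['Abc Programming Language'],
  'INTERFACE_WITH': ['Amoeba Operating System']
}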
Similarly, the ABC programming language, a predecessor of Python, was inspired by SETL, as shown in the following screenshot. Note that both of these documents are from the same collection, Programming_language:
Figure: the ABC programming language document, showing its INSPIRED_BY link to Setl
However, if we look closely, Setl has no links of its own, because the LLMGraphTransformer output defined no relationships with Setl as the source.
The following code snippets show how to arrive at the above collections. 
# Figuring out all relationship types per node type
node_relationship_types = {}
for node in nodes:
    # print(f'On Node {node.id}')
    node_relationship_types[node.id] = set()
    for rel in relationships:
        # print(f'Looking at Relationship for {rel.source.id}')
        if rel.source.id == node.id:
            node_relationship_types[node.id].add(rel.type)
print(node_relationship_types)
The above code creates a dictionary of all unique relationship types per source from the LLMGraphTransformer relationship list and gives the output as below:
{'Python': {'DEVELOPED_IN', 'CREATED_BY', 'DEVELOPED_AT', 'SUCCESSOR_OF', 'INTERFACE_WITH'}, 'Guido Van Rossum': set(), 'Centrum Wiskunde & Informatica': set(), 'Netherlands': set(), 'Abc Programming Language': {'INSPIRED_BY'}, 'Setl': set(), 'Amoeba Operating System': set(), 'Python 2.0': set(), 'Python 3.0': {'BACKPORTED_TO'}, '2To3 Utility': {'TRANSLATES_TO'}}
Once we know all the relationships, we create the documents for each collection, linking to other collections along the way, and insert them into MongoDB as shown in the following snippet:
mongo_documents = []
for node in nodes:
    document_dict = {}
    document_dict['id'] = node.id
    document_dict['type'] = node.type
    document_relations = node_relationship_types[node.id]
    for document_relation in document_relations:
        document_dict[document_relation] = []
    for rel in relationships:
        if rel.source.id == node.id:
            document_dict[rel.type].append(rel.target.id)
    mongo_documents.append(document_dict)
print(mongo_documents)
The snippet above generates the documents to be inserted into the corresponding collections, including the links to other documents, as shown in the output below:
[{'id': 'Python', 'type': 'Programming_language', 'DEVELOPED_IN': ['Netherlands'], 'CREATED_BY': ['Guido Van Rossum'], 'DEVELOPED_AT': ['Centrum Wiskunde & Informatica'], 'SUCCESSOR_OF': ['Abc Programming Language'], 'INTERFACE_WITH': ['Amoeba Operating System']}, {'id': 'Guido Van Rossum', 'type': 'Person'}, {'id': 'Centrum Wiskunde & Informatica', 'type': 'Organization'}, {'id': 'Netherlands', 'type': 'Country'}, {'id': 'Abc Programming Language', 'type': 'Programming_language', 'INSPIRED_BY': ['Setl']}, {'id': 'Setl', 'type': 'Programming_language'}, {'id': 'Amoeba Operating System', 'type': 'Operating_system'}, {'id': 'Python 2.0', 'type': 'Software_version'}, {'id': 'Python 3.0', 'type': 'Software_version', 'BACKPORTED_TO': ['Python 2.0']}, {'id': '2To3 Utility', 'type': 'Software_tool', 'TRANSLATES_TO': ['Python 3.0']}]
Finally, we add these documents to MongoDB. We figure out the collection on which to insert by looking at the “type” field that we have inserted in the documents in the step above:
try:
    uri = "YOUR-MONGO-URL"
    client = MongoClient(uri)
    database = client["generic_graph"]
    for mongo_document in mongo_documents:
        collection = database[mongo_document['type']]
        collection.insert_one(mongo_document)
except Exception as e:
    print(e)
finally:
    client.close()

Embedded child/parent documents supporting a native MongoDB schema

When you have write-once-read-many use cases (1W/99R), or write a few times and read often (20W/80R), it is recommended to pre-compute the rendering schema expected by your rendering engine (in our case, d3.js) and save it along with your MongoDB documents. The following output shows the schema for the Python document in the Programming_language collection from the last section, which now also stores all the target nodes and the edges that start from this node:
[{'CREATED_BY': ['Guido Van Rossum'],
  'DEVELOPED_AT': ['Centrum Wiskunde & Informatica'],
  'DEVELOPED_IN': ['Netherlands'],
  'INTERFACE_WITH': ['Amoeba Operating System'],
  'SUCCESSOR_OF': ['Abc Programming Language'],
  'd3_edges': [{'linkName': 'CREATED_BY',
                'source': 'Python',
                'strength': 0.7,
                'target': 'Guido Van Rossum'},
               {'linkName': 'DEVELOPED_AT',
                'source': 'Python',
                'strength': 0.7,
                'target': 'Centrum Wiskunde & Informatica'},
               {'linkName': 'DEVELOPED_IN',
                'source': 'Python',
                'strength': 0.7,
                'target': 'Netherlands'},
               {'linkName': 'SUCCESSOR_OF',
                'source': 'Python',
                'strength': 0.7,
                'target': 'Abc Programming Language'},
               {'linkName': 'INTERFACE_WITH',
                'source': 'Python',
                'strength': 0.7,
                'target': 'Amoeba Operating System'}],
  'd3_source_node': {'group': 0, 'id': 'Python', 'label': 'Python', 'level': 1},
  'd3_target_nodes': [{'group': 1,
                       'id': 'Guido Van Rossum',
                       'label': 'Guido Van Rossum',
                       'level': 2},
                      {'group': 1,
                       'id': 'Centrum Wiskunde & Informatica',
                       'label': 'Centrum Wiskunde & Informatica',
                       'level': 2},
                      {'group': 1,
                       'id': 'Netherlands',
                       'label': 'Netherlands',
                       'level': 2},
                      {'group': 1,
                       'id': 'Abc Programming Language',
                       'label': 'Abc Programming Language',
                       'level': 2},
                      {'group': 1,
                       'id': 'Amoeba Operating System',
                       'label': 'Amoeba Operating System',
                       'level': 2}],
  'id': 'Python',
  'type': 'Programming_language'}]
This can be done while creating the MongoDB documents from the graph, as shown in the following snippet. Note that the code for creating the collections remains the same as in the previous section.
from pprint import pprint

mongo_documents = []
for node in nodes:
    document_dict = {}
    document_dict['id'] = node.id
    document_dict['type'] = node.type
    document_dict['d3_edges'] = []
    document_dict['d3_target_nodes'] = []
    document_dict['d3_source_node'] = {'id': node.id, 'group': 0, 'level': 1, 'label': node.id}
    document_relations = node_relationship_types[node.id]
    for document_relation in document_relations:
        document_dict[document_relation] = []
    for rel in relationships:
        if rel.source.id == node.id:
            document_dict[rel.type].append(rel.target.id)
            document_dict['d3_target_nodes'].append({'id': rel.target.id, 'group': 1, 'level': 2, 'label': rel.target.id})
            document_dict['d3_edges'].append({'source': node.id, 'target': rel.target.id, 'strength': 0.7, 'linkName': rel.type})
    mongo_documents.append(document_dict)
pprint(mongo_documents)
The code above captures and stores one level of relations. The same approach can be used to store N levels of relations per document, depending on your use case, following the subset pattern for MongoDB data design; a hypothetical two-level example follows.
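As a purely hypothetical sketch (not produced by the code above), a two-level version of the Python document could embed the first level of relations in full and keep only ids at the second level, in line with the subset pattern:

python_doc_two_level = {
    'id': 'Python',
    'type': 'Programming_language',
    'SUCCESSOR_OF': [
        {
            'id': 'Abc Programming Language',
            'INSPIRED_BY': ['Setl']   # second level stored as ids only
        }
    ]
}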

Powering visualizations from the knowledge base

You can leverage many graph types: hierarchical graphs can be rendered as a tree structure, a disjoint force-directed graph, or hierarchical arcs. The JSON documents below show the kinds of data sets that power these graphs.
Graph type 1:
Object {source: "X1", target: "X2", type: "suit"}
Object {source: "X2", target: "X3", type: "resolved"}
Object {source: "X3", target: "X4", type: "suit"}
Object {source: "X4", target: "X1", type: "suit"}
columns: Array(3) [
  0: "source"
  1: "target"
  2: "type"
]
Graph type 2:
Object {source: "Napoleon", target: "Myriel", value: 1}
Object {source: "Mlle.Baptistine", target: "Myriel", value: 8}
Object {source: "Mme.Magloire", target: "Myriel", value: 10}
Object {source: "Mme.Magloire", target: "Mlle.Baptistine", value: 6}
Object {source: "Cravatte", target: "Myriel", value: 1}
Object {source: "Count", target: "Myriel", value: 2}
Object {source: "OldMan", target: "Myriel", value: 1}
Graph type 3:
data = Object {
  name: "flare"
  children: Array(10) [
    0: Object {
      name: "analytics"
      children: Array(3) [
        0: Object {name: "cluster", children: Array(4)}
        1: Object {name: "graph", children: Array(5)}
        2: Object {name: "optimization", children: Array(1)}
      ]
    }
    1: Object {name: "animate", children: Array(12)}
    2: Object {name: "data", children: Array(7)}
    3: Object {name: "display", children: Array(4)}
    4: Object {
      name: "flex"
      children: Array(1) [
        0: Object {
        }
      ]
    }
  ]
}
In our use case, we use the d3.js force-directed graph to create a visualization for the Programming_language collection. d3.js expects two arrays, nodes and links, in a JSON object, where each array holds JSON objects capturing the properties of nodes and relationships, respectively. The structure looks like this:
{nodes = [{'id': 'Python', 'group': 0, 'level': 1, 'label': 'Python'}, {'id': 'Netherlands', 'group': 1, 'level': 2, 'label': 'Netherlands'}],
links = [{'source': 'Python', 'target': 'Netherlands', 'strength': 0.7}, {'source': 'Python', 'target': 'Guido Van Rossum', 'strength': 0.7}]}
Each of the JSON objects within the arrays has some mandatory fields and some optional fields. For example, nodes must have “id” as a mandatory field. Similarly, relationship objects must have “source” and “target” as mandatory fields.
To create this structure from our Programming_language collection, we use a graph lookup that recursively follows the relationship from Python to its predecessor, the ABC programming language, and finally to Setl, the language that inspired ABC.
MongoDB's $graphLookup performs a recursive search on a collection, with options for restricting the search by recursion depth and query filters.
The $graphLookup process works as follows:
  • Input documents are processed in the $graphLookup stage of an aggregation pipeline.
  • The search is directed to the collection specified by the “from” parameter.
  • For each input document, the search starts with the value specified by startWith.
  • $graphLookup compares this startWith value to the field indicated by connectToField in other documents within the “from” collection.
  • When a match is found, $graphLookup retrieves the value from connectFromField and checks other documents in the “from” collection for corresponding connectToField values. Matching documents are then added to an array specified by the as parameter.
  • This recursive process continues until no further matches are found or the maximum recursion depth, defined by maxDepth, is reached. 
  • Finally, $graphLookup appends the array to the original input document and completes its search for all input documents.
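To make these parameters concrete, here is the $graphLookup stage used in the pipeline below, annotated against the steps above; the field names match the documents created earlier in this article:

graph_lookup_stage = {
    '$graphLookup': {
        'from': 'Programming_language',   # collection to search
        'startWith': '$SUCCESSOR_OF',     # initial value(s) that seed the recursion
        'connectFromField': 'id',         # field whose value is followed on matched documents
        'connectToField': 'id',           # field compared against the current value
        'as': 'relations',                # array appended to each input document
        'maxDepth': 2                     # stop after two levels of recursion
    }
}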
Finally, we create the nodes and relationship arrays and save them to a JSON file. Note that we also add link labels, which show the relationship types between the nodes:
import json

try:
    uri = "YOUR-MONGO-URL"
    client = MongoClient(uri)
    database = client["generic_graph"]
    language_collection = database["Programming_language"]
    # Recursively expand SUCCESSOR_OF links (up to two levels) and unwind the results
    language_pipeline = [
        {
            '$graphLookup': {
                'from': 'Programming_language',
                'startWith': '$SUCCESSOR_OF',
                'connectFromField': 'id',
                'connectToField': 'id',
                'as': 'relations',
                'maxDepth': 2
            }
        }, {
            '$unwind': {
                'path': '$relations',
                'preserveNullAndEmptyArrays': False
            }
        }
    ]
    lang_aggCursor = language_collection.aggregate(language_pipeline)
    nodes = []
    links = []
    for document in lang_aggCursor:
        print(document)
        # The document itself becomes the source node
        source_node_dict = {}
        source_node_dict['id'] = document.get('id')
        source_node_dict['group'] = 0
        source_node_dict['level'] = 1
        source_node_dict['label'] = document.get('id')
        nodes.append(source_node_dict)
        for key in document.keys():
            print(key)
            target_node_dict = {}
            link_dict = {}
            if key == '_id' or key == 'id' or key == 'type' or key == 'SUCCESSOR_OF':
                continue
            elif key == 'relations':
                # Node and link for the predecessor found by $graphLookup ...
                target_node_dict['id'] = document[key]['id']
                target_node_dict['group'] = 1
                target_node_dict['level'] = 2
                target_node_dict['label'] = document[key]['id']
                # ... and for the language that inspired it
                inspired_node_dict = {}
                inspired_node_dict['id'] = document[key]['INSPIRED_BY'][0]
                inspired_node_dict['group'] = 1
                inspired_node_dict['level'] = 2
                inspired_node_dict['label'] = document[key]['INSPIRED_BY'][0]
                link_dict['source'] = target_node_dict.get('id')
                link_dict['target'] = inspired_node_dict.get('id')
                link_dict['strength'] = 0.7
                link_dict['linkName'] = 'INSPIRED_BY'
                link_dict_2 = {}
                link_dict_2['source'] = source_node_dict.get('id')
                link_dict_2['target'] = target_node_dict.get('id')
                link_dict_2['strength'] = 0.7
                link_dict_2['linkName'] = 'SUCCESSOR_OF'
                nodes.append(target_node_dict)
                nodes.append(inspired_node_dict)
                links.append(link_dict)
                links.append(link_dict_2)
                continue
            else:
                # Every other relationship array becomes a target node plus a labeled link
                target_node_dict['id'] = document[key][0]
                target_node_dict['group'] = 1
                target_node_dict['level'] = 2
                target_node_dict['label'] = document[key][0]
                link_dict['source'] = source_node_dict.get('id')
                link_dict['target'] = target_node_dict.get('id')
                link_dict['strength'] = 0.7
                link_dict['linkName'] = key
                nodes.append(target_node_dict)
                links.append(link_dict)
    print(nodes)
    print(links)
except Exception as e:
    print(e)
finally:
    client.close()

nodes_links = {"nodes": nodes, "links": links}
with open("python-dependencies.json", 'w') as f:
    json.dump(nodes_links, f, indent=1)
This creates the python-dependencies.json file, whose contents are shown below with labels for both nodes and links:
{nodes=[{'id': 'Python', 'group': 0, 'level': 1, 'label': 'Python'}, {'id': 'Netherlands', 'group': 1, 'level': 2, 'label': 'Netherlands'}, {'id': 'Guido Van Rossum', 'group': 1, 'level': 2, 'label': 'Guido Van Rossum'}, {'id': 'Centrum Wiskunde & Informatica', 'group': 1, 'level': 2, 'label': 'Centrum Wiskunde & Informatica'}, {'id': 'Amoeba Operating System', 'group': 1, 'level': 2, 'label': 'Amoeba Operating System'}, {'id': 'Abc Programming Language', 'group': 1, 'level': 2, 'label': 'Abc Programming Language'}, {'id': 'Setl', 'group': 1, 'level': 2, 'label': 'Setl'}],
links=[{'source': 'Python', 'target': 'Netherlands', 'strength': 0.7, 'linkName': 'DEVELOPED_IN'}, {'source': 'Python', 'target': 'Guido Van Rossum', 'strength': 0.7, 'linkName': 'CREATED_BY'}, {'source': 'Python', 'target': 'Centrum Wiskunde & Informatica', 'strength': 0.7, 'linkName': 'DEVELOPED_AT'}, {'source': 'Python', 'target': 'Amoeba Operating System', 'strength': 0.7, 'linkName': 'INTERFACE_WITH'}, {'source': 'Abc Programming Language', 'target': 'Setl', 'strength': 0.7, 'linkName': 'INSPIRED_BY'}, {'source': 'Python', 'target': 'Abc Programming Language', 'strength': 0.7, 'linkName': 'SUCCESSOR_OF'}]}
We then use this JSON file to create the nodes and links arrays in the d3.js code. We reused some of the code from the GitHub repo and updated the link labels there. Finally, we run a local Node.js HTTP server and render the HTML. The output looks as follows:
Figure: visualization graph displaying the hierarchical structure
As we can see, we are able to display the relationship graph captured in our MongoDB collections using d3.js.
If we take the embedded-relations approach, the rendering becomes even easier and more generic. The following code reads documents that embed the target node and edge data directly. A simple find across all documents in the collection gives us all the nodes and edges required to form the graph, as shown in the code snippet below:
import json

master_lookup_set = set()
nodes = []
links = []
try:
    uri = "YOUR-MONGODB-CLUSTER-URL"
    client = MongoClient(uri)
    database = client["embedded_graph_2"]
    collection = database["Programming_language"]
    # Only the pre-computed d3 fields are needed to render the graph
    cursor = collection.find({}, {'_id': 0, 'id': 1, 'd3_edges': 1, 'd3_target_nodes': 1, 'd3_source_node': 1})
    for document in cursor:
        print(document)
        if document['id'] not in master_lookup_set:
            # First time we see this node: add it plus its targets and edges
            master_lookup_set.add(document['id'])
            nodes.append(document['d3_source_node'])
            for link in document['d3_edges']:
                links.append(link)
                master_lookup_set.add(link['target'])
            for target_node in document['d3_target_nodes']:
                nodes.append(target_node)
        else:
            # Node already seen: append only its edges to avoid duplicate node entries
            for link in document['d3_edges']:
                links.append(link)
except Exception as e:
    print(e)
finally:
    client.close()

nodes_links = {"nodes": nodes, "links": links}
with open("python-dependencies_embedded_2.json", 'w') as f:
    json.dump(nodes_links, f, indent=1)
This generates the “python-dependencies_embedded_2.json” file, which, when fed to the d3.js HTML code, produces the following graph, identical to the one shown above:
Figure: the hierarchical structure rendered from the embedded JSON schema
As you can see, this is much simpler and more usable code, especially when we need to visualize nodes and relationships.

Conclusion

This article provides the integration points and code snippets that help developers build a knowledge base with MongoDB for RAG architectures. It demonstrates that MongoDB's JSON document model can serve as the base for generating graph models and visualizations, whether rendered with d3.js or MongoDB Atlas Charts. Note that most of the code works out of the box, underscoring MongoDB's key value proposition around developer productivity.
Interested in a role on MongoDB's Partner Presales team? We have several open roles on our teams across the globe and would love for you to transform your career with us!