Building a Knowledge Base and Visualization Graphs for RAG With MongoDB

Prasad Pillalamarri, Shounak Acharya • 12 min read • Published Sep 02, 2024 • Updated Sep 02, 2024

Several approaches can supply richer context to retrieval-augmented generation (RAG) systems and improve their performance. Each offers different strengths, and the right choice depends on the specific requirements of the RAG system, such as the type of data being used, the complexity of queries, and the desired quality of the generated text. In practice, combining several of these methods often yields the best results, leveraging their respective advantages to enhance both retrieval and generation.

MongoDB supports vector search for fast retrieval, pre-filters, hybrid search, and knowledge bases, all of which can be implemented out of the box. In this article, Shounak and I would like to highlight how MongoDB, and more importantly its JSON-based document model, can easily be used to construct a knowledge base and store the relationships between entities (nodes) in a RAG architecture. We will then go further and use the JSON documents as the basis for hierarchical network graphs or MongoDB Charts-based visualizations.

Document-based databases

Document databases store and index documents, allowing for fast retrieval based on complex queries. They are well-suited for storing large collections of text documents, web pages, or articles. The retriever can query these databases to fetch relevant documents based on keywords or semantic similarity, which the generator then uses to produce coherent text.

Atlas Vector Search pre-filter

Filtering your data narrows the scope of a semantic search so that not every vector has to be considered for comparison. The $vectorSearch filter option matches only BSON boolean, date, objectId, string, and numeric values, and each field you filter on must be indexed as a filter field in your Atlas Vector Search index.
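As a minimal sketch of a pre-filtered query, the function below builds a $vectorSearch stage with a filter clause. The index name (vector_index), the embedding field (embedding), and the filter field (type) are assumptions for illustration, not names used elsewhere in this article.

```python
def build_vector_search_pipeline(query_vector, node_type, limit=5, num_candidates=100):
    """Build an aggregation pipeline that runs a pre-filtered vector search.

    Assumes a hypothetical Atlas Vector Search index named "vector_index" on an
    "embedding" field, with "type" indexed as a filter field.
    """
    return [
        {
            "$vectorSearch": {
                "index": "vector_index",          # hypothetical index name
                "path": "embedding",              # hypothetical vector field
                "queryVector": query_vector,
                "numCandidates": num_candidates,  # candidates considered before ranking
                "limit": limit,                   # number of results returned
                # Pre-filter: only vectors whose "type" matches are compared
                "filter": {"type": {"$eq": node_type}},
            }
        },
        # Return the node id together with its similarity score
        {"$project": {"_id": 0, "id": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]


pipeline = build_vector_search_pipeline([0.1, 0.2, 0.3], "Programming_language")
# With a live cluster and a matching index: results = collection.aggregate(pipeline)
```

numCandidates should be at least as large as limit; a larger value trades latency for recall.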

Hybrid search systems

Combining full-text search with vector embeddings integrates multiple retrieval methods to leverage their strengths. For instance, a hybrid system can first retrieve candidates with keyword-based full-text search and then rerank them by embedding similarity, or run a semantic vector search first and narrow the candidates with full-text search, before passing the results to the generator.
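One common way to merge the two result lists is reciprocal rank fusion (RRF). The article does not prescribe a fusion method, so treat this as a swapped-in illustration: each list contributes 1 / (k + rank) for every document it returns, and k = 60 is a commonly used default.

```python
from collections import defaultdict


def reciprocal_rank_fusion(*ranked_lists, k=60):
    """Merge several ranked lists of document ids with reciprocal rank fusion.

    Each list contributes 1 / (k + rank) for every id it contains (rank starts
    at 1); ids are returned sorted by their summed score, highest first.
    """
    scores = defaultdict(float)
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


full_text_hits = ["a", "b", "c"]  # e.g. ids from an Atlas Search ($search) query
vector_hits = ["b", "c", "d"]     # e.g. ids from a $vectorSearch query
print(reciprocal_rank_fusion(full_text_hits, vector_hits))  # ['b', 'c', 'a', 'd']
```

Documents found by both searches ("b" and "c") rise to the top, which is usually the behavior you want from a hybrid retriever.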

Knowledge bases

Knowledge bases are large repositories of structured information extracted from various sources, often covering a wide range of entities and relationships. The retriever can query a knowledge base to fetch relevant facts and relationships, enriching the context and detail of the generated text.
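As a sketch of how a retriever might pass stored relationships to the generator, the hypothetical helper below turns one knowledge-base document into plain-text facts. It assumes the document shape built later in this article (id and type fields plus one array field per relationship type); the "source RELATION target" fact format is an illustrative choice.

```python
def facts_from_document(doc):
    """Turn one link document into 'source RELATION target' fact strings.

    Skips bookkeeping fields and any precomputed d3_* rendering fields.
    """
    facts = []
    for key, value in doc.items():
        if key in ("_id", "id", "type") or key.startswith("d3_"):
            continue
        if not isinstance(value, list):
            continue
        for target in value:
            facts.append(f"{doc['id']} {key} {target}")
    return facts


python_doc = {
    "id": "Python",
    "type": "Programming_language",
    "CREATED_BY": ["Guido Van Rossum"],
    "SUCCESSOR_OF": ["Abc Programming Language"],
}
# Join the facts into a context block for the LLM prompt
context = "\n".join(facts_from_document(python_doc))
print(context)
```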

MongoDB Charts is a data visualization tool designed specifically for MongoDB Atlas, offering a fast, intuitive, and robust way to visualize your data. It supports a wide range of use cases, whether you're working with a dedicated cluster or a serverless instance, using Atlas Data Federation to uncover insights from combined Atlas and S3 data, or visualizing archived data in Online Archive.

Developing a knowledge base

In this article, we will create a dependency graph of the entities in a free-text paragraph, using the LLMGraphTransformer class in LangChain with an OpenAI model as the LLM. We will pass in a text on the history of the Python programming language and ask it to return the entities and their relationships, as shown in the code snippet below.

All of the code snippets below can be found in the GitHub repo.

from langchain_core.documents import Document
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(temperature=0, model_name="gpt-4-turbo", api_key="YOUR-OPENAI-KEY")

llm_transformer = LLMGraphTransformer(llm=llm)
text = """
Python was invented in the late 1980s by Guido van Rossum at Centrum Wiskunde & Informatica in the Netherlands as a successor to the ABC
programming language, which was inspired by SETL capable of exception handling and interfacing with the Amoeba operating system.
Its implementation began in December 1989. Python 2.0 was released on 16 October 2000, with many major new features such as list comprehensions,
cycle-detecting garbage collection, reference counting, and Unicode support. Python 3.0, released on 3 December 2008,
with many of its major features backported to Python 2.6.x and 2.7.x. Releases of Python 3 include the 2to3 utility,
which automates the translation of Python 2 code to Python 3.
"""
documents = [Document(page_content=text)]
# Ask the LLM to extract nodes (entities) and relationships from the text
graph_documents = llm_transformer.convert_to_graph_documents(documents)
print(f"Nodes:{graph_documents[0].nodes}")
print(f"Relationships:{graph_documents[0].relationships}")

This produces output like the following, capturing the nodes and their relationships:

Nodes:[Node(id='Python', type='Programming_language'), Node(id='Guido Van Rossum', type='Person'), Node(id='Centrum Wiskunde & Informatica', type='Organization'), Node(id='Netherlands', type='Country'), Node(id='Abc Programming Language', type='Programming_language'), Node(id='Setl', type='Programming_language'), Node(id='Amoeba Operating System', type='Operating_system'), Node(id='Python 2.0', type='Software_version'), Node(id='Python 3.0', type='Software_version'), Node(id='2To3 Utility', type='Software_tool')]
Relationships:[Relationship(source=Node(id='Python', type='Programming_language'), target=Node(id='Guido Van Rossum', type='Person'), type='CREATED_BY'), Relationship(source=Node(id='Python', type='Programming_language'), target=Node(id='Centrum Wiskunde & Informatica', type='Organization'), type='DEVELOPED_AT'), Relationship(source=Node(id='Python', type='Programming_language'), target=Node(id='Netherlands', type='Country'), type='DEVELOPED_IN'), Relationship(source=Node(id='Python', type='Programming_language'), target=Node(id='Abc Programming Language', type='Programming_language'), type='SUCCESSOR_OF'), Relationship(source=Node(id='Abc Programming Language', type='Programming_language'), target=Node(id='Setl', type='Programming_language'), type='INSPIRED_BY'), Relationship(source=Node(id='Python', type='Programming_language'), target=Node(id='Amoeba Operating System', type='Operating_system'), type='INTERFACE_WITH'), Relationship(source=Node(id='Python 3.0', type='Software_version'), target=Node(id='Python 2.0', type='Software_version'), type='BACKPORTED_TO'), Relationship(source=Node(id='Python 3.0', type='Software_version'), target=Node(id='2To3 Utility', type='Software_tool'), type='INCLUDES')]

As the output above shows, the LLMGraphTransformer captured entities such as Python, Guido van Rossum, and the Netherlands, and assigned each one a type. For example, Python is a programming language, Guido van Rossum is a person, and the Netherlands is a country.

The LLMGraphTransformer not only identifies nodes but also generates relationships between them. For instance, the output above establishes that Python, a programming language, was created by Guido van Rossum. This connection is represented by a Relationship object, which consists of a source (Python), a target (Guido van Rossum), and a relationship type (CREATED_BY). The output captures several such relationships between the identified nodes.
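The later snippets read only a few attributes from these objects: node.id, node.type, rel.source.id, rel.target.id, and rel.type. To experiment with those steps without calling OpenAI, you could substitute lightweight stand-ins like the hypothetical dataclasses below; they are not LangChain classes and mimic only the attributes this article uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StubNode:
    """Hypothetical stand-in exposing the two Node attributes this article reads."""
    id: str
    type: str


@dataclass(frozen=True)
class StubRelationship:
    """Hypothetical stand-in exposing the Relationship attributes this article reads."""
    source: StubNode
    target: StubNode
    type: str


python = StubNode("Python", "Programming_language")
guido = StubNode("Guido Van Rossum", "Person")
# Use these in place of graph_documents[0].nodes and graph_documents[0].relationships
nodes = [python, guido]
relationships = [StubRelationship(source=python, target=guido, type="CREATED_BY")]

for rel in relationships:
    # Same access pattern as the later snippets
    print(rel.source.id, rel.type, rel.target.id)  # Python CREATED_BY Guido Van Rossum
```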

Using these node and relationship data structures, we can now capture the relationship graph in MongoDB. In this example, we create one collection per node type (for example, Programming_language, Country, and Operating_system), as shown in the code snippet below:

from pymongo import MongoClient

nodes = graph_documents[0].nodes
relationships = graph_documents[0].relationships

# One collection per distinct node type
collections = set()
for node in nodes:
    collections.add(node.type)
print(collections)

uri = "YOUR-MONGO-URL"
client = MongoClient(uri)
try:
    database = client["generic_graph"]
    existing = set(database.list_collection_names())
    for collection in collections:
        # create_collection raises CollectionInvalid if the collection already exists
        if collection not in existing:
            database.create_collection(collection)
except Exception as e:
    print(e)
finally:
    client.close()

This creates one collection per node type. The node types found are:

{'Programming_language', 'Software_tool', 'Person', 'Software_version', 'Organization', 'Country', 'Operating_system'}

We could apply other design patterns, such as the polymorphic pattern, to store multiple object types in a single collection. In that case, however, the code needs to be adapted to the domain knowledge in the graph. In our example, we have kept the pattern generic so that the same code can generate collections and their relationships for any knowledge base with little modification.

To capture relationships between documents across collections, we use linking. We iterate through the relationship list and do the following:

For the source of each relationship, we create an array attribute on the source document, named after the relationship type.
The array holds the IDs of the relationship's targets.
We create one such array attribute for every relationship type in which the current node is the source.

For example, in the Programming_language collection, we will have Python as one of the documents. Now, in the Python document, we will have array attributes for DEVELOPED_IN, CREATED_BY, DEVELOPED_AT, SUCCESSOR_OF, and INTERFACE_WITH, as shown in the screenshot below:

Similarly, the ABC programming language, Python's predecessor, was inspired by Setl, as shown in the following screenshot. Note that both of these documents are in the same collection, Programming_language:

However, the Setl document has no link attributes, because the LLMGraphTransformer output contains no relationships with Setl as the source.
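Because each link is stored only on its source document, following a relationship backwards (for example, "what was inspired by Setl?") means searching the link arrays. In MongoDB, an equality filter on an array field matches when any element equals the value, so a filter such as {"INSPIRED_BY": "Setl"} on the Programming_language collection would return the ABC document. The in-memory sketch below mimics that reverse lookup over a list of documents; the helper name inbound_links is hypothetical.

```python
def inbound_links(documents, target_id):
    """Return (source_id, relationship_type) pairs whose link arrays contain target_id.

    Assumes documents shaped like the ones built in this article: 'id', 'type',
    and one array attribute per outgoing relationship type.
    """
    matches = []
    for doc in documents:
        for key, value in doc.items():
            if key in ("_id", "id", "type") or not isinstance(value, list):
                continue
            if target_id in value:
                matches.append((doc["id"], key))
    return matches


docs = [
    {"id": "Python", "type": "Programming_language",
     "SUCCESSOR_OF": ["Abc Programming Language"]},
    {"id": "Abc Programming Language", "type": "Programming_language",
     "INSPIRED_BY": ["Setl"]},
    {"id": "Setl", "type": "Programming_language"},
]
print(inbound_links(docs, "Setl"))  # [('Abc Programming Language', 'INSPIRED_BY')]
```

Since the target's collection is not stored in the link, a reverse lookup across node types has to query each collection; this is one trade-off of the generic one-collection-per-type layout.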

The following code snippets show how to build the collections above.

# Figure out all relationship types for which each node is the source
node_relationship_types = {}
for node in nodes:
    node_relationship_types[node.id] = set()
    for rel in relationships:
        if rel.source.id == node.id:
            node_relationship_types[node.id].add(rel.type)
print(node_relationship_types)

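The code above builds a dictionary that maps each node to the unique relationship types for which it is the source, producing the following output. (The relationships in this and later outputs differ slightly from the first output, for example 2To3 Utility TRANSLATES_TO appears instead of Python 3.0 INCLUDES, most likely because LLM extraction is not guaranteed to return identical results across runs.)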

{'Python': {'DEVELOPED_IN', 'CREATED_BY', 'DEVELOPED_AT', 'SUCCESSOR_OF', 'INTERFACE_WITH'}, 'Guido Van Rossum': set(), 'Centrum Wiskunde & Informatica': set(), 'Netherlands': set(), 'Abc Programming Language': {'INSPIRED_BY'}, 'Setl': set(), 'Amoeba Operating System': set(), 'Python 2.0': set(), 'Python 3.0': {'BACKPORTED_TO'}, '2To3 Utility': {'TRANSLATES_TO'}}

Once we know all the relationship types, we create a document for each node, linking to documents in other collections along the way, as shown in the following snippet:

# Build one document per node; each relationship type becomes an array of target IDs
mongo_documents = []
for node in nodes:
    document_dict = {}
    document_dict['id'] = node.id
    document_dict['type'] = node.type  # also names the collection to insert into
    document_relations = node_relationship_types[node.id]
    for document_relation in document_relations:
        document_dict[document_relation] = []
    for rel in relationships:
        if rel.source.id == node.id:
            document_dict[rel.type].append(rel.target.id)
    mongo_documents.append(document_dict)
print(mongo_documents)

The snippet above generates the documents to be inserted into their corresponding collections, including all the links to other documents, as shown in the output below:

[{'id': 'Python', 'type': 'Programming_language', 'DEVELOPED_IN': ['Netherlands'], 'CREATED_BY': ['Guido Van Rossum'], 'DEVELOPED_AT': ['Centrum Wiskunde & Informatica'], 'SUCCESSOR_OF': ['Abc Programming Language'], 'INTERFACE_WITH': ['Amoeba Operating System']}, {'id': 'Guido Van Rossum', 'type': 'Person'}, {'id': 'Centrum Wiskunde & Informatica', 'type': 'Organization'}, {'id': 'Netherlands', 'type': 'Country'}, {'id': 'Abc Programming Language', 'type': 'Programming_language', 'INSPIRED_BY': ['Setl']}, {'id': 'Setl', 'type': 'Programming_language'}, {'id': 'Amoeba Operating System', 'type': 'Operating_system'}, {'id': 'Python 2.0', 'type': 'Software_version'}, {'id': 'Python 3.0', 'type': 'Software_version', 'BACKPORTED_TO': ['Python 2.0']}, {'id': '2To3 Utility', 'type': 'Software_tool', 'TRANSLATES_TO': ['Python 3.0']}]
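The two steps above scan the whole relationship list once per node, and because node_relationship_types stores relationship types in a set, the order of the array attributes in each document can vary between runs. A minimal single-pass alternative is sketched below; it uses hypothetical namedtuple stand-ins for LangChain's Node and Relationship (only id, type, source, and target are read) and adds array attributes in the order the relationships appear.

```python
from collections import namedtuple

# Hypothetical stand-ins exposing only the attributes this article reads
StubNode = namedtuple("StubNode", ["id", "type"])
StubRelationship = namedtuple("StubRelationship", ["source", "target", "type"])


def build_link_documents(nodes, relationships):
    """Build one document per node with an array attribute per outgoing relationship type."""
    documents = {node.id: {"id": node.id, "type": node.type} for node in nodes}
    for rel in relationships:
        doc = documents.get(rel.source.id)
        if doc is None:
            continue  # source is not among the extracted nodes; skip, as the loops above would
        # setdefault creates the array the first time this relationship type is seen
        doc.setdefault(rel.type, []).append(rel.target.id)
    return list(documents.values())


python = StubNode("Python", "Programming_language")
abc = StubNode("Abc Programming Language", "Programming_language")
setl = StubNode("Setl", "Programming_language")
docs = build_link_documents(
    [python, abc, setl],
    [StubRelationship(python, abc, "SUCCESSOR_OF"), StubRelationship(abc, setl, "INSPIRED_BY")],
)
print(docs)
```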

Finally, we insert these documents into MongoDB. We determine the target collection from the “type” field that we added to each document in the previous step:

from pymongo import MongoClient

uri = "YOUR-MONGO-URL"
client = MongoClient(uri)
try:
    database = client["generic_graph"]
    for mongo_document in mongo_documents:
        # The document's type field names its collection
        collection = database[mongo_document['type']]
        collection.insert_one(mongo_document)
except Exception as e:
    print(e)
finally:
    client.close()
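insert_one makes one round trip per document. As a minimal sketch, assuming the mongo_documents list from above, you could instead group the documents by their type field and call pymongo's insert_many once per collection; the helper names are hypothetical.

```python
from collections import defaultdict


def group_by_collection(documents, key="type"):
    """Group documents by the field that names their target collection."""
    grouped = defaultdict(list)
    for doc in documents:
        grouped[doc[key]].append(doc)
    return dict(grouped)


def insert_grouped(database, grouped):
    """Insert each group with one insert_many call per collection.

    `database` is a pymongo Database; insert_many adds an _id to each inserted dict.
    Every group is non-empty by construction, as insert_many requires.
    """
    for collection_name, docs in grouped.items():
        database[collection_name].insert_many(docs)


docs = [
    {"id": "Python", "type": "Programming_language"},
    {"id": "Setl", "type": "Programming_language"},
    {"id": "Netherlands", "type": "Country"},
]
grouped = group_by_collection(docs)
print({name: len(items) for name, items in grouped.items()})
# With a live cluster: insert_grouped(client["generic_graph"], grouped)
```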

Embedded child/parent doc supporting native MongoDB schema

For write-once, read-many workloads (1+W/99R), or workloads that write multiple times and read often (20W/80R), we recommend precomputing the schema your rendering engine expects (in our case, d3.js) and saving it with your MongoDB documents. The following output shows this schema for the Python document in the Programming_language collection from the previous section, which now also stores every target node and every edge that starts from this node:

[{'CREATED_BY': ['Guido Van Rossum'],
  'DEVELOPED_AT': ['Centrum Wiskunde & Informatica'],
  'DEVELOPED_IN': ['Netherlands'],
  'INTERFACE_WITH': ['Amoeba Operating System'],
  'SUCCESSOR_OF': ['Abc Programming Language'],
  'd3_edges': [{'linkName': 'CREATED_BY',
                'source': 'Python',
                'strength': 0.7,
                'target': 'Guido Van Rossum'},
               {'linkName': 'DEVELOPED_AT',
                'source': 'Python',
                'strength': 0.7,
                'target': 'Centrum Wiskunde & Informatica'},
               {'linkName': 'DEVELOPED_IN',
                'source': 'Python',
                'strength': 0.7,
                'target': 'Netherlands'},
               {'linkName': 'SUCCESSOR_OF',
                'source': 'Python',
                'strength': 0.7,
                'target': 'Abc Programming Language'},
               {'linkName': 'INTERFACE_WITH',
                'source': 'Python',
                'strength': 0.7,
                'target': 'Amoeba Operating System'}],
  'd3_source_node': {'group': 0, 'id': 'Python', 'label': 'Python', 'level': 1},
  'd3_target_nodes': [{'group': 1,
                       'id': 'Guido Van Rossum',
                       'label': 'Guido Van Rossum',
                       'level': 2},
                      {'group': 1,
                       'id': 'Centrum Wiskunde & Informatica',
                       'label': 'Centrum Wiskunde & Informatica',
                       'level': 2},
                      {'group': 1,
                       'id': 'Netherlands',
                       'label': 'Netherlands',
                       'level': 2},
                      {'group': 1,
                       'id': 'Abc Programming Language',
                       'label': 'Abc Programming Language',
                       'level': 2},
                      {'group': 1,
                       'id': 'Amoeba Operating System',
                       'label': 'Amoeba Operating System',
                       'level': 2}],
  'id': 'Python',
  'type': 'Programming_language'}]

This can be done while creating the MongoDB documents from the graph, as shown in the following snippet. Note that the code for creating the collections remains the same as in the previous section.

from pprint import pprint

mongo_documents = []
for node in nodes:
    document_dict = {}
    document_dict['id'] = node.id
    document_dict['type'] = node.type
    # Precomputed d3.js rendering data: outgoing edges, their targets, and this node
    document_dict['d3_edges'] = []
    document_dict['d3_target_nodes'] = []
    document_dict['d3_source_node'] = {'id': node.id, 'group': 0, 'level': 1, 'label': node.id}
    document_relations = node_relationship_types[node.id]
    for document_relation in document_relations:
        document_dict[document_relation] = []
    for rel in relationships:
        if rel.source.id == node.id:
            document_dict[rel.type].append(rel.target.id)
            document_dict['d3_target_nodes'].append({'id': rel.target.id, 'group': 1, 'level': 2, 'label': rel.target.id})
            document_dict['d3_edges'].append({'source': node.id, 'target': rel.target.id, 'strength': 0.7, 'linkName': rel.type})
    mongo_documents.append(document_dict)
pprint(mongo_documents)

The code above captures and stores one level of relations. Depending on your use case, the same approach can store N levels of relations per document, following the subset pattern of MongoDB data modeling.
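A minimal sketch of the N-level variant, assuming the relationships are available as (source, type, target) tuples, for example [(r.source.id, r.type, r.target.id) for r in relationships]. The function name is hypothetical, and numbering levels as hop count plus one simply extends the level 1 / level 2 convention used above.

```python
from collections import deque


def d3_subgraph(root_id, edges, max_depth):
    """Collect d3 nodes and edges reachable from root_id within max_depth hops.

    `edges` is a list of (source_id, relationship_type, target_id) tuples. The
    root gets group 0 and level 1; every other node gets group 1 and a level
    one greater than the node it was reached from.
    """
    outgoing = {}
    for source, rel_type, target in edges:
        outgoing.setdefault(source, []).append((rel_type, target))

    d3_nodes = [{"id": root_id, "group": 0, "level": 1, "label": root_id}]
    d3_edges = []
    seen = {root_id}
    queue = deque([(root_id, 0)])  # (node id, hops from root), breadth-first
    while queue:
        node_id, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested depth
        for rel_type, target in outgoing.get(node_id, []):
            d3_edges.append({"source": node_id, "target": target,
                             "strength": 0.7, "linkName": rel_type})
            if target not in seen:
                seen.add(target)
                d3_nodes.append({"id": target, "group": 1,
                                 "level": depth + 2, "label": target})
                queue.append((target, depth + 1))
    return d3_nodes, d3_edges


edges = [("Python", "SUCCESSOR_OF", "Abc Programming Language"),
         ("Abc Programming Language", "INSPIRED_BY", "Setl")]
two_level_nodes, two_level_edges = d3_subgraph("Python", edges, max_depth=2)
print([n["id"] for n in two_level_nodes])
```

Storing more levels makes each document larger and means more documents must be updated when a shared node changes, so it suits read-heavy workloads best; MongoDB documents are also limited to 16 MB.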

Powering the knowledge base in rendering visualizations

The same JSON documents can power many graph types. Hierarchical data can be rendered as a tree, a disjoint force-directed graph, or hierarchical arcs. The JSON structures below are examples of the data sets behind these graphs.

Graph type 1:

Object {source: "X1", target: "X2", type: "suit"}
Object {source: "X2", target: "X3", type: "resolved"}
Object {source: "X3", target: "X4", type: "suit"}
Object {source: "X4", target: "X1", type: "suit"}
columns: Array(3) [
  0: "source"
  1: "target"
  2: "type"
]

Graph type 2:

Object {source: "Napoleon", target: "Myriel", value: 1}
Object {source: "Mlle.Baptistine", target: "Myriel", value: 8}
Object {source: "Mme.Magloire", target: "Myriel", value: 10}
Object {source: "Mme.Magloire", target: "Mlle.Baptistine", value: 6}
Object {source: "Cravatte", target: "Myriel", value: 1}
Object {source: "Count", target: "Myriel", value: 2}
Object {source: "OldMan", target: "Myriel", value: 1}

Graph type 3:

data = Object {
  name: "flare"
  children: Array(10) [
    0: Object {
      name: "analytics"
      children: Array(3) [
        0: Object {name: "cluster", children: Array(4)}
        1: Object {name: "graph", children: Array(5)}
        2: Object {name: "optimization", children: Array(1)}
      ]
    }
    1: Object {name: "animate", children: Array(12)}
    2: Object {name: "data", children: Array(7)}
    3: Object {name: "display", children: Array(4)}
    4: Object {
      name: "flex"
      children: Array(1) [
        0: Object {
        }
      ]
    }
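Graph type 3 is a nested name/children tree, the shape that d3.hierarchy reads with its default children accessor. As a sketch, assuming edges as (source, type, target) tuples like those stored in this article, the hypothetical helper below converts them into that shape starting from a chosen root.

```python
def to_hierarchy(root_id, edges):
    """Convert (source_id, relationship_type, target_id) tuples into a nested
    {"name": ..., "children": [...]} tree rooted at root_id.

    A node already on the current path is not expanded again, so cycles
    cannot cause infinite recursion.
    """
    outgoing = {}
    for source, _rel_type, target in edges:
        outgoing.setdefault(source, []).append(target)

    def build(node_id, path):
        node = {"name": node_id}
        children = [build(child, path | {child})
                    for child in outgoing.get(node_id, []) if child not in path]
        if children:
            node["children"] = children  # leaves carry no "children" key
        return node

    return build(root_id, {root_id})


edges = [("Python", "SUCCESSOR_OF", "Abc Programming Language"),
         ("Abc Programming Language", "INSPIRED_BY", "Setl"),
         ("Python", "CREATED_BY", "Guido Van Rossum")]
tree = to_hierarchy("Python", edges)
print(tree)
```

A node reachable along several paths appears once per path, because a tree cannot share children; the force-directed layout used below avoids that duplication.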

In our use case, we use a d3.js force-directed graph to visualize the Programming_language collection. The force-directed graph code expects a JSON object with two arrays, nodes and links, whose elements capture the properties of nodes and relationships, respectively. The structure looks something like this:

{"nodes": [{"id": "Python", "group": 0, "level": 1, "label": "Python"}, {"id": "Netherlands", "group": 1, "level": 2, "label": "Netherlands"}],
 "links": [{"source": "Python", "target": "Netherlands", "strength": 0.7}, {"source": "Python", "target": "Guido Van Rossum", "strength": 0.7}]}

Each object in these arrays has some mandatory and some optional fields. For example, each node must have an “id” field, and each relationship object must have “source” and “target” fields.
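When links look nodes up by id, d3-force's forceLink throws a "node not found" error if a link's source or target matches no node id, and duplicate ids leave stray, unconnected nodes in the drawing. A small pre-flight check such as the hypothetical sketch below can catch both problems before the JSON file is written.

```python
def validate_d3_graph(nodes, links):
    """Drop duplicate node ids and report link endpoints that match no node.

    Returns (unique_nodes, missing_ids); the first node seen for each id is kept.
    """
    unique_nodes = []
    seen = set()
    for node in nodes:
        if node["id"] not in seen:
            seen.add(node["id"])
            unique_nodes.append(node)
    missing = sorted({end for link in links
                      for end in (link["source"], link["target"])
                      if end not in seen})
    return unique_nodes, missing


nodes = [{"id": "Python", "group": 0, "level": 1, "label": "Python"},
         {"id": "Netherlands", "group": 1, "level": 2, "label": "Netherlands"},
         {"id": "Python", "group": 1, "level": 2, "label": "Python"}]
links = [{"source": "Python", "target": "Netherlands", "strength": 0.7},
         {"source": "Python", "target": "Guido Van Rossum", "strength": 0.7}]
unique_nodes, missing = validate_d3_graph(nodes, links)
print(len(unique_nodes), missing)  # 2 ['Guido Van Rossum']
```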

To build this structure from our Programming_language collection, we use $graphLookup to fetch the document for the ABC programming language, which Python lists in its SUCCESSOR_OF array. Because the pipeline below connects id to id, the lookup stops at that first hop; the code then reads ABC's INSPIRED_BY array to reach Setl, the language that inspired ABC.

$graphLookup performs a recursive search on a collection, with options for restricting the search by recursion depth and by a query filter.

The $graphLookup process works as follows:

Input documents are processed in the $graphLookup stage of an aggregation pipeline.
The search is directed to the collection specified by the “from” parameter.
For each input document, the search starts with the value specified by startWith.
$graphLookup compares this startWith value to the field indicated by connectToField in other documents within the “from” collection.
When a match is found, $graphLookup retrieves the value from connectFromField and checks other documents in the “from” collection for corresponding connectToField values. Matching documents are then added to an array specified by the as parameter.
This recursive process continues until no further matches are found or the maximum recursion depth, defined by maxDepth, is reached.
Finally, $graphLookup appends the array to the original input document and completes its search for all input documents.
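The steps above can be modeled in a few lines of Python. The sketch below is a simplified, in-memory approximation of $graphLookup over a list of dicts (it assumes hashable field values and ignores restrictSearchWithMatch), followed by a hypothetical variant of the stage that follows INSPIRED_BY instead of id. With connectFromField set to id, as in the pipeline used in this article, the search stops after ABC; following INSPIRED_BY reaches Setl, and depthField records the hop at which each match was found.

```python
def graph_lookup(collection, start_with, connect_from, connect_to, max_depth=None):
    """Simplified in-memory model of $graphLookup over a list of dict documents.

    Returns (document, depth) pairs: depth 0 matches start_with, and each later
    depth follows connect_from values of the previous matches. Each document is
    returned once; max_depth=None means unlimited recursion.
    """
    def as_list(value):
        if value is None:
            return []
        return value if isinstance(value, list) else [value]

    results = []
    visited = set()  # positions of documents already returned
    frontier = set(as_list(start_with))
    depth = 0
    while frontier and (max_depth is None or depth <= max_depth):
        next_frontier = set()
        for index, doc in enumerate(collection):
            if index in visited:
                continue
            # An array connect_to field matches if any element matches
            if frontier & set(as_list(doc.get(connect_to))):
                visited.add(index)
                results.append((doc, depth))
                next_frontier.update(as_list(doc.get(connect_from)))
        frontier = next_frontier
        depth += 1
    return results


# Hypothetical recursive variant of the stage used in this article
recursive_stage = {
    "$graphLookup": {
        "from": "Programming_language",
        "startWith": "$SUCCESSOR_OF",
        "connectFromField": "INSPIRED_BY",  # follow ABC -> Setl
        "connectToField": "id",
        "as": "relations",
        "maxDepth": 2,
        "depthField": "depth",
    }
}

docs = [
    {"id": "Python", "SUCCESSOR_OF": ["Abc Programming Language"]},
    {"id": "Abc Programming Language", "INSPIRED_BY": ["Setl"]},
    {"id": "Setl"},
]
found = graph_lookup(docs, docs[0]["SUCCESSOR_OF"], "INSPIRED_BY", "id", max_depth=2)
print([(d["id"], depth) for d, depth in found])  # [('Abc Programming Language', 0), ('Setl', 1)]
```

MongoDB does not guarantee the order of the documents in the as array, which is why depthField is useful when the hop count matters.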

Finally, we create the nodes and links arrays and save them to a JSON file. Note that we also add link labels (linkName), which show the relationship types between the nodes:

import json
from pymongo import MongoClient

uri = "YOUR-MONGO-URL"
client = MongoClient(uri)
nodes = []
links = []
try:
    database = client["generic_graph"]
    language_collection = database["Programming_language"]
    language_pipeline = [
        {
            # Look up the predecessor(s) named in each document's SUCCESSOR_OF array
            '$graphLookup': {
                'from': 'Programming_language',
                'startWith': '$SUCCESSOR_OF',
                'connectFromField': 'id',
                'connectToField': 'id',
                'as': 'relations',
                'maxDepth': 2
            }
        }, {
            # One output document per predecessor; documents without one are dropped
            '$unwind': {
                'path': '$relations',
                'preserveNullAndEmptyArrays': False
            }
        }
    ]
    lang_aggCursor = language_collection.aggregate(language_pipeline)
    for document in lang_aggCursor:
        print(document)
        source_id = document.get('id')
        nodes.append({'id': source_id, 'group': 0, 'level': 1, 'label': source_id})
        for key, value in document.items():
            if key in ('_id', 'id', 'type', 'SUCCESSOR_OF'):
                continue
            elif key == 'relations':
                # The predecessor found by $graphLookup (here, ABC)
                predecessor_id = value['id']
                nodes.append({'id': predecessor_id, 'group': 1, 'level': 2, 'label': predecessor_id})
                # Languages that inspired the predecessor (here, Setl)
                for inspired_id in value.get('INSPIRED_BY', []):
                    nodes.append({'id': inspired_id, 'group': 1, 'level': 2, 'label': inspired_id})
                    links.append({'source': predecessor_id, 'target': inspired_id,
                                  'strength': 0.7, 'linkName': 'INSPIRED_BY'})
                links.append({'source': source_id, 'target': predecessor_id,
                              'strength': 0.7, 'linkName': 'SUCCESSOR_OF'})
            else:
                # Any other key is a relationship type; add every target, not just the first
                for target_id in value:
                    nodes.append({'id': target_id, 'group': 1, 'level': 2, 'label': target_id})
                    links.append({'source': source_id, 'target': target_id,
                                  'strength': 0.7, 'linkName': key})
    print(nodes)
    print(links)
except Exception as e:
    print(e)
finally:
    client.close()

nodes_links = {"nodes": nodes, "links": links}
with open("python-dependencies.json", 'w') as f:
    json.dump(nodes_links, f, indent=1)

This creates the python-dependencies.json file, whose contents, including the labels of nodes and links, are shown compactly below:

{"nodes": [{"id": "Python", "group": 0, "level": 1, "label": "Python"}, {"id": "Netherlands", "group": 1, "level": 2, "label": "Netherlands"}, {"id": "Guido Van Rossum", "group": 1, "level": 2, "label": "Guido Van Rossum"}, {"id": "Centrum Wiskunde & Informatica", "group": 1, "level": 2, "label": "Centrum Wiskunde & Informatica"}, {"id": "Amoeba Operating System", "group": 1, "level": 2, "label": "Amoeba Operating System"}, {"id": "Abc Programming Language", "group": 1, "level": 2, "label": "Abc Programming Language"}, {"id": "Setl", "group": 1, "level": 2, "label": "Setl"}],
 "links": [{"source": "Python", "target": "Netherlands", "strength": 0.7, "linkName": "DEVELOPED_IN"}, {"source": "Python", "target": "Guido Van Rossum", "strength": 0.7, "linkName": "CREATED_BY"}, {"source": "Python", "target": "Centrum Wiskunde & Informatica", "strength": 0.7, "linkName": "DEVELOPED_AT"}, {"source": "Python", "target": "Amoeba Operating System", "strength": 0.7, "linkName": "INTERFACE_WITH"}, {"source": "Abc Programming Language", "target": "Setl", "strength": 0.7, "linkName": "INSPIRED_BY"}, {"source": "Python", "target": "Abc Programming Language", "strength": 0.7, "linkName": "SUCCESSOR_OF"}]}

We then use this JSON file to populate the nodes and links arrays in the d3.js code. We reused some of the code from the GitHub URL and updated its link labels. Finally, we run a local Node.js HTTP server and render the HTML. The output looks like this:

As we can see, we are able to display the relationship graph that was captured by our MongoDB collections using d3.js.

With the embedded relations approach, rendering becomes even simpler and more generic. The following code reads documents that embed their own target nodes and edges, so a single find on all documents in a collection returns every node and edge needed to form the graph:

import json
from pymongo import MongoClient

nodes_by_id = {}  # node id -> d3 node; the first entry seen for an id is kept
links = []
uri = "YOUR-MONGODB-CLUSTER-URL"
client = MongoClient(uri)
try:
    database = client["embedded_graph_2"]
    collection = database["Programming_language"]
    cursor = collection.find({}, {'_id': 0, 'id': 1, 'd3_edges': 1, 'd3_target_nodes': 1, 'd3_source_node': 1})
    for document in cursor:
        print(document)
        source_node = document['d3_source_node']
        nodes_by_id.setdefault(source_node['id'], source_node)
        # Each node id is added once, even if several documents point to it
        for target_node in document.get('d3_target_nodes', []):
            nodes_by_id.setdefault(target_node['id'], target_node)
        # Every precomputed edge is kept
        links.extend(document.get('d3_edges', []))
except Exception as e:
    print(e)
finally:
    client.close()

nodes_links = {"nodes": list(nodes_by_id.values()), "links": links}
with open("python-dependencies_embedded_2.json", 'w') as f:
    json.dump(nodes_links, f, indent=1)

This generates the “python-dependencies_embedded_2.json” file. Feeding it to the d3.js HTML code produces the following graph, the same as the one shown above:

As you can see, this code is much simpler and more reusable, because the nodes and edges needed for the visualization are precomputed and stored in the documents themselves.

Conclusion

This article provides the integration points and code snippets that help developers leverage a knowledge base with MongoDB for RAG architectures. It demonstrates that MongoDB’s JSON documents can serve as the basis for graph models and visualizations, whether rendered with d3.js, as shown here, or with MongoDB Atlas Charts. Additionally, most of the code relies on out-of-the-box functionality, reflecting MongoDB’s key value proposition around developer productivity.

Interested in a role on MongoDB’s Partner Presales team? We have several open roles on our teams across the globe and would love for you to transform your career with us!

Top Comments in Forums

There are no comments on this article yet.

Start the Conversation

Rate this tutorial

Tutorial

Building a Restaurant Locator Using Atlas, Neurelo, and AWS Lambda

Apr 02, 2024 | 8 min read

Tutorial

Tutorial: Build a Movie Search Engine Using Atlas Full-Text Search in 10 Minutes

Sep 09, 2024 | 10 min read

Tutorial

Using the Confluent Cloud With Atlas Stream Processing

Nov 19, 2024 | 5 min read

Article

Atlas Online Archive: Efficiently Manage the Data Lifecycle

Jan 13, 2025 | 8 min read

Document-based databases
Developing a knowledge base
Powering the knowledge base in rendering visualizations
Conclusion

Atlas

Building a Knowledge Base and Visualization Graphs for RAG With MongoDB

Document-based databases

Atlas Vector Search pre-filter

Hybrid search systems

Knowledge bases

Developing a knowledge base

Embedded child/parent doc supporting native MongoDB schema

Powering the knowledge base in rendering visualizations

Conclusion

Top Comments in Forums

Related

Building a Restaurant Locator Using Atlas, Neurelo, and AWS Lambda

Tutorial: Build a Movie Search Engine Using Atlas Full-Text Search in 10 Minutes

Using the Confluent Cloud With Atlas Stream Processing

Atlas Online Archive: Efficiently Manage the Data Lifecycle

Table of Contents

1	llm = ChatOpenAI(temperature=0, model_name="gpt-4-turbo",api_key="YOUR-OPENAI-KEY")
2
3	llm_transformer = LLMGraphTransformer(llm=llm)
4	text = """
5	Python was invented in the late 1980s by Guido van Rossum at Centrum Wiskunde & Informatica in the Netherlands as a successor to the ABC
6	programming language, which was inspired by SETL capable of exception handling and interfacing with the Amoeba operating system.
7	Its implementation began in December 1989. Python 2.0 was released on 16 October 2000, with many major new features such as list comprehensions,
8	cycle-detecting garbage collection, reference counting, and Unicode support. Python 3.0, released on 3 December 2008,
9	with many of its major features backported to Python 2.6.x and 2.7.x. Releases of Python 3 include the 2to3 utility,
10	which automates the translation of Python 2 code to Python 3.
11	"""
12	documents = [Document(page_content=text)]
13	graph_documents = llm_transformer.convert_to_graph_documents(documents)
14	print(f"Nodes:{graph_documents[0].nodes}")
15	print(f"Relationships:{graph_documents[0].relationships}")

1	nodes = graph_documents[0].nodes
2	relationships = graph_documents[0].relationships
3	collections = set()
4	for node in nodes:
5	collections.add(node.type)
6	print(collections)
7	try:
8	uri = "MONGO-DB-URL"
9	client = MongoClient(uri)
10	database = client["generic_graph"]
11	for collection in collections:
12	database.create_collection(collection)
13	except Exception as e:
14	print(e)
15	finally:
16	client.close()

1	#Figuring out all relationship types per node types
2	node_relationship_types = {}
3	for node in nodes:
4	#print(f'On Node {node.id}')
5	node_relationship_types[node.id] = set()
6	for rel in relationships:
7	#print(f'Looking at Relationship for {rel.source.id}')
8	if rel.source.id == node.id:
9	node_relationship_types[node.id].add(rel.type)
10	print(node_relationship_types)

# One MongoDB document per node: each relationship type becomes
# an array field holding the ids of the target nodes
mongo_documents = []
for node in nodes:
    document_dict = {}
    document_dict['id'] = node.id
    document_dict['type'] = node.type
    document_relations = node_relationship_types[node.id]
    for document_relation in document_relations:
        document_dict[document_relation] = []
    for rel in relationships:
        if rel.source.id == node.id:
            document_dict[rel.type].append(rel.target.id)
    mongo_documents.append(document_dict)
print(mongo_documents)
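The two loops above scan the whole relationship list once per node, which is O(nodes × relationships). A minimal sketch of a single-pass alternative follows; it assumes only the `.id`, `.type`, `.source`, and `.target` attributes the loops already use, and the `SimpleNamespace` objects (including the "Person" and "Country" types) are hypothetical stand-ins for LangChain's extracted nodes, not real LLM output.

```python
from collections import defaultdict
from types import SimpleNamespace


def build_mongo_documents(nodes, relationships):
    """Build one document per node in a single pass over `relationships`.

    Same shape as the nested loops: 'id', 'type', plus one list-valued field
    per outgoing relationship type holding the target node ids.
    """
    # source node id -> relationship type -> list of target node ids
    outgoing = defaultdict(lambda: defaultdict(list))
    for rel in relationships:
        outgoing[rel.source.id][rel.type].append(rel.target.id)

    mongo_documents = []
    for node in nodes:
        document_dict = {"id": node.id, "type": node.type}
        # .get() avoids creating empty entries for nodes with no outgoing edges
        document_dict.update(outgoing.get(node.id, {}))
        mongo_documents.append(document_dict)
    return mongo_documents


# Hypothetical stand-ins for the extracted Node/Relationship objects
python = SimpleNamespace(id="Python", type="Programming_language")
guido = SimpleNamespace(id="Guido Van Rossum", type="Person")
netherlands = SimpleNamespace(id="Netherlands", type="Country")
rels = [
    SimpleNamespace(source=python, target=guido, type="CREATED_BY"),
    SimpleNamespace(source=python, target=netherlands, type="DEVELOPED_IN"),
]
docs = build_mongo_documents([python, guido, netherlands], rels)
print(docs[0])
# -> {'id': 'Python', 'type': 'Programming_language', 'CREATED_BY': ['Guido Van Rossum'], 'DEVELOPED_IN': ['Netherlands']}
```

A side effect of grouping this way is that relationship fields appear in extraction order rather than in set-iteration order.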

from pymongo import MongoClient

uri = "YOUR-MONGO-URL"
# Create the client outside try, so finally never references an undefined name
client = MongoClient(uri)
try:
    database = client["generic_graph"]
    for mongo_document in mongo_documents:
        # Store each node in the collection named after its type
        collection = database[mongo_document['type']]
        collection.insert_one(mongo_document)
except Exception as e:
    print(e)
finally:
    client.close()
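The loop above calls `insert_one` once per node, which costs one round trip per document. A sketch of a batched variant, assuming `database` is a PyMongo `Database` (the helper name `insert_grouped_by_type` is hypothetical), groups the documents by their `type` field and issues one `insert_many` per collection:

```python
from collections import defaultdict


def insert_grouped_by_type(database, mongo_documents):
    """Insert nodes with one insert_many() call per node-type collection.

    `database` is assumed to be a pymongo Database (or anything whose
    database[name] returns an object with insert_many). Returns a dict of
    collection name -> number of documents sent.
    """
    by_type = defaultdict(list)
    for doc in mongo_documents:
        by_type[doc["type"]].append(doc)

    counts = {}
    for type_name, docs in by_type.items():
        # One batched write per collection instead of one round trip per document.
        # Grouped lists are never empty, which matters because insert_many([]) raises.
        database[type_name].insert_many(docs)
        counts[type_name] = len(docs)
    return counts
```

With a real cluster it would replace the inner loop, e.g. `insert_grouped_by_type(client["generic_graph"], mongo_documents)` inside the same try/finally. By default `insert_many` is ordered, so it stops at the first failing document.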

[{'CREATED_BY': ['Guido Van Rossum'],
  'DEVELOPED_AT': ['Centrum Wiskunde & Informatica'],
  'DEVELOPED_IN': ['Netherlands'],
  'INTERFACE_WITH': ['Amoeba Operating System'],
  'SUCCESSOR_OF': ['Abc Programming Language'],
  'd3_edges': [{'linkName': 'CREATED_BY',
                'source': 'Python',
                'strength': 0.7,
                'target': 'Guido Van Rossum'},
               {'linkName': 'DEVELOPED_AT',
                'source': 'Python',
                'strength': 0.7,
                'target': 'Centrum Wiskunde & Informatica'},
               {'linkName': 'DEVELOPED_IN',
                'source': 'Python',
                'strength': 0.7,
                'target': 'Netherlands'},
               {'linkName': 'SUCCESSOR_OF',
                'source': 'Python',
                'strength': 0.7,
                'target': 'Abc Programming Language'},
               {'linkName': 'INTERFACE_WITH',
                'source': 'Python',
                'strength': 0.7,
                'target': 'Amoeba Operating System'}],
  'd3_source_node': {'group': 0, 'id': 'Python', 'label': 'Python', 'level': 1},
  'd3_target_nodes': [{'group': 1,
                       'id': 'Guido Van Rossum',
                       'label': 'Guido Van Rossum',
                       'level': 2},
                      {'group': 1,
                       'id': 'Centrum Wiskunde & Informatica',
                       'label': 'Centrum Wiskunde & Informatica',
                       'level': 2},
                      {'group': 1,
                       'id': 'Netherlands',
                       'label': 'Netherlands',
                       'level': 2},
                      {'group': 1,
                       'id': 'Abc Programming Language',
                       'label': 'Abc Programming Language',
                       'level': 2},
                      {'group': 1,
                       'id': 'Amoeba Operating System',
                       'label': 'Amoeba Operating System',
                       'level': 2}],
  'id': 'Python',
  'type': 'Programming_language'}]
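The node-building code in this section produces only `id`, `type`, and the relationship arrays; the snippets here do not show how the `d3_source_node`, `d3_target_nodes`, and `d3_edges` fields in this sample are computed. A hypothetical helper that reproduces the same shape might look like this. It assumes every list-valued field that is not already a `d3_*` field is a relationship type, and it copies the constants from the sample (source: group 0, level 1; targets: group 1, level 2; strength 0.7).

```python
def add_d3_fields(doc, strength=0.7):
    """Hypothetical helper: derive D3-ready fields from a stored node document.

    Every list-valued field that is not already a d3_* field is treated as a
    relationship type. The group/level/strength constants copy the sample.
    """
    # Snapshot the relationship fields before adding the d3_* keys
    relationship_fields = [key for key, value in doc.items()
                           if isinstance(value, list) and not key.startswith("d3_")]
    source_id = doc["id"]
    doc["d3_source_node"] = {"group": 0, "id": source_id, "label": source_id, "level": 1}

    edges, target_nodes, seen = [], [], set()
    for rel_type in relationship_fields:
        for target_id in doc[rel_type]:
            edges.append({"linkName": rel_type, "source": source_id,
                          "strength": strength, "target": target_id})
            # One target node per id, even if several relationships point to it
            if target_id not in seen:
                seen.add(target_id)
                target_nodes.append({"group": 1, "id": target_id,
                                     "label": target_id, "level": 2})
    doc["d3_edges"] = edges
    doc["d3_target_nodes"] = target_nodes
    return doc


doc = {"id": "Python", "type": "Programming_language",
       "CREATED_BY": ["Guido Van Rossum"], "DEVELOPED_IN": ["Netherlands"]}
add_d3_fields(doc)
print(doc["d3_edges"][0])
```

Because `d3_*` lists are skipped when detecting relationship fields, running the helper twice on the same document rebuilds the D3 fields instead of nesting them.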

Object {source: "X1", target: "X2", type: "suit"}
Object {source: "X2", target: "X3", type: "resolved"}
Object {source: "X3", target: "X4", type: "suit"}
Object {source: "X4", target: "X1", type: "suit"}
columns: Array(3) [
  0: "source"
  1: "target"
  2: "type"
]

Object {source: "Napoleon", target: "Myriel", value: 1}
Object {source: "Mlle.Baptistine", target: "Myriel", value: 8}
Object {source: "Mme.Magloire", target: "Myriel", value: 10}
Object {source: "Mme.Magloire", target: "Mlle.Baptistine", value: 6}
Object {source: "Cravatte", target: "Myriel", value: 1}
Object {source: "Count", target: "Myriel", value: 2}
Object {source: "OldMan", target: "Myriel", value: 1}
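The two listings above show the link records that D3 force-directed examples consume: `{source, target, type}` in the first and `{source, target, value}` in the second. The stored `d3_edges` use `linkName` and `strength` instead, so reusing one of those examples as-is needs a field rename. A minimal sketch, assuming `linkName` should map to `type` and `strength` to `value` (the helper name is hypothetical; D3 passes unused link fields through untouched):

```python
def to_observable_links(d3_edges):
    """Map stored d3_edges to the link fields used by the two D3 examples.

    'type' carries the relationship name (first example) and
    'value' carries the stored strength (second example).
    """
    return [
        {
            "source": edge["source"],
            "target": edge["target"],
            "type": edge["linkName"],
            "value": edge["strength"],
        }
        for edge in d3_edges
    ]


edges = [{"linkName": "CREATED_BY", "source": "Python",
          "strength": 0.7, "target": "Guido Van Rossum"}]
print(to_observable_links(edges))
```

In the gallery versions of these examples, `type` typically selects a link color and `value` a stroke width, so check how the notebook you adapt reads them.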

data = Object {
  name: "flare"
  children: Array(10) [
    0: Object {
      name: "analytics"
      children: Array(3) [
        0: Object {name: "cluster", children: Array(4)}
        1: Object {name: "graph", children: Array(5)}
        2: Object {name: "optimization", children: Array(1)}
      ]
    }
    1: Object {name: "animate", children: Array(12)}
    2: Object {name: "data", children: Array(7)}
    3: Object {name: "display", children: Array(4)}
    4: Object {
      name: "flex"
      children: Array(1) [
        0: Object {
        }
      ]
    }
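Hierarchical layouts such as the flare example expect nested `{name, children}` objects (`d3.hierarchy` reads a node's children from its `children` key by default), whereas the knowledge-base documents store flat source/target edges. A sketch of the nesting step, with the hypothetical helper `build_hierarchy`, assuming the edges use the `source`/`target` keys of `d3_edges`:

```python
from collections import defaultdict


def build_hierarchy(root_id, links):
    """Nest flat {source, target} links into the {name, children} tree shape.

    A node is never expanded twice along the same root-to-leaf path, so
    cycles in the knowledge graph cannot cause infinite recursion.
    """
    children_of = defaultdict(list)
    for link in links:
        children_of[link["source"]].append(link["target"])

    def expand(node_id, path):
        node = {"name": node_id}
        kids = [expand(child, path | {child})
                for child in children_of[node_id] if child not in path]
        if kids:  # leaves carry no children key, as in the flare data
            node["children"] = kids
        return node

    return expand(root_id, {root_id})


tree = build_hierarchy("Python", [
    {"source": "Python", "target": "Netherlands"},
    {"source": "Python", "target": "Abc Programming Language"},
    {"source": "Abc Programming Language", "target": "Python"},  # a cycle back to the root
])
print(tree)
```

Nodes reachable through several paths are copied into each branch, which is what a tree layout needs but can grow quickly on densely connected graphs.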

{'nodes': [{'id': 'Python', 'group': 0, 'level': 1, 'label': 'Python'},
           {'id': 'Netherlands', 'group': 1, 'level': 2, 'label': 'Netherlands'}],
 'links': [{'source': 'Python', 'target': 'Netherlands', 'strength': 0.7},
           {'source': 'Python', 'target': 'Guido Van Rossum', 'strength': 0.7}]}

import json

from pymongo import MongoClient

master_lookup_set = set()  # ids of nodes already added to `nodes`
nodes = []
links = []

uri = "YOUR-MONGODB-CLUSTER-URL"
# Create the client outside try, so finally never references an undefined name
client = MongoClient(uri)
try:
    database = client["embedded_graph_2"]
    collection = database["Programming_language"]
    # Return only the id and the stored D3 fields, without _id
    cursor = collection.find({}, {'_id': 0, 'id': 1, 'd3_edges': 1, 'd3_target_nodes': 1, 'd3_source_node': 1})
    for document in cursor:
        print(document)
        # Add each node exactly once, whether it first appears as a source or a target;
        # otherwise shared targets are duplicated or later documents lose their targets
        for node in [document['d3_source_node'], *document['d3_target_nodes']]:
            if node['id'] not in master_lookup_set:
                master_lookup_set.add(node['id'])
                nodes.append(node)
        # Every edge is kept, even when both endpoints were seen before
        links.extend(document['d3_edges'])
except Exception as e:
    print(e)
finally:
    client.close()

nodes_links = {"nodes": nodes, "links": links}
with open("python-dependencies_embedded_2.json", 'w') as f:
    json.dump(nodes_links, f, indent=1)
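Before loading the exported JSON in the browser, it is worth checking that every link endpoint has a matching node: when `d3.forceLink` is given an id accessor such as `d => d.id`, it throws a "node not found" error for any unknown id. A small check, sketched with a hypothetical helper name and assuming the `{"nodes": [...], "links": [...]}` structure written above:

```python
def find_dangling_links(nodes_links):
    """Return links whose source or target id has no matching node."""
    node_ids = {node["id"] for node in nodes_links["nodes"]}
    return [link for link in nodes_links["links"]
            if link["source"] not in node_ids or link["target"] not in node_ids]


data = {
    "nodes": [{"id": "Python"}, {"id": "Netherlands"}],
    "links": [
        {"source": "Python", "target": "Netherlands", "strength": 0.7},
        {"source": "Python", "target": "Guido Van Rossum", "strength": 0.7},
    ],
}
print(find_dangling_links(data))
```

Running it on the loaded `python-dependencies_embedded_2.json` should return an empty list; any entries it returns point at nodes that were never added to the `nodes` array.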