Building with Patterns: The Single Collection Pattern

Daniel Coupal • 4 min read • Published Jan 27, 2025 • Updated Jan 27, 2025

MongoDB Schema

Introduction

In a previous data modeling article, we described different data types and how they fare with data duplication. A many-to-many relationship with data that can't sustain data duplication may not be a good candidate for embedding the relationship or applying the Extended Reference Pattern. However, there is another way this relationship can still be optimized when working with the document model.

The Single Collection Pattern

The Single Collection Pattern is inspired by the Single Table Pattern and adapted for the document model. The Single Table Pattern has been described extensively by Amazon employees. It is used in many DynamoDB applications to reduce costs, improve performance, and avoid some of the common pitfalls developers run into with NoSQL databases.

The pattern is sometimes called the Adjacency Pattern. The Single Collection Pattern groups related documents of different types into a single collection. The documents would have otherwise been in various collections.

Three characteristics define the pattern:

All related documents that are frequently accessed together are stored in the same collection.
Relationships between the documents are stored as pointers or other structures within the document.
An index is built on the field or array that maps the relationships. Such an index supports retrieving all related documents in a single query without database join operations.

The pattern helps model many-to-many relationships, allowing one to keep only one copy of the information. This avoids using data duplication where the cost of the duplication exceeds the benefits.

There is an additional scenario where this pattern excels. This is the case where frequent updates are done on large documents. By breaking the document into smaller parts, a database engine that rewrites the whole document to disk for every change can perform smaller write operations and improve performance.
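As a rough illustration of the idea above, the sketch below splits one large document into a small root document and separate part documents that share an _id prefix. The product document, the splitDocument helper, and the use of JSON length as a stand-in for write size are all assumptions made for this example; a real storage engine measures writes differently.

```javascript
// Hypothetical large document; all field names are illustrative only.
const product = {
  _id: "P1",
  doc_type: "product",
  name: "Widget",
  description: "x".repeat(5000),
  reviews: Array.from({ length: 200 }, (_, i) => ({ n: i, text: "ok" })),
  stock: 10,
};

// Move the listed fields into their own documents, keyed by "<parent _id>-<field>".
function splitDocument(doc, partFields) {
  const root = {};
  const parts = [];
  for (const [key, value] of Object.entries(doc)) {
    if (partFields.includes(key)) {
      parts.push({ _id: `${doc._id}-${key}`, doc_type: key, [key]: value });
    } else {
      root[key] = value;
    }
  }
  return [root, ...parts];
}

const docs = splitDocument(product, ["description", "reviews"]);

// JSON length is only a crude proxy for how much data a full rewrite touches.
const bytes = (d) => JSON.stringify(d).length;
const wholeSize = bytes(product);
const rootSize = bytes(docs[0]);
console.log({ wholeSize, rootSize });
```

With this layout, changing a frequently updated field such as stock only rewrites the small root document, while the bulky parts stay untouched.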

If the pattern is used to model a many-to-many relationship, one must use an array of references and maintain these references bidirectionally. Each array must also contain at least one reference to the document itself, as shown in the example later. For the other types of relationships, one can alternatively build the relationship through a single field. For example, let's take a customer that has many invoices, and each invoice has many line items. We can build the _id field so that it identifies each document and encodes the relationship of that document to its parent document.

{
  "_id": "C12345",
  "doc_type": "customer",
  ...
}

{
  "_id": "C12345-I34562",
  "doc_type": "invoice",
  ...
}

{
  "_id": "C12345-I34562-L0001",
  "doc_type": "line item",
  ...
}

A query like this would retrieve an invoice and all its line items.

find({"_id": /^C12345-I34562/})
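To see which documents such a prefix expression selects, here is a minimal in-memory sketch that applies the same regular expression to plain JavaScript objects. It does not talk to a database. The extra documents, including an invoice with the longer identifier I345620, are hypothetical and only serve to show one edge case: if identifiers can vary in length, a bare prefix can also match a sibling invoice, and anchoring on the separator avoids that.

```javascript
// Hypothetical documents following the _id scheme from the example above.
const docs = [
  { _id: "C12345", doc_type: "customer" },
  { _id: "C12345-I34562", doc_type: "invoice" },
  { _id: "C12345-I34562-L0001", doc_type: "line item" },
  { _id: "C12345-I34562-L0002", doc_type: "line item" },
  { _id: "C12345-I34563", doc_type: "invoice" },
  { _id: "C12345-I345620", doc_type: "invoice" }, // assumed longer id
];

// Return the _id of every document whose _id matches the pattern.
const idsMatching = (pattern) =>
  docs.filter((d) => pattern.test(d._id)).map((d) => d._id);

// Same expression as in the query above.
const loose = idsMatching(/^C12345-I34562/);

// Stricter variant: the prefix must be followed by "-" or the end of the string.
const strict = idsMatching(/^C12345-I34562(-|$)/);

console.log(loose);
console.log(strict);
```

With fixed-width identifiers, as in the article's example, both expressions select the same documents.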

A use case with the Single Collection Pattern

Let's use an application that allows students to see the status of the classes they follow in a given semester. A class document will be the specific instance of the class, in other words, the class taught by a professor at a given time. The primary entity of our system is the student. This is the central entity that is queried by the system.

If we were to embed one side of the relationship, it would be the classes, the secondary entity. Because the system updates many things about a class, such as the next session and the summary of the previous session, we don't want to duplicate the information for each student. It makes sense to keep the entities separated, but let's apply the Single Collection Pattern.

{
  "_id": "CS101-001",
  "doc_type": "class",
  "class_name": "Introduction to Programming",
  "course_id": "CS101",
  "instructor": {
    "name": "Dr. Emily Smith",
    "email": "emily.smith@example.com",
    "office_hours": "Tuesdays and Thursdays 2:00 PM - 4:00 PM"
  },
  "semester": "Spring 2025",
  "schedule": [
    {
      "day_time": "Monday 10:00 AM - 11:30 AM",
      "location": "Room 101, Science Building"
    },
    {
      "day_time": "Wednesday 10:00 AM - 11:30 AM",
      "location": "Room 101, Science Building"
    },
    {
      "day_time": "Friday 10:00 AM - 11:30 AM",
      "location": "Room 101, Science Building"
    }
  ],
  "current_topic": "Loops and Iterations",
  "next_class_time": "2025-01-09T10:00:00Z",
  "upcoming_session_summary": "We will explore different types of loops (for, while) and how to use them effectively in Python.",
  "links": [
    { "target": "CS101-001", "doc_type": "class" },
    { "target": "S10023", "doc_type": "student" },
    { "target": "S12345", "doc_type": "student" },
    ...
    { "target": "S12355", "doc_type": "student" }
  ]
}

Note that the doc_type and target fields in the links array allow us to maintain the relationship between our entities and filter them by document type.
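As a small sketch of that filtering, the snippet below reads the links array of an abbreviated class document in plain JavaScript and keeps only the entries of one document type. The linkedIds helper is a name invented for this example, and the document is trimmed to the fields the helper needs.

```javascript
// Abbreviated class document; only the fields used here are kept.
const classDoc = {
  _id: "CS101-001",
  doc_type: "class",
  links: [
    { target: "CS101-001", doc_type: "class" },
    { target: "S10023", doc_type: "student" },
    { target: "S12345", doc_type: "student" },
  ],
};

// Hypothetical helper: list the targets of all links of a given document type.
function linkedIds(doc, docType) {
  return doc.links
    .filter((link) => link.doc_type === docType)
    .map((link) => link.target);
}

const studentIds = linkedIds(classDoc, "student");
console.log(studentIds);
```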

Example of a student document:

{
  "_id": "S12345",
  "doc_type": "student",
  "name": "Jane Doe",
  "major": "Computer Science",
  "semester": "Spring 2025",
  "registered_classes": [
    {
      "course_id": "CS101",
      "class_instance_id": "CS101-001",
      "class_name": "Introduction to Programming"
    },
    {
      "course_id": "MATH201",
      "class_instance_id": "MATH201-002",
      "class_name": "Calculus II"
    }
  ],
  "links": [
    { "target": "CS101-001", "doc_type": "class" },
    { "target": "MATH201-002", "doc_type": "class" },
    { "target": "S12345", "doc_type": "student" }
  ]
}

If there is no need to query for all students in a given class, the links array of a student document does not need to include pointers to the classes. However, it must keep one reference to itself so that a single query can return the student document and all associated class documents.
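Keeping the references bidirectional means that every enrollment touches two documents. The following in-memory sketch shows one way this bookkeeping could look. The enroll and addLink helpers are hypothetical names, and the array stands in for the collection. In a real deployment the two updates would have to be applied together, for example inside a transaction, so that the links never disagree.

```javascript
// In-memory stand-in for the collection; each document starts with only its self-reference.
const collection = [
  { _id: "S12345", doc_type: "student", links: [{ target: "S12345", doc_type: "student" }] },
  { _id: "CS101-001", doc_type: "class", links: [{ target: "CS101-001", doc_type: "class" }] },
];

// Add a link unless an identical one is already present.
function addLink(doc, target, docType) {
  const exists = doc.links.some(
    (link) => link.target === target && link.doc_type === docType
  );
  if (!exists) doc.links.push({ target, doc_type: docType });
}

// Hypothetical helper: record the enrollment on both sides of the relationship.
function enroll(docs, studentId, classId) {
  const student = docs.find((d) => d._id === studentId && d.doc_type === "student");
  const cls = docs.find((d) => d._id === classId && d.doc_type === "class");
  if (!student || !cls) throw new Error("unknown student or class");
  addLink(student, classId, "class"); // student -> class
  addLink(cls, studentId, "student"); // class -> student
}

enroll(collection, "S12345", "CS101-001");
enroll(collection, "S12345", "CS101-001"); // repeating the call adds nothing

console.log(JSON.stringify(collection, null, 2));
```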

Next, we index the links structure to optimize the queries.

db.students_classes.createIndex({"links.target":1, "links.doc_type": 1})

The following query retrieves a student and all the classes they take with a single indexed lookup, without doing any joins.

db.students_classes.find({"links.target": "S12345"})

The following query retrieves all the registered students in the CS101-001 class.

db.students_classes.find({"doc_type": "student", "links.target": "CS101-001"})
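To make the selection logic of the two queries above concrete, here is a minimal in-memory sketch that applies the same conditions to a plain array. It does not use the MongoDB driver. The findByLink helper is a hypothetical name, and the documents are trimmed to _id, doc_type, and links; the links of S10023 and MATH201-002 are assumed for the example.

```javascript
// Trimmed documents; the links of S10023 and MATH201-002 are assumptions.
const studentsClasses = [
  { _id: "S12345", doc_type: "student", links: [
    { target: "CS101-001", doc_type: "class" },
    { target: "MATH201-002", doc_type: "class" },
    { target: "S12345", doc_type: "student" },
  ] },
  { _id: "S10023", doc_type: "student", links: [
    { target: "CS101-001", doc_type: "class" },
    { target: "S10023", doc_type: "student" },
  ] },
  { _id: "CS101-001", doc_type: "class", links: [
    { target: "CS101-001", doc_type: "class" },
    { target: "S10023", doc_type: "student" },
    { target: "S12345", doc_type: "student" },
  ] },
  { _id: "MATH201-002", doc_type: "class", links: [
    { target: "MATH201-002", doc_type: "class" },
    { target: "S12345", doc_type: "student" },
  ] },
];

// Keep documents that have at least one link to `target`,
// optionally restricted to documents of type `docType`.
function findByLink(docs, target, docType) {
  return docs.filter(
    (d) =>
      (docType === undefined || d.doc_type === docType) &&
      d.links.some((link) => link.target === target)
  );
}

// Mirrors find({"links.target": "S12345"}): the student plus their classes.
const studentAndClasses = findByLink(studentsClasses, "S12345").map((d) => d._id);

// Mirrors find({"doc_type": "student", "links.target": "CS101-001"}).
const classRoster = findByLink(studentsClasses, "CS101-001", "student").map((d) => d._id);

console.log(studentAndClasses);
console.log(classRoster);
```

The first result includes the student document only because its links array carries a reference to itself; the second depends on the student documents pointing to their classes.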

Conclusion

The Single Collection Pattern is excellent for modeling systems with high-velocity operations, low-latency requirements, and frequent write operations, for example, when data is added to documents in bits and pieces over time. By breaking down a document, you avoid performance issues with storage engines that rewrite full documents upon any update. The pattern also shines when modeling many-to-many relationships where data duplication would be a significant issue.

On the downside, the pattern adds complexity to the code and the management of the references. One should prefer embedding or the Extended Reference Pattern when simplicity matters more or when data duplication is acceptable and manageable. If the performance requirement is for reading many sections of data at once, one would prefer preassembling the different pieces of data in a document with the shape to be read; in other words, one would prefer a larger document where entities are embedded.

To learn more about the other patterns, look at the list in this summary, read the book MongoDB Data Modeling and Schema Design, or take the free Data Modeling classes at MongoDB University.

