Embedded Data Versus References
Effective data models support your application's needs. One key decision for your schema design is whether to embed data or use references.
Embedded Data Models
You can embed related data in a single document. In the following example, the contact and access fields are embedded documents:
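As a minimal sketch of such a document, the structure can be written as a Python dictionary. The field values, the `_id` format, and the subfields of `contact` and `access` are assumptions chosen for illustration.

```python
# Hypothetical user document; every value below is illustrative only.
user = {
    "_id": "user-001",
    "username": "123xyz",
    "contact": {  # embedded document
        "phone": "123-456-7890",
        "email": "xyz@example.com",
    },
    "access": {  # embedded document
        "level": 5,
        "group": "dev",
    },
}

# Related data arrives together with the parent document,
# so no second lookup is needed to read it.
print(user["contact"]["email"])
print(user["access"]["level"])
```

Because `contact` and `access` live inside the parent, one read of the `user` document returns all three pieces of information.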
Embedded data models are often denormalized, because frequently accessed data is duplicated in multiple collections.
Embedded data models let applications query related pieces of information in the same database record. As a result, applications require fewer queries and updates to complete common operations.
Use Cases
Use embedded data models in the following scenarios:
You have "contains" relationships between entities. For example, a contacts document that contains an address. See Model One-to-One Relationships with Embedded Documents.
You have one-to-many relationships between entities. In these relationships, the "many" or child documents are viewed in the context of the "one" or parent documents. See Model One-to-Many Relationships with Embedded Documents.
Embedding provides the following benefits:
Better performance for read operations
The ability to retrieve related data in a single database operation
The ability to update related data in a single atomic write operation
Query Embedded Data
To query data within embedded documents, use dot notation. For examples of querying data in arrays and embedded documents, see:
Note
Document Size Limit
Documents in MongoDB must be smaller than 16 megabytes.
For large binary data, consider GridFS.
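To illustrate the dot notation described in this section, the sketch below builds a query filter whose key is a dotted path such as `contact.email`. With a driver such as PyMongo, a filter of this shape would be passed to `collection.find()` and evaluated by the server. To keep the example runnable without a database, the hypothetical helpers `get_path` and `matches` imitate equality matching on nested dictionaries only. They are not MongoDB's implementation and ignore arrays and query operators.

```python
from typing import Any

_MISSING = object()  # sentinel for "path not present in the document"


def get_path(document: dict, dotted_path: str) -> Any:
    """Walk nested dicts along a dotted path; return _MISSING if absent."""
    current: Any = document
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current


def matches(document: dict, query_filter: dict) -> bool:
    """Equality-only matching on dotted paths (simplified illustration)."""
    return all(
        get_path(document, path) == expected
        for path, expected in query_filter.items()
    )


# Hypothetical documents with embedded contact and access fields.
users = [
    {"_id": "user-001",
     "contact": {"email": "xyz@example.com"},
     "access": {"level": 5}},
    {"_id": "user-002",
     "contact": {"email": "abc@example.com"},
     "access": {"level": 2}},
]

# The dotted key reaches into the embedded contact document.
query_filter = {"contact.email": "xyz@example.com"}
found = [u["_id"] for u in users if matches(u, query_filter)]
print(found)
```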
References
References store relationships between data by including links, called references, from one document to another. In the following example, the contact and access documents contain a reference to the user document.
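A minimal sketch of this layout, using Python dictionaries in place of stored documents. The `user_id` field name and all values are assumptions for illustration. Any field that holds the `_id` of the referenced document would serve the same purpose.

```python
# Hypothetical documents that would live in three separate collections.
user = {"_id": "user-001", "username": "123xyz"}

contact = {
    "_id": "contact-001",
    "user_id": "user-001",  # reference to the user document
    "phone": "123-456-7890",
    "email": "xyz@example.com",
}

access = {
    "_id": "access-001",
    "user_id": "user-001",  # reference to the user document
    "level": 5,
    "group": "dev",
}

# Resolving a reference takes a second lookup keyed on the stored _id.
users_by_id = {user["_id"]: user}
owner = users_by_id[contact["user_id"]]
print(owner["username"])
```

The user data is stored once. The cost is that reading a contact together with its user takes an additional lookup.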
References result in normalized data models because data is divided into multiple collections and not duplicated.
Use Cases
Use references to link related data in the following scenarios:
Embedding would result in duplication of data but would not provide sufficient read performance advantages to outweigh the implications of the duplication. This is often the case when the embedded data changes frequently.
You need to represent complex many-to-many relationships or large hierarchical data sets.
The related entity is frequently queried on its own. For example, if you have employee and department data, you may consider embedding department information in the employee documents. However, if you often query for a list of departments, your application will perform best with a separate department collection that is linked to the employee collection with a reference.
Query Normalized Data Models
To query normalized data in multiple collections, MongoDB provides the following aggregation stages:
$lookup
$graphLookup
For an example of normalized data models, see Model One-to-Many Relationships with Document References.
For examples of various tree models, see Model Tree Structures.
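As a sketch of how normalized collections can be joined at query time, the example below writes a pipeline containing MongoDB's `$lookup` aggregation stage as a Python list, in the form a driver such as PyMongo would accept in `collection.aggregate()`. The collection names, field names, and data are assumptions. The helper `emulate_lookup` is hypothetical. It imitates the equality join locally so the example runs without a server, and it does not reproduce every server-side rule.

```python
# Hypothetical contents of two normalized collections.
departments = [
    {"_id": "dept-eng", "name": "Engineering"},
    {"_id": "dept-ops", "name": "Operations"},
]
employees = [
    {"_id": "emp-1", "name": "Ana", "department_id": "dept-eng"},
    {"_id": "emp-2", "name": "Ben", "department_id": "dept-eng"},
    {"_id": "emp-3", "name": "Cy", "department_id": "dept-ops"},
]

# Pipeline as it would be run against the departments collection.
pipeline = [
    {"$lookup": {
        "from": "employees",              # collection to join with
        "localField": "_id",              # field in departments
        "foreignField": "department_id",  # field in employees
        "as": "staff",                    # output array field
    }}
]


def emulate_lookup(local_docs, foreign_docs, stage):
    """Simplified local stand-in for an equality-match $lookup."""
    spec = stage["$lookup"]
    joined = []
    for doc in local_docs:
        key = doc.get(spec["localField"])
        related = [f for f in foreign_docs
                   if f.get(spec["foreignField"]) == key]
        # Copy the document and attach the matching foreign documents.
        joined.append({**doc, spec["as"]: related})
    return joined


result = emulate_lookup(departments, employees, pipeline[0])
for dept in result:
    print(dept["name"], [e["name"] for e in dept["staff"]])
```

Each department keeps its own document and gains a `staff` array only in the query result, so the stored data stays free of duplication.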
Learn More
For more information on data modeling with MongoDB, download the MongoDB Application Modernization Guide.
The download includes the following resources:
Presentation on the methodology of data modeling with MongoDB
White paper covering best practices and considerations for migrating to MongoDB from an RDBMS data model
Reference MongoDB schema with its RDBMS equivalent
Application Modernization scorecard