Model One-to-Many Relationships with Embedded Documents
Overview
This page describes a data model that uses embedded documents to describe a one-to-many relationship between connected data. Embedding connected data in a single document can reduce the number of read operations required to obtain data. In general, you should structure your schema so your application receives all of its required information in a single read operation.
Embedded Document Pattern
Consider the following example that maps the relationship between a patron and multiple addresses. The example illustrates the advantage of embedding over referencing if you need to view many data entities in the context of another. In this one-to-many relationship between patron and address data, the patron has multiple address entities.
In the normalized data model, the address documents contain a reference to the patron document.
// patron document
{
   _id: "joe",
   name: "Joe Bookreader"
}

// address documents
{
   patron_id: "joe", // reference to patron document
   street: "123 Fake Street",
   city: "Faketon",
   state: "MA",
   zip: "12345"
}

{
   patron_id: "joe",
   street: "1 Some Other Street",
   city: "Boston",
   state: "MA",
   zip: "12345"
}
If your application frequently retrieves the address data with the name information, then your application needs to issue multiple queries to resolve the references. A better schema embeds the address data entities in the patron data, as in the following document:
{
   "_id": "joe",
   "name": "Joe Bookreader",
   "addresses": [
      {
         "street": "123 Fake Street",
         "city": "Faketon",
         "state": "MA",
         "zip": "12345"
      },
      {
         "street": "1 Some Other Street",
         "city": "Boston",
         "state": "MA",
         "zip": "12345"
      }
   ]
}
With the embedded data model, your application can retrieve the complete patron information with one query.
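The difference in read operations can be sketched in plain JavaScript. This is a minimal illustration, not driver code: the arrays stand in for collections, and the `find`, `loadPatronNormalized`, and `loadPatronEmbedded` helpers are hypothetical names introduced only to count simulated reads.

```javascript
// In-memory stand-ins for collections; a real application would query MongoDB.
const patrons = [{ _id: "joe", name: "Joe Bookreader" }];
const addresses = [
  { patron_id: "joe", street: "123 Fake Street", city: "Faketon", state: "MA", zip: "12345" },
  { patron_id: "joe", street: "1 Some Other Street", city: "Boston", state: "MA", zip: "12345" },
];
const patronsEmbedded = [
  {
    _id: "joe",
    name: "Joe Bookreader",
    // Same address data, embedded without the patron_id reference.
    addresses: addresses.map(({ patron_id, ...address }) => address),
  },
];

let reads = 0; // counts simulated read operations

// Hypothetical helper: each call represents one read against one collection.
function find(collection, predicate) {
  reads += 1;
  return collection.filter(predicate);
}

// Normalized model: one read for the patron, one more for the addresses.
function loadPatronNormalized(id) {
  const [patron] = find(patrons, (p) => p._id === id);
  const patronAddresses = find(addresses, (a) => a.patron_id === id);
  return { ...patron, addresses: patronAddresses };
}

// Embedded model: a single read returns the patron with all addresses.
function loadPatronEmbedded(id) {
  const [patron] = find(patronsEmbedded, (p) => p._id === id);
  return patron;
}

reads = 0;
const normalized = loadPatronNormalized("joe");
const normalizedReads = reads;

reads = 0;
const embedded = loadPatronEmbedded("joe");
const embeddedReads = reads;

console.log(normalizedReads, embeddedReads); // prints: 2 1
```

Both loaders return the same patron with two addresses. Only the number of simulated reads differs.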
Subset Pattern
A potential problem with the embedded document pattern is that it can lead to large documents, especially if the embedded field is unbounded. In this case, you can use the subset pattern to access only the data that the application requires, instead of the entire set of embedded data.
Consider an e-commerce site that has a list of reviews for a product:
{
   "_id": 1,
   "name": "Super Widget",
   "description": "This is the most useful item in your toolbox.",
   "price": { "value": NumberDecimal("119.99"), "currency": "USD" },
   "reviews": [
      {
         "review_id": 786,
         "review_author": "Kristina",
         "review_text": "This is indeed an amazing widget.",
         "published_date": ISODate("2019-02-18")
      },
      {
         "review_id": 785,
         "review_author": "Trina",
         "review_text": "Nice product. Slow shipping.",
         "published_date": ISODate("2019-02-17")
      },
      ...
      {
         "review_id": 1,
         "review_author": "Hans",
         "review_text": "Meh, it's okay.",
         "published_date": ISODate("2017-12-06")
      }
   ]
}
The reviews are sorted in reverse chronological order. When a user visits a product page, the application loads the ten most recent reviews.
Instead of storing all of the reviews with the product, you can split the collection into two collections:
The product collection stores information on each product, including the product's ten most recent reviews:

{
   "_id": 1,
   "name": "Super Widget",
   "description": "This is the most useful item in your toolbox.",
   "price": { "value": NumberDecimal("119.99"), "currency": "USD" },
   "reviews": [
      {
         "review_id": 786,
         "review_author": "Kristina",
         "review_text": "This is indeed an amazing widget.",
         "published_date": ISODate("2019-02-18")
      },
      ...
      {
         "review_id": 776,
         "review_author": "Pablo",
         "review_text": "Amazing!",
         "published_date": ISODate("2019-02-16")
      }
   ]
}

The review collection stores all reviews. Each review contains a reference to the product for which it was written.

{
   "review_id": 786,
   "product_id": 1,
   "review_author": "Kristina",
   "review_text": "This is indeed an amazing widget.",
   "published_date": ISODate("2019-02-18")
}
{
   "review_id": 785,
   "product_id": 1,
   "review_author": "Trina",
   "review_text": "Nice product. Slow shipping.",
   "published_date": ISODate("2019-02-17")
}
...
{
   "review_id": 1,
   "product_id": 1,
   "review_author": "Hans",
   "review_text": "Meh, it's okay.",
   "published_date": ISODate("2017-12-06")
}
By storing the ten most recent reviews in the product collection, only the required subset of the overall data is returned in the call to the product collection. If a user wants to see additional reviews, the application makes a call to the review collection.
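The split itself can be sketched as a small transformation in plain JavaScript. This is an illustrative sketch under stated assumptions: `splitProduct` and `SUBSET_SIZE` are hypothetical names, the 25 sample reviews are made up, and plain `Date` values stand in for `ISODate`.

```javascript
const SUBSET_SIZE = 10; // number of reviews shown by default in the example

// Hypothetical helper: splits one fully embedded product document into
// a product document (with a subset of reviews) and standalone review documents.
function splitProduct(fullProduct, subsetSize = SUBSET_SIZE) {
  // Sort a copy newest first so the input document is not mutated.
  const newestFirst = [...fullProduct.reviews].sort(
    (a, b) => b.published_date - a.published_date
  );
  // The product keeps only the most recent reviews.
  const product = { ...fullProduct, reviews: newestFirst.slice(0, subsetSize) };
  // Every review also becomes its own document with a reference to the product.
  const reviews = newestFirst.map((r) => ({ ...r, product_id: fullProduct._id }));
  return { product, reviews };
}

// Made-up sample data: 25 reviews, one per day, oldest first.
const fullProduct = {
  _id: 1,
  name: "Super Widget",
  reviews: Array.from({ length: 25 }, (_, i) => ({
    review_id: i + 1,
    review_author: `author-${i + 1}`,
    review_text: `review ${i + 1}`,
    published_date: new Date(Date.UTC(2019, 0, i + 1)),
  })),
};

const { product, reviews } = splitProduct(fullProduct);
console.log(product.reviews.length, reviews.length); // prints: 10 25
console.log(product.reviews[0].review_id); // prints: 25
```

The product page would read only `product`, and a "show more reviews" action would read from `reviews` filtered by `product_id`.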
Tip
When considering where to split your data, the most frequently accessed portion of the data should go in the collection that the application loads first. In this example, the schema is split at ten reviews because that is the number of reviews visible in the application by default.
See also:
To learn how to use the subset pattern to model one-to-one relationships between collections, see Model One-to-One Relationships with Embedded Documents.
Trade-Offs of the Subset Pattern
Using smaller documents containing more frequently-accessed data reduces the overall size of the working set. These smaller documents result in improved read performance for the data that the application accesses most frequently.
However, the subset pattern results in data duplication. In the example, reviews are maintained in both the product collection and the review collection. Extra steps must be taken to ensure that the reviews are consistent between the two collections. For example, when a customer edits their review, the application may need to make two write operations: one to update the product collection and one to update the review collection.
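The double write can be sketched in plain JavaScript with in-memory arrays standing in for the two collections. The `editReviewText` helper and the collection variable names are hypothetical, and the sample data is trimmed to three reviews for brevity.

```javascript
// In-memory stand-ins for the product and review collections.
const productCollection = [
  {
    _id: 1,
    name: "Super Widget",
    reviews: [
      { review_id: 786, review_author: "Kristina", review_text: "This is indeed an amazing widget." },
      { review_id: 785, review_author: "Trina", review_text: "Nice product. Slow shipping." },
    ],
  },
];
const reviewCollection = [
  { review_id: 786, product_id: 1, review_author: "Kristina", review_text: "This is indeed an amazing widget." },
  { review_id: 785, product_id: 1, review_author: "Trina", review_text: "Nice product. Slow shipping." },
  { review_id: 1, product_id: 1, review_author: "Hans", review_text: "Meh, it's okay." },
];

// Hypothetical helper: applies an edit to every copy of a review and
// returns the number of simulated write operations it needed.
function editReviewText(reviewId, newText) {
  let writes = 0;

  // Write 1: the full copy in the review collection.
  const review = reviewCollection.find((r) => r.review_id === reviewId);
  if (!review) return writes;
  review.review_text = newText;
  writes += 1;

  // Write 2: the embedded copy, only if the review is in the product's subset.
  const product = productCollection.find((p) => p._id === review.product_id);
  const embedded = product && product.reviews.find((r) => r.review_id === reviewId);
  if (embedded) {
    embedded.review_text = newText;
    writes += 1;
  }
  return writes;
}

const recentEditWrites = editReviewText(785, "Nice product. Shipping was slow.");
const oldEditWrites = editReviewText(1, "Actually, it's pretty good.");
console.log(recentEditWrites, oldEditWrites); // prints: 2 1
```

In a real deployment the two writes are separate operations, so the application would need to decide how to handle a failure between them, for example by retrying or by using a transaction.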
You must also implement logic in your application to ensure that the reviews in the product collection are always the ten most recent reviews for that product.
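One way to picture that logic is a helper that adds a new review to the embedded array and trims it back to ten entries. The sketch below is plain JavaScript on an in-memory document. `withNewReview` and `MAX_EMBEDDED_REVIEWS` are hypothetical names, and the sample reviews carry only the fields the logic needs.

```javascript
const MAX_EMBEDDED_REVIEWS = 10;

// Hypothetical helper: returns a copy of the product whose embedded reviews
// include the new review and are trimmed to the most recent ones.
function withNewReview(product, review, limit = MAX_EMBEDDED_REVIEWS) {
  const reviews = [...product.reviews, review]
    .sort((a, b) => b.published_date - a.published_date) // newest first
    .slice(0, limit); // drop anything beyond the limit
  return { ...product, reviews };
}

// Made-up sample data: a product that already embeds ten reviews (ids 786 to 777).
const before = {
  _id: 1,
  name: "Super Widget",
  reviews: Array.from({ length: 10 }, (_, i) => ({
    review_id: 786 - i,
    published_date: new Date(Date.UTC(2019, 1, 18 - i)),
  })),
};

const after = withNewReview(before, {
  review_id: 787,
  published_date: new Date(Date.UTC(2019, 1, 19)),
});

console.log(after.reviews.length); // prints: 10
console.log(after.reviews[0].review_id); // prints: 787
console.log(after.reviews.some((r) => r.review_id === 777)); // prints: false
```

On the server side, MongoDB's `$push` update operator accepts `$each`, `$sort`, and `$slice` modifiers that can express a similar append-and-trim in a single update. Whether that fits depends on how the application writes reviews.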
Other Sample Use Cases
In addition to product reviews, the subset pattern can also be a good fit to store:
Comments on a blog post, when you only want to show the most recent or highest-rated comments by default.
Cast members in a movie, when you only want to show cast members with the largest roles by default.