Avoid Unbounded Arrays
Storing arrays as field values lets you embed data and ensure that data that is accessed together is stored together. However, if you do not limit the number of elements in an array, your documents might exceed the 16MB BSON document size limit. An unbounded array can strain application resources and decrease index performance.
Instead of embedding entire datasets, use subsetting and referencing to bound arrays, which can improve performance and keep document sizes manageable. When you subset data, you embed only the portion of the data that the application needs to access together, which reduces memory usage and processing time. When you reference data, you store the related data in a separate collection and link to it rather than embedding it directly in your documents, which reduces document size. By using subsetting and referencing, you can bound arrays and manage your data more efficiently.
Example
Consider the following schema that tracks book reviews for a bookstore application. The initial schema uses an array for the reviews field.
{
   title: "Harry Potter",
   author: "J.K. Rowling",
   publisher: "Scholastic",
   reviews: [
      { user: "Alice", review: "Great book!", rating: 5 },
      { user: "Bob", review: "Didn't like it!", rating: 1 },
      { user: "Charlie", review: "Not bad, but could be better.", rating: 3 }
   ]
}
In this schema, the reviews field is an unbounded array. Every time a new review is created for this book, the application adds a new sub-document to the reviews array. As more reviews are added, the array can grow too large and strain application resources.
In this example, the bookstore application only needs to show three reviews per book. To avoid unbounded arrays, you can use the subset design pattern or document references, depending on your use case.
Subset Pattern
Subsetting data works best when you need quick access to data that is not frequently updated. Using the subset pattern, you can embed three of the reviews in the book document to return all required information in a single operation. The other reviews are stored in a separate reviews collection. This schema design pattern provides the following benefits:
Eliminate the unbounded array
Control the document size
Avoid use of multiple queries
The books collection:
db.books.insertOne(
   {
      title: "Harry Potter",
      author: "J.K. Rowling",
      publisher: "Scholastic",
      reviews: [
         { reviewer: "Alice", review: "Great book!", rating: 5 },
         { reviewer: "Charlie", review: "Didn't like it.", rating: 1 },
         { reviewer: "Bob", review: "Not bad, but could be better.", rating: 3 }
      ]
   }
)
The reviews collection:
db.reviews.insertMany( [
   { reviewer: "Jason", review: "Did not enjoy!", rating: 1 },
   { reviewer: "Pam", review: "Favorite book!", rating: 5 },
   { reviewer: "Bob", review: "Not bad, but could be better.", rating: 3 }
] )
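This approach duplicates data, because a review that is embedded in the book document also exists in the reviews collection. Each change to a duplicated review must be applied in both places, which makes updates expensive. For this reason, the subset pattern is best if reviews are not frequently updated.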
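One way to keep the embedded reviews array bounded as new reviews arrive is to trim it in the same update that appends to it. The following is a minimal sketch, not a definitive implementation: the helper name buildBoundedReviewPush, the maxEmbedded parameter, and the sample review are hypothetical and do not appear in the schema above. The sketch relies on the $each and $slice modifiers of the $push operator, and it assumes the application wants to keep the most recently added reviews.

```javascript
// Hypothetical helper: builds an update document that appends one review
// to the embedded array and trims the array in the same operation.
function buildBoundedReviewPush(review, maxEmbedded = 3) {
  return {
    $push: {
      reviews: {
        $each: [review],      // $slice can only be used together with $each
        $slice: -maxEmbedded  // a negative value keeps the last N elements
      }
    }
  };
}

// Sample review used only for illustration.
const update = buildBoundedReviewPush(
  { reviewer: "Dana", review: "Loved it.", rating: 5 }
);
console.log(JSON.stringify(update, null, 2));

// In mongosh, the update could be applied like this (assumed usage):
// db.books.updateOne( { title: "Harry Potter" }, update )
```

With this sketch, the application would still insert every review into the reviews collection, so the embedded array remains a subset of the full data.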
Reference Data
Referencing data works best when you need to manage large or frequently updated datasets without inflating document sizes.
To reference data, store reviews in a separate collection and add a review_id field to the documents in the reviews collection. Then, in the books collection, store the review_id values of each book's reviews in the reviews array instead of embedding the reviews themselves.
This approach solves the problem of the unbounded array, but it introduces latency because you need to query the reviews collection to retrieve review information for the books collection. Depending on your use case, this additional latency may be an acceptable trade-off to avoid the issues caused by unbounded arrays.
The books collection:
db.books.insertMany( [
   {
      title: "Harry Potter",
      author: "J.K. Rowling",
      publisher: "Scholastic",
      reviews: [ "review1", "review2", "review3" ]
   },
   {
      title: "Pride and Prejudice",
      author: "Jane Austen",
      publisher: "Penguin",
      reviews: [ "review4", "review5" ]
   }
] )
The reviews collection:
db.reviews.insertMany( [
   { review_id: "review1", reviewer: "Jason", review: "Did not enjoy!", rating: 1 },
   { review_id: "review2", reviewer: "Pam", review: "Favorite book!", rating: 5 },
   { review_id: "review3", reviewer: "Bob", review: "Not bad, but could be better.", rating: 3 },
   { review_id: "review4", reviewer: "Tina", review: "Amazing!", rating: 5 },
   { review_id: "review5", reviewer: "Jacob", review: "A little overrated", rating: 4 }
] )
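The extra query that the reference approach requires can be sketched as follows. This is an illustrative sketch only: the helper name buildReviewsFilter is hypothetical, and it assumes the application has already fetched a book document whose reviews array holds review_id values, as in the collections above. The helper builds a filter that uses the $in operator to match every referenced review in one query.

```javascript
// Hypothetical helper: builds the filter for the second query, which
// fetches the reviews that a book document references by review_id.
function buildReviewsFilter(book) {
  return { review_id: { $in: book.reviews } };
}

// A book document shaped like the ones in the books collection.
const book = { title: "Pride and Prejudice", reviews: ["review4", "review5"] };
const filter = buildReviewsFilter(book);
console.log(JSON.stringify(filter));

// In mongosh, the two round trips could look like this (assumed usage):
// const found = db.books.findOne( { title: "Pride and Prejudice" } )
// db.reviews.find( buildReviewsFilter(found) )
```

The second round trip is the source of the additional latency described earlier in this section.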
Use $lookup to Join on an Array Field
If your books and reviews information is stored in separate collections, the application needs to perform a $lookup operation to join the data.
The following aggregation operation joins the books and reviews collections from the previous example.
db.books.aggregate( [
   {
      $lookup: {
         from: "reviews",
         localField: "reviews",
         foreignField: "review_id",
         as: "reviewDetails"
      }
   }
] )
The operation returns the following:
[
   {
      _id: ObjectId('665de81eeda086b5e22dbcc9'),
      title: 'Harry Potter',
      author: 'J.K. Rowling',
      publisher: 'Scholastic',
      reviews: [ 'review1', 'review2', 'review3' ],
      reviewDetails: [
         {
            _id: ObjectId('665de82beda086b5e22dbccb'),
            review_id: 'review1',
            reviewer: 'Jason',
            review: 'Did not enjoy!',
            rating: 1
         },
         {
            _id: ObjectId('665de82beda086b5e22dbccc'),
            review_id: 'review2',
            reviewer: 'Pam',
            review: 'Favorite book!',
            rating: 5
         },
         {
            _id: ObjectId('665de82beda086b5e22dbccd'),
            review_id: 'review3',
            reviewer: 'Bob',
            review: 'Not bad, but could be better.',
            rating: 3
         }
      ]
   },
   {
      _id: ObjectId('665de81eeda086b5e22dbcca'),
      title: 'Pride and Prejudice',
      author: 'Jane Austen',
      publisher: 'Penguin',
      reviews: [ 'review4', 'review5' ],
      reviewDetails: [
         {
            _id: ObjectId('665de82beda086b5e22dbcce'),
            review_id: 'review4',
            reviewer: 'Tina',
            review: 'Amazing!',
            rating: 5
         },
         {
            _id: ObjectId('665de82beda086b5e22dbccf'),
            review_id: 'review5',
            reviewer: 'Jacob',
            review: 'A little overrated',
            rating: 4
         }
      ]
   }
]
In this example, the $lookup operation joins the books collection with the reviews collection by matching the values in each book document's reviews array against the review_id field in the reviews documents. In each output document, the reviewDetails array field stores the matching review documents.
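An application that displays a single book usually does not need to join every document in the books collection. The following sketch narrows the join to one book by placing a $match stage before the $lookup stage. The helper name buildBookWithReviewsPipeline is hypothetical, and matching on title is an assumption made for illustration, since the example documents have no other application-level key.

```javascript
// Hypothetical helper: builds a pipeline that selects one book by title
// and then joins its referenced reviews into a reviewDetails array.
function buildBookWithReviewsPipeline(title) {
  return [
    { $match: { title: title } },  // filter first so only one book is joined
    {
      $lookup: {
        from: "reviews",
        localField: "reviews",      // array of review_id values in the book
        foreignField: "review_id",  // matching field in the reviews documents
        as: "reviewDetails"
      }
    }
  ];
}

const pipeline = buildBookWithReviewsPipeline("Harry Potter");
console.log(JSON.stringify(pipeline, null, 2));

// In mongosh (assumed usage):
// db.books.aggregate( buildBookWithReviewsPipeline("Harry Potter") )
// An index on the foreign field may help the join, for example:
// db.reviews.createIndex( { review_id: 1 } )
```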