Schema Design Anti-Patterns

Avoid Unbounded Arrays

Storing arrays as field values lets you embed related data so that data accessed together is stored together. However, if you do not limit the number of elements in an array, a document can grow past the 16 MB BSON document size limit. Even below that limit, an unbounded array can strain application resources and reduce index performance, because a multikey index on the array stores a separate index key for every element.
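To get a feel for where the limit applies, the following Node.js sketch estimates how many array elements of a given size fit in one document. It is a rough approximation under a stated assumption: it measures the UTF-8 length of `JSON.stringify` output rather than the real BSON encoding, which adds type tags, length prefixes, and array index keys, so treat the result as an order-of-magnitude figure. The helpers `approxBytes` and `estimateMaxElements` are hypothetical.

```javascript
// MongoDB's maximum BSON document size: 16 MB (16 * 1024 * 1024 bytes).
const MAX_BSON_BYTES = 16 * 1024 * 1024;

// Rough size proxy (assumption): UTF-8 byte length of the JSON text.
// Real BSON sizes differ because BSON adds type tags, length prefixes,
// and stores array indexes ("0", "1", ...) as field names.
function approxBytes(value) {
  return Buffer.byteLength(JSON.stringify(value), "utf8");
}

// Hypothetical helper: estimate how many copies of `element` fit in the
// reviews array of `baseDoc` before the approximate size passes the limit.
function estimateMaxElements(baseDoc, element) {
  const base = approxBytes(baseDoc);
  const perElement = approxBytes(element) + 1; // +1 for the comma between elements
  return Math.floor((MAX_BSON_BYTES - base) / perElement);
}

const book = { title: "Harry Potter", author: "J.K. Rowling", publisher: "Scholastic", reviews: [] };
const review = { user: "Alice", review: "Great book!", rating: 5 };

console.log(estimateMaxElements(book, review));
```

Longer reviews shrink this number quickly, and the resource and index costs described above start well before the hard limit is reached.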

Instead of embedding entire datasets, use subsetting and referencing to bound arrays and keep document sizes manageable. When you subset data, you embed only the portion of the data that your application needs most often, which reduces memory usage and processing time. When you reference data, you store related data in a separate collection and link to it from your documents instead of embedding it directly, which keeps documents small. Both approaches let you bound arrays and manage your data more efficiently.

Example

Consider the following schema that tracks book reviews for a bookstore application. The initial schema uses an array for the reviews field.

{
   title: "Harry Potter",
   author: "J.K. Rowling",
   publisher: "Scholastic",
   reviews: [
      {
         user: "Alice",
         review: "Great book!",
         rating: 5
      },
      {
         user: "Bob",
         review: "Didn't like it!",
         rating: 1
      },
      {
         user: "Charlie",
         review: "Not bad, but could be better.",
         rating: 3
      }
   ]
}

In this schema, the reviews field is an unbounded array. Every time a new review is created for this book, the application adds a new sub-document to the reviews array. As more reviews are added, the array can grow too large and strain application resources.

In this example, the bookstore application only needs to show three book reviews per book. To avoid unbounded arrays, you can use the subset design pattern or document references, depending on your use case.

Subset Pattern

Subsetting data is best when you need quick access to data that is not frequently updated. Using the subset pattern, you can embed three of the reviews in the book document to return all required information in a single operation. The other reviews are stored in a separate reviews collection. This schema design pattern provides the following benefits:

Eliminate the unbounded array
Control the document size
Avoid use of multiple queries
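One way to keep the embedded subset bounded is to cap it on every write. In mongosh, this is typically done with an update that uses `$push` together with the `$each` and `$slice` modifiers, where `$slice: -3` keeps only the three most recent elements. The plain JavaScript sketch below emulates that behavior on in-memory data so the logic is easy to follow; the `addReview` helper, the `MAX_EMBEDDED_REVIEWS` constant, and the arrays standing in for the two collections are assumptions for illustration, not a MongoDB API.

```javascript
// Hypothetical cap matching the "three reviews per book" requirement.
const MAX_EMBEDDED_REVIEWS = 3;

// In-memory stand-ins for a books document and the reviews collection.
const book = {
  title: "Harry Potter",
  reviews: [
    { reviewer: "Alice", review: "Great book!", rating: 5 },
    { reviewer: "Charlie", review: "Didn't like it.", rating: 1 },
    { reviewer: "Bob", review: "Not bad, but could be better.", rating: 3 }
  ]
};
const reviewsCollection = [];

// Store every review in the reviews collection, and keep only the most recent
// MAX_EMBEDDED_REVIEWS in the book (similar to $push with $each and $slice: -3).
function addReview(book, reviewsCollection, newReview) {
  reviewsCollection.push({ ...newReview });
  book.reviews = [...book.reviews, newReview].slice(-MAX_EMBEDDED_REVIEWS);
}

addReview(book, reviewsCollection, { reviewer: "Pam", review: "Favorite book!", rating: 5 });
console.log(book.reviews.map(r => r.reviewer).join(", ")); // Charlie, Bob, Pam
```

The embedded array never exceeds three elements, no matter how many reviews the book receives.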

The books collection:

db.books.insertOne(
   {
      title: "Harry Potter",
      author: "J.K. Rowling",
      publisher: "Scholastic",
      // Embed only the three reviews the application displays
      reviews: [
        {
           reviewer: "Alice",
           review: "Great book!",
           rating: 5
        },
        {
           reviewer: "Charlie",
           review: "Didn't like it.",
           rating: 1
        },
        {
           reviewer: "Bob",
           review: "Not bad, but could be better.",
           rating: 3
        }
      ]
   }
)

The reviews collection:

db.reviews.insertMany( [
   {
      reviewer: "Jason",
      review: "Did not enjoy!",
      rating: 1
   },
   {
      reviewer: "Pam",
      review: "Favorite book!",
      rating: 5
   },
   {
      reviewer: "Bob",
      review: "Not bad, but could be better.",
      rating: 3
   }
] )

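This approach duplicates data: a review embedded in the book document, such as Bob's review, is also stored in the reviews collection, so updating that review requires writes to both collections. Use the subset pattern when reviews are not frequently updated.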
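The cost of that duplication shows up on every edit: when a reviewer changes a review that is also embedded in a book document, both copies must be updated or they drift apart. The sketch below models this with in-memory arrays. Matching reviews by reviewer name is a simplifying assumption (a real schema would use a stable identifier), and `updateReview` is a hypothetical helper.

```javascript
// In-memory stand-ins: Bob's review exists both in the book and in the reviews collection.
const books = [{
  title: "Harry Potter",
  reviews: [{ reviewer: "Bob", review: "Not bad, but could be better.", rating: 3 }]
}];
const reviews = [
  { reviewer: "Bob", review: "Not bad, but could be better.", rating: 3 },
  { reviewer: "Pam", review: "Favorite book!", rating: 5 }
];

// Apply the same change to the reviews collection and to every embedded copy.
// Returns how many copies were modified.
function updateReview(books, reviews, reviewer, changes) {
  let modified = 0;
  for (const r of reviews) {
    if (r.reviewer === reviewer) { Object.assign(r, changes); modified++; }
  }
  for (const book of books) {
    for (const r of book.reviews) {
      if (r.reviewer === reviewer) { Object.assign(r, changes); modified++; }
    }
  }
  return modified;
}

console.log(updateReview(books, reviews, "Bob", { rating: 4 })); // 2
```

One logical change touches two collections, which is why the subset pattern suits data that rarely changes.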

Reference Data

Referencing data is best when you need to manage large or frequently updated datasets without inflating document sizes.

To reference data, store reviews in a separate collection and give each review document a review_id field. In each book document, store the review_id values of that book's reviews in the reviews array.

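This approach keeps book documents small, because each book stores only short review identifiers instead of full reviews. However, it introduces latency, because retrieving review details for a book requires an additional query or a $lookup against the reviews collection. Depending on your use case, this additional latency may be an acceptable trade-off to avoid the issues caused by unbounded arrays. If a single book can accumulate a very large number of reviews, the array of identifiers can itself keep growing; in that case, consider storing a reference to the book in each review document instead.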
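With references, adding a review becomes two writes: one to insert the full review into the reviews collection and one to append its review_id to the book. The in-memory sketch below shows that bookkeeping. The `nextReviewId` counter, the `addReferencedReview` helper, and the new review's contents are hypothetical; a real application would more commonly rely on generated `_id` values.

```javascript
// In-memory stand-ins for the books and reviews collections.
const books = [{ title: "Pride and Prejudice", reviews: ["review4", "review5"] }];
const reviews = [
  { review_id: "review4", reviewer: "Tina", review: "Amazing!", rating: 5 },
  { review_id: "review5", reviewer: "Jacob", review: "A little overrated", rating: 4 }
];
let nextReviewId = 6; // hypothetical counter for generating review_id values

// Write 1: insert the full review. Write 2: store only its identifier in the book.
function addReferencedReview(books, reviews, title, review) {
  const book = books.find(b => b.title === title);
  if (!book) throw new Error(`No book titled ${title}`);
  const review_id = `review${nextReviewId++}`;
  reviews.push({ review_id, ...review });
  book.reviews.push(review_id);
  return review_id;
}

const id = addReferencedReview(books, reviews, "Pride and Prejudice",
  { reviewer: "Sam", review: "Classic.", rating: 5 });
console.log(id, books[0].reviews.length); // review6 3
```

The book document grows by one short string per review instead of a full embedded review.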

The books collection:

db.books.insertMany( [
   {
      title: "Harry Potter",
      author: "J.K. Rowling",
      publisher: "Scholastic",
      reviews: ["review1", "review2", "review3"]
   },
   {
      title: "Pride and Prejudice",
      author: "Jane Austen",
      publisher: "Penguin",
      reviews: ["review4", "review5"]
   }
] )

The reviews collection:

db.reviews.insertMany( [
   {
      review_id: "review1",
      reviewer: "Jason",
      review: "Did not enjoy!",
      rating: 1
   },
   {
      review_id: "review2",
      reviewer: "Pam",
      review: "Favorite book!",
      rating: 5
   },
   {
      review_id: "review3",
      reviewer: "Bob",
      review: "Not bad, but could be better.",
      rating: 3
   },
   {
      review_id: "review4",
      reviewer: "Tina",
      review: "Amazing!",
      rating: 5
   },
   {
      review_id: "review5",
      reviewer: "Jacob",
      review: "A little overrated",
      rating: 4
   }
] )

Use $lookup to Join on an Array Field

If your books and reviews information is stored in separate collections, the application needs to perform a $lookup operation to join the data.

The following aggregation operation joins the books and reviews collections from the previous example.

db.books.aggregate( [
   {
      $lookup: {
         from: "reviews",
         localField: "reviews",
         foreignField: "review_id",
         as: "reviewDetails"
      }
   }
] )

The operation returns the following:

[
   {
      _id: ObjectId('665de81eeda086b5e22dbcc9'),
      title: 'Harry Potter',
      author: 'J.K. Rowling',
      publisher: 'Scholastic',
      reviews: [ 'review1', 'review2', 'review3' ],
      reviewDetails: [
      {
         _id: ObjectId('665de82beda086b5e22dbccb'),
         review_id: 'review1',
         reviewer: 'Jason',
         review: 'Did not enjoy!',
         rating: 1
      },
      {
         _id: ObjectId('665de82beda086b5e22dbccc'),
         review_id: 'review2',
         reviewer: 'Pam',
         review: 'Favorite book!',
         rating: 5
      },
      {
         _id: ObjectId('665de82beda086b5e22dbccd'),
         review_id: 'review3',
         reviewer: 'Bob',
         review: 'Not bad, but could be better.',
         rating: 3
      } ]
   },
   {
      _id: ObjectId('665de81eeda086b5e22dbcca'),
      title: 'Pride and Prejudice',
      author: 'Jane Austen',
      publisher: 'Penguin',
      reviews: [ 'review4', 'review5' ],
      reviewDetails: [
      {
         _id: ObjectId('665de82beda086b5e22dbcce'),
         review_id: 'review4',
         reviewer: 'Tina',
         review: 'Amazing!',
         rating: 5
      },
      {
         _id: ObjectId('665de82beda086b5e22dbccf'),
         review_id: 'review5',
         reviewer: 'Jacob',
         review: 'A little overrated',
         rating: 4
      } ]
   }
]

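In this example, the $lookup stage joins the books collection with the reviews collection by comparing the reviews array in each book document with the review_id field in the reviews documents. Because localField refers to an array, a review matches when its review_id equals any element of that array. The stage adds the matching review documents to each book in the reviewDetails array field.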
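To make the array-matching behavior concrete, the plain JavaScript sketch below mimics the localField/foreignField equality match on in-memory data. `lookup` is a hypothetical helper, not the server implementation: it ignores `_id` fields, null or missing-field handling, and dotted paths, and it returns matches in the order they appear in the foreign array. Review text is omitted for brevity.

```javascript
// Hypothetical helper: a simplified, in-memory version of $lookup's
// localField/foreignField equality match.
function lookup(localDocs, foreignDocs, { localField, foreignField, as }) {
  return localDocs.map(doc => {
    const local = doc[localField];
    // An array localField matches if any element equals the foreign value.
    const keys = Array.isArray(local) ? local : [local];
    const matches = foreignDocs.filter(f => keys.includes(f[foreignField]));
    return { ...doc, [as]: matches }; // copy; input documents are not modified
  });
}

const books = [
  { title: "Harry Potter", reviews: ["review1", "review2", "review3"] },
  { title: "Pride and Prejudice", reviews: ["review4", "review5"] }
];
const reviews = [
  { review_id: "review1", reviewer: "Jason", rating: 1 },
  { review_id: "review2", reviewer: "Pam", rating: 5 },
  { review_id: "review3", reviewer: "Bob", rating: 3 },
  { review_id: "review4", reviewer: "Tina", rating: 5 },
  { review_id: "review5", reviewer: "Jacob", rating: 4 }
];

const joined = lookup(books, reviews,
  { localField: "reviews", foreignField: "review_id", as: "reviewDetails" });
for (const b of joined) {
  console.log(b.title, b.reviewDetails.map(r => r.reviewer).join(", "));
}
// Harry Potter Jason, Pam, Bob
// Pride and Prejudice Tina, Jacob
```

An index on review_id in the reviews collection helps the real $lookup find these matches efficiently.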
