Working with arrays using the Extended Reference Pattern (ERP)

Hi all,

I’m currently implementing the extended reference pattern to increase query speed and remove lookups. I hope somebody can give me feedback on best practices for working w/ arrays.

My model is something like this (simplified):

user.model.ts

{
  _id: ObjectId,
  email: string,
  name: string
}

tax.model.ts

{
  _id: ObjectId,
  name: string,
  rate: number
}

expense.model.ts

{
  creator: ObjectId,
  reviewer: ObjectId,
  splits: [{
    name: string,
    tax: ObjectId
  }]
}

I’m considering the following two approaches:

  1. Adding a new object extendedReferences containing the extended reference data

    expense.model.ts

    {
      creator: ObjectId,
      reviewer: ObjectId,
      splits: [{
        name: string,
        tax: ObjectId
      }],
      extendedReferences: {
        users: [{
          _id: ObjectId,
          email: string,
        }],
        taxes: [{
          _id: ObjectId,
          rate: number
        }],
      }
    }
    
  2. Replacing ObjectId w/ the extended reference object (as explained in the blog post)

    expense.model.ts

    {
      creator: {
        _id: ObjectId,
        email: string
      },
      reviewer: {
        _id: ObjectId,
        email: string
      },
      splits: [{
        name: string,
        tax: {
          _id: ObjectId,
          rate: number
        }
      }]
    }
    

Example for an expense and how I want to store the data using the 1st approach:

{
  creator: 1337,
  reviewer: 1337, 
  splits: [
    {
      name: "1st Split",
      tax: 1
    },
    {
      name: "2nd Split",
      tax: 2
    },
    {
      name: "3rd Split",
      tax: 1
    },
  ],
  extendedReferences: {
    users: [
      {
        _id: 1337,
        email: "user@gmail.com"
      }
    ],
    taxes: [
      {
        _id: 1,
        rate: 10
      },
      {
        _id: 2,
        rate: 20
      }
    ]
  }
}

I’m considering storing each extended reference only once, in a separate object, because the same tax can be used several times in the splits. I hope this would give me the following advantages:

  1. Smaller document size
  2. Faster writes if extended reference data has to be updated
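
For the second point, the two layouts differ in how a changed tax rate would be pushed into existing expenses. Below is a minimal sketch, assuming the MongoDB positional `$` operator for the deduplicated array and the filtered positional `$[s]` operator with `arrayFilters` for the per-split copies. The helper names and the `Id` stand-in for ObjectId are made up for illustration, and the objects are only built here, not sent to a database:

```typescript
// Stand-in for ObjectId so the sketch runs without a driver (assumption).
type Id = number | string;

// Approach 1: each tax appears once in extendedReferences.taxes,
// so the positional `$` operator targets the single matching element.
function rateUpdateForApproach1(taxId: Id, newRate: number) {
  return {
    filter: { "extendedReferences.taxes._id": taxId },
    update: { $set: { "extendedReferences.taxes.$.rate": newRate } },
  };
}

// Approach 2: the same tax may be copied into several splits,
// so a filtered positional operator updates every matching split.
function rateUpdateForApproach2(taxId: Id, newRate: number) {
  return {
    filter: { "splits.tax._id": taxId },
    update: { $set: { "splits.$[s].tax.rate": newRate } },
    options: { arrayFilters: [{ "s.tax._id": taxId }] },
  };
}
```

Each object would map onto one `updateMany(filter, update, options)` call, so both layouts need a single write command per tax change. The difference is how many array elements get rewritten inside each document, which is probably small unless expenses have many splits.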

But I’m wondering whether the 2nd point is true or just premature optimization.
Furthermore, the downside is the extra application logic needed to look up the extended reference data when accessing it, and I assume more complex pipelines when working w/ that data, because some find-and-replace steps are required.
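
The application-side lookup mentioned above could be sketched roughly like this. It assumes the document shape from the 1st approach; `RefId` stands in for ObjectId and `splitsWithRates` is a hypothetical helper name:

```typescript
// Stand-in for ObjectId so the sketch runs without a driver (assumption).
// With real ObjectId values the map key would need to be a string form of
// the id, since a Map compares object keys by identity.
type RefId = number | string;

interface TaxRef { _id: RefId; rate: number }
interface Split { name: string; tax: RefId }
interface Expense {
  creator: RefId;
  reviewer: RefId;
  splits: Split[];
  extendedReferences: {
    users: { _id: RefId; email: string }[];
    taxes: TaxRef[];
  };
}

// Index the taxes once, then attach the matching entry to every split.
function splitsWithRates(expense: Expense): { name: string; tax: TaxRef }[] {
  const taxesById = new Map<RefId, TaxRef>();
  for (const t of expense.extendedReferences.taxes) taxesById.set(t._id, t);
  return expense.splits.map((split) => {
    const tax = taxesById.get(split.tax);
    if (!tax) throw new Error(`No extended reference for tax ${split.tax}`);
    return { name: split.name, tax };
  });
}

// The example expense from above.
const expense: Expense = {
  creator: 1337,
  reviewer: 1337,
  splits: [
    { name: "1st Split", tax: 1 },
    { name: "2nd Split", tax: 2 },
    { name: "3rd Split", tax: 1 },
  ],
  extendedReferences: {
    users: [{ _id: 1337, email: "user@gmail.com" }],
    taxes: [{ _id: 1, rate: 10 }, { _id: 2, rate: 20 }],
  },
};

const resolved = splitsWithRates(expense);
```

This is the "find and replace" step that the 2nd approach avoids, since there each split already carries its own tax object.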

What are the recommended best practices for this use case?

Hi :wave: @Steve,

Welcome to the MongoDB Community forums :sparkles:

Yes, by including an extended reference to the data that would most frequently be looked up/JOINed, we save a step in processing. Embedding the extended reference data directly in each document can simplify your queries and make it easier to access the necessary data without using $lookup.

Furthermore, to better understand your question, please share a typical query that you would run without the extended reference pattern, i.e., against the 3 initial collections user.model.ts, tax.model.ts, and expense.model.ts. How would you write it using $lookup?

Also, it would be helpful if you could provide more context and details about your specific use case here.

Best,
Kushagra


As I see it, the number of references from expenses to the users collection is limited to 2, i.e. creator and reviewer. For these 2, the embedded approach is better, i.e. your approach #2. But for taxes, approach #1 suits better, as the number of splits is likely to grow and is not ‘fixed’. So overall, for your use case, a hybrid of the 2 approaches seems better suited than either one of them.
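
A rough sketch of what such a hybrid document could look like, assuming the fields from the original models. `DocId` stands in for ObjectId, and `HybridExpense` and `buildHybridExpense` are hypothetical names used only for illustration:

```typescript
// Stand-in for ObjectId so the sketch runs without a driver (assumption).
type DocId = number | string;

interface UserRef { _id: DocId; email: string }
interface TaxRef { _id: DocId; rate: number }

interface HybridExpense {
  creator: UserRef;                        // embedded, as in approach #2
  reviewer: UserRef;                       // embedded, as in approach #2
  splits: { name: string; tax: DocId }[];  // plain reference, as in approach #1
  extendedReferences: { taxes: TaxRef[] }; // each tax stored once
}

// Build the hybrid document, keeping one entry per distinct tax.
function buildHybridExpense(
  creator: UserRef,
  reviewer: UserRef,
  splits: { name: string; tax: TaxRef }[],
): HybridExpense {
  const taxes = new Map<DocId, TaxRef>();
  for (const s of splits) taxes.set(s.tax._id, s.tax);
  return {
    creator,
    reviewer,
    splits: splits.map((s) => ({ name: s.name, tax: s.tax._id })),
    extendedReferences: { taxes: Array.from(taxes.values()) },
  };
}

const user: UserRef = { _id: 1337, email: "user@gmail.com" };
const doc = buildHybridExpense(user, user, [
  { name: "1st Split", tax: { _id: 1, rate: 10 } },
  { name: "2nd Split", tax: { _id: 2, rate: 20 } },
  { name: "3rd Split", tax: { _id: 1, rate: 10 } },
]);
```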