Use array values as path to match documents

Ruben_FS · August 16, 2024, 9:45am

Hi everyone. Given a list of documents like:

[
  {
    _id: 1,
    field: {
      subfield1: [
        "value", "test"
      ],
      subfield2: [
        "value"]
    },
    paths: [
      "field.subfield1",
      "field.subfield2"
    ]
  },
  {
    _id: 2,
    field: {
      subfield1: [
        "value"
      ],
      subfield2: [
        "test"
      ]
    },
    paths: [
      "field.subfield2"
    ]
  },
  {
    _id: 3,
    field: {
      subfield1: [
        "test"
      ],
      subfield2: [
        "value"
      ]
    },
    paths: [
      "field.subfield2"
    ]
  },
  {
    _id: 4,
    field: {
      subfield1: [
        "value"
      ],
      subfield2: [
        "value", "test"
      ]
    },
    paths: [
      "field.subfield1"
    ]
  }
]

in which paths field is an array with paths to document. I have to return documents that have test as a value in at least one of the paths specified in the paths array. For example, the documents with _id 1 and _id 2 would be returned, because at least one of the values in paths is a path in the document with value test. Documents with _id 3 and _id 4 are not returned since no element of paths is a path in the document that has value test. The real case is bit more complicated, because there is one more level in field object, so, field would be something like:

field: {
    subfield1: {
        subsubfield1: ["value", "test"]
    }
}

and paths values would be something like field.subfield1.subsubfield1.

John_Sewell · August 16, 2024, 11:36am

I had a play trying various things but the multi level proved to be tricky, where it could be n levels deep, using straight aggregation.
There was a similar question here:

That converts the document / element to an array and then searches. I guess you could set limits on lookups, i.e. that they need to be max 3 deep and then work out all possible options but that’s starting to sound 1) like a complete hack and 2) horrible slow.

I’d be interested to see if anyone else comes up with an approach within the aggregation framework without resorting to having the calling code do something.

My plan was to be an $unwind, then the dynamic matching, then $group back up to re-form the documents that satisfy the criteria.

John

emmanuel_bernard · August 17, 2024, 4:07pm

Hi @Ruben_FS ,

Can you tell me if the structure of your documents is always the same ?
How many subfields and subsubfields could you have ?

Ruben_FS · August 19, 2024, 6:14am

Hi @emmanuel_bernard . This is the a better example of the documents. I would edit the original post, but I cannot:

  {
    _id: 1,
    field: {
      subfield1: {
        subsubfield1: [
          "value",
          "test"
        ]
      },
      subfield2: {
        subsufield2: [
          "value"
        ]
      }
    },
    paths: [
      "field.subfield1.subsubfield1",
      "field.subfield2.subsubfield2"
    ]
  },
  {
    _id: 2,
    field: {
      subfield1: {
        subsubfield1: [
          "value"
        ]
      },
      subfield2: {
        subsufield2: [
          "value",
          "test"
        ]
      }
    },
    paths: [
      "field.subfield2.subsubfield2"
    ]
  },
  {
    _id: 3,
    field: {
      subfield1: {
        subsubfield1: [
          "value"
        ]
      },
      subfield2: {
        subsufield2: [
          "value"
        ]
      }
    },
    paths: [
      "field.subfield1.subsubfield1"
    ]
  },
  {
    _id: 4,
    field: {
      subfield1: {
        subsubfield1: [
          "value"
        ]
      },
      subfield2: {
        subsufield2: [
          "value"
        ]
      }
    },
    paths: [
      "field.subfield1.subsubfield1"
      "field.subfield2.subsubfield2"
    ]
  }
]

I have to return documents that have test as a value in at least one of the paths specified in the paths array. For example, the documents with _id 1 and _id 2 would be returned, because at least one of the values in paths is a path in the document with value test . Documents with _id 3 and _id 4 are not returned since no element of paths is a path in the document that has value test .

So, the answer to your question is yes, the documents has always the same structure. The field structure always has 3 levels (i.e. field.subfield1.subsubfield1).

Thank you.

emmanuel_bernard · August 19, 2024, 1:02pm

Hi @Ruben_FS,

If the structure of your documents is always :

  {
    _id: 1
    field: {
      subfield1: {
        subsubfield1: [  ...  ]
      },
      subfield2: {
        subsubfield2: [  ...  ]
      }
    },
    paths: [
      "field.subfield1.subsubfield1",
      "field.subfield2.subsubfield2"
    ]
  }

You can use this query :

https://mongoplayground.net/p/QOlMkJ4EPpj

Ruben_FS · August 20, 2024, 7:42am

The problem is that properties into field are dynamic. That’s the hard part of the query, I guess. The structure I wrote above is just an example. field has a variable number of keys, as well as subfield1. And the names of subfieldN', and subsubfieldN` are dynamics, so I cannot hardcoded it in the query.

emmanuel_bernard · August 20, 2024, 8:10am

Hi @Ruben_FS ,

Could you give us a real example ?
It’s not easy to provide answers based on an abstract one.

If the properties are dynamic, the $objectToArray aggregation stage could be a good idea

Ruben_FS · August 20, 2024, 8:44pm

Hi @emmanuel_bernard . I posted the problem in stackoverflow and the community found a solution. Thank you in any case for your help and interest.

steevej · August 20, 2024, 8:59pm

And to add to this I want to mention that the solution on SO uses $objectToArray as hinted by

It is also important to note that this will be very slow since $objectToArray is called for all documents and what ever $match is done, it will be done on a non-indexed computed value.

John_Sewell · August 21, 2024, 7:50am

As steevej pointed out you may have huge performance issues with this, it may be worth looking at how flexible in the architecture / schema is.
How often do you update documents / insert new ones and read from them.
Given the above, work out an alternative storage structure or possibly pre-compute the data into something that’s easier to work with, and by direct link, easier to index.
If you have hundreds of thousands of documents that are constantly being searched for it’ll slow down your application.

I’d be very interested to hear back on how you got on implementing the solution on SO and how it performs with your dataset and use cases.

Good luck!

steevej · August 21, 2024, 12:14pm

I second the above.

My proposition would be to permanently migrate your collection with the result of $objectToArray. This will bring your model close to the attribute pattern. Dynamic field names often call for the attribute pattern. Your k: being the dynamic field name and v: the value.

The major problem with dynamic field names is that you cannot really index them.

system · August 26, 2024, 12:14pm

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.