in which paths field is an array with paths to document. I have to return documents that have test as a value in at least one of the paths specified in the paths array. For example, the documents with _id 1 and _id 2 would be returned, because at least one of the values in paths is a path in the document with value test. Documents with _id 3 and _id 4 are not returned since no element of paths is a path in the document that has value test. The real case is bit more complicated, because there is one more level in field object, so, field would be something like:
I had a play trying various things but the multi level proved to be tricky, where it could be n levels deep, using straight aggregation.
There was a similar question here:
That converts the document / element to an array and then searches. I guess you could set limits on lookups, i.e. that they need to be max 3 deep and then work out all possible options but that’s starting to sound 1) like a complete hack and 2) horrible slow.
I’d be interested to see if anyone else comes up with an approach within the aggregation framework without resorting to having the calling code do something.
My plan was to be an $unwind, then the dynamic matching, then $group back up to re-form the documents that satisfy the criteria.
I have to return documents that have test as a value in at least one of the paths specified in the paths array. For example, the documents with _id 1 and _id 2 would be returned, because at least one of the values in paths is a path in the document with value test . Documents with _id 3 and _id 4 are not returned since no element of paths is a path in the document that has value test .
So, the answer to your question is yes, the documents has always the same structure. The field structure always has 3 levels (i.e. field.subfield1.subsubfield1).
The problem is that properties into field are dynamic. That’s the hard part of the query, I guess. The structure I wrote above is just an example. field has a variable number of keys, as well as subfield1. And the names of subfieldN', and subsubfieldN` are dynamics, so I cannot hardcoded it in the query.
And to add to this I want to mention that the solution on SO uses $objectToArray as hinted by
It is also important to note that this will be very slow since $objectToArray is called for all documents and what ever $match is done, it will be done on a non-indexed computed value.
As steevej pointed out you may have huge performance issues with this, it may be worth looking at how flexible in the architecture / schema is.
How often do you update documents / insert new ones and read from them.
Given the above, work out an alternative storage structure or possibly pre-compute the data into something that’s easier to work with, and by direct link, easier to index.
If you have hundreds of thousands of documents that are constantly being searched for it’ll slow down your application.
I’d be very interested to hear back on how you got on implementing the solution on SO and how it performs with your dataset and use cases.
My proposition would be to permanently migrate your collection with the result of $objectToArray. This will bring your model close to the attribute pattern. Dynamic field names often call for the attribute pattern. Your k: being the dynamic field name and v: the value.
The major problem with dynamic field names is that you cannot really index them.