I am running the following query on a dataset with one billion records. The query becomes significantly slower when I add a sort by userId. The execution time increases drastically. Can anyone guide me on how to optimize this query for better performance?
We have set the index and used sorting, but it has not improved the grouping performance, and it’s still taking a long time. Do you have any suggestions for making this pipeline faster?
What works sometimes is to $group with _id only and do any computation in a $lookup pipeline.
Something like:
The theory, is that the first empty $group uses a lot less memory. While the computations are the same in the inner $group it is performed on a smaller set of documents and can be sent to the next stage faster.
But sorting on a computed value like totalMonthly will always be kind of slow since no index can be used. Sometimes the computed pattern is your best friend.