Improving Query Performance for Conditional Summation in groupby

I am running the following query on a dataset with one billion records. The query becomes significantly slower when I add a sort by userId. The execution time increases drastically. Can anyone guide me on how to optimize this query for better performance?

[  
    {
        "$group": {
            "_id": "$userId",
            "totalMonthly": { $sum: { $cond: [{ "$eq": ["$subscriptionType", "month"] }, 1, 0] } },
            "subscriptionAmt": { $sum: "$subscriptionAmt" },
            "subscriptionId": {
                "$first": "$subscriptionId"
            }
        }
    },
    {
        "$sort": { "totalMonthly": -1 }
    },
    {
        "$limit": 5
    }
]

Why do you $sort then?

If you must sort make sure you have the appropriate index.

We have set the index and used sorting, but it has not improved the grouping performance, and it’s still taking a long time. Do you have any suggestions for making this pipeline faster?

What works sometimes is to $group with _id only and do any computation in a $lookup pipeline.

Something like:

The theory, is that the first empty $group uses a lot less memory. While the computations are the same in the inner $group it is performed on a smaller set of documents and can be sent to the next stage faster.

But sorting on a computed value like totalMonthly will always be kind of slow since no index can be used. Sometimes the computed pattern is your best friend.

Hi @Kartik_Paul, any followup on this last proposal?

We reviewed your last solution, but it takes longer than the existing query.

Thanks for the feedback.