How to group and bucket on different ids at the same time

Hello, I would like to know if it’s possible (and if so, how) to group entries by one key and bucket those same entries on another key at the same time?

For instance, I created this playground: Mongo playground
Here, I want to group entries by the user and bucket the sizes for each user, for instance with the boundaries [0; 10000; 100000], and get the number of entries in those slices.

For instance, here is what I expect the result to look like: Mongo playground

I’m not sure if it’s doable. I feel like it is with $facet, but I’m not sure… The issue is that if I simply use a facet with a group on user and a bucket on size, the group will only return a single accumulated value, and the bucket will fail to do what I want. For a similar reason, I can’t do these in two separate steps, as either the size or the user will be lost in the accumulation.

Thanks for the help

Hi @Yoann_Valeri,

You can do it like this:

  • $set - create the size_range field based on each document’s size property. You can do it with $cond and $lte operators.
  • $group - now, you can group by the user and size_range properties together, and you can track the count using the $sum operator.
  • $project - map the output to make it look like your desired result.
db.collection.aggregate([
  {
    "$set": {
      "size_range": {
        "$cond": {
          "if": {
            "$lte": [
              "$size",
              10000
            ]
          },
          "then": [
            0,
            10000
          ],
          "else": {
            "$cond": {
              "if": {
                "$lte": [
                  "$size",
                  100000
                ]
              },
              "then": [
                10000,
                100000
              ],
              "else": [
                100000,
                "+inf"
              ]
            }
          }
        }
      }
    }
  },
  {
    "$group": {
      "_id": {
        "user": "$user",
        "size": "$size_range"
      },
      "count": {
        "$sum": 1
      }
    }
  },
  {
    "$project": {
      "user": "$_id.user",
      "size": "$_id.size",
      "count": "$count",
      "_id": 0
    }
  }
])

Working example

Thanks for the help, turns out I don’t even need to use buckets at all then…
However, is there a simpler way to do this when there are more boundaries to take into account? In my use case there might be up to 10 boundaries, and writing all the conditions out that way seems impractical…

I suppose a for loop could work if all the boundaries were defined in an earlier $set stage, but since you didn’t do that here, I suppose there’s something I’m missing.
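
One way to avoid writing each condition by hand is to generate the conditions in application code before sending the pipeline. Below is a minimal sketch, assuming the pipeline is built in JavaScript (for example in mongosh or with the Node.js driver). `buildSizeRangePipeline` is a hypothetical helper name, not a MongoDB API. It keeps the same `$lte` comparisons and the "+inf" placeholder as the reply above, and swaps the nested `$cond` for a `$switch`, which picks the first branch whose case is true.

```javascript
// Hypothetical helper (not part of MongoDB): builds the aggregation pipeline
// from a list of boundaries sorted in ascending order.
function buildSizeRangePipeline(boundaries) {
  // One $switch branch per consecutive pair [lower, upper].
  // Branches are evaluated in order, so the first matching upper bound wins.
  const branches = [];
  for (let i = 0; i + 1 < boundaries.length; i++) {
    branches.push({
      case: { $lte: ["$size", boundaries[i + 1]] },
      then: [boundaries[i], boundaries[i + 1]],
    });
  }

  // Anything above the last boundary falls into the open-ended range.
  const last = boundaries[boundaries.length - 1];

  return [
    { $set: { size_range: { $switch: { branches, default: [last, "+inf"] } } } },
    { $group: { _id: { user: "$user", size: "$size_range" }, count: { $sum: 1 } } },
    { $project: { _id: 0, user: "$_id.user", size: "$_id.size", count: "$count" } },
  ];
}

// Same boundaries as in the question; add more values to get more ranges.
const pipeline = buildSizeRangePipeline([0, 10000, 100000]);
// Usage, assuming a collection named "collection":
// db.collection.aggregate(pipeline)
```

With more boundaries, only the array passed to the helper changes; the `$group` and `$project` stages stay the same.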