Aggregating 50M records

I have a time-series collection that is expected to grow to 50M records. Right now I'm doing a PoC with 3M records in which I aggregate using just a $group stage, and it takes 20 seconds. How can I make it faster? Note: I have an index on the source field, but MongoDB still ends up doing a COLLSCAN.

The query is below:

[
  {
    $group: {
      _id: "$source",    // one output document per distinct source
      sum: { $sum: 1 }   // count of documents for that source
    }
  }
]
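For what it's worth, a bare $group has no $match to narrow the input, so the server generally has to touch every document (or every time-series bucket) regardless of the index on source; explain makes the chosen plan visible. A minimal sketch in mongosh, assuming a hypothetical collection name events:

// Show the winning plan and per-stage timings for the pipeline.
// If the output contains COLLSCAN, the index was not used.
db.events.explain("executionStats").aggregate([
  { $group: { _id: "$source", sum: { $sum: 1 } } }
])

If the plan is a COLLSCAN, it is also worth checking whether source is the collection's metaField: for time-series collections, secondary indexes and bucket-level optimizations mostly apply to the metaField and timeField.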

I am getting the same problem: for me it takes around 10 seconds for 6 million records with just a plain $project and $group, and my actual logic takes around a minute.
I wanted to use this for analytics, but the performance is not looking great. Did you find any solution?
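One pattern that usually helps for analytics at this scale is to pre-aggregate into a small summary collection with $merge and have the dashboards read that instead of grouping the raw data on every request. A rough sketch, assuming hypothetical collection names events and source_counts:

// Periodically (e.g. from a scheduled job) fold per-source counts
// into a summary collection keyed by source.
db.events.aggregate([
  { $group: { _id: "$source", sum: { $sum: 1 } } },
  { $merge: {
      into: "source_counts",
      on: "_id",                // match summary docs by source
      whenMatched: "replace",   // overwrite the stale count
      whenNotMatched: "insert"  // first time this source appears
  } }
])

// Reads then scan a handful of summary documents instead of
// millions of raw ones:
db.source_counts.find()

Refreshing the summary on a schedule (or after each ingest batch) trades a little staleness for near-constant-time reads.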

Try some of the things that were shared in

@Ayush_Tiwari2, can you share any findings you got while trying the proposed alternatives?

Please provide some feedback.

Will try this and share the findings.
