How fast is MongoDB aggregation?
While our examples have been realistic and useful in the right context, they've also been relatively small. We've only used two stages in the aggregation pipeline.
This isn't the full potential of the aggregate pipeline, though—far from it.
The aggregation pipeline lets you perform complex operations that surface a wide range of insights from your collections. There are dozens of pipeline stages and a wide range of operators you can combine to build almost any analysis of your data you can imagine.
While the aggregation pipeline is extremely powerful, how performant is it compared to doing these types of analytics on our own?
Let's use the example aggregation query from before:
In our MongoDB example, we're using two stages: one to add an itemsTotal field, and the other to calculate the average of itemsTotal across all documents.
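As a reminder of the shape of such a query, here is a minimal sketch of a two-stage pipeline matching that description. The document shape (an `items` array whose entries each carry a numeric `price`), the output field name `averageItemsTotal`, and the helper name `averageViaAggregation` are assumptions made for illustration, not details taken from the earlier example.

```javascript
// Assumed (hypothetical) document shape: { items: [{ price: Number }, ...] }
const pipeline = [
  // Stage 1: add an itemsTotal field by summing the price of every item.
  { $addFields: { itemsTotal: { $sum: "$items.price" } } },
  // Stage 2: average itemsTotal across all documents (_id: null means one group).
  { $group: { _id: null, averageItemsTotal: { $avg: "$itemsTotal" } } },
];

// `collection` is expected to be a MongoDB Node.js driver Collection.
async function averageViaAggregation(collection) {
  const [result] = await collection.aggregate(pipeline).toArray();
  // An empty collection produces no group document at all.
  return result ? result.averageItemsTotal : null;
}
```

Both stages execute on the server, so only a single small result document travels back to the application.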
To match this behavior in Node.js, we'll use Array.prototype.map and Array.prototype.reduce as relevant stand-ins:
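A client-side equivalent might look roughly like the sketch below. It assumes the same hypothetical document shape (an `items` array with a numeric `price` on each entry); the function names `averageItemsTotal` and `averageViaCursor` are made up for this illustration.

```javascript
// Pure function: mirrors the two pipeline stages on an in-memory array.
function averageItemsTotal(docs) {
  if (docs.length === 0) return null;
  // Stand-in for stage 1: attach an itemsTotal to every document.
  const withTotals = docs.map((doc) => ({
    ...doc,
    itemsTotal: doc.items.reduce((sum, item) => sum + item.price, 0),
  }));
  // Stand-in for stage 2: average itemsTotal across all documents.
  const grandTotal = withTotals.reduce((sum, doc) => sum + doc.itemsTotal, 0);
  return grandTotal / withTotals.length;
}

// `collection` is expected to be a MongoDB Node.js driver Collection.
// Every document is pulled through the cursor before any math happens.
async function averageViaCursor(collection) {
  const docs = await collection.find({}).toArray();
  return averageItemsTotal(docs);
}
```

Keeping the arithmetic in a pure function separates the cost of fetching documents from the cost of the computation itself, which makes the comparison easier to reason about.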
Running each of the code snippets above against a collection of 5,000 documents yielded the following timing results:
Aggregation took 103.46ms.
Manual iteration through the cursor took 881.32ms.
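If you want to reproduce this kind of measurement yourself, a small timing wrapper is enough. The following is only a sketch of one way to do it, not the harness used for the numbers above; the helper name `timeIt` is hypothetical.

```javascript
// Times a sync or async function and logs the elapsed wall-clock time.
async function timeIt(label, fn) {
  const start = process.hrtime.bigint(); // nanosecond-resolution clock in Node.js
  const result = await fn();
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  console.log(`${label} took ${elapsedMs.toFixed(2)}ms.`);
  return { result, elapsedMs };
}
```

You would wrap each approach in a function and pass it to `timeIt` with a label such as `"Aggregation"`. Single runs are noisy, so repeat each measurement several times before drawing conclusions.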
That's a difference of over 8.5x! The gap is only measured in milliseconds here, but we're using an extremely small collection. It's not difficult to imagine how drastic the timing differences would be if our collection held a million or more documents. Remember that an aggregation pipeline runs on the MongoDB server and can be optimized before it runs. When you iterate over a cursor to process data client-side, you add latency by fetching batch after batch of documents from that cursor. The best approach is probably a mix of both.