Aggregation Pipeline
An aggregation pipeline consists of one or more stages that process documents:
Each stage performs an operation on the input documents. For example, a stage can filter documents, group documents, and calculate values.
The documents that are output from one stage are input to the next stage.
An aggregation pipeline can return results for groups of documents. For example, return the total, average, maximum, and minimum values.
Starting in MongoDB 4.2, you can update documents with an aggregation pipeline if you use the stages shown in Updates with Aggregation Pipeline.
Note
Aggregation pipelines run with the db.collection.aggregate() method do not modify documents in a collection, unless the pipeline contains a $merge or $out stage.
You can run aggregation pipelines in the UI for deployments hosted in MongoDB Atlas.
When you run aggregation pipelines on MongoDB Atlas deployments in the MongoDB Atlas UI, you can preview the results at each stage.
Complete Aggregation Pipeline Example
Create the following collection that contains orders for products:
db.orders.insertMany( [
   { _id: 0, productName: "Steel beam", status: "new", quantity: 10 },
   { _id: 1, productName: "Steel beam", status: "urgent", quantity: 20 },
   { _id: 2, productName: "Steel beam", status: "urgent", quantity: 30 },
   { _id: 3, productName: "Iron rod", status: "new", quantity: 15 },
   { _id: 4, productName: "Iron rod", status: "urgent", quantity: 50 },
   { _id: 5, productName: "Iron rod", status: "urgent", quantity: 10 }
] )
The following aggregation pipeline example contains two stages and returns the total quantity of urgent orders for each product:
db.orders.aggregate( [
   { $match: { status: "urgent" } },
   { $group: { _id: "$productName", sumQuantity: { $sum: "$quantity" } } }
] )
The $match stage:
Filters the documents to those with a status of urgent.
Outputs the filtered documents to the $group stage.
The $group stage:
Groups the input documents by productName.
Uses $sum to calculate the total quantity for each productName, which is stored in the sumQuantity field returned by the aggregation pipeline.
Example output:
[ { _id: 'Steel beam', sumQuantity: 50 }, { _id: 'Iron rod', sumQuantity: 60 } ]
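To make the flow between the two stages concrete, the following plain JavaScript sketch mimics what $match and $group do to the sample documents. It is only an in-memory illustration, not how the server executes a pipeline; the names urgentOrders, totals, and simulatedResult are made up for this example.

```javascript
// The same documents that were inserted into the orders collection.
const orders = [
  { _id: 0, productName: "Steel beam", status: "new", quantity: 10 },
  { _id: 1, productName: "Steel beam", status: "urgent", quantity: 20 },
  { _id: 2, productName: "Steel beam", status: "urgent", quantity: 30 },
  { _id: 3, productName: "Iron rod", status: "new", quantity: 15 },
  { _id: 4, productName: "Iron rod", status: "urgent", quantity: 50 },
  { _id: 5, productName: "Iron rod", status: "urgent", quantity: 10 }
];

// Stage 1, analogous to { $match: { status: "urgent" } }:
// only the urgent orders are passed on to the next stage.
const urgentOrders = orders.filter((doc) => doc.status === "urgent");

// Stage 2, analogous to
// { $group: { _id: "$productName", sumQuantity: { $sum: "$quantity" } } }:
// keep one running total per productName.
const totals = new Map();
for (const doc of urgentOrders) {
  totals.set(doc.productName, (totals.get(doc.productName) || 0) + doc.quantity);
}

// Shape the totals like the documents returned by the pipeline.
const simulatedResult = [...totals].map(([name, sum]) => ({
  _id: name,
  sumQuantity: sum
}));

console.log(simulatedResult);
```

The sketch emits groups in first-seen order. The real $group stage does not guarantee any particular output order unless a $sort stage follows it.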
MongoDB provides the db.collection.aggregate() method in the mongo shell and the aggregate command to run the aggregation pipeline.
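As a rough illustration of the two entry points, the sketch below runs the same pipeline through the shell helper and through the aggregate command. It assumes the orders collection from the example above. The variable names are illustrative, and the typeof db guard is only there so the snippet also loads outside the shell.

```javascript
// Pipeline from the complete example above.
const urgentTotalsPipeline = [
  { $match: { status: "urgent" } },
  { $group: { _id: "$productName", sumQuantity: { $sum: "$quantity" } } }
];

// Command form: the collection name, the pipeline, and a cursor document.
const aggregateCommand = {
  aggregate: "orders",
  pipeline: urgentTotalsPipeline,
  cursor: {}
};

// `db` exists only inside the shell; skip the calls elsewhere.
if (typeof db !== "undefined") {
  // Shell helper.
  printjson(db.orders.aggregate(urgentTotalsPipeline).toArray());

  // Database command; the first batch of results is under cursor.firstBatch.
  printjson(db.runCommand(aggregateCommand).cursor.firstBatch);
}
```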
Aggregation Pipeline Stages
An aggregation pipeline consists of one or more stages that process documents:
Each stage transforms the documents as they pass through the pipeline.
A stage does not have to output one document for every input document. For example, some stages may produce new documents or filter out documents.
The same stage can appear multiple times in the pipeline with these stage exceptions: $out, $merge, and $geoNear.
For all available stages, see Aggregation Pipeline Stages.
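For instance, a pipeline might use $match twice: once to filter the input documents and once to filter the grouped results. The sketch below assumes the orders collection from the example above; the threshold of 60 is an arbitrary value chosen for illustration.

```javascript
const repeatedStagePipeline = [
  // First $match: filter the input documents.
  { $match: { status: "urgent" } },
  // Group the remaining documents and total their quantities.
  { $group: { _id: "$productName", sumQuantity: { $sum: "$quantity" } } },
  // Second $match: keep only groups whose total is at least 60.
  { $match: { sumQuantity: { $gte: 60 } } }
];

// `db` exists only inside the shell; skip the call elsewhere.
if (typeof db !== "undefined") {
  printjson(db.orders.aggregate(repeatedStagePipeline).toArray());
}
```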
Run an Aggregation Pipeline
To run an aggregation pipeline, use the db.collection.aggregate() method or the aggregate command.
Update Documents Using an Aggregation Pipeline
Starting in MongoDB 4.2, you can use an aggregation pipeline to update documents with update methods such as db.collection.updateOne(), db.collection.updateMany(), and db.collection.findOneAndUpdate().
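A minimal sketch of such an update, assuming the orders collection from the example above and db.collection.updateMany() as the update method. The lastModified field and the increment of 5 are hypothetical choices for illustration. Unlike the read-only examples, running this in the shell modifies the matching documents.

```javascript
// Filter that selects the documents to update.
const urgentFilter = { status: "urgent" };

// The update is written as a pipeline (an array of stages) rather than
// as an update document, so a new value can be computed from the
// document's current fields.
const updatePipeline = [
  {
    $set: {
      quantity: { $add: ["$quantity", 5] }, // current quantity plus 5
      lastModified: "$$NOW"                 // hypothetical field, set to the current datetime
    }
  }
];

// `db` exists only inside the shell; skip the call elsewhere.
if (typeof db !== "undefined") {
  printjson(db.orders.updateMany(urgentFilter, updatePipeline));
}
```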
Pipeline Expressions
Some pipeline stages accept a pipeline expression as the operand. Pipeline expressions specify the transformation to apply to the input documents. Expressions have a document structure and can contain other expressions.
Pipeline expressions can only operate on the current document in the pipeline and cannot refer to data from other documents. Expression operations provide in-memory transformation of documents.
Generally, expressions are stateless and are only evaluated when seen by the aggregation process, with one exception: accumulator expressions.
The accumulators, used in the $group stage, maintain their state (for example, totals, maximums, minimums, and related data) as documents progress through the pipeline. Some accumulators are available in the $project stage; however, when used in the $project stage, the accumulators do not maintain their state across documents.
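The difference can be sketched with $max, assuming the orders collection from the example above. The output field names and the constant 25 are illustrative only.

```javascript
// In $group, $max keeps state across documents: one maximum per productName.
const groupMaxPipeline = [
  { $group: { _id: "$productName", maxQuantity: { $max: "$quantity" } } }
];

// In $project, $max is evaluated separately for each document: here it
// returns the larger of that document's own quantity and the constant 25.
const projectMaxPipeline = [
  { $project: { productName: 1, atLeast25: { $max: ["$quantity", 25] } } }
];

// `db` exists only inside the shell; skip the calls elsewhere.
if (typeof db !== "undefined") {
  printjson(db.orders.aggregate(groupMaxPipeline).toArray());
  printjson(db.orders.aggregate(projectMaxPipeline).toArray());
}
```

With the sample data, the first pipeline returns one document per product, while the second returns one document per order.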
Starting in version 4.4, MongoDB provides the $accumulator and $function aggregation operators. These operators provide users with the ability to define custom aggregation expressions in JavaScript.
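As a small illustration of $function, the sketch below adds a computed field to each order. It assumes the orders collection from the example above and a deployment where server-side JavaScript is enabled. The isLarge field and the threshold of 20 are hypothetical.

```javascript
const functionPipeline = [
  {
    $addFields: {
      // Hypothetical field: true when the order quantity is 20 or more.
      isLarge: {
        $function: {
          // JavaScript function that the server evaluates for each document.
          body: function (quantity) {
            return quantity >= 20;
          },
          // Values passed to the function, in order.
          args: ["$quantity"],
          lang: "js"
        }
      }
    }
  }
];

// `db` exists only inside the shell; skip the call elsewhere.
if (typeof db !== "undefined") {
  printjson(db.orders.aggregate(functionPipeline).toArray());
}
```

A built-in operator such as $gte would be simpler and usually faster for this comparison; the custom function only demonstrates the syntax.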
For more information on expressions, see Expressions.
Aggregation Pipeline Behavior
In MongoDB, the aggregate command operates on a single collection, logically passing the entire collection into the aggregation pipeline. To optimize the operation, wherever possible, use the following strategies to avoid scanning the entire collection.
Pipeline Operators and Indexes
MongoDB's query planner analyzes an aggregation pipeline to determine whether indexes can be used to improve pipeline performance. For example, the following pipeline stages can take advantage of indexes:
Note
The following pipeline stages do not represent a complete list of all stages which can use an index.
$match - The $match stage can use an index to filter documents if it occurs at the beginning of a pipeline.
$sort - The $sort stage can use an index as long as it is not preceded by a $project, $unwind, or $group stage.
$group - The $group stage can sometimes use an index to find the first document in each group if all of the following criteria are met:
The $group stage is preceded by a $sort stage that sorts the field to group by.
There is an index on the grouped field which matches the sort order.
The only accumulator used in the $group stage is $first.
See Optimization to Return the First Document of Each Group for an example.
$geoNear - The $geoNear pipeline operator takes advantage of a geospatial index. When using $geoNear, the $geoNear pipeline operation must appear as the first stage in an aggregation pipeline.
Changed in version 3.2: Starting in MongoDB 3.2, indexes can cover an aggregation pipeline. In MongoDB 2.6 and 3.0, indexes could not cover an aggregation pipeline since even when the pipeline uses an index, aggregation still requires access to the actual documents.
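The sketch below shows a $sort followed by a $group whose index, sort order, and grouping field line up, assuming the orders collection from the example above. The index and the smallestQuantity field are illustrative. Whether the planner actually uses the index depends on the server version and the data, so the explain output is the thing to check.

```javascript
// Index whose leading field is the grouping field and whose key order
// matches the sort below.
const indexSpec = { productName: 1, quantity: 1 };

const firstPerGroupPipeline = [
  // Sort by the field to group by, then by quantity in ascending order.
  { $sort: { productName: 1, quantity: 1 } },
  // $first is the only accumulator; after the sort it picks the
  // smallest quantity for each productName.
  { $group: { _id: "$productName", smallestQuantity: { $first: "$quantity" } } }
];

// `db` exists only inside the shell; skip the calls elsewhere.
if (typeof db !== "undefined") {
  db.orders.createIndex(indexSpec);
  // Inspect the plan to see whether the index was chosen.
  printjson(db.orders.explain().aggregate(firstPerGroupPipeline));
}
```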
Early Filtering
If your aggregation operation requires only a subset of the data in a collection, use the $match, $limit, and $skip stages to restrict the documents that enter at the beginning of the pipeline. When placed at the beginning of a pipeline, $match operations use suitable indexes to scan only the matching documents in a collection.
Placing a $match pipeline stage followed by a $sort stage at the start of the pipeline is logically equivalent to a single query with a sort and can use an index. When possible, place $match operators at the beginning of the pipeline.
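A minimal sketch of this ordering, assuming the orders collection from the example above and a hypothetical compound index on status and quantity:

```javascript
// Hypothetical index that supports both the filter and the sort.
const earlyFilterIndex = { status: 1, quantity: -1 };

const earlyFilterPipeline = [
  // Filter first so later stages see only the matching documents.
  { $match: { status: "urgent" } },
  // $match followed by $sort at the start behaves like a query with a sort.
  { $sort: { quantity: -1 } },
  // Restrict the number of documents passed to later stages.
  { $limit: 2 },
  // Reshape only the documents that remain.
  { $project: { _id: 0, productName: 1, quantity: 1 } }
];

// `db` exists only inside the shell; skip the calls elsewhere.
if (typeof db !== "undefined") {
  db.orders.createIndex(earlyFilterIndex);
  printjson(db.orders.aggregate(earlyFilterPipeline).toArray());
}
```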
Considerations
Aggregation Pipeline Limitations
An aggregation pipeline has some limitations on the value types and the result size. See Aggregation Pipeline Limits.
Aggregation Pipeline Optimization
An aggregation pipeline has an internal optimization phase that provides improved performance for certain sequences of operators. See Aggregation Pipeline Optimization.
Aggregation on Sharded Collections
An aggregation pipeline supports operations on sharded collections. See Aggregation Pipeline and Sharded Collections.
Aggregation Pipeline as an Alternative to Map-Reduce
An aggregation pipeline provides better performance and usability than a map-reduce operation.
Map-reduce operations can be rewritten using aggregation pipeline operators, such as $group, $merge, and others.
For map-reduce operations that require custom functionality, MongoDB provides the $accumulator and $function aggregation operators starting in version 4.4. Use these operators to define custom aggregation expressions in JavaScript.
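As a rough sketch of such a rewrite, the comment below shows a map-reduce operation that totals quantity per productName, followed by a pipeline that computes the same totals with $group and writes them with $merge. It assumes the orders collection from the example above. The output collection name order_totals and the $merge options are assumptions made for illustration.

```javascript
// Map-reduce form, shown only for comparison:
//
//   db.orders.mapReduce(
//     function () { emit(this.productName, this.quantity); },
//     function (key, values) { return Array.sum(values); },
//     { out: { merge: "order_totals" } }
//   )

// Aggregation pipeline form.
const mapReduceAlternativePipeline = [
  // Plays the role of the map and reduce functions: one total per productName.
  { $group: { _id: "$productName", value: { $sum: "$quantity" } } },
  // Plays the role of the `out` option: write results to order_totals.
  {
    $merge: {
      into: "order_totals",
      on: "_id",
      whenMatched: "replace",
      whenNotMatched: "insert"
    }
  }
];

// `db` exists only inside the shell; skip the calls elsewhere.
if (typeof db !== "undefined") {
  db.orders.aggregate(mapReduceAlternativePipeline);
  printjson(db.order_totals.find().toArray());
}
```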
For examples of aggregation pipeline alternatives to map-reduce operations, see Map-Reduce to Aggregation Pipeline and Map-Reduce Examples.