Docs Menu
Docs Home
/ / /
Rust Driver
/

Aggregation

On this page

  • Overview
  • Analogy
  • Compare Aggregation and Find Operations
  • Server Limitations
  • Examples
  • Age Insights by Genre
  • Group by Time Component
  • Calculate Popular Genres
  • Additional Information
  • API Documentation

In this guide, you can learn how to perform aggregation operations in the Rust driver.

Aggregation operations process data in your MongoDB collections based on specifications you can set in an aggregation pipeline. An aggregation pipeline consists of one or more stages. Each stage performs an operation based on its expression operators. After the driver executes the aggregation pipeline, it returns an aggregated result.

This guide includes the following sections:

  • Compare Aggregation and Find Operations describes the functionality differences between aggregation and find operations

  • Server Limitations describes the server limitations on memory usage for aggregation operations

  • Examples provides examples of aggregations for different use cases

  • Additional Information provides links to resources and API documentation for types and methods mentioned in this guide

Aggregation operations function similarly to car factories with assembly lines. The assembly lines have stations with specialized tools to perform specific tasks. For example, when building a car, the assembly line begins with the frame. Then, as the car frame moves through the assembly line, each station assembles a separate part. The result is a transformed final product, the finished car.

The assembly line represents the aggregation pipeline, the individual stations represent the aggregation stages, the specialized tools represent the expression operators, and the finished product represents the aggregated result.

The following table lists the different tasks you can perform with find operations, compared to what you can achieve with aggregation operations. The aggregation framework provides expanded functionality that allows you to transform and manipulate your data.

Find Operations
Aggregation Operations
Select certain documents to return
Select which fields to return
Sort the results
Limit the results
Count the results
Select certain documents to return
Select which fields to return
Sort the results
Limit the results
Count the results
Rename fields
Compute new fields
Summarize data
Connect and merge data sets

When performing aggregation operations, consider the following limitations:

  • Returned documents must not violate the BSON document size limit of 16 megabytes.

  • Pipeline stages have a memory limit of 100 megabytes by default. If required, you can exceed this limit by setting the allow_disk_use field in your AggregateOptions.

  • The $graphLookup operator has a strict memory limit of 100 megabytes and ignores the allow_disk_use setting.

The examples in this section use the following sample documents. Each document represents a user profile on a book review website and contains information about their name, age, genre interests, and date that they were last active on the website:

{ "name": "Sonya Mehta", "age": 23, "genre_interests": ["fiction", "mystery", "memoir"], "last_active": { "$date": "2023-05-13T00:00:00.000Z" } },
{ "name": "Selena Sun", "age": 45, "genre_interests": ["fiction", "literary", "theory"], "last_active": { "$date": "2023-05-25T00:00:00.000Z" } },
{ "name": "Carter Johnson", "age": 56, "genre_interests": ["literary", "self help"], "last_active": { "$date": "2023-05-31T00:00:00.000Z" } },
{ "name": "Rick Cortes", "age": 18, "genre_interests": ["sci-fi", "fantasy", "memoir"], "last_active": { "$date": "2023-07-01T00:00:00.000Z" } },
{ "name": "Belinda James", "age": 76, "genre_interests": ["literary", "nonfiction"], "last_active": { "$date": "2023-06-11T00:00:00.000Z" } },
{ "name": "Corey Saltz", "age": 29, "genre_interests": ["fiction", "sports", "memoir"], "last_active": { "$date": "2023-01-23T00:00:00.000Z" } },
{ "name": "John Soo", "age": 16, "genre_interests": ["fiction", "sports"], "last_active": { "$date": "2023-01-03T00:00:00.000Z" } },
{ "name": "Lisa Ray", "age": 39, "genre_interests": ["poetry", "art", "memoir"], "last_active": { "$date": "2023-05-30T00:00:00.000Z" } },
{ "name": "Kiran Murray", "age": 20, "genre_interests": ["mystery", "fantasy", "memoir"], "last_active": { "$date": "2023-01-30T00:00:00.000Z" } },
{ "name": "Beth Carson", "age": 31, "genre_interests": ["mystery", "nonfiction"], "last_active": { "$date": "2023-08-04T00:00:00.000Z" } },
{ "name": "Thalia Dorn", "age": 21, "genre_interests": ["theory", "literary", "fiction"], "last_active": { "$date": "2023-08-19T00:00:00.000Z" } },
{ "name": "Arthur Ray", "age": 66, "genre_interests": ["sci-fi", "fantasy", "fiction"], "last_active": { "$date": "2023-11-27T00:00:00.000Z" } }

The following example calculates the average, minimum, and maximum age of users interested in each genre.

The aggregation pipeline contains the following stages:

  • An $unwind stage to separate each array entry in the genre_interests field into a new document.

  • A $group stage to group documents by the value of the genre_interests field. This stage finds the average, minimum, and maximum user age by using the $avg, $min, and $max operators.

let age_pipeline = vec![
doc! { "$unwind": doc! { "path": "$genre_interests" } },
doc! { "$group": doc! {
"_id": "$genre_interests",
"avg_age": doc! { "$avg": "$age" },
"min_age": doc! { "$min": "$age" },
"max_age": doc! { "$max": "$age" }
} }
];
let mut results = my_coll.aggregate(age_pipeline).await?;
while let Some(result) = results.try_next().await? {
println!("* {:?}", result);
}
* { "_id": "memoir", "avg_age": 25.8, "min_age": 18, "max_age": 39 }
* { "_id": "sci-fi", "avg_age": 42, "min_age": 18, "max_age": 66 }
* { "_id": "fiction", "avg_age": 33.333333333333336, "min_age": 16, "max_age": 66 }
* { "_id": "nonfiction", "avg_age": 53.5, "min_age": 31, "max_age": 76 }
* { "_id": "self help", "avg_age": 56, "min_age": 56, "max_age": 56 }
* { "_id": "poetry", "avg_age": 39, "min_age": 39, "max_age": 39 }
* { "_id": "literary", "avg_age": 49.5, "min_age": 21, "max_age": 76 }
* { "_id": "fantasy", "avg_age": 34.666666666666664, "min_age": 18, "max_age": 66 }
* { "_id": "mystery", "avg_age": 24.666666666666668, "min_age": 20, "max_age": 31 }
* { "_id": "theory", "avg_age": 33, "min_age": 21, "max_age": 45 }
* { "_id": "art", "avg_age": 39, "min_age": 39, "max_age": 39 }
* { "_id": "sports", "avg_age": 22.5, "min_age": 16, "max_age": 29 }

The following example finds how many users were last active in each month.

The aggregation pipeline contains the following stages:

  • $project stage to extract the month from the last_active field as a number into the month_last_active field

  • $group stage to group documents by the month_last_active field and count the number of documents for each month

  • $sort stage to set an ascending sort on the month

let last_active_pipeline = vec![
doc! { "$project": { "month_last_active" : doc! { "$month" : "$last_active" } } },
doc! { "$group": doc! { "_id" : doc! {"month_last_active": "$month_last_active"} ,
"number" : doc! { "$sum" : 1 } } },
doc! { "$sort": { "_id.month_last_active" : 1 } }
];
let mut results = my_coll.aggregate(last_active_pipeline).await?;
while let Some(result) = results.try_next().await? {
println!("* {:?}", result);
}
* { "_id": { "month_last_active": 1 }, "number": 3 }
* { "_id": { "month_last_active": 5 }, "number": 4 }
* { "_id": { "month_last_active": 6 }, "number": 1 }
* { "_id": { "month_last_active": 7 }, "number": 1 }
* { "_id": { "month_last_active": 8 }, "number": 2 }
* { "_id": { "month_last_active": 11 }, "number": 1 }

The following example finds the three most popular genres based on how often they appear in users' interests.

The aggregation pipeline contains the following stages:

  • $unwind stage to separate each array entry in the genre_interests field into a new document

  • $group stage to group documents by the genre_interests field and count the number of documents for each genre

  • $sort stage to set a descending sort on the genre popularity

  • $limit stage to show only the first three genres

let popularity_pipeline = vec![
doc! { "$unwind" : "$genre_interests" },
doc! { "$group" : doc! { "_id" : "$genre_interests" , "number" : doc! { "$sum" : 1 } } },
doc! { "$sort" : doc! { "number" : -1 } },
doc! { "$limit": 3 }
];
let mut results = my_coll.aggregate(popularity_pipeline).await?;
while let Some(result) = results.try_next().await? {
println!("* {:?}", result);
}
* { "_id": "fiction", "number": 6 }
* { "_id": "memoir", "number": 5 }
* { "_id": "literary", "number": 4 }

To learn more about the concepts mentioned in this guide, see the following Server manual entries:

To learn more about the behavior of the aggregate() method, see the Aggregation Operations section of the Retrieve Data guide.

To learn more about sorting results within an aggregation pipeline, see the Sort Results guide.

To learn more about the methods and types mentioned in this guide, see the following API documentation:

Back

Schema Validation