MongoDB C# Aggregation Pipeline Basics

Markus Wildgruber • 5 min read • Published Oct 11, 2024 • Updated Oct 11, 2024

While basic CRUD statements like find, insert, update, and delete can take you a long way when building your application, sooner or later, users will want to look at data in different forms. As an example, if you have lots of data in a time series collection, calculating key performance indicators helps users derive meaningful conclusions from the data, e.g.:

What was the average value for the sensor yesterday?
What was the range of values in the last month?
How many readings were received in the last hour?

This is where aggregation operations come in: they transform and summarize data so that the information is easy to grasp and answers questions without anyone having to look at the individual values. For instance, aggregations can be used to calculate KPIs, to group data, to support paging, and in many other scenarios.

This tutorial puts its focus on how to use aggregations with the MongoDB C# driver; if you want to learn about aggregation operations in depth, please have a look at the excellent e-book Practical MongoDB Aggregations.

You can run aggregations on the fly when querying your data. In fact, when using the MongoDB C# driver, you might run aggregations without even noticing as the driver often uses aggregations under the hood instead of plain find commands.

Especially for complex operations, it is common practice to run aggregations in the background and store the data in a collection beforehand. This can be achieved by terminating the pipeline with a $merge or $out stage that writes the aggregation result into a collection.

This pattern is called an on-demand materialized view. The complex pipeline runs only occasionally, while the far more frequent read requests query the precomputed collection and put little load on the server. The data in the on-demand materialized view can be updated periodically or in reaction to data changes.
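As a minimal sketch of this pattern, the following snippet ends a pipeline with a $merge stage and runs it with AggregateToCollectionAsync, which executes the pipeline without returning a cursor. It assumes the sample_mflix dataset used throughout this tutorial, a deployment that supports $merge (MongoDB 4.2 or later), and a reasonably recent driver version. The MONGODB_URI environment variable and the target collection name ratings_by_year are hypothetical names chosen for illustration.

```csharp
using System;
using MongoDB.Bson;
using MongoDB.Driver;

// Assumption: MONGODB_URI is a hypothetical environment variable that holds
// the connection string of your cluster.
var connectionString = Environment.GetEnvironmentVariable("MONGODB_URI")
    ?? throw new InvalidOperationException("Set MONGODB_URI first.");
var database = new MongoClient(connectionString).GetDatabase("sample_mflix");
var movies = database.GetCollection<BsonDocument>("movies");

var pipeline = new[]
{
    new BsonDocument("$match", new BsonDocument("cast", "Robert De Niro")),
    new BsonDocument("$group", new BsonDocument
    {
        { "_id", "$year" },
        { "rating", new BsonDocument("$avg", "$imdb.rating") }
    }),
    // $merge has to be the last stage. It matches on _id by default, so
    // re-running the pipeline replaces the entry of a year instead of
    // adding a duplicate.
    new BsonDocument("$merge", new BsonDocument
    {
        { "into", "ratings_by_year" }, // hypothetical collection name
        { "whenMatched", "replace" },
        { "whenNotMatched", "insert" }
    })
};

// Runs the pipeline on the server; the result is written to the target
// collection instead of being returned to the client.
await movies.AggregateToCollectionAsync<BsonDocument>(pipeline);

// Regular read requests now query the small precomputed collection.
var view = database.GetCollection<BsonDocument>("ratings_by_year");
var top = await view
    .Find(FilterDefinition<BsonDocument>.Empty)
    .Sort(Builders<BsonDocument>.Sort.Descending("rating"))
    .Limit(3)
    .ToListAsync();

foreach (var doc in top)
{
    Console.WriteLine(doc.ToJson());
}
```

A scheduled job or a change-driven trigger could call the AggregateToCollectionAsync line to refresh the view; how often that is appropriate depends on how fresh the data needs to be.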

When setting up an aggregation pipeline, MongoDB Compass is a good starting point. You can open the collection in your development cluster and put together an aggregation pipeline in the graphical user interface. While the graphical user interface offers more guidance, there is also a text-based aggregation pipeline editor that more experienced developers can use.

Samples used in this tutorial

This tutorial shows several ways you can run aggregations from C# code. The samples use the sample_mflix database so that you can easily try out the code on your own MongoDB Atlas cluster. Please see the Get Started section in the MongoDB Atlas documentation on how to deploy a free cluster and load the sample dataset.

The sample database contains the movies collection with a document structure similar to this:

{
  "_id": ObjectId("573a139af29313caabcef0ad"),
  "imdb": {
    "rating": 8.2,
    "votes": 297933,
    "id": 112641
  },
  "year": 1995,
  "title": "Casino",
  "cast": [
    "Robert De Niro",
    "Sharon Stone",
    "Joe Pesci",
    "James Woods"
  ]
}

In our example, we want to filter by an actor, group by the year, and order the documents by the average rating of the movies of the year. This can be achieved by using the following aggregation pipeline:

[
  {
    $match: {
      cast: "Robert De Niro"
    }
  },
  {
    $group: {
      _id: "$year",
      rating: { $avg: "$imdb.rating" }
    }
  },
  {
    $sort: {
      rating: -1
    }
  }
]

When running the pipeline in MongoDB Compass, we receive the following result:

[
  {
    "_id": 1974,
    "rating": 9.1
  },
  {
    "_id": 1980,
    "rating": 8.3
  },
  {
    "_id": 1995,
    "rating": 8.25
  },
  {
    "_id": 1990,
    "rating": 8.25
  },
  // ...
]

To support the aggregation, we create the following POCOs in C#:

using MongoDB.Bson.Serialization.Attributes;

// Maps the documents of the movies collection; fields that are not
// declared here are ignored during deserialization.
[BsonIgnoreExtraElements]
[BsonNoId]
public class Movie
{
    [BsonElement("title")]
    public required string Title { get; set; }

    [BsonElement("year")]
    public required int Year { get; set; }

    [BsonElement("cast")]
    public List<string> Cast { get; set; } = new();

    [BsonElement("imdb")]
    public Imdb Imdb { get; set; } = new();
}

// Maps the embedded imdb document.
[BsonIgnoreExtraElements]
public class Imdb
{
    [BsonElement("rating")]
    public double Rating { get; set; }
}

// Result type of the aggregation: one entry per year.
public class RatingByYear
{
    public int Year { get; set; }

    public double AvgRating { get; set; }
}
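The samples that follow use a variable named movies without showing where it comes from. A minimal sketch of how it might be obtained is given below; the MONGODB_URI environment variable is a hypothetical name, and the Movie class is trimmed to the single property that the sanity check needs so that the snippet compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using MongoDB.Bson.Serialization.Attributes;
using MongoDB.Driver;

// Assumption: MONGODB_URI is a hypothetical environment variable that holds
// the connection string of the cluster with the sample dataset.
var connectionString = Environment.GetEnvironmentVariable("MONGODB_URI")
    ?? throw new InvalidOperationException("Set MONGODB_URI first.");

var client = new MongoClient(connectionString);
var database = client.GetDatabase("sample_mflix");
var movies = database.GetCollection<Movie>("movies");

// Sanity check: count the documents that the $match stage will select.
var count = await movies.CountDocumentsAsync(x => x.Cast.Contains("Robert De Niro"));
Console.WriteLine($"Movies with Robert De Niro: {count}");

// Trimmed copy of the Movie POCO; in a real project, use the full class.
[BsonIgnoreExtraElements]
[BsonNoId]
public class Movie
{
    [BsonElement("cast")]
    public List<string> Cast { get; set; } = new();
}
```

MongoClient is designed to be shared, so an application would typically create it once and reuse it, for example by registering it as a singleton.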

Aggregation Methods for IMongoCollection

The basic and most powerful way to run an aggregation pipeline in C# is to use the AggregateAsync method of IMongoCollection<T>. This method takes a pipeline definition as its most important input parameter and returns a cursor. For our sample, we could express the aggregation pipeline like this:

var pipeline = new EmptyPipelineDefinition<Movie>()
    .Match(x => x.Cast.Contains("Robert De Niro"))
    .Group(
        x => x.Year,
        x => new RatingByYear()
        {
            Year = x.Key,
            AvgRating = x.Average(y => y.Imdb.Rating)
        })
    .Sort(Builders<RatingByYear>.Sort.Descending(x => x.AvgRating));
var result = await (await movies.AggregateAsync(pipeline)).ToListAsync();

First, we define the pipeline and add the necessary stages to it. As you can see in the code above, you can use lambda expressions or the Builders<T> helpers that you might know from putting together CRUD statements. After running the aggregation, we use ToListAsync to store the aggregation result in a list.
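ToListAsync loads the complete result into memory. Because AggregateAsync returns a cursor, the result can also be consumed batch by batch, which may be preferable for large results. The following sketch shows one way to do this; the MONGODB_URI environment variable is a hypothetical name, and the POCOs are trimmed copies of the classes above so that the snippet compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using MongoDB.Bson.Serialization.Attributes;
using MongoDB.Driver;

// Assumption: MONGODB_URI is a hypothetical environment variable.
var connectionString = Environment.GetEnvironmentVariable("MONGODB_URI")
    ?? throw new InvalidOperationException("Set MONGODB_URI first.");
var movies = new MongoClient(connectionString)
    .GetDatabase("sample_mflix")
    .GetCollection<Movie>("movies");

var pipeline = new EmptyPipelineDefinition<Movie>()
    .Match(x => x.Cast.Contains("Robert De Niro"))
    .Group(
        x => x.Year,
        x => new RatingByYear
        {
            Year = x.Key,
            AvgRating = x.Average(y => y.Imdb.Rating)
        })
    .Sort(Builders<RatingByYear>.Sort.Descending(x => x.AvgRating));

// The cursor is disposable; the using declaration releases it when done.
using var cursor = await movies.AggregateAsync(pipeline);
while (await cursor.MoveNextAsync())
{
    // Current exposes the documents of the batch that was just fetched.
    foreach (var item in cursor.Current)
    {
        Console.WriteLine($"{item.Year}: {item.AvgRating:F2}");
    }
}

// Trimmed copies of the POCOs shown earlier in this tutorial.
[BsonIgnoreExtraElements]
[BsonNoId]
public class Movie
{
    [BsonElement("year")]
    public int Year { get; set; }

    [BsonElement("cast")]
    public List<string> Cast { get; set; } = new();

    [BsonElement("imdb")]
    public Imdb Imdb { get; set; } = new();
}

[BsonIgnoreExtraElements]
public class Imdb
{
    [BsonElement("rating")]
    public double Rating { get; set; }
}

public class RatingByYear
{
    public int Year { get; set; }
    public double AvgRating { get; set; }
}
```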

For our sample, the basic stages $match, $group, and $sort are sufficient, but there is a wide variety of methods that you can use to set up complex aggregation pipelines. If there is no explicit method for a specific pipeline stage, you can use the AppendStage method to append a stage that is defined as a BsonDocument. We will have a closer look at this method in a follow-up to this tutorial.
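Until then, here is a brief sketch of what AppendStage can look like. It works on an untyped BsonDocument collection to stay short and uses $addFields purely as an illustration; depending on the driver version, a dedicated method for this stage may exist. The MONGODB_URI environment variable and the castSize field are hypothetical names.

```csharp
using System;
using MongoDB.Bson;
using MongoDB.Driver;

// Assumption: MONGODB_URI is a hypothetical environment variable.
var connectionString = Environment.GetEnvironmentVariable("MONGODB_URI")
    ?? throw new InvalidOperationException("Set MONGODB_URI first.");
var movies = new MongoClient(connectionString)
    .GetDatabase("sample_mflix")
    .GetCollection<BsonDocument>("movies");

// The stage is written exactly as it would appear in the shell or in Compass.
var addCastSize = new BsonDocument("$addFields",
    new BsonDocument("castSize", new BsonDocument("$size", "$cast")));

var result = await movies
    .Aggregate()
    .Match(Builders<BsonDocument>.Filter.Eq("cast", "Robert De Niro"))
    .Limit(3)
    // The type argument names the document type that the stage produces.
    .AppendStage<BsonDocument>(addCastSize)
    .ToListAsync();

foreach (var doc in result)
{
    Console.WriteLine($"{doc["title"]}: {doc["castSize"]} cast members");
}
```

The driver cannot check the contents of a stage that is passed as a BsonDocument, so typos in operator or field names only surface when the pipeline runs.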

Fluent interface

The previous sample made use of a fluent interface when defining the aggregation pipeline. In addition to AggregateAsync, IMongoCollection<T> also offers an Aggregate method that is the starting point for the fluent aggregation interface:

var result = await movies
    .Aggregate()
    .Match(x => x.Cast.Contains("Robert De Niro"))
    .Group(
        x => x.Year,
        x => new RatingByYear()
        {
            Year = x.Key,
            AvgRating = x.Average(y => y.Imdb.Rating)
        })
    .Sort(Builders<RatingByYear>.Sort.Descending(x => x.AvgRating))
    .ToListAsync();

This shortens the code by a few lines but offers essentially the same functionality as defining the pipeline explicitly.

Using LINQ

The MongoDB C# driver offers a powerful LINQ provider that is able to transform LINQ statements into a MongoDB aggregation pipeline. This way, developers can use LINQ statements in their code as they are used to; an aggregation pipeline is created under the hood and executed against the database when the results are enumerated:

var result = movies
    .AsQueryable()
    .Where(x => x.Cast.Contains("Robert De Niro"))
    .GroupBy(x => x.Year)
    .Select(x => new RatingByYear()
    {
        Year = x.Key,
        AvgRating = x.Average(y => y.Imdb.Rating)
    })
    .OrderByDescending(x => x.AvgRating)
    .ToList();

Or alternatively in query syntax:

var result = (from m in movies.AsQueryable()
              where m.Cast.Contains("Robert De Niro")
              group m by m.Year into grp
              select new RatingByYear()
              {
                  Year = grp.Key,
                  AvgRating = grp.Average(y => y.Imdb.Rating)
              }
              into x
              orderby x.AvgRating descending
              select x)
              .ToList();
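Both LINQ samples above enumerate the results synchronously with ToList. The driver also provides a ToListAsync extension in the MongoDB.Driver.Linq namespace, so the method-syntax query can be awaited like the other samples in this tutorial. The sketch below assumes a hypothetical MONGODB_URI environment variable and uses trimmed copies of the POCOs so that it compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using MongoDB.Bson.Serialization.Attributes;
using MongoDB.Driver;
using MongoDB.Driver.Linq;

// Assumption: MONGODB_URI is a hypothetical environment variable.
var connectionString = Environment.GetEnvironmentVariable("MONGODB_URI")
    ?? throw new InvalidOperationException("Set MONGODB_URI first.");
var movies = new MongoClient(connectionString)
    .GetDatabase("sample_mflix")
    .GetCollection<Movie>("movies");

// Same query as above; only the final call differs.
var result = await movies
    .AsQueryable()
    .Where(x => x.Cast.Contains("Robert De Niro"))
    .GroupBy(x => x.Year)
    .Select(x => new RatingByYear
    {
        Year = x.Key,
        AvgRating = x.Average(y => y.Imdb.Rating)
    })
    .OrderByDescending(x => x.AvgRating)
    .ToListAsync();

foreach (var item in result)
{
    Console.WriteLine($"{item.Year}: {item.AvgRating:F2}");
}

// Trimmed copies of the POCOs shown earlier in this tutorial.
[BsonIgnoreExtraElements]
[BsonNoId]
public class Movie
{
    [BsonElement("year")]
    public int Year { get; set; }

    [BsonElement("cast")]
    public List<string> Cast { get; set; } = new();

    [BsonElement("imdb")]
    public Imdb Imdb { get; set; } = new();
}

[BsonIgnoreExtraElements]
public class Imdb
{
    [BsonElement("rating")]
    public double Rating { get; set; }
}

public class RatingByYear
{
    public int Year { get; set; }
    public double AvgRating { get; set; }
}
```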

Although a specific LINQ query may still fail to translate into an aggregation pipeline, such cases have become very rare with LINQ provider v3 and can often be resolved by restructuring the LINQ statement. For the remaining cases, MongoDB maintains the provider and regularly extends it to support LINQ constructs that are still missing.

Comparison of the methods

As you can see, it is very easy to set up and run aggregation pipelines with the MongoDB C# driver. In fact, there are a variety of ways to achieve this goal. Which method is the best to use in your project?

LINQ is widely used in C# projects and is a query technique that developers learn early in their journey with C#. Using the LINQ-based approach enables developers to benefit from the power of aggregation pipelines without deep MongoDB knowledge.

Of course, this abstraction also means less control over the aggregation pipelines; especially in complex scenarios, using the methods of IMongoCollection<T> offers the flexibility to put the pipeline together so that it fits the purpose perfectly.
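One way to regain some insight when using the higher-level approaches is to look at the pipeline that the driver generates. The sketch below relies on ToString, which in the driver versions we are aware of renders the pipeline for both the fluent interface and a LINQ queryable; the exact output format is not guaranteed and may differ between versions. The MONGODB_URI environment variable is a hypothetical name, and the POCOs are trimmed copies of the classes used in this tutorial.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using MongoDB.Bson.Serialization.Attributes;
using MongoDB.Driver;

// Assumption: MONGODB_URI is a hypothetical environment variable.
var connectionString = Environment.GetEnvironmentVariable("MONGODB_URI")
    ?? throw new InvalidOperationException("Set MONGODB_URI first.");
var movies = new MongoClient(connectionString)
    .GetDatabase("sample_mflix")
    .GetCollection<Movie>("movies");

// Fluent interface: build the pipeline without executing it.
var fluent = movies
    .Aggregate()
    .Match(x => x.Cast.Contains("Robert De Niro"))
    .Group(
        x => x.Year,
        x => new RatingByYear
        {
            Year = x.Key,
            AvgRating = x.Average(y => y.Imdb.Rating)
        });
Console.WriteLine(fluent.ToString());

// LINQ: the query is only executed when it is enumerated, so it can be
// inspected beforehand.
var queryable = movies
    .AsQueryable()
    .Where(x => x.Cast.Contains("Robert De Niro"))
    .GroupBy(x => x.Year)
    .Select(x => new RatingByYear
    {
        Year = x.Key,
        AvgRating = x.Average(y => y.Imdb.Rating)
    });
Console.WriteLine(queryable.ToString());

// Trimmed copies of the POCOs shown earlier in this tutorial.
[BsonIgnoreExtraElements]
[BsonNoId]
public class Movie
{
    [BsonElement("year")]
    public int Year { get; set; }

    [BsonElement("cast")]
    public List<string> Cast { get; set; } = new();

    [BsonElement("imdb")]
    public Imdb Imdb { get; set; } = new();
}

[BsonIgnoreExtraElements]
public class Imdb
{
    [BsonElement("rating")]
    public double Rating { get; set; }
}

public class RatingByYear
{
    public int Year { get; set; }
    public double AvgRating { get; set; }
}
```

Comparing the printed stages with the pipeline built in MongoDB Compass is a quick way to check whether an abstraction produces the stages you expect.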

Which method do you use and prefer? Let us know in the MongoDB Developer Community Forums!
