Transform Your Data with Aggregation

In this guide, you can learn how to use the MongoDB PHP Library to perform aggregation operations.

Aggregation operations process data in your MongoDB collections and return computed results. The MongoDB Aggregation framework, which is part of the Query API, is modeled on the concept of data processing pipelines. Documents enter a pipeline that contains one or more stages, and this pipeline transforms the documents into an aggregated result.

An aggregation operation is similar to a car factory. A car factory has an assembly line, which contains assembly stations with specialized tools to do specific jobs, like drills and welders. Raw parts enter the factory, and then the assembly line transforms and assembles them into a finished product.

The aggregation pipeline is the assembly line, aggregation stages are the assembly stations, and operator expressions are the specialized tools.

You can use find operations to perform the following actions:

  • Select which documents to return

  • Select which fields to return

  • Sort the results

You can use aggregation operations to perform the following actions:

  • Run find operations

  • Rename fields

  • Calculate fields

  • Summarize data

  • Group values
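To make the comparison concrete, the following sketch expresses the same request twice: once as a find operation and once as an aggregation pipeline. It is an illustration only. The function names are hypothetical, the Composer autoloader is assumed to be loaded already, and the field names (cuisine, name, borough) are assumed to match the restaurants sample collection used later in this guide.

```php
<?php

use MongoDB\Collection;

// Find operation: select documents, select fields, and sort the results.
function findBakeries(Collection $collection): iterable
{
    return $collection->find(
        ['cuisine' => 'Bakery'],
        [
            'projection' => ['_id' => 0, 'name' => 1, 'borough' => 1],
            'sort' => ['name' => 1],
        ],
    );
}

// The same selection, ordering, and field choice written as pipeline stages.
function bakeryPipeline(): array
{
    return [
        ['$match' => ['cuisine' => 'Bakery']],
        ['$sort' => ['name' => 1]],
        ['$project' => ['_id' => 0, 'name' => 1, 'borough' => 1]],
    ];
}

// Aggregation operation that runs the pipeline above.
function aggregateBakeries(Collection $collection): iterable
{
    return $collection->aggregate(bakeryPipeline());
}
```

Once the request is a pipeline, you can append stages such as $group to compute results that a find operation cannot produce.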

Consider the following limitations when performing aggregation operations:

  • Returned documents cannot violate the BSON document size limit of 16 megabytes.

  • Pipeline stages have a memory limit of 100 megabytes by default. You can exceed this limit by creating an options array that sets the allowDiskUse option to true and passing the array to the MongoDB\Collection::aggregate() method.

    Important

    $graphLookup Exception

    The $graphLookup stage has a strict memory limit of 100 megabytes and ignores the allowDiskUse option.
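As a minimal sketch of the allowDiskUse option described above, the following code builds an options array and passes it as the second argument to aggregate(). The helper name aggregateWithOptions is hypothetical, and the Composer autoloader is assumed to be loaded already.

```php
<?php

use MongoDB\Collection;

// Options array that lets pipeline stages write temporary data to disk
// instead of failing when they reach the memory limit.
$options = ['allowDiskUse' => true];

// Hypothetical helper: runs any pipeline with the given options.
function aggregateWithOptions(Collection $collection, array $pipeline, array $options): iterable
{
    return $collection->aggregate($pipeline, $options);
}
```

For example, call aggregateWithOptions($collection, $pipeline, $options) wherever you would otherwise call $collection->aggregate($pipeline).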

The PHP library provides the following APIs to create aggregation pipelines:

  • Array API: Create aggregation pipelines by passing arrays that specify the aggregation stages.

  • Aggregation Builder: Create aggregation pipelines by using factory methods to make your application more type-safe and debuggable.

The following sections describe each API and provide examples for creating aggregation pipelines.

To perform an aggregation, pass an array containing the pipeline stages as BSON documents to the MongoDB\Collection::aggregate() method, as shown in the following code:

$pipeline = [
    ['<stage>' => <parameters>],
    ['<stage>' => <parameters>],
    ...
];
$cursor = $collection->aggregate($pipeline);

The examples in this section use the restaurants collection in the sample_restaurants database from the Atlas sample datasets. To learn how to create a free MongoDB Atlas cluster and load the sample datasets, see the Get Started with Atlas guide.

The following code example produces a count of the number of bakeries in each borough of New York. To do so, it uses an aggregation pipeline that contains the following stages:

  1. $match stage to filter for documents in which the cuisine field contains the value 'Bakery'

  2. $group stage to group the matching documents by the borough field, accumulating a count of documents for each distinct value

$pipeline = [
    ['$match' => ['cuisine' => 'Bakery']],
    ['$group' => ['_id' => '$borough', 'count' => ['$sum' => 1]]],
];
$cursor = $collection->aggregate($pipeline);
foreach ($cursor as $doc) {
    echo json_encode($doc), PHP_EOL;
}

{"_id":"Brooklyn","count":173}
{"_id":"Queens","count":204}
{"_id":"Bronx","count":71}
{"_id":"Staten Island","count":20}
{"_id":"Missing","count":2}
{"_id":"Manhattan","count":221}
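The $group stage does not guarantee the order of its output documents, which is why the boroughs above appear in no particular order. If you need a stable order, one option is to append a $sort stage. The following sketch assumes the same restaurants collection. The function names are hypothetical, and the Composer autoloader is assumed to be loaded already.

```php
<?php

use MongoDB\Collection;

// Same $match and $group stages as above, plus a $sort stage.
function bakeriesPerBoroughSorted(): array
{
    return [
        ['$match' => ['cuisine' => 'Bakery']],
        ['$group' => ['_id' => '$borough', 'count' => ['$sum' => 1]]],
        // Added stage: order the groups by count, largest first.
        ['$sort' => ['count' => -1]],
    ];
}

// Runs the pipeline and prints one JSON document per borough.
function printBakeryCounts(Collection $collection): void
{
    foreach ($collection->aggregate(bakeriesPerBoroughSorted()) as $doc) {
        echo json_encode($doc), PHP_EOL;
    }
}
```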

To view information about how MongoDB executes your operation, you can instruct the MongoDB query planner to explain it. When MongoDB explains an operation, it returns execution plans and performance statistics. An execution plan is a potential way in which MongoDB can complete an operation. When you instruct MongoDB to explain an operation, it returns both the plan MongoDB executed and any rejected execution plans.

To explain an aggregation operation, construct a MongoDB\Operation\Aggregate object and pass the database, collection, and pipeline stages as parameters. Then, pass the MongoDB\Operation\Aggregate object to the MongoDB\Collection::explain() method.

The following example instructs MongoDB to explain the aggregation operation from the preceding section:

$pipeline = [
    ['$match' => ['cuisine' => 'Bakery']],
    ['$group' => ['_id' => '$borough', 'count' => ['$sum' => 1]]],
];
$aggregate = new MongoDB\Operation\Aggregate(
    $collection->getDatabaseName(),
    $collection->getCollectionName(),
    $pipeline
);
$result = $collection->explain($aggregate);
echo json_encode($result), PHP_EOL;

{"explainVersion":"2","queryPlanner":{"namespace":"sample_restaurants.restaurants",
"indexFilterSet":false,"parsedQuery":{"cuisine":{"$eq":"Bakery"}},"queryHash":"865F14C3",
"planCacheKey":"D56D6F10","optimizedPipeline":true,"maxIndexedOrSolutionsReached":false,
"maxIndexedAndSolutionsReached":false,"maxScansToExplodeReached":false,"winningPlan":{
... }
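If you want less detail than the output above, the explain() method accepts an options array as its second argument. The following sketch assumes that the verbosity option takes one of the explain command's mode strings ('queryPlanner', 'executionStats', or 'allPlansExecution'). The function name is hypothetical, and the Composer autoloader is assumed to be loaded already. Check the API documentation for your library version before relying on it.

```php
<?php

use MongoDB\Collection;
use MongoDB\Operation\Aggregate;

// Hypothetical helper: explains a pipeline and requests statistics
// for the winning plan only.
function explainWithExecutionStats(Collection $collection, array $pipeline): array|object
{
    $aggregate = new Aggregate(
        $collection->getDatabaseName(),
        $collection->getCollectionName(),
        $pipeline,
    );

    // Assumption: 'verbosity' is passed through to the explain command.
    return $collection->explain($aggregate, ['verbosity' => 'executionStats']);
}
```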

To create an aggregation pipeline by using the Aggregation Builder, perform the following actions:

  1. Create an array to store the pipeline stages.

  2. For each stage, call the factory method from the Stage class that shares its name with your desired aggregation stage. For example, to create an $unwind stage, call the Stage::unwind() method.

  3. Within the body of the Stage method, use methods from other builder classes such as Query, Expression, or Accumulator to express your aggregation specifications.

The following code demonstrates the template for constructing aggregation pipelines:

$pipeline = [
    Stage::<factory method>(
        <stage specification>
    ),
    Stage::<factory method>(
        <stage specification>
    ),
    ...
];
$cursor = $collection->aggregate($pipeline);
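As a filled-in instance of the template, the following sketch rewrites the bakery count pipeline from the Array API section with builder methods. It imports the builder classes with use statements so the stage calls stay short. The function name is hypothetical, and the Composer autoloader is assumed to be loaded already.

```php
<?php

use MongoDB\Builder\Accumulator;
use MongoDB\Builder\Expression;
use MongoDB\Builder\Query;
use MongoDB\Builder\Stage;

// Hypothetical helper: builds the pipeline with factory methods.
function bakeryCountPipeline(): array
{
    return [
        // $match: keep documents whose cuisine field equals 'Bakery'.
        Stage::match(cuisine: Query::eq('Bakery')),
        // $group: one output document per borough, with a document count.
        Stage::group(
            _id: Expression::fieldPath('borough'),
            count: Accumulator::sum(1),
        ),
    ];
}
```

Pass the returned array to $collection->aggregate() in the same way as an array-based pipeline.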

The examples in this section are adapted from the MongoDB Server manual. Each example provides a link to the sample data that you can insert into your database to test the aggregation operation.

Tip

Operations with Builders

You can use builders to support non-aggregation operations such as find and update operations. To learn more, see the Operations with Builders guide.

This example uses the sample data given in the Calculate Count, Sum, and Average section of the $group stage reference in the Server manual.

The following code example calculates the total sales amount, average sales quantity, and sale count for each day in the year 2014. To do so, it uses an aggregation pipeline that contains the following stages:

  1. $match stage to filter for documents that contain a date field in which the year is 2014

  2. $group stage to group the documents by date and calculate the total sales amount, average sales quantity, and sale count for each group

  3. $sort stage to sort the results by the total sale amount for each group in descending order

$pipeline = [
    MongoDB\Builder\Stage::match(
        date: [
            MongoDB\Builder\Query::gte(new MongoDB\BSON\UTCDateTime(new DateTimeImmutable('2014-01-01'))),
            MongoDB\Builder\Query::lt(new MongoDB\BSON\UTCDateTime(new DateTimeImmutable('2015-01-01'))),
        ],
    ),
    MongoDB\Builder\Stage::group(
        _id: MongoDB\Builder\Expression::dateToString(MongoDB\Builder\Expression::dateFieldPath('date'), '%Y-%m-%d'),
        totalSaleAmount: MongoDB\Builder\Accumulator::sum(
            MongoDB\Builder\Expression::multiply(
                MongoDB\Builder\Expression::numberFieldPath('price'),
                MongoDB\Builder\Expression::numberFieldPath('quantity'),
            ),
        ),
        averageQuantity: MongoDB\Builder\Accumulator::avg(
            MongoDB\Builder\Expression::numberFieldPath('quantity'),
        ),
        count: MongoDB\Builder\Accumulator::sum(1),
    ),
    MongoDB\Builder\Stage::sort(
        totalSaleAmount: MongoDB\Builder\Type\Sort::Desc,
    ),
];
$cursor = $collection->aggregate($pipeline);
foreach ($cursor as $doc) {
    echo json_encode($doc), PHP_EOL;
}

{"_id":"2014-04-04","totalSaleAmount":{"$numberDecimal":"200"},"averageQuantity":15,"count":2}
{"_id":"2014-03-15","totalSaleAmount":{"$numberDecimal":"50"},"averageQuantity":10,"count":1}
{"_id":"2014-03-01","totalSaleAmount":{"$numberDecimal":"40"},"averageQuantity":1.5,"count":2}
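For comparison, the same three stages can be written with the Array API. The following sketch is intended to be equivalent to the builder pipeline above, but it has not been run against the sample data. The function name is hypothetical, and the MongoDB PHP extension is assumed to be installed so that the UTCDateTime class is available.

```php
<?php

use MongoDB\BSON\UTCDateTime;

// Hypothetical helper: array-based version of the sales-by-day pipeline.
function salesByDayPipeline(): array
{
    return [
        // $match: dates from 2014-01-01 (inclusive) to 2015-01-01 (exclusive).
        ['$match' => ['date' => [
            '$gte' => new UTCDateTime(new DateTimeImmutable('2014-01-01')),
            '$lt' => new UTCDateTime(new DateTimeImmutable('2015-01-01')),
        ]]],
        // $group: one document per calendar day.
        ['$group' => [
            '_id' => ['$dateToString' => ['format' => '%Y-%m-%d', 'date' => '$date']],
            'totalSaleAmount' => ['$sum' => ['$multiply' => ['$price', '$quantity']]],
            'averageQuantity' => ['$avg' => '$quantity'],
            'count' => ['$sum' => 1],
        ]],
        // $sort: largest total sale amount first.
        ['$sort' => ['totalSaleAmount' => -1]],
    ];
}
```

The array form is shorter, while the builder form names each operator and field type explicitly in PHP code.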

This example uses the sample data given in the Unwind Embedded Arrays section of the $unwind stage reference in the Server manual.

The following code example groups sold items by their tags and calculates the total sales amount for each tag. To do so, it uses an aggregation pipeline that contains the following stages:

  1. $unwind stage to output a separate document for each element in the items array

  2. $unwind stage to output a separate document for each element in the items.tags arrays

  3. $group stage to group the documents by the tag value and calculate the total sales amount of items that have each tag

$pipeline = [
    MongoDB\Builder\Stage::unwind(MongoDB\Builder\Expression::arrayFieldPath('items')),
    MongoDB\Builder\Stage::unwind(MongoDB\Builder\Expression::arrayFieldPath('items.tags')),
    MongoDB\Builder\Stage::group(
        _id: MongoDB\Builder\Expression::fieldPath('items.tags'),
        totalSalesAmount: MongoDB\Builder\Accumulator::sum(
            MongoDB\Builder\Expression::multiply(
                MongoDB\Builder\Expression::numberFieldPath('items.price'),
                MongoDB\Builder\Expression::numberFieldPath('items.quantity'),
            ),
        ),
    ),
];
$cursor = $collection->aggregate($pipeline);
foreach ($cursor as $doc) {
    echo json_encode($doc), PHP_EOL;
}

{"_id":"office","totalSalesAmount":{"$numberDecimal":"1019.60"}}
{"_id":"school","totalSalesAmount":{"$numberDecimal":"104.85"}}
{"_id":"stationary","totalSalesAmount":{"$numberDecimal":"264.45"}}
{"_id":"electronics","totalSalesAmount":{"$numberDecimal":"800.00"}}
{"_id":"writing","totalSalesAmount":{"$numberDecimal":"60.00"}}

This example uses the sample data given in the Perform a Single Equality Join with $lookup section of the $lookup stage reference in the Server manual.

The following code example joins the documents from the orders collection with the documents from the inventory collection by using the item field from the orders collection and the sku field from the inventory collection.

To do so, the example uses an aggregation pipeline that contains a $lookup stage that specifies the collection to retrieve data from and the local and foreign field names.

$pipeline = [
    MongoDB\Builder\Stage::lookup(
        from: 'inventory',
        localField: 'item',
        foreignField: 'sku',
        as: 'inventory_docs',
    ),
];
/* Performs the aggregation on the orders collection */
$cursor = $collection->aggregate($pipeline);
foreach ($cursor as $doc) {
    echo json_encode($doc), PHP_EOL;
}

{"_id":1,"item":"almonds","price":12,"quantity":2,"inventory_docs":[{"_id":1,"sku":"almonds","description":"product 1","instock":120}]}
{"_id":2,"item":"pecans","price":20,"quantity":1,"inventory_docs":[{"_id":4,"sku":"pecans","description":"product 4","instock":70}]}
{"_id":3,"inventory_docs":[{"_id":5,"sku":null,"description":"Incomplete"},{"_id":6}]}

To view a tutorial that uses the MongoDB PHP Library to create complex aggregation pipelines, see Complex Aggregation Pipelines with Vanilla PHP and MongoDB in the MongoDB Developer Center.

To view more examples of aggregation pipelines built by using the Aggregation Builder, see the Stage class test suite in the PHP library source code on GitHub.

To learn more about the topics discussed in this guide, such as aggregation pipelines, aggregation stages, and operator expressions, see the MongoDB Server manual.

You can perform full-text searches by using the Atlas Search feature. To learn more, see the Atlas Search guide.

You can perform similarity searches on vector embeddings by using the Atlas Vector Search feature. To learn more, see the Atlas Vector Search guide.

To learn more about the methods discussed in this guide, see the API documentation for MongoDB\Collection::aggregate() and MongoDB\Collection::explain().