$percentile (aggregation)
Definition
$percentile
New in version 7.0.
Returns an array of scalar values that correspond to specified percentile values.
You can use
$percentile
as an accumulator in the$group
stage or as an aggegation expression.
Syntax
The syntax for $percentile
is:
{ $percentile: { input: <expression>, p: [ <expression1>, <expression2>, ... ], method: <string> } }
Command Fields
$percentile
takes the following fields:
Field | Type | Necessity | Description |
---|---|---|---|
| Expression | Required |
|
| Expression | Required |
|
| String | Required | The method that |
Behavior
You can use $percentile
in:
$group
stages as an accumulator$setWindowFields
stages as an accumulator$project
stages as an aggregation expression
$percentile
has the following characteristics as an accumulator, it:
Calculates a single result for all the documents in the stage.
Uses the t-digest algorithm to calculate approximate, percentile based metrics.
Uses approximate methods to scale to large volumes of data.
$percentile
has the following characteristics as an aggregation
expression, it:
Accepts an array as input
Calculates a separate result for each input document
Type of Operation
In a $group
stage, $percentile
is an accumulator and calculates
a value for all documents in the window.
In a $project
stage, $percentile
is an aggregation expression and
calculates values for each document.
In $setWindowFields
stages, $percentile
returns a result
for each document like an aggregation expression, but the results are
computed over groups of documents like an accumulator.
Calculation Considerations
In $group
stages, $percentile
always uses an approximate
calculation method.
In $project
stages, $percentile
might use the discrete
calculation method even when the approximate method is specified.
In $setWindowFields
stages, the workload determines the calculation
method that $percentile
uses.
The computed percentiles $percentile
returns might vary, even on the
same datasets. This is because the algorithm calculates approximate
values.
Duplicate samples can cause ambiguity. If there are a large number of duplicates, the percentile values may not represent the actual sample distribution. Consider a data set where all the samples are the same. All of the values in the data set fall at or below any percentile. A "50th percentile" value would actually represent either 0 or 100 percent of the samples.
$percentile
returns the minimum value for p = 0.0
.
$percentile
returns the maximum value for p = 1.0
.
Array Input
If you use $percentile
as an aggregation expression in a
$project
stage, you can use an array as input.
The syntax is:
{ $percentile: { input: [ <expression1, <expression2>, .., <expressionN> ], p: [ <expression1>, <expression2>, ... ], method: <string> } }
Window Functions
A window function lets you calculate results over a moving "window" of
neighboring documents. As each document passes though the pipeline, the
$setWindowFields
stage:
Recomputes the set of documents in the current window
calculates a value for all documents in the set
returns a single value for that document
You can use $percentile
in a $setWindowFields
stage to calculate
rolling statistics for time series or
other related data.
When you use $percentile
in a $setWindowField
stage, the
input
value must be a field name. If you enter an array instead of a
field name, the operation fails.
Examples
The following examples use the testScores
collection. Create the
collection:
db.testScores.insertMany( [ { studentId: "2345", test01: 62, test02: 81, test03: 80 }, { studentId: "2356", test01: 60, test02: 83, test03: 79 }, { studentId: "2358", test01: 67, test02: 82, test03: 78 }, { studentId: "2367", test01: 64, test02: 72, test03: 77 }, { studentId: "2369", test01: 60, test02: 53, test03: 72 } ] )
Calculate a Single Value as an Accumulator
Create an accumulator that calculates a single percentile value:
db.testScores.aggregate( [ { $group: { _id: null, test01_percentiles: { $percentile: { input: "$test01", p: [ 0.95 ], method: 'approximate' } }, } } ] )
Output:
{ _id: null, test01_percentiles: [ 67 ] }
The _id
field value is null
so $group
selects all the
documents in the collection.
The percentile
accumulator takes its input data from the test01
field.
In this example, the percentiles array, p
, has one value so the
$percentile
operator only calculates one term for the test01
data. The 95th percentile value is 67
.
Calculate Multiple Values as an Accumulator
Create an accumulator that calculates multiple percentile values:
db.testScores.aggregate( [ { $group: { _id: null, test01_percentiles: { $percentile: { input: "$test01", p: [ 0.5, 0.75, 0.9, 0.95 ], method: 'approximate' } }, test02_percentiles: { $percentile: { input: "$test02", p: [ 0.5, 0.75, 0.9, 0.95 ], method: 'approximate' } }, test03_percentiles: { $percentile: { input: "$test03", p: [ 0.5, 0.75, 0.9, 0.95 ], method: 'approximate' } }, test03_percent_alt: { $percentile: { input: "$test03", p: [ 0.9, 0.5, 0.75, 0.95 ], method: 'approximate' } }, } } ] )
Output:
{ _id: null, test01_percentiles: [ 62, 64, 67, 67 ], test02_percentiles: [ 81, 82, 83, 83 ], test03_percentiles: [ 78, 79, 80, 80 ], test03_percent_alt: [ 80, 78, 79, 80 ] }
The _id
field value is null
so $group
selects all the
documents in the collection.
The percentile
accumulator calculates values for three fields,
test01
, test02
, and test03
.
The accumulator calculates the 50th, 75th, 90th, and 95th percentile values for each input field.
The percentile values are returned in the same order as the elements of
p
. The values in test03_percentiles
and test03_percent_alt
are the same, but their order is different. The order of elements in
each result array matches the corresponding order of elements in p
.
Use $percentile
in a $project
Stage
In a $project
stage, $percentile
is an aggregation expression and
calculates values for each document.
You can use a field name or an array as input in a $project
stage.
db.testScores.aggregate( [ { $project: { _id: 0, studentId: 1, testPercentiles: { $percentile: { input: [ "$test01", "$test02", "$test03" ], p: [ 0.5, 0.95 ], method: 'approximate' } } } } ] )
Output:
{ studentId: '2345', testPercentiles: [ 80, 81 ] }, { studentId: '2356', testPercentiles: [ 79, 83 ] }, { studentId: '2358', testPercentiles: [ 78, 82 ] }, { studentId: '2367', testPercentiles: [ 72, 77 ] }, { studentId: '2369', testPercentiles: [ 60, 72 ] }
When $percentile
is an aggregation expression there is a result for
each studentId
.
Use $percentile
in a $setWindowField
Stage
To base your percentile values on local data trends, use $percentile
in a $setWindowField
aggregation pipeline stage.
This example creates a window to filter scores:
db.testScores.aggregate( [ { $setWindowFields: { sortBy: { test01: 1 }, output: { test01_95percentile: { $percentile: { input: "$test01", p: [ 0.95 ], method: 'approximate' }, window: { range: [ -3, 3 ] } } } } }, { $project: { _id: 0, studentId: 1, test01_95percentile: 1 } } ] )
Output:
{ studentId: '2356', test01_95percentile: [ 62 ] }, { studentId: '2369', test01_95percentile: [ 62 ] }, { studentId: '2345', test01_95percentile: [ 64 ] }, { studentId: '2367', test01_95percentile: [ 67 ] }, { studentId: '2358', test01_95percentile: [ 67 ] }
In this example, the percentile calculation for each document also incorporates data from the three documents before and after it.
Learn More
The $median
operator is a special case of the
$percentile
operator that uses a fixed value of p: [ 0.5 ]
.
For more information on window functions, see:
$setWindowFields
.