Docs Menu
Docs Home
/
MongoDB Manual
/ / /

$fill (aggregation)

On this page

  • Definition
  • Syntax
  • Behavior and Restrictions
  • partitionByFields Restrictions
  • linear Behavior
  • locf Behavior
  • Comparison of $fill and Aggregation Operators
  • Examples
  • Fill Missing Field Values with a Constant Value
  • Fill Missing Field Values with Linear Interpolation
  • Fill Missing Field Values Based on the Last Observed Value
  • Fill Data for Distinct Partitions
  • Indicate if a Field was Populated Using $fill
$fill

New in version 5.3.

Populates null and missing field values within documents.

You can use $fill to populate missing data points:

  • In a sequence based on surrounding values.

  • With a fixed value.

The $fill stage has this syntax:

{
$fill: {
partitionBy: <expression>,
partitionByFields: [ <field 1>, <field 2>, ... , <field n> ],
sortBy: {
<sort field 1>: <sort order>,
<sort field 2>: <sort order>,
...,
<sort field n>: <sort order>
},
output: {
<field 1>: { value: <expression> },
<field 2>: { method: <string> },
...
}
}
}

The $fill stage takes a document with these fields:

Field
Necessity
Description

partitionBy

Optional

Specifies an expression to group the documents. In the $fill stage, a group of documents is known as a partition.

If you omit partitionBy and partitionByFields, $fill uses one partition for the entire collection.

partitionBy and partitionByFields are mutually exclusive.

See an example.

Optional

Specifies an array of fields as the compound key to group the documents. In the $fill stage, each group of documents is known as a partition.

If you omit partitionBy and partitionByFields, $fill uses one partition for the entire collection.

partitionBy and partitionByFields are mutually exclusive.

See partitionByFields Restrictions.

Required if method is specified in at least one output.<field>.

Otherwise, optional.

Specifies the field or fields to sort the documents within each partition. Uses the same syntax as the $sort stage.

Required

Specifies an object containing each field for which to fill missing values. You can specify multiple fields in the output object.

The object name is the name of the field to fill. The object value specifies how the field is filled.

Required

Specifies an object indicating how to fill missing values in the target field.

The object name must be either value or method. If the name is:

$fill returns an error if any field name in the partitionByFields array:

  • Evaluates to a non-string value.

  • Begins with $.

The linear fill method fills null and missing fields using linear interpolation based on the surrounding non-null values in the sequence.

  • For each document where the field is null or missing, linearFill fills those fields in proportion to the missing value range between surrounding non-null values according to the sortBy order. To determine the values for missing fields, linearFill uses:

    • The difference of surrounding non-null values.

    • The number of null fields to fill between the surrounding values.

  • The linear method can fill multiple consecutive null values if those values are preceded and followed by non-null values according to the sortBy order.

    Example

    If a collection contains these documents:

    { index: 0, value: 0 },
    { index: 1, value: null },
    { index: 2, value: null },
    { index: 3, value: null },
    { index: 4, value: 10 }

    After using the linear fill method to fill the null values, the documents become:

    { index: 0, value: 0 },
    { index: 1, value: 2.5 },
    { index: 2, value: 5 },
    { index: 3, value: 7.5 },
    { index: 4, value: 10 }
  • null values that are not preceded and followed by non-null values remain null.

  • To use the linear fill method, you must also use the sortBy field to sort your data.

    • When using the linear fill method, $fill returns an error if there are any repeated values in the sortBy field in a single partition.

For a complete example using the linear fill method, see Fill Missing Field Values with Linear Interpolation.

locf stands for last observation carried forward.

  • If a field being filled contains both null and non-null values, locf sets the null and missing values to the field's last known non-null value according to the sortBy order.

    • If the field contains only null or missing values in a partition, locf sets the field value to null for that partition.

    • null and missing field values that appear before non-null values in the sort order remain null.

  • To use the locf fill method, you must also use the sortBy field to sort your data.

For a complete example using the locf fill method, see Fill Missing Field Values Based on the Last Observed Value.

To fill null and missing field values within a document you can use:

  • The $fill stage.

    When you use the $fill stage, the field you specify in the output is the same field used as the source data.

  • The $linearFill and $locf aggregation operators.

    When you $linearFill or $locf, you can set values for a different field than the field used as the source data.

The examples in this section show how to use $fill to fill missing values:

A shoe store maintains a dailySales collection that contains a document summarizing each day's sales. The shoe store sells these types of shoes:

  • boots

  • sandals

  • sneakers

Create the following dailySales collection:

db.dailySales.insertMany( [
{
"date": ISODate("2022-02-02"),
"bootsSold": 10,
"sandalsSold": 20,
"sneakersSold": 12
},
{
"date": ISODate("2022-02-03"),
"bootsSold": 7,
"sneakersSold": 18
},
{
"date": ISODate("2022-02-04"),
"sneakersSold": 5
}
] )

Not all of the documents in the dailySales collection contain each shoe type. If a shoe type is missing, it means there were no shoes of that type sold on the corresponding date.

The following example uses $fill to set the quantities sold to 0 for the missing shoe types for each day's sales:

db.dailySales.aggregate( [
{
$fill:
{
output:
{
"bootsSold": { value: 0 },
"sandalsSold": { value: 0 },
"sneakersSold": { value: 0 }
}
}
}
] )

In the preceding pipeline:

  • $fill fills in values for missing fields.

  • output specifies:

    • The names of the fields to fill in.

    • The value to set the filled in fields to. In this example, the output specifies a constant value of 0.

Example output:

[
{
_id: ObjectId("6202df9f394d47411658b51e"),
date: ISODate("2022-02-02T00:00:00.000Z"),
bootsSold: 10,
sandalsSold: 20,
sneakersSold: 12
},
{
_id: ObjectId("6202df9f394d47411658b51f"),
date: ISODate("2022-02-03T00:00:00.000Z"),
bootsSold: 7,
sneakersSold: 18,
sandalsSold: 0
},
{
_id: ObjectId("6202df9f394d47411658b520"),
date: ISODate("2022-02-04T00:00:00.000Z"),
sneakersSold: 5,
bootsSold: 0,
sandalsSold: 0
}
]

Create a stock collection that contains tracks a single company's stock price at hourly intervals:

db.stock.insertMany( [
{
time: ISODate("2021-03-08T09:00:00.000Z"),
price: 500
},
{
time: ISODate("2021-03-08T10:00:00.000Z"),
},
{
time: ISODate("2021-03-08T11:00:00.000Z"),
price: 515
},
{
time: ISODate("2021-03-08T12:00:00.000Z")
},
{
time: ISODate("2021-03-08T13:00:00.000Z")
},
{
time: ISODate("2021-03-08T14:00:00.000Z"),
price: 485
}
] )

The price field is missing for some of the documents in the collection.

To populate the missing price values using linear interpolation, use $fill with the linear fill method:

db.stock.aggregate( [
{
$fill:
{
sortBy: { time: 1 },
output:
{
"price": { method: "linear" }
}
}
}
] )

In the preceding pipeline:

  • $fill fills in values for missing fields.

  • sortBy: { time: 1 } sorts the documents by the time field in ascending order, from earliest to latest.

  • output specifies:

    • price as the field for which to fill in missing values.

    • { method: "linear" } as the fill method. The linear fill method fills missing price values using linear interpolation based on the surrounding price values in the sequence.

Example output:

[
{
_id: ObjectId("620ad41c394d47411658b5e9"),
time: ISODate("2021-03-08T09:00:00.000Z"),
price: 500
},
{
_id: ObjectId("620ad41c394d47411658b5ea"),
time: ISODate("2021-03-08T10:00:00.000Z"),
price: 507.5
},
{
_id: ObjectId("620ad41c394d47411658b5eb"),
time: ISODate("2021-03-08T11:00:00.000Z"),
price: 515
},
{
_id: ObjectId("620ad41c394d47411658b5ec"),
time: ISODate("2021-03-08T12:00:00.000Z"),
price: 505
},
{
_id: ObjectId("620ad41c394d47411658b5ed"),
time: ISODate("2021-03-08T13:00:00.000Z"),
price: 495
},
{
_id: ObjectId("620ad41c394d47411658b5ee"),
time: ISODate("2021-03-08T14:00:00.000Z"),
price: 485
}
]

Create a restaurantReviews collection that contains review scores for a single restaurant over time:

db.restaurantReviews.insertMany( [
{
date: ISODate("2021-03-08"),
score: 90
},
{
date: ISODate("2021-03-09"),
score: 92
},
{
date: ISODate("2021-03-10")
},
{
date: ISODate("2021-03-11")
},
{
date: ISODate("2021-03-12"),
score: 85
},
{
date: ISODate("2021-03-13")
}
] )

The score field is missing for some of the documents in the collection.

To populate the missing score fields and ensure that there are no gaps in the data, use $fill. In the following example, $fill uses the locf fill method to fill the missing score values with the previous score in the sequence:

db.restaurantReviews.aggregate( [
{
$fill:
{
sortBy: { date: 1 },
output:
{
"score": { method: "locf" }
}
}
}
] )

In the preceding pipeline:

  • $fill fills in missing score values.

  • sortBy: { date: 1 } sorts the documents by the date field in ascending order, from earliest to latest.

  • output specifies:

    • score as the field for which to fill in missing values.

    • { method: "locf" } as the fill method. The locf fill method fills missing score values with the last observed score in the sequence.

Example output:

[
{
_id: ObjectId("62040bc9394d47411658b553"),
date: ISODate("2021-03-08T00:00:00.000Z"),
score: 90
},
{
_id: ObjectId("62040bc9394d47411658b554"),
date: ISODate("2021-03-09T00:00:00.000Z"),
score: 92
},
{
_id: ObjectId("62040bc9394d47411658b555"),
date: ISODate("2021-03-10T00:00:00.000Z"),
score: 92
},
{
_id: ObjectId("62040bc9394d47411658b556"),
date: ISODate("2021-03-11T00:00:00.000Z"),
score: 92
},
{
_id: ObjectId("62040bc9394d47411658b557"),
date: ISODate("2021-03-12T00:00:00.000Z"),
score: 85
},
{
_id: ObjectId("62040bc9394d47411658b558"),
date: ISODate("2021-03-13T00:00:00.000Z"),
score: 85
}
]

Consider the previous example with restaurant reviews but instead of tracking a single restaurant, the collection now contains reviews for multiple restaurants.

Create a collection named restaurantReviewsMultiple and populate the collection with these documents:

db.restaurantReviewsMultiple.insertMany( [
{
date: ISODate("2021-03-08"),
restaurant: "Joe's Pizza",
score: 90
},
{
date: ISODate("2021-03-08"),
restaurant: "Sally's Deli",
score: 75
},
{
date: ISODate("2021-03-09"),
restaurant: "Joe's Pizza",
score: 92
},
{
date: ISODate("2021-03-09"),
restaurant: "Sally's Deli"
},
{
date: ISODate("2021-03-10"),
restaurant: "Joe's Pizza"
},
{
date: ISODate("2021-03-10"),
restaurant: "Sally's Deli",
score: 68
},
{
date: ISODate("2021-03-11"),
restaurant: "Joe's Pizza",
score: 93
},
{
date: ISODate("2021-03-11"),
restaurant: "Sally's Deli"
}
] )

The score field is missing for some of the documents in the collection.

To populate the missing score fields and ensure that there are no gaps in the data, use $fill. In the following example, $fill uses the locf fill method to fill the missing score values with the previous score in the sequence:

db.restaurantReviewsMultiple.aggregate( [
{
$fill:
{
sortBy: { date: 1 },
partitionBy: { "restaurant": "$restaurant" },
output:
{
"score": { method: "locf" }
}
}
}
] )

In the preceding pipeline:

  • $fill fills in missing score values.

  • sortBy: { date: 1 } sorts the documents by the date field in ascending order, from earliest to latest.

  • partitionBy: { "restaurant": "$restaurant" } partitions the data by restaurant. There are two restaurants: Joe's Pizza and Sally's Deli.

  • output specifies:

    • score as the field for which to fill in missing values.

    • { method: "locf" } as the fill method. The locf fill method fills missing score values with the last observed score in the sequence.

Example output:

[
{
_id: ObjectId("620559f4394d47411658b58f"),
date: ISODate("2021-03-08T00:00:00.000Z"),
restaurant: "Joe's Pizza",
score: 90
},
{
_id: ObjectId("620559f4394d47411658b591"),
date: ISODate("2021-03-09T00:00:00.000Z"),
restaurant: "Joe's Pizza",
score: 92
},
{
_id: ObjectId("620559f4394d47411658b593"),
date: ISODate("2021-03-10T00:00:00.000Z"),
restaurant: "Joe's Pizza",
score: 92
},
{
_id: ObjectId("620559f4394d47411658b595"),
date: ISODate("2021-03-11T00:00:00.000Z"),
restaurant: "Joe's Pizza",
score: 93
},
{
_id: ObjectId("620559f4394d47411658b590"),
date: ISODate("2021-03-08T00:00:00.000Z"),
restaurant: "Sally's Deli",
score: 75
},
{
_id: ObjectId("620559f4394d47411658b592"),
date: ISODate("2021-03-09T00:00:00.000Z"),
restaurant: "Sally's Deli",
score: 75
},
{
_id: ObjectId("620559f4394d47411658b594"),
date: ISODate("2021-03-10T00:00:00.000Z"),
restaurant: "Sally's Deli",
score: 68
},
{
_id: ObjectId("620559f4394d47411658b596"),
date: ISODate("2021-03-11T00:00:00.000Z"),
restaurant: "Sally's Deli",
score: 68
}
]

When you populate missing values, the output does not indicate if a value was populated with the $fill operator or if the value existed in the document originally. To distinguish between filled and preexisting values, you can use a $set stage before $fill and set a new field based on whether the value exists.

For example, create a restaurantReviews collection that contains review scores for a restaurant over time:

db.restaurantReviews.insertMany( [
{
date: ISODate("2021-03-08"),
score: 90
},
{
date: ISODate("2021-03-09"),
score: 92
},
{
date: ISODate("2021-03-10")
},
{
date: ISODate("2021-03-11")
},
{
date: ISODate("2021-03-12"),
score: 85
},
{
date: ISODate("2021-03-13")
}
] )

The score field is missing for some of the documents in the collection. You can populate missing score values with the $fill operator.

Create a pipeline to perform the following actions:

  • Add a new field to each document (using $set) indicating if the document's score field exists prior to the $fill operator populating values. This new field is called valueExisted.

  • Populate missing score values with the last observed score in the sequence. The fill method locf stands for "last observation carried forward".

The pipeline looks like this:

db.restaurantReviews.aggregate( [
{
$set: {
"valueExisted": {
"$ifNull": [
{ "$toBool": { "$toString": "$score" } },
false
]
}
}
},
{
$fill: {
sortBy: { date: 1 },
output:
{
"score": { method: "locf" }
}
}
}
] )

Note

Handling Values of Zero

In the $ifNull expression, the score values are converted to strings, then to booleans. The $toBool expression always converts strings to true. If the score values are not converted to strings, score values of 0 will have valueExisted set to false.

Output:

[
{
_id: ObjectId("63595116b1fac2ee2e957f15"),
date: ISODate("2021-03-08T00:00:00.000Z"),
score: 90,
valueExisted: true
},
{
_id: ObjectId("63595116b1fac2ee2e957f16"),
date: ISODate("2021-03-09T00:00:00.000Z"),
score: 92,
valueExisted: true
},
{
_id: ObjectId("63595116b1fac2ee2e957f17"),
date: ISODate("2021-03-10T00:00:00.000Z"),
valueExisted: false,
score: 92
},
{
_id: ObjectId("63595116b1fac2ee2e957f18"),
date: ISODate("2021-03-11T00:00:00.000Z"),
valueExisted: false,
score: 92
},
{
_id: ObjectId("63595116b1fac2ee2e957f19"),
date: ISODate("2021-03-12T00:00:00.000Z"),
score: 85,
valueExisted: true
},
{
_id: ObjectId("63595116b1fac2ee2e957f1a"),
date: ISODate("2021-03-13T00:00:00.000Z"),
valueExisted: false,
score: 85
}
]

Back

$facet