$densify (aggregation)

On this page

Definition
Syntax
Behavior and Restrictions
Examples

Definition

$densify

New in version 5.1.

Creates new documents in a sequence of documents where certain values in a field are missing.

You can use $densify to:

Fill gaps in time series data.
Add missing values between groups of data.
Populate your data with a specified range of values.

Syntax

The $densify stage has this syntax:

{
   $densify: {
      field: <fieldName>,
      partitionByFields: [ <field 1>, <field 2> ... <field n> ],
      range: {
         step: <number>,
         unit: <time unit>,
         bounds: < "full" || "partition" > || [ < lower bound >, < upper bound > ]
      }
   }
}

The $densify stage takes a document with these fields:

Field	Necessity	Description
field	Required	The field to densify. The values of the specified `field` must either be all numeric values or all dates. Documents that do not contain the specified `field` continue through the pipeline unmodified. To specify a `<field>` in an embedded document or in an array, use dot notation. For restrictions, see `field` Restrictions.
partitionByFields	Optional	The set of fields to act as the compound key to group the documents. In the `$densify` stage, each group of documents is known as a partition. If you omit this field, `$densify` uses one partition for the entire collection. For an example, see Densification with Partitions. For restrictions, see `partitionByFields` Restrictions.
range	Required	An object that specifies how the data is densified.
range.bounds	Required	You can specify `range.bounds` as either an array, `[ < lower bound >, < upper bound > ]`, or a string, either `"full"` or `"partition"`. If `bounds` is an array, `$densify` adds documents spanning the range of values within the specified bounds. The data type for the bounds must correspond to the data type of the `field` being densified. For behavior details, see `range.bounds` Behavior. If `bounds` is `"full"`, `$densify` adds documents spanning the full range of values of the `field` being densified. If `bounds` is `"partition"`, `$densify` adds documents to each partition, as if you had run a `full` range densification on each partition individually.
range.step	Required	The amount to increment the `field` value in each document. `$densify` creates a new document for each `step` between the existing documents. If `range.unit` is specified, `step` must be an integer. Otherwise, `step` can be any numeric value.
range.unit	Required if `field` is a date.	The unit to apply to the `step` field when incrementing date values in `field`. You can specify one of the following values for `unit` as a string: `millisecond`, `second`, `minute`, `hour`, `day`, `week`, `month`, `quarter`, `year`. For an example, see Densify Time Series Data.

Behavior and Restrictions

`field` Restrictions

For documents that contain the specified `field`, $densify errors if:

Any document in the collection has a `field` value of type date and the unit field is not specified.
Any document in the collection has a `field` value of type numeric and the unit field is specified.
The field name begins with $. You must rename the field if you want to densify it. To rename fields, use $project.

`partitionByFields` Restrictions

$densify errors if any field name in the partitionByFields array:

Evaluates to a non-string value.
Begins with $.
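
The restrictions in the two lists above, together with the integer rule for range.step, can be summarized as a pre-flight check. The following is a minimal sketch in plain JavaScript and is not part of MongoDB: `checkDensifySpec` is a hypothetical helper, and its `fieldType` argument ("date" or "number") is an assumption standing in for the type of the values stored in the densified field. The server performs its own validation, and its error messages differ.

```javascript
// Hypothetical helper (not a MongoDB API): lists the documented restrictions
// that a $densify specification would violate.
function checkDensifySpec(spec, fieldType) {
  const problems = [];
  const range = spec.range || {};
  const hasUnit = range.unit !== undefined;

  if (typeof spec.field === "string" && spec.field.startsWith("$")) {
    problems.push("field name begins with $");
  }
  if (fieldType === "date" && !hasUnit) {
    problems.push("date field requires range.unit");
  }
  if (fieldType === "number" && hasUnit) {
    problems.push("numeric field must not specify range.unit");
  }
  if (hasUnit && !Number.isInteger(range.step)) {
    problems.push("range.step must be an integer when range.unit is specified");
  }
  for (const name of spec.partitionByFields || []) {
    if (typeof name !== "string") {
      problems.push("partitionByFields entry is not a string");
    } else if (name.startsWith("$")) {
      problems.push("partitionByFields entry begins with $");
    }
  }
  return problems;
}

const okSpec = {
  field: "timestamp",
  range: { step: 1, unit: "hour", bounds: "full" }
};
const badSpec = {
  field: "altitude",
  partitionByFields: ["$variety"],
  range: { step: 200, unit: "hour", bounds: "full" }
};

console.log(checkDensifySpec(okSpec, "date"));    // no problems reported
console.log(checkDensifySpec(badSpec, "number")); // unit on a numeric field, $-prefixed partition field
```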

`range.bounds` Behavior

If range.bounds is an array:

The lower bound indicates the start value for the added documents, irrespective of documents already in the collection.
The lower bound is inclusive.
The upper bound is exclusive.
$densify does not filter out documents with `field` values outside of the specified bounds.

Note

Starting in MongoDB 8.0, $densify treats bounds with an equal lower and upper bound as an empty set and does not generate a document with the bound as the field value.

In prior versions, $densify treats bounds with an equal lower and upper bound as a closed interval and generates a document with the bound value as a field value if the collection does not already contain a document with the bound value.

For example, a range.bounds of [10, 10] generates an extra document with field value 10 in versions prior to 8.0, but does not generate such a document in 8.0 and later.
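
The array-bounds rules above amount to a half-open interval. As a rough illustration for numeric fields only, the sketch below lists the values that would be added under those rules. `addedValues` is a hypothetical helper, not a MongoDB API. It follows the MongoDB 8.0 and later treatment of equal bounds, assumes that values already present in the collection are not duplicated, and does not model partitions or dates.

```javascript
// Hypothetical helper: values added for numeric array bounds [lower, upper).
// `existing` holds the field values already present in the collection.
function addedValues(lower, upper, step, existing = []) {
  if (!(step > 0)) throw new RangeError("step must be positive");
  const present = new Set(existing);
  const added = [];
  // Multiply instead of accumulating to limit floating-point drift.
  for (let i = 0; lower + i * step < upper; i++) {
    const value = lower + i * step;
    if (!present.has(value)) added.push(value);
  }
  return added;
}

console.log(addedValues(0, 10, 2.5, [5])); // [ 0, 2.5, 7.5 ]: 5 already exists, 10 is excluded
console.log(addedValues(10, 10, 1));       // []: equal bounds are an empty set
```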

Order of Output

$densify does not guarantee sort order of the documents it outputs.

To guarantee sort order, use $sort on the field you want to sort by.
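
As an illustration, a pipeline can place $sort directly after $densify. The sketch below assumes the weather collection from the Densify Time Series Data example and sorts by the densified timestamp field in ascending order. It uses `new Date()` rather than `ISODate()` so that the pipeline definition also runs outside mongosh.

```javascript
// Densify to hourly granularity, then sort explicitly by the densified field.
const pipeline = [
  {
    $densify: {
      field: "timestamp",
      range: {
        step: 1,
        unit: "hour",
        bounds: [
          new Date("2021-05-18T00:00:00.000Z"),
          new Date("2021-05-18T08:00:00.000Z")
        ]
      }
    }
  },
  { $sort: { timestamp: 1 } } // 1 = ascending
];

// In mongosh, run it with: db.weather.aggregate( pipeline )
```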

Examples

Densify Time Series Data

Create a weather collection that contains temperature readings at four-hour intervals.

db.weather.insertMany( [
   {
       "metadata": { "sensorId": 5578, "type": "temperature" },
       "timestamp": ISODate("2021-05-18T00:00:00.000Z"),
       "temp": 12
   },
   {
       "metadata": { "sensorId": 5578, "type": "temperature" },
       "timestamp": ISODate("2021-05-18T04:00:00.000Z"),
       "temp": 11
   },
   {
       "metadata": { "sensorId": 5578, "type": "temperature" },
       "timestamp": ISODate("2021-05-18T08:00:00.000Z"),
       "temp": 11
   },
   {
       "metadata": { "sensorId": 5578, "type": "temperature" },
       "timestamp": ISODate("2021-05-18T12:00:00.000Z"),
       "temp": 12
   }
] )

This example uses the $densify stage to fill in the gaps between the four-hour intervals to achieve hourly granularity for the data points:

db.weather.aggregate( [
   {
      $densify: {
         field: "timestamp",
         range: {
            step: 1,
            unit: "hour",
            bounds: [ ISODate("2021-05-18T00:00:00.000Z"), ISODate("2021-05-18T08:00:00.000Z") ]
         }
      }
   }
] )

In this example:

The $densify stage fills in the gaps of time in between the recorded temperatures.
- field: "timestamp" densifies the timestamp field.
- range:
  - step: 1 increments the timestamp field by 1 unit.
  - unit: hour densifies the timestamp field by the hour.
  - bounds: [ ISODate("2021-05-18T00:00:00.000Z"), ISODate("2021-05-18T08:00:00.000Z") ] sets the range of time that is densified.

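In the following output, the $densify stage fills in the gaps of time between the hours of 00:00:00 and 08:00:00. The reading at 12:00:00 lies outside the bounds and is returned unchanged, because $densify does not filter out documents.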

[
  {
    _id: ObjectId("618c207c63056cfad0ca4309"),
    metadata: { sensorId: 5578, type: 'temperature' },
    timestamp: ISODate("2021-05-18T00:00:00.000Z"),
    temp: 12
  },
  { timestamp: ISODate("2021-05-18T01:00:00.000Z") },
  { timestamp: ISODate("2021-05-18T02:00:00.000Z") },
  { timestamp: ISODate("2021-05-18T03:00:00.000Z") },
  {
    _id: ObjectId("618c207c63056cfad0ca430a"),
    metadata: { sensorId: 5578, type: 'temperature' },
    timestamp: ISODate("2021-05-18T04:00:00.000Z"),
    temp: 11
  },
  { timestamp: ISODate("2021-05-18T05:00:00.000Z") },
  { timestamp: ISODate("2021-05-18T06:00:00.000Z") },
  { timestamp: ISODate("2021-05-18T07:00:00.000Z") },
  {
    _id: ObjectId("618c207c63056cfad0ca430b"),
    metadata: { sensorId: 5578, type: 'temperature' },
    timestamp: ISODate("2021-05-18T08:00:00.000Z"),
    temp: 11
  },
  {
    _id: ObjectId("618c207c63056cfad0ca430c"),
    metadata: { sensorId: 5578, type: 'temperature' },
    timestamp: ISODate("2021-05-18T12:00:00.000Z"),
    temp: 12
  }
]
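
To make the date arithmetic concrete, the following plain JavaScript sketch reproduces the timestamps that the stage adds in this example. It approximates the documented behavior and is not the server's implementation: `addedTimestamps` and `UNIT_MS` are hypothetical names, and only fixed-length units are covered, since month, quarter, and year need calendar arithmetic.

```javascript
// Milliseconds per fixed-length unit (hypothetical lookup table).
const UNIT_MS = {
  millisecond: 1,
  second: 1000,
  minute: 60 * 1000,
  hour: 60 * 60 * 1000,
  day: 24 * 60 * 60 * 1000,
  week: 7 * 24 * 60 * 60 * 1000
};

// Timestamps added for array bounds [lower, upper), skipping existing readings.
function addedTimestamps(lower, upper, step, unit, existing) {
  if (!(unit in UNIT_MS)) throw new RangeError("unsupported unit in this sketch: " + unit);
  if (!Number.isInteger(step) || step <= 0) throw new RangeError("step must be a positive integer");
  const stepMs = step * UNIT_MS[unit];
  const present = new Set(existing.map((d) => d.getTime()));
  const added = [];
  for (let t = lower.getTime(); t < upper.getTime(); t += stepMs) {
    if (!present.has(t)) added.push(new Date(t));
  }
  return added;
}

// The four readings inserted into the weather collection.
const readings = ["00", "04", "08", "12"].map(
  (h) => new Date(`2021-05-18T${h}:00:00.000Z`)
);

const added = addedTimestamps(
  new Date("2021-05-18T00:00:00.000Z"),
  new Date("2021-05-18T08:00:00.000Z"),
  1,
  "hour",
  readings
);

// Six timestamps: 01:00 to 03:00 and 05:00 to 07:00.
console.log(added.map((d) => d.toISOString()));
```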

Densification with Partitions

Create a coffee collection that contains data for two varieties of coffee beans:

db.coffee.insertMany( [
   {
      "altitude": 600,
      "variety": "Arabica Typica",
      "score": 68.3
   },
   {
      "altitude": 750,
      "variety": "Arabica Typica",
      "score": 69.5
   },
   {
      "altitude": 950,
      "variety": "Arabica Typica",
      "score": 70.5
   },
   {
      "altitude": 1250,
      "variety": "Gesha",
      "score": 88.15
   },
   {
     "altitude": 1700,
     "variety": "Gesha",
     "score": 95.5,
     "price": 1029
   }
] )

Densify the Full Range of Values

This example uses $densify to densify the altitude field for each coffee variety:

db.coffee.aggregate( [
   {
      $densify: {
         field: "altitude",
         partitionByFields: [ "variety" ],
         range: {
            bounds: "full",
            step: 200
         }
      }
   }
] )

The example aggregation:

Partitions the documents by variety to create one grouping for Arabica Typica and one for Gesha coffee.
Specifies a full range, meaning that each partition is densified across the full range of altitude values in the collection (600 to 1700), not only the values within that partition.
Specifies a step of 200, meaning new documents are created at altitude intervals of 200.

The aggregation outputs the following documents:

[
   {
     _id: ObjectId("618c031814fbe03334480475"),
     altitude: 600,
     variety: 'Arabica Typica',
     score: 68.3
   },
   {
     _id: ObjectId("618c031814fbe03334480476"),
     altitude: 750,
     variety: 'Arabica Typica',
     score: 69.5
   },
   { variety: 'Arabica Typica', altitude: 800 },
   {
     _id: ObjectId("618c031814fbe03334480477"),
     altitude: 950,
     variety: 'Arabica Typica',
     score: 70.5
   },
   { variety: 'Gesha', altitude: 600 },
   { variety: 'Gesha', altitude: 800 },
   { variety: 'Gesha', altitude: 1000 },
   { variety: 'Gesha', altitude: 1200 },
   {
     _id: ObjectId("618c031814fbe03334480478"),
     altitude: 1250,
     variety: 'Gesha',
     score: 88.15
   },
   { variety: 'Gesha', altitude: 1400 },
   { variety: 'Gesha', altitude: 1600 },
   {
     _id: ObjectId("618c031814fbe03334480479"),
     altitude: 1700,
     variety: 'Gesha',
     score: 95.5,
     price: 1029
   },
   { variety: 'Arabica Typica', altitude: 1000 },
   { variety: 'Arabica Typica', altitude: 1200 },
   { variety: 'Arabica Typica', altitude: 1400 },
   { variety: 'Arabica Typica', altitude: 1600 }
 ]

This image visualizes the documents created with $densify:

The darker squares represent the original documents in the collection.
The lighter squares represent the documents created with $densify.
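
The documents added in this example can also be reproduced with a small simulation. The sketch below is plain JavaScript under simplifying assumptions: `densifyNumeric` is a hypothetical helper, it handles a single partition field and numeric values only, and it returns just the added documents in an arbitrary order. It is meant to illustrate how a "full" range is shared by every partition, not to describe the server's implementation.

```javascript
// Hypothetical helper mimicking numeric densification with one partition field.
function densifyNumeric(docs, field, partitionBy, step, bounds) {
  if (!(step > 0)) throw new RangeError("step must be positive");
  const values = docs.map((d) => d[field]);
  const globalMin = Math.min(...values);
  const globalMax = Math.max(...values);

  // Group the existing field values by partition key.
  const partitions = new Map();
  for (const doc of docs) {
    const key = doc[partitionBy];
    if (!partitions.has(key)) partitions.set(key, []);
    partitions.get(key).push(doc[field]);
  }

  const added = [];
  for (const [key, existing] of partitions) {
    // "full" uses the range of the whole collection, "partition" the partition's own range.
    const min = bounds === "full" ? globalMin : Math.min(...existing);
    const max = bounds === "full" ? globalMax : Math.max(...existing);
    for (let i = 0; min + i * step <= max; i++) {
      const value = min + i * step;
      if (!existing.includes(value)) {
        added.push({ [partitionBy]: key, [field]: value });
      }
    }
  }
  return added;
}

const coffee = [
  { altitude: 600, variety: "Arabica Typica", score: 68.3 },
  { altitude: 750, variety: "Arabica Typica", score: 69.5 },
  { altitude: 950, variety: "Arabica Typica", score: 70.5 },
  { altitude: 1250, variety: "Gesha", score: 88.15 },
  { altitude: 1700, variety: "Gesha", score: 95.5, price: 1029 }
];

const addedFull = densifyNumeric(coffee, "altitude", "variety", 200, "full");
// 11 documents: Arabica Typica at 800 to 1600, Gesha at 600 to 1600.
console.log(addedFull);
```

Passing "partition" instead of "full" makes the helper use each partition's own minimum and maximum, following the description of range.bounds in the Syntax section.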

Densify Values within Each Partition

This example uses $densify to densify only the gaps in the altitude field within each variety:

db.coffee.aggregate( [
   {
      $densify: {
         field: "altitude",
         partitionByFields: [ "variety" ],
         range: {
            bounds: "partition",
            step: 200
         }
      }
   }
] )