$percentile (aggregation)

Definition

$percentile

New in version 7.0.

Returns an array of scalar values that correspond to the specified percentile values.

You can use $percentile as an accumulator in the $group stage or as an aggregation expression.

Syntax

The syntax for $percentile is:

{
   $percentile: {
      input: <expression>,
      p: [ <expression1>, <expression2>, ... ],
      method: <string>
   }
}

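Command Fields

$percentile takes the following fields:

Field	Type	Necessity	Description
`input`	Expression	Required	`$percentile` calculates the percentile values of this data. `input` must be a field name or an expression that evaluates to a numeric type. If the expression cannot be converted to a numeric type, the `$percentile` calculation ignores it.
`p`	Expression	Required	`$percentile` calculates a percentile value for each element in `p`. The elements represent percentages and must evaluate to numeric values in the range `0.0` to `1.0`, inclusive. `$percentile` returns results in the same order as the elements in `p`.
`method`	String	Required	The method that `mongod` uses to calculate the percentile value. The method must be `'approximate'`.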
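
As a rough illustration of the field requirements above, the following plain JavaScript sketch builds a $percentile specification and checks p on the client before anything is sent to the server. The helper name buildPercentileSpec is hypothetical and not part of MongoDB. The sketch also assumes that p is given as a non-empty array of literal numbers, which is narrower than the general expressions the operator accepts. The server performs its own validation regardless.

```javascript
// Hypothetical helper (not a MongoDB API): builds a $percentile
// specification and checks the constraints described in the table above.
// Assumption: p is a non-empty array of literal numbers.
function buildPercentileSpec(input, p) {
   if (!Array.isArray(p) || p.length === 0) {
      throw new Error("p must be a non-empty array");
   }
   for (const value of p) {
      // The negated range check also rejects NaN and non-numeric values.
      if (typeof value !== "number" || !(value >= 0.0 && value <= 1.0)) {
         throw new Error("each element of p must be a number from 0.0 to 1.0");
      }
   }
   // 'approximate' is the only method the table above allows.
   return { $percentile: { input: input, p: p, method: "approximate" } };
}

const spec = buildPercentileSpec("$test01", [ 0.5, 0.95 ]);
console.log(JSON.stringify(spec));
```

The returned object can be used wherever a $percentile expression is allowed, for example as the value of an output field in a $group stage.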

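Behavior

You can use $percentile in:

$group stages as an accumulator
$setWindowFields stages as an accumulator
$project stages as an aggregation expression

As an accumulator, $percentile has the following characteristics:

Calculates a single result for all the documents in the stage.
Uses the t-digest algorithm to calculate approximate, percentile-based metrics.
Uses approximate methods to scale to large volumes of data.

As an aggregation expression, $percentile has the following characteristics:

Accepts an array as input.
Calculates a separate result for each input document.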

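Type of Operation

In a $group stage, $percentile is an accumulator and calculates a value for all documents in the group.

In a $project stage, $percentile is an aggregation expression and calculates values for each document.

In a $setWindowFields stage, $percentile returns a result for each document like an aggregation expression, but the results are computed over groups of documents like an accumulator.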

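Calculation Considerations

In a $group stage, $percentile always uses an approximate calculation method.

In a $project stage, $percentile might use the discrete calculation method even when the approximate method is specified.

In a $setWindowFields stage, the workload determines the calculation method that $percentile uses.

The computed percentiles that $percentile returns might vary, even on the same data set. This is because the algorithm calculates approximate values.

Duplicate samples can cause ambiguity. If there are a large number of duplicates, the percentile values might not represent the actual sample distribution. Consider a data set where all the samples are the same. Every value in the data set falls at or below any percentile, so a "50th percentile" value would actually represent either 0 or 100 percent of the samples.

$percentile returns the minimum value for p = 0.0.

$percentile returns the maximum value for p = 1.0.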
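
The boundary behavior and the effect of duplicates can be illustrated with a small discrete (nearest-rank) percentile function in plain JavaScript. This is only a sketch for intuition: the function nearestRank is hypothetical, and $percentile itself uses the approximate t-digest method rather than this code. The sample values match the test01 scores used in the examples on this page.

```javascript
// Hypothetical helper: discrete nearest-rank percentile over an array of numbers.
// Illustration only; this is not the t-digest algorithm that $percentile uses.
function nearestRank(values, p) {
   const sorted = [ ...values ].sort((a, b) => a - b);
   // The rank is ceil(p * n), clamped to at least 1 so that p = 0.0 maps to the minimum.
   const rank = Math.max(1, Math.ceil(p * sorted.length));
   return sorted[rank - 1];
}

const test01 = [ 62, 60, 67, 64, 60 ];
console.log(nearestRank(test01, 0.0));   // minimum: 60
console.log(nearestRank(test01, 1.0));   // maximum: 67

// When every sample is identical, every percentile is the same value.
const duplicates = [ 70, 70, 70, 70 ];
console.log([ 0.1, 0.5, 0.9 ].map((p) => nearestRank(duplicates, p)));   // [ 70, 70, 70 ]
```

With the all-duplicates data set, the 10th, 50th, and 90th percentiles cannot be told apart, which is the ambiguity described above.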

Array Input

If you use $percentile as an aggregation expression in a $project stage, you can use an array as input. The syntax is:

{
   $percentile: {
      input: [ <expression1>, <expression2>, ..., <expressionN> ],
      p: [ <expression1>, <expression2>, ... ],
      method: <string>
   }
}

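Window Functions

A window function lets you calculate results over a moving "window" of neighboring documents. As each document passes through the pipeline, the $setWindowFields stage:

Recomputes the set of documents in the current window
Calculates a value for all documents in the set
Returns a single value for that document

You can use $percentile in a $setWindowFields stage to calculate rolling statistics for time series or other related data.

When you use $percentile in a $setWindowFields stage, the input value must be a field name. If you enter an array instead of a field name, the operation fails.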

Examples

The following examples use the testScores collection. Create the collection:

db.testScores.insertMany( [
   { studentId: "2345", test01: 62, test02: 81, test03: 80 },
   { studentId: "2356", test01: 60, test02: 83, test03: 79 },
   { studentId: "2358", test01: 67, test02: 82, test03: 78 },
   { studentId: "2367", test01: 64, test02: 72, test03: 77 },
   { studentId: "2369", test01: 60, test02: 53, test03: 72 }
] )

Calculate a Single Value as an Accumulator

Create an accumulator that calculates a single percentile value:

db.testScores.aggregate( [
   {
      $group: {
         _id: null,
         test01_percentiles: {
            $percentile: {
               input: "$test01",
               p: [ 0.95 ],
               method: 'approximate'
            }
         },
      }
   }
] )

Output:

{ _id: null, test01_percentiles: [ 67 ] }

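The _id field value is null, so $group places all the documents in the collection into a single group.

The $percentile accumulator takes its input data from the test01 field.

In this example, the percentiles array, p, has one value, so the $percentile operator calculates only one term for the test01 data. The 95th percentile value is 67.

Calculate Multiple Values as an Accumulator

Create an accumulator that calculates multiple percentile values: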

db.testScores.aggregate( [
   {
       $group: {
          _id: null,
          test01_percentiles: {
             $percentile: {
                input: "$test01",
                p: [ 0.5, 0.75, 0.9, 0.95 ],
                method: 'approximate'
             }
          },
          test02_percentiles: {
             $percentile: {
                input: "$test02",
                p: [ 0.5, 0.75, 0.9, 0.95 ],
                method: 'approximate'
             }
          },
          test03_percentiles: {
             $percentile: {
                input: "$test03",
                p: [ 0.5, 0.75, 0.9, 0.95 ],
                method: 'approximate'
             }
          },
          test03_percent_alt: {
             $percentile: {
                input: "$test03",
                p: [ 0.9, 0.5, 0.75, 0.95 ],
                method: 'approximate'
             }
          },
       }
    }
] )

Output:

{
   _id: null,
   test01_percentiles: [ 62, 64, 67, 67 ],
   test02_percentiles: [ 81, 82, 83, 83 ],
   test03_percentiles: [ 78, 79, 80, 80 ],
   test03_percent_alt: [ 80, 78, 79, 80 ]
}

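The _id field value is null, so $group places all the documents in the collection into a single group.

The $percentile accumulator calculates values for three fields: test01, test02, and test03.

For each input field, the accumulator calculates the 50th, 75th, 90th, and 95th percentile values.

The percentile values are returned in the same order as the elements of p. The values in test03_percentiles and test03_percent_alt are the same, but their order is different. The order of elements in each result array matches the corresponding order of elements in p.

Use $percentile in a $project Stage

In a $project stage, $percentile is an aggregation expression and calculates values for each document.

In a $project stage, you can use either a field name or an array as input.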

db.testScores.aggregate( [
   {
      $project: {
         _id: 0,
         studentId: 1,
         testPercentiles: {
            $percentile: {
               input: [ "$test01", "$test02", "$test03" ],
               p: [ 0.5, 0.95 ],
               method: 'approximate'
            }
         }
      }
   }
] )

Output:

{ studentId: '2345', testPercentiles: [ 80, 81 ] },
{ studentId: '2356', testPercentiles: [ 79, 83 ] },
{ studentId: '2358', testPercentiles: [ 78, 82 ] },
{ studentId: '2367', testPercentiles: [ 72, 77 ] },
{ studentId: '2369', testPercentiles: [ 60, 72 ] }

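When $percentile is an aggregation expression, there is a result for each studentId.

Use $percentile in a $setWindowFields Stage

To base your percentile values on local data trends, use $percentile in a $setWindowFields aggregation pipeline stage.

This example creates a window to filter scores: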

db.testScores.aggregate( [
   {
      $setWindowFields: {
         sortBy: { test01: 1 },
         output: {
            test01_95percentile: {
               $percentile: {
                  input: "$test01",
                  p: [ 0.95 ],
                  method: 'approximate'
               },
               window: {
                  range: [ -3, 3 ]
               }
            }
         }
      }
   },
   {
      $project: {
         _id: 0,
         studentId: 1,
         test01_95percentile: 1
      }
   }
] )

Output:

{ studentId: '2356', test01_95percentile: [ 62 ] },
{ studentId: '2369', test01_95percentile: [ 62 ] },
{ studentId: '2345', test01_95percentile: [ 64 ] },
{ studentId: '2367', test01_95percentile: [ 67 ] },
{ studentId: '2358', test01_95percentile: [ 67 ] }

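In this example, the percentile calculation for each document also incorporates data from the documents whose test01 value is within 3 points above or below that document's own test01 value.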
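
To make the window in this example concrete, the following plain JavaScript sketch recomputes the same values outside the database. It assumes that a range window of [ -3, 3 ] selects the documents whose sortBy value lies within 3 of the current document's value. It also uses a hypothetical discrete nearest-rank helper instead of the t-digest method, so it is an illustration rather than a description of the server's implementation.

```javascript
// Hypothetical helpers for illustration; these are not MongoDB APIs.
function nearestRank(values, p) {
   const sorted = [ ...values ].sort((a, b) => a - b);
   // The rank is ceil(p * n), clamped to at least 1.
   const rank = Math.max(1, Math.ceil(p * sorted.length));
   return sorted[rank - 1];
}

// For each score, in ascending order, collect the scores that fall in
// [ score + lower, score + upper ] and compute the percentile over that window.
function rollingPercentile(scores, lower, upper, p) {
   const sorted = [ ...scores ].sort((a, b) => a - b);
   return sorted.map((score) => {
      const windowValues = sorted.filter(
         (other) => other >= score + lower && other <= score + upper
      );
      return nearestRank(windowValues, p);
   });
}

const test01 = [ 62, 60, 67, 64, 60 ];
console.log(rollingPercentile(test01, -3, 3, 0.95));   // [ 62, 62, 64, 67, 67 ]
```

For this small data set, the sketch yields the same values as the output above. For example, the window for a score of 60 contains 60, 60, and 62, so its 95th percentile is 62.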

Learn More

The $median operator is a special case of the $percentile operator that uses a fixed value of p: [ 0.5 ].
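
As a sketch of this relationship, the following pipeline, written for mongosh against the testScores collection from the examples on this page, requests the same statistic both ways. The output field names viaMedian and viaPercentile are illustrative. Note that $median returns a scalar value, while $percentile returns an array with one element per entry in p.

```javascript
// Sketch: two accumulators that both request the 50th percentile of test01.
// The output field names (viaMedian, viaPercentile) are illustrative.
const pipeline = [
   {
      $group: {
         _id: null,
         viaMedian: {
            $median: { input: "$test01", method: "approximate" }
         },
         viaPercentile: {
            $percentile: { input: "$test01", p: [ 0.5 ], method: "approximate" }
         }
      }
   }
];

// In mongosh, run the pipeline with: db.testScores.aggregate( pipeline )
```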

For more information on window functions, see $setWindowFields.
