$bucketAuto (aggregation stage)

This version of the documentation is archived and no longer supported. To upgrade a 6.0 deployment, see the MongoDB 7.0 upgrade procedures.

Definition

$bucketAuto

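Categorizes incoming documents into a specific number of groups, called buckets, based on a specified expression. Bucket boundaries are determined automatically in an attempt to distribute the documents evenly across the specified number of buckets.

Each bucket is represented as a document in the output. The document for each bucket contains:

- An _id object that specifies the bounds of the bucket.
  - The _id.min field specifies the inclusive lower bound of the bucket.
  - The _id.max field specifies the upper bound of the bucket. This bound is inclusive for the final bucket in the series and exclusive for all other buckets.
- A count field that contains the number of documents in the bucket. The count field is included by default when the output document is not specified.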
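
The boundary rules above can be sketched in plain JavaScript. This is an illustration only: inBucket and its isLast flag are hypothetical names that are not part of MongoDB, and the two sample bucket documents are made up to match the shape described above.

```javascript
// Hypothetical helper: does `value` fall inside `bucket`?
// `isLast` marks the final bucket, whose upper bound is inclusive.
function inBucket(value, bucket, isLast) {
  const { min, max } = bucket._id;
  if (value < min) return false; // the lower bound is always inclusive
  return isLast ? value <= max : value < max;
}

const first = { _id: { min: 0, max: 20 }, count: 20 };
const last = { _id: { min: 80, max: 99 }, count: 20 };

console.log(inBucket(20, first, false)); // false: 20 is excluded by the upper bound
console.log(inBucket(99, last, true));   // true: the final upper bound is inclusive
```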

The $bucketAuto stage has the following form:

{
  $bucketAuto: {
      groupBy: <expression>,
      buckets: <number>,
      output: {
         <output1>: { <$accumulator expression> },
         ...
      },
      granularity: <string>
  }
}
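
As a concrete instance of the form above, the following sketch fills in every field of the stage as a plain JavaScript object. The field paths "$price" and "$title" are assumptions about the input documents, and the chosen values (4 buckets, "R5") are arbitrary.

```javascript
// A filled-in $bucketAuto stage. "$price" and "$title" are assumed
// field paths; substitute fields from your own documents.
const stage = {
  $bucketAuto: {
    groupBy: "$price",             // field path: dollar-sign prefix, in quotes
    buckets: 4,                    // positive 32-bit integer
    output: {
      count: { $sum: 1 },          // count must be requested explicitly here
      titles: { $push: "$title" }  // collect one field per bucket
    },
    granularity: "R5"              // optional; numeric groupBy values only
  }
};

const pipeline = [stage];
console.log(JSON.stringify(pipeline, null, 2));
```

The resulting array is the kind of value that aggregate() accepts as its pipeline argument.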

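Field

Type

Description

groupBy

expression

An expression to group documents by. To specify a field path, prefix the field name with a dollar sign $ and enclose it in quotes.

buckets

integer

A positive 32-bit integer that specifies the number of buckets into which input documents are grouped.

output

document

Optional. A document that specifies the fields to include in the output documents in addition to the _id field. To specify a field to include, you must use accumulator expressions: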

<outputfield1>: { <accumulator>: <expression1> },
...

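When output is specified, the output documents do not include the default count field. To include it, explicitly specify the count expression as part of the output document: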

output: {
  <outputfield1>: { <accumulator>: <expression1> },
  ...
  count: { $sum: 1 }
}

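granularity

string

Optional. A string that specifies the preferred number series to use, so that the calculated boundary edges end on preferred round numbers or their powers of 10.

Available only if all groupBy values are numeric and none of them are NaN.

The supported values of granularity are:

"R5", "R10", "R20", "R40", "R80", "1-2-5", "E6", "E12", "E24", "E48", "E96", "E192", "POWERSOF2"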

Considerations

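$bucketAuto and Memory Restrictions

The $bucketAuto stage has a limit of 100 megabytes of RAM. By default, if the stage exceeds this limit, MongoDB automatically writes temporary files to disk. For details, see allowDiskUseByDefault.

Tip

See also: Aggregation Pipeline Limits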

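Behavior

The stage may return fewer than the specified number of buckets if:

- The number of input documents is less than the specified number of buckets.
- The number of unique values of the groupBy expression is less than the specified number of buckets.
- The granularity has fewer intervals than the number of buckets.
- The granularity is not fine enough to distribute documents evenly into the specified number of buckets.

If the groupBy expression refers to an array or document, the values are arranged using the same ordering as in $sort before the bucket boundaries are determined.

The even distribution of documents across buckets depends on the cardinality, or number of unique values, of the groupBy field. If the cardinality is not high enough, the $bucketAuto stage may not distribute the results evenly across buckets.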
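
To make the effect of low cardinality concrete, here is a deliberately simplified model of bucketing without granularity, written in plain JavaScript. It is a sketch under stated assumptions, not MongoDB's implementation: bucketAutoSketch is a hypothetical name, the target bucket size is assumed to be the rounded ratio of documents to buckets, and equal values are assumed to always stay in the same bucket.

```javascript
// Simplified, hypothetical model of $bucketAuto without granularity.
// Assumption: equal values are never split across two buckets.
function bucketAutoSketch(values, buckets) {
  const sorted = [...values].sort((a, b) => a - b);
  const targetSize = Math.max(1, Math.round(sorted.length / buckets));
  const result = [];
  let i = 0;
  while (i < sorted.length) {
    const start = i;
    if (result.length === buckets - 1) {
      // Last allowed bucket: take everything that is left.
      i = sorted.length;
    } else {
      i = Math.min(start + targetSize, sorted.length);
      // Extend the bucket while the next value ties with the previous one.
      while (i < sorted.length && sorted[i] === sorted[i - 1]) i++;
    }
    // Upper bound: the next bucket's minimum, or the largest value at the end.
    const max = i < sorted.length ? sorted[i] : sorted[sorted.length - 1];
    result.push({ _id: { min: sorted[start], max }, count: i - start });
  }
  return result;
}

// Three buckets requested, but the repeated value 1 fills the first bucket,
// so only two buckets come back.
console.log(bucketAutoSketch([1, 1, 1, 1, 2, 3], 3));
```

In this model the four documents with value 1 end up together, which leaves too few distinct values to fill the remaining buckets.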

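Granularity

$bucketAuto accepts an optional granularity parameter, which ensures that the boundaries of all buckets adhere to a specified preferred number series. Using a preferred number series gives more control over where the bucket boundaries are set within the range of values of the groupBy expression. Preferred number series can also help set bucket boundaries evenly on a logarithmic scale when the range of the groupBy expression grows exponentially.

Renard Series

The Renard number series are sets of numbers derived by taking the 5th, 10th, 20th, 40th, or 80th root of 10 and then including the powers of that root that fall between 1.0 and 10.0 (10.3 in the case of R80).

Set granularity to R5, R10, R20, R40, or R80 to restrict bucket boundaries to values in the series. When the groupBy values fall outside the 1.0 to 10.0 range (10.3 for R80), the values of the series are multiplied by a power of 10.

Example

The R5 series is based on the fifth root of 10, which is about 1.58, and includes the successive powers of this root (rounded) until 10 is reached. The R5 series is derived as follows: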

10^{0/5} = 1
10^{1/5} = 1.584 ≈ 1.6
10^{2/5} = 2.511 ≈ 2.5
10^{3/5} = 3.981 ≈ 4.0
10^{4/5} = 6.309 ≈ 6.3
10^{5/5} = 10

The same approach applies to the other Renard series, which offer finer granularity, that is, more intervals between 1.0 and 10.0 (10.3 for R80).
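
The R5 derivation can be reproduced with a few lines of JavaScript. This is a sketch of the arithmetic only: rootPowers is a hypothetical helper name, and rounding to two significant digits is an assumption that happens to match the R5 values shown above. The published values of the finer Renard series are not guaranteed to agree with plain rounding.

```javascript
// Powers of the n-th root of 10, from 10^(0/n) up to 10^(n/n).
function rootPowers(n) {
  const values = [];
  for (let k = 0; k <= n; k++) {
    values.push(Math.pow(10, k / n));
  }
  return values;
}

// Round each power to two significant digits to obtain the R5 values.
const r5 = rootPowers(5).map((v) => Number(v.toPrecision(2)));
console.log(r5); // [ 1, 1.6, 2.5, 4, 6.3, 10 ]
```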

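E Series

The E number series are similar to the Renard series in that they subdivide the interval from 1.0 to 10.0 by the 6th, 12th, 24th, 48th, 96th, or 192nd root of 10, each with a particular relative error.

Set granularity to E6, E12, E24, E48, E96, or E192 to restrict bucket boundaries to values in the series. When the groupBy values fall outside the 1.0 to 10.0 range, the values of the series are multiplied by a power of 10. To learn more about the E series and their respective relative errors, see preferred number series.

1-2-5 Series

The 1-2-5 series behaves like a three-value Renard series, if such a series existed.

Set granularity to 1-2-5 to restrict bucket boundaries to the powers of the third root of 10, rounded to one significant digit.

Example

The following values are part of the 1-2-5 series: 0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, and so on...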
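
The rounding rule behind the 1-2-5 series can be checked with a short JavaScript sketch. The function name oneTwoFive and its exponent-range parameters are hypothetical and used only for this illustration.

```javascript
// Round each power of the cube root of 10 to one significant digit.
function oneTwoFive(minPower, maxPower) {
  const values = [];
  for (let k = minPower; k <= maxPower; k++) {
    const exact = Math.pow(10, k / 3);          // e.g. 10^(1/3) is about 2.154
    values.push(Number(exact.toPrecision(1)));  // 2.154 -> 2, 4.642 -> 5
  }
  return values;
}

console.log(oneTwoFive(-3, 9));
// [ 0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000 ]
```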

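Powers of Two Series

Set granularity to POWERSOF2 to restrict bucket boundaries to numbers that are powers of two.

Example

The following numbers adhere to the powers of two series:

2⁰ = 1
2¹ = 2
2² = 4
2³ = 8
2⁴ = 16
2⁵ = 32
and so on...

A common example is how various computer components, such as memory, often adhere to the POWERSOF2 set of preferred numbers:

1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, and so on...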
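
For completeness, the same sequence can be generated in JavaScript. powersOfTwo is a hypothetical helper name used only for this sketch.

```javascript
// List every power of two from 1 up to and including `limit`.
function powersOfTwo(limit) {
  const values = [];
  for (let v = 1; v <= limit; v *= 2) {
    values.push(v);
  }
  return values;
}

console.log(powersOfTwo(2048));
// [ 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048 ]
```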

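Comparing Different Granularities

The following operation demonstrates how specifying different values for granularity affects how $bucketAuto determines bucket boundaries. The documents in the things collection have _id values numbered from 0 to 99: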

{ _id: 0 }
{ _id: 1 }
...
{ _id: 99 }

Different values for granularity are substituted into the following operation:

db.things.aggregate( [
  {
    $bucketAuto: {
      groupBy: "$_id",
      buckets: 5,
      granularity: <granularity>
    }
  }
] )
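
One way to perform the substitution is to build the pipeline from a small helper, sketched below in JavaScript. thingsPipeline is a hypothetical name, and passing undefined is an assumed convention for omitting the granularity field.

```javascript
// Hypothetical helper: build the pipeline for one granularity value.
// Passing undefined omits the granularity field entirely.
function thingsPipeline(granularity) {
  const spec = { groupBy: "$_id", buckets: 5 };
  if (granularity !== undefined) {
    spec.granularity = granularity;
  }
  return [{ $bucketAuto: spec }];
}

const variants = [undefined, "R20", "E24", "1-2-5", "POWERSOF2"];
for (const g of variants) {
  // In mongosh, each pipeline could be passed to db.things.aggregate(...).
  console.log(JSON.stringify(thingsPipeline(g)));
}
```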

The results in the following table demonstrate how different values for granularity yield different bucket boundaries:

Granularity	Results	Notes
No granularity	{ "_id" : { "min" : 0, "max" : 20 }, "count" : 20 } { "_id" : { "min" : 20, "max" : 40 }, "count" : 20 } { "_id" : { "min" : 40, "max" : 60 }, "count" : 20 } { "_id" : { "min" : 60, "max" : 80 }, "count" : 20 } { "_id" : { "min" : 80, "max" : 99 }, "count" : 20 }
R20	{ "_id" : { "min" : 0, "max" : 20 }, "count" : 20 } { "_id" : { "min" : 20, "max" : 40 }, "count" : 20 } { "_id" : { "min" : 40, "max" : 63 }, "count" : 23 } { "_id" : { "min" : 63, "max" : 90 }, "count" : 27 } { "_id" : { "min" : 90, "max" : 100 }, "count" : 10 }
E24	{ "_id" : { "min" : 0, "max" : 20 }, "count" : 20 } { "_id" : { "min" : 20, "max" : 43 }, "count" : 23 } { "_id" : { "min" : 43, "max" : 68 }, "count" : 25 } { "_id" : { "min" : 68, "max" : 91 }, "count" : 23 } { "_id" : { "min" : 91, "max" : 100 }, "count" : 9 }
1-2-5	{ "_id" : { "min" : 0, "max" : 20 }, "count" : 20 } { "_id" : { "min" : 20, "max" : 50 }, "count" : 30 } { "_id" : { "min" : 50, "max" : 100 }, "count" : 50 }	The specified number of buckets exceeds the number of intervals in the series.
POWERSOF2	{ "_id" : { "min" : 0, "max" : 32 }, "count" : 32 } { "_id" : { "min" : 32, "max" : 64 }, "count" : 32 } { "_id" : { "min" : 64, "max" : 128 }, "count" : 36 }	The specified number of buckets exceeds the number of intervals in the series.

Examples

Consider a collection named artwork that contains the following documents:

{ "_id" : 1, "title" : "The Pillars of Society", "artist" : "Grosz", "year" : 1926,
    "price" : Decimal128("199.99"),
    "dimensions" : { "height" : 39, "width" : 21, "units" : "in" } }
{ "_id" : 2, "title" : "Melancholy III", "artist" : "Munch", "year" : 1902,
    "price" : Decimal128("280.00"),
    "dimensions" : { "height" : 49, "width" : 32, "units" : "in" } }
{ "_id" : 3, "title" : "Dancer", "artist" : "Miro", "year" : 1925,
    "price" : Decimal128("76.04"),
    "dimensions" : { "height" : 25, "width" : 20, "units" : "in" } }
{ "_id" : 4, "title" : "The Great Wave off Kanagawa", "artist" : "Hokusai",
    "price" : Decimal128("167.30"),
    "dimensions" : { "height" : 24, "width" : 36, "units" : "in" } }
{ "_id" : 5, "title" : "The Persistence of Memory", "artist" : "Dali", "year" : 1931,
    "price" : Decimal128("483.00"),
    "dimensions" : { "height" : 20, "width" : 24, "units" : "in" } }
{ "_id" : 6, "title" : "Composition VII", "artist" : "Kandinsky", "year" : 1913,
    "price" : Decimal128("385.00"),
    "dimensions" : { "height" : 30, "width" : 46, "units" : "in" } }
{ "_id" : 7, "title" : "The Scream", "artist" : "Munch",
    "price" : Decimal128("159.00"),
    "dimensions" : { "height" : 24, "width" : 18, "units" : "in" } }
{ "_id" : 8, "title" : "Blue Flower", "artist" : "O'Keefe", "year" : 1918,
    "price" : Decimal128("118.42"),
    "dimensions" : { "height" : 24, "width" : 20, "units" : "in" } }

Single Facet Aggregation

The following operation groups the input documents into four buckets according to the values in the price field:

db.artwork.aggregate( [
  {
    $bucketAuto: {
        groupBy: "$price",
        buckets: 4
    }
  }
] )

The operation returns the following documents:

{
  "_id" : {
    "min" : Decimal128("76.04"),
    "max" : Decimal128("159.00")
  },
  "count" : 2
}
{
  "_id" : {
    "min" : Decimal128("159.00"),
    "max" : Decimal128("199.99")
  },
  "count" : 2
}
{
  "_id" : {
    "min" : Decimal128("199.99"),
    "max" : Decimal128("385.00")
  },
  "count" : 2
}
{
  "_id" : {
    "min" : Decimal128("385.00"),
    "max" : Decimal128("483.00")
  },
  "count" : 2
}

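Multi-Faceted Aggregation

The $bucketAuto stage can be used within the $facet stage to process multiple aggregation pipelines on the same set of input documents from artwork.

The following aggregation pipeline groups the documents from the artwork collection into buckets based on price, year, and the calculated area: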

db.artwork.aggregate( [
  {
    $facet: {
      "price": [
        {
          $bucketAuto: {
            groupBy: "$price",
            buckets: 4
          }
        }
      ],
      "year": [
        {
          $bucketAuto: {
            groupBy: "$year",
            buckets: 3,
            output: {
              "count": { $sum: 1 },
              "years": { $push: "$year" }
            }
          }
        }
      ],
      "area": [
        {
          $bucketAuto: {
            groupBy: {
              $multiply: [ "$dimensions.height", "$dimensions.width" ]
            },
            buckets: 4,
            output: {
              "count": { $sum: 1 },
              "titles": { $push: "$title" }
            }
          }
        }
      ]
    }
  }
] )

The operation returns the following document:

{
  "area" : [
    {
      "_id" : { "min" : 432, "max" : 500 },
      "count" : 3,
      "titles" : [
        "The Scream",
        "The Persistence of Memory",
        "Blue Flower"
      ]
    },
    {
      "_id" : { "min" : 500, "max" : 864 },
      "count" : 2,
      "titles" : [
        "Dancer",
        "The Pillars of Society"
      ]
    },
    {
      "_id" : { "min" : 864, "max" : 1568 },
      "count" : 2,
      "titles" : [
        "The Great Wave off Kanagawa",
        "Composition VII"
      ]
    },
    {
      "_id" : { "min" : 1568, "max" : 1568 },
      "count" : 1,
      "titles" : [
        "Melancholy III"
      ]
    }
  ],
  "price" : [
    {
      "_id" : { "min" : Decimal128("76.04"), "max" : Decimal128("159.00") },
      "count" : 2
    },
    {
      "_id" : { "min" : Decimal128("159.00"), "max" : Decimal128("199.99") },
      "count" : 2
    },
    {
      "_id" : { "min" : Decimal128("199.99"), "max" : Decimal128("385.00") },
      "count" : 2
    },
    {
      "_id" : { "min" : Decimal128("385.00"), "max" : Decimal128("483.00") },
      "count" : 2
    }
  ],
  "year" : [
    { "_id" : { "min" : null, "max" : 1913 }, "count" : 3, "years" : [ 1902 ] },
    { "_id" : { "min" : 1913, "max" : 1926 }, "count" : 3, "years" : [ 1913, 1918, 1925 ] },
    { "_id" : { "min" : 1926, "max" : 1931 }, "count" : 2, "years" : [ 1926, 1931 ] }
  ]
}
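
In the year facet, the first bucket has a count of 3 but only one entry in years. This is consistent with the artworks whose _id is 4 and 7 having no year field: they sort first, which explains the null lower bound, and they are counted by $sum, while $push has no value to add for them. The following plain JavaScript imitation of the two accumulators is a sketch of that behavior, not driver or server code, and its variable names are hypothetical.

```javascript
// Documents that land in the first "year" bucket, reduced to the
// fields that matter here. Two of them have no year field.
const firstYearBucket = [
  { _id: 4, title: "The Great Wave off Kanagawa" },
  { _id: 7, title: "The Scream" },
  { _id: 2, title: "Melancholy III", year: 1902 }
];

// Imitates count: { $sum: 1 }, which adds 1 for every document.
const count = firstYearBucket.reduce((sum) => sum + 1, 0);

// Imitates years: { $push: "$year" }, which only collects values that exist.
const years = firstYearBucket
  .filter((doc) => doc.year !== undefined)
  .map((doc) => doc.year);

console.log({ count, years }); // { count: 3, years: [ 1902 ] }
```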

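The C# examples on this page use the sample_mflix database from the Atlas sample datasets. To learn how to create a free MongoDB Atlas cluster and load the sample datasets, see Get Started in the MongoDB .NET/C# Driver documentation.

The following Movie class models the documents in the sample_mflix.movies collection: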

public class Movie
{
    public ObjectId Id { get; set; }
    public int Runtime { get; set; }
    
    public string Title { get; set; }
    public string Rated { get; set; }
    public List<string> Genres { get; set; }
    public string Plot { get; set; }
    
    public ImdbData Imdb { get; set; }
    public int Year { get; set; }
    public int Index { get; set; }
    
    public string[] Comments { get; set; }
   
    [BsonElement("lastupdated")]
    public DateTime LastUpdated { get; set; }
}

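Note

ConventionPack for Pascal Case

The C# classes on this page use Pascal case for their property names, but the field names in the MongoDB collection use camel case. To account for this difference, you can use the following code to register a ConventionPack when your application starts: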

var camelCaseConvention = new ConventionPack { new CamelCaseElementNameConvention() };
ConventionRegistry.Register("CamelCase", camelCaseConvention, type => true);

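To use the MongoDB .NET/C# driver to add a $bucketAuto stage to an aggregation pipeline, call the BucketAuto() method on a PipelineDefinition object.

The following example creates a pipeline stage that evenly distributes documents into five buckets by the value of their Runtime field: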

var pipeline = new EmptyPipelineDefinition<Movie>()
    .BucketAuto(
        groupBy: m => m.Runtime,
        buckets: 5);

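You can use an AggregateBucketAutoOptions object to specify a preferred number series for setting boundary values. The following example performs the same $bucketAuto operation as the previous example, but also sets the bucket boundaries at powers of 2: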

var bucketAutoOptions = new AggregateBucketAutoOptions()
{
    Granularity = new AggregateBucketAutoGranularity("POWERSOF2")
};
var pipeline = new EmptyPipelineDefinition<Movie>()
    .BucketAuto(
        groupBy: m => m.Runtime,
        buckets: 5,
        options: bucketAutoOptions);

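The Node.js examples on this page use the sample_mflix database from the Atlas sample datasets. To learn how to create a free MongoDB Atlas cluster and load the sample datasets, see Get Started in the MongoDB Node.js driver documentation.

To use the MongoDB Node.js driver to add a $bucketAuto stage to an aggregation pipeline, use the $bucketAuto operator in a pipeline object.

The following example creates a pipeline stage that evenly distributes documents into five buckets by the value of their runtime field. The example then runs the aggregation pipeline: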

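// Assumes `collection` is a Collection object from the MongoDB Node.js driver.
// The wrapper function name is illustrative only.
function bucketByRuntime(collection) {
  const pipeline = [
    {
      $bucketAuto: {
        groupBy: "$runtime",
        buckets: 5
      }
    }
  ];
  const cursor = collection.aggregate(pipeline);
  return cursor;
}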

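The following example performs the same $bucketAuto operation as the previous example, but uses the granularity parameter to set the bucket boundaries at powers of 2: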

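// Assumes `collection` is a Collection object from the MongoDB Node.js driver.
// The wrapper function name is illustrative only.
function bucketByRuntimePowersOfTwo(collection) {
  const pipeline = [
    {
      $bucketAuto: {
        groupBy: "$runtime",
        buckets: 5,
        granularity: "POWERSOF2"
      }
    }
  ];
  const cursor = collection.aggregate(pipeline);
  return cursor;
}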

Learn More

To learn more about related pipeline stages, see the $bucket guide.
