Group Data with the Bucket Pattern

The bucket pattern separates long series of data into distinct objects. Separating large data series into smaller groups can improve query access patterns and simplify application logic. Bucketing is useful when you have similar objects that relate to a central entity, such as stock trades made by a single user.

You can use the bucket pattern for pagination by grouping your data based on the elements that your application shows per page. This approach uses MongoDB's flexible data model to store data according to the data your application needs.

Tip

Time series collections apply the bucket pattern automatically, and are suitable for most applications that involve bucketing time series data.

Consider the following schema that tracks stock trades. The initial schema does not use the bucket pattern, and stores each trade in an individual document.

db.trades.insertMany(
  [
    {
      "ticker": "MDB",
      "customerId": 123,
      "type": "buy",
      "quantity": 419,
      "date": ISODate("2023-10-26T15:47:03.434Z")
    },
    {
      "ticker": "MDB",
      "customerId": 123,
      "type": "sell",
      "quantity": 29,
      "date": ISODate("2023-10-30T09:32:57.765Z")
    },
    {
      "ticker": "GOOG",
      "customerId": 456,
      "type": "buy",
      "quantity": 50,
      "date": ISODate("2023-10-31T11:16:02.120Z")
    }
  ]
)

The application shows stock trades made by a single customer at a time, and shows 10 trades per page. To simplify the application logic, use the bucket pattern to group the trades by customerId in groups of 10.

1. Group the data by customerId

Reorganize the schema to have a single document for each customerId:

{
  "customerId": 123,
  "history": [
    {
      "type": "buy",
      "ticker": "MDB",
      "qty": 419,
      "date": ISODate("2023-10-26T15:47:03.434Z")
    },
    {
      "type": "sell",
      "ticker": "MDB",
      "qty": 29,
      "date": ISODate("2023-10-30T09:32:57.765Z")
    }
  ]
},
{
  "customerId": 456,
  "history": [
    {
      "type": "buy",
      "ticker": "GOOG",
      "quantity": 50,
      "date": ISODate("2023-10-31T11:16:02.120Z")
    }
  ]
}

With the bucket pattern:

  • Documents with common customerId values are condensed into a single document, with the customerId being a top-level field.

  • Trades for that customer are grouped into an embedded array field, called history.
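The regrouping described above can be sketched in plain JavaScript. This is only an illustration, not part of the original procedure: groupByCustomer is a hypothetical helper, the sample uses the built-in Date type because ISODate is only available in mongosh, and field names are carried over unchanged.

```javascript
// Hypothetical helper (not a MongoDB API): collapse flat trade documents
// into one document per customerId, with the trades in a history array.
function groupByCustomer(trades) {
  const byCustomer = new Map();
  for (const { customerId, ...trade } of trades) {
    if (!byCustomer.has(customerId)) {
      byCustomer.set(customerId, { customerId, history: [] });
    }
    // customerId is lifted to the top level, so it is not repeated per trade.
    byCustomer.get(customerId).history.push(trade);
  }
  return [...byCustomer.values()];
}

// Sample data mirroring the initial schema (Date stands in for ISODate).
const flatTrades = [
  { ticker: "MDB", customerId: 123, type: "buy", quantity: 419, date: new Date("2023-10-26T15:47:03.434Z") },
  { ticker: "MDB", customerId: 123, type: "sell", quantity: 29, date: new Date("2023-10-30T09:32:57.765Z") },
  { ticker: "GOOG", customerId: 456, type: "buy", quantity: 50, date: new Date("2023-10-31T11:16:02.120Z") },
];

const grouped = groupByCustomer(flatTrades);
console.log(JSON.stringify(grouped, null, 2));
```

The helper preserves insertion order, so each history array keeps the trades in the order they appeared in the input.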

2. Add an identifier and count for each bucket

db.trades.drop()

db.trades.insertMany(
  [
    {
      "_id": "123_1698349623",
      "customerId": 123,
      "count": 2,
      "history": [
        {
          "type": "buy",
          "ticker": "MDB",
          "qty": 419,
          "date": ISODate("2023-10-26T15:47:03.434Z")
        },
        {
          "type": "sell",
          "ticker": "MDB",
          "qty": 29,
          "date": ISODate("2023-10-30T09:32:57.765Z")
        }
      ]
    },
    {
      "_id": "456_1698765362",
      "customerId": 456,
      "count": 1,
      "history": [
        {
          "type": "buy",
          "ticker": "GOOG",
          "quantity": 50,
          "date": ISODate("2023-10-31T11:16:02.120Z")
        }
      ]
    }
  ]
)

The _id field value is a concatenation of the customerId and the time of the first trade in the history field, expressed in seconds since the Unix epoch.

The count field indicates how many elements are in that document's history array. The count field is used to implement pagination logic.
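As a rough sketch of how the _id and count fields could be derived when converting existing data, the following plain JavaScript splits one customer's trades into buckets of at most 10. The toBuckets helper and its pageSize parameter are hypothetical names introduced for illustration. The epoch value is computed from the UTC instant of each Date; the sample _id values on this page may have been generated with a different offset, so do not expect an exact match.

```javascript
// Hypothetical helper (not a MongoDB API): split one customer's trades
// into bucket documents that each hold at most pageSize trades.
function toBuckets(customerId, history, pageSize = 10) {
  // Sort a copy by trade time so each bucket covers a contiguous time range.
  const sorted = [...history].sort((a, b) => a.date - b.date);
  const buckets = [];
  for (let i = 0; i < sorted.length; i += pageSize) {
    const page = sorted.slice(i, i + pageSize);
    // Seconds since the Unix epoch of the first trade in this bucket.
    const firstTradeSeconds = Math.floor(page[0].date.getTime() / 1000);
    buckets.push({
      _id: `${customerId}_${firstTradeSeconds}`,
      customerId,
      count: page.length,
      history: page,
    });
  }
  return buckets;
}

// Usage: 25 made-up trades, one minute apart.
const start = new Date("2023-10-26T15:47:03.434Z").getTime();
const history = Array.from({ length: 25 }, (_, i) => ({
  type: "buy",
  ticker: "MDB",
  qty: i + 1,
  date: new Date(start + i * 60000),
}));

const buckets = toBuckets(123, history);
console.log(buckets.map((b) => [b._id, b.count])); // counts: 10, 10, 5
```

Because the epoch suffix grows with time, sorting buckets by _id also orders them chronologically, which is what the page queries rely on.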

After you update your schema to use the bucket pattern, update your application logic for reading and writing data. The rest of this page describes how to query and insert data with the bucket pattern.

In the updated schema, each document contains data for a single page in the application. You can use the _id and count fields to determine how to return and update data.

To query for data on the appropriate page, use a regex query to return data for a specified customerId, and use skip to return the data for the correct page. Because the regex is anchored to the start of the _id value, the query can use the default _id index, which results in performant queries without the need for an additional index.

The following query returns data for the first page of trades for customer 123:

db.trades.find( { "_id": /^123_/ } ).sort( { _id: 1 } ).limit(1)

To return data for later pages, specify a skip value of one less than the page number you want to show. For example, to show data for page 10, run the following query:

db.trades.find( { "_id": /^123_/ } ).sort( { _id: 1 } ).skip(9).limit(1)

Note

The preceding query returns no results because the sample data only contains documents for the first page.
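The page arithmetic above can be wrapped in a small helper. This is a sketch under assumptions: pageQuery is a hypothetical function, and it assumes customerId contains no regex special characters (it is numeric in the sample data).

```javascript
// Hypothetical helper (not a MongoDB API): build the find() arguments
// for a given customer and 1-based page number.
function pageQuery(customerId, page) {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError("page must be a positive integer");
  }
  return {
    // Anchored prefix match on _id, for example /^123_/.
    filter: { _id: new RegExp(`^${customerId}_`) },
    sort: { _id: 1 },
    // Each bucket is one page, so skip the buckets for all earlier pages.
    skip: page - 1,
    limit: 1,
  };
}

const q = pageQuery(123, 10);
console.log(q.filter._id.source, q.skip, q.limit);

// In mongosh, the result could be applied like this:
// db.trades.find(q.filter).sort(q.sort).skip(q.skip).limit(q.limit)
```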

Now that the schema uses the bucket pattern, update your application logic to insert new trades into the correct bucket. Use an update command to insert the trade into a bucket that has the appropriate customerId and still has room for more trades.

The following command inserts a new trade for customerId: 123:

db.trades.updateOne(
  { "_id": /^123_/, "count": { $lt: 10 } },
  {
    "$push": {
      "history": {
        "type": "buy",
        "ticker": "MSFT",
        "qty": 42,
        "date": ISODate("2023-11-02T11:43:10")
      }
    },
    "$inc": { "count": 1 },
    "$setOnInsert": { "_id": "123_1698939791", "customerId": 123 }
  },
  { upsert: true }
)

The application displays 10 trades per page. The update filter searches for a document for customerId: 123 where the count is less than 10, meaning that bucket does not contain a full page of data.

  • If there is a document that matches "_id": /^123_/ and its count is less than 10, the update command pushes the new trade into the matched document's history array.

  • If there is not a matching document, the update command inserts a new document with the new trade (because upsert is true). The _id field of the new document is a concatenation of the customerId and the time of the trade in seconds since the Unix epoch.

The logic for update commands avoids unbounded arrays by ensuring that no history array contains more than 10 documents.
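To avoid hand-writing the filter and the new bucket's _id for every trade, an application might generate the updateOne arguments with a helper along these lines. This is a minimal sketch, not a prescribed implementation: buildTradeUpsert and pageSize are hypothetical names, and the epoch seconds are taken from the UTC instant of the trade's Date.

```javascript
// Hypothetical helper (not a MongoDB API): build the filter, update, and
// options for pushing one trade into a non-full bucket, or creating a new one.
function buildTradeUpsert(customerId, trade, pageSize = 10) {
  const tradeSeconds = Math.floor(trade.date.getTime() / 1000);
  return {
    // Match any bucket for this customer that still has room.
    filter: { _id: new RegExp(`^${customerId}_`), count: { $lt: pageSize } },
    update: {
      $push: { history: trade },
      $inc: { count: 1 },
      // Only applied when no bucket matched and a new document is inserted.
      $setOnInsert: { _id: `${customerId}_${tradeSeconds}`, customerId },
    },
    options: { upsert: true },
  };
}

const op = buildTradeUpsert(123, {
  type: "buy",
  ticker: "MSFT",
  qty: 42,
  date: new Date("2023-11-02T11:43:10Z"),
});
console.log(op.update.$setOnInsert._id);

// In mongosh, the result could be applied like this:
// db.trades.updateOne(op.filter, op.update, op.options)
```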

After you run the update operation, the trades collection has the following documents:

[
  {
    _id: '123_1698349623',
    customerId: 123,
    count: 3,
    history: [
      {
        type: 'buy',
        ticker: 'MDB',
        qty: 419,
        date: ISODate("2023-10-26T15:47:03.434Z")
      },
      {
        type: 'sell',
        ticker: 'MDB',
        qty: 29,
        date: ISODate("2023-10-30T09:32:57.765Z")
      },
      {
        type: 'buy',
        ticker: 'MSFT',
        qty: 42,
        date: ISODate("2023-11-02T11:43:10.000Z")
      }
    ]
  },
  {
    _id: '456_1698765362',
    customerId: 456,
    count: 1,
    history: [
      {
        type: 'buy',
        ticker: 'GOOG',
        quantity: 50,
        date: ISODate("2023-10-31T11:16:02.120Z")
      }
    ]
  }
]

After you implement the bucket pattern, you don't need to incorporate pagination logic to return results in your application. The way the data is stored matches the way it is used in the application.
