Group Data with the Bucket Pattern

The bucket pattern separates long series of data into distinct objects. Separating large data series into smaller groups can improve query access patterns and simplify application logic. Bucketing is useful when you have similar objects that relate to a central entity, such as stock trades made by a single user.

You can use the bucket pattern for pagination by grouping your data based on the elements that your application shows per page. This approach uses MongoDB's flexible data model to store data according to the data your application needs.

Tip

Time series collections apply the bucket pattern automatically, and are suitable for most applications that involve bucketing time series data.

About this Task

Consider the following schema that tracks stock trades. The initial schema does not use the bucket pattern, and stores each trade in an individual document.

db.trades.insertMany(
  [
    {
      "ticker" : "MDB",
      "customerId": 123,
      "type" : "buy",
      "quantity" : 419,
      "date" : ISODate("2023-10-26T15:47:03.434Z")
    },
    {
      "ticker" : "MDB",
      "customerId": 123,
      "type" : "sell",
      "quantity" : 29,
      "date" : ISODate("2023-10-30T09:32:57.765Z")
    },
    {
      "ticker" : "GOOG",
      "customerId": 456,
      "type" : "buy",
      "quantity" : 50,
      "date" : ISODate("2023-10-31T11:16:02.120Z")
    }
  ]
)

The application shows stock trades made by a single customer at a time, and shows 10 trades per page. To simplify the application logic, use the bucket pattern to group the trades by customerId in groups of 10.

Steps

Group the data by customerId

Reorganize the schema to have a single document for each customerId:

{
  "customerId": 123,
  "history": [
    {
      "type": "buy",
      "ticker": "MDB",
      "qty": 419,
      "date": ISODate("2023-10-26T15:47:03.434Z")
    },
    {
      "type": "sell",
      "ticker": "MDB",
      "qty": 29,
      "date": ISODate("2023-10-30T09:32:57.765Z")
    }
  ]
},
{
  "customerId": 456,
  "history": [
    {
      "type" : "buy",
      "ticker" : "GOOG",
      "quantity" : 50,
      "date" : ISODate("2023-10-31T11:16:02.120Z")
    }
  ]
}

With the bucket pattern:

Documents with common customerId values are condensed into a single document, with the customerId being a top-level field.
Trades for that customer are grouped into an embedded array field, called history.
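
The grouping step above can be sketched in plain JavaScript, outside the database. This is a minimal in-memory illustration, not a migration script: the helper name groupByCustomer and the use of new Date in place of ISODate are assumptions made so the snippet runs in Node.js as well as mongosh.

```javascript
// Hypothetical helper: collapses flat trade documents into one document
// per customerId, moving the remaining fields into a "history" array.
function groupByCustomer(trades) {
  const byCustomer = new Map();
  for (const { customerId, ...trade } of trades) {
    if (!byCustomer.has(customerId)) {
      byCustomer.set(customerId, { customerId, history: [] });
    }
    byCustomer.get(customerId).history.push(trade);
  }
  return [...byCustomer.values()];
}

// The three sample trades from the initial schema.
const flatTrades = [
  { ticker: "MDB", customerId: 123, type: "buy", quantity: 419, date: new Date("2023-10-26T15:47:03.434Z") },
  { ticker: "MDB", customerId: 123, type: "sell", quantity: 29, date: new Date("2023-10-30T09:32:57.765Z") },
  { ticker: "GOOG", customerId: 456, type: "buy", quantity: 50, date: new Date("2023-10-31T11:16:02.120Z") },
];

const grouped = groupByCustomer(flatTrades);
console.log(JSON.stringify(grouped, null, 2));
```

The customerId is lifted to the top level of each grouped document, so it is no longer repeated inside every trade.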

Add an identifier and count for each bucket

db.trades.drop()

db.trades.insertMany(
  [
    {
      "_id": "123_1698349623",
      "customerId": 123,
      "count": 2,
      "history": [
        {
          "type": "buy",
          "ticker": "MDB",
          "qty": 419,
          "date": ISODate("2023-10-26T15:47:03.434Z")
        },
        {
          "type": "sell",
          "ticker": "MDB",
          "qty": 29,
          "date": ISODate("2023-10-30T09:32:57.765Z")
        }
      ]
    },
    {
      "_id": "456_1698765362",
      "customerId": 456,
      "count": 1,
      "history": [
        {
          "type" : "buy",
          "ticker" : "GOOG",
          "quantity" : 50,
          "date" : ISODate("2023-10-31T11:16:02.120Z")
        }
      ]
    }
  ]
)

The _id field value is a concatenation of the customerId and the time of the first trade in the history field, expressed in seconds since the Unix epoch.

The count field indicates how many elements are in that document's history array. The count field is used to implement pagination logic.
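
As a rough sketch of how the _id and count values described above could be derived, the following plain JavaScript splits one customer's trades into buckets of 10. The helper names bucketId and toBuckets are hypothetical, and the snippet interprets each date as UTC, so its output need not match the sample _id values shown on this page.

```javascript
// Hypothetical helper: "<customerId>_<seconds since the Unix epoch>".
function bucketId(customerId, firstTradeDate) {
  const seconds = Math.floor(firstTradeDate.getTime() / 1000);
  return `${customerId}_${seconds}`;
}

// Hypothetical helper: splits one customer's trades into buckets of
// at most pageSize trades, ordered by trade date.
function toBuckets(customerId, history, pageSize = 10) {
  const sorted = [...history].sort((a, b) => a.date - b.date);
  const buckets = [];
  for (let i = 0; i < sorted.length; i += pageSize) {
    const page = sorted.slice(i, i + pageSize);
    buckets.push({
      _id: bucketId(customerId, page[0].date), // keyed by the first trade in the bucket
      customerId,
      count: page.length, // number of elements in history
      history: page,
    });
  }
  return buckets;
}

// Illustrative data: 23 made-up trades, one minute apart.
const first = Date.parse("2023-10-26T15:47:03.434Z");
const sampleHistory = Array.from({ length: 23 }, (_, i) => ({
  type: "buy",
  ticker: "MDB",
  qty: 1,
  date: new Date(first + i * 60000),
}));

const buckets = toBuckets(123, sampleHistory);
console.log(buckets.map((b) => `${b._id}: ${b.count}`));
```

With 23 trades and a page size of 10, the sketch produces three buckets, the last of which is only partly full.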

Next Steps

After you update your schema to use the bucket pattern, update your application logic for reading and writing data. See the following sections:

Query for Data with the Bucket Pattern
Insert Data with the Bucket Pattern

Query for Data with the Bucket Pattern

In the updated schema, each document contains data for a single page in the application. You can use the _id and count fields to determine how to return and update data.

To query for data on the appropriate page, use a regex query to return data for a specified customerId, and use skip to return the data for the correct page. Because the regex is anchored to the start of the string, the query on _id can use the default _id index, which results in performant queries without the need for an additional index.

The following query returns data for the first page of trades for customer 123:

db.trades.find( { "_id": /^123_/ } ).sort( { _id: 1 } ).limit(1)

To return data for later pages, specify a skip value of one less than the page you want to show data for. For example, to show data for page 10, run the following query:

db.trades.find( { "_id": /^123_/ } ).sort( { _id: 1 } ).skip(9).limit(1)

Note

The preceding query returns no results because the sample data only contains documents for the first page.
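
The page-to-skip arithmetic above can be wrapped in a small helper. This is a sketch under assumptions: pageQuery is a hypothetical function, and it assumes a numeric customerId, so the value needs no regex escaping.

```javascript
// Hypothetical helper: builds the parts of the find() call for a given page.
// Pages are 1-based, so page N skips N - 1 bucket documents.
function pageQuery(customerId, page) {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError("page must be a positive integer");
  }
  return {
    // Anchored prefix match; the trailing underscore keeps customer 123
    // from matching _id values that belong to customer 1234.
    filter: { _id: new RegExp(`^${customerId}_`) },
    sort: { _id: 1 },
    skip: page - 1,
    limit: 1,
  };
}

const q = pageQuery(123, 10);
console.log(q.skip);

// In mongosh, the parts would be passed as:
// db.trades.find(q.filter).sort(q.sort).skip(q.skip).limit(q.limit)
```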

Insert Data with the Bucket Pattern

Now that the schema uses the bucket pattern, update your application logic to insert new trades into the correct bucket. Use an update command to insert the trade into a bucket for the appropriate customerId that still has room.

The following command inserts a new trade for customerId: 123:

db.trades.updateOne(
   { "_id": /^123_/, "count": { $lt: 10 } },
   {
      "$push": {
         "history": {
            "type": "buy",
            "ticker": "MSFT",
            "qty": 42,
            "date": ISODate("2023-11-02T11:43:10")
         }
      },
      "$inc": { "count": 1 },
      "$setOnInsert": { "_id": "123_1698939791", "customerId": 123 }
   },
   { upsert: true }
)

The application displays 10 trades per page. The update filter searches for a document for customerId: 123 where the count is less than 10, meaning that bucket does not contain a full page of data.

If there is a document that matches "_id": /^123_/ and its count is less than 10, the update command pushes the new trade into the matched document's history array.
If there is no matching document, the update command inserts a new document with the new trade (because upsert is true). The _id field of the new document is a concatenation of the customerId and the time of the trade in seconds since the Unix epoch.

The logic for update commands avoids unbounded arrays by ensuring that no history array contains more than 10 documents.
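
To see why this logic keeps every history array bounded, the following plain JavaScript mimics the two branches of the upsert against an in-memory array. It is only an illustration of the branching, not a replacement for the update command: addTrade and store are hypothetical names, and the real operation runs atomically on the server.

```javascript
// Hypothetical in-memory stand-in for the upsert shown above.
function addTrade(buckets, customerId, trade, pageSize = 10) {
  const prefix = `${customerId}_`;
  // Branch 1: a bucket for this customer still has room, so push and increment.
  const open = buckets.find(
    (b) => b._id.startsWith(prefix) && b.count < pageSize
  );
  if (open) {
    open.history.push(trade);
    open.count += 1;
    return open;
  }
  // Branch 2: no open bucket, so create one keyed by the trade time in seconds.
  const created = {
    _id: `${prefix}${Math.floor(trade.date.getTime() / 1000)}`,
    customerId,
    count: 1,
    history: [trade],
  };
  buckets.push(created);
  return created;
}

// Illustrative data: 12 made-up trades, one minute apart.
const store = [];
const start = Date.parse("2023-11-02T11:43:10Z");
for (let i = 0; i < 12; i++) {
  addTrade(store, 123, {
    type: "buy",
    ticker: "MSFT",
    qty: 1,
    date: new Date(start + i * 60000),
  });
}
console.log(store.map((b) => [b._id, b.count]));
```

After 12 inserts, the first bucket holds 10 trades and the eleventh trade has opened a second bucket, so no array grows past the page size.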

After you run the update operation, the trades collection has the following documents:

[
  {
    _id: '123_1698349623',
    customerId: 123,
    count: 3,
    history: [
      {
        type: 'buy',
        ticker: 'MDB',
        qty: 419,
        date: ISODate("2023-10-26T15:47:03.434Z")
      },
      {
        type: 'sell',
        ticker: 'MDB',
        qty: 29,
        date: ISODate("2023-10-30T09:32:57.765Z")
      },
      {
        type: 'buy',
        ticker: 'MSFT',
        qty: 42,
        date: ISODate("2023-11-02T11:43:10.000Z")
      }
    ]
  },
  {
    _id: '456_1698765362',
    customerId: 456,
    count: 1,
    history: [
      {
        type: 'buy',
        ticker: 'GOOG',
        quantity: 50,
        date: ISODate("2023-10-31T11:16:02.120Z")
      }
    ]
  }
]

Results

After you implement the bucket pattern, you don't need to incorporate pagination logic to return results in your application. The way the data is stored matches the way it is used in the application.
