Group Data with the Bucket Pattern
The bucket pattern separates long series of data into distinct objects. Separating large data series into smaller groups can improve query access patterns and simplify application logic. Bucketing is useful when you have similar objects that relate to a central entity, such as stock trades made by a single user.
You can use the bucket pattern for pagination by grouping your data based on the elements that your application shows per page. This approach uses MongoDB's flexible data model to store data according to what your application needs.
Tip
Time series collections apply the bucket pattern automatically, and are suitable for most applications that involve bucketing time series data.
About this Task
Consider the following schema that tracks stock trades. The initial schema does not use the bucket pattern, and stores each trade in an individual document.
db.trades.insertMany( [
   {
      "ticker" : "MDB",
      "customerId": 123,
      "type" : "buy",
      "quantity" : 419,
      "date" : ISODate("2023-10-26T15:47:03.434Z")
   },
   {
      "ticker" : "MDB",
      "customerId": 123,
      "type" : "sell",
      "quantity" : 29,
      "date" : ISODate("2023-10-30T09:32:57.765Z")
   },
   {
      "ticker" : "GOOG",
      "customerId": 456,
      "type" : "buy",
      "quantity" : 50,
      "date" : ISODate("2023-10-31T11:16:02.120Z")
   }
] )
The application shows stock trades made by a single customer at a time, and shows 10 trades per page. To simplify the application logic, use the bucket pattern to group the trades by customerId in groups of 10.
Steps
Group the data by customerId
Reorganize the schema to have a single document for each customerId:
{
   "customerId": 123,
   "history": [
      {
         "type": "buy",
         "ticker": "MDB",
         "qty": 419,
         "date": ISODate("2023-10-26T15:47:03.434Z")
      },
      {
         "type": "sell",
         "ticker": "MDB",
         "qty": 29,
         "date": ISODate("2023-10-30T09:32:57.765Z")
      }
   ]
},
{
   "customerId": 456,
   "history": [
      {
         "type" : "buy",
         "ticker" : "GOOG",
         "quantity" : 50,
         "date" : ISODate("2023-10-31T11:16:02.120Z")
      }
   ]
}
With the bucket pattern:
Documents with common customerId values are condensed into a single document, with the customerId being a top-level field.
Trades for that customer are grouped into an embedded array field, called history.
Add an identifier and count for each bucket
db.trades.drop()

db.trades.insertMany(
   [
      {
         "_id": "123_1698349623",
         "customerId": 123,
         "count": 2,
         "history": [
            {
               "type": "buy",
               "ticker": "MDB",
               "qty": 419,
               "date": ISODate("2023-10-26T15:47:03.434Z")
            },
            {
               "type": "sell",
               "ticker": "MDB",
               "qty": 29,
               "date": ISODate("2023-10-30T09:32:57.765Z")
            }
         ]
      },
      {
         "_id": "456_1698765362",
         "customerId": 456,
         "count": 1,
         "history": [
            {
               "type" : "buy",
               "ticker" : "GOOG",
               "quantity" : 50,
               "date" : ISODate("2023-10-31T11:16:02.120Z")
            }
         ]
      }
   ]
)
The _id field value is a concatenation of the customerId and the time of the first trade in the history field, in seconds since the Unix epoch.
The count field indicates how many elements are in that document's history array. The count field is used to implement pagination logic.
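The regrouping described in these steps can be sketched in plain JavaScript. This is an illustrative, in-memory sketch, not part of the documented procedure: the helper names bucketId and bucketTrades are hypothetical, plain Date objects stand in for ISODate values so the code runs outside mongosh, and the timestamp is taken as whole UTC seconds. The exact _id strings depend on how an application derives that timestamp, so they may not match the sample values shown on this page.

```javascript
// Hypothetical helper: build a bucket _id from a customerId and a Date,
// using whole seconds since the Unix epoch (UTC).
function bucketId(customerId, date) {
  return `${customerId}_${Math.floor(date.getTime() / 1000)}`;
}

// Hypothetical helper: group flat trade documents into buckets of at most
// `pageSize` trades per customer, ordered by trade date.
function bucketTrades(trades, pageSize = 10) {
  // Collect each customer's trades, dropping customerId from the embedded trade.
  const byCustomer = new Map();
  for (const { customerId, ...trade } of trades) {
    if (!byCustomer.has(customerId)) byCustomer.set(customerId, []);
    byCustomer.get(customerId).push(trade);
  }

  const result = [];
  for (const [customerId, history] of byCustomer) {
    history.sort((a, b) => a.date - b.date);
    // Cut the sorted history into pages; each page becomes one bucket document.
    for (let i = 0; i < history.length; i += pageSize) {
      const page = history.slice(i, i + pageSize);
      result.push({
        _id: bucketId(customerId, page[0].date), // keyed by the first trade in the bucket
        customerId,
        count: page.length,
        history: page,
      });
    }
  }
  return result;
}

// The three sample trades from the initial schema.
const sampleTrades = [
  { ticker: "MDB", customerId: 123, type: "buy", quantity: 419, date: new Date("2023-10-26T15:47:03.434Z") },
  { ticker: "MDB", customerId: 123, type: "sell", quantity: 29, date: new Date("2023-10-30T09:32:57.765Z") },
  { ticker: "GOOG", customerId: 456, type: "buy", quantity: 50, date: new Date("2023-10-31T11:16:02.120Z") },
];
const grouped = bucketTrades(sampleTrades);
// grouped holds one bucket for customer 123 (count 2) and one for customer 456 (count 1).
```

In mongosh, the resulting array could be passed to db.trades.insertMany() after the flat collection is dropped.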
Next Steps
After you update your schema to use the bucket pattern, update your application logic for reading and writing data. See the following sections:
Query for Data with the Bucket Pattern
In the updated schema, each document contains data for a single page in the application. You can use the _id and count fields to determine how to return and update data.
To query for data on the appropriate page, use a regex query to return data for a specified customerId, and use skip to return the data for the correct page. Because the regex is anchored to the start of the string, the query on _id can use the default _id index, which results in performant queries without the need for an additional index.
The following query returns data for the first page of trades for customer 123:
db.trades.find( { "_id": /^123_/ } ).sort( { _id: 1 } ).limit(1)
To return data for later pages, specify a skip value of one less than the page you want to show data for. For example, to show data for page 10, run the following query:
db.trades.find( { "_id": /^123_/ } ).sort( { _id: 1 } ).skip(9).limit(1)
Note
The preceding query returns no results because the sample data only contains documents for the first page.
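The page-to-skip arithmetic above can be wrapped in a small helper. The following is a minimal sketch in plain JavaScript. The function name pageQuery and the shape of its return value are assumptions made for illustration, and it assumes customerId contains no regex metacharacters (for example, a numeric ID).

```javascript
// Hypothetical helper: describe the query for one page of a customer's trades.
// Pages are 1-based, and each bucket document holds one page.
function pageQuery(customerId, page) {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError("page must be a positive integer");
  }
  return {
    // Anchored prefix match on _id, for example /^123_/ for customer 123.
    filter: { _id: new RegExp(`^${customerId}_`) },
    sort: { _id: 1 },
    skip: page - 1, // page 1 skips 0 buckets, page 10 skips 9
    limit: 1,
  };
}

const pageTen = pageQuery(123, 10);
// In mongosh, the parts could be applied as:
// db.trades.find(pageTen.filter).sort(pageTen.sort).skip(pageTen.skip).limit(pageTen.limit)
```

Keeping the trailing underscore in the prefix means that customer 123 does not match buckets that belong to customer 1234.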
Insert Data with the Bucket Pattern
Now that the schema uses the bucket pattern, update your application logic to insert new trades into the correct bucket. Use an update command to insert the trade into a bucket for the appropriate customerId that still has room for another trade.
The following command inserts a new trade for customerId: 123:
db.trades.updateOne(
   { "_id": /^123_/, "count": { $lt: 10 } },
   {
      "$push": {
         "history": {
            "type": "buy",
            "ticker": "MSFT",
            "qty": 42,
            "date": ISODate("2023-11-02T11:43:10")
         }
      },
      "$inc": { "count": 1 },
      "$setOnInsert": { "_id": "123_1698939791", "customerId": 123 }
   },
   { upsert: true }
)
The application displays 10 trades per page. The update filter searches for a document for customerId: 123 where the count is less than 10, meaning that bucket does not contain a full page of data.
If there is a document that matches "_id": /^123_/ and its count is less than 10, the update command pushes the new trade into the matched document's history array.
If there is not a matching document, the update command inserts a new document with the new trade (because upsert is true). The _id field of the new document is a concatenation of the customerId and the time of the trade in seconds since the Unix epoch.
The logic for update commands avoids unbounded arrays by ensuring that no history array contains more than 10 documents.
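To see why this logic keeps arrays bounded, the same decision can be simulated in memory. The sketch below is plain JavaScript rather than MongoDB code. The function pushTrade is a hypothetical stand-in that mirrors the filter and upsert behavior described above, and it does not touch the trades collection.

```javascript
// In-memory stand-in for the upsert: the same decision logic, no database involved.
function pushTrade(bucketList, customerId, trade, pageSize = 10) {
  const prefix = `${customerId}_`;
  // Mirror the update filter: a bucket for this customer that is not yet full.
  let bucket = bucketList.find(
    (b) => b._id.startsWith(prefix) && b.count < pageSize
  );
  if (!bucket) {
    // Mirror the upsert: start a new bucket keyed by the trade time in seconds.
    bucket = {
      _id: `${prefix}${Math.floor(trade.date.getTime() / 1000)}`,
      customerId,
      count: 0,
      history: [],
    };
    bucketList.push(bucket);
  }
  bucket.history.push(trade); // corresponds to $push
  bucket.count += 1;          // corresponds to $inc
  return bucket;
}

// Insert 12 trades, one second apart, for a single customer.
const simulated = [];
for (let i = 0; i < 12; i++) {
  const date = new Date(Date.UTC(2023, 10, 1, 0, 0, i));
  pushTrade(simulated, 123, { type: "buy", ticker: "MDB", qty: 1, date });
}
const simulatedCounts = simulated.map((b) => b.count);
// simulatedCounts is [10, 2]: the eleventh trade starts a second bucket.
```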
After you run the updateOne operation, the trades collection has the following documents:
[
   {
      _id: '123_1698349623',
      customerId: 123,
      count: 3,
      history: [
         {
            type: 'buy',
            ticker: 'MDB',
            qty: 419,
            date: ISODate("2023-10-26T15:47:03.434Z")
         },
         {
            type: 'sell',
            ticker: 'MDB',
            qty: 29,
            date: ISODate("2023-10-30T09:32:57.765Z")
         },
         {
            type: 'buy',
            ticker: 'MSFT',
            qty: 42,
            date: ISODate("2023-11-02T11:43:10.000Z")
         }
      ]
   },
   {
      _id: '456_1698765362',
      customerId: 456,
      count: 1,
      history: [
         {
            type: 'buy',
            ticker: 'GOOG',
            quantity: 50,
            date: ISODate("2023-10-31T11:16:02.120Z")
         }
      ]
   }
]
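An application would usually build the update from the incoming trade rather than hard-code it. The following is a minimal sketch in plain JavaScript. The function name insertTradeUpdate and its return shape are assumptions for illustration, the trade date is a plain Date so the code runs outside mongosh, and the _id for a new bucket is derived as whole UTC seconds, which may not match the sample _id values on this page.

```javascript
// Hypothetical helper: build the filter, update, and options for inserting
// one trade into a customer's current (non-full) bucket.
function insertTradeUpdate(customerId, trade, pageSize = 10) {
  const seconds = Math.floor(trade.date.getTime() / 1000);
  return {
    // Match a bucket for this customer that holds fewer than pageSize trades.
    filter: { _id: new RegExp(`^${customerId}_`), count: { $lt: pageSize } },
    update: {
      $push: { history: trade },
      $inc: { count: 1 },
      // Applied only when the upsert creates a new bucket.
      $setOnInsert: { _id: `${customerId}_${seconds}`, customerId },
    },
    options: { upsert: true },
  };
}

const op = insertTradeUpdate(123, {
  type: "buy",
  ticker: "MSFT",
  qty: 42,
  date: new Date("2023-11-02T11:43:10Z"),
});
// In mongosh, the parts could be applied as:
// db.trades.updateOne(op.filter, op.update, op.options)
```

Passing pageSize as a parameter keeps the bucket limit in one place if the number of trades shown per page changes.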
Results
After you implement the bucket pattern, you don't need to incorporate pagination logic to return results in your application. The way the data is stored matches the way it is used in the application.