How to modify this aggregate to update part of a collection in batches?

notapolita · October 22, 2022, 10:58pm

I have this aggregate which assigns a random number to the ‘order’ field in every document within the ‘data’ collection. (The point was to shuffle the order in which data is retrieved every once in a while.)

db.data.aggregate(
	[	
		{ $set: { "order": { $multiply: [ { $rand: {} }, 200000 ] } } },
		{ $set: { "order": { $floor: "$order" } } },
		{ $merge:  "data"}
	]
)

I need to upgrade this to do things a bit differently:
1: Filter by some of the document fields to only assign the random numbers to a portion of the collection, not the entire collection.
2: Assign every generated random number to 10 documents, not 1. It doesn’t matter which batch gets what number, but each document within a batch should get the same number.

Please help me to understand how to do it.
Thank you.

Pavel_Duchovny · October 23, 2022, 9:17am

Hi @notapolita ,

It's not a super straightforward thing to do on the MongoDB server, but the aggregation framework is so rich that you can do the following:

db.data.aggregate([{
 $match: {
 <ANY_TYPE_CONDITION>
 }
}, {
 $setWindowFields: {
  partitionBy: null,
  sortBy: {
   _id: 1
  },
  output: {
   documentNumber: {
    $documentNumber: {}
   }
  }
 }
}, {
 $group: {
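  _id: {
   // $documentNumber is 1-based; subtract 1 so every batch holds exactly 10 documents
   $floor: {
    $divide: [
     { $subtract: ['$documentNumber', 1] },
     10
    ]
   }
  },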
  result: {
   $push: '$$ROOT'
  }
 }
}, {
 $set: {
  order: {
   $floor: {
    $multiply: [
     {
      $rand: {}
     },
     200000
    ]
   }
  }
 }
}, {$sort : {order : 1}}],{"allowDiskUse" : true})

This aggregation first uses a $match stage, which accepts any filter expression that $match supports; this covers your first requirement.

The next stage numbers each document using $setWindowFields (available in MongoDB 5.0+). The $group stage then groups on that number divided by 10, producing one document per batch with its 10 source documents collected under “result”. Finally, we add a random number to each batch of 10 and sort by it.
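
One detail worth checking in the grouping step: $documentNumber starts at 1, so the batch key is best computed from documentNumber - 1. Dividing the raw number by 10 would leave only nine documents in the first batch. The plain JavaScript below is just an illustration of that arithmetic (it does not talk to MongoDB, and the function names are made up for this example):

```javascript
// Batch key computed directly from the 1-based document number.
function batchKeyNaive(documentNumber, batchSize) {
  return Math.floor(documentNumber / batchSize);
}

// Batch key computed after shifting the document number to be 0-based.
function batchKeyZeroBased(documentNumber, batchSize) {
  return Math.floor((documentNumber - 1) / batchSize);
}

// Count how many documents land in each batch for document numbers 1..totalDocs.
function batchSizes(keyFn, totalDocs, batchSize) {
  const counts = {};
  for (let n = 1; n <= totalDocs; n++) {
    const key = keyFn(n, batchSize);
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

console.log(batchSizes(batchKeyNaive, 30, 10));     // { '0': 9, '1': 10, '2': 10, '3': 1 }
console.log(batchSizes(batchKeyZeroBased, 30, 10)); // { '0': 10, '1': 10, '2': 10 }
```

In the pipeline, the zero-based version corresponds to wrapping '$documentNumber' in { $subtract: ['$documentNumber', 1] } before the $divide.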

There is no need for two $set stages, since each one makes its own full pass over the documents; try to use as few stages as possible.
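
The pipeline above computes and sorts the batches but does not write anything back. If you want to store the new “order” values in the collection, as your original $merge did, one possible continuation is sketched below. Treat it as a sketch under assumptions rather than a tested solution: buildBatchShufflePipeline, its parameters, and the { status: 'active' } filter are hypothetical names for illustration, and it pushes only _id into each batch instead of $$ROOT (a variation on the pipeline above, meant to keep the grouped documents small).

```javascript
// Hypothetical helper: builds a pipeline that assigns one random "order"
// value per batch of `batchSize` matching documents and merges it back.
function buildBatchShufflePipeline(filter, batchSize, maxOrder, targetCollection) {
  return [
    // 1. Restrict the update to part of the collection.
    { $match: filter },
    // 2. Number the matching documents 1, 2, 3, ... (MongoDB 5.0+).
    {
      $setWindowFields: {
        partitionBy: null,
        sortBy: { _id: 1 },
        output: { documentNumber: { $documentNumber: {} } }
      }
    },
    // 3. Group into batches; subtract 1 because $documentNumber is 1-based.
    {
      $group: {
        _id: {
          $floor: {
            $divide: [{ $subtract: ['$documentNumber', 1] }, batchSize]
          }
        },
        ids: { $push: '$_id' }
      }
    },
    // 4. One random integer in [0, maxOrder) per batch.
    { $set: { order: { $floor: { $multiply: [{ $rand: {} }, maxOrder] } } } },
    // 5. Back to one document per original _id, carrying the batch's number.
    { $unwind: '$ids' },
    { $replaceRoot: { newRoot: { _id: '$ids', order: '$order' } } },
    // 6. Update only the "order" field of existing documents.
    {
      $merge: {
        into: targetCollection,
        on: '_id',
        whenMatched: 'merge',
        whenNotMatched: 'discard'
      }
    }
  ];
}

// Usage in mongosh (the filter is only an example):
// db.data.aggregate(
//   buildBatchShufflePipeline({ status: 'active' }, 10, 200000, 'data'),
//   { allowDiskUse: true }
// );
```

With whenMatched: 'merge', only the “order” field of each matched document is overwritten, and whenNotMatched: 'discard' avoids inserting anything new. Try it on a copy of the collection first.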

Thanks
Pavel