How to check existence of records for a given input list?

I have to check whether the records exist by passing unique property values for 1 million records. Instead of checking each document, is there any approach to review multiple records in the single or batch call?

Hi,
It would be better if you shared a sample of your collections and describe the desired result
Best,

1 Like

sample document

{
Empno : “123”,
Name :“ABC”
}
{
Empno : “222”,
Name :“XYZ”
}

if i pass the Empno in input list as {“123”,“333”}

The unmatched input value “333” should return.

That is a tricky one and I hope somebody will come up with a simpler solution.

Read Formatting code and log snippets in posts before posting code or documents next time. I could not just copy your documents because they were not formatted correctly.

My solution involves an extra collection. This extra collection would contains the empno you want to query. In your case it would be:

{ "_id" : 123 }
{ "_id" : 333 }

Then using the aggregation pipeline on that extra collection you $lookup into your main collection to find matches as the first stage. Then a $match stage will remove the documents for which an emp was found in the main collection.

lookup =
{
	"$lookup" : {
		"from" : "Sudhesh_Gnanasekaran",
		"localField" : "_id",
		"foreignField" : "empno",
		"as" : "found"
	}
}
match = { "$match" : { "found.0" : { "$exists" : false } } }
// running the pipeline
db.wanted.aggregate( [ lookup , match ] )
// would produce
{ "_id" : 333, "found" : [ ] }
// you could then add $project stage to remove the empty array
// you could even add a $out stage to store the result in another collection

And this is a map-reduce solution:

.aggregate([{
    $group: {
      _id: null,
      notFound: {
        $accumulator: {
          init: function(){
            return[ "123", "333"];
          },
          accumulate: function(arr, empno){
            return arr.filter(x=>x!==empno)
          },
          accumulateArgs: ["$Empno"],
          merge: function(arr1, arr2){
            let part1=arr1.filter(x=>!arr2.includes(x)); 
            let part2=arr2.filter(x=>!arr1.includes(x));
            return part1.concat(part2)
          },
          lang: "js"
        }}}}])

tested on v4.4.0 (shell & server)

Regards

Any solution that involves $group might require to use the disk if the group does not fit in RAM and might hit the 16MB limit.

In case you have lots of documents in your database, but you need to perform a check for a relatively small list of values, you could use the solution below.

Dataset example:

db.fruits.insertMany([
  { name: 'apple' },
  { name: 'plum' },
  { name: 'pear' }
]);

List example of values that you may want to check for existence:
['pear', 'onion', 'apple', 'potato']

Aggregation:

db.fruits.aggregate([
  {
    $group: {
      // List of values to check
      _id: ['pear', 'onion', 'apple', 'potato'],
    }
  },
  {
    $unwind: '$_id'
  },
  {
    $lookup: {
      from: 'fruits',
      localField: '_id',
      foreignField: 'name',
      as: 'joined',
    }
  },
  {
    $project: {
      missing: {
        $cond: [
          { $eq: [{ $size: '$joined' }, 0 ] },
          '$_id',
          '$$REMOVE',
        ]
      }
    }
  },
  {
    $group: {
      _id: null,
      missing: {
        $addToSet: '$missing'
      }
    }
  }
]);

Output:
[ { _id: null, missing: [ 'potato', 'onion' ] } ]

1 Like