Query to find duplicate users (ip)

Hello! Would it be possible to create a query that goes through all access logs for all users and return an object with arrays with ips that has been used on multiple users?

Example of the document for a login

{
    "_id" : ObjectId("6091541aa4eea86edbc76dd0"),
    "type" : "login",
    "user" : ObjectId("6qa9d80cd24c110524c9dbee"),
    "ip" : "127.0.0.1",
    "createdAt" : ISODate("2021-05-04T14:03:06.670+0000"),
    "updatedAt" : ISODate("2021-05-04T14:03:06.670+0000"),
}

If it could return an object/print somehow which users has had the same ip and group them together? We have some users creating a lot of accounts (which is not allowed) but does’t always change his ip.

Hi @amazing,

Something like the following might be suitable for your needs:

db.foo.insertMany([
  { "ip" : "127.0.0.1", "user": 1 },
  { "ip" : "127.1.1.1", "user": 2 },
  { "ip" : "127.2.1.1", "user": 3 },
  { "ip" : "127.0.0.1", "user": 4 },
  { "ip" : "127.1.1.1", "user": 5 },
]);
db.foo.createIndex({ ip: 1, user: 1 });
db.foo.aggregate([
  { $sort: { ip: 1 } },
  { $group: {
    _id: "$ip",
    used: { $sum: 1 },
    users: { $push: { user: "$user" } }
  }},
  { $match: { used: { $gt: 1 } } },  
])

Note that if you’re only using the ip and user fields, having an index on these (as shown above) should greatly improve the performance of this operation.

I’ve written up a longer form of this response at Efficiently Identifying Duplicates using MongoDB | ALEX BEVILACQUA as this question comes up pretty frequently :wink:

5 Likes

Hi @alexbevi the results are exactly what I was looking for. Thank you so much!

However, with our database it doesn’t really work, Maybe i wasn’t clear enough before but the collection has all logins so with your query it will just add the same user multiple times on all ips if the user have logged in more than once? (which most users have.

So it should just add the user if it’s not a duplicate of itself :face_with_head_bandage:

@amazing in this case it sounds like you want to group by unique pairs of values (ex: ip/user):

db.foo.drop();
db.foo.insertMany([
  { "ip" : "127.0.0.1", "user": 1 },
  { "ip" : "127.1.1.1", "user": 1 },
  { "ip" : "127.2.1.1", "user": 2 },
  { "ip" : "127.0.0.1", "user": 1 },
  { "ip" : "127.1.1.1", "user": 2 },
]);
db.foo.createIndex({ ip: 1, user: 1 });
db.foo.aggregate([
  { $sort: { ip: 1 } },
  { $group: {
    _id: { ip: "$ip", user: "$user" },
    used: { $sum: 1 }
  }},
  { $match: { used: { $gt: 1 } } },  
])

This would just count any time a user and ip pair appear more than once. Is this more along the lines of what you were looking for?

If not it might help if you could provide a couple more sample documents that demonstrate what a “duplicate” looks like given your data.

Hey @alexbevi no your first reply was closer! Let me give a very clear example :slight_smile:

db.foo.insertMany([
  { "ip" : "123", "user": 1 },
  { "ip" : "123", "user": 1 },
  { "ip" : "123", "user": 1 },
  { "ip" : "456", "user": 2 },
  { "ip" : "456", "user": 3 },
  { "ip" : "789", "user": 4 },
]);
db.foo.createIndex({ ip: 1, user: 1 });
db.foo.aggregate([
  { $sort: { ip: 1 } },
  { $group: {
    _id: "$ip",
    used: { $sum: 1 },
    users: { $push: { user: "$user" } }
  }},
  { $match: { used: { $gt: 1 } } },  
])

So here the only thing that should return is

{
    "_id": "456",
    "used": 2,
    "users": [
      {
        "user": 2
      },
      {
        "user": 3
      }
    ]
  }

The one with 123 that also gets returned is from the same user logging in 3 times so that’s nothing wrong while user 2 & 3 both login on on the ip “456” which should be returned as it does now :slight_smile:

Thanks for clarifying. What you’re looking to do can be best accomplished by adding the user IDs to a set, then filtering the results by sets with more than 1 entry:

db.foo.aggregate([
  { $sort: { ip: 1 } },
  { $group: {
    _id: "$ip",    
    users: { $addToSet: "$user" }
  }},
  { $match: {     
    $expr: { $gt: [ { $size: "$users" }, 1] }
  }}
])
3 Likes

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.