Joining multiple collections by common field

Henrik · June 22, 2021, 12:24pm

Hi,

I have multiple collections with the following schema:
collection ‘game1’:

{
    "uuid": "a43fb870-e012-4c99-adf1-02adb58350bc",
    "stats": {
        "wins": 1,
        "games": 3
    }
}

collection ‘game2’:

{
    "uuid": "a43fb870-e012-4c99-adf1-02adb58350bc",
    "stats": {
        "wins": 8,
        "kills": 7
    }
}

Is it possible to make a query searching based on the UUID field that has the following output?

{
    "game1": {
        "stats": {
            "wins": 1,
            "games": 3
    },
    "game2": {
        "stats": {
            "wins": 8,
            "kills": 7
    }
}

alexbevi · June 22, 2021, 1:06pm

Hi @Henrik ,

This can be accomplished using an Aggregation Pipeline with a $lookup stage.

For example:

// Setup
db.game1.insert({
    "uuid": "a43fb870-e012-4c99-adf1-02adb58350bc",
    "stats": {
        "wins": 1,
        "games": 3
    }
});
db.game1.createIndex({ uuid: 1 })

db.game2.insert({
    "uuid": "a43fb870-e012-4c99-adf1-02adb58350bc",
    "stats": {
        "wins": 8,
        "kills": 7
    }
});
db.game2.createIndex({ uuid: 1 })

Given the above setup, the following pipeline should produce the desired result:

db.game1.aggregate([
   { $match: { uuid: "a43fb870-e012-4c99-adf1-02adb58350bc" } },
   { $lookup: {
       from: "game2",
       let: { uuid: "$uuid" },
       pipeline: [
          { $match: { $expr: { $eq: [ "$uuid",  "$$uuid" ] } } },
          { $project: { _id: 0, uuid: 0 } },
       ],
       as: "game2"
   }},
   { $unwind: "$game2" },
   { $project: { _id: 0, game1: { stats: "$stats" }, game2: "$game2" } }
]);

The indexes on both collections should allow the matching documents to be queried more efficiently on both the game1 and game2 collections.

If more game* collections exist this approach can be further expanded to add additional $lookup stages, however the performance of a pipeline with additional $lookups may not be great and the Data Model may need refinement instead.

Henrik · June 22, 2021, 1:31pm

Thank you, that was exactly what I was looking for. However your last comment made me think about restructuring the data model.

I have multiple games where each game has its own set of stats, as you saw on the original post. The two solutions I have thought of so far is one collection per game (what I currently have) or a single collection where each document has all the stats for all games. The reason I didn’t choose the second option was because it will end up being quite many read and write operations on that collection, but I also think I would want to avoid using $lookup on multiple collections because I might end up having 20 games.

Do you have any advice on this?

alexbevi · June 22, 2021, 1:39pm

Before settling on the data model it would be helpful to evaluate how you intend to interact with your data. If you will be interacting with each game’s collection frequently and only infrequently querying across games the per-game model is likely fine.

If you will frequently be interacting with all games’ data by uuid then a single collection may make more sense as you can later shard this data if collection “size” becomes a concern.

system · June 27, 2021, 1:40pm

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.