Why would scanning 28k documents in the 2nd stage of an aggregation take 22 seconds?

Hello! I am having trouble to understand why my aggregation pipeline is soooo slow (22 seconds). Here it is, running on my Connection collection:

[
  {
    $match: {
      "between.objectId": "6oBQHqtowF7iAd9YFJCZn_A7GLg2hU"
    }
  },
  {
    $lookup: {
      from: "Profile",
      localField: "between.objectId",
      foreignField: "_id",
      as: "profile"
    }
  }
]

The first stage retrieves 28000 documents in 240 ms (executionTimeMillisEstimate), that’s ok I guess, but the second stage takes - hold on - 22 seconds !!! My mongodb atlas is not under pressure, my website is under maintenance, I am the only one making a request via MongoDB Compass software.

Here is a Connection document (among 8325551 documents) and indexes :

{
  "_id": "fagGBbgfcBbUlMsPLgdD4qhKjLSeZ7ziIwk,6oBQHqtowF7iAd9YFJCZn_A7GLg2hU",
  "between": [
    {
      "__type": "Pointer",
      "className": "Profile",
      "objectId": "fagGBbgfcBbUlMsPLgdD4qhKjLSeZ7ziIwk"
    },
    {
      "__type": "Pointer",
      "className": "Profile",
      "objectId": "6oBQHqtowF7iAd9YFJCZn_A7GLg2hU"
    }
  ],
  "since": {
    "$date": "2019-03-21T18:01:00.000Z"
  },
  "_created_at": {
    "$date": "2024-01-18T02:09:19.957Z"
  },
  "_updated_at": {
    "$date": "2024-08-22T21:46:25.418Z"
  }
}

with indexes:

[
  { v: 2, key: { _id: 1 }, name: '_id_' },
  { v: 2, key: { since: -1 }, name: 'since_-1' },
  {
    v: 2,
    key: { 'between.objectId': 1, _p_disconnectOp: 1, disconnectedAt: 1 },
    name: 'between.objectId_1__p_disconnectOp_1_disconnectedAt_1'
  }
]

and a Profile document (among 7227305 documents) and indexes :

  "_id": "6oBQHqtowF7iAd9YFJCZn_A7GLg2hU",
  "firstName": "John",
  "lastName": "Doe",
  "_created_at": {
    "$date": "2023-02-13T13:09:31.397Z"
  }
}

with indexes:

[
  { v: 2, key: { _id: 1 }, name: '_id_' },
  { v: 2, key: { publicId: 1 }, name: 'publicId_1' },
  { v: 2, key: { syncedAt: 1 }, name: 'syncedAt_1' }
]

I would really appreciate your help on that. Thank you very much :pray:

The first thing I would try is to $project between.objectId:1 and _id:0 and see what it does.

Without the explain plan it is hard to establish if the slow FETCH is on the Connection collection or the Profile collection.

There will be a FETCH on Profile for sure since you return the whole documents.

Returning 28K documents might be slow just for the amount of memory it needs.

The thing is that if you are doing a $lookup for exactly the same Profile. One thing you could try is aggregate on Profile first with a $match. This would give you the Profile, you then $unionWith on Connection to get all the connections.

The first document of the result set will be the unique Profile. The other documents will be the Connection. This way you lookup the Profile only once rather than for each Connection.

@frouo, it has been a week since I provided some information about your issue.

So if you

I would really appreciate a followup.

If my reply provided the solution, please mark it as such. If you found the solution elsewhere or by your own, please share with us.

Hello @steevej , sorry for my late answer. I tried using $unionWith (didn’t know about) but the result is the same.

In my $lookup I used a pipeline to only return the keys that I need, no more.

I am surpised that $lookup on 28k Profile using the unique id is so slow ! It doesn’t sound that much to me. Maybe the reason is the hardware? I am using a M10 cluster ( M10, 2GB RAM, 20GB Storage, 2 vCPUs) :man_shrugging:

Please share the exact pipelines you have tried with the explain plan they produce.

Thank you @steevej for your time. The exact pipeline is:

[
  {
    $match: {
      "between.objectId":
        "ACoAAACQPmoB2OL6tWoksPNyCwpBpSeg"
    }
  },
  {
    $addFields: {
      between: {
        $first: {
          $filter: {
            input: "$between",
            cond: {
              $ne: [
                "$$this.objectId",
                "ACoAAACQPmoB2OL6tWoksPNyCwpBpSeg"
              ]
            }
          }
        }
      }
    }
  },
  {
    $lookup: {
      from: "Profile",
      localField: "between.objectId",
      foreignField: "_id",
      as: "profile",
      pipeline: [
        {
          $project: {
            _id: 1,
            syncedAt: 1
          }
        }
      ]
    }
  }
]

and the explain:

{
  "explainVersion": "1",
  "stages": [
    {
      "$cursor": {
        "queryPlanner": {
          "namespace": "prod.Connection",
          "indexFilterSet": false,
          "parsedQuery": {
            "between.objectId": {
              "$eq": "ACoAAACQPmoB2OL6tWoksPNyCwpBpSeg"
            }
          },
          "queryHash": "E13BCAF8",
          "planCacheKey": "AA28C8B2",
          "maxIndexedOrSolutionsReached": false,
          "maxIndexedAndSolutionsReached": false,
          "maxScansToExplodeReached": false,
          "winningPlan": {
            "stage": "FETCH",
            "inputStage": {
              "stage": "IXSCAN",
              "keyPattern": {
                "between.objectId": 1,
                "_p_disconnectOp": 1,
                "disconnectedAt": 1
              },
              "indexName": "between.objectId_1__p_disconnectOp_1_disconnectedAt_1",
              "isMultiKey": true,
              "multiKeyPaths": {
                "between.objectId": ["between"],
                "_p_disconnectOp": [],
                "disconnectedAt": []
              },
              "isUnique": false,
              "isSparse": false,
              "isPartial": false,
              "indexVersion": 2,
              "direction": "forward",
              "indexBounds": {
                "between.objectId": [
                  "[\"ACoAAACQPmoB2OL6tWoksPNyCwpBpSeg\", \"ACoAAACQPmoB2OL6tWoksPNyCwpBpSeg\"]"
                ],
                "_p_disconnectOp": [
                  "[MinKey, MaxKey]"
                ],
                "disconnectedAt": [
                  "[MinKey, MaxKey]"
                ]
              }
            }
          },
          "rejectedPlans": []
        },
        "executionStats": {
          "executionSuccess": true,
          "nReturned": 23138,
          "executionTimeMillis": 9333,
          "totalKeysExamined": 23138,
          "totalDocsExamined": 23138,
          "executionStages": {
            "stage": "FETCH",
            "nReturned": 23138,
            "executionTimeMillisEstimate": 43,
            "works": 23139,
            "advanced": 23138,
            "needTime": 0,
            "needYield": 0,
            "saveState": 27,
            "restoreState": 27,
            "isEOF": 1,
            "docsExamined": 23138,
            "alreadyHasObj": 0,
            "inputStage": {
              "stage": "IXSCAN",
              "nReturned": 23138,
              "executionTimeMillisEstimate": 4,
              "works": 23139,
              "advanced": 23138,
              "needTime": 0,
              "needYield": 0,
              "saveState": 27,
              "restoreState": 27,
              "isEOF": 1,
              "keyPattern": {
                "between.objectId": 1,
                "_p_disconnectOp": 1,
                "disconnectedAt": 1
              },
              "indexName": "between.objectId_1__p_disconnectOp_1_disconnectedAt_1",
              "isMultiKey": true,
              "multiKeyPaths": {
                "between.objectId": ["between"],
                "_p_disconnectOp": [],
                "disconnectedAt": []
              },
              "isUnique": false,
              "isSparse": false,
              "isPartial": false,
              "indexVersion": 2,
              "direction": "forward",
              "indexBounds": {
                "between.objectId": [
                  "[\"ACoAAACQPmoB2OL6tWoksPNyCwpBpSeg\", \"ACoAAACQPmoB2OL6tWoksPNyCwpBpSeg\"]"
                ],
                "_p_disconnectOp": [
                  "[MinKey, MaxKey]"
                ],
                "disconnectedAt": [
                  "[MinKey, MaxKey]"
                ]
              },
              "keysExamined": 23138,
              "seeks": 1,
              "dupsTested": 23138,
              "dupsDropped": 0
            }
          }
        }
      },
      "nReturned": 23138,
      "executionTimeMillisEstimate": 126
    },
    {
      "$addFields": {
        "between": {
          "$first": [
            {
              "$filter": {
                "input": "$between",
                "as": "this",
                "cond": {
                  "$ne": [
                    "$$this.objectId",
                    {
                      "$const": "ACoAAACQPmoB2OL6tWoksPNyCwpBpSeg"
                    }
                  ]
                }
              }
            }
          ]
        }
      },
      "nReturned": 23138,
      "executionTimeMillisEstimate": 152
    },
    {
      "$lookup": {
        "from": "Profile",
        "as": "profile",
        "localField": "between.objectId",
        "foreignField": "_id",
        "let": {},
        "pipeline": [
          {
            "$project": {
              "_id": 1,
              "syncedAt": 1
            }
          }
        ]
      },
      "totalDocsExamined": 23138,
      "totalKeysExamined": 23138,
      "collectionScans": 0,
      "indexesUsed": ["_id_"],
      "nReturned": 23138,
      "executionTimeMillisEstimate": 9323
    }
  ],
  "serverInfo": {
    "host": "atlas-zl72i1-shard-00-02.avkshgd.mongodb.net",
    "port": 27017,
    "version": "6.0.19",
    "gitVersion": "a7ada5ff3a4d8a1e2ed88f86bd6b3d1d16cb43c6"
  },
  "serverParameters": {
    "internalQueryFacetBufferSizeBytes": 104857600,
    "internalQueryFacetMaxOutputDocSizeBytes": 104857600,
    "internalLookupStageIntermediateDocumentMaxSizeBytes": 104857600,
    "internalDocumentSourceGroupMaxMemoryBytes": 104857600,
    "internalQueryMaxBlockingSortMemoryUsageBytes": 104857600,
    "internalQueryProhibitBlockingMergeOnMongoS": 0,
    "internalQueryMaxAddToSetBytes": 104857600,
    "internalDocumentSourceSetWindowFieldsMaxMemoryBytes": 104857600
  },
  "command": {
    "aggregate": "Connection",
    "pipeline": [
      {
        "$match": {
          "between.objectId": "ACoAAACQPmoB2OL6tWoksPNyCwpBpSeg"
        }
      },
      {
        "$addFields": {
          "between": {
            "$first": {
              "$filter": {
                "input": "$between",
                "cond": {
                  "$ne": [
                    "$$this.objectId",
                    "ACoAAACQPmoB2OL6tWoksPNyCwpBpSeg"
                  ]
                }
              }
            }
          }
        }
      },
      {
        "$lookup": {
          "from": "Profile",
          "localField": "between.objectId",
          "foreignField": "_id",
          "as": "profile",
          "pipeline": [
            {
              "$project": {
                "_id": 1,
                "syncedAt": 1
              }
            }
          ]
        }
      }
    ],
    "cursor": {},
    "maxTimeMS": 60000,
    "$db": "prod"
  },
  "ok": 1,
  "$clusterTime": {
    "clusterTime": {
      "$timestamp": "7436555763570442241"
    },
    "signature": {
      "hash": "bZn4WGbLoOKv1bqgTGIs6//E/4s=",
      "keyId": {
        "low": 8,
        "high": 1722412984,
        "unsigned": false
      }
    }
  },
  "operationTime": {
    "$timestamp": "7436555763570442241"
  }
}

What index do you have on Profile? Can you add all fields you’re using in the pipeline to the index so that it does not have to query the documents as part of the lookup but can use the data within the index that should be in RAM?

Here are my Profile indexes :

You’re pulling both id and synced in the $project in the lookup, try creating a compound index with both of those in:
_id: 1,
syncedAt: 1

Oh! It seems to achieve slightly better execution time (like 4 or 5 seconds, sometimes 6).

Surprisingly I don’t manage to retrieve 22 seconds, even by replacing syncedAt with something like firstName (I don’t have _id_1_firstName_1 index).

When I was facing these 22 seconds, I was reaching the limit storage of my cluster (15/17Go). Since then I have provide 20Go. Could the disk storage reaching the limit be an issue?

You wrote

Then I asked

But

not using $unionWith.

Believe me, I tried the $unionWith , the final document was the same, and the executing time was as long as before. Plus it makes the pipeline harder to read / understand to me. I appreciated trying this solution, thank you.

I don’t: in the 2nd stage I am excluding the profile used in the 1st stage.