Does the order of $skip and $limit matter in the pipeline?
I use mongoose aggregation which runs a pipeline of operations. With skip and limit appended at the end.
projectModel.aggregate(pipeline)
.sort({ updatedAt: -1 })
.skip(skip)
.limit(limit)
My pipeline simply looks like
$match(userId) > $lookup(html_collection) > $lookup(records_collection) > $sort(on updatedAt from mongoose) > $skip(mongoose) > $limit(mongoose)
What I observed during pagination is that the $limit is being respected but the $skip is happening only at end of the pipeline. For example:
Page1: skip = 0, limit = 10
The number of documents as per the .explain() clearing $match stage is 10.
Page 2: skip = 10, limit = 10
The number of documents clearing $match stage is 20 (skip + limit), and 20 documents made it to the next stage. $lookup is made on 20 documents. Slowing down our pipeline and it is at the last stage when $skip works to discard the first 10 documents. Here we wasted work on the first 10 documents.
This was causing an issue in our pipeline and it was getting slow for pagination.
SOLUTION: What we ended up doing was to move $limit followed by $skip after $match.
then $limit = skip + limit, $skip = skip. We thought that limiting documents as limit = skip + limit will fetch documents and $skip at the next stage would reject unnecessary documents, thereby giving $lookup stages only the expected documents.
Page 1: skip = 0, limit = 10
$limit = 0 + 10 = 10 followed by $skip = 0
Page 2: skip = 10, limit = 10
$limit = 10 + 10 followed by $skip = 10
our pipeline now looks like:
$match(userId) > $sort(updatedAt) > $limit(limit + skip) > $skip (skip) > $lookup(html_collection) > $lookup(records_collection)
Here is the sample collection scehema for your refernce:
PROJECT COLLECTION
{
id: ObjectId,
records: [ObjectId],
userId: ObjectId
}
RECORDS COLLECTION
{
id: ObjectId,
text: string
}
HTML COLLECTION
{
id: ObjectId,
html: string,
projectId: ObjectId
}
QUESTIONS:
- Is this behavior intended or something is wrong here with the $skip and $limit?
- If it is not a bug then why did $limit was respected even if it was present at end of stage and $skip was not respected?
- Is the solution that we came up with is correct? as I think for last page there are too many documents clearing the $match stage, but that is also something that MongoDB internally did right…as seen on page 2 of in our case?