Projection is not increasing query speed for big documents?

In the service I’m working on there are two collections (collection1 & collection2). They are more or less comparable in the number of documents (15 M vs 12 M). But average size of document is about 10 times different (800 B vs 8 kB). I need to make a query that returns about 45k documents. It takes about 1-3 seconds on collection1 (acceptable in my task) and 25-30 seconds on collection2 (too slow).

Now, the query that I’m making has a projection for only several fields. But according to explain, I see that it performs FETCH first and only then projection is applied. And this FETCH stage takes all the time, I guess because it takes the whole document which is rather big in this collection. And only then projection is applied.

I’ve tried to create an index that covers all the fields that I’m projecting. But mongo ignores it. And if I specify it with the .hint() it is still doing that FETCH stage. I wonder if there is anything I can do to improve performance of collection2 query.

I’m running MongoDB 5.0.26 Enterprise (on Atlas)

Below are some execution stats

The example of query I’m making:

db.collection|1 or 2|.find
    {'ClientId': 'foobar', TimeStamp: {'$gte': ISODate('2021-01-01T00:03:43+00:00'), '$lte': ISODate('2024-05-14T22:30:15+00:00')},
    {'Field1': 1, 'Field2': 1, 'TimeStamp': 1, '_id': 0, 'sub.document.field': 1}
 )
Execution stats on collection1 (just for reference)
{
  executionSuccess: true,
  nReturned: 44117,
  executionTimeMillis: 1745,
  totalKeysExamined: 44117,
  totalDocsExamined: 44117,
  executionStages: {
    stage: 'PROJECTION_DEFAULT',
    nReturned: 44117,
    executionTimeMillisEstimate: 1539,
    works: 44118,
    advanced: 44117,
    needTime: 0,
    needYield: 0,
    saveState: 123,
    restoreState: 123,
    isEOF: 1,
    transformBy: {
      Field1: 1,
      Field2: 1,
      TimeStamp: 1,
      _id: 0,
      'sub.document.field': 1
    },
    inputStage: {
      stage: 'FETCH',
      nReturned: 44117,
      executionTimeMillisEstimate: 1454,
      works: 44118,
      advanced: 44117,
      needTime: 0,
      needYield: 0,
      saveState: 123,
      restoreState: 123,
      isEOF: 1,
      docsExamined: 44117,
      alreadyHasObj: 0,
      inputStage: {
        stage: 'IXSCAN',
        nReturned: 44117,
        executionTimeMillisEstimate: 117,
        works: 44118,
        advanced: 44117,
        needTime: 0,
        needYield: 0,
        saveState: 123,
        restoreState: 123,
        isEOF: 1,
        keyPattern: {
          ClientId: 1,
          TimeStamp: 1
        },
        indexName: 'ClientId_1_TimeStamp_1',
        isMultiKey: false,
        multiKeyPaths: {
          ClientId: [],
          TimeStamp: []
        },
        isUnique: true,
        isSparse: false,
        isPartial: false,
        indexVersion: 2,
        direction: 'forward',
        indexBounds: {
          ClientId: [
            '["foobar", "foobar"]'
          ],
          TimeStamp: [
            '[new Date(1609459423000), new Date(1715725815000)]'
          ]
        },
        keysExamined: 44117,
        seeks: 1,
        dupsTested: 0,
        dupsDropped: 0
      }
    }
  }
}
Execution stats on collection2
{
  executionSuccess: true,
  nReturned: 44079,
  executionTimeMillis: 26309,
  totalKeysExamined: 44079,
  totalDocsExamined: 44079,
  executionStages: {
    stage: 'PROJECTION_DEFAULT',
    nReturned: 44079,
    executionTimeMillisEstimate: 25661,
    works: 44080,
    advanced: 44079,
    needTime: 0,
    needYield: 0,
    saveState: 1633,
    restoreState: 1633,
    isEOF: 1,
    transformBy: {
      Field1: 1,
      Field2: 1,
      TimeStamp: 1,
      _id: 0,
      'sub.document.field': 1
    },
    inputStage: {
      stage: 'FETCH',
      nReturned: 44079,
      executionTimeMillisEstimate: 25298,
      works: 44080,
      advanced: 44079,
      needTime: 0,
      needYield: 0,
      saveState: 1633,
      restoreState: 1633,
      isEOF: 1,
      docsExamined: 44079,
      alreadyHasObj: 0,
      inputStage: {
        stage: 'IXSCAN',
        nReturned: 44079,
        executionTimeMillisEstimate: 323,
        works: 44080,
        advanced: 44079,
        needTime: 0,
        needYield: 0,
        saveState: 1633,
        restoreState: 1633,
        isEOF: 1,
        keyPattern: {
          ClientId: 1,
          TimeStamp: 1
        },
        indexName: 'ClientId_1_TimeStamp_1',
        isMultiKey: false,
        multiKeyPaths: {
          ClientId: [],
          TimeStamp: []
        },
        isUnique: true,
        isSparse: false,
        isPartial: false,
        indexVersion: 2,
        direction: 'forward',
        indexBounds: {
          ClientId: [
            '["foobar", "foobar"]'
          ],
          TimeStamp: [
            '[new Date(1609459423000), new Date(1715725815000)]'
          ]
        },
        keysExamined: 44079,
        seeks: 1,
        dupsTested: 0,
        dupsDropped: 0
      }
    }
  }
}
Execution stats on collection2 with hint for the index with all projected fields
{
  executionSuccess: true,
  nReturned: 44079,
  executionTimeMillis: 24570,
  totalKeysExamined: 117765,
  totalDocsExamined: 44079,
  executionStages: {
    stage: 'PROJECTION_DEFAULT',
    nReturned: 44079,
    executionTimeMillisEstimate: 24286,
    works: 117766,
    advanced: 44079,
    needTime: 73686,
    needYield: 0,
    saveState: 1500,
    restoreState: 1500,
    isEOF: 1,
    transformBy: {
      Field1: 1,
      Field2: 1,
      TimeStamp: 1,
      _id: 0,
      'sub.document.field': 1
    },
    inputStage: {
      stage: 'FETCH',
      nReturned: 44079,
      executionTimeMillisEstimate: 23882,
      works: 117766,
      advanced: 44079,
      needTime: 73686,
      needYield: 0,
      saveState: 1500,
      restoreState: 1500,
      isEOF: 1,
      docsExamined: 44079,
      alreadyHasObj: 0,
      inputStage: {
        stage: 'IXSCAN',
        nReturned: 44079,
        executionTimeMillisEstimate: 316,
        works: 117766,
        advanced: 44079,
        needTime: 73686,
        needYield: 0,
        saveState: 1500,
        restoreState: 1500,
        isEOF: 1,
        keyPattern: {
          ClientId: 1,
          TimeStamp: 1,
          Field1: 1,
          Field2: 1,
          'sub.document.field': 1
        },
        indexName: 'ClientId_1_TimeStamp_1_Field1_1_Field2_1_sub.document.field_1',
        isMultiKey: true,
        multiKeyPaths: {
          ClientId: [],
          TimeStamp: [],
          Field1: [],
          Field2: [],
          'sub.document.field': [
            'sub.document.field'
          ]
        },
        isUnique: false,
        isSparse: false,
        isPartial: false,
        indexVersion: 2,
        direction: 'forward',
        indexBounds: {
          ClientId: [
            '["foobar", "foobar"]'
          ],
          TimeStamp: [
            '[new Date(1609459423000), new Date(1715725815000)]'
          ],
          Field1: [
            '[MinKey, MaxKey]'
          ],
          Field2: [
            '[MinKey, MaxKey]'
          ],
          'sub.document.field': [
            '[MinKey, MaxKey]'
          ]
        },
        keysExamined: 117765,
        seeks: 1,
        dupsTested: 117765,
        dupsDropped: 73686
      }
    }
  }
}

This is expected. A Projection only affects what data is sent back over the network to the client, unless it enables a covered query.

In an index-covered query, all the fields in your query exist in an index. This means the database can avoid the FETCH stage, and simply read the values from the index.

The problem you have is that the field ‘sub.document.field’ contains an array, which means your index is a multi-key index. The array field in indexed by sorting the values and removing duplicates. It can tell you whether an element occurs in an array, but it can’t tell you where or how many times that element is there.

Because you are returning the array field, the database needs to FETCH the entire document from disk.

See https://www.mongodb.com/docs/manual/core/indexes/index-types/index-multikey/#covered-queries

2 Likes

Thanks @Peter_Hubbard, this makes sense for me now! I tried a query without that field and it indeed returns results much faster with only IXSCAN and PROJECTION_COVERED stages.
I will check with the team but I afraid it is too late to change schema and this stage, as too many services already depend on it. Are there any tricks or advises on how my query speed can be improved? For additional info, I need only first element from that array in subdocument field (the array may be empty though)

You could cache the first element of the array in a different field and index that. Or you might be able to get away with indexing the field ‘array.0’ but I have not tested this.

1 Like

Unfortunately indexing on array.0 doesn’t work. It’s not possible to specify that in projection. I’ve tried with aggregation pipeline where I can do {$project: {myName: {$first: "$array.field"}}. But it’s just doing PROJECTION_DEFAULT.

I didn’t want to touch the app that creates the data because it is so legacy and fragile. But looks like this is the only way for me.

Thanks!

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.