Slow count of total number of documents using the attribute pattern

I am trying to improve the performance of our total count query. We have a collection called offers with 600k documents (Storage Size: 2.09 GB, Logical Data Size: 3.73 GB). Each offer has ~150 associated attributes, so we used your recommended attribute pattern.

Document structure:

{
  ...
  attributes: [
    { key: "color", value: "red" },
    { key: "price", value: 50 },
    ...100 attributes more...
  ]
}

Compound index:

{
    'attributes.key': 1,
    'attributes.value': 1,
}

Query:

db.offers.count({"attributes":{"$all":[{"$elemMatch":{"key":"color","value":{"$in":["red"]}}}]}})
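As a side note, the `$all` wrapper here is redundant: with a single `$elemMatch` and a one-element `$in`, the planner rewrites the filter to a plain `$elemMatch` equality match (visible in the parsedQuery of the explain output below). A minimal sketch of the two equivalent filters:

```javascript
// Filter as written in the question: $all with a single $elemMatch clause
// and a one-element $in.
const originalFilter = {
  attributes: { $all: [{ $elemMatch: { key: "color", value: { $in: ["red"] } } }] }
};

// Equivalent simplified filter, matching the planner's parsedQuery rewrite.
const simplifiedFilter = {
  attributes: { $elemMatch: { key: "color", value: "red" } }
};

// In mongosh: db.offers.count(simplifiedFilter)
```

This does not change the plan (the planner already normalizes it), but the simpler form is easier to read and reason about.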

The explain() output:

{ explainVersion: '1',
  queryPlanner: 
   { namespace: 'offers.offers',
     indexFilterSet: false,
     parsedQuery: 
      { attributes: 
         { '$elemMatch': 
            { '$and': 
               [ { key: { '$eq': 'color' } },
                 { value: { '$eq': 'red' } } ] } } },
     queryHash: '3EC4C516',
     planCacheKey: '37ABC9CD',
     maxIndexedOrSolutionsReached: false,
     maxIndexedAndSolutionsReached: false,
     maxScansToExplodeReached: false,
     winningPlan: 
      { stage: 'COUNT',
        inputStage: 
         { stage: 'FETCH',
           filter: 
            { attributes: 
               { '$elemMatch': 
                  { '$and': 
                     [ { key: { '$eq': 'color' } },
                       { value: { '$eq': 'red' } } ] } } },
           inputStage: 
            { stage: 'IXSCAN',
              keyPattern: { 'attributes.key': 1, 'attributes.value': 1 },
              indexName: 'attributes.key_1_attributes.value_1',
              isMultiKey: true,
              multiKeyPaths: 
               { 'attributes.key': [ 'attributes' ],
                 'attributes.value': [ 'attributes', 'attributes.value' ] },
              isUnique: false,
              isSparse: false,
              isPartial: false,
              indexVersion: 2,
              direction: 'forward',
              indexBounds: 
               { 'attributes.key': [ '["color", "color"]' ],
                 'attributes.value': [ '["red", "red"]' ] } } } },
     rejectedPlans: [] },
  command: 
   { count: 'offers',
     query: { attributes: { '$all': [ { '$elemMatch': { key: 'color', value: { '$in': [ 'red' ] } } } ] } },
     '$db': 'offers' },
  serverInfo: 
   { host: 'be1988957f70',
     port: 27017,
     version: '6.0.3',
     gitVersion: 'f803681c3ae19817d31958965850193de067c516' },
  serverParameters: 
   { internalQueryFacetBufferSizeBytes: 104857600,
     internalQueryFacetMaxOutputDocSizeBytes: 104857600,
     internalLookupStageIntermediateDocumentMaxSizeBytes: 104857600,
     internalDocumentSourceGroupMaxMemoryBytes: 104857600,
     internalQueryMaxBlockingSortMemoryUsageBytes: 104857600,
     internalQueryProhibitBlockingMergeOnMongoS: 0,
     internalQueryMaxAddToSetBytes: 104857600,
     internalDocumentSourceSetWindowFieldsMaxMemoryBytes: 104857600 },
  ok: 1 }

This query is really, really slow. It takes between 2 and 6 seconds. Is there any way for us to improve the performance of the count?

Performance is based on a few things:

The quantity of data, which you shared.
The complexity of the computation, which you shared.
The capacity of the resources, which you did not share.

What you see might be normal depending on the system resources allocated to perform the workload.

What are the specifications of your implementation?


I’ve been running an M40 instance on MongoDB Atlas and a local MongoDB server on my M1 MacBook. Both servers show the same poor performance.

Hi @bene123,

Can you please share the explain plan with the execution stats (explain(true)) for this particular query so we can see where the problem might come from?
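For reference, here is a sketch of how to request execution stats in mongosh, reusing the filter from the question (the actual `db.` calls are shown as comments since they need a live server):

```javascript
// Build the count filter once so it can be reused for count() and explain().
const countFilter = {
  attributes: { $elemMatch: { key: "color", value: "red" } }
};

// In mongosh:
//   db.offers.explain("executionStats").count(countFilter)
// or equivalently:
//   db.offers.find(countFilter).explain("executionStats")
```

The executionStats output adds `totalKeysExamined`, `totalDocsExamined` and per-stage timings, which is what we need to see here.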

Also, can you please confirm that your M40 has enough RAM to operate (all indexes fit in RAM) and that you have some spare RAM left to perform aggregations, in-memory sorts and queries (like this one).

Without this information I can only take wild guesses.

  1. Maybe this query returns 25% of the collection (500 MB of compressed data), so 2-6 seconds would be quite good.
  2. Another problem I see is that 600k docs with 100 attributes in the array => the index contains 60 million entries… which is starting to be quite large. Also, if 90% of your items have the “color” key, the first part of the index isn’t filtering out many docs, and docs are only eliminated by the second part of the index. In this case, I would try reversing the index order to see if the selectivity is better and thus fewer index entries are scanned. => {'attributes.value': 1, 'attributes.key': 1}
  3. Maybe your cluster is in South Africa and you are sending this query from a back-end in Japan => latency tax.
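For point 2, the reversed index could be created like this (a sketch, assuming the same offers collection; the index name is auto-generated unless you pass one):

```javascript
// Key order reversed versus the original index: equality on the attribute
// value now narrows the index scan first, before the key is matched, which
// may give better selectivity when most documents share the same keys.
const reversedIndexSpec = { "attributes.value": 1, "attributes.key": 1 };

// In mongosh: db.offers.createIndex(reversedIndexSpec)
```

After building it, compare `totalKeysExamined` in the executionStats of both plans to see which order scans fewer entries.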

Cheers,
Maxime.