Map-Reduce and Sharded Collections

Note

Aggregation Pipeline as an Alternative to Map-Reduce

Starting in MongoDB 5.0, map-reduce is deprecated:

  • Instead of map-reduce, you should use an aggregation pipeline. Aggregation pipelines provide better performance and usability than map-reduce.

  • You can rewrite map-reduce operations using aggregation pipeline stages, such as $group, $merge, and others.

  • For map-reduce operations that require custom functionality, you can use the $accumulator and $function aggregation operators. You can use those operators to define custom aggregation expressions in JavaScript.
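As a rough illustration of the points above, the sketch below builds an aggregation pipeline that could stand in for a map-reduce job summing a value per key. The collection and field names (orders, cust_id, price, order_totals) are hypothetical and not taken from this page. The pipeline is built as a plain JavaScript value so that it can be inspected without a running cluster.

```javascript
// Hypothetical names throughout: orders, cust_id, price, order_totals.

// $group plays the role of map + reduce; $merge plays the role of the `out` option.
const totalsPipeline = [
  { $group: { _id: "$cust_id", value: { $sum: "$price" } } },
  {
    $merge: {
      into: "order_totals",
      on: "_id",
      whenMatched: "replace",
      whenNotMatched: "insert"
    }
  }
];

// Same grouping, written with $accumulator to show where custom
// JavaScript logic would go when a built-in operator is not enough.
const customSumStage = {
  $group: {
    _id: "$cust_id",
    value: {
      $accumulator: {
        init: function () { return 0; },
        accumulate: function (state, price) { return state + price; },
        accumulateArgs: ["$price"],
        merge: function (a, b) { return a + b; },
        lang: "js"
      }
    }
  }
};

// In mongosh, either form would be passed to aggregate(), for example:
//   db.orders.aggregate(totalsPipeline)
```

The built-in $sum form is usually preferable; the $accumulator variant is only worth its overhead when the reduce logic cannot be expressed with existing operators.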

For examples of aggregation pipeline alternatives to map-reduce, see:

Map-reduce supports operations on sharded collections, both as an input and as an output. This section describes the behaviors of mapReduce specific to sharded collections.

When using a sharded collection as the input for a map-reduce operation, mongos automatically dispatches the map-reduce job to each shard in parallel. No special option is required. mongos waits for the jobs on all shards to finish.
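The dispatch-and-wait behavior can be pictured with a toy in-memory model. This is only a conceptual sketch, not MongoDB's implementation: the shard contents and the helper names (shardDocs, mapDoc, reduceValues, runOnShard, mapReduceAcrossShards) are assumptions made for illustration, and the shards are processed one after another here rather than in parallel.

```javascript
// Toy model: each inner array stands in for the documents on one shard.
const shardDocs = [
  [{ cust_id: "A", price: 10 }, { cust_id: "B", price: 5 }],
  [{ cust_id: "A", price: 7 }],
  [{ cust_id: "B", price: 1 }, { cust_id: "C", price: 4 }]
];

const mapDoc = (doc) => [doc.cust_id, doc.price];
const reduceValues = (key, values) => values.reduce((a, b) => a + b, 0);

// Run map + reduce over one shard's documents; returns partial results per key.
function runOnShard(docs) {
  const grouped = new Map();
  for (const doc of docs) {
    const [key, value] = mapDoc(doc);
    if (!grouped.has(key)) grouped.set(key, []);
    grouped.get(key).push(value);
  }
  const partial = new Map();
  for (const [key, values] of grouped) {
    partial.set(key, reduceValues(key, values));
  }
  return partial;
}

// Stand-in for mongos: send the job to every shard, collect every partial
// result, then reduce the partials for each key once more.
function mapReduceAcrossShards(shards) {
  const partials = shards.map(runOnShard); // a real cluster does this concurrently
  const merged = new Map();
  for (const partial of partials) {
    for (const [key, value] of partial) {
      if (!merged.has(key)) merged.set(key, []);
      merged.get(key).push(value);
    }
  }
  const totals = {};
  for (const [key, values] of merged) totals[key] = reduceValues(key, values);
  return totals;
}

const shardTotals = mapReduceAcrossShards(shardDocs);
```

The model works only because the reduce function can be re-applied to its own output, which is the same property map-reduce requires of reduce functions in general.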

If the out field for mapReduce sets the sharded option to true, MongoDB shards the output collection using the _id field as the shard key.
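A minimal sketch of what such a command could look like follows, written as plain command documents. The database and collection names (test, orders, order_totals) and the field names are hypothetical. The sketch assumes the output collection is created sharded on _id beforehand and that the server version still accepts the sharded option.

```javascript
// Hypothetical names: database "test", input "orders", output "order_totals".

// Admin commands that enable sharding for the database and shard the
// output collection on _id before the job runs.
const enableShardingCmd = { enableSharding: "test" };
const shardOutputCmd = { shardCollection: "test.order_totals", key: { _id: 1 } };

// mapReduce command whose out field requests sharded output.
// The map and reduce bodies execute on the server, where emit() is defined.
const mapReduceCmd = {
  mapReduce: "orders",
  map: function () { emit(this.cust_id, this.price); },
  reduce: function (key, values) {
    return values.reduce(function (a, b) { return a + b; }, 0);
  },
  out: { merge: "order_totals", sharded: true }
};

// In mongosh, connected through mongos, these would be issued roughly as:
//   db.adminCommand(enableShardingCmd)
//   db.adminCommand(shardOutputCmd)
//   db.runCommand(mapReduceCmd)
```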

To output to a sharded collection:

  • If the output collection does not exist, create the sharded collection first.

  • If the output collection already exists but is not sharded, map-reduce fails.

  • For a new or an empty sharded collection, MongoDB uses the results of the first stage of the map-reduce operation to create the initial chunks distributed among the shards.

  • mongos dispatches, in parallel, a map-reduce post-processing job to every shard that owns a chunk. During the post-processing, each shard will pull the results for its own chunks from the other shards, run the final reduce/finalize, and write locally to the output collection.
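The post-processing step described in the list above can be sketched with a small in-memory model. Everything here is an illustrative assumption rather than MongoDB internals: the shard names, the chunk boundaries, the partial results, and the helper names (partialsByShard, outputChunks, finalReduce, postProcess).

```javascript
// Partial results each shard produced in the first stage (hypothetical data).
const partialsByShard = {
  shardA: { apple: 3, mango: 2 },
  shardB: { apple: 1, pear: 5 }
};

// Chunks of the output collection: half-open key ranges [min, max) and owners.
const outputChunks = [
  { min: "", max: "m", owner: "shardA" },
  { min: "m", max: "\uffff", owner: "shardB" }
];

const finalReduce = (values) => values.reduce((a, b) => a + b, 0);

// Post-processing job for one shard: pull the partial results that fall in
// its own chunks from every shard, run the final reduce, and "write locally".
function postProcess(shardName) {
  const owned = outputChunks.filter((c) => c.owner === shardName);
  const pulled = new Map();
  for (const partial of Object.values(partialsByShard)) {
    for (const [key, value] of Object.entries(partial)) {
      if (owned.some((c) => key >= c.min && key < c.max)) {
        if (!pulled.has(key)) pulled.set(key, []);
        pulled.get(key).push(value);
      }
    }
  }
  const local = {};
  for (const [key, values] of pulled) local[key] = finalReduce(values);
  return local;
}

// One job per shard that owns at least one chunk.
const outputByShard = {};
for (const owner of new Set(outputChunks.map((c) => c.owner))) {
  outputByShard[owner] = postProcess(owner);
}
```

In this model every key ends up on exactly one shard, the one owning the chunk that covers it, which is why each shard can write its results without coordinating with the others.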

Note

  • During later map-reduce jobs, MongoDB splits chunks as needed.

  • Balancing of chunks for the output collection is automatically prevented during post-processing to avoid concurrency issues.