Map-Reduce and Sharded Collections

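Note

Aggregation Pipeline as an Alternative to Map-Reduce

Starting in MongoDB 5.0, map-reduce is deprecated.

Use an aggregation pipeline instead of map-reduce. Aggregation pipelines provide better performance and usability than map-reduce.
You can rewrite map-reduce operations using aggregation pipeline stages such as $group and $merge.
For map-reduce operations that require custom functionality, you can use the $accumulator and $function aggregation operators. These operators let you define custom aggregation expressions in JavaScript.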
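
As a minimal sketch of such a rewrite, assume a hypothetical orders collection with cust_id and price fields and a hypothetical output collection named order_totals (none of these names come from this page). A map-reduce job that emits (cust_id, price) and sums the values per key corresponds roughly to a $group stage followed by a $merge stage:

```javascript
// Hypothetical map-reduce job being replaced:
//   map:    function () { emit(this.cust_id, this.price); }
//   reduce: function (key, values) { return Array.sum(values); }
//   out:    "order_totals"

// Roughly equivalent aggregation pipeline, written as a plain array so it
// can be passed to db.orders.aggregate(pipeline) in mongosh.
const pipeline = [
  // $group plays the role of map + reduce: one document per cust_id.
  { $group: { _id: "$cust_id", value: { $sum: "$price" } } },
  // $merge plays the role of the "out" option of mapReduce.
  {
    $merge: {
      into: "order_totals", // hypothetical output collection
      on: "_id",
      whenMatched: "replace",
      whenNotMatched: "insert",
    },
  },
];
```

In mongosh this would be run as db.orders.aggregate(pipeline). $merge can write to a sharded output collection, which makes it relevant to the cases described on this page.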

For examples of aggregation pipeline alternatives to map-reduce, see the MongoDB manual.

Map-reduce supports operations on sharded collections, both as an input and as an output. This section describes the behaviors of mapReduce specific to sharded collections.

Sharded Collection as Input

When a sharded collection is the input for a map-reduce operation, mongos automatically dispatches the map-reduce job to each shard in parallel. No special option is required. mongos waits for the jobs on all shards to finish.
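
The dispatch-and-wait behavior can be pictured with a small conceptual model. This is only a sketch under stated assumptions: plain arrays stand in for shards, runOnShard and mapReduceSharded are hypothetical helpers rather than MongoDB APIs, and the shards are processed one after another here, whereas mongos dispatches to them in parallel.

```javascript
// Conceptual model only, not MongoDB internals.

// Run map and a local reduce over the documents held by one shard.
// map returns an array of [key, value] pairs instead of calling emit().
function runOnShard(docs, map, reduce) {
  const groups = new Map();
  for (const doc of docs) {
    for (const [key, value] of map(doc)) {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    }
  }
  const partial = new Map();
  for (const [key, values] of groups) {
    partial.set(key, values.length === 1 ? values[0] : reduce(key, values));
  }
  return partial;
}

// Send the job to every shard, collect all partial results, then
// reduce the per-shard values once more for each key.
function mapReduceSharded(shards, map, reduce) {
  const partials = shards.map((docs) => runOnShard(docs, map, reduce));
  const merged = new Map();
  for (const partial of partials) {
    for (const [key, value] of partial) {
      if (!merged.has(key)) merged.set(key, []);
      merged.get(key).push(value);
    }
  }
  const result = {};
  for (const [key, values] of merged) {
    result[key] = values.length === 1 ? values[0] : reduce(key, values);
  }
  return result;
}

// Hypothetical data: two shards holding order documents.
const shards = [
  [{ cust_id: "A", price: 10 }, { cust_id: "B", price: 5 }],
  [{ cust_id: "A", price: 7 }],
];
const totals = mapReduceSharded(
  shards,
  (doc) => [[doc.cust_id, doc.price]],
  (key, values) => values.reduce((a, b) => a + b, 0)
);
// totals is { A: 17, B: 5 }
```

The second reduce pass works only because the reduce function can be applied again to its own output, which mapReduce requires of reduce functions.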

Sharded Collection as Output

If the out field for mapReduce sets sharded to true, MongoDB shards the output collection using the _id field as the shard key.

To output to a sharded collection:

If the output collection does not exist, create the sharded collection first.
If the output collection already exists but is not sharded, map-reduce fails.
For a new or an empty sharded collection, MongoDB uses the results of the first stage of the map-reduce operation to create the initial chunks distributed among the shards.
mongos dispatches, in parallel, a map-reduce post-processing job to every shard that owns a chunk. During the post-processing, each shard will pull the results for its own chunks from the other shards, run the final reduce/finalize, and write locally to the output collection.
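
The post-processing step described above can be sketched as a conceptual model. Everything here is an assumption made for illustration: postProcess is a hypothetical helper, chunks are simplified to string ranges over the output _id (real chunks are bounded by MinKey and MaxKey), and each shard's local output is a plain object.

```javascript
// Conceptual model only, not MongoDB internals or APIs.

// partials: one Map(key -> partially reduced value) per shard,
//           standing in for the results of the first stage.
// chunks:   ranges [min, max) of output _id values, each owned by a shard index.
function postProcess(partials, chunks, reduce, finalize) {
  const output = partials.map(() => ({})); // local output collection per shard
  for (const chunk of chunks) {
    // The owning shard pulls the results for its chunk from every shard.
    const pulled = new Map();
    for (const partial of partials) {
      for (const [key, value] of partial) {
        if (key >= chunk.min && key < chunk.max) {
          if (!pulled.has(key)) pulled.set(key, []);
          pulled.get(key).push(value);
        }
      }
    }
    // Final reduce and finalize, then write locally on the owning shard.
    for (const [key, values] of pulled) {
      const reduced = values.length === 1 ? values[0] : reduce(key, values);
      output[chunk.shard][key] = finalize(key, reduced);
    }
  }
  return output;
}

// Hypothetical first-stage results on two shards.
const partials = [
  new Map([["A", 10], ["M", 5]]),
  new Map([["A", 7], ["Z", 1]]),
];
// Hypothetical chunk layout: shard 0 owns ["A", "N"), shard 1 owns ["N", "~").
const chunks = [
  { min: "A", max: "N", shard: 0 },
  { min: "N", max: "~", shard: 1 },
];
const written = postProcess(
  partials,
  chunks,
  (key, values) => values.reduce((a, b) => a + b, 0),
  (key, value) => value
);
// written[0] is { A: 17, M: 5 }; written[1] is { Z: 1 }
```

In this model every key ends up on exactly one shard, because only the owner of the chunk containing the key runs the final reduce for it.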

Note

During later map-reduce jobs, MongoDB splits chunks as needed.
Balancing of chunks for the output collection is automatically prevented during post-processing to avoid concurrency issues.
