Hello,
I have a large collection in one MongoDB replica set cluster. I am trying to synchronize two clusters with mongosync. The process seems to work fine until it hits this one large collection (let’s call it document), which has ~1.4B documents in it.
When I run mongosync, this is the log entry I see constantly appearing in the output:
{"time":"2024-10-17T08:49:15.618642Z","level":"debug","serverID":"f07fb8c7","mongosyncID":"coordinator","operationID":"985b60bc","clientType":"source","database":"mycompany","collectionUUID":"b244c65b-ef8a-423d-a17e-a495d239639d","operationDescription":"Sampling _id values from source collection to get partition bounds.","attemptNumber":0,"totalTimeSpent":"1.113µs","retryAttemptDurationSoFarSecs":0,"retryAttemptDurationLimitSecs":600,"collection":"document_match_entity","componentNames":["Partition Creation","determining `mycompany.document`’s ID range","Partition (Sample Docs from Index Tail)"],"timeElapsed":"28m20.00095419s","operationDescription":"Sampling _id values from source collection to get partition bounds.","message":"Long-running operation."}
I have just restarted the process, hoping to resolve this issue, but it is still present. It has been stuck like this for the past two days. The only thing I can see changing in this log output is attemptNumber increasing.
I have started the mongosync process with these options:
"buildIndexes": "never"
--loadLevel 2
Is there a way to get around this problem? Or is mongosync not fit for synchronizing collections with this many documents?