Mongosync Large Collection stuck

Gvidas_Pranauskas · October 17, 2024, 9:04am

Hello,
I have a large collection in a MongoDB replica set cluster. I am trying to synchronize two clusters with mongosync. The process seems to work fine until it hits this one large collection (let’s call it document), which has ~1.4B documents in it.

When I run mongosync, this log entry keeps appearing in the output:

{"time":"2024-10-17T08:49:15.618642Z","level":"debug","serverID":"f07fb8c7","mongosyncID":"coordinator","operationID":"985b60bc","clientType":"source","database":"mycompany","collectionUUID":"b244c65b-ef8a-423d-a17e-a495d239639d","operationDescription":"Sampling _id values from source collection to get partition bounds.","attemptNumber":0,"totalTimeSpent":"1.113µs","retryAttemptDurationSoFarSecs":0,"retryAttemptDurationLimitSecs":600,"collection":"document_match_entity","componentNames":["Partition Creation","determining `mycompany.document`’s ID range","Partition (Sample Docs from Index Tail)"],"timeElapsed":"28m20.00095419s","operationDescription":"Sampling _id values from source collection to get partition bounds.","message":"Long-running operation."}

I have just restarted the process, hoping that would resolve the issue, but it is still present. It has been stuck like this for the past two days. The only thing I can see changing in this log output is attemptNumber, which keeps increasing.
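
In case it helps anyone reproduce what I am seeing, here is a rough sketch of how these entries can be filtered out of the log to watch attemptNumber and timeElapsed. It assumes mongosync writes one JSON object per line, as in the entry above. The function name long_running_ops is just something I made up; the field names are taken from the log entry.

```python
import json


def long_running_ops(lines):
    """Yield (time, attemptNumber, timeElapsed) for each
    "Long-running operation." entry in an iterable of log lines."""
    for line in lines:
        line = line.strip()
        # Skip anything that is not a JSON object (blank lines, banners, etc.).
        if not line.startswith("{"):
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            # Ignore truncated or otherwise malformed lines.
            continue
        if entry.get("message") == "Long-running operation.":
            yield (
                entry.get("time"),
                entry.get("attemptNumber"),
                entry.get("timeElapsed"),
            )
```

With the output saved to a file (say mongosync.log, a placeholder name), iterating over long_running_ops(open("mongosync.log")) gives one tuple per stuck-operation entry, which makes it easy to see whether anything besides attemptNumber changes over time.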

I have started the mongosync process with these options:

"buildIndexes": "never"
--loadLevel 2

Is there a way to get around this problem? Or is mongosync not suited to synchronizing collections with this many documents?