Team,
I am trying to recover a sharded cluster from a filesystem snapshot. At backup time, oplog entries were collected incrementally from each shard (from a chosen secondary), based on timestamps, to be applied on top of the data in the filesystem snapshot.
During recovery, every shard started up successfully from the snapshot data. The config server, the data shards, and mongos are all up and running.
The incremental oplog dumps were then applied with mongorestore, using the oplog replay option, on each shard (on the primaries that were started from the filesystem snapshot). The replay succeeded on the config server but failed on a data shard with the error below:
2024-09-09T19:22:28.032+0300 preparing collections to restore from
2024-09-09T19:22:28.034+0300 replaying oplog
2024-09-09T19:22:29.586+0300 Failed: restore error: error applying oplog: applyOps: (StaleConfig) sharding status of collection testDb.testCollection is not currently known and needs to be recovered
Logs from the relevant shard server:
{"t":{"$date":"2024-09-09T19:22:29.585+03:00"},"s":"I", "c":"COMMAND", "id":51803, "ctx":"conn26","msg":"Slow query","attr":{"type":"command","ns":"testDb.testCollection","appName":"mongorestore","command":{"applyOps":[{"ts":{"$timestamp":{"t":1725894549,"i":1}},"t":1,"v":2,"op":"i","ns":"testDb.testCollection","o":{"_id":{"$oid":"66df0f9597f7b2711ffa0b73"},"i":100,"date":{"$date":"2024-09-09T15:09:09.168Z"}},"o2":{"_id":{"$oid":"66df0f9597f7b2711ffa0b73"}},"lsid":{"id":{"$uuid":"f4d04e23-6daf-4458-ae88-0a06bd0a8148"},"uid":{"$binary":{"base64":"O0CMtIVItQN4IsEOsJdrPL8s7jv5xwh5a/A5Qfvs2A8=","subType":"0"}}},"txnNumber":1,"prevOpTime":{"ts":{"$timestamp":{"t":0,"i":0}},"t":-1}}],"lsid":{"id":{"$uuid":"7fd15986-3c19-44f8-bc8c-4b149bbf9a8f"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1725898946,"i":3}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"admin","$readPreference":{"mode":"primaryPreferred"}},"totalOplogSlotDurationMicros":843185,"numYields":0,"ok":0,"errMsg":"sharding status of collection testDb.testCollection is not currently known and needs to be recovered","errName":"StaleConfig","errCode":13388,"reslen":449,"locks":{"ParallelBatchWriterMode":{"acquireCount":{"r":2}},"FeatureCompatibilityVersion":{"acquireCount":{"r":1,"w":2}},"ReplicationStateTransition":{"acquireCount":{"w":3}},"Global":{"acquireCount":{"r":1,"w":2}},"Database":{"acquireCount":{"w":2}},"Collection":{"acquireCount":{"w":1}},"Mutex":{"acquireCount":{"r":3}}},"flowControl":{"acquireCount":1},"readConcern":{"level":"local","provenance":"implicitDefault"},"storage":{"data":{"bytesRead":39108,"timeReadingMicros":2807}},"cpuNanos":2598481,"remote":"127.0.0.1:38548","protocol":"op_msg","durationMillis":843}}
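Since the log entry above is hard to read as a single line, here is a minimal Python sketch that pulls out the fields that seem relevant to the failure. The function name summarize_log_line and the SAMPLE_LINE constant are my own placeholders, and the sample is a trimmed copy of the entry above (the applyOps payload, locks, and storage sections are left out), so treat it as an illustration rather than a complete parser.

```python
import json

# Trimmed copy of the shard log entry above; only a subset of its fields is kept.
SAMPLE_LINE = (
    '{"t":{"$date":"2024-09-09T19:22:29.585+03:00"},"s":"I","c":"COMMAND",'
    '"id":51803,"ctx":"conn26","msg":"Slow query","attr":{"type":"command",'
    '"ns":"testDb.testCollection","appName":"mongorestore","ok":0,'
    '"errMsg":"sharding status of collection testDb.testCollection is not '
    'currently known and needs to be recovered","errName":"StaleConfig",'
    '"errCode":13388,"durationMillis":843}}'
)


def summarize_log_line(line):
    """Return the error-related fields of one structured (JSON) log line."""
    entry = json.loads(line)
    attr = entry.get("attr", {})
    return {
        "time": entry.get("t", {}).get("$date"),
        "component": entry.get("c"),
        "ns": attr.get("ns"),
        "appName": attr.get("appName"),
        "errName": attr.get("errName"),
        "errCode": attr.get("errCode"),
        "errMsg": attr.get("errMsg"),
    }


if __name__ == "__main__":
    # Print one field per line for easier reading.
    for key, value in summarize_log_line(SAMPLE_LINE).items():
        print(f"{key}: {value}")
```

The same function could be run over every line of the shard log to list which namespaces hit StaleConfig during the replay.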
Could someone please take a look? What could have led to this error? I am using MongoDB server v7.0.9.
Thanks and Regards.