Backing up a MongoDB cluster composed of three replicated MongoDB instances, running on an on-premise Kubernetes cluster, with Velero and MinIO using Restic, triggers the following fatal error on one of the instances after the backup is restored:
"ERROR","verbose_level_id":-3,"msg":"__wt_block_read_off:226:WiredTigerHS.wt: potential hardware corruption, read checksum error for 4096B block at offset 172032: block header checksum of 0x63755318 doesn't match expected checksum of 0x22b37ec4"
"ERROR","verbose_level_id":-3,"msg":"__wt_block_read_off:235:WiredTigerHS.wt: fatal read error","error_str":"WT_ERROR: non-specific WiredTiger error","error_code":-31802
"ERROR","verbose_level_id":-3,"msg":"__wt_block_read_off:235:the process must exit and restart","error_str":"WT_PANIC: WiredTiger library panic","error_code":-31804
Fatal assertion","attr":{"msgid":50853,"file":"src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp","line":712
\n\n***aborting after fassert() failure\n\n
Writing fatal message","attr":{"message":"Got signal: 6 (Aborted).\n
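For reference, the backup and restore were performed roughly as follows. This is only a sketch of the invocation: the backup name, namespace, and the Restic opt-in flag are assumptions, not a verbatim transcript.

# Hypothetical sketch of the Velero commands used (names and namespace are assumptions).
# MinIO is configured as the S3-compatible backup storage location,
# and Restic backs up the MongoDB persistent volumes.
velero backup create mongodb-backup \
  --include-namespaces mongodb \
  --default-volumes-to-restic

# Restore into the same (cleaned) namespace.
velero restore create mongodb-restore --from-backup mongodb-backup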
Please note that we tested this with MongoDB versions 4.4.11 and 6.0.5.
The restore works well for all our applications (including two of the MongoDB nodes), except for one MongoDB node (sometimes two) which most of the time ends up in a “Back-off restarting failed container” state, even after running a manual “mongod --repair” on it.
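The manual repair was attempted roughly as below; the pod name, namespace, and data path are assumptions about our setup, and it assumes mongod is not already running in the container (e.g. the pod is temporarily started with a sleep command, or a debug pod mounts the same PVC).

# Hypothetical sketch of the manual repair on the failing member.
kubectl exec -it mongodb-2 -n mongodb -- mongod --repair --dbpath /data/db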
We suspect that backing up the three replicated MongoDB instances while some internal MongoDB synchronisation is still in progress (all services connected to MongoDB are stopped during the backup) causes the backup to appear corrupted at restore time. Do you know what could cause this issue and how we could solve it?
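For context, this is the kind of quiescing we are wondering whether we need: flushing and locking writes on each member around the Restic copy via Velero's pre/post backup hook annotations. It is only a sketch of the idea; the pod name, namespace, container name, and the mongosh invocation are assumptions.

# Sketch only: annotate a MongoDB pod so Velero locks writes before the
# Restic copy of its volume and unlocks them afterwards.
kubectl -n mongodb annotate pod mongodb-0 \
  pre.hook.backup.velero.io/container=mongod \
  pre.hook.backup.velero.io/command='["/bin/sh", "-c", "mongosh --eval \"db.fsyncLock()\""]' \
  post.hook.backup.velero.io/container=mongod \
  post.hook.backup.velero.io/command='["/bin/sh", "-c", "mongosh --eval \"db.fsyncUnlock()\""]'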