Mongod random crashes on Windows: FileRenameFailed

Good day.

First, I know there are many posts that relate to my issue, but none provided me with a fix. :frowning:

Running:
Mongo Server Community 5.0.5
Windows Server 2019

The service runs as a domain user to which we gave full control over the root path D:\Mongo\ (which contains the data and log folders). Additionally, we've set up our AV to exclude D:\Mongo\ from scanning!

Every so often (too often!) the mongod.exe process still crashes with a FileRenameFailed: Access is denied error. Here's a snippet of the log file:

{"t":{"$date":"2023-03-02T13:18:50.717-05:00"},"s":"I",  "c":"NETWORK",  "id":22943,   "ctx":"listener","msg":"Connection accepted","attr":{"remote":"10.10.42.251:56707","uuid":"6a36177a-b425-400a-a1a9-1fc735f56ab0","connectionId":165612,"connectionCount":9}}
{"t":{"$date":"2023-03-02T13:18:58.738-05:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn165612","msg":"Connection ended","attr":{"remote":"10.10.42.251:56707","uuid":"6a36177a-b425-400a-a1a9-1fc735f56ab0","connectionId":165612,"connectionCount":8}}
{"t":{"$date":"2023-03-02T13:18:59.738-05:00"},"s":"I",  "c":"NETWORK",  "id":22943,   "ctx":"listener","msg":"Connection accepted","attr":{"remote":"10.10.42.251:56884","uuid":"0c8e1898-f54c-49dd-8605-bb31d7f2b909","connectionId":165613,"connectionCount":9}}
{"t":{"$date":"2023-03-02T13:19:11.933-05:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn165613","msg":"Connection ended","attr":{"remote":"10.10.42.251:56884","uuid":"0c8e1898-f54c-49dd-8605-bb31d7f2b909","connectionId":165613,"connectionCount":8}}

{"t":{"$date":"2023-03-02T13:19:11.990-05:00"},"s":"F",  "c":"CONTROL",  "id":4757800, "ctx":"ftdc","msg":"Writing fatal message","attr":{"message":"terminate() called. An exception is active; attempting to gather more information"}}
{"t":{"$date":"2023-03-02T13:19:12.032-05:00"},"s":"F",  "c":"CONTROL",  "id":4757800, "ctx":"ftdc","msg":"Writing fatal message","attr":{"message":"DBException::toString(): FileRenameFailed: Access is denied\nActual exception type: class mongo::error_details::ExceptionForImpl<37,class mongo::AssertionException>\n"}}

{"t":{"$date":"2023-03-02T13:19:12.766-05:00"},"s":"I",  "c":"NETWORK",  "id":22943,   "ctx":"listener","msg":"Connection accepted","attr":{"remote":"10.10.42.251:57108","uuid":"9d15f9b4-8e8a-4659-9377-a78356a0c731","connectionId":165614,"connectionCount":9}}
{"t":{"$date":"2023-03-02T13:19:12.766-05:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn165614","msg":"Connection ended","attr":{"remote":"10.10.42.251:57108","uuid":"9d15f9b4-8e8a-4659-9377-a78356a0c731","connectionId":165614,"connectionCount":8}}
{"t":{"$date":"2023-03-02T13:19:13.768-05:00"},"s":"I",  "c":"NETWORK",  "id":22943,   "ctx":"listener","msg":"Connection accepted","attr":{"remote":"10.10.42.251:57120","uuid":"cd4b4ba0-4f96-4494-8073-7d408e924f4f","connectionId":165615,"connectionCount":9}}
{"t":{"$date":"2023-03-02T13:19:13.768-05:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn165615","msg":"Connection ended","attr":{"remote":"10.10.42.251:57120","uuid":"cd4b4ba0-4f96-4494-8073-7d408e924f4f","connectionId":165615,"connectionCount":8}}
{"t":{"$date":"2023-03-02T13:19:14.390-05:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"Checkpointer","msg":"WiredTiger message","attr":{"message":"[1677781154:390576][14408:140723038999488], WT_SESSION.checkpoint: [WT_VERB_CHECKPOINT_PROGRESS] saving checkpoint snapshot min: 5515, snapshot max: 5515 snapshot count: 0, oldest timestamp: (1677781152, 1) , meta checkpoint timestamp: (1677781152, 1) base write gen: 108733"}}
{"t":{"$date":"2023-03-02T13:19:14.770-05:00"},"s":"I",  "c":"NETWORK",  "id":22943,   "ctx":"listener","msg":"Connection accepted","attr":{"remote":"10.10.42.251:57137","uuid":"43c1c2bf-d3c5-49d3-bb1b-4e83e16e1440","connectionId":165616,"connectionCount":9}}
{"t":{"$date":"2023-03-02T13:19:14.770-05:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn165616","msg":"Connection ended","attr":{"remote":"10.10.42.251:57137","uuid":"43c1c2bf-d3c5-49d3-bb1b-4e83e16e1440","connectionId":165616,"connectionCount":8}}
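
To pull the fatal entries out of a busy log like the one above, a small filter can help. This is only a sketch: it assumes the log is in the structured JSON format shown here (one JSON object per line, severity in the "s" field), and the function name fatal_entries is made up for illustration.

```python
import json


def fatal_entries(lines):
    """Yield (timestamp, ctx, message) for every log line with severity "F"."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log line
        if entry.get("s") != "F":
            continue
        attr = entry.get("attr", {})
        # Prefer the detailed attr.message, fall back to the short msg field.
        yield entry["t"]["$date"], entry.get("ctx"), attr.get("message", entry.get("msg"))


# Two shortened lines in the same shape as the log above.
sample = [
    '{"t":{"$date":"2023-03-02T13:18:58.738-05:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn165612","msg":"Connection ended","attr":{"connectionCount":8}}',
    '{"t":{"$date":"2023-03-02T13:19:12.032-05:00"},"s":"F",  "c":"CONTROL",  "id":4757800, "ctx":"ftdc","msg":"Writing fatal message","attr":{"message":"DBException::toString(): FileRenameFailed: Access is denied\\n"}}',
]
hits = list(fatal_entries(sample))
for when, ctx, message in hits:
    print(when, ctx, message.strip())
```

On a real server one would pass an open file handle for the mongod log instead of the sample list. The "ctx" value is the interesting part here: in our log the fatal lines come from the "ftdc" thread.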

In all the posts out there, none of them resolved this crash for us:

  • Most of them relate to an AV scanning files within /data/: we've excluded the folder from scanning!

  • Some talk about permission problems: we've given full control to the user running mongod on the root of the Mongo files!

  • I've even seen posts blaming a bad server locale setup, but in that case the log would show Unicode characters that were not processed properly (something like {"message":"DBException::toString(): FileRenameFailed: \ufffdv\ufffd\ufffd...), which doesn't seem to be our case from viewing our log. Plus, our server is set to an "English" locale.


I'm running out of ideas here... Upgrade to the latest Mongo? But if that's the fix, why haven't I found anything saying that an upgrade is needed?

Any ideas would be super appreciated. Much thanks for your time folks.

Regards,
Patrick

Hi @Patrick_Roy

Sorry you're having difficulty with this issue. Unfortunately, I believe the FileRenameFailed error originates outside the server, so it's typically an OS-level issue.

One thing I can think of is SERVER-58085, which will warn you if the path is a network drive (which is known to sometimes result in this). SERVER-28194 is another, but that was fixed a long time ago.

Since you're running version 5.0.5 and the latest in the 5.0 series is 5.0.15, I would start by upgrading. Upgrading to the latest version ensures that you're not seeing an issue that has already been fixed, so it's usually a good idea to try first.

If your dbpath is not on a network drive, and you have upgraded to 5.0.15, then perhaps the best option is to open a SERVER ticket describing the situation.

Best regards
Kevin

Hello @kevinadi. Thanks for your reply.

The server that is currently crashing at random seems to be our arbiter (we are running PSA), although we've also had the crash on another server that runs a single instance (primary only, a testing server). Both servers that produced the crash have dbPath set to a local disk.

Our next step, since we don't want to fall too far behind on upgrades, is to upgrade to the latest Mongo 6.0.x LTS version and hope all crashes magically go away :wink: Still, I'm puzzled as to why we're getting the crash. If an upgrade fixes it, I should be able to find the relevant fix that resolves the issue, but I haven't found anything yet...

Hi folks, just to share an update on this particular crash... We know that the crash would occasionally occur when Mongo renamed this file: \diagnostic.data\metrics.interim to metrics.interim.temp.
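
For anyone wondering why a rename can fail with "Access is denied" even with full permissions: on Windows, replacing a file can fail while another process holds a handle on it. The sketch below is a generic illustration of a rename with retries and is not MongoDB's code. The helper name replace_with_retry and its parameters are made up, the file names are borrowed from the path above, and the rename direction is arbitrary for the example.

```python
import os
import tempfile
import time


def replace_with_retry(src_path, dst_path, attempts=5, delay=0.1):
    """Rename src_path over dst_path, retrying on PermissionError.

    Returns the number of attempts that were needed.
    """
    for attempt in range(1, attempts + 1):
        try:
            os.replace(src_path, dst_path)
            return attempt
        except PermissionError:
            # On Windows this is what "Access is denied" or a sharing
            # violation looks like from Python.
            if attempt == attempts:
                raise
            time.sleep(delay)


with tempfile.TemporaryDirectory() as folder:
    src = os.path.join(folder, "metrics.interim.temp")
    dst = os.path.join(folder, "metrics.interim")
    with open(src, "w") as handle:
        handle.write("sample")
    used = replace_with_retry(src, dst)
    renamed_ok = os.path.exists(dst) and not os.path.exists(src)
    print("attempts:", used, "renamed:", renamed_ok)
```

A process that treats the first failed rename as fatal, with no retry, would go down as soon as something else touches the file at the wrong moment, which would match how rarely and randomly we see the crash.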

Here are the steps I took to try to bypass the manipulation of this file (it's only a diagnostic / metrics file of some kind, so not really needed (?)):

  1. Upgraded our Mongo instances to MongoDB 6.0.5 Community
  2. Tried to forcefully disable free Monitoring in mongod.cfg:
    cloud:
      monitoring:
        free:
          state: off
    
  3. Tried to forcefully disable diagnostic data collection in mongod.cfg:
    setParameter:
      diagnosticDataCollectionEnabled: false
    
  4. Finally, tried to disable telemetry with: mongosh --nodb --eval "disableTelemetry()"

Results: it seems like the last point (4), disabling telemetry, is what fixed the issue. I am not sure, though, whether it was the combination of all the points that did it... But so far, it's been up for over a month without a crash (it was crashing 3-4 times a month before!).

Regardless, we definitely shouldn't need to disable all that stuff. To me, it looks like there's a bug somewhere that only shows up with a specific setup (but which one? I can't say...).

Cheers! Pat

I just had the same issue with mongodb 4.4.22.

I enabled file auditing on Windows and it appears Kaspersky is to blame:

Object:
	Object Server:		Security
	Object Type:		File
	Object Name:		C:\mongo\db\diagnostic.data\metrics.interim
	Handle ID:		0x82c
	Resource Attributes:	S:AI

Process Information:
	Process ID:		0xcf8
	Process Name:		C:\Program Files (x86)\Kaspersky Lab\Kaspersky Security for Windows Server\kavfswp.exe

Access Request Information:
	Accesses:		WriteAttributes
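
If you collect many of these audit events, a short script can flag any process other than mongod.exe that touches the files. This is a rough sketch under a few assumptions: the event text has been exported as plain text, the field labels are the English or German ones seen in this thread, and the function name foreign_access is hypothetical.

```python
import re
from pathlib import PureWindowsPath

# Field labels as they appear in the audit event text (English or German).
OBJECT_RE = re.compile(r"^\s*(?:Object Name|Objektname):\s*(.+?)\s*$", re.M)
PROCESS_RE = re.compile(r"^\s*(?:Process Name|Prozessname):\s*(.+?)\s*$", re.M)


def foreign_access(event_text, expected_exe="mongod.exe"):
    """Return (exe_name, object_path) if a process other than expected_exe
    accessed the object, otherwise None."""
    obj = OBJECT_RE.search(event_text)
    proc = PROCESS_RE.search(event_text)
    if not obj or not proc:
        return None  # not an event in the expected shape
    exe_name = PureWindowsPath(proc.group(1)).name
    if exe_name.lower() == expected_exe.lower():
        return None
    return exe_name, obj.group(1)


# Shortened copy of the event above.
sample = r"""
Object Name:    C:\mongo\db\diagnostic.data\metrics.interim
Process Name:   C:\Program Files (x86)\Kaspersky Lab\Kaspersky Security for Windows Server\kavfswp.exe
"""
result = foreign_access(sample)
print(result)
```

Anything this returns for a path under the dbPath is a candidate for an AV or backup exclusion that is not actually being honored.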