On August 22, 2024, Node.js v22.7.0 introduced an incorrect optimization for buffer.write that can result in strings being encoded using ISO-8859-1 rather than UTF-8.
Though this issue does not originate in any of MongoDB’s products, developers using MongoDB’s Node.js driver could experience data integrity issues with Node.js v22.7.0. The use of the fast API for buffer.write is disabled in Node.js v22.8.0.
This issue only manifests if all of the following conditions are true:
- Node.js v22.7.0 is being used
- Documents are being written to MongoDB
- Those documents contain certain characters (e.g., é) that are encodable with ISO-8859-1 but not with ASCII. Chinese and Japanese text, for example, is not impacted because it is not encodable using ISO-8859-1
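To see why only this class of characters is affected, the sketch below compares the bytes produced for é under the two encodings, using only the built-in Buffer API. The helper name `hasLatin1NonAscii` is hypothetical and introduced purely for illustration; it is a rough approximation of the third condition, not a definitive test for exposure.

```javascript
// Illustrative only: the same character under the two encodings.
const utf8Bytes = Buffer.from('\u00e9', 'utf8');     // é as UTF-8: two bytes
const latin1Bytes = Buffer.from('\u00e9', 'latin1'); // é as ISO-8859-1: one byte

console.log(utf8Bytes.toString('hex'));
console.log(latin1Bytes.toString('hex'));

// Hypothetical helper: true if the string contains at least one code point in
// U+0080..U+00FF, i.e. encodable with ISO-8859-1 but not with ASCII.
function hasLatin1NonAscii(str) {
  for (const ch of str) {
    const codePoint = ch.codePointAt(0);
    if (codePoint >= 0x80 && codePoint <= 0xff) {
      return true;
    }
  }
  return false;
}
```

Plain ASCII text and text made up solely of characters above U+00FF (such as Chinese or Japanese) both return false from this helper.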
As of September 3, 2024, Node.js v22.8.0 is available and contains a fix for the UTF-8 encoding issue present in Node.js v22.7.0.
Mitigating the Issue
To avoid potential data integrity issues due to this bug in the Node.js runtime, it is recommended that Node.js v22.7.0 not be used at all.
In production environments, MongoDB recommends using only Node.js runtime versions that are documented as compatible. At the time of writing, Node.js v22.x is not considered a compatible runtime for use with the MongoDB Node.js driver.
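The recommendation to avoid Node.js v22.7.0 can be enforced at application startup. The following is a minimal sketch of such a guard; the names `AFFECTED_NODE_VERSION`, `isAffectedNodeVersion`, and `assertSafeNodeVersion` are hypothetical and are not part of the MongoDB driver or of Node.js.

```javascript
// Hypothetical startup guard: refuse to run on the affected Node.js release.
const AFFECTED_NODE_VERSION = '22.7.0';

// Accepts version strings such as "v22.7.0" or "22.7.0".
function isAffectedNodeVersion(version) {
  return version.replace(/^v/, '') === AFFECTED_NODE_VERSION;
}

// Throws if the given version (by default, the running one) is affected.
function assertSafeNodeVersion(version = process.version) {
  if (isAffectedNodeVersion(version)) {
    throw new Error(
      `Node.js v${AFFECTED_NODE_VERSION} has a known UTF-8 encoding bug in ` +
        'buffer.write; use a different Node.js release.'
    );
  }
}
```

Calling `assertSafeNodeVersion()` once, before any database connection is opened, would stop the process before it can write data.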
Understanding the Issue
To illustrate how this can occur, consider the following reproduction:
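A minimal sketch of the Buffer portion of such a reproduction might look like the following. It omits the MongoDB insert and retrieval steps so that it runs without a database, and the helper name `findFirstBadWrite` is hypothetical. Whether and when a failure appears on Node.js v22.7.0 is assumed to depend on when the runtime optimizes the hot call, so the iteration count may need adjusting.

```javascript
// Sketch only: exercises buffer.write repeatedly and checks the bytes produced.
const ITERATIONS = 20_000;
const EXPECTED_HEX = 'c3a9'; // UTF-8 encoding of U+00E9 (é)

// Hypothetical helper: returns the first iteration at which buffer.write
// produced something other than the expected UTF-8 bytes, or -1 if none did.
function findFirstBadWrite(iterations = ITERATIONS) {
  for (let i = 0; i < iterations; i++) {
    const buf = Buffer.alloc(2);
    const written = buf.write('\u00e9', 0, 'utf8');
    if (written !== 2 || buf.toString('hex') !== EXPECTED_HEX) {
      return i;
    }
  }
  return -1;
}

const firstBad = findFirstBadWrite();
console.log(
  firstBad === -1
    ? `buffer.write was consistent for ${ITERATIONS} iterations`
    : `buffer.write produced unexpected bytes at iteration ${firstBad}`
);
```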
When run using a previous version of Node.js, the Buffer length evaluates consistently across all 20K iterations, and a document is inserted into a MongoDB collection and then successfully retrieved.
When the same reproduction is run using Node.js v22.7.0, however, invalid UTF-8 string data can be produced and then inserted into the MongoDB collection, causing subsequent retrieval attempts to fail.
Though MongoDB’s Node.js driver supports UTF-8 validation, that feature applies to decoding BSON strings that are being received from the MongoDB server. As the bug in Node.js v22.7.0 occurs when encoding strings as UTF-8, the invalid data can still be serialized to BSON and written to the database.
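The difference between encode-time and decode-time behavior can be illustrated without the driver. The sketch below uses the standard `TextDecoder` with `fatal: true` as a stand-in for decode-time UTF-8 validation. It is not the driver's implementation, and the helper name `isValidUtf8` is hypothetical.

```javascript
// Stand-in for decode-time validation: a TextDecoder created with
// `fatal: true` throws on malformed UTF-8 instead of substituting U+FFFD.
function isValidUtf8(bytes) {
  try {
    new TextDecoder('utf-8', { fatal: true }).decode(bytes);
    return true;
  } catch {
    return false;
  }
}

// "café" encoded correctly, and as it would look if written as ISO-8859-1.
const correctBytes = Buffer.from('caf\u00e9', 'utf8');     // 63 61 66 c3 a9
const corruptedBytes = Buffer.from('caf\u00e9', 'latin1'); // 63 61 66 e9

console.log(isValidUtf8(correctBytes));
console.log(isValidUtf8(corruptedBytes));
```

Nothing in the write path performs an equivalent check, which is why the corrupted bytes can reach the database and only fail later, when they are read back.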
Note that if you’ve installed mongosh via Homebrew for macOS, the underlying Node.js runtime may be Node.js v22.7.0, as Homebrew auto-upgrades to the latest Node.js version by default. If any data was written to the database from these mongosh instances, and that data contained non-ASCII characters, this encoding issue may have occurred.
Identifying Data Integrity Issues in Active Clusters
MongoDB’s Commercial Support team maintains replica set consistency and remediation tools which can be used in the event of data corruption.
To determine if your data is impacted, the validate.js script can be used for MongoDB 7.0 or greater as follows:
If the output contains entries similar to the following, then invalid BSON documents have been detected in those collections.
This failure is logged in mongod.log as follows and can be found by grepping the logs:
For releases prior to MongoDB 7.0, a script can be used to identify documents containing strings that were incorrectly encoded due to the Node.js issue. An example Python script that uses PyMongo is available as detection.py and has been tested against MongoDB 4.2.25, 4.4.29, 5.0.28, and 6.0.16. Its output includes the _id, the database, and the collection of each affected document.
Running the detection script produces a CSV file called to_fix.csv containing the same information output above:
Impacts to Backups
If a backup was taken while data with the incorrect encoding was present, any future restore of that backup will contain this issue. When restoring a backup taken from this period, you must execute the detection and remediation procedure to ensure that the invalid encoding is repaired.
Remediation
Running the detection.py script produces a to_fix.csv file that can then be used to remediate the issue manually or by running the fix.py script. It is recommended that you treat the script as an example and adapt it to your environment.
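At the byte level, a repair amounts to reinterpreting the stored bytes as ISO-8859-1 and re-encoding the result as UTF-8. The sketch below shows that idea only. It is not the logic of fix.py, the helper name `repairLatin1Bytes` is hypothetical, and it assumes the entire string was written through the faulty code path.

```javascript
// Hypothetical helper: reinterpret bytes that were written as ISO-8859-1
// and return the UTF-8 encoding of the string they were meant to represent.
function repairLatin1Bytes(storedBytes) {
  // Each ISO-8859-1 byte maps directly to the code point of the same value.
  const original = Buffer.from(storedBytes).toString('latin1');
  return Buffer.from(original, 'utf8');
}

// "café" as it would be stored if written as ISO-8859-1.
const storedBytes = Buffer.from([0x63, 0x61, 0x66, 0xe9]);
const repairedBytes = repairLatin1Bytes(storedBytes);

console.log(repairedBytes.toString('hex'));
console.log(repairedBytes.toString('utf8'));
```

Applying this transformation to a string that is already valid UTF-8 would double-encode it, so a repair of this kind should be limited to documents that detection has flagged.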
If you are concerned about the impact of this issue, we recommend that you cross-reference your data with any other records that can help verify your data integrity. For any further questions, please open a support case or start a chat with the Atlas Support team.