Best practices for defining JSON structure for MongoDB Documents

I need to import a large number of records into MongoDB. However, they are in XML format. I have converted the XML file to its JSON equivalent using Python. Later on, when I started building Mongo queries against that data, some of them wouldn't execute. Apparently, a certain field in a document was imported as a plain value because it held just one value. But it should have been an array, since some other documents had an array of values for the same field.

Is there a best practice for defining the JSON structure of MongoDB documents? For example, if a field holds an array of values in at least one document, should all documents define that field as an array?

Hi Aigerim! Thank you for your question!

You can validate the inserted and/or updated documents in a MongoDB collection by defining a JSON schema.

Here’s a simplified JSON schema validating an array field:

{
  "$jsonSchema": {
    "bsonType": "object",
    "required": ["arrayField"],
    "properties": {
      "arrayField": {
        "bsonType": "array",
        "items": {
          "bsonType": "string",
          "description": "must be a string"
        },
        "description": "must be an array of strings and is required"
      }
    }
  }
}
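As a rough sketch of how a validator like this might be attached from Python, the snippet below assumes the pymongo driver, whose `create_collection` accepts a `validator` option. The helper `create_validated_collection`, the collection name `records`, and the connection details in the comments are hypothetical and only for illustration.

```python
# The validator from above, written as a Python dict.
VALIDATOR = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["arrayField"],
        "properties": {
            "arrayField": {
                "bsonType": "array",
                "items": {"bsonType": "string", "description": "must be a string"},
                "description": "must be an array of strings and is required",
            }
        },
    }
}


def create_validated_collection(db, name="records"):
    """Create a collection whose writes are checked against VALIDATOR.

    `db` is expected to be a pymongo Database object;
    `name` is a hypothetical collection name.
    """
    return db.create_collection(name, validator=VALIDATOR)


# Example usage (needs the pymongo package and a running MongoDB server):
#   from pymongo import MongoClient
#   db = MongoClient("mongodb://localhost:27017")["mydb"]
#   records = create_validated_collection(db)
#   records.insert_one({"arrayField": ["a"]})  # accepted
#   records.insert_one({"arrayField": "a"})    # rejected by the server
```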

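With this validator in place, MongoDB by default rejects any insert or update in which arrayField is missing or is not an array of strings, so the field is consistently an array whether it contains one or multiple elements. Note that validation only rejects non-conforming documents; it does not convert them, so single values still need to be wrapped in an array before import.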
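Since your conversion is already done in Python, one option is to normalize the documents before importing them. Here is a minimal sketch, assuming you know up front which fields can repeat; `ensure_list`, `normalize_document`, `ARRAY_FIELDS`, and the sample documents are hypothetical names and data for illustration.

```python
def ensure_list(value):
    """Wrap a single value in a list; leave lists alone; map None to []."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


def normalize_document(doc, array_fields):
    """Return a copy of `doc` where every field in `array_fields` is a list.

    Fields that are absent from `doc` are left absent.
    """
    normalized = dict(doc)
    for field in array_fields:
        if field in normalized:
            normalized[field] = ensure_list(normalized[field])
    return normalized


# Hypothetical: fields that hold multiple values in at least one record.
ARRAY_FIELDS = {"arrayField"}

docs = [
    {"_id": 1, "arrayField": "a"},         # single value from the XML
    {"_id": 2, "arrayField": ["a", "b"]},  # repeated element from the XML
]
normalized_docs = [normalize_document(d, ARRAY_FIELDS) for d in docs]
```

After this step both documents store arrayField as an array, so they pass the validator and the same array queries work on all of them. If the conversion happens to use the xmltodict package, its `force_list` argument to `parse` can achieve a similar result at parse time.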