Write to MongoDB in Batch Mode
Overview
Java

To write data to MongoDB, call the write() method on your Dataset<Row> object. This method returns a DataFrameWriter object, which you can use to specify the format and other configuration settings for your batch write operation.
You must specify the following configuration settings to write to MongoDB:
Setting | Description
---|---
dataFrame.write.format() | Specifies the format of the underlying output data source. Use mongodb to write to MongoDB.
dataFrame.write.option() | Specifies batch write settings, such as the MongoDB deployment connection string, database, and collection. For a list of batch write configuration options, see the Batch Write Configuration Options guide.
The following example creates a DataFrame from a JSON file and saves it to the people.contacts collection in MongoDB:
Dataset<Row> dataFrame = spark.read().format("json")
        .load("example.json");

dataFrame.write().format("mongodb")
        .mode("overwrite")
        .option("database", "people")
        .option("collection", "contacts")
        .save();
Tip
DataFrame Type
DataFrame doesn't exist as a class in the Java API. Use Dataset<Row> to reference a DataFrame.
Python

To write data to MongoDB, call the write function on your DataFrame object. This function returns a DataFrameWriter object, which you can use to specify the format and other configuration settings for your batch write operation.
You must specify the following configuration settings to write to MongoDB:
Setting | Description
---|---
dataFrame.write.format() | Specifies the format of the underlying output data source. Use mongodb to write to MongoDB.
dataFrame.write.option() | Specifies batch write settings, such as the MongoDB deployment connection string, database, and collection. For a list of batch write configuration options, see the Batch Write Configuration Options guide.
The following example uses the createDataFrame() function on the SparkSession object to create a DataFrame object from a list of tuples containing names and ages, and a list of column names. The example then writes this DataFrame to the people.contacts collection in MongoDB.
dataFrame = spark.createDataFrame([
    ("Bilbo Baggins", 50),
    ("Gandalf", 1000),
    ("Thorin", 195),
    ("Balin", 178),
    ("Kili", 77),
    ("Dwalin", 169),
    ("Oin", 167),
    ("Gloin", 158),
    ("Fili", 82),
    ("Bombur", None)
], ["name", "age"])

dataFrame.write.format("mongodb") \
    .mode("append") \
    .option("database", "people") \
    .option("collection", "contacts") \
    .save()
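The examples in this guide assume that the connection URI for your MongoDB deployment is already configured on the SparkSession, so each write only needs to supply the database and collection options. The following is a minimal sketch of that setup, assuming a local deployment reachable at mongodb://127.0.0.1/ and the connector's spark.mongodb.write.connection.uri property; substitute the connection string for your own deployment.

from pyspark.sql import SparkSession

# A minimal sketch: build a SparkSession with the write connection URI set
# up front. The URI below is an assumption; replace it with your deployment's
# connection string.
spark = SparkSession.builder \
    .appName("write-to-mongodb") \
    .config("spark.mongodb.write.connection.uri", "mongodb://127.0.0.1/") \
    .getOrCreate()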
Scala

To write data to MongoDB, call the write() method on your DataFrame object. This method returns a DataFrameWriter object, which you can use to specify the format and other configuration settings for your batch write operation.
You must specify the following configuration settings to write to MongoDB:
Setting | Description
---|---
dataFrame.write.format() | Specifies the format of the underlying output data source. Use mongodb to write to MongoDB.
dataFrame.write.option() | Specifies batch write settings, such as the MongoDB deployment connection string, database, and collection. For a list of batch write configuration options, see the Batch Write Configuration Options guide.
The following example creates a DataFrame from a JSON file and saves it to the people.contacts collection in MongoDB:
val dataFrame = spark.read.format("json")
  .load("example.json")

dataFrame.write.format("mongodb")
  .mode("overwrite")
  .option("database", "people")
  .option("collection", "contacts")
  .save()
Warning
Save Modes
The MongoDB Spark Connector supports the following save modes:
append
overwrite
If you specify the overwrite write mode, the connector drops the target collection and creates a new collection that uses the default collection options. This behavior can affect collections that don't use the default options, such as the following collection types:
Sharded collections
Collections with nondefault collations
Time-series collections
To learn more about save modes, see the Spark SQL Guide.
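For example, the following sketch (reusing the dataFrame from the Python example above) contrasts the two supported modes: append adds documents to the existing people.contacts collection as-is, while overwrite first drops the collection and recreates it with default options.

# Sketch: dataFrame is assumed to be the DataFrame built in the Python
# example above.

# append inserts the rows into people.contacts without touching the
# existing collection or its options.
dataFrame.write.format("mongodb") \
    .mode("append") \
    .option("database", "people") \
    .option("collection", "contacts") \
    .save()

# overwrite drops people.contacts and recreates it with the default
# collection options before writing the rows.
dataFrame.write.format("mongodb") \
    .mode("overwrite") \
    .option("database", "people") \
    .option("collection", "contacts") \
    .save()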
Important
If your write operation includes a field with a null value, the connector writes the field name and null value to MongoDB. You can change this behavior by setting the write configuration property ignoreNullValues.

For more information about setting the connector's write behavior, see Write Configuration Options.
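For example, the following sketch (reusing the dataFrame from the Python example above, where Bombur's age is None) sets ignoreNullValues so that null-valued fields are omitted from the written documents:

# Sketch: with ignoreNullValues enabled, fields whose value is null (such as
# Bombur's age in the Python example above) are left out of the documents
# written to people.contacts instead of being stored with a null value.
dataFrame.write.format("mongodb") \
    .mode("append") \
    .option("database", "people") \
    .option("collection", "contacts") \
    .option("ignoreNullValues", "true") \
    .save()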
API Documentation
To learn more about the types used in these examples, such as Dataset<Row>, DataFrameWriter, and SparkSession, see the Apache Spark API documentation.