Write to MongoDB in Batch Mode
Overview
Java

To write data to MongoDB, call the write() method on your Dataset<Row> object. This method returns a DataFrameWriter object, which you can use to specify the format and other configuration settings for your batch write operation.
You must specify the following configuration settings to write to MongoDB:
Setting | Description
---|---
dataFrame.write.format() | Specifies the format of the underlying output data source. Use mongodb to write to MongoDB.
dataFrame.write.option() | Specifies batch write settings, such as the MongoDB deployment connection string, database, and collection. For a list of batch write configuration options, see the Batch Write Configuration Options guide.
The following example creates a DataFrame from a JSON file and saves it to the people.contacts collection in MongoDB:
Dataset<Row> dataFrame = spark.read().format("json")
        .load("example.json");

dataFrame.write().format("mongodb")
        .mode("overwrite")
        .option("database", "people")
        .option("collection", "contacts")
        .save();
Tip
DataFrame Type
DataFrame doesn't exist as a class in the Java API. Use Dataset<Row> to reference a DataFrame.
Python

To write data to MongoDB, call the write function on your DataFrame object. This function returns a DataFrameWriter object, which you can use to specify the format and other configuration settings for your batch write operation.
You must specify the following configuration settings to write to MongoDB:
Setting | Description
---|---
dataFrame.write.format() | Specifies the format of the underlying output data source. Use mongodb to write to MongoDB.
dataFrame.write.option() | Specifies batch write settings, such as the MongoDB deployment connection string, database, and collection. For a list of batch write configuration options, see the Batch Write Configuration Options guide.
The following example uses the createDataFrame() function on the SparkSession object to create a DataFrame object from a list of tuples containing names and ages, and a list of column names. The example then writes this DataFrame to the people.contacts collection in MongoDB.
dataFrame = spark.createDataFrame([
    ("Bilbo Baggins", 50),
    ("Gandalf", 1000),
    ("Thorin", 195),
    ("Balin", 178),
    ("Kili", 77),
    ("Dwalin", 169),
    ("Oin", 167),
    ("Gloin", 158),
    ("Fili", 82),
    ("Bombur", None)
], ["name", "age"])

dataFrame.write.format("mongodb") \
    .mode("append") \
    .option("database", "people") \
    .option("collection", "contacts") \
    .save()
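The examples in this guide assume that the connection URI for your MongoDB deployment is already configured on the SparkSession, so each write only needs to supply the database and collection options. The following is a minimal sketch of that setup, assuming a local deployment reachable at mongodb://127.0.0.1/ and the connector's spark.mongodb.write.connection.uri property; substitute the connection string for your own deployment.

from pyspark.sql import SparkSession

# A minimal sketch: build a SparkSession with the write connection URI set
# up front. The URI below is an assumption; replace it with your deployment's
# connection string.
spark = SparkSession.builder \
    .appName("write-to-mongodb") \
    .config("spark.mongodb.write.connection.uri", "mongodb://127.0.0.1/") \
    .getOrCreate()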
Scala

To write data to MongoDB, call the write() method on your DataFrame object. This method returns a DataFrameWriter object, which you can use to specify the format and other configuration settings for your batch write operation.
You must specify the following configuration settings to write to MongoDB:
Setting | Description
---|---
dataFrame.write.format() | Specifies the format of the underlying output data source. Use mongodb to write to MongoDB.
dataFrame.write.option() | Specifies batch write settings, such as the MongoDB deployment connection string, database, and collection. For a list of batch write configuration options, see the Batch Write Configuration Options guide.
The following example creates a DataFrame from a JSON file and saves it to the people.contacts collection in MongoDB:
val dataFrame = spark.read.format("json")
  .load("example.json")

dataFrame.write.format("mongodb")
  .mode("overwrite")
  .option("database", "people")
  .option("collection", "contacts")
  .save()
Warning
Save Modes
The MongoDB Spark Connector supports the following save modes:
append
overwrite
If you specify the overwrite write mode, the connector drops the target collection and creates a new collection that uses the default collection options. This behavior can affect collections that don't use the default options, such as the following collection types:
Sharded collections
Collections with nondefault collations
Time-series collections
To learn more about save modes, see the Spark SQL Guide.
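For example, the following sketch (reusing the dataFrame from the Python example above) contrasts the two supported modes: append adds documents to the existing people.contacts collection as-is, while overwrite first drops the collection and recreates it with default options.

# Sketch: dataFrame is assumed to be the DataFrame built in the Python
# example above.

# append inserts the rows into people.contacts without touching the
# existing collection or its options.
dataFrame.write.format("mongodb") \
    .mode("append") \
    .option("database", "people") \
    .option("collection", "contacts") \
    .save()

# overwrite drops people.contacts and recreates it with the default
# collection options before writing the rows.
dataFrame.write.format("mongodb") \
    .mode("overwrite") \
    .option("database", "people") \
    .option("collection", "contacts") \
    .save()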
Important
If your write operation includes a field with a null value, the connector writes the field name and null value to MongoDB. You can change this behavior by setting the write configuration property ignoreNullValues.

For more information about setting the connector's write behavior, see Write Configuration Options.
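For example, the following sketch (reusing the dataFrame from the Python example above, where Bombur's age is None) sets ignoreNullValues so that null-valued fields are omitted from the written documents:

# Sketch: with ignoreNullValues enabled, fields whose value is null (such as
# Bombur's age in the Python example above) are left out of the documents
# written to people.contacts instead of being stored with a null value.
dataFrame.write.format("mongodb") \
    .mode("append") \
    .option("database", "people") \
    .option("collection", "contacts") \
    .option("ignoreNullValues", "true") \
    .save()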
API Documentation
To learn more about the types used in these examples, such as Dataset<Row>, DataFrameWriter, and SparkSession, see the Apache Spark API documentation.