MongoDB Spark Connector

Batch Write Configuration Options

Overview

You can configure the following properties when writing data to MongoDB in batch mode.

Note

If you use SparkConf to set the connector's write configurations, prefix spark.mongodb.write. to each property.
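As an illustration of the prefix rule, the following Python sketch assumes you keep write options in a plain dict that uses the unprefixed property names from the table below. The helper to_spark_conf_entries is hypothetical and not part of the connector. It only prepends spark.mongodb.write. to each key and turns each value into a string, writing booleans in lowercase.

```python
WRITE_PREFIX = "spark.mongodb.write."


def _to_conf_value(value) -> str:
    # SparkConf values are strings; write booleans as "true"/"false".
    if isinstance(value, bool):
        return str(value).lower()
    return str(value)


def to_spark_conf_entries(options: dict) -> dict:
    """Return a copy of options with each key prefixed for SparkConf.

    Hypothetical helper for illustration only.
    """
    return {WRITE_PREFIX + name: _to_conf_value(value) for name, value in options.items()}


if __name__ == "__main__":
    entries = to_spark_conf_entries(
        {
            "connection.uri": "mongodb://127.0.0.1/",
            "database": "myDB",
            "collection": "myCollection",
            "maxBatchSize": 256,
        }
    )
    for key, value in entries.items():
        print(f"{key}={value}")
```

You could then pass the resulting pairs to PySpark's SparkConf, for example with conf.setAll(entries.items()).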

connection.uri
    Required.
    The connection string configuration key.
    Default: mongodb://localhost:27017/

database
    Required.
    The database name configuration.

collection
    Required.
    The collection name configuration.

idFieldList
    Field or list of fields that the connector uses to identify documents. The replace and update operations use these fields to find matching existing documents. To specify more than one field, separate them with a comma, as shown in the following example:
    "fieldName1,fieldName2"
    Default: _id

maxBatchSize
    Specifies the maximum number of operations to batch in bulk operations.
    Default: 512

mongoClientFactory
    MongoClientFactory configuration key.
    You can specify a custom implementation, which must implement the com.mongodb.spark.sql.connector.connection.MongoClientFactory interface.
    Default: com.mongodb.spark.sql.connector.connection.DefaultMongoClientFactory

operationType
    Specifies the type of write operation to perform. You can set this to one of the following values:
      • insert: Insert the data.
      • replace: Replace an existing document that matches the idFieldList value with the new data. If no match exists, the value of upsertDocument indicates whether the connector inserts a new document.
      • update: Update an existing document that matches the idFieldList value with the new data. If no match exists, the value of upsertDocument indicates whether the connector inserts a new document.
    Default: replace

ordered
    Specifies whether to perform ordered bulk operations.
    Default: true

writeConcern.journal
    Specifies j, a write-concern option that requests acknowledgment that the data has been written to the on-disk journal for the criteria specified in the w option. You can specify either true or false.
    For more information on j values, see the MongoDB server guide on the WriteConcern j option.

writeConcern.w
    Specifies w, a write-concern option to request acknowledgment that the write operation has propagated to a specified number of MongoDB nodes. For a list of allowed values for this option, see WriteConcern in the MongoDB manual.
    Default: 1

writeConcern.wTimeoutMS
    Specifies wTimeoutMS, a write-concern option that returns an error when a write operation takes longer than the specified number of milliseconds. If you use this optional setting, you must specify a nonnegative integer.
    For more information on wTimeoutMS values, see the MongoDB server guide on the WriteConcern wtimeout option.
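The following Python sketch collects the properties above into one dict and hands them to a PySpark DataFrameWriter. It assumes connector 10.x, where the data source name is "mongodb", and assumes that options set directly on the writer use the unprefixed names. build_write_options and write_batch are hypothetical helpers. The validation covers only the constraints stated in the table: the three operationType values and a nonnegative wTimeoutMS.

```python
from typing import Optional, Sequence

VALID_OPERATION_TYPES = ("insert", "replace", "update")


def build_write_options(
    uri: str,
    database: str,
    collection: str,
    operation_type: str = "replace",
    id_fields: Sequence[str] = ("_id",),
    max_batch_size: int = 512,
    ordered: bool = True,
    w: str = "1",
    journal: Optional[bool] = None,
    wtimeout_ms: Optional[int] = None,
) -> dict:
    """Collect batch write options as strings (hypothetical helper).

    Parameter defaults mirror the defaults listed in the table.
    """
    if operation_type not in VALID_OPERATION_TYPES:
        raise ValueError(f"operationType must be one of {VALID_OPERATION_TYPES}")
    if wtimeout_ms is not None and wtimeout_ms < 0:
        raise ValueError("writeConcern.wTimeoutMS must be a nonnegative integer")

    options = {
        "connection.uri": uri,
        "database": database,
        "collection": collection,
        "operationType": operation_type,
        # Several identifying fields are passed as one comma-separated string.
        "idFieldList": ",".join(id_fields),
        "maxBatchSize": str(max_batch_size),
        "ordered": str(ordered).lower(),
        "writeConcern.w": w,
    }
    # Optional write-concern settings are included only when given.
    if journal is not None:
        options["writeConcern.journal"] = str(journal).lower()
    if wtimeout_ms is not None:
        options["writeConcern.wTimeoutMS"] = str(wtimeout_ms)
    return options


def write_batch(df, options: dict) -> None:
    """Write a PySpark DataFrame in batch mode (hypothetical helper).

    Assumes connector 10.x ("mongodb" data source) and unprefixed option names.
    """
    df.write.format("mongodb").mode("append").options(**options).save()
```

For example, build_write_options("mongodb://127.0.0.1/", "myDB", "orders", operation_type="update", id_fields=["region", "orderId"]) produces an idFieldList of "region,orderId", and write_batch(orders_df, options) would pass those options to the connector. The names orders, orders_df, region, and orderId are placeholders.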

Specifying Properties in connection.uri

If you use SparkConf to specify any of the previous settings, you can either include them in the connection.uri setting or list them individually.

The following example shows how to specify the database, collection, and maxBatchSize settings as part of the connection.uri setting:

spark.mongodb.write.connection.uri=mongodb://127.0.0.1/myDB.myCollection?maxBatchSize=256

To keep the connection.uri shorter and make the settings easier to read, you can specify them individually instead:

spark.mongodb.write.connection.uri=mongodb://127.0.0.1/
spark.mongodb.write.database=myDB
spark.mongodb.write.collection=myCollection
spark.mongodb.write.maxBatchSize=256
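To make the correspondence between the two forms concrete, the following Python sketch rewrites a connection.uri with embedded settings as individual settings. expand_write_uri is a hypothetical helper, not the connector's own parser. It assumes the URI path has the form <database>.<collection> (MongoDB database names cannot contain a dot, so the first dot separates the two) and moves only the write properties named in the table out of the query string. Other query parameters, such as driver options, stay in the URI.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Write properties from the table that may appear as URI query parameters.
CONNECTOR_WRITE_KEYS = {
    "idFieldList", "maxBatchSize", "mongoClientFactory", "operationType",
    "ordered", "writeConcern.journal", "writeConcern.w", "writeConcern.wTimeoutMS",
}


def expand_write_uri(uri: str) -> dict:
    """Split a URI with embedded settings into individual settings (hypothetical)."""
    parts = urlsplit(uri)
    settings = {}

    # The path holds "<database>.<collection>"; split on the first dot only,
    # because collection names may themselves contain dots.
    namespace = parts.path.lstrip("/")
    if namespace:
        database, _, collection = namespace.partition(".")
        settings["database"] = database
        if collection:
            settings["collection"] = collection

    # Move connector settings out of the query string; keep everything else.
    kept = []
    for key, value in parse_qsl(parts.query):
        if key in CONNECTOR_WRITE_KEYS:
            settings[key] = value
        else:
            kept.append((key, value))

    base = urlunsplit((parts.scheme, parts.netloc, "/", urlencode(kept), ""))
    return {"connection.uri": base, **settings}


if __name__ == "__main__":
    uri = "mongodb://127.0.0.1/myDB.myCollection?maxBatchSize=256"
    for key, value in expand_write_uri(uri).items():
        print(f"spark.mongodb.write.{key}={value}")
```

Run as a script, it prints the same four spark.mongodb.write. settings as the individual form shown above.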

Important

If you specify a setting both in the connection.uri and on its own line, the connection.uri setting takes precedence. For example, with the following configuration, the connector uses the foobar database:

spark.mongodb.write.connection.uri=mongodb://127.0.0.1/foobar
spark.mongodb.write.database=bar
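The following Python sketch models this precedence rule. effective_write_settings is a hypothetical helper that only mirrors the documented behavior, not the connector's implementation: a database or collection in the URI path, and any URI query parameter that is also listed individually, replaces the separately listed value. Keys use the unprefixed property names.

```python
from urllib.parse import parse_qsl, urlsplit


def effective_write_settings(conf: dict) -> dict:
    """Apply 'connection.uri wins' to individually listed settings (hypothetical)."""
    merged = dict(conf)  # start from the individually listed settings
    parts = urlsplit(conf.get("connection.uri", ""))

    # A namespace in the URI path overrides the database/collection keys.
    namespace = parts.path.lstrip("/")
    if namespace:
        database, _, collection = namespace.partition(".")
        merged["database"] = database
        if collection:
            merged["collection"] = collection

    # A query parameter overrides a setting of the same name listed separately.
    for key, value in parse_qsl(parts.query):
        if key in conf:
            merged[key] = value
    return merged


if __name__ == "__main__":
    conf = {"connection.uri": "mongodb://127.0.0.1/foobar", "database": "bar"}
    print(effective_write_settings(conf)["database"])
```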
