Batch Write Configuration Options
Overview
You can configure the following properties when writing data to MongoDB in batch mode.
Note
If you use SparkConf to set the connector's write configurations, prefix spark.mongodb.write. to each property.
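As a small illustration of the prefixing rule in the note, the following Python sketch adds the `spark.mongodb.write.` prefix to a set of unprefixed property names. The helper name `with_write_prefix` and the sample values are hypothetical; only the prefix and the property names come from this page.

```python
WRITE_PREFIX = "spark.mongodb.write."


def with_write_prefix(props):
    """Return a copy of props whose keys carry the write prefix.

    Hypothetical helper for illustration; it is not part of the connector.
    Keys that already carry the prefix are left unchanged.
    """
    return {
        key if key.startswith(WRITE_PREFIX) else WRITE_PREFIX + key: str(value)
        for key, value in props.items()
    }


conf_entries = with_write_prefix({
    "connection.uri": "mongodb://127.0.0.1/",
    "database": "myDB",
    "collection": "myCollection",
    "maxBatchSize": 256,
})

# Print the entries in key=value form, as they would appear in a Spark config.
for key, value in conf_entries.items():
    print(f"{key}={value}")
```

In PySpark, each resulting pair could then be passed to `SparkConf().set(key, value)`.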
Property name | Description | |
---|---|---|
connection.uri | Required. The connection string configuration key. Default: mongodb://localhost:27017/ | |
database | Required. The database name configuration. | |
collection | Required. The collection name configuration. | |
idFieldList | Field or list of fields by which to split the collection data. To specify more than one field, separate the field names with commas. Default: _id | |
maxBatchSize | Specifies the maximum number of operations to batch in bulk operations. Default: 512 | |
mongoClientFactory | MongoClientFactory configuration key. You can specify a custom implementation that must implement the com.mongodb.spark.sql.connector.connection.MongoClientFactory interface. Default: com.mongodb.spark.sql.connector.connection.DefaultMongoClientFactory | |
operationType | Specifies the type of write operation to perform. You can set
this to one of the following values:
Default: replace | |
ordered | Specifies whether to perform ordered bulk operations. Default: true | |
writeConcern.journal | Specifies j, a write-concern option that requests acknowledgment that the data has been written to the on-disk journal for the criteria specified in the w option. You can specify either true or false. For more information on j values, see the MongoDB server guide on the WriteConcern j option. | |
writeConcern.w | Specifies w, a write-concern option to request acknowledgment that the write operation has propagated to a specified number of MongoDB nodes. For a list of allowed values for this option, see WriteConcern in the MongoDB manual. Default: 1 | |
writeConcern.wTimeoutMS | Specifies wTimeoutMS, a write-concern option to return an error when a write operation exceeds the specified number of milliseconds. If you use this optional setting, you must specify a nonnegative integer. For more information on wTimeoutMS values, see the MongoDB server guide on the WriteConcern wtimeout option. | |
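The defaults and constraints in the table can be summarized in a short Python sketch. It is only an illustration of the documented values, not the connector's own validation logic: the function name `resolve_write_options` is hypothetical, and it checks only the constraints stated above (required properties, a nonnegative integer for wTimeoutMS, and true or false for journal).

```python
# Defaults copied from the table above. Properties without a listed default
# (database, collection, writeConcern.journal, writeConcern.wTimeoutMS) are omitted.
DEFAULTS = {
    "connection.uri": "mongodb://localhost:27017/",
    "idFieldList": "_id",
    "maxBatchSize": 512,
    "mongoClientFactory":
        "com.mongodb.spark.sql.connector.connection.DefaultMongoClientFactory",
    "operationType": "replace",
    "ordered": True,
    "writeConcern.w": 1,
}
REQUIRED = ("connection.uri", "database", "collection")


def resolve_write_options(user_options):
    """Merge user options over the documented defaults and check basic constraints.

    Hypothetical helper for illustration only.
    """
    options = {**DEFAULTS, **user_options}

    missing = [name for name in REQUIRED if not options.get(name)]
    if missing:
        raise ValueError("missing required properties: " + ", ".join(missing))

    timeout = options.get("writeConcern.wTimeoutMS")
    if timeout is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(timeout, bool) or not isinstance(timeout, int) or timeout < 0:
            raise ValueError("writeConcern.wTimeoutMS must be a nonnegative integer")

    journal = options.get("writeConcern.journal")
    if journal is not None and not isinstance(journal, bool):
        raise ValueError("writeConcern.journal must be true or false")

    return options


options = resolve_write_options({
    "database": "myDB",
    "collection": "myCollection",
    "writeConcern.wTimeoutMS": 5000,
})
print(options["maxBatchSize"], options["operationType"])
```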
Specifying Properties in connection.uri
If you use SparkConf to specify any of the previous settings, you can either include them in the connection.uri setting or list them individually.
The following code example shows how to specify the database, collection, and maxBatchSize settings as part of the connection.uri setting:
spark.mongodb.write.connection.uri=mongodb://127.0.0.1/myDB.myCollection?maxBatchSize=256
To keep the connection.uri shorter and make the settings easier to read, you can specify them individually instead:
spark.mongodb.write.connection.uri=mongodb://127.0.0.1/
spark.mongodb.write.database=myDB
spark.mongodb.write.collection=myCollection
spark.mongodb.write.maxBatchSize=256
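To make the relationship between the two forms concrete, the following Python sketch assembles the single-URI form from the individual values. The function name `build_write_uri` is hypothetical, and the sketch assumes the simple `mongodb://host/database.collection?option=value` layout used in the examples on this page.

```python
from urllib.parse import urlencode


def build_write_uri(host, database, collection, **options):
    """Compose a connection URI that embeds database, collection, and options.

    Hypothetical helper; assumes the URI layout shown in the examples above.
    """
    uri = f"mongodb://{host}/{database}.{collection}"
    if options:
        # Append the remaining settings as query parameters.
        uri += "?" + urlencode(options)
    return uri


uri = build_write_uri("127.0.0.1", "myDB", "myCollection", maxBatchSize=256)
print(f"spark.mongodb.write.connection.uri={uri}")
```

With these inputs, the printed line matches the single-URI configuration shown earlier in this section.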
Important
If you specify a setting in both the connection.uri and on its own line, the connection.uri setting takes precedence.
For example, in the following configuration, the connection database is foobar:
spark.mongodb.write.connection.uri=mongodb://127.0.0.1/foobar
spark.mongodb.write.database=bar
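The precedence rule can be modeled in a few lines of Python. This is a sketch of the documented behavior only, not the connector's actual parsing code: the function names `settings_from_uri` and `effective_settings` are hypothetical, and the sketch treats every query parameter in the URI as a setting.

```python
from urllib.parse import parse_qsl, urlsplit

PREFIX = "spark.mongodb.write."


def settings_from_uri(uri):
    """Extract database, collection, and query options embedded in a URI."""
    parts = urlsplit(uri)
    # The path has the form /database or /database.collection.
    database, _, collection = parts.path.lstrip("/").partition(".")
    found = dict(parse_qsl(parts.query))
    if database:
        found["database"] = database
    if collection:
        found["collection"] = collection
    return found


def effective_settings(conf):
    """Merge individual settings with URI-embedded ones; the URI wins on conflict."""
    individual = {
        key[len(PREFIX):]: value
        for key, value in conf.items()
        if key.startswith(PREFIX)
    }
    from_uri = settings_from_uri(individual.get("connection.uri", ""))
    # Later entries override earlier ones, so URI values take precedence.
    return {**individual, **from_uri}


conf = {
    "spark.mongodb.write.connection.uri": "mongodb://127.0.0.1/foobar",
    "spark.mongodb.write.database": "bar",
}
print(effective_settings(conf)["database"])
```

Running the sketch on the configuration above yields foobar as the database, because the value embedded in connection.uri overrides the separately listed one.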