Docs Home → MongoDB Spark Connector
Write to MongoDB
On this page
To create a DataFrame, first create a SparkSession object, then use the object's createDataFrame()
function.
The sparkR
shell provides a default SparkSession object called
spark
.
To create a DataFrame, use the createDataFrame
method
to convert an R data.frame
to a Spark DataFrame. To save the
DataFrame to MongoDB, use the write.df()
method:
charactersRdf <- data.frame(list(name=c("Bilbo Baggins", "Gandalf", "Thorin", "Balin", "Kili", "Dwalin", "Oin", "Gloin", "Fili", "Bombur"), age=c(50, 1000, 195, 178, 77, 169, 167, 158, 82, NA))) charactersSparkdf <- createDataFrame(charactersRdf) write.df(charactersSparkdf, "", source = "com.mongodb.spark.sql.DefaultSource", mode = "overwrite")
Note
The empty argument ("") refers to a file to use as a data source. In this case our data source is a MongoDB collection, so the data source argument is empty.
The above operation writes to the MongoDB database and collection
specified in the spark.mongodb.output.uri option
specified in the sparkR
shell arguments or SparkSession
configuration.
To read the first few rows of the DataFrame, use the head()
method.
head(charactersSparkdf)
The operation prints the following output:
name age 1 Bilbo Baggins 50 2 Gandalf 1000 3 Thorin 195 4 Balin 178 5 Kili 77 6 Dwalin 169
The printSchema()
method prints out the DataFrame's schema:
printSchema(charactersSparkdf)
In the sparkR
shell, the operation prints the following output:
root |-- name: string (nullable = true) |-- age: double (nullable = true)
Writing with Options
You can add arguments to the write.df()
method to specify
a MongoDB database and collection.
The following operation writes the charactersSparkdf
data to
a MongoDB collection called ages
in a database called
characters
.
write.df(charactersSparkdf, "", source = "com.mongodb.spark.sql.DefaultSource", mode = "overwrite", database = "characters", collection = "ages")