Read from MongoDB
You can create a Spark DataFrame to hold data from the MongoDB
collection specified in the
spark.mongodb.input.uri option that your
SparkSession
is using.
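If you are unsure which connection string your session is configured with, you can inspect the runtime configuration. The following is a minimal sketch; the fallback value passed as the second argument is only an illustration.
# Sketch: inspect the input URI configured for the current SparkSession.
# Assumes the sparkR shell was started with spark.mongodb.input.uri set.
sparkR.conf("spark.mongodb.input.uri", "<not set>")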
Consider a collection named fruit
that contains the
following documents:
{ "_id" : 1, "type" : "apple", "qty" : 5 } { "_id" : 2, "type" : "orange", "qty" : 10 } { "_id" : 3, "type" : "banana", "qty" : 15 }
Load the collection into a DataFrame with read.df()
from within the sparkR
shell.
df <- read.df("", source = "com.mongodb.spark.sql.DefaultSource")
Note
The empty argument ("") refers to the file to use as a data source. In this case our data source is a MongoDB collection rather than a file, so the file argument is left empty.
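To confirm that the collection loaded, you can view the first few documents of the DataFrame. This short usage sketch assumes the input URI of your session points at the fruit collection shown above.
# View the first rows of the DataFrame to confirm the load.
head(df)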
Spark samples the records to infer the schema of the collection. The following operation prints the schema to the console:
printSchema(df)
The operation produces the following shell output:
root
 |-- _id: double (nullable = true)
 |-- qty: double (nullable = true)
 |-- type: string (nullable = true)
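Because the schema is inferred from sampled documents, numeric fields may come back as double, as shown above. If you know the shape of your documents in advance, you can supply an explicit schema instead of relying on sampling. The following is a sketch; it assumes the connector honors the schema argument of read.df(), and the field types chosen here are illustrative.
# Sketch: supply an explicit schema instead of relying on sampling.
# Assumes the connector accepts the schema argument of read.df() and that
# the stored values are compatible with the declared types.
fruitSchema <- structType(
  structField("_id", "integer"),
  structField("type", "string"),
  structField("qty", "integer")
)
df <- read.df("", source = "com.mongodb.spark.sql.DefaultSource", schema = fruitSchema)
printSchema(df)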
Reading with Options
You can add arguments to the read.df() method to specify
a MongoDB database and collection. The following example reads
from a collection called contacts in a database called people.
df <- read.df("", source = "com.mongodb.spark.sql.DefaultSource", database = "people", collection = "contacts")