Read from MongoDB
You can create a Spark DataFrame to hold data from the MongoDB
collection specified in the
spark.mongodb.input.uri option that your
SparkSession
is using.
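If you are unsure which connection string your session is configured with, you can inspect the runtime configuration. The following is a minimal sketch; the fallback value passed as the second argument is only an illustration.
# Sketch: inspect the input URI configured for the current SparkSession.
# Assumes the sparkR shell was started with spark.mongodb.input.uri set.
sparkR.conf("spark.mongodb.input.uri", "<not set>")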
Consider a collection named fruit
that contains the
following documents:
{ "_id" : 1, "type" : "apple", "qty" : 5 } { "_id" : 2, "type" : "orange", "qty" : 10 } { "_id" : 3, "type" : "banana", "qty" : 15 }
Load the collection into a DataFrame with read.df()
from within the sparkR
shell.
df <- read.df("", source = "com.mongodb.spark.sql.DefaultSource")
Note
The empty argument ("") refers to the file to use as a data source. In this case our data source is a MongoDB collection rather than a file, so the file argument is left empty.
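To confirm that the collection loaded, you can view the first few documents of the DataFrame. This short usage sketch assumes the input URI of your session points at the fruit collection shown above.
# View the first rows of the DataFrame to confirm the load.
head(df)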
Spark samples the records to infer the schema of the collection. The following operation prints the schema to the console:
printSchema(df)
The operation produces the following shell output:
root
 |-- _id: double (nullable = true)
 |-- qty: double (nullable = true)
 |-- type: string (nullable = true)
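Because the schema is inferred from sampled documents, numeric fields may come back as double, as shown above. If you know the shape of your documents in advance, you can supply an explicit schema instead of relying on sampling. The following is a sketch; it assumes the connector honors the schema argument of read.df(), and the field types chosen here are illustrative.
# Sketch: supply an explicit schema instead of relying on sampling.
# Assumes the connector accepts the schema argument of read.df() and that
# the stored values are compatible with the declared types.
fruitSchema <- structType(
  structField("_id", "integer"),
  structField("type", "string"),
  structField("qty", "integer")
)
df <- read.df("", source = "com.mongodb.spark.sql.DefaultSource", schema = fruitSchema)
printSchema(df)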
Reading with Options
You can add arguments to the read.df() method to specify
a MongoDB database and collection. The following example reads
from a collection called contacts in a database called people.
df <- read.df("", source = "com.mongodb.spark.sql.DefaultSource", database = "people", collection = "contacts")