Read From MongoDB
Use the MongoSpark.load method to create an RDD representing a collection.

The following example loads the collection specified in the SparkConf:
val rdd = MongoSpark.load(sc)
println(rdd.count)
println(rdd.first.toJson)
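This snippet assumes a SparkContext whose SparkConf carries the connector's spark.mongodb.input.uri setting. A minimal sketch of that setup, in which the host, database, and collection in the URI are placeholders for your own deployment:

import org.apache.spark.{SparkConf, SparkContext}
import com.mongodb.spark._

// Placeholder URI: point spark.mongodb.input.uri at your own deployment.
val conf = new SparkConf()
  .setAppName("ReadFromMongoDB")
  .setMaster("local[*]")
  .set("spark.mongodb.input.uri", "mongodb://127.0.0.1/test.myCollection")
val sc = new SparkContext(conf)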
To specify a different collection, database, and other read configuration settings, pass a ReadConfig to MongoSpark.load().
Using a ReadConfig
MongoSpark.load() can accept a ReadConfig object which specifies various read configuration settings, such as the collection or the read preference. The following example reads from the spark collection with a secondaryPreferred read preference:
import com.mongodb.spark.config._

val readConfig = ReadConfig(Map("collection" -> "spark", "readPreference.name" -> "secondaryPreferred"), Some(ReadConfig(sc)))
val customRdd = MongoSpark.load(sc, readConfig)
println(customRdd.count)
println(customRdd.first.toJson)
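Beyond connection-level settings, the RDD returned by MongoSpark.load can push filtering to the server with withPipeline. A brief sketch, in which the test field is a hypothetical stand-in for a field in your own collection:

import org.bson.Document

// Run a $match stage on the server so only matching documents are returned.
// The "test" field is a placeholder; substitute a field from your collection.
val aggregatedRdd = customRdd.withPipeline(Seq(Document.parse("{ '$match': { 'test' : { '$gt' : 5 } } }")))
println(aggregatedRdd.count)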
SparkContext Load Helper Methods
SparkContext has an implicit helper method loadFromMongoDB() to load data from MongoDB. For example, use the loadFromMongoDB() method without any arguments to load the collection specified in the SparkConf:
sc.loadFromMongoDB() // Uses the SparkConf for configuration
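Because the helper is added to SparkContext implicitly, the connector's package object must be in scope before calling it:

import com.mongodb.spark._ // brings the implicit loadFromMongoDB() helper into scope

val rdd = sc.loadFromMongoDB()
println(rdd.count)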
Call loadFromMongoDB() with a ReadConfig object to specify a different MongoDB server address, database, and collection. See the input configuration settings for the available options:
sc.loadFromMongoDB(ReadConfig(Map("uri" -> "mongodb://example.com/database.collection"))) // Uses the ReadConfig
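As with MongoSpark.load, the returned RDD can then be inspected directly. A short usage sketch, reusing the placeholder URI from the line above:

import com.mongodb.spark._
import com.mongodb.spark.config._

// "example.com" and "database.collection" are placeholders; substitute
// your own server address, database, and collection.
val customRdd = sc.loadFromMongoDB(ReadConfig(Map("uri" -> "mongodb://example.com/database.collection")))
println(customRdd.count)
println(customRdd.first.toJson)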