Aggregation
Use MongoDB's aggregation pipeline to apply filtering rules and perform aggregation operations when reading data from MongoDB into Spark.
Consider a collection named fruit that contains the following documents:
{ "_id" : 1, "type" : "apple", "qty" : 5 } { "_id" : 2, "type" : "orange", "qty" : 10 } { "_id" : 3, "type" : "banana", "qty" : 15 }
Add a pipeline argument to read.df() from within the sparkR shell to specify an aggregation pipeline to use when creating a DataFrame.
# Define the aggregation pipeline as a JSON string, then pass it to read.df()
agg_pipeline <- "{'$match': {'type': 'apple'}}"
df <- read.df("", source = "com.mongodb.spark.sql.DefaultSource", pipeline = agg_pipeline)
head(df)
Note
The empty string ("") is the path argument, which normally specifies a file to use as the data source. Because the data source here is a MongoDB collection rather than a file, the path argument is left empty.
In the sparkR shell, the operation prints the following output:
  _id qty  type
1   1   5 apple
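Aggregation pipelines can also contain more than one stage. The following is a minimal sketch, assuming the pipeline option also accepts a JSON array of stage documents (an assumption, not something shown on this page); it filters out apples and then sums the quantities of the remaining fruit:

# Hypothetical multi-stage pipeline, assumed to be accepted as a JSON array:
# keep non-apple documents, then sum their qty values into a single total.
agg_pipeline <- "[
  {'$match': {'type': {'$ne': 'apple'}}},
  {'$group': {'_id': null, 'total': {'$sum': '$qty'}}}
]"
df <- read.df("", source = "com.mongodb.spark.sql.DefaultSource", pipeline = agg_pipeline)
head(df)

With the sample fruit collection above, such a pipeline would produce a single document whose total field is 25 (10 + 15).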