Filters and SQL
Filters
Note
When using filters with DataFrames or the R API, the underlying Mongo Connector code constructs an aggregation pipeline to filter the data in MongoDB before sending it to Spark.
Use filter() to read a subset of data from your MongoDB collection.
Consider a collection named fruit that contains the following documents:
{ "_id" : 1, "type" : "apple", "qty" : 5 } { "_id" : 2, "type" : "orange", "qty" : 10 } { "_id" : 3, "type" : "banana", "qty" : 15 }
First, set up a DataFrame that connects to your default MongoDB data source:
df <- read.df("", source = "com.mongodb.spark.sql.DefaultSource")
Note
The empty argument ("") refers to the file to use as a data source. In this case, the data source is a MongoDB collection rather than a file, so the argument is left empty.
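The connector also lets you name the database and collection directly instead of relying on the defaults from your Spark session configuration. The following is a minimal sketch, assuming the fruit collection lives in a database named test; the database and collection names here are illustrative, and the option names are passed through to the MongoDB data source:

# Sketch: read a specific database and collection instead of the session
# defaults. The "test" database name is an assumption for illustration.
df <- read.df("", source = "com.mongodb.spark.sql.DefaultSource",
              database = "test", collection = "fruit")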
The following operation filters the data and includes records where the qty field is greater than or equal to 10:
head(filter(df, df$qty >= 10))
The operation prints the following output:
  _id qty   type
1   2  10 orange
2   3  15 banana
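As the note at the top of this page explains, the connector translates this filter into an aggregation pipeline so that MongoDB, not Spark, does the filtering. If you want to check the pushdown, one option is SparkR's explain() function, shown in this sketch; the exact plan output depends on your Spark and connector versions:

# Print the query plan for the filtered DataFrame. The plan lists the
# MongoDB relation with the qty filter pushed down to the data source.
explain(filter(df, df$qty >= 10))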
SQL
Before running SQL queries on your dataset, you must register a temporary view for the dataset.
The following example registers a temporary view named temp, then uses SQL to query for records in which the type field contains the letter e:
createOrReplaceTempView(df, "temp")
some_fruit <- sql("SELECT type, qty FROM temp WHERE type LIKE '%e%'")
head(some_fruit)
In the sparkR shell, the operation prints the following output:
    type qty
1  apple   5
2 orange  10
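Any Spark SQL statement that works against a temporary view works here as well. The following is a minimal sketch, assuming the temp view registered above; it totals the qty field per type and returns the result to the driver as a local R data.frame with collect():

# Aggregate with SQL, then bring the result back to the driver as a
# local R data.frame.
totals <- sql("SELECT type, SUM(qty) AS total_qty FROM temp GROUP BY type")
collect(totals)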