MongoDB Aggregation Pipeline Queries vs SQL Queries
Joe Karlsson7 min read • Published Feb 07, 2022 • Updated May 10, 2022
Rate this tutorial
Let's be honest: Many devs coming to MongoDB are joining the community
with a strong background in SQL. I would personally include myself in
this subset of MongoDB devs. I think it's useful to map terms and
concepts you might be familiar with in SQL to help
"translate"
your work into MongoDB Query Language (MQL). More specifically, in this
post, I will be walking through translating the MongoDB Aggregation
Pipeline from SQL.
The aggregation framework allows you to analyze your data in real time.
Using the framework, you can create an aggregation pipeline that
consists of one or more
stages.
Each stage transforms the documents and passes the output to the next
stage.
If you're familiar with the Unix pipe |, you can think of the
aggregation pipeline as a very similar concept. Just as output from one
command is passed as input to the next command when you use piping,
output from one stage is passed as input to the next stage when you use
the aggregation pipeline.
SQL is a declarative language. You have to declare what you want to
see—that's why SELECT comes first. You have to think in sets, which can
be difficult, especially for functional programmers. With MongoDB's
aggregation pipeline, you can have stages that reflect how you think—for
example, "First, let's group by X. Then, we'll get the top 5 from every
group. Then, we'll arrange by price." This is a difficult query to do in
SQL, but much easier using the aggregation pipeline framework.
If you want to check out another great introduction to the MongoDB
Aggregation Pipeline, be sure to check out Introduction to the MongoDB
Aggregation
Framework.
The following table provides an overview of common SQL aggregation
terms, functions, and concepts and the corresponding MongoDB
aggregation
operators:
SQL Terms, Functions, and Concepts | MongoDB Aggregation Operators |
---|---|
WHERE | $match |
GROUP BY | $group |
HAVING | $match |
SELECT | $project |
LIMIT | $limit |
OFFSET | $skip |
ORDER BY | $sort |
SUM() | $sum |
COUNT() | $sum and $sortByCount |
JOIN | $lookup |
SELECT INTO NEW_TABLE | $out |
MERGE INTO TABLE | $merge (Available starting in MongoDB 4.2) |
UNION ALL | $unionWith (Available starting in MongoDB 4.4) |
Alright, now that we've covered the basics of MongoDB Aggregations,
let's jump into some examples.
The SQL examples assume two tables, album and songs, that join by
the song.album_id and the songs.id columns. Here's what the tables
look like:
id | name | band_name | price | status |
---|---|---|---|---|
1 | lo-fi chill hop songs to study to | Silicon Infinite | 2.99 | A |
2 | Moon Rocks | Silicon Infinite | 1.99 | B |
3 | Flavour | Organical | 4.99 | A |
id | title | plays | album_id |
---|---|---|---|
1 | Snow Beats | 133 | 1 |
2 | Rolling By | 242 | 1 |
3 | Clouds | 3191 | 1 |
4 | But First Coffee | 562 | 3 |
5 | Autumn | 901 | 3 |
6 | Milk Toast | 118 | 2 |
7 | Purple Mic | 719 | 2 |
8 | One Note Dinner Party | 1242 | 2 |
I used a site called SQL Fiddle,
and used PostgreSQL 9.6 for all of my examples. However, feel free to
run these sample SQL snippets wherever you feel most comfortable. In
fact, this is the code I used to set up and seed my tables with our
sample data:
1 -- Creating the main albums table 2 CREATE TABLE IF NOT EXISTS albums ( 3 id BIGSERIAL NOT NULL UNIQUE PRIMARY KEY, 4 name VARCHAR(40) NOT NULL UNIQUE, 5 band_name VARCHAR(40) NOT NULL, 6 price float8 NOT NULL, 7 status VARCHAR(10) NOT NULL 8 ); 9 10 -- Creating the songs table 11 CREATE TABLE IF NOT EXISTS songs ( 12 id SERIAL PRIMARY KEY NOT NULL, 13 title VARCHAR(40) NOT NULL, 14 plays integer NOT NULL, 15 album_id BIGINT NOT NULL REFERENCES albums ON DELETE RESTRICT 16 ); 17 18 INSERT INTO albums (name, band_name, price, status) 19 VALUES 20 ('lo-fi chill hop songs to study to', 'Silicon Infinite', 7.99, 'A'), 21 ('Moon Rocks', 'Silicon Infinite', 1.99, 'B'), 22 ('Flavour', 'Organical', 4.99, 'A'); 23 24 INSERT INTO songs (title, plays, album_id) 25 VALUES 26 ('Snow Beats', 133, (SELECT id from albums WHERE name='lo-fi chill hop songs to study to')), 27 ('Rolling By', 242, (SELECT id from albums WHERE name='lo-fi chill hop songs to study to')), 28 ('Clouds', 3191, (SELECT id from albums WHERE name='lo-fi chill hop songs to study to')), 29 ('But First Coffee', 562, (SELECT id from albums WHERE name='Flavour')), 30 ('Autumn', 901, (SELECT id from albums WHERE name='Flavour')), 31 ('Milk Toast', 118, (SELECT id from albums WHERE name='Moon Rocks')), 32 ('Purple Mic', 719, (SELECT id from albums WHERE name='Moon Rocks')), 33 ('One Note Dinner Party', 1242, (SELECT id from albums WHERE name='Moon Rocks'));
The MongoDB examples assume one collection
albums
that contains
documents with the following schema:1 { 2 name : 'lo-fi chill hop songs to study to', 3 band_name: 'Silicon Infinite', 4 price: 7.99, 5 status: 'A', 6 songs: [ 7 { title: 'Snow beats', 'plays': 133 }, 8 { title: 'Rolling By', 'plays': 242 }, 9 { title: 'Sway', 'plays': 3191 } 10 ] 11 }
For this post, I did all of my prototyping in a MongoDB Visual Studio
Code plugin playground. For more information on how to use a MongoDB
Playground in Visual Studio Code, be sure to check out this post: How
To Use The MongoDB Visual Studio Code
Plugin.
Once you have your playground all set up, you can use this snippet to
set up and seed your collection. You can also follow along with this
demo by using the MongoDB Web
Shell.
1 // Select the database to use. 2 use('mongodbVSCodePlaygroundDB'); 3 4 // The drop() command destroys all data from a collection. 5 // Make sure you run it against the correct database and collection. 6 db.albums.drop(); 7 8 // Insert a few documents into the albums collection. 9 db.albums.insertMany([ 10 { 11 'name' : 'lo-fi chill hop songs to study to', band_name: 'Silicon Infinite', price: 7.99, status: 'A', 12 songs: [ 13 { title: 'Snow beats', 'plays': 133 }, 14 { title: 'Rolling By', 'plays': 242 }, 15 { title: 'Clouds', 'plays': 3191 } 16 ] 17 }, 18 { 19 'name' : 'Moon Rocks', band_name: 'Silicon Infinite', price: 1.99, status: 'B', 20 songs: [ 21 { title: 'Milk Toast', 'plays': 118 }, 22 { title: 'Purple Mic', 'plays': 719 }, 23 { title: 'One Note Dinner Party', 'plays': 1242 } 24 ] 25 }, 26 { 27 'name' : 'Flavour', band_name: 'Organical', price: 4.99, status: 'A', 28 songs: [ 29 { title: 'But First Coffee', 'plays': 562 }, 30 { title: 'Autumn', 'plays': 901 } 31 ] 32 }, 33 ]);
1 SELECT COUNT(*) AS count 2 FROM albums
1 db.albums.aggregate( [ 2 { 3 $group: { 4 _id: null, // An _id value of null on the $group operator accumulates values for all the input documents as a whole. 5 count: { $sum: 1 } 6 } 7 } 8 ] );
1 SELECT SUM(price) AS total 2 FROM albums
1 db.albums.aggregate( [ 2 { 3 $group: { 4 _id: null, 5 total: { $sum: "$price" } 6 } 7 } 8 ] );
1 SELECT band_name, 2 SUM(price) AS total 3 FROM albums 4 GROUP BY band_name
1 db.albums.aggregate( [ 2 { 3 $group: { 4 _id: "$band_name", 5 total: { $sum: "$price" } 6 } 7 } 8 ] );
1 SELECT band_name, 2 SUM(price) AS total 3 FROM albums 4 GROUP BY band_name 5 ORDER BY total
1 db.albums.aggregate( [ 2 { 3 $group: { 4 _id: "$band_name", 5 total: { $sum: "$price" } 6 } 7 }, 8 { $sort: { total: 1 } } 9 ] );
1 SELECT band_name, 2 count(*) 3 FROM albums 4 GROUP BY band_name 5 HAVING count(*) > 1;
1 db.albums.aggregate( [ 2 { 3 $group: { 4 _id: "$band_name", 5 count: { $sum: 1 } 6 } 7 }, 8 { $match: { count: { $gt: 1 } } } 9 ] );
1 SELECT band_name, 2 SUM(price) as total 3 FROM albums 4 WHERE status = 'A' 5 GROUP BY band_name
1 db.albums.aggregate( [ 2 { $match: { status: 'A' } }, 3 { 4 $group: { 5 _id: "$band_name", 6 total: { $sum: "$price" } 7 } 8 } 9 ] );
1 SELECT band_name, 2 SUM(price) as total 3 FROM albums 4 WHERE status = 'A' 5 GROUP BY band_name 6 HAVING SUM(price) > 5.00;
1 db.albums.aggregate( [ 2 { $match: { status: 'A' } }, 3 { 4 $group: { 5 _id: "$band_name", 6 total: { $sum: "$price" } 7 } 8 }, 9 { $match: { total: { $gt: 5.00 } } } 10 ] );
1 SELECT band_name, 2 SUM(songs.plays) as total_plays 3 FROM albums, 4 songs 5 WHERE songs.album_id = albums.id 6 GROUP BY band_name;
1 db.albums.aggregate( [ 2 { $unwind: "$songs" }, 3 { 4 $group: { 5 _id: "$band_name", 6 qty: { $sum: "$songs.plays" } 7 } 8 } 9 ] );
1 SELECT name, title, plays 2 FROM songs s1 INNER JOIN albums ON (album_id = albums.id) 3 WHERE plays=(SELECT MAX(s2.plays) 4 FROM songs s2 5 WHERE s1.album_id = s2.album_id) 6 ORDER BY name;
1 db.albums.aggregate( [ 2 { $project: 3 { 4 name: 1, 5 plays: { 6 $filter: { 7 input: "$songs", 8 as: "item", 9 cond: { $eq: ["$item.plays", { $max: "$songs.plays" }] } 10 } 11 } 12 } 13 } 14 ] );
This post is in no way a complete overview of all the ways that MongoDB
can be used like a SQL-based database. This was only meant to help devs
in SQL land start to make the transition over to MongoDB with some basic
queries using the aggregation pipeline. The aggregation framework has
many other powerful stages, including
$count,
$geoNear,
$graphLookup,
$project,
$unwind,
and others.
If you want to get better at using the MongoDB Aggregation Framework, be
sure to check out MongoDB University: M121 - The MongoDB Aggregation
Framework. Or,
better yet, try to use some advanced MongoDB aggregation pipeline
queries in your next project! If you have any questions, be sure to head
over to the MongoDB Community
Forums. It's the
best place to get your MongoDB questions answered.
- MongoDB University: M121 - The MongoDB Aggregation Framework: https://university.mongodb.com/courses/M121/about
- How to Use Custom Aggregation Expressions in MongoDB 4.4: https://developer.mongodb.com/how-to/use-function-accumulator-operators
- Introduction to the MongoDB Aggregation Framework: https://developer.mongodb.com/quickstart/introduction-aggregation-framework
- How to Use the Union All Aggregation Pipeline Stage in MongoDB 4.4: https://developer.mongodb.com/how-to/use-union-all-aggregation-pipeline-stage
- Aggregation Framework with Node.js Tutorial: https://developer.mongodb.com/quickstart/node-aggregation-framework
- Aggregation Pipeline Quick Reference: https://docs.mongodb.com/manual/meta/aggregation-quick-reference
- SQL to Aggregation Mapping Chart: https://docs.mongodb.com/manual/reference/sql-aggregation-comparison
- SQL to MongoDB Mapping Chart: https://docs.mongodb.com/manual/reference/sql-comparison
- Questions? Comments? We'd love to connect with you. Join the conversation on the MongoDB Community Forums: https://developer.mongodb.com/community/forums