Atlas SQL Language Reference
On this page
- Compatability and Limitations
- Type System
- Data Types
- Type Conversions
- Schemas
- Clauses
- SELECT
- FROM
- WHERE
- GROUP BY
- HAVING
- ORDER BY
- LIMIT and OFFSET
- Set operations
- Expressions
- Identifiers
- Aliases
- Keywords
- Literals
- Parenthesized Expressions
- Operators
- Subquery Expressions
- Document and Field-Access Expressions
- Null and Missing
- Comments
This page outlines the syntax and semantics of Atlas SQL, a SQL dialect designed to provide first-class support for working with MongoDB's flexible document model. This page lists and describes supported clauses, operators, expressions, and functions.
Compatability and Limitations
Atlas SQL is based on the SQL-92 standard. However Atlas SQL is not fully SQL-92 compatible due to the following limitations:
The
UNION
function is not supported. However,UNION ALL
is supported.The
date
data type is not supported. Usetimestamp
instead.SELECT DISTINCT
is not supported.Interval and date interval arithmetic are not supported.
Atlas SQL does not support Atlas Vector Search and Atlas Search.
Type System
Data Types
Atlas SQL data types are the set of BSON types. All of these types can be queried in Atlas SQL. They are:
String (
STRING
)Document (
DOCUMENT
)Array (
ARRAY
)BinData (
BINDATA
)ObjectId (
OBJECTID
)Boolean (
BOOL
)Date (
TIMESTAMP
)Null (
NULL
)Regex (
REGEX
)32-bit integer (
INT
)Double (
DOUBLE
)Long (
LONG
)Timestamp (
BSON_TIMESTAMP
)Decimal (
DECIMAL
)MinKey (
MINKEY
)MaxKey (
MAXKEY
)DBPointer (
DBPOINTER
)Symbol (
SYMBOL
)Javascript with scope (
JAVASCRIPTWITHSCOPE
)JavaScript (
JAVASCRIPT
)
Each type in Atlas SQL has a name (in parentheses above), which is a keyword that can be used to
reference the type when necessary (e.g. in an expression like CAST
).
Type Conversions
Explicit type conversions are expressed via the CAST
function or the ::
operator.
Numeric types are all mutually comparable;
Atlas SQL allows operations between the various numeric types without casting the operands to be the same numeric type.
Schemas
Atlas SQL converts MongoDB's flexible document values into types using a schema. An Atlas SQL schema is a collection of facts about an expression or collection that are known to be true at compile time.
For example, an Atlas SQL schema might dictate that an expression is either a boolean or a document with subfields, or that an expression is either an array with a length of one or a positive integer.
If a static type constraint is not satisfied, the query will fail to compile.
Clauses
Atlas SQL queries support a basic set of SQL clauses. The available clauses are:
SELECT
FROM
WHERE
GROUP BY
HAVING
ORDER BY
OFFSET
LIMIT
SELECT
SELECT
begins every Atlas SQL query.
Atlas SQL allows SELECT VALUE
and SELECT VALUES
to be used interchangeably.
Atlas SQL requires nested SELECT
statements to have an alias.
SELECT foo FROM (SELECT bar FROM baz) as subSelect
CAST()
Atlas SQL supports the CAST()
funtion, which allows you to dynamically
convert values in your query to a given data type.
SELECT * FROM table WHERE period_start_utc >= CAST('2023-01-01T00:00:00.000Z' AS TIMESTAMP)
FROM
FROM
is the first clause evaluated in every Atlas SQL query.
FROM
can pull data from various sources including collections
(SELECT * FROM foo)
, arrays (SELECT * FROM [{'a': 1}])
, joins
(SELECT * FROM a JOIN b)
, derived tables
(SELECT * FROM (SELECT a FROM foo) d)
, and FLATTEN and UNWIND.
WHERE
The WHERE
clause is a filter on incoming data.
Its expression must statically have type BOOL
or NULL
and may
evaluate to MISSING
.
GROUP BY
GROUP BY
provides a means for grouping and aggregating data.
If using GROUP BY
on a field that has been flattened using FLATTEN or UNWIND,
an alias is required.
SELECT customerAge, COUNT(*) FROM Sales GROUP BY customer.age AS customerAge
Aggregation Functions
Atlas SQL supports the following aggregation functions.
Name | Description | Notes |
---|---|---|
ADD_TO_ARRAY | Pushes the argument to the end of an array.
The total output of this function will be an array. | The argument to ADD_TO_ARRAY can have any type. |
ADD_TO_SET | Pushes the argument to the end of an array removing duplicates.
The total output of this function will be an array with all duplicate items removed.
Duplicates are determined using the = operator. | The argument to ADD_TO_SET can have any type. |
AVG | Returns the average of all the arguments. | The argument must be statically typed to a numeric type. |
COUNT | Counts the number of elements.
COUNT(*) counts all values unconditionally.
COUNT(<expression>) counts all values for which the expression does not result in NULL or MISSING . | The argument to COUNT can have any type. |
FIRST | Returns the first element in the group. | Deterministic only when the input has deterministic order, otherwise undefined. |
LAST | Returns the first element in the group.
Deterministic only when the input has deterministic order, otherwise undefined. | The argument to LAST can have any type. |
MAX | Returns the maximum element as ordered by the Atlas SQL > operator. | The argument must be statically typed to be comparable via the > operator. |
MERGE_DOCUMENTS | Returns a document formed by successively merging documents,
with the previous element used as the left hand side.
In the case of duplicate keys, the value of the key in the new element is kept.
As with FIRST and LAST ,
the output is only deterministic when the input has deterministic ordering. | The argument must be statically typed as DOCUMENT ,
and thus MERGE_DOCUMENTS(DISTINCT x) is not supported. |
MIN | Returns the minimum element as ordered by the Atlas SQL < operator. | The argument must be statically typed to be comparable via the < operator. |
STDDEV_POP | Returns the standard deviation of all elements over the entire group population. | The argument must be statically typed to a numeric type.
See stdDevPop. |
STDDEV_SAMP | Returns the standard deviation of a sample of all elements in the group.
See stdDevPop. | The argument must be statically typed to a numeric type. |
SUM | Returns the sum of all the arguments. | The argument must be statically typed to a numeric type. |
HAVING
The HAVING
clause operates in the same way as a WHERE
clause, but after the GROUP BY
clause.
Like the WHERE
clause, the HAVING
clause takes an expression
that must statically have type BOOL
or NULL
and may evaluate to MISSING
.
It can reference aliases defined in the GROUP BY
and can contain expressions with aggregation functions.
Only aliases defined in the GROUP BY
are available to the HAVING
clause.
ORDER BY
The ORDER BY
clause provides a way to order a result set by one or more sort keys.
Each sort key can be either a column reference,
or an integer literal referring to a SELECT
expression by its position in the select expression list.
Sort keys that are column references can be compound identifiers.
These compound identifiers can be qualified with datasource names or refer to document subfields.
Atlas SQL sorts MISSING
before NULL
, and NULL
before all other values.
The ORDER BY
clause requires that all possible values in a sort key expression
can be statically verified to be comparable via the >
(greater than) and <
(less than) operators.
LIMIT and OFFSET
The LIMIT
and OFFSET
clauses allow users to retrieve only some of the rows returned by a query.
If a LIMIT
number is provided, no more than that number of rows will be returned.
If an OFFSET
number is provided, that number of rows is skipped before returning rows.
Both LIMIT
and OFFSET
numbers must be positive integers.
Using LIMIT
or OFFSET
without ORDER BY
does not guarantee the same result.
When LIMIT
and OFFSET
are both set,
the OFFSET
rows will be skipped before returning the rest of the results,
which should contain no more than the LIMIT
number rows.
LIMIT i, j
is a shorter form of LIMIT i OFFSET j
.
LIMIT
and OFFSET
can be used in subqueries.
Set operations
The UNION ALL
set operator returns a single result set for two SELECT
queries.
The UNION ALL
operator does not remove duplicate rows from the result set.
The result set returned by the UNION ALL
operator does not have a defined order.
Atlas SQL does not support distinct UNION
, INTERSECT
, or EXCEPT
set operations.
Expressions
Identifiers
Identifiers in Atlas SQL refer to databases, tables, and columns.
Atlas SQL identifiers support all UTF-8 characters except the null character \x00
.
In Atlas SQL, some identifiers are restricted to avoid conflicts
with characters that have other semantic meaning; for an identifier to include such a character, it must be delimited,
that is, surrounded by double quotes or backticks.
For example, an identifier must be delimited if it begins with a digit or if it conflicts with a reserved keyword
(e.g. "10cent"
).
Identifiers are case-sensitive, whether delimited or not.
Aliases
Identifiers are used for all aliases in Atlas SQL.
In most cases, Atlas SQL returns an error if an alias is used more than once in the same clause.
The exception to this is that aliases can be repeated on both sides of a UNION ALL
.
This also applies to automatically generated aliases.
Keywords
Atlas SQL keywords (such as SELECT
, FROM
, JOIN
, etc.) cannot be used as undelimited identifiers.
Literals
Atlas SQL supports literals for booleans, null, numbers, and strings.
Strings are enclosed in single quotes.
To include a single quote character in a string literal, double it ('o''clock'
).
Literal integers are typed as INT
when within the 32-bit signed integer range and LONG
otherwise.
Literal floating point numbers or scientific notation numbers have type DOUBLE
.
Atlas SQL does not support literals for every type.
For example OBJECTID
, BSON_DATE
, and DECIMAL
have no literal syntax.
For such types, pseudo-literal values can be obtained by using the CAST
operator
to get from a string or numeric representation of those types to their respective type.
Some types may also have "constructor" functions which alias the relevant CAST
invocations.
Parenthesized Expressions
A parenthesized expression is an expression grouped by parentheses.
Any time infix operators are present, the need for parentheses (or a similar mechanism) to distinguish order of operations may be necessary.
Atlas SQL has several infix operators, such as +
and ::
.
For example, the value of 1 + 2 * 3
is 7, while the value of (1 + 2) * 3
is 9.
Operators
Atlas SQL supports the following basic operators:
+
-
*
/
||
<
<=
!=
==
>
>=
BETWEEN
AND
OR
NOT
Subquery Expressions
A subquery is a SQL query within a query. You can use a subquery anywhere an expression can be used.
Atlas SQL supports scalar subquery and table subquery. A scalar subquery returns a result set with zero or one row and one column. It can be used in most places a literal or single column value is valid. A table subquery returns zero or more rows and one or more columns.
Document and Field-Access Expressions
Documents can be represented with a syntax similar to JSON objects. Keys must be strings and values can have any of the supported types. To access document fields, Atlas SQL supports two options: "dot" notation and "bracket" notation.
Dot notation is similar to field access in MongoDB aggregation.
For example, if a document doc
contains a field f
, then the expression doc.f
is used to access the value of that field.
Bracket notation uses square brackets ([
and ]
) around a field name to access the field with that name.
For example, consider the same document described before: doc["f"]
is used to access the value of that field.
Null and Missing
BSON distinguishes between NULL
and MISSING
.
In the case of NULL
there is a field with the literal value NULL
,
whereas in the case of MISSING
, the field is gone.
Comments
Comments are sequences of characters within queries that do not impact query execution. Atlas SQL supports both standard SQL comments and C-style block comments.
Standard SQL comments begin with double dashes and end with a new line:
\-- This is a standard SQL comment
Block comments begin with \*
and end at the matching occurrence of */
.
\* This is a multiline comment */