Time series data is generated everywhere from social media to stock tickers to IoT devices. Analyzing time series data enables organizations to detect, prevent, and predict events ahead of their competition. But where should the ever-increasing volume of time series data be stored? And how can it be analyzed?
MongoDB added native support for time series data in version 5.0. In this article, you'll learn what time series data is, how you can store and query time series data in MongoDB, and what the best practices are for working with time series data in MongoDB.
Time series data are measurements taken at time intervals from one or more sources.
While not all data is time series in nature, a growing percentage of it can be classified as time series. Nearly every company needs to query, analyze, and report on time series data. Consider a stock day trader constantly looking at feeds of stock prices over time and running algorithms to analyze trends to identify opportunities. They are looking at data over a time interval with hourly or daily ranges. Another example could be how a connected weather measurement device might obtain telemetry such as humidity levels and temperature change to forecast weather. Additionally, it could monitor air pollution to produce alerts or analysis before a crisis occurs. The gathered information can be looked at over a time range to calculate trends over time.
Here is a single document example of a stock trading measurement:
{
date: ISODate("2020-01-03T05:00:00.000Z"),
symbol: 'AAPL',
volume: 146322800,
open: 74.287498,
adjClose: 73.486023,
high: 75.144997,
low: 74.125,
close: 74.357498
}
Generally, time series data includes the time and measurement, as well as other identifying information such as the source of the data. In this example of time series data that captures stock trading information, we have the date as the time classifier and the stock symbol as the identification field while information like open and close prices are the measurements in this case.
When you work with time series data, you are often not only concerned with storing the data but also require high read and write performance and advanced query capabilities. MongoDB is a document-based general purpose database with flexible schema design and a rich query language. As of MongoDB 5.0, MongoDB natively supports time series data.
You can create a new time series collection with the createCollection()
command. When you want to create a time series collection, you must include the timeField
option. timeField
indicates the name of the field that includes the date in each document. You should also include the following options:
metaField
indicates the name of the field that contains metadata in each document. The metaField serves as a label or tag which allows time series collections to uniquely identify the source of a time series. This field should never or rarely change over time.granularity
field indicates the time span between documents with a matching metaField, if specified. The default granularity is "seconds"
which indicates a high-frequency ingestion rate as it relates to each unique time series identified by the metaField. Granularity can be set to "seconds,"
"minutes,"
or "hours,"
and can be modified to be coarser at any time. However, you cannot modify granularity to be finer, e.g., "minutes" to "seconds," so it is recommended to start at a finer granularity and adjust to a coarser granularity.Lastly, you may want to include this option if you would like to remove data after a certain time has passed:
expireAfterSeconds
field indicates the number of seconds after which documents should expire and be automatically deleted.The following example creates a time series collection named dowJonesTickerData
where the timeField
is date and the metaField
is symbol:
db.createCollection("dowJonesTickerData",
{ timeseries: {
timeField: "date",
metaField: "symbol",
granularity: "minutes" } })
Each document that you add to the time series collection will need to specify at least the timeField
. In the example document below, the timeField
is date
. Note that, timeField
can be named anything as long as it is of BSON type, Date. A document can be added to a time series collection using any of the methods that can be used to insert documents into other MongoDB collections. The example below adds a document to the dowJonesTickerData
collection using insertOne()
.
db.dowJonesTickerData.insertOne({
date: ISODate("2020-01-03T05:00:00.000Z"),
symbol: 'AAPL',
volume: 146322800,
open: 74.287498,
adjClose: 73.486023,
high: 75.144997,
low: 74.125,
close: 74.357498
})
Each measurement inserted should be a single measurement, either as individual documents or batches of documents with one measure per document. MongoDB will optimize this data stored by time to reduce size footprint and optimize time series access patterns and write throughput.
Documents in time series collections can be queried in the same ways documents in other MongoDB collections can. For example, we can query in the MongoDB Shell, mongosh, for a document in the dowJonesTickerData
collection using findOne()
.
db.dowJonesTickerData.findOne()
The MongoDB shell will return one document:
{
date: ISODate("2020-01-03T05:00:00.000Z"),
symbol: 'AAPL',
volume: 146322800,
open: 74.287498,
adjClose: 73.486023,
high: 75.144997,
low: 74.125,
close: 74.357498,
_id: ObjectId("60eea26373c4cdcb6356827d")
}
MongoDB optimizes the data, as it stores data ordered by time as opposed to the natural order in regular collections.
You can improve query performance by adding secondary indexes on the metaField and/or the timeField.
db.dowJonesTickerData.createIndex({ symbol : 1, date : 1});
If you want to enable search on multiple metadata fields (e.g., symbol and company), we recommend updating your data model. Instead of the metaField being a single value (e.g., symbol), update the metaField to be an object that contains multiple pieces of metadata.
For example, let's drop our existing dowJonesTickerData
collection and create a new one that has a metaField
named "meta."
db.dowJonesTickerData.drop();
db.createCollection("dowJonesTickerData",
{ timeseries: {
timeField: "date",
metaField: "meta",
granularity: "minutes" } })
Next, let's insert a document that stores multiple pieces of metadata in the meta
field.
db.dowJonesTickerData.insertOne({
date: ISODate("2021-05-20T10:24:51.303Z"),
meta : { symbol: 'ba', company: 'boeing'},
price: 125
})
Now, we can create a secondary index that has multiple fields. In our case, we will create a secondary index that allows for efficient searching of both symbol
and company
.
db.dowJonesTickerData.createIndex({ "meta.symbol" :1, "meta.company" :1 , date : 1});
It's common that time series data might not be relevant after a certain time period. Purging or archiving old data keeps the collection as small as possible, thereby improving performance and reducing cost.
The recommended way to automatically delete expired data is by setting a TTL, Time To Live expression, on a time series collection in the form of an expireAfterSeconds parameter. Data will be removed after a document date value reaches “now - expireAfterSeconds.”
db.createCollection("dowJonesTickerData",
{ timeseries: {
timeField: "date",
metaField: "meta" ,
granularity : "minutes"
},
expireAfterSeconds : 94608000 // 3 years
})
In the example above, a document will be deleted after the value stored in the date field is three years old.
You can set or modify this parameter at any point during or after collection creation, so if your requirements change around your data lifecycle, you can easily modify or update the Time to Live for your data.
Another option for handling old data is to tier it into operational and online archive storage. You can use Atlas Online Archive to automatically archive data from your Atlas cluster.
Online Archive has many advantages:
See the official MongoDB documentation on configuring online archives for more information.
MongoDB provides a variety of aggregation pipeline operators and aggregation pipeline stages to enable developers to analyze data. These operators and stages are available for all collections — time series or regular. Let's examine some of the new operators and a stage that were added in version 5.0 to make working with dates and times easier.
In version 5.0, MongoDB added these aggregation pipeline operators:
These new operators make working with time series data even easier. Visit the official MongoDB documentation on aggregation pipeline operators to learn more about all of the available operators.
Consider the original stock data example:
{
date: ISODate("2020-01-03T05:00:00.000Z"),
symbol: 'AAPL',
volume: 146322800,
open: 74.287498,
adjClose: 73.486023,
high: 75.144997,
low: 74.125,
close: 74.357498
}
For this example, the dowJonesTickerData
collection is using “date” as a timeField and “symbol” as a metaField.
Let's say we want to calculate the average closing stock price per month for each stock in the collection. We can use the $dateTrunc
to truncate the dates to the appropriate month. Then, we can use $group
to first, group the documents by month and symbol and, second, calculate the average for each group.
db.dowJonesTickerData.aggregate([{
$group: {
_id: {
firstDayOfMonth: {
$dateTrunc: {
date: "$date",
unit: "month"
}
},
symbol: "$symbol"
},
avgMonthClose: {
$avg: "$close"
}
}
}])
The result of running the above aggregation is a set of documents. Each document contains the average closing price per month for a particular stock. Below are example documents that resulted from running the above aggregation.
{
_id: {
firstDayOfMonth: ISODate("2020-06-01T00:00:00.000Z"),
symbol: 'GOOG'
},
avgMonthClose: 1431.0477184545455
},
{
_id: {
firstDayOfMonth: ISODate("2021-07-01T00:00:00.000Z"),
symbol: 'MDB'
},
avgMonthClose: 352.7314293333333
},
{
_id: {
firstDayOfMonth: ISODate("2021-06-01T00:00:00.000Z"),
symbol: 'MSFT'
},
avgMonthClose: 259.01818086363636
}
Window functions allow developers to run calculations on documents in a given window. MongoDB added support for window functions in version 5.0 with the introduction of the $setWindowFields aggregation pipeline stage. Similar to $group, $setWindowFields allows you to apply one or more operations on a defined window. In $group, documents are grouped together and then calculations are performed on each group. In $setWindowFields, the windows are relative to each document, so the calculations are performed on each document.
With the new $setWindowFields operator, you can calculate a rolling average of the closing price over the last 30 days for each stock:
db.dowJonesTickerData.aggregate( [
{$setWindowFields: {
partitionBy: { symbol : "$symbol" } ,
sortBy: { date: 1 },
output: {
averageMonthClosingPrice: {
$avg : "$close",
window : { range : [-1, "current"], unit : "month" }
} }
} }] )
)
The result of running the above aggregation is a set of documents. The set will contain the same number of documents as the original collection. Each document in the results will contain a new field: $averageMonthClosingPrice
. The value of $averageMonthClosingPrice
is the average of the previous month's closing price for the indicated stock symbol. Below is an example document that resulted from running the above aggregation.
{
date: ISODate("2020-01-29T05:00:00.000Z"),
symbol: 'AAPL',
volume: 216229200,
adjClose: 80.014801,
low: 80.345001,
high: 81.962502,
open: 81.112503,
close: 81.084999,
averageMonthClosingPrice: 77.63137520000001
}
MongoDB Charts is a great tool to visualize the data calculated by the above aggregation pipeline.
Adjust the chart’s Data Source to use the same pipeline we did above:
[
{$setWindowFields: {
partitionBy: { symbol : "$symbol" } ,
sortBy: { date: 1 },
output: {
averageMonthClosingPrice: {
$avg : "$close",
window : { range : [-1, "current"], unit : "month" }
} }
} }]
As a result, the data source exposes all fields such as symbol, date, close, and averageMonthClosingPrice, which can be used directly in a line graph:
When charting this data, we can see the immediate value of this analysis:
The graphs above show a closing price and a rolling 30-day average graph of each stock.
Below is a list of six best practices for working with time series data in MongoDB:
Time series data is everywhere, but storing and querying it can be challenging. MongoDB added native support for time series data in version 5.0, making it even easier, faster, and cheaper to work with time series data.
Time series data are measurements taken at time intervals from one or more sources. When analyzed, the individual data pieces form a meaningful insight over a period of time. Examples are weather measurement data and stock trading data.
MongoDB is a general purpose document database that has native support for time series data. MongoDB's time series collections are optimized and purpose-built for ingesting, storing, and querying time series data. Therefore, users can use a single unified Query API utilizing MongoDB as a time series database alongside other database use cases.