Best Practices for Time Series Collections

On this page

Optimize Inserts

Batch Documents by Metadata
Use Consistent Field Order in Documents
Increase the Number of Clients
Optimize Compression
Omit Fields Containing Empty Objects and Arrays from Documents
Round Numeric Data to Few Decimal Places
Optimize Query Performance

This page describes best practices to improve performance and data usage for time series collections.

Optimize Inserts

To optimize insert performance for time series collections, perform the following actions.

Batch Documents by Metadata

When inserting multiple documents:

To avoid network roundtrips, use a single insertMany() statement as opposed to multiple insertOne() statements.
If possible, order or construct batches to contain multiple measurements per series (as defined by metadata).

For example, if you have two sensors, sensor A and sensor B, a batch containing multiple measurements from a single sensor incurs the cost of one insert, rather than one insert per measurement.

The following operation inserts six documents, but only incurs the cost of two inserts (one per batch), because the documents are ordered by sensor:

db.temperatures.insertMany( [
   {
      "metadata": {
         "sensor": "sensorA"
      },
      "timestamp": ISODate("2021-05-18T00:00:00.000Z"),
      temperature: 10
   },
   {
      "metadata": {
         "sensor": "sensorA"
      },
      "timestamp": ISODate("2021-05-19T00:00:00.000Z"),
      temperature: 12
   },
   {
      "metadata": {
         "sensor": "sensorA"
      },
      "timestamp": ISODate("2021-05-20T00:00:00.000Z"),
      temperature: 13
   },
   {
      "metadata": {
         "sensor": "sensorB"
      },
      "timestamp": ISODate("2021-05-18T00:00:00.000Z"),
      temperature: 20
   },
   {
      "metadata": {
         "sensor": "sensorB"
      },
      "timestamp": ISODate("2021-05-19T00:00:00.000Z"),
      temperature: 25
   },
   {
      "metadata": {
         "sensor": "sensorB"
      },
      "timestamp": ISODate("2021-05-20T00:00:00.000Z"),
      temperature: 26
   }
] )

Use Consistent Field Order in Documents

Using a consistent field order in your documents improves insert performance.

For example, inserting these documents achieves optimal insert performance:

{
   _id: ObjectId("6250a0ef02a1877734a9df57"),
   timestamp: 2020-01-23T00:00:00.441Z,
   name: 'sensor1',
   range: 1
},
{
   _id: ObjectId("6560a0ef02a1877734a9df66")
   timestamp: 2020-01-23T01:00:00.441Z,
   name: 'sensor1',
   range: 5
}

In contrast, these documents do not achieve optimal insert performance, because their field orders differ:

{
   range: 1,
   _id: ObjectId("6250a0ef02a1877734a9df57"),
   name: 'sensor1',
   timestamp: 2020-01-23T00:00:00.441Z
},
{
   _id: ObjectId("6560a0ef02a1877734a9df66")
   name: 'sensor1',
   timestamp: 2020-01-23T01:00:00.441Z,
   range: 5
}

Increase the Number of Clients

Increasing the number of clients writing data to your collections can improve performance.

Important

Disable Retryable Writes

To write data with multiple clients, you must disable retryable writes. Retryable writes for time series collections do not combine writes from multiple clients.

To learn more about retryable writes and how to disable them, see retryable writes.

Optimize Compression

To optimize data compression for time series collections, perform the following actions.

Omit Fields Containing Empty Objects and Arrays from Documents

To optimize compression, if your data contains empty objects or arrays, omit the empty fields from your documents.

For example, consider the following documents:

{
 time: 2020-01-23T00:00:00.441Z,
 coordinates: [1.0, 2.0]
},
{
   time: 2020-01-23T00:00:10.441Z,
   coordinates: []
},
{
   time: 2020-01-23T00:00:20.441Z,
   coordinates: [3.0, 5.0]
}

The alternation between coordinates fields with populated values and an empty array result in a schema change for the compressor. The schema change causes the second and third documents in the sequence remain uncompressed.

In contrast, the following documents where the empty array is omitted receive the benefit of optimal compression:

{
   time: 2020-01-23T00:00:00.441Z,
   coordinates: [1.0, 2.0]
},
{
   time: 2020-01-23T00:00:10.441Z
},
{
   time: 2020-01-23T00:00:20.441Z,
   coordinates: [3.0, 5.0]
}

Round Numeric Data to Few Decimal Places

Round numeric data to the precision required for your application. Rounding numeric data to fewer decimal places improves the compression ratio.

Optimize Query Performance

To improve query performance, create one or more secondary indexes on your timeField and metaField to support common query patterns.

Back

Shard a Time Series Collection

Reference