Perform Long-Running Snapshot Queries
Snapshot queries allow you to read data as it appeared at a single point in time in the recent past.
Starting in MongoDB 5.0, you can use read concern "snapshot" to query data on secondary nodes. This feature increases the versatility and resilience of your application's reads. You do not need to create a static copy of your data, move it into a separate system, and manually isolate these long-running queries from interfering with your operational workload. Instead, you can perform long-running queries against a live, transactional database while reading from a consistent state of the data.
Using read concern "snapshot"
on secondary nodes does not
impact your application's write workload. Only application reads benefit
from long-running queries being isolated to secondaries.
Use snapshot queries when you want to:
Perform multiple related queries and ensure that each query reads data from the same point in time.
Ensure that you read from a consistent state of the data from some point in the past.
Comparing Local and Snapshot Read Concerns
When MongoDB performs long-running queries using the default "local" read concern, the query results may contain data from writes that occur at the same time as the query. As a result, the query may return unexpected or inconsistent results.
To avoid this scenario, create a session and specify read concern "snapshot". With read concern "snapshot", MongoDB runs your query with snapshot isolation, meaning that your query reads data as it appeared at a single point in time in the recent past.
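For example, a minimal PyMongo sketch of this pattern (the connection URI and the pets.cats namespace are placeholders; it assumes a MongoDB 5.0+ replica set):

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder URI

# Default read concern "local": a long-running read may observe
# writes that commit while the read is still executing.
local_count = client.pets.cats.count_documents({"adoptable": True})

# Read concern "snapshot": every read in the session sees the data
# as it appeared at the same recent point in time.
with client.start_session(snapshot=True) as s:
    snapshot_count = client.pets.cats.count_documents(
        {"adoptable": True}, session=s
    )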
Examples
The examples on this page show how you can use snapshot queries to:
Run related queries from the same point in time.
Read from a consistent state of the data from some point in the past.
Run Related Queries From the Same Point in Time
Read concern "snapshot"
lets you run multiple related
queries within a session and ensure that each query reads data from the
same point in time.
An animal shelter has a pets database that contains collections for each type of pet. The pets database has these collections:
cats
dogs
Each document in each collection contains an adoptable field, indicating whether the pet is available for adoption. For example, a document in the cats collection looks like this:
{ "name": "Whiskers", "color": "white", "age": 10, "adoptable": true }
You want to run a query to see the total number of pets available for adoption across all collections. To provide a consistent view of the data, you want to ensure that the data returned from each collection is from a single point in time.
To accomplish this goal, use read concern "snapshot"
within a session:
C driver:

mongoc_client_session_t *cs = NULL;
mongoc_collection_t *cats_collection = NULL;
mongoc_collection_t *dogs_collection = NULL;
int64_t adoptable_pets_count = 0;
bson_error_t error;
mongoc_session_opt_t *session_opts;

cats_collection = mongoc_client_get_collection (client, "pets", "cats");
dogs_collection = mongoc_client_get_collection (client, "pets", "dogs");

/* Seed 'pets.cats' and 'pets.dogs' with example data */
if (!pet_setup (cats_collection, dogs_collection)) {
   goto cleanup;
}

/* start a snapshot session */
session_opts = mongoc_session_opts_new ();
mongoc_session_opts_set_snapshot (session_opts, true);

cs = mongoc_client_start_session (client, session_opts, &error);
mongoc_session_opts_destroy (session_opts);
if (!cs) {
   MONGOC_ERROR ("Could not start session: %s", error.message);
   goto cleanup;
}

/*
 * Perform the following aggregation pipeline, and accumulate the count in
 * `adoptable_pets_count`.
 *
 * adoptablePetsCount = db.cats.aggregate(
 *     [ { "$match": { "adoptable": true } },
 *       { "$count": "adoptableCatsCount" } ], session=s
 * ).next()["adoptableCatsCount"]
 *
 * adoptablePetsCount += db.dogs.aggregate(
 *     [ { "$match": { "adoptable": True } },
 *       { "$count": "adoptableDogsCount" } ], session=s
 * ).next()["adoptableDogsCount"]
 *
 * Remember in order to apply the client session to this operation, you
 * must append the client session to the options passed to
 * `mongoc_collection_aggregate`, i.e.,
 *
 * mongoc_client_session_append (cs, &opts, &error);
 * cursor = mongoc_collection_aggregate (
 *    collection, MONGOC_QUERY_NONE, pipeline, &opts, NULL);
 */
accumulate_adoptable_count (cs, cats_collection, &adoptable_pets_count);
accumulate_adoptable_count (cs, dogs_collection, &adoptable_pets_count);

printf ("there are %" PRId64 " adoptable pets\n", adoptable_pets_count);
C++ driver:

using namespace mongocxx;
using bsoncxx::builder::basic::kvp;
using bsoncxx::builder::basic::make_document;

auto db = client["pets"];
int64_t adoptable_pets_count = 0;

auto opts = mongocxx::options::client_session{};
opts.snapshot(true);
auto session = client.start_session(opts);

{
    pipeline p;
    p.match(make_document(kvp("adoptable", true))).count("adoptableCatsCount");
    auto cursor = db["cats"].aggregate(session, p);
    for (auto doc : cursor) {
        adoptable_pets_count += doc.find("adoptableCatsCount")->get_int32();
    }
}

{
    pipeline p;
    p.match(make_document(kvp("adoptable", true))).count("adoptableDogsCount");
    auto cursor = db["dogs"].aggregate(session, p);
    for (auto doc : cursor) {
        adoptable_pets_count += doc.find("adoptableDogsCount")->get_int32();
    }
}
Go driver:

ctx := context.TODO()

sess, err := client.StartSession(options.Session().SetSnapshot(true))
if err != nil {
    return err
}
defer sess.EndSession(ctx)

var adoptablePetsCount int32
err = mongo.WithSession(ctx, sess, func(ctx context.Context) error {
    // Count the adoptable cats
    const adoptableCatsOutput = "adoptableCatsCount"
    cursor, err := db.Collection("cats").Aggregate(ctx, mongo.Pipeline{
        bson.D{{"$match", bson.D{{"adoptable", true}}}},
        bson.D{{"$count", adoptableCatsOutput}},
    })
    if err != nil {
        return err
    }

    if !cursor.Next(ctx) {
        return fmt.Errorf("expected aggregate to return a document, but got none")
    }

    resp := cursor.Current.Lookup(adoptableCatsOutput)
    adoptableCatsCount, ok := resp.Int32OK()
    if !ok {
        return fmt.Errorf("failed to find int32 field %q in document %v",
            adoptableCatsOutput, cursor.Current)
    }
    adoptablePetsCount += adoptableCatsCount

    // Count the adoptable dogs
    const adoptableDogsOutput = "adoptableDogsCount"
    cursor, err = db.Collection("dogs").Aggregate(ctx, mongo.Pipeline{
        bson.D{{"$match", bson.D{{"adoptable", true}}}},
        bson.D{{"$count", adoptableDogsOutput}},
    })
    if err != nil {
        return err
    }

    if !cursor.Next(ctx) {
        return fmt.Errorf("expected aggregate to return a document, but got none")
    }

    resp = cursor.Current.Lookup(adoptableDogsOutput)
    adoptableDogsCount, ok := resp.Int32OK()
    if !ok {
        return fmt.Errorf("failed to find int32 field %q in document %v",
            adoptableDogsOutput, cursor.Current)
    }
    adoptablePetsCount += adoptableDogsCount
    return nil
})
if err != nil {
    return err
}
Python (async):

db = client.pets
async with await client.start_session(snapshot=True) as s:
    adoptablePetsCount = 0
    docs = await db.cats.aggregate(
        [{"$match": {"adoptable": True}}, {"$count": "adoptableCatsCount"}],
        session=s,
    ).to_list(None)
    adoptablePetsCount = docs[0]["adoptableCatsCount"]

    docs = await db.dogs.aggregate(
        [{"$match": {"adoptable": True}}, {"$count": "adoptableDogsCount"}],
        session=s,
    ).to_list(None)
    adoptablePetsCount += docs[0]["adoptableDogsCount"]

    print(adoptablePetsCount)
PHP driver:

$catsCollection = $client->selectCollection('pets', 'cats');
$dogsCollection = $client->selectCollection('pets', 'dogs');

$session = $client->startSession(['snapshot' => true]);

$adoptablePetsCount = $catsCollection->aggregate(
    [
        ['$match' => ['adoptable' => true]],
        ['$count' => 'adoptableCatsCount'],
    ],
    ['session' => $session],
)->toArray()[0]->adoptableCatsCount;

$adoptablePetsCount += $dogsCollection->aggregate(
    [
        ['$match' => ['adoptable' => true]],
        ['$count' => 'adoptableDogsCount'],
    ],
    ['session' => $session],
)->toArray()[0]->adoptableDogsCount;

var_dump($adoptablePetsCount);
Python (sync):

db = client.pets
with client.start_session(snapshot=True) as s:
    adoptablePetsCount = db.cats.aggregate(
        [{"$match": {"adoptable": True}}, {"$count": "adoptableCatsCount"}],
        session=s,
    ).next()["adoptableCatsCount"]

    adoptablePetsCount += db.dogs.aggregate(
        [{"$match": {"adoptable": True}}, {"$count": "adoptableDogsCount"}],
        session=s,
    ).next()["adoptableDogsCount"]

    print(adoptablePetsCount)
Ruby driver:

client = Mongo::Client.new(uri_string, database: "pets")

client.start_session(snapshot: true) do |session|
  adoptable_pets_count = client['cats'].aggregate(
    [
      { "$match": { "adoptable": true } },
      { "$count": "adoptable_cats_count" }
    ],
    session: session
  ).first["adoptable_cats_count"]

  adoptable_pets_count += client['dogs'].aggregate(
    [
      { "$match": { "adoptable": true } },
      { "$count": "adoptable_dogs_count" }
    ],
    session: session
  ).first["adoptable_dogs_count"]

  puts adoptable_pets_count
end
The preceding series of commands:
Uses MongoClient() to establish a connection to the MongoDB deployment.
Switches to the pets database.
Establishes a session. The command specifies snapshot=True, so the session uses read concern "snapshot".
Performs these actions for each collection in the pets database:
  Uses $match to filter for documents where the adoptable field is true.
  Uses $count to count the matching documents.
  Adds the resulting count to the adoptablePetsCount variable.
Prints the adoptablePetsCount variable.
All queries within the session read data as it appeared at the same point in time. As a result, the final count reflects a consistent snapshot of the data.
Note
If the session lasts longer than the WiredTiger history retention period (300 seconds, by default), the query errors with a SnapshotTooOld error. To learn how to configure snapshot retention and enable longer-running queries, see Configure Snapshot Retention.
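If you want to handle this failure in application code rather than only avoid it, here is a PyMongo sketch of one approach; the numeric error code is an assumption based on the server's published error codes, not something this page specifies:

from pymongo.errors import OperationFailure

SNAPSHOT_TOO_OLD = 239  # assumed server error code for SnapshotTooOld

with client.start_session(snapshot=True) as s:
    try:
        count = db.cats.aggregate(
            [{"$match": {"adoptable": True}}, {"$count": "adoptableCatsCount"}],
            session=s,
        ).next()["adoptableCatsCount"]
    except OperationFailure as exc:
        if exc.code == SNAPSHOT_TOO_OLD:
            # The session outlived the WiredTiger history window; retry in a
            # new session or increase minSnapshotHistoryWindowInSeconds.
            raise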
Read from a Consistent State of the Data from Some Point in the Past
Read concern "snapshot"
ensures that your query reads
data as it appeared at some single point in time in the recent past.
An online shoe store has a sales collection that contains data for each item sold at the store. For example, a document in the sales collection looks like this:
{ "shoeType": "boot", "price": 30, "saleDate": ISODate("2022-02-02T06:01:17.171Z") }
Each day at midnight, a query runs to see how many pairs of shoes were sold that day. The daily sales query looks like this:
C driver:

mongoc_client_session_t *cs = NULL;
mongoc_collection_t *sales_collection = NULL;
bson_error_t error;
mongoc_session_opt_t *session_opts;
bson_t *pipeline = NULL;
bson_t opts = BSON_INITIALIZER;
mongoc_cursor_t *cursor = NULL;
const bson_t *doc = NULL;
bool ok = true;
bson_iter_t iter;
int64_t total_sales = 0;

sales_collection = mongoc_client_get_collection (client, "retail", "sales");

/* seed 'retail.sales' with example data */
if (!retail_setup (sales_collection)) {
   goto cleanup;
}

/* start a snapshot session */
session_opts = mongoc_session_opts_new ();
mongoc_session_opts_set_snapshot (session_opts, true);

cs = mongoc_client_start_session (client, session_opts, &error);
mongoc_session_opts_destroy (session_opts);
if (!cs) {
   MONGOC_ERROR ("Could not start session: %s", error.message);
   goto cleanup;
}

if (!mongoc_client_session_append (cs, &opts, &error)) {
   MONGOC_ERROR ("could not apply session options: %s", error.message);
   goto cleanup;
}

pipeline = BCON_NEW ("pipeline", "[",
                     "{", "$match", "{", "$expr", "{", "$gt", "[",
                     "$saleDate",
                     "{", "$dateSubtract", "{",
                     "startDate", "$$NOW",
                     "unit", BCON_UTF8 ("day"),
                     "amount", BCON_INT64 (1),
                     "}", "}",
                     "]", "}", "}", "}",
                     "{", "$count", BCON_UTF8 ("totalDailySales"), "}",
                     "]");

cursor = mongoc_collection_aggregate (
   sales_collection, MONGOC_QUERY_NONE, pipeline, &opts, NULL);
bson_destroy (&opts);

ok = mongoc_cursor_next (cursor, &doc);
if (mongoc_cursor_error (cursor, &error)) {
   MONGOC_ERROR ("could not get totalDailySales: %s", error.message);
   goto cleanup;
}
if (!ok) {
   MONGOC_ERROR ("%s", "cursor has no results");
   goto cleanup;
}

ok = bson_iter_init_find (&iter, doc, "totalDailySales");
if (ok) {
   total_sales = bson_iter_as_int64 (&iter);
} else {
   MONGOC_ERROR ("%s", "missing key: 'totalDailySales'");
   goto cleanup;
}
Go driver:

ctx := context.TODO()

sess, err := client.StartSession(options.Session().SetSnapshot(true))
if err != nil {
    return err
}
defer sess.EndSession(ctx)

var totalDailySales int32
err = mongo.WithSession(ctx, sess, func(ctx context.Context) error {
    // Count the total daily sales
    const totalDailySalesOutput = "totalDailySales"
    cursor, err := db.Collection("sales").Aggregate(ctx, mongo.Pipeline{
        bson.D{{"$match",
            bson.D{{"$expr",
                bson.D{{"$gt",
                    bson.A{"$saleDate",
                        bson.D{{"$dateSubtract",
                            bson.D{
                                {"startDate", "$$NOW"},
                                {"unit", "day"},
                                {"amount", 1},
                            },
                        }},
                    },
                }},
            }},
        }},
        bson.D{{"$count", totalDailySalesOutput}},
    })
    if err != nil {
        return err
    }

    if !cursor.Next(ctx) {
        return fmt.Errorf("expected aggregate to return a document, but got none")
    }

    resp := cursor.Current.Lookup(totalDailySalesOutput)

    var ok bool
    totalDailySales, ok = resp.Int32OK()
    if !ok {
        return fmt.Errorf("failed to find int32 field %q in document %v",
            totalDailySalesOutput, cursor.Current)
    }
    return nil
})
if err != nil {
    return err
}
Python (async):

db = client.retail
async with await client.start_session(snapshot=True) as s:
    docs = await db.sales.aggregate(
        [
            {
                "$match": {
                    "$expr": {
                        "$gt": [
                            "$saleDate",
                            {
                                "$dateSubtract": {
                                    "startDate": "$$NOW",
                                    "unit": "day",
                                    "amount": 1,
                                }
                            },
                        ]
                    }
                }
            },
            {"$count": "totalDailySales"},
        ],
        session=s,
    ).to_list(None)
    total = docs[0]["totalDailySales"]
    print(total)
PHP driver:

$salesCollection = $client->selectCollection('retail', 'sales');

$session = $client->startSession(['snapshot' => true]);

$totalDailySales = $salesCollection->aggregate(
    [
        [
            '$match' => [
                '$expr' => [
                    '$gt' => ['$saleDate', [
                        '$dateSubtract' => [
                            'startDate' => '$$NOW',
                            'unit' => 'day',
                            'amount' => 1,
                        ],
                    ]],
                ],
            ],
        ],
        ['$count' => 'totalDailySales'],
    ],
    ['session' => $session],
)->toArray()[0]->totalDailySales;
Python (sync):

db = client.retail
with client.start_session(snapshot=True) as s:
    _ = db.sales.aggregate(
        [
            {
                "$match": {
                    "$expr": {
                        "$gt": [
                            "$saleDate",
                            {
                                "$dateSubtract": {
                                    "startDate": "$$NOW",
                                    "unit": "day",
                                    "amount": 1,
                                }
                            },
                        ]
                    }
                }
            },
            {"$count": "totalDailySales"},
        ],
        session=s,
    ).next()["totalDailySales"]
The preceding query:
Uses $match with $expr to specify a filter on the saleDate field. $expr allows the use of aggregation expressions (such as $$NOW) in the $match stage.
Uses the $gt operator and the $dateSubtract expression to return documents where the saleDate is greater than one day before the time the query is executed.
Uses $count to return a count of the matching documents. The count is stored in the totalDailySales variable.
Specifies read concern "snapshot" to ensure that the query reads from a single point in time.
The sales collection is quite large, and as a result this query may take a few minutes to run. Because the store is online, sales can occur at any time of day.
For example, consider this sequence of events:
The query begins executing at 12:00 AM.
A customer buys three pairs of shoes at 12:02 AM.
The query finishes executing at 12:04 AM.
If the query doesn't use read concern "snapshot", sales that occur between when the query starts and when it finishes can be included in the query count, despite not occurring on the day the report covers. This could result in inaccurate reports, with some sales counted twice.
By specifying read concern "snapshot", the query only returns data that was present in the database at a point in time shortly before the query started executing.
Note
If the query takes longer than the WiredTiger history retention period (300 seconds, by default), the query errors with a SnapshotTooOld error. To learn how to configure snapshot retention and enable longer-running queries, see Configure Snapshot Retention.
Configure Snapshot Retention
By default, the WiredTiger storage engine retains history for 300 seconds. You can use a session with snapshot=true for a total of 300 seconds from the time of the first operation in the session to the last. If you use the session for a longer period of time, the session fails with a SnapshotTooOld error. Similarly, if you query data using read concern "snapshot" and your query lasts longer than 300 seconds, the query fails.
If your query or session runs for longer than 300 seconds, consider increasing the snapshot retention period. To increase the retention period, modify the minSnapshotHistoryWindowInSeconds parameter.
For example, this command sets the value of minSnapshotHistoryWindowInSeconds to 600 seconds:
db.adminCommand( { setParameter: 1, minSnapshotHistoryWindowInSeconds: 600 } )
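If you administer the deployment from a driver rather than the shell, a PyMongo sketch of the same command looks like this (it assumes a client authorized to run setParameter against the admin database):

# Set the snapshot history window to 600 seconds.
client.admin.command("setParameter", 1, minSnapshotHistoryWindowInSeconds=600)

# Read the value back to confirm the change.
resp = client.admin.command("getParameter", 1, minSnapshotHistoryWindowInSeconds=1)
print(resp["minSnapshotHistoryWindowInSeconds"])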
Important
To modify minSnapshotHistoryWindowInSeconds for a MongoDB Atlas cluster, you must contact Atlas Support.
Disk Space and History
Increasing the value of minSnapshotHistoryWindowInSeconds increases disk usage, because the server must maintain the history of older modified values within the specified time window. The amount of disk space used depends on your workload, with higher-volume workloads requiring more disk space.