i have some code that runs a couple of aggregation queries in succession:
- the first aggregates on an input collection and uses $out to write the result to a tmp collection
- the second aggregates on the tmp collection in a loop, using $merge on each iteration to write the matched documents to the final output collection
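roughly, the two pipelines have the shape below. this is a minimal sketch in python rather than my actual code: the filter fields (`status`, `batch`), the function names, and the $merge options are placeholders i picked for illustration.

```python
def first_stage_pipeline(tmp_name):
    """Aggregate the input collection and replace `tmp_name` with the result."""
    return [
        {"$match": {"status": "ready"}},  # placeholder filter, not the real one
        {"$out": tmp_name},               # $out has to be the last stage
    ]


def merge_pipeline(batch, final_name):
    """Select one iteration's documents from tmp and merge them into `final_name`."""
    return [
        {"$match": {"batch": batch}},     # placeholder per-iteration filter
        {"$merge": {                      # $merge has to be the last stage
            "into": final_name,
            "on": "_id",
            "whenMatched": "replace",
            "whenNotMatched": "insert",
        }},
    ]
```

the first pipeline runs once against the input collection; the second is rebuilt and run against tmp on every loop iteration.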
i’m on a three-node replica set in atlas and i’m wrapping everything in a causally consistent client session. however, the second aggregation sometimes doesn’t find the results of the first, which leaves my final collection missing some documents.
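for reference, the session wrapping looks roughly like this. it is a sketch assuming pymongo: `client` is taken to be a `pymongo.MongoClient`, and the function name, database name, and collection names (`input`, `tmp`) are placeholders.

```python
def run_in_causal_session(client, db_name, out_pipeline, merge_pipelines,
                          input_name="input", tmp_name="tmp"):
    """Run the $out aggregation, then each $merge aggregation, in one session.

    `client` is assumed to be a pymongo.MongoClient (or any object exposing
    the same start_session / aggregate interface).
    """
    with client.start_session(causal_consistency=True) as session:
        db = client[db_name]
        # First query: input -> tmp via $out. Passing session= is what ties
        # the operation to the causally consistent session.
        db[input_name].aggregate(out_pipeline, session=session)
        # Second query, once per loop iteration: tmp -> final via $merge.
        for pipeline in merge_pipelines:
            db[tmp_name].aggregate(pipeline, session=session)
```

every aggregate call gets the same `session` object; nothing in this sketch sets a read preference, read concern, or write concern explicitly.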
it seems like the causally consistent client session should guarantee these run in order, but i do see in the documentation that “Read operations of the $out statement occur on the secondary nodes, while the write operations occur only on the primary nodes.” (see https://www.mongodb.com/docs/manual/reference/operator/aggregation/out/)
the $merge page says something similar. does this mean that $merge and $out are reading data from secondaries without regard for the client session? that is the only explanation i can think of.
when i explicitly make the tmp collection with read preference = primary, this issue disappears.