Trigger failure due to primary node change

We’re running a MongoDB cluster on Atlas that makes use of a few triggered functions. One in particular populates a target collection with a transformed version of a document whenever that document is inserted or updated in a source collection, by executing an aggregation pipeline against the changed document.

While investigating a discrepancy in document counts between the two collections, we’ve identified that the trigger fails to execute properly when the primary is down.

Specifically, during a cluster auto-scaling event, our triggers failed to execute and threw this error:

(PrimarySteppedDown) PlanExecutor error during aggregation :: caused by :: No primary exists currently

The meat of the function is pretty straightforward:

  try {
    // If this is a "delete" event, delete the document in the other collection
    if (changeEvent.operationType === "delete") {
      await targetCollection.deleteOne({"_id": docId});
      console.log("deleted doc id: ", docId.id);
    }

    // If this is an "insert", "update" or "replace" event, then execute the pipeline on the
    // doc in the source collection to replace the document in the target collection
    else if (changeEvent.operationType === "insert" || changeEvent.operationType === "update" || changeEvent.operationType === "replace") {
      await sourceCollection.aggregate(pipeline).toArray();
      console.log("updated doc id: ", docId.id);
    }
  } catch(err) {
    console.log("error performing mongodb write: ", err.message);
  }
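
For context, the variables used above are set up roughly like this (the linked data source, database, and collection names are simplified here, and the pipeline’s transform stages are omitted):

exports = async function(changeEvent) {
  const docId = changeEvent.documentKey._id;
  const cluster = context.services.get("mongodb-atlas"); // linked data source name may differ
  const sourceCollection = cluster.db("mydb").collection("source");
  const targetCollection = cluster.db("mydb").collection("target");

  // The pipeline matches the changed document, applies the transform stages,
  // and merges the result into the target collection
  const pipeline = [
    { $match: { _id: docId } },
    // ...transform stages...
    { $merge: { into: "target", on: "_id", whenMatched: "replace", whenNotMatched: "insert" } }
  ];

  // ...try/catch block shown above...
};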

What can we do to ensure that a trigger will execute properly in the face of an auto-scaling event?

Hey Robert_Lancaster,

I understand that you are running into trigger failures when the primary steps down, so here is a suggestion that may help.

To make sure that triggers execute properly during auto-scaling events on an Atlas cluster, you can handle the “PrimarySteppedDown” error by implementing a retry mechanism. Here is one possible approach:

try {
  // Your trigger logic here
} catch (err) {
  if (err.message.includes("PrimarySteppedDown")) {
    // Retry logic
    await new Promise(resolve => setTimeout(resolve, 1000)); // Wait for a short duration
    // Retry the trigger logic
    try {
      // Your trigger logic again
    } catch (retryErr) {
      console.log("Error performing MongoDB write even after retry:", retryErr.message);
    }
  } else {
    console.log("Error performing MongoDB write:", err.message);
  }
}

This code catches the “PrimarySteppedDown” error, waits for a short duration (1 second in this example), and then retries the trigger logic once. Depending on your requirements and the expected downtime during auto-scaling events, you can adjust the retry interval and the number of retries.
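
If you need more than a single retry, one option is a small retry loop with exponential backoff along the lines of the sketch below. Treat it as a sketch only: the runTriggerLogic parameter stands in for your own delete/aggregate logic, the list of retryable error messages is an assumption, and it relies on setTimeout being available in the trigger’s function runtime.

async function withRetry(runTriggerLogic, maxRetries = 3, delayMs = 1000) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      // Run the delete/aggregate logic and return its result if it succeeds
      return await runTriggerLogic();
    } catch (err) {
      // Only retry errors caused by a primary election; rethrow everything else
      const retryable = err.message.includes("PrimarySteppedDown") ||
                        err.message.includes("No primary exists");
      if (!retryable || attempt === maxRetries) {
        throw err;
      }
      // Back off before the next attempt: 1s, 2s, 4s, ...
      await new Promise(resolve => setTimeout(resolve, delayMs * 2 ** attempt));
    }
  }
}

You would then wrap the body of the trigger, for example await withRetry(() => sourceCollection.aggregate(pipeline).toArray()), and keep the existing catch block to log anything that still fails after the last retry.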

Thanks
(James)