Atlas x PyMongo x GCP Cloud Run Functions = sometimes Timeout

Hello,
I have sets of web services running on Google Cloud Run Functions (also called Cloud Functions gen 2), supporting concurrency.

With PyMongo (4.8.0), I open a connection to Atlas using an SRV URL like this:

uri = "mongodb+srv://{}/?authSource=%24external&authMechanism=MONGODB-X509&retryWrites=true&w=majority".format(os.getenv("MONGODB_SRV"))
client = MongoClient(uri, tls=True, tlsCAFile=certifi.where(), tlsCertificateKeyFile='secret/X509-cert.pem', server_api=ServerApi('1'), tz_aware=True)

Following best practices (https://www.mongodb.com/docs/atlas/manage-connections-google-cloud/), I open the connection at the top of the file, outside the function, so that it is not recreated on every API call. I haven’t adjusted the maxIdleTimeMS setting yet, because I presume the cause is elsewhere.

This works 100% of the time for the first call (after a cold start of the Cloud Run Functions container), but subsequent calls can fail randomly.
In the logs, I have:

  • One warning: MongoClient opened before fork. May not be entirely fork-safe, proceed with caution. See PyMongo’s documentation for details: Frequently Asked Questions - PyMongo 4.8.0 documentation
  • One exception: pymongo.errors.ServerSelectionTimeoutError: No replica set members found yet, Timeout: 30s

Note: I do not use VPC peering or NAT; traffic goes over the public network.

For the warning, am I supposed to initialize the MongoClient in a different way?
What could be the reason for the exception? Can the connection be recreated automatically after it has been idle?
Thanks

Hi @Rapha_Ben, I opened https://jira.mongodb.org/browse/PYTHON-4699 to investigate, but it looks like the behavior on Google Cloud Run Functions is not what we expected. PyMongo itself is not fork-safe, and it looks like the pattern recommended in the docs might be resulting in a fork operation. For now I would recommend recreating the MongoClient upon API call.
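
As a rough illustration of what "recreating the MongoClient upon API call" could look like, here is a minimal sketch. The handler name, the MONGODB_URI environment variable, and the ping command are assumptions made for the example; they are not taken from the original code.

```python
import os
from contextlib import contextmanager


def make_client():
    # Imported lazily so that merely loading this module does not touch PyMongo.
    from pymongo import MongoClient
    # MONGODB_URI is a hypothetical environment variable holding the full connection string.
    return MongoClient(os.environ["MONGODB_URI"])


@contextmanager
def request_client(factory=make_client):
    """Yield a client that lives for exactly one request, then close it."""
    client = factory()
    try:
        yield client
    finally:
        # Always release the sockets, even if the request raised.
        client.close()


def handler(request):
    # Hypothetical request handler: the client never outlives the call.
    with request_client() as client:
        return client.admin.command("ping")
```

The trade-off is that every call pays for a new connection and TLS handshake, so this is probably best seen as a stopgap rather than a long-term pattern.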

@Rapha_Ben what libraries/frameworks does your app use? Is one of those libraries implicitly calling fork() or multiprocessing?

Explicitly, I do not use anything but PyMongo, but I think Google Cloud Functions uses Flask behind the scenes.
I’ve tried setting the max instance count to 1 to avoid new instances being created, but it still fails randomly…
Let me know if I can help with anything.

Thanks @Rapha_Ben, that’s all we need for now. We’ll look into it in PYTHON-4699. I wonder if something has recently changed in Cloud Run Functions (formerly Cloud Functions gen 2)?

Actually, this was my first deployment on gen 2, so I don’t know :confused:
I had no problems with gen 1 (no concurrency support, so probably no internal fork).

@Rapha_Ben we can reproduce the fork warning. We recommend that you add connect=False when creating the client and ensure the client is only used within the request handler (e.g., don’t run client.server_info() or client.admin.command() at the global level):

client = MongoClient(uri, tls=True, tlsCAFile=certifi.where(), tlsCertificateKeyFile='secret/X509-cert.pem', server_api=ServerApi('1'), tz_aware=True, connect=False)

Could you confirm whether this fixes the warnings and ServerSelectionTimeouts you’re seeing? If not, could you try downgrading to pymongo 4.7?

We will update the docs to mention this. Thanks for reporting.
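
For readers who want to go one step further than connect=False, a possible variation (not the recommendation above, just a sketch under assumptions) is to defer even the construction of the client until the first request and to rebuild it if the process id changes, so a forked child never reuses the parent's client. LazyClient, make_client, handler, and the MONGODB_URI variable are hypothetical names introduced for this example.

```python
import os
import threading


class LazyClient:
    """Create the wrapped client on first use, and again if the process id changes."""

    def __init__(self, factory):
        self._factory = factory
        self._client = None
        self._pid = None
        self._lock = threading.Lock()

    def get(self):
        pid = os.getpid()
        with self._lock:
            if self._client is None or self._pid != pid:
                # First use in this process (or a forked child): build a fresh client.
                self._client = self._factory()
                self._pid = pid
            return self._client


def make_client():
    # Imported lazily so that loading this module does not touch PyMongo.
    from pymongo import MongoClient
    # MONGODB_URI is a hypothetical environment variable holding the full connection string.
    return MongoClient(os.environ["MONGODB_URI"], connect=False)


# Safe at global scope: no MongoClient exists until a request calls .get().
mongo = LazyClient(make_client)


def handler(request):
    # Hypothetical request handler; the client is only touched inside it.
    return mongo.get().admin.command("ping")
```

Within one process the same client (and its connection pool) is reused across requests, which keeps the benefit of the global-scope pattern while avoiding any client activity before a fork.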

Thanks, I’ve added the connect=False flag, and I’ve waited several minutes between calls, to see forking/cold starting behavior.
It seems to be solved now!
Thank you!