I am using MongoDB C# driver version 2.17.0 in a prod service and found that the service faces a lot of MongoDB.Driver.MongoWaitQueueFullException every day. Initially I thought that the service was getting a lot of requests and it was hitting the wait queue limits. So, I started tracking connection count and wait queue size per service instance with events.
Then I found that the service was well below the default active connection limit of 100 and the wait queue was empty most of the time too. But the exception was happening whenever there was a sharp rise in requests to open new connections to MongoDB, for example 30 requests to open new connections in 10 seconds. I was able to reproduce this locally with the k6 load testing tool. I ramped up to 100 requests over 10 seconds, and across multiple runs a small percentage of requests (approx. 3% to 7%) failed with MongoWaitQueueFullException.
Is it possible to avoid this error, since the service is not actually hitting the wait queue limits? One potential solution that comes to mind is to open a minimum number of connections on startup so that my service always has some connections to spare and can deal with a sharp increase in requests more gracefully. Is there any other potential solution?
Welcome to the MongoDB Community Forums. I understand that you are experiencing MongoWaitQueueFullExceptions sporadically in your production application.
The default wait queue size is the WaitQueueMultiplier (default 5) times the MaxPoolSize (default 100), which gives 500. But what is the wait queue and why do you receive MongoWaitQueueFullExceptions? To understand this, let’s talk about server selection and connection pools…
When you execute an operation on your MongoDB cluster, the first step is server selection. If the operation is a write, that write must be executed against the primary (or a mongos in a sharded cluster which will then route it to the correct primary). If it is a read, the driver will evaluate the requested read preference against the cached cluster topology to look for a suitable node. This includes the node’s state (e.g. primary, secondary, etc.), latency, max staleness, and other factors. See Server Selection for a detailed explanation.
Once a server has been selected, the driver will attempt to check a connection out of the connection pool for that node. First it enters the wait queue, which limits how many threads can block waiting for a connection. (As mentioned above, the default limit is 500.) If the wait queue is already full, the driver throws a MongoWaitQueueFullException instead of blocking. If a connection is available, it will be checked out and the wait queue exited. If one is not available but the pool is not at maxPoolSize, a new connection will be established and then the wait queue exited.
To help prevent connection storms, MongoDB .NET/C# Driver 2.13.0 introduced maxConnecting, which limits the number of concurrent connection establishment attempts to a single cluster node. maxConnecting was made configurable in 2.14.0. The default value is 2.
If you are not at maxPoolSize but are seeing MongoWaitQueueFullExceptions, you may be slow to establish new connections (at most 2 concurrently to a single cluster node), causing a lot of threads to block on connection establishment. You can try increasing maxConnecting either via the connection string or via MongoClientSettings.MaxConnecting. I would suggest trying 4 (i.e. double the default value) and gradually increasing from there to see if it resolves the issue.
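As a rough sketch of the two ways to raise maxConnecting, something like the following should work on driver 2.14.0 or later. The connection string (mongodb://localhost:27017) and the class name are placeholders I made up for illustration, and the value 4 is just the starting point suggested above. I am assuming the connection string option is spelled maxConnecting, so please verify that against the documentation for your driver version.

```csharp
using System;
using MongoDB.Driver;

public static class MaxConnectingExample
{
    public static void Main()
    {
        // Placeholder connection string; replace it with your own deployment's.
        const string connectionString = "mongodb://localhost:27017";

        // Option 1: set MaxConnecting in code via MongoClientSettings.
        var settings = MongoClientSettings.FromConnectionString(connectionString);
        settings.MaxConnecting = 4; // default is 2; increase gradually and measure
        var client = new MongoClient(settings);

        // Option 2: set it in the connection string instead
        // (option name assumed to be "maxConnecting").
        var clientFromUrl = new MongoClient(connectionString + "/?maxConnecting=4");

        // Print the effective values to confirm what each client is using.
        Console.WriteLine(client.Settings.MaxConnecting);
        Console.WriteLine(clientFromUrl.Settings.MaxConnecting);
    }
}
```

Whichever option you choose, create the MongoClient once and reuse it across requests so that all operations share the same connection pools.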
Hopefully tuning maxConnecting alleviates the MongoWaitQueueFullExceptions in your deployment. Please let us know the results of this tuning or if you have additional questions.