updateOne of single property creates huge bandwidth usage on large documents

Rafael_Polit · August 4, 2024, 11:07pm

Good afternoon.

We are running an updateOne with a $set on a single, very small property of a document. The documents in this collection, however, have ANOTHER property which is a large array of values.

The updateOne incurs HUGE network traffic (according to our Grafana stats). Calling this updateOne multiple times in sequence, with each call awaited, yields compounding network traffic that eventually clogs our infrastructure.

This produces network traffic peaks of 400 MB/s or so, sustained for about 60 secs or more.

We are trying to figure out what the underlying problem could be, and whether we can somehow prevent it without changing the topology of the collection.

Here are some things we have tried:

Removed the large property (which, I must insist, is NOT being modified by the $set operator) from all the documents to confirm that the size of the document is what’s causing the problem. Indeed, there was almost no traffic at all with the smaller documents: 3 MB/s.
Removed the sequential calls to updateOne and updated only one document (again, never modifying the large property). Network traffic was still high-ish, at 80 MB/s (on a single document!).
Added a timeout of one second between the in-series updateOne calls. This results in lower network peaks of around 100 MB/s, spread over a longer period than with the fully sequential approach.
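
For reference, the update pattern described above can be sketched roughly as follows. This is an illustration only, not our actual code: `touchSequentially`, the `status` field, and the `pauseMs` parameter are made-up names, and `collection` is assumed to be a MongoDB Node.js driver Collection (anything with a compatible `updateOne` works).

```javascript
// Hypothetical helper, not the real application code.
// `collection`: a MongoDB Node.js driver Collection (assumed).
// `ids`: the _id values of the documents to update.
// `pauseMs`: optional delay between calls (the "timeout" experiment).
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function touchSequentially(collection, ids, pauseMs = 0) {
  let modified = 0;
  for (const _id of ids) {
    // Only the small field is sent in the update; the large array is not touched.
    const result = await collection.updateOne(
      { _id },
      { $set: { status: 'processed' } } // `status` is a made-up field name
    );
    modified += result.modifiedCount;
    if (pauseMs > 0) await sleep(pauseMs);
  }
  return modified;
}
```

Calling `touchSequentially(collection, ids)` corresponds to the fully sequential run, and `touchSequentially(collection, ids, 1000)` to the run with a one-second pause between calls.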

As mentioned, we would like to avoid having to restructure the collection… but, most importantly: what is causing this traffic? It is a SINGLE, VERY SMALL property being updated via the $set operator, yet the network traffic grows with the underlying size of the document? We simply don’t get it.

We are also not sure where this traffic is going: the servers hosting the other replicas don’t appear to receive it, and the Node client is not getting the full-size document at all, only the result of the update operation.

Thanks for any insight, best regards,
Rafa.

chris · August 5, 2024, 3:50pm

Hi @Rafael_Polit

Is the host where MongoDB is running dedicated to MongoDB or are other processes running on the host that could be contributing to the network utilization?

What is being measured here: total host network IO or MongoDB-specific traffic?

This is a good observation. This indicates to me that the update itself is not the issue, but rather some consequence of the update. For me this also rules out traffic due to network storage (if it was used).

Is there some process that is querying the Primary any time a change occurs on that collection?

Are there metrics for opcounters on this member when this occurs or can mongostat be run?
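
As a rough sketch of the opcounters check: take two snapshots of the `opcounters` section of `serverStatus` a known number of seconds apart and turn the difference into per-second rates. The helper name `opcounterRates` is made up for illustration, and it assumes the counters are plain numbers.

```javascript
// Hypothetical helper: per-second rates between two opcounters snapshots.
// `before` and `after` are objects shaped like serverStatus().opcounters,
// e.g. { insert, query, update, delete, getmore, command }.
function opcounterRates(before, after, elapsedSeconds) {
  const rates = {};
  for (const op of Object.keys(after)) {
    // A counter missing from the first snapshot is treated as 0.
    rates[op] = (after[op] - (before[op] ?? 0)) / elapsedSeconds;
  }
  return rates;
}
```

In mongosh the snapshots can be taken with `db.serverStatus().opcounters`. If the `query` or `getmore` rate climbs together with `update` while the updates run, that would point to something reading from the Primary on every change.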

Rafael_Polit · August 6, 2024, 6:06am

Thanks Chris for this reply.

You are indeed correct: it was a watcher fetching the data with each change. Running sequential changes caused the watcher to run many times over.
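
For anyone landing here with the same symptom, here is a minimal sketch of a watcher that avoids re-reading the large document. It assumes the watcher is a MongoDB change stream opened with the Node.js driver, which may not match every setup. `watchUpdatedFields` and `onChange` are made-up names.

```javascript
// Hypothetical sketch, assuming a change-stream based watcher.
// `collection`: a MongoDB Node.js driver Collection (assumed).
// `onChange(id, updatedFields)`: made-up callback for illustration.
function watchUpdatedFields(collection, onChange) {
  const pipeline = [
    // Only react to updates.
    { $match: { operationType: 'update' } },
    // Keep just the document key and the changed fields in each event.
    { $project: { documentKey: 1, 'updateDescription.updatedFields': 1 } },
  ];
  // No `fullDocument: 'updateLookup'` option here, so update events carry
  // the changed fields rather than the whole (large) document.
  const stream = collection.watch(pipeline);
  stream.on('change', (event) => {
    onChange(event.documentKey._id, event.updateDescription.updatedFields);
  });
  return stream;
}
```

The idea is that a $set on one small property then produces an event containing only that property, so the cost per change no longer depends on the size of the large array. A watcher that runs its own find on every event would need the equivalent fix on the query side, for example a projection that excludes the large property.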

We figured this out a couple of days ago; my post took about 3 days or more to get approved… you would have saved us quite a bit of time, and I appreciate the thought you put into it. You really nailed it.

I do believe that post approval could be a bit faster for these things that require semi-urgent solutions.

Thanks again,
Rafa.
