MongoDB write performance

Hi everyone,

I’m using the Spark connector (10.3.0) to copy data from BigQuery to Mongo: about 100M small records (roughly 250 bytes each). Besides the index on _id: ObjectID, there is one compound index on two fields.

When I tried to use “insert” only, with an M10 cluster, I sometimes got 10K+ inserted documents per second, but most of the time I got only ~2K/sec.
When I tried to use “replace + upsert”, I usually got 0.6-0.7K/sec.
When the performance was bad (~2K/sec), CPU IOWAIT was usually high, between 80% and 130%.
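For context, this is roughly how I’m configuring the two write modes (a minimal sketch; the URI, database, and collection names are placeholders, and the option values are just what I’ve been experimenting with based on my reading of the 10.x connector docs):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bq-to-mongo").getOrCreate()
df = spark.read.parquet("gs://my-bucket/staging")  # placeholder for the real source

common = {
    "connection.uri": "mongodb+srv://user:pass@cluster0.example.mongodb.net",
    "database": "mydb",
    "collection": "mycoll",
}

# Mode 1: plain inserts (unordered bulk writes tend to parallelize better).
(df.write.format("mongodb").mode("append")
   .options(**common)
   .option("operationType", "insert")
   .option("ordered", "false")
   .option("maxBatchSize", "1000")   # connector default is 512, I believe
   .save())

# Mode 2: replace-with-upsert, matching existing documents on _id.
(df.write.format("mongodb").mode("append")
   .options(**common)
   .option("operationType", "replace")
   .option("idFieldList", "_id")
   .save())
```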

I tried scaling the cluster up to M30. Although CPU IOWAIT dropped to about 50%, the performance didn’t improve.

My question is: if I want to finish upserting that amount of data in a short period of time, like 1-2 hours, what can I do? What typical bulk write speed should I expect from an M10 or M30 cluster?

P.S. I’m using serverless GCP Dataproc to run Spark. There’s hardly any transformation needed - just read from BigQuery and write to Mongo.
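In case it helps, the whole job is essentially just this (a simplified sketch; the table name, URI, and partition count are placeholders):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bq-to-mongo").getOrCreate()

# Read straight out of BigQuery (table name is a placeholder).
df = (spark.read.format("bigquery")
      .option("table", "my-project.my_dataset.my_table")
      .load())

# More partitions means more concurrent writers against Mongo; 64 is an
# arbitrary starting point that I tune against what the tier can absorb.
df = df.repartition(64)

(df.write.format("mongodb")
   .mode("append")
   .option("connection.uri", "mongodb+srv://user:pass@cluster0.example.mongodb.net")
   .option("database", "mydb")
   .option("collection", "mycoll")
   .option("operationType", "insert")
   .save())
```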

Thank you

Indexes impact your write performance, so you could try dropping your compound index while loading the data and then rebuilding it afterwards.
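For example, if you manage indexes with PyMongo, it could look something like this (the field names below are made up, since I don’t know your schema):

```python
from pymongo import MongoClient, ASCENDING

client = MongoClient("mongodb+srv://user:pass@cluster0.example.mongodb.net")
coll = client["mydb"]["mycoll"]

# Drop the compound index before the bulk load
# (drop_index accepts either the key spec or the index name).
coll.drop_index([("fieldA", ASCENDING), ("fieldB", ASCENDING)])

# ... run the Spark load ...

# Rebuild it once the data is in place.
coll.create_index([("fieldA", ASCENDING), ("fieldB", ASCENDING)])
```

Building the index once after the load is generally much cheaper than maintaining it on every insert.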

Thanks Peter.
I tried using only a single index (the one on _id); the performance increased by a few percent.