MongoDB write performance

Hi everyone,

I’m using the Spark connector (10.3.0) to copy data from BigQuery to Mongo: about 100M small records (roughly 250 bytes each). Besides the index on _id: ObjectID, there is one compound index on two fields.

When I tried to use “insert” only, with an M10 cluster, I sometimes got 10K+ inserted documents per second, but most of the time I got only ~2K/sec.
When I tried to use “replace + upsert”, I usually got 0.6-0.7K/sec.
When the performance was bad (~2K/sec), CPU IOWAIT was usually high, between 80% and 130%.
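For context, this is roughly how I’m configuring the two write modes (a minimal sketch; the URI, database, and collection names are placeholders, and the option values are just what I’ve been experimenting with based on my reading of the 10.x connector docs):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bq-to-mongo").getOrCreate()
df = spark.read.parquet("gs://my-bucket/staging")  # placeholder for the real source

common = {
    "connection.uri": "mongodb+srv://user:pass@cluster0.example.mongodb.net",
    "database": "mydb",
    "collection": "mycoll",
}

# Mode 1: plain inserts (unordered bulk writes tend to parallelize better).
(df.write.format("mongodb").mode("append")
   .options(**common)
   .option("operationType", "insert")
   .option("ordered", "false")
   .option("maxBatchSize", "1000")   # connector default is 512, I believe
   .save())

# Mode 2: replace-with-upsert, matching existing documents on _id.
(df.write.format("mongodb").mode("append")
   .options(**common)
   .option("operationType", "replace")
   .option("idFieldList", "_id")
   .save())
```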

I tried scaling the cluster up to M30. Although CPU IOWAIT dropped to about 50%, the performance didn’t improve.

My question is: if I want to finish upserting that amount of data in a short period of time, like 1-2 hours, what can I do? What typical bulk write speed should I expect from an M10 or M30 cluster?

P.S. I’m using serverless GCP Dataproc to run Spark. There’s hardly any transformation needed - just read from BigQuery and write to Mongo.
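In case it helps, the whole job is essentially just this (a simplified sketch; the table name, URI, and partition count are placeholders):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bq-to-mongo").getOrCreate()

# Read straight out of BigQuery (table name is a placeholder).
df = (spark.read.format("bigquery")
      .option("table", "my-project.my_dataset.my_table")
      .load())

# More partitions means more concurrent writers against Mongo; 64 is an
# arbitrary starting point that I tune against what the tier can absorb.
df = df.repartition(64)

(df.write.format("mongodb")
   .mode("append")
   .option("connection.uri", "mongodb+srv://user:pass@cluster0.example.mongodb.net")
   .option("database", "mydb")
   .option("collection", "mycoll")
   .option("operationType", "insert")
   .save())
```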

Thank you

Indexes impact your write performance, so you could try dropping your compound index while loading the data and then rebuilding it afterwards.
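For example, if you manage indexes with PyMongo, it could look something like this (the field names below are made up, since I don’t know your schema):

```python
from pymongo import MongoClient, ASCENDING

client = MongoClient("mongodb+srv://user:pass@cluster0.example.mongodb.net")
coll = client["mydb"]["mycoll"]

# Drop the compound index before the bulk load
# (drop_index accepts either the key spec or the index name).
coll.drop_index([("fieldA", ASCENDING), ("fieldB", ASCENDING)])

# ... run the Spark load ...

# Rebuild it once the data is in place.
coll.create_index([("fieldA", ASCENDING), ("fieldB", ASCENDING)])
```

Building the index once after the load is generally much cheaper than maintaining it on every insert.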

Thanks Peter.
I tried using only a single index (the one on _id); the performance increased by a few percent.