Does Spark connector support "INSERT IGNORE"?

Hi.
I'm looking for an "INSERT IGNORE"-style feature in mongodb-spark-connector.
There's a unique key spanning multiple columns in MongoDB, and I wrote a daily batch job that runs on Spark. The batch should be retriable, i.e. idempotent, so when writing to the DB I want to ignore duplicate-key errors.
I've seen the SaveMode.Overwrite implementation, but it just drops the collection, which is not what I'm looking for.

Is there a way to insert while ignoring duplicate-key errors?
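
For reference, the setup is roughly this. Below is a minimal sketch with pymongo, where key_a and key_b are placeholders for the actual unique-key columns; a compound unique index like this is what makes plain inserts fail with E11000 duplicate key errors when the batch is re-run:

import pymongo

client = pymongo.MongoClient("mongodb://localhost:27017")  # placeholder URI
coll = client["mydb"]["daily_batch"]                       # placeholder names

# Re-running the batch and inserting the same rows again violates this index.
coll.create_index([("key_a", 1), ("key_b", 1)], unique=True)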

I recently went through this same scenario. To solve it, we can use mode("append") and add two options (operationType and upsertDocument).
If the item already exists, we replace it instead of creating a duplicate.
Example of this solution:

# Upsert instead of plain insert: replace matching documents, insert the rest
(df_test.write
    .format("mongodb")
    .mode("append")
    .option("connection.uri", "")
    .option("database", "")
    .option("collection", "")
    .option("ignoreNullValues", True)
    .option("operationType", "replace")  # replace the matched document
    .option("upsertDocument", True)      # insert when no match exists
    .save())
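
One caveat worth checking: with operationType "replace", the connector matches documents by _id by default. Since the unique key here spans multiple columns, the idFieldList write option (in connector 10.x, if I'm reading the write configuration docs correctly) can point the match at those columns instead. Here is a sketch, again with key_a/key_b as placeholder field names:

# Match on the composite unique key instead of _id (field names are placeholders)
(df_test.write
    .format("mongodb")
    .mode("append")
    .option("connection.uri", "")
    .option("database", "")
    .option("collection", "")
    .option("operationType", "replace")
    .option("upsertDocument", True)
    .option("idFieldList", "key_a,key_b")  # comma-separated list of match fields
    .save())

With that in place, re-running the same batch rewrites the same documents instead of raising duplicate-key errors.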

See the connector's write configuration documentation to learn more about these options.
