Hi everyone. I am currently facing some extremely weird behaviour from one of our test MongoDB clusters.
We are currently running some Glue-based data migration pipelines that map data from a bunch of CSVs into our MongoDB. Everything seems fine, except for one very strange Int32 field in one of the collections. Right after insertion, the field is populated with the correct data from the CSVs. But after one full read of the collection, of any kind (a normal query, a read through the Spark connector, dumping the collection to CSV, etc.), all of the values in that field turn into 0, every single one of them.
Dumbfounded, we checked the input CSVs, checked the pipelines, printed that field during the Glue mapping job runs, and aggregated the field during those runs. None of it gave us any clue as to how this is happening.
I'm writing to ask the community about this strange problem. I'm looking for people who have experienced the same thing, and for just about any hint on what the root cause could be.
Every single value, and only for this field.
This looks like a typo in the name of the field. Upper/lower case issues are the hardest to spot.
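If you want to rule that out quickly, here is a minimal sketch that groups field names differing only by letter case. It only looks at top-level fields, and the sample documents and the `weeklyCount` field name are made up for illustration; in practice you would feed it documents read from the collection.

```python
from collections import defaultdict


def find_case_collisions(documents):
    """Return field names that appear in more than one letter-case variant.

    Only top-level keys are inspected; nested documents are ignored.
    """
    variants = defaultdict(set)
    for doc in documents:
        for name in doc:
            variants[name.lower()].add(name)
    # Keep only the names that were seen with more than one spelling.
    return {key: sorted(names) for key, names in variants.items() if len(names) > 1}


# Hypothetical sample data standing in for documents from the collection.
sample_docs = [
    {"_id": 1, "weeklyCount": 42},
    {"_id": 2, "weeklycount": 0},
    {"_id": 3, "weeklyCount": 7},
]

collisions = find_case_collisions(sample_docs)
print(collisions)
```

An empty result means no two top-level field names differ only by case, which would point away from the typo theory.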
@Quan_Le_Anh this appears to be the same question as https://www.reddit.com/r/mongodb/comments/1h2p3px/mysterious_loss_of_data_in_a_very_strange_manner/.
The OP's response to some of the threads there also makes it sound like this is solved now:
The oplog was mad helpful. I inserted into a new field (no, renamed the old field) and it turns out one of the K8S Pods (we have a cloned app cluster, but essentially abandoned till the migration process is done) is actively re-inserting the 0 value because it is a cyclic field (renewed every week, the data is 2 months old, there is another field for cond check, the checking interval is 5 seconds). Shit was crazy haunted till the DevOps guy came by and said “yeah we cloned the whole thing even the K8S cron Pod” so while Atlas is not showing any native Mongo crons we saw 2 inserts in the oplog.rs. This stupid “bug” took us 4 days to come by. I guess I should be more specific when telling the Ops guy to provide an env that's “as close as possible to prod” lmao.
If you’re copy/pasting questions between sites like Reddit, SO or elsewhere, it wouldn’t hurt to link them together to prevent duplication of effort and to promote discoverability.
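For anyone who lands here with the same symptom, the oplog check from the quoted Reddit reply can be sketched roughly like this. The namespace `app.stats` and the field `weeklyCount` are hypothetical, and the entry shapes handled here (`$set` for older update entries, `diff.u` / `diff.i` for `$v: 2` entries) are my understanding of an internal format that can differ between server versions, so treat it as a starting point and not a definitive parser.

```python
def build_oplog_filter(namespace, ops=("i", "u")):
    """Build a query filter for oplog entries that insert into or update a namespace."""
    return {"ns": namespace, "op": {"$in": list(ops)}}


def written_value(entry, field):
    """Return (True, value) if the oplog entry writes the top-level field, else (False, None)."""
    body = entry.get("o", {})
    diff = body.get("diff", {})
    # Inserts and replacement-style updates carry the document itself in "o";
    # operator-style updates carry "$set" (older format) or "diff" ($v: 2).
    candidates = [body, body.get("$set", {}), diff.get("u", {}), diff.get("i", {})]
    for candidate in candidates:
        if field in candidate:
            return True, candidate[field]
    return False, None


# Hypothetical entries shaped like documents from local.oplog.rs.
sample_entries = [
    {"op": "i", "ns": "app.stats", "o": {"_id": 1, "weeklyCount": 42}},
    {"op": "u", "ns": "app.stats", "o2": {"_id": 1},
     "o": {"$v": 2, "diff": {"u": {"weeklyCount": 0}}}},
    {"op": "u", "ns": "app.stats", "o2": {"_id": 1},
     "o": {"$v": 2, "diff": {"u": {"checkedAt": "2024-11-29"}}}},
]

# Entries that overwrite the field with 0 are the ones worth tracing back.
suspicious = [e for e in sample_entries if written_value(e, "weeklyCount") == (True, 0)]
print(len(suspicious))
```

Against a real replica set, and assuming your user is allowed to read the `local` database, the same filter can be passed to PyMongo as `client["local"]["oplog.rs"].find(build_oplog_filter("app.stats")).sort("$natural", -1)`. The timestamps on the matching entries then tell you when the unexpected writes happen, which helps narrow down which client is doing them.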