5.0 zstd compression level is not working as expected

I have compared restoring the same data to MongoDB 4.2 and MongoDB 5.0 (with zstd compression levels 6 (the default), 10, 15, and 22), but found no difference in compression.
I have set zstd as the block compressor and set the compression level in the config files as explained in the documentation.
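
For reference, the relevant mongod.conf options (per the 5.0 documentation) look like this; the level shown here is illustrative:

  storage:
    wiredTiger:
      engineConfig:
        zstdCompressorLevel: 22      # 1-22; only takes effect when blockCompressor is zstd
      collectionConfig:
        blockCompressor: zstd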

Is there anything I am missing here?

Results of restoring the same data:
MongoDB 4.2: 7415377920 bytes
MongoDB 5.0, level 22: 7418728448 bytes
MongoDB 5.0, level 6: 7684075520 bytes
MongoDB 5.0, level 15: 7232811008 bytes

Hi @Aayushi_Mangal

Compression performance depends very much on the documents. Documents containing random strings, for example, are much less compressible than documents containing textual paragraphs.
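
You can see this outside MongoDB with a quick sketch using the zstd command-line tool (file names and sizes are illustrative):

  # compare zstd on ~1 MB of random bytes vs ~1 MB of repetitive text
  head -c 1000000 /dev/urandom > random.bin
  yes "the quick brown fox jumps over the lazy dog" | head -c 1000000 > text.txt
  zstd -6 random.bin -o random.zst
  zstd -6 text.txt -o text.zst
  ls -l random.zst text.zst   # random barely shrinks; text shrinks dramatically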

Could you provide the example document that you tried with? And have you tried this experiment with different document patterns? It would be great if you could provide information and scripts to reproduce your tests.

As an aside, I usually use a tool like mgeneratejs to generate dummy example documents in large numbers. For example, I can easily create gigabytes of documents following a pattern with it.
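
A sketch of how that can be wired up (the template fields and namespace are illustrative):

  npm install -g mgeneratejs
  # generate a million documents matching a template and pipe them straight in
  mgeneratejs '{ "name": "$name", "age": "$age" }' -n 1000000 \
    | mongoimport --db test --collection test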

Best regards
Kevin

Hi @kevinadi,

Thank you for your response; please find the details to reproduce below. Also, could you share any document or test that shows how these compression levels should behave? In our case they made no difference.
Case 1:

  1. From MongoDB 4.2, dump around 25 GB of data using mongodump.
  2. Launch multiple mongod instances: 4.2, and 5.0 at different compression levels (22, 10, 15, 6).
  3. Restore the same data to all these instances to check what compression we get (see the sketch below).
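
The dump/restore steps look something like this (hosts and ports are illustrative):

  # dump ~25 GB from the 4.2 source
  mongodump --host localhost --port 27017 --out /backup/dump

  # restore the same dump into each instance (one mongod per zstd level)
  mongorestore --port 27022 /backup/dump   # 5.0, zstd level 22
  mongorestore --port 27015 /backup/dump   # 5.0, zstd level 15
  mongorestore --port 27010 /backup/dump   # 5.0, zstd level 10
  mongorestore --port 27006 /backup/dump   # 5.0, zstd level 6 (default)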

Case 2:

  1. We inserted bulk fresh dummy data into all these instances, but found no difference in compression.
  2. A sample document for this dummy data is below; a sketch of the bulk-insert loop follows it.
{
    "_id" : ObjectId("62cc6f86b504c0604570bf9e"),
    "Traceid" : 1.0,
    "BillNO" : "Trace1",
    "CreatedDate" : ISODate("2020-08-27T21:04:35.967Z"),
    "CustID" : 11.0,
    "SystemType" : "Card",
    "DisplayID" : "123",
    "TraceNo" : "12231",
    "ManyDeliveryGroupID" : "123AS45",
    "traceSource" : "",
    "AdminBy" : "admin",
    "DeliveryNumber" : "121",
    "OriginType" : "abcdefg",
    "ProdInfo" : {
        "Vol" : {
            "Wt" : 55.555,
            "Ut" : null
        },
        "Size" : {
            "Wd" : null,
            "Len" : null,
            "Ht" : null,
            "Ut" : null
        }
    },
    "ReatInfo" : [ 
        {
            "Reat" : "Test",
            "SellerAddress" : "Test,Test",
            "ReatNo" : 8888.0,
            "ReatBillN" : "5655656",
            "ReatDate" : ISODate("2019-09-21T14:17:46.625Z"),
            "BillPri" : 101.0,
            "InvNo" : null,
            "GainSrN" : "2455",
            "ReatSrN" : "4554"
        }
    ],
    "SubSNo" : "12323",
    "ProvideType" : "JJJ",
    "Amount" : 45.0,
    "TraceType" : "",
    "CollectDetails" : {
        "Addr" : [ 
            {
                "Type" : "Sec",
                "Name" : "ABC",
                "Address" : "123,XYZ",
                "City" : "XYZ",
                "State" : "AB"
            }
        ],
        "ConnectInfo" : [ 
            {
                "Cate" : "",
                "Mob" : "1234567890"
            }
        ],
        "CollectDate" : null,
        "CollectTime" : {
            "Src" : null,
            "Dest" : null
        },
        "IsCollected" : null,
        "CollectCode" : "AB123",
        "Long" : null,
        "Lat" : null,
        "Loc" : null
    },
    "TraceDelivery" : "",
    "TraceParameter" : "",
    "TeacePrice" : 146.0,
    "MentionPrice" : 223.0,
    "ItemPrice" : 8290.0,
    "Comment" : "Valide Trace Data",
    "TransactionType" : "Card",
    "DisctinctID" : "",
    "CollectType" : "Vendor",
    "PCollectCode" : "12333",
    "DestDetails" : {
        "Addresses" : [ 
            {
                "Cate" : "PPP",
                "Name" : "dsdwidm",
                "Address" : "LMN",
                "City" : "LMN",
                "State" : "AB"
            }
        ],
        "ConnectInfo" : [ 
            {
                "Cate" : "",
                "Mob" : "2234566078"
            }
        ],
        "Submit" : ISODate("2020-10-31T19:33:14.892Z"),
        "SubTime" : {
            "Src" : null,
            "Dest" : null
        },
        "Long" : null,
        "Lat" : null,
        "Loc" : null
    },
    "RetnInfo" : {
        "Addr" : [ 
            {
                "Cate" : "Sec",
                "Name" : "tyty",
                "Address" : "dfdfd",
                "City" : "dfdf",
                "State" : "AA"
            }
        ],
        "ConnectInfo" : [ 
            {
                "Cate" : "PPP",
                "MoB" : "123456777"
            }
        ]
    },
    "TraceRNo" : "",
    "TraceDevCli" : "",
    "TeaceGID" : "",
    "TraceOAmt" : "",
    "Cust" : ""
}
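
The bulk insert was done with a loop along these lines (a sketch; the collection name, count, and batch size are illustrative):

  // assumes the sample document above has been inserted once already
  const sample = db.actlog.findOne();
  delete sample._id;                       // let the server assign fresh _ids
  let batch = [];
  for (let i = 0; i < 1000000; i++) {
    batch.push(Object.assign({}, sample, { Traceid: i }));
    if (batch.length === 1000) { db.actlog.insertMany(batch); batch = []; }
  }
  if (batch.length > 0) db.actlog.insertMany(batch);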

Hi @Aayushi_Mangal

I did a quick test using ~25GB of data derived from the example document you provided.

This is the output of db.test.stats() for the collection using the standard snappy compression:

  ns: 'test.test',
  size: Long("27070806709"),
  count: 17166000,
  avgObjSize: 1577,
  storageSize: Long("10205900800"),

and this is the output of db.testzstd.stats() for the collection configured to use zstd:

  ns: 'test.testzstd',
  size: Long("27070806709"),
  count: 17166000,
  avgObjSize: 1577,
  storageSize: Long("6124052480"),

So the snappy-compressed collection uses about 9.5GB of storage, and the zstd-compressed collection (using the standard compression level) uses about 5.7GB. I’m using MongoDB 5.0.9.

So far it seems to be working for me: zstd clearly shows an advantage.

Could you double-check the experiment using the latest MongoDB versions? E.g. for 5.0 please use 5.0.9, and for 4.2 please use 4.2.21.

Best regards
Kevin

Hi @kevinadi,

Thank you for testing it, but the test case you ran does not seem to be the one I tested. I am looking for a comparison of zstd against itself at the different compression levels available from MongoDB 5.0, alongside MongoDB 4.2.

My test case refers to this: https://www.mongodb.com/docs/manual/release-notes/5.0/#configurable-zstd-compression-level

I restored the same data in MongoDB 4.2 and 5.0 (with compression levels 6 (default), 10, 15, and 22), using zstd only.

Hi @Aayushi_Mangal,

Thanks for the link reference and for detailing the tests you performed 🙂

I inserted about 500K test documents into a MongoDB 4.2.21 instance with default compression, and then used mongorestore to restore a dump of this same data to several test instances with varying compression settings:

  • MongoDB version 5.0.10 zstd compression level 22
  • MongoDB version 5.0.10 zstd compression level 10
  • MongoDB version 5.0.10 default compression

The results are below (for all of the following tests, note the decreasing storageSize values):

MongoDB version 4.2.21, default compression:

  ns: 'compressdb.compresscoll',
  size: 767500000,
  count: 500000,
  avgObjSize: 1535,
  storageSize: 64327680

MongoDB version 5.0.10 default compression:

  ns: 'compressdb.compresscoll',
  size: 767500000,
  count: 500000,
  avgObjSize: 1535,
  storageSize: 54767616

MongoDB version 5.0.10 zstd compression, level 10:

  ns: 'compressdb.compresscoll',
  size: 767500000,
  count: 500000,
  avgObjSize: 1535,
  storageSize: 27152384

MongoDB version 5.0.10 zstd compression, level 22:

  ns: 'compressdb.compresscoll',
  size: 767500000,
  count: 500000,
  avgObjSize: 1535,
  storageSize: 1257472

Regarding case 2 in your reply in this topic:

  1. We inserted bulk fresh dummy data into all these instances, but found no difference in compression.

Could you run db.collection.stats() for each of your test cases / instances and report the following values for each test instance:

  • storageSize
  • creationString
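
Both values can be read straight from the stats output, e.g. (collection name as in my tests above; substitute your own):

  const s = db.getCollection("compresscoll").stats();
  s.storageSize                  // bytes on disk for the collection
  s.wiredTiger.creationString    // WiredTiger config the collection was created with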

Regards,
Jason

Hello Jason,

Thank you so much for your response and for reproducing this case.

I ran the test again, inserting test documents using this script: mongo script for insert 100 million test data · GitHub

Please find the requested details below:

MongoDB version 4.2.12

db.actlog.count()
105727766

“storageSize” : 1676038144

“creationString” : “access_pattern_hint=none,allocation_size=4KB,app_metadata=(formatVersion=1),assert=(commit_timestamp=none,durable_timestamp=none,read_timestamp=none),block_allocation=best,block_compressor=zstd,cache_resident=false,checksum=on,colgroups=,collator=,columns=,dictionary=0,encryption=(keyid=,name=),exclusive=false,extractor=,format=btree,huffman_key=,huffman_value=,ignore_in_memory_cache_size=false,immutable=false,internal_item_max=0,internal_key_max=0,internal_key_truncate=true,internal_page_max=4KB,key_format=q,key_gap=10,leaf_item_max=0,leaf_key_max=0,leaf_page_max=32KB,leaf_value_max=64MB,log=(enabled=true),lsm=(auto_throttle=true,bloom=true,bloom_bit_count=16,bloom_config=,bloom_hash_count=8,bloom_oldest=false,chunk_count_limit=0,chunk_max=5GB,chunk_size=10MB,merge_custom=(prefix=,start_generation=0,suffix=),merge_max=15,merge_min=0),memory_page_image_max=0,memory_page_max=10m,os_cache_dirty_max=0,os_cache_max=0,prefix_compression=false,prefix_compression_min=4,source=,split_deepen_min_child=0,split_deepen_per_child=0,split_pct=90,type=file,value_format=u”

MongoDB version 5.0.8, zstd level 6

db.actlog.count()
105727766

“storageSize” : 1682145280

“creationString” : “access_pattern_hint=none,allocation_size=4KB,app_metadata=(formatVersion=1),assert=(commit_timestamp=none,durable_timestamp=none,read_timestamp=none,write_timestamp=off),block_allocation=best,block_compressor=zstd,cache_resident=false,checksum=on,colgroups=,collator=,columns=,dictionary=0,encryption=(keyid=,name=),exclusive=false,extractor=,format=btree,huffman_key=,huffman_value=,ignore_in_memory_cache_size=false,immutable=false,import=(enabled=false,file_metadata=,repair=false),internal_item_max=0,internal_key_max=0,internal_key_truncate=true,internal_page_max=4KB,key_format=q,key_gap=10,leaf_item_max=0,leaf_key_max=0,leaf_page_max=32KB,leaf_value_max=64MB,log=(enabled=true),lsm=(auto_throttle=true,bloom=true,bloom_bit_count=16,bloom_config=,bloom_hash_count=8,bloom_oldest=false,chunk_count_limit=0,chunk_max=5GB,chunk_size=10MB,merge_custom=(prefix=,start_generation=0,suffix=),merge_max=15,merge_min=0),memory_page_image_max=0,memory_page_max=10m,os_cache_dirty_max=0,os_cache_max=0,prefix_compression=false,prefix_compression_min=4,readonly=false,source=,split_deepen_min_child=0,split_deepen_per_child=0,split_pct=90,tiered_object=false,tiered_storage=(auth_token=,bucket=,bucket_prefix=,cache_directory=,local_retention=300,name=,object_target_size=10M),type=file,value_format=u,verbose=,write_timestamp_usage=none”

MongoDB version 5.0.8, zstd level 10

db.actlog.count()
105727766

“storageSize” : 1690370048,

“creationString” : “access_pattern_hint=none,allocation_size=4KB,app_metadata=(formatVersion=1),assert=(commit_timestamp=none,durable_timestamp=none,read_timestamp=none,write_timestamp=off),block_allocation=best,block_compressor=zstd,cache_resident=false,checksum=on,colgroups=,collator=,columns=,dictionary=0,encryption=(keyid=,name=),exclusive=false,extractor=,format=btree,huffman_key=,huffman_value=,ignore_in_memory_cache_size=false,immutable=false,import=(enabled=false,file_metadata=,repair=false),internal_item_max=0,internal_key_max=0,internal_key_truncate=true,internal_page_max=4KB,key_format=q,key_gap=10,leaf_item_max=0,leaf_key_max=0,leaf_page_max=32KB,leaf_value_max=64MB,log=(enabled=true),lsm=(auto_throttle=true,bloom=true,bloom_bit_count=16,bloom_config=,bloom_hash_count=8,bloom_oldest=false,chunk_count_limit=0,chunk_max=5GB,chunk_size=10MB,merge_custom=(prefix=,start_generation=0,suffix=),merge_max=15,merge_min=0),memory_page_image_max=0,memory_page_max=10m,os_cache_dirty_max=0,os_cache_max=0,prefix_compression=false,prefix_compression_min=4,readonly=false,source=,split_deepen_min_child=0,split_deepen_per_child=0,split_pct=90,tiered_object=false,tiered_storage=(auth_token=,bucket=,bucket_prefix=,cache_directory=,local_retention=300,name=,object_target_size=10M),type=file,value_format=u,verbose=,write_timestamp_usage=none”

MongoDB version 5.0.8, zstd level 22

db.actlog.count()
105727766

“storageSize” : 1705689088

“creationString” : “access_pattern_hint=none,allocation_size=4KB,app_metadata=(formatVersion=1),assert=(commit_timestamp=none,durable_timestamp=none,read_timestamp=none,write_timestamp=off),block_allocation=best,block_compressor=zstd,cache_resident=false,checksum=on,colgroups=,collator=,columns=,dictionary=0,encryption=(keyid=,name=),exclusive=false,extractor=,format=btree,huffman_key=,huffman_value=,ignore_in_memory_cache_size=false,immutable=false,import=(enabled=false,file_metadata=,repair=false),internal_item_max=0,internal_key_max=0,internal_key_truncate=true,internal_page_max=4KB,key_format=q,key_gap=10,leaf_item_max=0,leaf_key_max=0,leaf_page_max=32KB,leaf_value_max=64MB,log=(enabled=true),lsm=(auto_throttle=true,bloom=true,bloom_bit_count=16,bloom_config=,bloom_hash_count=8,bloom_oldest=false,chunk_count_limit=0,chunk_max=5GB,chunk_size=10MB,merge_custom=(prefix=,start_generation=0,suffix=),merge_max=15,merge_min=0),memory_page_image_max=0,memory_page_max=10m,os_cache_dirty_max=0,os_cache_max=0,prefix_compression=false,prefix_compression_min=4,readonly=false,source=,split_deepen_min_child=0,split_deepen_per_child=0,split_pct=90,tiered_object=false,tiered_storage=(auth_token=,bucket=,bucket_prefix=,cache_directory=,local_retention=300,name=,object_target_size=10M),type=file,value_format=u,verbose=,write_timestamp_usage=none”,

I must be missing something, as the creationString looks identical across the 5.0 instances. Please suggest which parameter I should check, or what I might be missing here.

Hi Jason,

While you are checking, could you please share the configuration file you are using for the 5.0 instances? I will compare it with mine in case any parameter is missing.

Hi @Aayushi_Mangal,

Thanks for providing those details 🙂

Firstly, I would like to note that the results in my previous post changed over time and eventually reached a point where the storageSize values for the various zstd compression levels were very close (differing by 1-2%). I had recorded those values very shortly after importing the data, and did not realise that storageSize kept growing for a few minutes at some of the higher compression levels (e.g. level 22) due to internal WiredTiger processes.

As some very basic zstd testing, I compressed a 1480-byte BSON file outside of MongoDB with the zstd command-line tool at level 6 and at level 22, which will hopefully mirror what happens inside MongoDB to some extent (as far as I know, MongoDB compresses each document individually). The level 22 file came out ~0.8% smaller than the level 6 file:

$ ls -l
-rw-r--r--  1 user  staff  1480 12 Aug 14:11 testcoll.bson       /// <--- original
-rw-r--r--  1 user  staff   786 12 Aug 14:11 testcoll6.bson.zst  /// <--- zstd compressed, level 6
-rw-r--r--  1 user  staff   780 12 Aug 14:11 testcoll22.bson     /// <--- zstd compressed, level 22
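
The invocations would look something like this (a sketch; note that zstd levels above 19 require --ultra):

  zstd -6 testcoll.bson -o testcoll6.bson.zst
  zstd --ultra -22 testcoll.bson -o testcoll22.bson   # -o allows any output name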

Please note that this is a very simple demonstration of a single compression use case, and the way WiredTiger uses zstd differs from how it was used in the example above.

This may be a case where the lower compression levels have already approached the data's compressibility limit, so the higher levels cannot compress much further beyond that. The amount of compression achievable also depends on the type of data.

Regards,
Jason

Hi Jason,
Thank you for the confirmation and detailed response.

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.