MongoDB Network Compression: A Win-Win

Brian Leonard6 min read • Published Nov 03, 2021 • Updated Aug 13, 2024

An under-advertised feature of MongoDB is its ability to compress data between the client and the server. The CRM company Close has a really nice article on how compression reduced their network traffic from about 140 Mbps to 65 Mbps. As Close notes, with cloud data transfer costs starting at about $0.01 per GB, you can get a nice little savings with a simple configuration change.
[Chart: network traffic before and after enabling MongoDB network compression]
MongoDB supports the following compressors:
  • snappy (Available starting in MongoDB 3.4)
  • zlib (Available starting in MongoDB 3.6)
  • zstd (Available starting in MongoDB 4.2)
Enabling compression from the client simply involves installing the desired compression library and then passing the compressor as an argument when you connect to MongoDB. For example:
client = MongoClient('mongodb://localhost', compressors='zstd')
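You can also list multiple compressors in order of preference; the driver and the server use the first one that both support. As a minimal sketch (assuming PyMongo and a local mongod), the option can be set either as a keyword argument or in the connection string:

from pymongo import MongoClient

# Compressors are listed in order of preference; the first one
# supported by both the driver and the server is used.
client = MongoClient('mongodb://localhost', compressors='zstd,snappy,zlib')

# Equivalently, as a connection string option:
client = MongoClient('mongodb://localhost/?compressors=zstd,snappy,zlib')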
This article provides two tuneable Python scripts, read-from-mongo.py and write-to-mongo.py, that you can use to see the impact of network compression yourself.

Setup

Client Configuration

Edit params.py and, at a minimum, set your connection string. Other tunables include the number of megabytes to read and insert (default 10 MB) and the batch sizes for reads (100 records) and inserts (1 MB):
# Read to Mongo
target_read_database = 'sample_airbnb'
target_read_collection = 'listingsAndReviews'
megabytes_to_read = 10
batch_size = 100 # Batch size in records (for reads)

# Write to Mongo
drop_collection = True # Drop collection on run
target_write_database = 'test'
target_write_collection = 'network-compression-test'
megabytes_to_insert = 10
batch_size_mb = 1 # Batch size of bulk insert in megabytes

Compression Library

The snappy compression in Python requires the python-snappy package.
pip3 install python-snappy
The zstd compression requires the zstandard package.
pip3 install zstandard
The zlib compression is native to Python.

Sample Data

My read-from-mongo.py script uses the Sample AirBnB Listings Dataset, but any dataset will suffice for this test.
The write-to-mongo.py script generates sample data using the Python package Faker.
pip3 install faker
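For reference, generating a document with Faker looks roughly like this (the fields shown are illustrative, not necessarily the ones write-to-mongo.py uses):

from faker import Faker

fake = Faker()

# An illustrative fake document; the real script's schema may differ.
doc = {
    'name': fake.name(),
    'address': fake.address(),
    'description': fake.text()
}
print(doc)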

Execution

Read from Mongo

Cloud providers notably charge for data egress, so anything that reduces outbound network traffic is a win.
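For context, the read test essentially pages through the collection with a fixed cursor batch size and counts the BSON bytes returned. A simplified sketch of that loop (not the script's exact code):

import bson
from pymongo import MongoClient

client = MongoClient('mongodb://localhost')  # add compressors=... to enable compression
collection = client['sample_airbnb']['listingsAndReviews']

bytes_read = 0
# batch_size=100 mirrors the batch_size setting in params.py
for doc in collection.find(batch_size=100):
    bytes_read += len(bson.encode(doc))
    if bytes_read >= 10 * 1024 * 1024:  # stop after ~10 MB (megabytes_to_read)
        break

print(f'{bytes_read / (1024 * 1024):.1f} megabytes read')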
Let's first run the script without network compression (the default):
✗ python3 read-from-mongo.py

MongoDB Network Compression Test
Network Compression: Off
Now: 2021-11-03 12:24:00.904843

Collection to read from: sample_airbnb.listingsAndReviews
Bytes to read: 10 MB
Bulk read size: 100 records

1 megabytes read at 307.7 kilobytes/second
2 megabytes read at 317.6 kilobytes/second
3 megabytes read at 323.5 kilobytes/second
4 megabytes read at 318.0 kilobytes/second
5 megabytes read at 327.1 kilobytes/second
6 megabytes read at 325.3 kilobytes/second
7 megabytes read at 326.0 kilobytes/second
8 megabytes read at 324.0 kilobytes/second
9 megabytes read at 322.7 kilobytes/second
10 megabytes read at 321.0 kilobytes/second

8600 records read in 31 seconds (276.0 records/second)

MongoDB Server Reported Megabytes Out: 188.278 MB
You've probably noticed that the reported megabytes out (188 MB) are more than 18 times our test size of 10 MB. There are several reasons for this, including other workloads running on the server, data replication to secondary nodes, and TCP packet overhead beyond the data itself. Focus on the delta between the test runs.
The script accepts an optional compression argument that must be either snappy, zlib, or zstd. Let's run the test again using snappy, which is known to be fast while sacrificing some compression:
✗ python3 read-from-mongo.py -c "snappy"

MongoDB Network Compression Test
Network Compression: snappy
Now: 2021-11-03 12:24:41.602969

Collection to read from: sample_airbnb.listingsAndReviews
Bytes to read: 10 MB
Bulk read size: 100 records

1 megabytes read at 500.8 kilobytes/second
2 megabytes read at 493.8 kilobytes/second
3 megabytes read at 486.7 kilobytes/second
4 megabytes read at 480.7 kilobytes/second
5 megabytes read at 480.1 kilobytes/second
6 megabytes read at 477.6 kilobytes/second
7 megabytes read at 488.4 kilobytes/second
8 megabytes read at 482.3 kilobytes/second
9 megabytes read at 482.4 kilobytes/second
10 megabytes read at 477.6 kilobytes/second

8600 records read in 21 seconds (410.7 records/second)

MongoDB Server Reported Megabytes Out: 126.55 MB
With snappy compression, our reported megabytes out were about 62 MB fewer. That's a 33% savings. But wait, the 10 MB of data was also read in 10 fewer seconds. That's a 33% performance boost as well!
Let's try this again using zlib, which can achieve better compression, but at the expense of performance.
zlib compression supports an optional compression level. For this test I've set it to 9 (max compression).
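In PyMongo, the level can be passed as the zlibCompressionLevel option (shown here as a standalone sketch; the test script itself only exposes the -c flag):

from pymongo import MongoClient

# zlibCompressionLevel: -1 (library default) through 9 (maximum compression)
client = MongoClient('mongodb://localhost', compressors='zlib', zlibCompressionLevel=9)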
✗ python3 read-from-mongo.py -c "zlib"

MongoDB Network Compression Test
Network Compression: zlib
Now: 2021-11-03 12:25:07.493369

Collection to read from: sample_airbnb.listingsAndReviews
Bytes to read: 10 MB
Bulk read size: 100 records

1 megabytes read at 362.0 kilobytes/second
2 megabytes read at 373.4 kilobytes/second
3 megabytes read at 394.8 kilobytes/second
4 megabytes read at 393.3 kilobytes/second
5 megabytes read at 398.1 kilobytes/second
6 megabytes read at 397.4 kilobytes/second
7 megabytes read at 402.9 kilobytes/second
8 megabytes read at 397.7 kilobytes/second
9 megabytes read at 402.7 kilobytes/second
10 megabytes read at 401.6 kilobytes/second

8600 records read in 25 seconds (345.4 records/second)

MongoDB Server Reported Megabytes Out: 67.705 MB
With zlib compression configured at its maximum compression level, we were able to achieve a 64% reduction in network egress, although it took 4 seconds longer. However, that's still a 19% performance improvement over using no compression at all.
Let's run a final test using zstd, which is advertised to bring together the speed of snappy with the compression efficiency of zlib:
✗ python3 read-from-mongo.py -c "zstd"

MongoDB Network Compression Test
Network Compression: zstd
Now: 2021-11-03 12:25:40.075553

Collection to read from: sample_airbnb.listingsAndReviews
Bytes to read: 10 MB
Bulk read size: 100 records

1 megabytes read at 886.1 kilobytes/second
2 megabytes read at 798.1 kilobytes/second
3 megabytes read at 772.2 kilobytes/second
4 megabytes read at 735.7 kilobytes/second
5 megabytes read at 734.4 kilobytes/second
6 megabytes read at 714.8 kilobytes/second
7 megabytes read at 709.4 kilobytes/second
8 megabytes read at 698.5 kilobytes/second
9 megabytes read at 701.9 kilobytes/second
10 megabytes read at 693.9 kilobytes/second

8600 records read in 14 seconds (596.6 records/second)

MongoDB Server Reported Megabytes Out: 61.254 MB
And sure enough, zstd lives up to its reputation, achieving a 68% improvement in compression along with a 55% improvement in performance!

Write to Mongo

Cloud providers often don't charge us for data ingress. However, given the substantial performance improvements with read workloads, what can be expected from write workloads?
The write-to-mongo.py script writes randomly generated documents to the database and collection configured in params.py, the default being test.network-compression-test.
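Conceptually, the script accumulates fake documents until the configured batch size is reached and then sends them with insert_many. A simplified sketch of that loop (not the script's exact code):

import bson
from faker import Faker
from pymongo import MongoClient

fake = Faker()
client = MongoClient('mongodb://localhost', compressors='zstd')
collection = client['test']['network-compression-test']

batch, batch_bytes = [], 0
for _ in range(10000):  # keep generating documents until enough data is written
    doc = {'name': fake.name(), 'address': fake.address(), 'text': fake.text()}
    batch.append(doc)
    batch_bytes += len(bson.encode(doc))
    if batch_bytes >= 1 * 1024 * 1024:  # ~1 MB per bulk insert (batch_size_mb)
        collection.insert_many(batch)
        batch, batch_bytes = [], 0

if batch:  # flush any remainder
    collection.insert_many(batch)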
As before, let's run the test without compression:
python3 write-to-mongo.py

MongoDB Network Compression Test
Network Compression: Off
Now: 2021-11-03 12:47:03.658036

Bytes to insert: 10 MB
Bulk insert batch size: 1 MB

1 megabytes inserted at 614.3 kilobytes/second
2 megabytes inserted at 639.3 kilobytes/second
3 megabytes inserted at 652.0 kilobytes/second
4 megabytes inserted at 631.0 kilobytes/second
5 megabytes inserted at 640.4 kilobytes/second
6 megabytes inserted at 645.3 kilobytes/second
7 megabytes inserted at 649.9 kilobytes/second
8 megabytes inserted at 652.7 kilobytes/second
9 megabytes inserted at 654.9 kilobytes/second
10 megabytes inserted at 657.2 kilobytes/second

27778 records inserted in 15.0 seconds

MongoDB Server Reported Megabytes In: 21.647 MB
So it took 15 seconds to write 27,778 records. Let's run the same test with zstd compression:
✗ python3 write-to-mongo.py -c 'zstd'

MongoDB Network Compression Test
Network Compression: zstd
Now: 2021-11-03 12:48:16.485174

Bytes to insert: 10 MB
Bulk insert batch size: 1 MB

1 megabytes inserted at 599.4 kilobytes/second
2 megabytes inserted at 645.4 kilobytes/second
3 megabytes inserted at 645.8 kilobytes/second
4 megabytes inserted at 660.1 kilobytes/second
5 megabytes inserted at 669.5 kilobytes/second
6 megabytes inserted at 665.3 kilobytes/second
7 megabytes inserted at 671.0 kilobytes/second
8 megabytes inserted at 675.2 kilobytes/second
9 megabytes inserted at 675.8 kilobytes/second
10 megabytes inserted at 676.7 kilobytes/second

27778 records inserted in 15.0 seconds

MongoDB Server Reported Megabytes In: 8.179 MB
Our reported megabytes in are reduced by 62%. However, our write performance remained identical. Personally, I think most of this is due to the time it takes the Faker library to generate the sample data. But gaining compression without a performance impact is still a win.

Measurement

There are a couple of options for measuring network traffic. These scripts use the db.serverStatus() physicalBytesOut and physicalBytesIn counters, reporting the delta between the readings at the start and end of each test run. As mentioned previously, our measurements are skewed by other network traffic occurring on the server, but my tests have shown a consistent improvement. Visually, my results appear as follows:
[Charts: MongoDB Server Reported Megabytes Out and seconds to read, by compression type]
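If you want to pull the same counters yourself, they're available via the serverStatus command. A minimal sketch with PyMongo (the physicalBytes* fields require a recent MongoDB server version):

from pymongo import MongoClient

client = MongoClient('mongodb://localhost')
network = client.admin.command('serverStatus')['network']

# physicalBytes* are measured after wire compression;
# bytesIn / bytesOut are the uncompressed (logical) totals.
print(network['physicalBytesIn'], network['physicalBytesOut'])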
Another option would be using a network analysis tool like Wireshark. But that's beyond the scope of this article for now.
Bottom line: compression reduces network traffic by more than 60%, which is in line with the improvement seen by Close. More importantly, compression also delivers a dramatic improvement in read performance. That's a win-win.
