Encryption in Use: The Magic of MongoDB for Secure Queries
You’re responsible for a payroll system at a fast-growing tech company. Employees rely on you to protect highly sensitive data—things like salaries, bonuses, and personal information. Meanwhile, privacy regulations such as GDPR and LGPD demand strict compliance, leaving no room for mistakes. The stakes are high: Secure the data, or risk penalties and broken trust. But traditional encryption isn’t enough. Protecting data while it’s in transit or stored only solves part of the problem. What happens when HR needs to run a query on that data? You need an encryption solution that works seamlessly even during data processing—one that keeps everything secure but still allows meaningful searches, like “find all employees earning over $10,000.” That’s where MongoDB’s CSFLE and Queryable Encryption come in, offering cutting-edge tools to balance security and usability.
Before diving into the specifics of CSFLE and Queryable Encryption, let’s revisit the core encryption methods MongoDB offers:
MongoDB secures data in motion by encrypting client-server traffic with TLS/SSL. This method is essential for protecting data during network transmission. Also, all communications between nodes and processes are encrypted with TLS, ensuring comprehensive security for data in motion.
Encryption at rest secures data stored on disk, available in MongoDB Enterprise Advanced and Atlas versions. This approach protects data even if an adversary gains access to the disk or underlying database files, ensuring confidentiality beyond just server security.
Encryption in use protects data at every stage of its lifecycle: transmission, storage, and processing. MongoDB offers two approaches for in-use encryption:
- Queryable Encryption
- Client-side field-level encryption (CSFLE)
While there is some overlap, there are key differences between CSFLE and Queryable Encryption.

CSFLE:
- Uses deterministic encryption for fields that need equality queries, producing identical ciphertexts for identical values
- Enables exact match queries, but can leak significant information in some cases because patterns may emerge, especially with low-cardinality data

Queryable Encryption:
- Uses non-deterministic encryption, generating unique ciphertexts for identical values, which makes attacks more difficult
- Supports equality queries and, from MongoDB 8.0 onward, range queries using operators like $lt, $lte, $gt, and $gte

CSFLE:
- Supports only exact match (equality) queries on deterministically encrypted fields
- Suitable for scenarios that don't require complex query operations, which makes CSFLE ideal where equality is the only required query type

Queryable Encryption:
- Offers advanced query support for encrypted data: besides equality queries, MongoDB 8.0 adds range queries using operators like $lt, $lte, $gt, and $gte
- Will soon expand this functionality to include prefix, suffix, and substring queries, enhancing flexibility for encrypted data querying

CSFLE:
- Ideal for scenarios where full control over encryption keys is needed and equality queries suffice for application requirements
- Particularly useful for highly sensitive data that requires protection throughout client-server communication, offering granular control over the encryption process

Queryable Encryption:
- Recommended for applications requiring complex queries on encrypted data, such as date ranges or numerical values
- Its design leaks less information about the encrypted data than CSFLE; the MongoDB Cryptography Research Group has performed extensive peer-reviewed research to analyze the security of its design and implementation

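The difference between deterministic and non-deterministic output can be illustrated with a toy sketch. It does not use MongoDB's actual algorithms: a keyed HMAC stands in for encryption, and the key, function names, and sample values are all hypothetical. It only shows why repeated values in low-cardinality data become visible when equal inputs map to equal outputs.

```python
import hashlib
import hmac
import os
from collections import Counter

KEY = b"demo-key-for-illustration-only"  # hypothetical key, not a real data key


def deterministic_token(value: str) -> str:
    """Same input always yields the same token (stand-in for deterministic encryption)."""
    return hmac.new(KEY, value.encode(), hashlib.sha256).hexdigest()


def randomized_token(value: str) -> str:
    """A fresh random nonce makes every token differ, even for equal inputs."""
    nonce = os.urandom(16)
    digest = hmac.new(KEY, nonce + value.encode(), hashlib.sha256).hexdigest()
    return nonce.hex() + digest


departments = ["HR", "HR", "Engineering", "HR"]  # low-cardinality sample data

det = [deterministic_token(d) for d in departments]
rnd = [randomized_token(d) for d in departments]

# Deterministic tokens repeat, so value frequencies are visible without the key.
print(sorted(Counter(det).values()))  # [1, 3]
# Randomized tokens are all distinct, so no frequency pattern emerges.
print(len(set(rnd)))  # 4
```

An observer who sees only the deterministic tokens can still tell that one value occurs three times, which is the kind of pattern leakage described above.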
To demonstrate the implementation, we’ll create a simple Python application to securely manage employee information, storing and querying sensitive data (such as names and salaries) using MongoDB’s advanced encryption.
The following table shows which MongoDB editions support CSFLE and Queryable Encryption:
Product Name | Automatic Encryption Support | Explicit Encryption Support |
---|---|---|
MongoDB Atlas | Yes | Yes |
MongoDB Enterprise Advanced | Yes | Yes |
MongoDB Community Edition | No | Yes |
- Install the Python module:

    pip install pymongo
    python -m pip install 'pymongo[encryption]'

To encrypt data, we use a master key, which is loaded from a local file (for demonstration purposes only). The load_local_master_key function shown further below checks whether the key file exists and creates it if necessary. To use the local provider, first generate a 96-byte, base64-encoded key with one of the following commands.

For a Unix shell:

    echo $(head -c 96 /dev/urandom | base64 | tr -d '\n')

For PowerShell:

    $r=[byte[]]::new(96);$g=[System.Security.Cryptography.RandomNumberGenerator]::Create();$g.GetBytes($r);[Convert]::ToBase64String($r)

Note: A local key provider is not secure for production environments. For production, use a remote key management system (KMS) such as AWS KMS, Azure Key Vault, or Google Cloud KMS for enhanced security and access control. MongoDB also supports KMIP-compliant key providers, such as HashiCorp Vault, for production-grade key management. Paste the generated key in place of YOUR_MASTER_KEY_HERE in the code below.
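
    import os


    def load_local_master_key(filename):
        # Create the key file on first run (demo only; never hard-code keys in production).
        if not os.path.exists(filename):
            key = "YOUR_MASTER_KEY_HERE"
            with open(filename, "w") as f:
                f.write(key)
        # Read the base64-encoded master key back from disk.
        with open(filename, "r") as f:
            return f.read()
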
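The shell commands generate a base64 string from 96 random bytes, and a key of any other length is a likely source of setup errors. The sketch below generates such a key in Python and checks the decoded length before the key is used. Both function names are hypothetical helpers and are not part of PyMongo.

```python
import base64
import os


def generate_local_master_key() -> str:
    """Return 96 random bytes as a base64 string (demo use only)."""
    return base64.b64encode(os.urandom(96)).decode("ascii")


def check_local_master_key(encoded_key: str) -> bytes:
    """Decode a base64 key and confirm it is exactly 96 bytes long."""
    raw = base64.b64decode(encoded_key.strip(), validate=True)
    if len(raw) != 96:
        raise ValueError(f"expected 96 bytes, got {len(raw)}")
    return raw


key = generate_local_master_key()
print(len(check_local_master_key(key)))  # 96
```

Running the check on the value returned by load_local_master_key would catch a key file that still contains the placeholder text.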
Define the variables for the MongoDB connection string, database, and collection names, as well as the key vault namespace and KMS providers.
Note: Replace <URI> with your MongoDB Atlas connection string.

    uri = "<URI>"
    key_vault_namespace = "encryption.__keyVault"
    encrypted_database_name = "employee_data"
    encrypted_collection_name = "employee_salary"

    local_master_key_file = "local_master_key.txt"
    local_master_key = load_local_master_key(local_master_key_file)

    kms_providers = {"local": {"key": local_master_key}}
    encrypted_fields_map = get_encrypted_fields_map()

The get_encrypted_fields_map function specifies which fields will be encrypted and the allowed query types. In this example, name supports equality queries, and salary supports range queries between the declared minimum and maximum values.

    def get_encrypted_fields_map():
        return {
            "fields": [
                {
                    "keyId": None,
                    "path": "name",
                    "bsonType": "string",
                    "queries": [{"queryType": "equality"}],
                },
                {
                    "keyId": None,
                    "path": "salary",
                    "bsonType": "int",
                    "queries": [{"queryType": "range", "min": 0, "max": 1000000}],
                },
            ]
        }

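PyMongo's documentation describes the encrypted_fields_map option as a mapping from a collection namespace ("database.collection") to its encrypted-fields document. If your driver version expects that shape, the document returned above can be wrapped as in the following sketch. The helper name build_encrypted_fields_map is hypothetical. Entries whose keyId is None still need a data key; PyMongo documents a ClientEncryption.create_encrypted_collection helper for generating them, so check the driver documentation for your version.

```python
def build_encrypted_fields_map(database_name, collection_name, encrypted_fields):
    """Key the encrypted-fields document by its "<database>.<collection>" namespace."""
    namespace = f"{database_name}.{collection_name}"
    return {namespace: encrypted_fields}


# Shortened version of the fields document from this tutorial.
fields = {
    "fields": [
        {
            "keyId": None,
            "path": "name",
            "bsonType": "string",
            "queries": [{"queryType": "equality"}],
        },
    ]
}

fields_map = build_encrypted_fields_map("employee_data", "employee_salary", fields)
print(list(fields_map))  # ['employee_data.employee_salary']
```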
Configure automatic encryption options with the AutoEncryptionOpts class, specifying the key vault namespace, KMS provider, and shared encryption library path.
Note: In the AutoEncryptionOpts call, replace the path to the shared encryption library with the correct path on your system.

    from pymongo.encryption_options import AutoEncryptionOpts

    auto_encryption_opts = AutoEncryptionOpts(
        kms_providers=kms_providers,
        key_vault_namespace=key_vault_namespace,
        # Path to the Automatic Encryption Shared Library; adjust for your platform.
        crypt_shared_lib_path="./mongo_crypt_shared_v1-macos-arm64-enterprise-8.0.3/lib/mongo_crypt_v1.dylib",
        encrypted_fields_map=encrypted_fields_map,
    )

Create a MongoDB client with automatic encryption enabled.

    from pymongo import MongoClient

    client = MongoClient(uri, auto_encryption_opts=auto_encryption_opts)
    db = client[encrypted_database_name]
    collection = db[encrypted_collection_name]
    # Start from a clean collection on every run of the demo.
    collection.drop()

Insert example documents into the encrypted collection, with sensitive information protected by encryption.

    import datetime

    # EmployeeDocument is a simple class (defined elsewhere) whose attributes become the document fields.
    employees = [
        EmployeeDocument("Alice Johnson", "Software Engineer", "MongoDB", 100000, "USD", datetime.datetime(2019, 3, 5)),
        EmployeeDocument("Bob Smith", "Product Manager", "MongoDB", 150000, "USD", datetime.datetime(2018, 6, 15)),
        EmployeeDocument("Charlie Brown", "Data Analyst", "MongoDB", 200000, "USD", datetime.datetime(2020, 4, 20)),
        EmployeeDocument("Diana Prince", "HR Specialist", "MongoDB", 250000, "USD", datetime.datetime(2021, 12, 3)),
        EmployeeDocument("Evan Peters", "Marketing Coordinator", "MongoDB", 80000, "USD", datetime.datetime(2022, 10, 7)),
    ]

    for employee in employees:
        # __dict__ exposes the object's attributes as a plain dict that PyMongo can insert.
        collection.insert_one(employee.__dict__)
        print(f"Inserted document for {employee.name}")

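The insert loop relies on an EmployeeDocument class that is not shown here. A minimal sketch of what it might look like follows, assuming a plain dataclass. Only name and salary are referenced elsewhere in the tutorial, so the other field names are guesses based on the order of the constructor arguments.

```python
import datetime
from dataclasses import dataclass


@dataclass
class EmployeeDocument:
    # Field names other than `name` and `salary` are assumptions.
    name: str
    position: str
    company: str
    salary: int
    currency: str
    start_date: datetime.datetime


employee = EmployeeDocument(
    "Alice Johnson", "Software Engineer", "MongoDB", 100000, "USD", datetime.datetime(2019, 3, 5)
)
# A dataclass instance keeps its fields in __dict__, which is what insert_one receives.
print(employee.__dict__["salary"])  # 100000
```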
Viewed without the key, the encrypted fields are unreadable, which confirms that the encryption worked.
Finally, perform queries on the encrypted collection. The search_by_name function searches documents by name, while search_by_salary_range uses a salary range filter.

    import json

    from bson import json_util


    def search_by_name(coll, name):
        # Equality query on the encrypted "name" field.
        query = {"name": name}
        cursor = coll.find(query)
        print(f"Documents with name '{name}':")
        for doc in cursor:
            print(json.dumps(doc, indent=4, default=json_util.default))


    def search_by_salary_range(coll, min_salary, max_salary):
        # Range query on the encrypted "salary" field; both bounds are inclusive.
        query = {"salary": {"$gte": min_salary, "$lte": max_salary}}
        cursor = coll.find(query)
        print(f"Documents with salary between {min_salary} and {max_salary}:")
        for doc in cursor:
            print(json.dumps(doc, indent=4, default=json_util.default))


    search_by_name(collection, "Alice Johnson")
    search_by_salary_range(collection, 150000, 200000)

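To sanity-check what the range query should return, the same inclusive filter can be applied to the sample salaries in plain Python. This is only a reference computation over the values inserted earlier, not a query against MongoDB, and the helper name is hypothetical.

```python
# Names and salaries of the sample employees inserted earlier in this tutorial.
salaries = {
    "Alice Johnson": 100000,
    "Bob Smith": 150000,
    "Charlie Brown": 200000,
    "Diana Prince": 250000,
    "Evan Peters": 80000,
}


def names_in_salary_range(salaries, min_salary, max_salary):
    """Plain-Python equivalent of {"salary": {"$gte": min, "$lte": max}}."""
    return [name for name, salary in salaries.items() if min_salary <= salary <= max_salary]


print(names_in_salary_range(salaries, 150000, 200000))  # ['Bob Smith', 'Charlie Brown']
```

Because $gte and $lte are inclusive, both boundary salaries (150000 and 200000) are expected to match.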
As we can see, MongoDB 8.0 lets us run range queries on encrypted data as well as equality queries. You can check the entire code in the GitHub repository and find more information in the MongoDB documentation.
The CSFLE example starts the same way. Define the variables for the MongoDB connection string, database, and collection names, as well as the key vault namespace and KMS providers. Note: Replace <URI> with your MongoDB Atlas connection string.

    uri = "<URI>"
    local_master_key_file = "local_master_key.txt"
    local_master_key = load_local_master_key(local_master_key_file)
    kms_providers = setup_kms_providers(local_master_key)
    key_vault_namespace = "encryption.__keyVault"

    client = connect_mongo(uri)
    database_name = "employee_data"
    collection_name = "employee_salary"
    coll = client[database_name][collection_name]
    coll.drop()
    key_vault_coll = client["encryption"]["__keyVault"]
    ensure_key_vault_index(key_vault_coll)

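The setup_kms_providers and connect_mongo helpers are not shown in this walkthrough. A plausible sketch of the first one follows, assuming it simply wraps the key in the same local provider document used in the Queryable Encryption example. The body is a guess, not the author's code. connect_mongo is presumably a thin wrapper around MongoClient and is omitted here.

```python
def setup_kms_providers(local_master_key):
    """Wrap the local master key in a "local" KMS provider document (assumed shape)."""
    return {"local": {"key": local_master_key}}


providers = setup_kms_providers("BASE64_KEY_PLACEHOLDER")  # placeholder, not a real key
print(list(providers))  # ['local']
```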
We ensure the keyAltNames index exists in the key vault collection. It is a unique partial index: alternate key names must be unique, and only documents that actually have a keyAltNames field are indexed.

    def ensure_key_vault_index(key_vault_coll):
        index_name = "keyAltNames_1"
        indexes = key_vault_coll.index_information()
        if index_name not in indexes:
            # Unique partial index: only documents with keyAltNames are indexed.
            key_vault_coll.create_index(
                [("keyAltNames", 1)],
                unique=True,
                partialFilterExpression={"keyAltNames": {"$exists": True}},
            )
            print("Index created in key vault.")
        else:
            print("Index already exists in key vault.")

The ensure_data_key function checks if a specific data key already exists. If it doesn’t, it creates a new one.

    def ensure_data_key(client_encryption, key_vault_coll, key_alt_name):
        # Reuse the data key if one with this alternate name already exists.
        existing_key = key_vault_coll.find_one({"keyAltNames": key_alt_name})
        if existing_key:
            print("Data key already exists. Reusing existing key.")
            return existing_key["_id"]

        print("Creating new data key.")
        data_key_id = client_encryption.create_data_key("local", key_alt_names=[key_alt_name])
        return data_key_id

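The encrypt_salary function converts the salary to cents so that it can be stored as an integer, then encrypts it with the deterministic AEAD_AES_256_CBC_HMAC_SHA_512 algorithm. Because the algorithm is deterministic, equal salaries produce equal ciphertexts, which keeps equality queries possible.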

    def encrypt_salary(client_encryption, data_key_id, salary):
        # Store the salary as integer cents.
        salary_in_cents = int(salary * 100)
        encrypted_salary = client_encryption.encrypt(
            value=salary_in_cents,
            algorithm="AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic",
            key_id=data_key_id,
        )
        return encrypted_salary

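The conversion int(salary * 100) is exact for the whole-number salaries used in this tutorial, but with fractional amounts float multiplication can land just below the intended value and int() then truncates it. The following sketch shows one way to avoid that with the standard decimal module. The to_cents and from_cents helpers are hypothetical and are not part of the tutorial's code.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_cents(salary) -> int:
    """Convert a salary to integer cents without float truncation."""
    amount = Decimal(str(salary))
    return int((amount * 100).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def from_cents(cents: int) -> Decimal:
    """Convert integer cents back to an amount with two decimal places."""
    return (Decimal(cents) / 100).quantize(Decimal("0.01"))


print(int(0.29 * 100))      # 28, because 0.29 * 100 is slightly below 29 as a float
print(to_cents(0.29))       # 29
print(from_cents(5000000))  # 50000.00
```

Since the encrypted value is an integer number of cents, the same from_cents conversion could be applied after decryption instead of dividing by 100.0.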
After encrypting the salary, we insert each employee document into the collection. The document includes the name, position, company, currency, start date, and the encrypted salary.

    def insert_employee_doc(coll, name, position, company, salary_encrypted, currency, start_date):
        employee_doc = {
            "name": name,
            "position": position,
            "company": company,
            "salary": salary_encrypted,
            "currency": currency,
            "startDate": start_date,
        }
        coll.insert_one(employee_doc)


    from datetime import datetime

    # client_encryption and data_key_id are assumed to come from the earlier setup (see ensure_data_key).
    employees = [
        {"name": "Alice Johnson", "position": "Software Engineer", "company": "MongoDB", "salary": 50000, "currency": "USD", "start_date": datetime(2007, 2, 3)},
        {"name": "Bob Smith", "position": "Product Manager", "company": "MongoDB", "salary": 70000, "currency": "USD", "start_date": datetime(2009, 3, 14)},
    ]
    for emp in employees:
        # Encrypt the salary explicitly, then store it alongside the plaintext fields.
        encrypted_salary = encrypt_salary(client_encryption, data_key_id, emp["salary"])
        insert_employee_doc(coll, emp["name"], emp["position"], emp["company"], encrypted_salary, emp["currency"], emp["start_date"])

Viewed without the key, the encrypted salary is unreadable, which confirms that the encryption worked.
To query and view the encrypted salary field, we use find_all_and_decrypt_salaries, which retrieves all documents, decrypts the salary, and displays the data.

    def find_all_and_decrypt_salaries(coll, client_encryption):
        for doc in coll.find():
            encrypted_salary = doc["salary"]
            # Decrypt explicitly, then convert cents back to dollars.
            decrypted_salary = client_encryption.decrypt(encrypted_salary)
            salary_in_dollars = decrypted_salary / 100.0
            print(f"Employee: {doc['name']}, Salary: {salary_in_dollars} USD")

For comparison, the find_all_without_decryption function retrieves the same documents but without decrypting the salaries.

    def find_all_without_decryption(coll):
        for doc in coll.find():
            # The salary is printed exactly as stored, that is, as ciphertext.
            print(f"Employee: {doc['name']}, Encrypted Salary: {doc['salary']}")

Comparing the two outputs shows the difference between a decrypted and an encrypted value. Without decryption, the salary comes back as ciphertext, in a format that is unreadable to us without the encryption key. You can check the entire code in the GitHub repository and find more information in the MongoDB documentation.
CSFLE and Queryable Encryption are advanced encryption solutions in MongoDB, providing distinct methods for protecting sensitive data and enabling secure queries. CSFLE is ideal for cases where client-side control and equality queries are sufficient, while Queryable Encryption is effective for scenarios requiring range queries, with future support for more advanced query types, like substring searches, in development. With MongoDB 8.0’s range query support, Queryable Encryption becomes even more powerful and flexible for securing sensitive data. Using Python, you can easily configure and use these encryption solutions to enhance MongoDB application security, meeting strict data compliance and security requirements. For more MongoDB resources and tools, visit the MongoDB Developer Center to explore additional articles.