How to Implement Client-Side Field Level Encryption (CSFLE) in Java with Spring Data MongoDB
Maxime Beugnet, Megha Arora • 11 min read • Published Nov 06, 2023 • Updated Jan 27, 2024
FULL APPLICATION
The source code of this template is available on GitHub:
```shell
git clone git@github.com:mongodb-developer/mongodb-java-spring-boot-csfle.git
```
To get started, you'll need:
- Java 17.
- MongoDB Automatic Encryption Shared Library v7.0.2 or higher.
This content is also available in video format.
This post will explain the key details of the integration of
MongoDB Client-Side Field Level Encryption (CSFLE)
with Spring Data MongoDB.
If you feel like you need a refresher on CSFLE before working on this more complicated piece, I recommend reviewing the official documentation for both CSFLE and Spring Data MongoDB.
This template is significantly larger than the other CSFLE templates you can find online. It tries to provide
reusable code for a real production environment, using:
- Multiple encrypted collections.
- Automated JSON Schema generation.
- Server-side JSON Schema.
- Separated clusters for DEKs and encrypted collections.
- Automated data encryption keys generation or retrieval.
- SpEL Evaluation Extension.
- Auto-implemented repositories.
- Open API documentation 3.0.1.
While I was coding, I also tried to respect the SOLID principles as much
as possible to improve the code's readability, usability, and reusability.
Now that we are all on board, here is a high-level diagram of the different moving parts required to create a correctly configured CSFLE-enabled MongoClient which can encrypt and decrypt fields automatically.
The arrows can mean different things in the diagram:
- "needs to be done before"
- "requires"
- "direct dependency of"
But hopefully it helps explain the dependencies, the orchestration, and the inner machinery of the CSFLE
configuration with Spring Data MongoDB.
Once the connection with MongoDB — capable of encrypting and decrypting the fields — is established, with the correct
configuration and library, we are just using a classical three-tier architecture to expose a REST API and manage the
communication all the way down to the MongoDB database.
There is nothing tricky or fascinating here, so we won't dwell on this part in this post.
Let's now focus on all the complicated bits of this template.
As this is a tutorial, the code can be started from a blank MongoDB cluster.
So the first order of business is to create the key vault collection and its unique index on the `keyAltNames` field.

```java
/**
 * This class initializes the Key Vault (collection + keyAltNames unique index) using a dedicated standard connection
 * to MongoDB.
 * Then it creates the Data Encryption Keys (DEKs) required to encrypt the documents in each of the
 * encrypted collections.
 */
@Component
public class KeyVaultAndDekSetup {

    private static final Logger LOGGER = LoggerFactory.getLogger(KeyVaultAndDekSetup.class);
    private final KeyVaultService keyVaultService;
    private final DataEncryptionKeyService dataEncryptionKeyService;
    // injected from the application properties
    private String CONNECTION_STR;

    public KeyVaultAndDekSetup(KeyVaultService keyVaultService, DataEncryptionKeyService dataEncryptionKeyService) {
        this.keyVaultService = keyVaultService;
        this.dataEncryptionKeyService = dataEncryptionKeyService;
    }

    @PostConstruct
    public void postConstruct() {
        LOGGER.info("=> Start Encryption Setup.");
        LOGGER.debug("=> MongoDB Connection String: {}", CONNECTION_STR);
        MongoClientSettings mcs = MongoClientSettings.builder()
                                                     .applyConnectionString(new ConnectionString(CONNECTION_STR))
                                                     .build();
        try (MongoClient client = MongoClients.create(mcs)) {
            LOGGER.info("=> Created the MongoClient instance for the encryption setup.");
            LOGGER.info("=> Creating the encryption key vault collection.");
            keyVaultService.setupKeyVaultCollection(client);
            LOGGER.info("=> Creating the Data Encryption Keys.");
            EncryptedCollectionsConfiguration.encryptedEntities.forEach(dataEncryptionKeyService::createOrRetrieveDEK);
            LOGGER.info("=> Encryption Setup completed.");
        } catch (Exception e) {
            LOGGER.error("=> Encryption Setup failed: {}", e.getMessage(), e);
        }
    }
}
```
In production, you could choose to create the key vault collection and its unique index on the `keyAltNames` field
manually once and remove this code, as it's never going to be executed again. It only makes sense to keep it if
you are running this code in a CI/CD pipeline.

One important thing to note here is the dependency on a completely standard (i.e., not CSFLE-enabled) and ephemeral
`MongoClient` (note the try-with-resources block), as we are merely creating a collection and an index in our MongoDB cluster.

```java
/**
 * Initialization of the Key Vault collection and keyAltNames unique index.
 */
@Service
public class KeyVaultServiceImpl implements KeyVaultService {

    private static final Logger LOGGER = LoggerFactory.getLogger(KeyVaultServiceImpl.class);
    private static final String INDEX_NAME = "uniqueKeyAltNames";
    // injected from the application properties
    private String KEY_VAULT_DB;
    private String KEY_VAULT_COLL;

    @Override
    public void setupKeyVaultCollection(MongoClient mongoClient) {
        LOGGER.info("=> Setup the key vault collection {}.{}", KEY_VAULT_DB, KEY_VAULT_COLL);
        MongoDatabase db = mongoClient.getDatabase(KEY_VAULT_DB);
        MongoCollection<Document> vault = db.getCollection(KEY_VAULT_COLL);
        boolean vaultExists = doesCollectionExist(db, KEY_VAULT_COLL);
        if (vaultExists) {
            LOGGER.info("=> Vault collection already exists.");
            if (!doesIndexExist(vault)) {
                LOGGER.info("=> Unique index created on the keyAltNames");
                createKeyVaultIndex(vault);
            }
        } else {
            LOGGER.info("=> Creating a new vault collection & index on keyAltNames.");
            createKeyVaultIndex(vault);
        }
    }

    private void createKeyVaultIndex(MongoCollection<Document> vault) {
        Bson keyAltNamesExists = exists("keyAltNames");
        IndexOptions indexOpts = new IndexOptions().name(INDEX_NAME)
                                                   .partialFilterExpression(keyAltNamesExists)
                                                   .unique(true);
        vault.createIndex(new BsonDocument("keyAltNames", new BsonInt32(1)), indexOpts);
    }

    private boolean doesCollectionExist(MongoDatabase db, String coll) {
        return db.listCollectionNames().into(new ArrayList<>()).stream().anyMatch(c -> c.equals(coll));
    }

    private boolean doesIndexExist(MongoCollection<Document> coll) {
        return coll.listIndexes()
                   .into(new ArrayList<>())
                   .stream()
                   .map(i -> i.get("name"))
                   .anyMatch(n -> n.equals(INDEX_NAME));
    }
}
```
When it's done, we can close the standard MongoDB connection.
We can now create the Data Encryption Keys (DEKs) using the `ClientEncryption` connection.

```java
/**
 * ClientEncryption used by the DataEncryptionKeyService to create the DEKs.
 */
@Configuration
public class MongoDBKeyVaultClientConfiguration {

    private static final Logger LOGGER = LoggerFactory.getLogger(MongoDBKeyVaultClientConfiguration.class);
    private final KmsService kmsService;
    // injected from the application properties
    private String CONNECTION_STR;
    private String KEY_VAULT_DB;
    private String KEY_VAULT_COLL;
    private MongoNamespace KEY_VAULT_NS;

    public MongoDBKeyVaultClientConfiguration(KmsService kmsService) {
        this.kmsService = kmsService;
    }

    @PostConstruct
    public void postConstructor() {
        this.KEY_VAULT_NS = new MongoNamespace(KEY_VAULT_DB, KEY_VAULT_COLL);
    }

    /**
     * MongoDB Encryption Client that can manage Data Encryption Keys (DEKs).
     *
     * @return ClientEncryption MongoDB connection that can create or delete DEKs.
     */
    @Bean
    public ClientEncryption clientEncryption() {
        LOGGER.info("=> Creating the MongoDB Key Vault Client.");
        MongoClientSettings mcs = MongoClientSettings.builder()
                                                     .applyConnectionString(new ConnectionString(CONNECTION_STR))
                                                     .build();
        ClientEncryptionSettings ces = ClientEncryptionSettings.builder()
                                                               .keyVaultMongoClientSettings(mcs)
                                                               .keyVaultNamespace(KEY_VAULT_NS.getFullName())
                                                               .kmsProviders(kmsService.getKmsProviders())
                                                               .build();
        return ClientEncryptions.create(ces);
    }
}
```
We can directly instantiate a `ClientEncryption` bean using the KMS and use it to
generate our DEKs (one for each encrypted collection).

```java
/**
 * Service responsible for creating and remembering the Data Encryption Keys (DEKs).
 * We need to retrieve the DEKs when we evaluate the SpEL expressions in the Entities to create the JSON Schemas.
 */
@Service
public class DataEncryptionKeyServiceImpl implements DataEncryptionKeyService {

    private static final Logger LOGGER = LoggerFactory.getLogger(DataEncryptionKeyServiceImpl.class);
    private final ClientEncryption clientEncryption;
    private final Map<String, String> dataEncryptionKeysB64 = new HashMap<>();
    // injected from the application properties
    private String KMS_PROVIDER;

    public DataEncryptionKeyServiceImpl(ClientEncryption clientEncryption) {
        this.clientEncryption = clientEncryption;
    }

    @Override
    public Map<String, String> getDataEncryptionKeysB64() {
        LOGGER.info("=> Getting Data Encryption Keys Base64 Map.");
        LOGGER.info("=> Keys in DEK Map: {}", dataEncryptionKeysB64.entrySet());
        return dataEncryptionKeysB64;
    }

    @Override
    public String createOrRetrieveDEK(EncryptedEntity encryptedEntity) {
        Base64.Encoder b64Encoder = Base64.getEncoder();
        String dekName = encryptedEntity.getDekName();
        BsonDocument dek = clientEncryption.getKeyByAltName(dekName);
        BsonBinary dataKeyId;
        if (dek == null) {
            LOGGER.info("=> Creating Data Encryption Key: {}", dekName);
            DataKeyOptions dko = new DataKeyOptions().keyAltNames(of(dekName));
            dataKeyId = clientEncryption.createDataKey(KMS_PROVIDER, dko);
            LOGGER.debug("=> DEK ID: {}", dataKeyId);
        } else {
            LOGGER.info("=> Existing Data Encryption Key: {}", dekName);
            dataKeyId = dek.get("_id").asBinary();
            LOGGER.debug("=> DEK ID: {}", dataKeyId);
        }
        String dek64 = b64Encoder.encodeToString(dataKeyId.getData());
        LOGGER.debug("=> Base64 DEK ID: {}", dek64);
        LOGGER.info("=> Adding Data Encryption Key to the Map with key: {}",
                    encryptedEntity.getEntityClass().getSimpleName());
        dataEncryptionKeysB64.put(encryptedEntity.getEntityClass().getSimpleName(), dek64);
        return dek64;
    }
}
```
One thing to note here is that we are storing the DEKs in a map, so we don't have to retrieve them again later when we
need them for the JSON Schemas.
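The "create once, then reuse from the map" behavior can be illustrated with a dependency-free sketch. The `DekMapSketch` class and the fixed UUID below are hypothetical stand-ins: in the real template, the 16-byte key ID comes from `clientEncryption.createDataKey(...)` and is cached per entity name.

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class DekMapSketch {
    // entity simple name -> Base64-encoded 16-byte DEK ID, as in the template's DEK map
    private final Map<String, String> dataEncryptionKeysB64 = new HashMap<>();

    // Hypothetical stand-in for createOrRetrieveDEK: a DEK ID is a 16-byte UUID (BSON binary subtype 4).
    public String createOrRetrieve(String entityName, UUID dekId) {
        return dataEncryptionKeysB64.computeIfAbsent(entityName, k -> {
            ByteBuffer bb = ByteBuffer.allocate(16);
            bb.putLong(dekId.getMostSignificantBits());
            bb.putLong(dekId.getLeastSignificantBits());
            return Base64.getEncoder().encodeToString(bb.array());
        });
    }

    public static void main(String[] args) {
        DekMapSketch sketch = new DekMapSketch();
        UUID id = UUID.fromString("00000000-0000-0000-0000-000000000000");
        String first = sketch.createOrRetrieve("PersonEntity", id);
        String second = sketch.createOrRetrieve("PersonEntity", id); // cached: no second "creation"
        System.out.println(first.equals(second)); // true
        System.out.println(first); // AAAAAAAAAAAAAAAAAAAAAA==
    }
}
```

The Base64 string is exactly what ends up in the generated JSON Schema's `keyId.$binary.base64` field.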
One of the key functional areas of Spring Data MongoDB is the POJO-centric model it relies on to implement the
repositories and map the documents to the MongoDB collections.
PersonEntity.java

```java
/**
 * This is the entity class for the "persons" collection.
 * The SpEL expression of the @Encrypted annotation is used to determine the DEK's keyId to use for the encryption.
 *
 * @see com.mongodb.quickstart.javaspringbootcsfle.components.EntitySpelEvaluationExtension
 */
@Document("persons")
@Encrypted(keyId = "#{mongocrypt.keyId(#target)}")
public class PersonEntity {

    @Id
    private ObjectId id;
    private String firstName;
    private String lastName;
    @Encrypted(algorithm = "AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic")
    private String ssn;
    @Encrypted(algorithm = "AEAD_AES_256_CBC_HMAC_SHA_512-Random")
    private String bloodType;

    // Constructors

    // toString()

    // Getters & Setters
}
```
As you can see above, this entity contains all the information we need to fully automate CSFLE. We have the information
we need to generate the JSON Schema:
- Using the SpEL expression `#{mongocrypt.keyId(#target)}`, we can dynamically populate the DEK that was generated or retrieved earlier.
- `ssn` is a `String` that requires a deterministic algorithm.
- `bloodType` is a `String` that requires a random algorithm.
The generated JSON Schema looks like this:
```json
{
  "encryptMetadata": {
    "keyId": [
      {
        "$binary": {
          "base64": "WyHXZ+53SSqCC/6WdCvp0w==",
          "subType": "04"
        }
      }
    ]
  },
  "type": "object",
  "properties": {
    "ssn": {
      "encrypt": {
        "bsonType": "string",
        "algorithm": "AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic"
      }
    },
    "bloodType": {
      "encrypt": {
        "bsonType": "string",
        "algorithm": "AEAD_AES_256_CBC_HMAC_SHA_512-Random"
      }
    }
  }
}
```
The evaluation of the SpEL expression is only possible because of this class we added in the configuration:
```java
/**
 * Evaluates the SpEL expressions in the entity classes, like #{mongocrypt.keyId(#target)}, and inserts
 * the right encryption key for the right collection.
 */
@Component
public class EntitySpelEvaluationExtension implements EvaluationContextExtension {

    private static final Logger LOGGER = LoggerFactory.getLogger(EntitySpelEvaluationExtension.class);
    private final DataEncryptionKeyService dataEncryptionKeyService;

    public EntitySpelEvaluationExtension(DataEncryptionKeyService dataEncryptionKeyService) {
        this.dataEncryptionKeyService = dataEncryptionKeyService;
    }

    @Override
    public String getExtensionId() {
        return "mongocrypt";
    }

    @Override
    public Map<String, Function> getFunctions() {
        try {
            return Collections.singletonMap("keyId", new Function(
                    EntitySpelEvaluationExtension.class.getMethod("computeKeyId", String.class), this));
        } catch (NoSuchMethodException e) {
            throw new RuntimeException(e);
        }
    }

    public String computeKeyId(String target) {
        String dek = dataEncryptionKeyService.getDataEncryptionKeysB64().get(target);
        LOGGER.info("=> Computing dek for target {} => {}", target, dek);
        return dek;
    }
}
```
Note that this is where we retrieve the DEKs and match them with the `target`: "PersonEntity", in this case.

JSON Schemas are actually not trivial to generate in a Spring Data MongoDB project.
As a matter of fact, to generate the JSON Schemas, we need the `MappingContext` (the entities, etc.), which is created by
the automatic configuration of Spring Data, which also creates the `MongoClient` connection and the
`MongoTemplate`... But to create the `MongoClient` — with automatic encryption enabled — you need the JSON Schemas!
It took me a significant amount of time to find a solution to this deadlock, and you can just enjoy the solution now!
The solution is to inject the JSON Schema creation into the autoconfiguration process by instantiating
the `MongoClientSettingsBuilderCustomizer` bean.

```java
/**
 * Spring Data MongoDB configuration for the encrypted MongoClient with all the required configuration (JSON Schemas).
 * The big trick in this file is the creation of the JSON Schemas before the creation of the entire configuration, as
 * we need the MappingContext to resolve the SpEL expressions in the entities.
 *
 * @see com.mongodb.quickstart.javaspringbootcsfle.components.EntitySpelEvaluationExtension
 */
@Configuration
public class MongoDBSecureClientConfiguration {

    private static final Logger LOGGER = LoggerFactory.getLogger(MongoDBSecureClientConfiguration.class);
    private final KmsService kmsService;
    private final SchemaService schemaService;
    // injected from the application properties
    private String CRYPT_SHARED_LIB_PATH;
    private String CONNECTION_STR_DATA;
    private String CONNECTION_STR_VAULT;
    private String KEY_VAULT_DB;
    private String KEY_VAULT_COLL;
    private MongoNamespace KEY_VAULT_NS;

    public MongoDBSecureClientConfiguration(KmsService kmsService, SchemaService schemaService) {
        this.kmsService = kmsService;
        this.schemaService = schemaService;
    }

    @PostConstruct
    public void postConstruct() {
        this.KEY_VAULT_NS = new MongoNamespace(KEY_VAULT_DB, KEY_VAULT_COLL);
    }

    @Bean
    public MongoClientSettings mongoClientSettings() {
        LOGGER.info("=> Creating the MongoClientSettings for the encrypted collections.");
        return MongoClientSettings.builder().applyConnectionString(new ConnectionString(CONNECTION_STR_DATA)).build();
    }

    @Bean
    public MongoClientSettingsBuilderCustomizer customizer(MappingContext mappingContext) {
        LOGGER.info("=> Creating the MongoClientSettingsBuilderCustomizer.");
        return builder -> {
            MongoJsonSchemaCreator schemaCreator = MongoJsonSchemaCreator.create(mappingContext);
            Map<String, BsonDocument> schemaMap = schemaService.generateSchemasMap(schemaCreator)
                                                               .entrySet()
                                                               .stream()
                                                               .collect(toMap(e -> e.getKey().getFullName(),
                                                                              Map.Entry::getValue));
            Map<String, Object> extraOptions = Map.of("cryptSharedLibPath", CRYPT_SHARED_LIB_PATH,
                                                      "cryptSharedLibRequired", true);
            MongoClientSettings mcs = MongoClientSettings.builder()
                                                         .applyConnectionString(
                                                                 new ConnectionString(CONNECTION_STR_VAULT))
                                                         .build();
            AutoEncryptionSettings oes = AutoEncryptionSettings.builder()
                                                               .keyVaultMongoClientSettings(mcs)
                                                               .keyVaultNamespace(KEY_VAULT_NS.getFullName())
                                                               .kmsProviders(kmsService.getKmsProviders())
                                                               .schemaMap(schemaMap)
                                                               .extraOptions(extraOptions)
                                                               .build();
            builder.autoEncryptionSettings(oes);
        };
    }
}
```
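The fields like `CONNECTION_STR_DATA` or `CRYPT_SHARED_LIB_PATH` are injected from the Spring configuration. As a purely illustrative sketch, an `application.properties` file for this setup could look like the fragment below; the property names and values here are assumptions, so check the template's actual configuration file for the real keys.

```properties
# Illustrative keys only; match them to the @Value placeholders used in the template.
spring.data.mongodb.storage.uri=mongodb+srv://<user>:<password>@data-cluster.example.mongodb.net
spring.data.mongodb.vault.uri=mongodb+srv://<user>:<password>@vault-cluster.example.mongodb.net
mongodb.key.vault.db=encryption
mongodb.key.vault.coll=__keyVault
crypt.shared.lib.path=/path/to/mongo_crypt_v1.so
```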
One thing to note here is the option to separate the DEKs from the encrypted collections into two completely separate
MongoDB clusters. This isn't mandatory, but it can be a handy trick if you choose to have a different backup retention
policy for your two clusters. This can be interesting for GDPR Article 17, "Right to erasure," for instance, as you
can then guarantee that a DEK can completely disappear from your systems (backups included). I talk more about this
approach in my Java CSFLE post.
Here is the JSON Schema service which stores the generated JSON Schemas in a map:
```java
@Service
public class SchemaServiceImpl implements SchemaService {

    private static final Logger LOGGER = LoggerFactory.getLogger(SchemaServiceImpl.class);
    private Map<MongoNamespace, BsonDocument> schemasMap;

    @Override
    public Map<MongoNamespace, BsonDocument> generateSchemasMap(MongoJsonSchemaCreator schemaCreator) {
        LOGGER.info("=> Generating schema map.");
        List<EncryptedEntity> encryptedEntities = EncryptedCollectionsConfiguration.encryptedEntities;
        return schemasMap = encryptedEntities.stream()
                                             .collect(toMap(EncryptedEntity::getNamespace,
                                                            e -> generateSchema(schemaCreator, e.getEntityClass())));
    }

    @Override
    public Map<MongoNamespace, BsonDocument> getSchemasMap() {
        return schemasMap;
    }

    private BsonDocument generateSchema(MongoJsonSchemaCreator schemaCreator, Class<?> entityClass) {
        BsonDocument schema = schemaCreator.filter(MongoJsonSchemaCreator.encryptedOnly())
                                           .createSchemaFor(entityClass)
                                           .schemaDocument()
                                           .toBsonDocument();
        LOGGER.info("=> JSON Schema for {}:\n{}", entityClass.getSimpleName(),
                    schema.toJson(JsonWriterSettings.builder().indent(true).build()));
        return schema;
    }
}
```
We are storing the JSON Schemas because this template also implements one of the good practices of CSFLE: server-side
JSON Schemas.
Indeed, server-side JSON Schemas are not required to make CSFLE's automatic encryption and decryption work.
Only the client-side ones are required by the Automatic Encryption Shared Library. But then, nothing would prevent
another misconfigured client, or an admin connected directly to the cluster, from inserting or updating documents
without encrypting the fields.
To prevent this, you can use a server-side JSON Schema, just as you would to enforce a field's type in a document, for instance.
But given that the JSON Schema will evolve with the different versions of your application, the JSON Schemas need to be
updated accordingly each time you restart your application.
```java
/**
 * Creates or updates the encrypted collections with a server-side JSON Schema to secure the encrypted fields in the MongoDB database.
 * This prevents any other client from inserting or editing the fields without encrypting them correctly.
 */
@Component
public class EncryptedCollectionsSetup {

    private static final Logger LOGGER = LoggerFactory.getLogger(EncryptedCollectionsSetup.class);
    private final MongoClient mongoClient;
    private final SchemaService schemaService;

    public EncryptedCollectionsSetup(MongoClient mongoClient, SchemaService schemaService) {
        this.mongoClient = mongoClient;
        this.schemaService = schemaService;
    }

    @PostConstruct
    public void postConstruct() {
        LOGGER.info("=> Setup the encrypted collections.");
        schemaService.getSchemasMap()
                     .forEach((namespace, schema) -> createOrUpdateCollection(mongoClient, namespace, schema));
    }

    private void createOrUpdateCollection(MongoClient mongoClient, MongoNamespace ns, BsonDocument schema) {
        MongoDatabase db = mongoClient.getDatabase(ns.getDatabaseName());
        String collStr = ns.getCollectionName();
        if (doesCollectionExist(db, ns)) {
            LOGGER.info("=> Updating {} collection's server side JSON Schema.", ns.getFullName());
            db.runCommand(new Document("collMod", collStr).append("validator", jsonSchemaWrapper(schema)));
        } else {
            LOGGER.info("=> Creating encrypted collection {} with server side JSON Schema.", ns.getFullName());
            db.createCollection(collStr, new CreateCollectionOptions().validationOptions(
                    new ValidationOptions().validator(jsonSchemaWrapper(schema))));
        }
    }

    public BsonDocument jsonSchemaWrapper(BsonDocument schema) {
        return new BsonDocument("$jsonSchema", schema);
    }

    private boolean doesCollectionExist(MongoDatabase db, MongoNamespace ns) {
        return db.listCollectionNames()
                 .into(new ArrayList<>())
                 .stream()
                 .anyMatch(c -> c.equals(ns.getCollectionName()));
    }
}
```
Another big feature of this template is the support for multiple entities. As you have probably noticed already, there is
a `CompanyEntity` and all its related components, but the code is generic enough to handle any number of entities, which
usually isn't the case in other online tutorials.

In this template, if you want to support a third type of entity, you just have to create the components of the
three-tier architecture as usual and add your entry in the `EncryptedCollectionsConfiguration` class.

```java
/**
 * Information about the encrypted collections in the application.
 * As I need the information in multiple places, I decided to create a configuration class with a static list of
 * the encrypted collections and their information.
 */
public class EncryptedCollectionsConfiguration {
    public static final List<EncryptedEntity> encryptedEntities = List.of(
            new EncryptedEntity("mydb", "persons", PersonEntity.class, "personDEK"),
            new EncryptedEntity("mydb", "companies", CompanyEntity.class, "companyDEK"));
}
```
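The `EncryptedEntity` value class used above is not shown in this post. As a hypothetical, dependency-free sketch of what it holds (the real template presumably returns a `com.mongodb.MongoNamespace` from `getNamespace()`; a plain `String` keeps this sketch self-contained):

```java
// Hypothetical sketch of the EncryptedEntity value class referenced throughout the template.
public class EncryptedEntity {
    private final String database;
    private final String collection;
    private final Class<?> entityClass;
    private final String dekName;

    public EncryptedEntity(String database, String collection, Class<?> entityClass, String dekName) {
        this.database = database;
        this.collection = collection;
        this.entityClass = entityClass;
        this.dekName = dekName;
    }

    // The real template likely exposes a MongoNamespace; a "db.coll" String stands in for it here.
    public String getNamespaceFullName() { return database + "." + collection; }
    public Class<?> getEntityClass() { return entityClass; }
    public String getDekName() { return dekName; }

    public static void main(String[] args) {
        EncryptedEntity person = new EncryptedEntity("mydb", "persons", String.class, "personDEK");
        System.out.println(person.getNamespaceFullName()); // mydb.persons
        System.out.println(person.getDekName()); // personDEK
    }
}
```

One instance of this class per encrypted collection is enough to drive the DEK creation, the schema generation, and the collection setup.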
Everything else, from the DEK generation to the encrypted collection creation with the server-side JSON Schema, is fully
automated and taken care of transparently. All you have to do is add the
`@Encrypted(algorithm = "AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic")` annotation to the field in the entity class, and it
will be encrypted and decrypted automatically for you when you use the auto-implemented repositories (courtesy of
Spring Data MongoDB, of course!).

Maybe you noticed that this template implements the `findFirstBySsn(ssn)` method, which means that it's possible to
retrieve a person document by its SSN, even though this field is encrypted. Note that this only works because we are
using a deterministic encryption algorithm.

```java
/**
 * Spring Data MongoDB repository for the PersonEntity
 */
@Repository
public interface PersonRepository extends MongoRepository<PersonEntity, String> {

    PersonEntity findFirstBySsn(String ssn);
}
```
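Why does equality search only work with the deterministic algorithm? Because deterministic encryption produces the same ciphertext for the same plaintext and key, so the server can match encrypted values byte-for-byte. The toy sketch below illustrates the idea with plain AES-CBC from the JDK (a fixed IV plays the "deterministic" role); it is NOT MongoDB's actual `AEAD_AES_256_CBC_HMAC_SHA_512` scheme, and the key, IVs, and SSN value are made up for the demo.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.SecureRandom;
import java.util.Arrays;

public class DeterministicVsRandomDemo {

    // AES-CBC with a caller-supplied IV: same key + same IV + same plaintext => same ciphertext.
    public static byte[] encrypt(byte[] plaintext, byte[] key, byte[] iv) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return cipher.doFinal(plaintext);
    }

    public static void main(String[] args) throws Exception {
        byte[] key = new byte[16];     // toy all-zero key, demo only
        byte[] fixedIv = new byte[16]; // "deterministic": the IV is fixed/derived
        byte[] ssn = "123-45-6789".getBytes();

        // Deterministic-style: identical ciphertexts, so an equality query can match them.
        byte[] c1 = encrypt(ssn, key, fixedIv);
        byte[] c2 = encrypt(ssn, key, fixedIv);
        System.out.println(Arrays.equals(c1, c2)); // true

        // Random-style: a fresh IV each time yields different ciphertexts, so no equality matching.
        SecureRandom rnd = new SecureRandom();
        byte[] iv1 = new byte[16]; rnd.nextBytes(iv1);
        byte[] iv2 = new byte[16]; rnd.nextBytes(iv2);
        System.out.println(Arrays.equals(encrypt(ssn, key, iv1), encrypt(ssn, key, iv2))); // false
    }
}
```

This is also why `bloodType`, encrypted with the random algorithm, cannot be used in a query filter, while `ssn` can.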
Thanks for reading my post!
If you have any questions about it, please feel free to open a question in the GitHub repository or ask a question in
the MongoDB Community Forum.
Pull requests and improvement ideas are very welcome!