Getting started with Atlas Stream Processing security

Robert Walters • 9 min read • Published May 02, 2024 • Updated May 02, 2024
Stream Processing • Atlas
Security is paramount in the realm of databases, and the safeguarding of streaming data is no exception. Stream processing services like Atlas Stream Processing handle sensitive data from a variety of sources, making them prime targets for malicious activities. Robust security measures, including encryption, access controls, and authentication mechanisms, are essential to mitigating risks and upholding the trustworthiness of the information flowing through streaming data pipelines.
In addition, regulatory compliance may impose comprehensive security protocols and configurations such as enforcing auditing and separation of duties. In this article, we will cover the security capabilities of Atlas Stream Processing, including access control, and how to configure your environment to support least privilege access. Auditing and activity monitoring will be covered in a future article.

A primer on Atlas security

Recall that in MongoDB Atlas, organizations, projects, and clusters are hierarchical components that facilitate the organization and management of MongoDB resources. An organization is a top-level entity representing an independent deployment of MongoDB Atlas, and it contains one or more projects.
A project is a logical container within an organization, grouping related resources and serving as a unit for access control and billing. Within a project, MongoDB clusters are deployed. Clusters are instances of MongoDB databases, each with its own configurations, performance characteristics, and data. Clusters can span multiple cloud regions and availability zones for high availability and disaster recovery.
This hierarchy allows for the efficient management of MongoDB deployments, access control, and resource isolation within MongoDB Atlas.
[Image: MongoDB Atlas organization and project hierarchy]
With respect to security, Atlas has two separate user entities: Atlas users and Atlas database users. They are defined at different scopes, with Atlas users being used within Atlas organizations and Atlas projects, and Atlas database users being used within Atlas clusters. Understanding the differences and how your end users will authenticate to Atlas Stream Processing is important. Let’s dig deeper into these two types of users.

Atlas users (the control plane)

Atlas users authenticate with the Atlas UI, API, or CLI only (a.k.a. the control plane). Authorization includes access to an Atlas organization and the Atlas projects within the organization.
[Image: Managing organization access in MongoDB Atlas]
Atlas users' access to Atlas clusters is determined by membership in one or more of the fixed organizational or project roles. Here is a sample of some of these roles and their associated permissive capabilities:
  • Organization Owner — root access to the entire organization and projects contained within it
  • Organization Project Creator — can create Atlas projects
  • Organization Read Only — read-only access to the settings, users, and projects in the organization
  • Project Owner — has full administrative access
  • Project Cluster Manager — can update clusters
  • Project Data Access Admin — can access and modify a cluster's data and indexes and kill operations
  • Project Data Access Read/Write — can access a cluster's data and indexes and modify data
  • Project Data Access Read Only — can access a cluster's data and indexes
  • Project Read Only — may only modify personal preferences
While Atlas users may have access to an Atlas cluster through a high-level permission like Project Owner, they can only access the cluster through the Atlas UI, the Atlas API, or the Atlas CLI. Users who wish to connect to the Atlas cluster through a client driver like Java or a tool like mongosh cannot, as Atlas users do not exist within the Atlas database. This is where Atlas database users come into play.

Atlas database user (the data plane)

Atlas database users authenticate with an Atlas cluster directly and have no access to the Atlas UI, Atlas API, or Atlas CLI. These users authenticate using a client tool such as mongosh or via a MongoDB driver like the MongoDB Java driver. If you have previously used a self-hosted MongoDB server, Atlas database users are the equivalent of the MongoDB user. MongoDB Atlas supports a variety of authentication methods such as SCRAM (username and password), LDAP Proxy Authentication, OpenID Connect, Kerberos, and x.509 Certificates. While clients use any one of these methods to authenticate, Atlas services, such as Atlas Data Federation, access other Atlas services like Atlas clusters via temporary x.509 certificates. This same concept is used within Atlas Stream Processing and will be discussed later in this post.
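As an example, here is a minimal sketch of connecting to an Atlas cluster as a SCRAM database user with mongosh (the cluster host and username are hypothetical; substitute your own):
  mongosh "mongodb+srv://cluster0.abc123.mongodb.net/" --username appUser
The same credentials cannot be used to sign in to the Atlas UI; they exist only within the data plane.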
Note: Unless otherwise specified, a “user” in this article refers to an Atlas database user.
[Image: Atlas control plane versus data plane]
Understanding the concepts of the data plane and control plane makes it easier to understand and configure Atlas Stream Processing. In summary, you create the Stream Processing Instance (SPI) and define Connection Registry entries using the Atlas control plane as an Atlas user, and you connect to an SPI and create stream processors using an Atlas database user account. If you are not already familiar with the components that make up Atlas Stream Processing, we’ll dive deeper in the next section.

A primer on Atlas Stream Processing’s architecture

Atlas Stream Processing does not exist within an Atlas cluster. Instead, it is contained within an Atlas project and operates independently of any Atlas cluster. Consider the Atlas Stream Processing architecture diagram below:
[Image: An overview of Atlas Stream Processing’s architecture]
A Stream Processing Instance (SPI) is a logical grouping of zero or more stream processors (SPs). These processors are created within the SPI and operate within the cloud provider and region specified for the SPI. SPs leverage data sources that are defined within the Connection Registry. Each Connection Registry is mapped directly to an SPI and contains connection definitions, such as connections to other Atlas clusters within the same Atlas project, or to external Apache Kafka systems such as Confluent Cloud, AWS MSK, or a self-hosted Kafka deployment, to name a few.
Now that you have a basic understanding of Atlas security and the architecture of Atlas Stream Processing, let’s take a look at some specific security features and walk through an example of implementing least privileges.

Stream Processing Instance

Atlas Stream Processing is introducing a new project-level role called Project Stream Processing Owner (PSPO). This role can perform the following:
  • Create, modify, and delete any Atlas Stream Processing Instances within the Atlas project.
  • Start and stop any stream processor within the Atlas project.
  • Create, update, and delete all Connection Registries within the Atlas project.
  • View/download system logs for all SPIs in the Atlas project.
  • View/download audit logs for all SPIs in the Atlas project.
  • Manage all database users and roles within the Atlas project.
  • Read/write access to Atlas clusters within the project.
The PSPO and any other elevated role, such as Project Owner or Organization Owner, can perform the functions listed above. While the introduction of the PSPO role reduces project privileges compared to the Project Owner role, it is still an elevated role and only differs from a Project Owner in a few areas. For more information, see the MongoDB documentation.
Authentication to SPIs operates similarly to Atlas clusters, where only users defined within the Atlas data plane (e.g., Atlas database users) are allowed to connect to and create SPIs. It's crucial to grasp this concept because SPIs and Atlas clusters are distinct entities within an Atlas project, yet they share the same authentication process via Atlas database users.
By default, only Atlas users who are Project Owners or Project Stream Processing Owners can create Stream Processing Instances. These users also have the ability to create, update, and delete connection registry connections associated with SPIs.

Connecting to the Stream Processing Instance

Once the SPI is created, Atlas database users can connect to it just as they would with an Atlas cluster, through a client tool such as mongosh. Any Atlas database user with the built-in “readWriteAnyDatabase” or “atlasAdmin” role can connect to any SPI within the project.
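For example, a hedged sketch of connecting to an SPI and inspecting its Connection Registry with mongosh (the SPI connection string and username are hypothetical; copy the real connection string from the Atlas UI):
  mongosh "mongodb://atlas-stream-abc123-stocks.virginia-usa.a.query.mongodb.net/" --tls --username spUser
  // Once connected, list the Connection Registry entries available to this SPI:
  sp.listConnections()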
For users without one of these built-in permissions, or for scenarios where administrators want to follow the principle of least privilege, administrators can create a custom database role made up of specific actions.

Custom actions

Atlas Stream Processing introduces a number of custom actions that can be assigned to a custom database role. For example, if administrators wanted to create an operations-level role that could only start, stop, and view stream statistics, they could create a database user role, “ASPOps,” and add the startStreamProcessor, stopStreamProcessor, listStreamProcessors, and streamProcessorStats actions. The administrator would then grant this role to the user, as sketched after the list below.
The following is a list of Atlas Stream Processing actions:
  • createStreamProcessor
  • processStreamProcessor
  • startStreamProcessor
  • stopStreamProcessor
  • dropStreamProcessor
  • sampleStreamProcessor
  • listStreamProcessors
  • listConnections
  • streamProcessorStats
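Assuming a developer granted only the hypothetical “ASPOps” role connects to the SPI, their mongosh session would be limited to operations like the following (the processor name “stockTicks” is hypothetical):
  sp.listStreamProcessors()   // permitted by listStreamProcessors
  sp.stockTicks.stats()       // permitted by streamProcessorStats
  sp.stockTicks.stop()        // permitted by stopStreamProcessor
  sp.stockTicks.start()       // permitted by startStreamProcessor
  // Creating or dropping processors would fail, since createStreamProcessor and
  // dropStreamProcessor were not granted to this role.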
One issue you might notice: a database user with the built-in “readWriteAnyDatabase” role has all of these actions granted by default, and any user holding a custom role with these actions has them for all Stream Processing Instances within the Atlas project! If your organization wants to lock this down and restrict access to specific SPIs, it can do so by navigating to the “Restrict Access” section and selecting the desired SPIs.
[Image: Restricting access within MongoDB Atlas]

An example of least privilege

Assume there are two users: an administrator and a developer (who is not an administrator). The administrator would like to provide the developer access to an Atlas Stream Processing Instance for use with streaming data from a Kafka topic into a MongoDB Atlas cluster. The administrator is a Project Owner.
A high-level workflow is as follows:
Step 1: The administrator creates the Stream Processing Instance, “Stocks.”
Step 2: The administrator creates the Connection Registry entries, one for the connection to Apache Kafka and one for the Atlas cluster, selecting “Read and Write to any database” under Execute As (see the Execute As section below).
Step 3: The administrator creates a new database custom role, “SPIAccess,” and assigns the Atlas Stream Processing custom actions.
Step 4: The administrator creates a new database user account (or modifies the developer user if it already exists) and assigns the user the custom role “SPIAccess.”
Step 5: The administrator edits the database user and specifies the exact SPIs to which the user should have access.
[Image: Granting access to specific SPIs in Atlas]
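Steps 1 and 2 can also be scripted rather than performed in the Atlas UI. Here is a rough sketch using the Atlas CLI’s streams command group (the instance name, provider, region, tier, and connection config file are illustrative; verify the exact flags with atlas streams --help for your CLI version):
  atlas streams instances create Stocks --provider AWS --region VIRGINIA_USA --tier SP10
  atlas streams connections create kafkaConn --instance Stocks -f kafkaConnectionConfig.json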
At this point, the developer can connect only to the Stream Processing Instance, “Stocks,” and create new stream processors using a client development tool such as mongosh or the Visual Studio Code extension for MongoDB. When their stream processors run, they write data to the Atlas cluster under the “Read and Write to any database” built-in role selected in the connection registry.
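To illustrate, here is a minimal stream processor the developer might create in mongosh, assuming the administrator named the Connection Registry entries “kafkaConn” and “atlasConn” (the topic, database, and collection names are also hypothetical):
  sp.createStreamProcessor("stockTicks", [
    { $source: { connectionName: "kafkaConn", topic: "stocks" } },
    { $merge: { into: { connectionName: "atlasConn", db: "market", coll: "ticks" } } }
  ])
  sp.stockTicks.start()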

Execute As

In the scenario described above, assume the developer creates a stream processor that uses the Kafka connection as a source and a MongoDB Atlas cluster as a destination. When the administrator defined the Kafka source, they provided the connection credentials. While the developer cannot see the credentials when they run stream processing commands (like .process or .sample) or when running stream processors, they are running under the security context of these credentials. In the case of an Atlas cluster, the administrator can define the security context (e.g., what user or role) the Stream Processor uses when connecting to the Atlas cluster. This is important because while the developer might not have access to the cluster specified in the connection registry, they would indirectly gain access through executing the Stream Processor. To mitigate this, administrators should execute as a database user or role with least privilege access to the Atlas cluster.
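Even a quick interactive test runs under the connection’s security context rather than the developer’s own credentials. For example (hypothetical connection name and topic):
  // Streams sampled results to the shell, authenticating to Kafka with the
  // credentials stored in the connection registry:
  sp.process([ { $source: { connectionName: "kafkaConn", topic: "stocks" } } ])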
[Image: Editing connections in Atlas Stream Processing]
All three built-in database user roles, as well as any custom roles, will show up as options under Execute As. Administrators should create a custom role that has only enough permissions on the target Atlas cluster to perform the desired actions.

A word on the “read and write to any database” built-in role

Recall that any Atlas database user with the “readWriteAnyDatabase” role has full access to any SPIs within the project. If an administrator wishes to deny access to SPIs for these users, they should review the “Restrict Access” settings and ensure no SPIs are selected.
[Image: Restricting access by selecting no SPIs in Atlas Stream Processing]

Summary

Configuring security for Atlas Stream Processing involves both the Atlas users (control plane) and Atlas database users (data plane). While Project Owner is the highest Atlas user role available, the new Project Stream Processing Owner role is intended to give Atlas Stream Processing administrators enough privileges to create Stream Processing Instances and manage the Atlas database users who need access.
Once administrators create Stream Processing Instances, any Atlas database user with the built-in “readWriteAnyDatabase” or “atlasAdmin” can connect to any SPIs within the project. To support more fine-grained security, Atlas Stream Processing has introduced a number of custom actions that can be grouped to build out a custom database user role. This role can be granted to Atlas database users, enabling them to connect to a Stream Processing Instance without any elevated permission such as atlasAdmin.
Although a database user has access to an SPI, they can only reference the connections defined in the connection registry for that SPI. This user cannot create, modify, or delete connections, as only Atlas users who are members of the Project Owner or Project Stream Processing Owner roles can manage connection registry connections. There are a few key reasons for this restriction.
First, it is important to not expose credentials or allow non-administrators the ability to view or modify sensitive connection information. Second, when connections are used within the Stream Processing pipeline, these connections are run under a specific execution context. Allowing a non-administrator the ability to run under a different context may give them an elevation of privilege. Thus, it is best for connection registry connections to execute as a role that has the minimum amount of permissions necessary to perform the operations of the Stream Processing Instances.
This ability for the stream processing pipeline to “execute as” a specific role applies only when connections are to Atlas databases. If using Kafka as a source, grant the Kafka credentials only the minimum permissions needed to access the topic(s) used within the Stream Processing Instance.
With these granular permission capabilities in place, Atlas Stream Processing enables you to achieve least-privilege access to your Stream Processing Instances and pipelines. We’re excited to hear about your experience, so let us know if you have any feedback.
Get started today or read more about MongoDB Atlas Stream Processing in our documentation.
