In Kubernetes, the containers running in a pod (a group of one or more containers) can access files stored in a directory through a volume. If one container crashes and is restarted in the pod, the data that was there may or may not still be accessible, depending on the type of volume. Ephemeral storage, in the context of Kubernetes, is storage tied to the lifecycle of a pod, so when a pod finishes or is restarted, that storage is cleared out.
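So, in short, what is ephemeral storage? It's temporary storage that is wiped out when the thing it belongs to goes away: a container's own file system is lost when the container is stopped or restarted, and a pod's ephemeral volumes are lost when the pod is removed.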
Each day, more and more workloads are moved to the cloud. Although we'll always have some workloads running on premises, the benefits of a managed solution are obvious: less downtime from power outages or dropped Internet connections; instant scalability, both vertical (adding more resources to a computing node) and horizontal (distributing a workload among multiple nodes); no big air conditioning bills to keep our servers cool; etc. And especially: no need to shut down our servers and get them off the rack to upgrade RAM. Been there, done that.
When we start using computing nodes in the cloud, these can range from full-blown virtual machines (which emulate hardware and include a full guest operating system) to containers: lightweight computing nodes that share the kernel of the host machine and just add a filesystem with the OS and the applications needed. On top of those "hardware" benefits, fully managed containerized apps add the "software" benefits of easy deployment, configuration management, replicability, etc.
Containerized applications are a perfect solution for multiple problems:
But even running in containers, these are just apps. And every interesting application (even useful ones) uses some kind of data. Containers can use data from different sources:
But here lies the question: Which kind of storage does a container accept? A container can access files using volumes:
There are different kinds of ephemeral volumes:
An emptyDir volume is empty at pod startup. Its files are stored locally, either in the kubelet base directory (usually on the node's root disk) or in RAM. To define a volume attached to a pod that starts empty, we use this config:
apiVersion: v1
kind: Pod
metadata:
  name: empty-folder-demo
  namespace: default
spec:
  containers:
  - name: empty-dir-demo-ctr
    image: httpd:alpine
    volumeMounts:
    - mountPath: /test
      name: emptydir-test-volume
  volumes:
  - name: emptydir-test-volume
    emptyDir: {}
Note how we define the volume as emptyDir under volumes and then reference it by name in volumeMounts. If a container in this pod is restarted, the contents survive, but if the whole pod is removed from its node (for example, deleted or rescheduled elsewhere), all content is lost.
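The RAM-backed option mentioned above can be sketched as follows. This is a minimal example, assuming a hypothetical pod name, mount path, and a 64Mi cap chosen only for illustration; `medium: Memory` and `sizeLimit` are standard emptyDir fields.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: empty-dir-memory-demo   # hypothetical name
spec:
  containers:
  - name: cache-ctr
    image: httpd:alpine
    volumeMounts:
    - mountPath: /cache          # illustrative mount path
      name: cache-volume
  volumes:
  - name: cache-volume
    emptyDir:
      medium: Memory             # back the volume with tmpfs (RAM) instead of node disk
      sizeLimit: 64Mi            # illustrative cap on the volume's size
```

Files written to a memory-backed emptyDir count against the memory usage of the container that writes them, so a size limit is usually worth setting.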
configMap, downwardAPI, and secret volumes inject different kinds of Kubernetes data into a pod. ConfigMaps are used to store non-confidential data in key-value pairs, as can be seen in the example configuration below:
apiVersion: v1
kind: ConfigMap
metadata:
  name: game-demo
data:
  # property-like keys; each key maps to a simple value
  player_initial_lives: "3"
  ui_properties_file_name: "user-interface.properties"
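To make the "inject into a pod" part concrete, here is a sketch of a pod that mounts the game-demo ConfigMap as a volume. The pod name, container name, image, command, and mount path are assumptions made for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: configmap-demo-pod       # hypothetical name
spec:
  containers:
  - name: demo
    image: alpine
    command: ["sleep", "3600"]   # keep the container alive so the files can be inspected
    volumeMounts:
    - name: config
      mountPath: /config         # illustrative mount path
      readOnly: true
  volumes:
  - name: config
    configMap:
      name: game-demo            # the ConfigMap defined above
```

With this layout, each key in the ConfigMap appears as a file under /config whose content is the key's value.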
Secrets are used to store sensitive information, such as passwords or API keys. In configuration files, Secret values are written as base64-encoded strings (an encoding, not encryption):
apiVersion: v1
kind: Secret
metadata:
  name: mysecret
type: Opaque
data:
  username: YWRtaW4=
  password: MWYyZDFlMmU2N2Rm
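A Secret can be mounted into a pod in much the same way as a ConfigMap. The sketch below assumes a hypothetical pod name, image, command, and mount path; only the `secretName` comes from the manifest above.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: secret-demo-pod          # hypothetical name
spec:
  containers:
  - name: demo
    image: alpine
    command: ["sleep", "3600"]   # keep the container alive so the files can be inspected
    volumeMounts:
    - name: credentials
      mountPath: /etc/credentials   # illustrative mount path
      readOnly: true
  volumes:
  - name: credentials
    secret:
      secretName: mysecret       # the Secret defined above
```

Inside the container, the files /etc/credentials/username and /etc/credentials/password hold the decoded values, so the application does not need to handle base64 itself.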
Generic ephemeral volumes can be provided by all storage drivers that also support persistent volumes.
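As a rough sketch of what such a volume looks like, the pod below declares a generic ephemeral volume inline through `ephemeral.volumeClaimTemplate`. The pod name, image, mount path, requested size, and especially the storage class name are assumptions; the storage class must exist in your cluster.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: generic-ephemeral-demo   # hypothetical name
spec:
  containers:
  - name: app
    image: busybox:1.28
    command: ["sleep", "1000000"]
    volumeMounts:
    - mountPath: /scratch        # illustrative mount path
      name: scratch-volume
  volumes:
  - name: scratch-volume
    ephemeral:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          storageClassName: scratch-storage-class   # assumed storage class name
          resources:
            requests:
              storage: 1Gi       # illustrative size
```

Kubernetes creates the underlying claim together with the pod and deletes it when the pod is deleted, which is what keeps this volume ephemeral even though a persistent-volume driver provides it.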
Containers can also request and limit local ephemeral storage. In the following example, the container requests 2 GiB of ephemeral storage and is limited to 4 GiB:
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    resources:
      requests:
        ephemeral-storage: "2Gi"
      limits:
        ephemeral-storage: "4Gi"
We can run apps on our on-premises servers or in virtual machines in the cloud. The former forces us to maintain everything: hardware and software, updates, security patches, capacity planning, etc. The latter allows us to focus just on software maintenance. But that's still a big burden, and achieving scalability and easily replicated configuration is not easy.
Containers allow us to run any application in the cloud with the convenience of replicable configuration. For instance, we can run and manage our own MongoDB containers in our favorite cloud provider.
MongoDB Atlas is easy to use with your containerized application and makes it simple for you and your team to access your data. You can find out more about MongoDB and containers with the following resources:
By default, all storage used by a container is ephemeral. That is, it'll get deleted if the container crashes and is restarted. Container images are immutable: changes a running container makes to its own file system go to a temporary writable layer that is discarded along with the container. If we want to persist changes, we either need to define a non-ephemeral volume or store our data using some database. The MongoDB Kubernetes operator can be handy in this case.
By default, all storage in Kubernetes is ephemeral. Only if we define volumes that are not ephemeral do we get persistent storage.
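For contrast, a non-ephemeral volume can be sketched with a PersistentVolumeClaim that a pod then mounts. The names, image, mount path, and size below are hypothetical, and the example assumes the cluster has a default StorageClass that can provision the claim.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim               # hypothetical name
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi               # illustrative size
---
apiVersion: v1
kind: Pod
metadata:
  name: persistent-demo          # hypothetical name
spec:
  containers:
  - name: app
    image: httpd:alpine
    volumeMounts:
    - mountPath: /data           # illustrative mount path
      name: data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: data-claim      # refers to the claim defined above
```

Because the claim is a separate object from the pod, deleting the pod leaves the claim and its data in place, and a new pod can mount the same claim again.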
Cloud storage is not ephemeral. If we need to create files and store them, we can use a block storage solution such as OpenEBS, or one of the block storage services offered by the major cloud providers (for example, Amazon Elastic Block Store).