Many Kubernetes examples you find online usually concentrate on running stateless applications.
Typically, these are your standard Nodejs express applications or a python based API written with Flask.
Running these types of application on Kubernetes today is relatively easy. You have everything you need to run and operate them at scale: rolling deployments, ingress controllers, control over termination timeouts, and more.
But how about running a stateful application that occasionally needs to write data on disk and make sure this data persists between container restarts or when the container is rescheduled to another node? Or running a database like MongoDB on Kubernetes?
That’s where things aren’t so straightforward. Fortunately, Kubernetes and its vibrant community provide many options for how to run these stateful workloads.
We’ll dive a bit deeper to review these options, but you might ask —
Can we just attach a volume to our pod template? shouldn’t it be enough? Theoretically, your application can now write to disk, and if the container will be restarted or travel to another node, the volume will be re-attached to the container in its new location.
That’s true for simple cases, but the situation is much more complicated for services like Elasticsearch, etcd, Consul, and such.
These services have a few requirements that aren’t satisfied by the regular Kubernetes Deployment controller.
For example, you may need to have predictable DNS names for each pod to make initial cluster formation easier. Or your deployed system may need to ensure that the pods will be started in a certain predefined order.
Additionally, you may want to create and attach a separate volume to each of your pods that will be tied to it through the whole pod’s lifecycle. With regular pods, you can only attach one volume that will be shared between all the pods created by the same deployment.
We also didn’t mention how you are going to operate your database. You’ll also need to make sure you have a plan for when and how backups are going to be performed or how a recovery/failover will be performed in case something bad happens.
Here are a few options for how you can deploy your databases on Kubernetes:
StatefulSet, which until recently was called a PetSet, is a built-in controller which in essence is similar to Kubernetes’ deployments.
Eventually, it will create and manage a set of pods based on the pod template you will specify.
The main difference is that it provides the following guarantees to the underlying applications:
Here are a few examples of open source database deployment implementation that are using StatefulSets:
StatefulSets are generic, so you can use them to model your databases’ unique cluster formation or master/slave architecture.
However, the end result will lack on the operational side. You will need to add additional resources or automation to make sure you can perform periodic backups or add scripts that deal with edge cases such as a failover.
Eventually, the modeling of the more complex stateful services using StatefulSets may feel a bit clunky and not native to Kubernetes, and, as mentioned above, will lack management automation. This is where operators come into play:
If one of the reasons you decided to run your database on Kubernetes was to unify management for all your application’s components, operators will probably provide the experience you were looking for!
Instead of shoving your application into a StatefulSets model, you basically write (or use someone else’s) custom controller.
As a user, this allow you to use kubectl CLI to control your stateful application as a native kubernetes resource. For example, if you deployed an etcd operator, you may check your cluster’s backup status with the following kubectl command:
The main advantage of operators over StatefulSets is that they add an automation layer which is unique to the stateful application they operate. You won’t need to worry about how you’re going to add a backup cron to your Elasticsearch cluster implemented using StatefulSets. With operators, you just need to specify the bucket where this backup should be stored.
Unfortunately, since writing a new operator requires an understanding of Kubernetes and its APIs in addition to the specifics of the stateful applications, there aren’t many operators available at the moment, and the ones out there are still relatively new.
Here are a few examples of operators so that you can test the concept yourself:
This section is less defined, and basically meant to indicate that for specific databases, like the PostgreSQL example we’ll see in a second, there are other options for how to deploy and manage them as Docker containers on Kubernetes.
Sometimes, there are other options available rather than a StatefulSet or a dedicated operator implementations.
For example, Stolon, which I personally hadn’t have a chance to use but saw mentioned in a few threads, is a “cloud-native PostgreSQL manager for PostgreSQL high availability”.
To deploy Stolon on Kubernetes, you can use the supplied StatefulSets definition. However, because of Stolon’s capabilities, you won’t need to add your own cluster management automation to control PostgreSQL cluster. Stolon comes with its own CLI for that.
Here’s a quick decision tree that I hope will help you make a decisions about how to best deploy and maintain your stateful workloads on Kubernetes:
Kubernetes is a pretty intuitive platform when it comes to stateless applications. However, when dealing with database-like services, you need to put a bit more consideration into how you’re going to deploy and manage them on Kubernetes. The good and bad news is that there are several options available.