Stateful Postgres Storage Using Kubernetes
Kubernetes was developed originally as an orchestration system for stateless applications. Today, Kubernetes is the backbone of countless full stack applications with, notably, a database as part of the stack. So, a question we often hear is:
How can Kubernetes be the foundation of that most stateful application of all, the database?
Kubernetes & Storage
Ephemeral Pods
Let’s say you maintain a Postgres database and you’ve been tasked with moving it to Kubernetes. You could just start up a Kubernetes pod running a Postgres image, load the data, and call it a day.
But as soon as that pod goes away, so will that small but critical database, because the database storage existed as part of that ephemeral pod.
When you created that pod, you told the underlying computer to reserve a certain amount of resources for it: a certain amount of compute power, a certain amount of memory, and some local storage that comes along automatically. But as soon as that pod goes away, all of that is released back into the pool.
And pods do go away even when you don’t want them to. Maybe a pod exceeded the resource limits you gave it, or maybe the process hit a fatal exception, or—well, there are a lot of ways a pod can die. Let’s call this the first lesson of Kubernetes: pods are ephemeral—as the old saying goes, consider these cattle, not pets. While this model of deployment is ideal for application services, we need something different to handle the database in Kubernetes.
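To make the problem concrete, here is a minimal sketch of the kind of bare pod described above. The pod name, image tag, password value, and resource figures are all assumptions chosen for illustration. Nothing in this manifest asks for storage that outlives the pod.

```yaml
# Hypothetical bare Postgres pod. No volumes are declared, so the data
# directory lives in the container's writable layer and is lost with the pod.
apiVersion: v1
kind: Pod
metadata:
  name: postgres-ephemeral        # assumed name
spec:
  containers:
    - name: postgres
      image: postgres:14          # assumed image tag
      env:
        - name: POSTGRES_PASSWORD
          value: example-only     # placeholder; use a Secret in practice
      resources:
        requests:
          cpu: 500m
          memory: 512Mi
          ephemeral-storage: 1Gi  # scratch space, released when the pod goes away
        limits:
          memory: 512Mi           # exceeding this gets the container OOMKilled
```

Deleting this pod, or losing it to a memory limit, takes the database files with it.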
Persistent Volumes
So, how can we create a database on Kubernetes and not worry about the ephemeral pod? The answer is “Volumes”, or more specifically certain types of volumes that are independent of the pod lifecycle.
For an example of a volume type that is not independent, let’s take emptyDir. The Kubernetes doc on Volume types lets us know that when a pod with an emptyDir is deleted, the data in there will be “deleted permanently”. For backing a Postgres instance, this is not a good idea.
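As a sketch of what that looks like, the manifest below mounts an emptyDir at the Postgres data directory. The pod name, image tag, and password are illustrative assumptions.

```yaml
# Hypothetical pod that keeps its Postgres data on an emptyDir volume.
apiVersion: v1
kind: Pod
metadata:
  name: postgres-emptydir         # assumed name
spec:
  containers:
    - name: postgres
      image: postgres:14          # assumed image tag
      env:
        - name: POSTGRES_PASSWORD
          value: example-only     # placeholder; use a Secret in practice
      volumeMounts:
        - name: pgdata
          mountPath: /var/lib/postgresql/data
  volumes:
    - name: pgdata
      emptyDir: {}                # created with the pod, deleted with the pod
```

An emptyDir does survive a container restart inside the same pod, but once the pod itself is deleted or rescheduled, the volume and the data in it are gone.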
What we want here is a volume that won’t go away when the pods are removed. Luckily, Kubernetes has the concept of a Persistent Volume (PersistentVolume or PV): a volume that will persist no matter what happens with the pod (sort of; we’ll get into that later).
What’s even better, a Persistent Volume is an abstraction for any kind of storage. So, do you want to put your storage somewhere remote, like on AWS, Azure, Google, etc.? Then you can use a Persistent Volume. But maybe you want to use some of your Kubernetes node’s own storage? Then you can also use a Persistent Volume.
As far as Kubernetes knows, a Persistent Volume is just some piece of storage somewhere and its lifecycle is independent of the pod that’s using it. So if the pod goes away (say, it hits a memory limit and is OOMKilled), the storage is still there, ready to be used by your Postgres pod when it is recreated.
But first, we have to tell the Postgres pod to use that storage and we have to tell other pods to leave our storage alone. And we do that with a Persistent Volume Claim (PersistentVolumeClaim or PVC).
Persistent Volume Claim
A Persistent Volume Claim is a claim on a persistent volume, and not just any persistent volume. After all, it wouldn’t be great for your 20 GB database if your Postgres pod tried to claim and use 1 GB of storage.
A Persistent Volume Claim lets you request a certain amount of storage. But a Persistent Volume Claim is more than just a specification of a certain amount of storage. You might also want to specify the access mode for this storage or the StorageClass. For example, if you have data that changes quickly, you might want a different storage option than if you are taking long-lasting backups that never change.
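A claim carrying those three pieces of information (size, access mode, and StorageClass) could be sketched as follows. The claim name and the class name are hypothetical; the class has to match one that actually exists on your cluster.

```yaml
# Hypothetical claim for a 20Gi volume.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pgdata-claim              # assumed name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ssd      # assumed class; omit to use the cluster default
  resources:
    requests:
      storage: 20Gi               # big enough for the 20 GB database example
```

Kubernetes binds the claim to a volume that offers at least the requested capacity and supports the requested access mode.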
Storage Classes
By now you may have noticed that the Kubernetes developers are pretty good at using descriptive names and that’s true here: a storage class is a class of storage. That is, it is a category that defines certain behaviors and attributes of the storage. Or as the Kubernetes docs on Storage Class say, "Different classes might map to quality-of-service levels, or to backup policies, or to arbitrary policies determined by the cluster administrators."
One of those behaviors is uniquely important and that’s the provisioner. That’s the field on the Storage Class that lets Kubernetes know how to dynamically provision new Persistent Volumes.
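For reference, a Storage Class is a small object. The sketch below uses a made-up class name, and the provisioner string is borrowed from the cluster inspected later in this article; your cluster’s provisioner will depend on its storage backend.

```yaml
# Hypothetical StorageClass showing where the provisioner is declared.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: example-class             # assumed name
  # Uncomment to make this the class used when a PVC names no class:
  # annotations:
  #   storageclass.kubernetes.io/is-default-class: "true"
provisioner: rancher.io/local-path   # who creates the Persistent Volumes
reclaimPolicy: Delete                # what happens to a PV when its claim is deleted
volumeBindingMode: WaitForFirstConsumer   # provision once a pod needs the volume
```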
If you have a Kubernetes cluster that somehow doesn’t have any Storage Classes — or is missing a provisioner — then you won’t be able to dynamically create Volumes. In that case, the Kubernetes admin will have to manually provision those Volumes.
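Manual provisioning means an admin writes the Persistent Volume object by hand. Here is a minimal sketch, assuming a directory on a single test node; the volume name, class name, and path are all illustrative, and hostPath is only appropriate for single-node experiments.

```yaml
# Hypothetical hand-made PersistentVolume for a single-node test cluster.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pgdata-manual             # assumed name
spec:
  capacity:
    storage: 20Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: manual        # assumed label; claims must request the same class
  hostPath:
    path: /mnt/pgdata             # assumed directory on the node
```

A claim that asks for the `manual` class, the same access mode, and no more than 20Gi can then bind to this volume without any provisioner being involved.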
The main reason why I’m mentioning Storage Classes to you — the Postgres expert who needs to migrate your database to Kubernetes — is to remind you that where you put your data matters. As I said above, Kubernetes doesn’t care if your Persistent Volume is on-prem or in any particular cloud provider — but each of these options has different storage backends and those different storage backends will offer different options.
Mid-Article Kubernetes Summary
I’ve thrown a lot at you, so to summarize:
- Pods die, so ephemeral pods can’t do it alone
- If you want your data to survive a pod death, you need a Persistent Volume
- In order for a Pod to use a Persistent Volume, you need to wire the Pod to the Persistent Volume through a Persistent Volume Claim
- Different Storage Classes offer different options
Putting it all together:
- Just like a Kubernetes cluster has compute and memory resources available that you can request for your pod, a Kubernetes cluster may have storage that you can request. The storage exists as a Persistent Volume and you request a certain amount of storage with a Persistent Volume Claim.
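The wiring in that summary can be sketched in one manifest. It assumes a Persistent Volume Claim named `pgdata-claim` already exists in the same namespace; that name, the pod name, the image tag, and the password are all hypothetical.

```yaml
# Hypothetical pod that stores its Postgres data on a claimed volume.
apiVersion: v1
kind: Pod
metadata:
  name: postgres-persistent       # assumed name
spec:
  containers:
    - name: postgres
      image: postgres:14          # assumed image tag
      env:
        - name: POSTGRES_PASSWORD
          value: example-only     # placeholder; use a Secret in practice
        - name: PGDATA            # keep data in a subdirectory of the mount
          value: /var/lib/postgresql/data/pgdata
      volumeMounts:
        - name: pgdata
          mountPath: /var/lib/postgresql/data
  volumes:
    - name: pgdata
      persistentVolumeClaim:
        claimName: pgdata-claim   # the pod names the claim, not the volume
```

The pod never refers to the Persistent Volume directly. It refers to the claim, and the claim is what is bound to the volume.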
Postgres on Kubernetes
With that in mind, how would you architect a Postgres instance with persistent storage on Kubernetes? Let’s get a napkin and sketch the simplest version: one Postgres pod attached to one Persistent Volume.
That’s OK from a Kubernetes perspective: we have a pod that runs the database and we have a Persistent Volume that holds the storage. If the pod goes away, the data is still there in the volume.
But from a Postgres perspective, we have Postgres saving our data and our WAL files (which are good for backing up) to the same volume. That’s not great for some recovery scenarios. So let’s add a little more redundancy to our system by pushing our WAL files to another Persistent Volume.
That’s better for storage persistence and recovery from backup. But we probably want to add a Postgres replica for high availability. What does that look like with persistent storage?
This is a little generic: I’m not getting into what’s pushing the WAL files to the Persistent Volume for backup storage. Theoretically, you might back up in some other way. But the general lesson here is you probably want to have your primary storage separate from your backup storage. Maybe you want to have it really separate? You could use something like pgBackRest, which can push files to some remote cloud-based storage.
Again, the general idea here is you likely want to have two separate storage volumes for your database and your recovery material. There are a few ways to do that. I mean, if you wanted to, you could exec into your Postgres pod regularly and run pg_dump, and copy that output somewhere. That’s not a production-ready solution though.
The Postgres Operator & Storage
One of the great things about using an operator is that a lot of the storage handling is solved for you. With the Postgres Operator (PGO), when you spin up a Postgres instance, PGO can create the pod and the PVC according to your specifications and according to the needs of your Kubernetes cluster.
For instance, maybe you already have a Persistent Volume from a previous Postgres backup and you want to use that data to bootstrap a new cluster — we can do that. Or maybe you want to dynamically create a new Persistent Volume using a particular Storage Class or just want to use the default Storage Class — well, we can do that too with PGO.
(As a reminder, as I noted above, different commercial Kubernetes services offer different options for Storage Classes; and in general, up-to-date clusters on AWS EKS, Azure AKS, Google GKE, etc., will have a default Storage Class. But you can always, and probably should, check what the Storage Classes are with kubectl get storageclass.)
PGO creates Pods and PVCs for you
Here’s example yaml for a very basic Postgres instance, with one Postgres pod (no replicas):
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo
  namespace: postgres-operator
spec:
  backups:
    pgbackrest:
      repos:
        - name: repo1
          volume:
            volumeClaimSpec:
              accessModes:
                - ReadWriteOnce
              resources:
                requests:
                  storage: 1Gi
  instances:
    - dataVolumeClaimSpec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
      name: ''
      replicas: 1
  postgresVersion: 14
Notice that the postgrescluster object has a Volume Claim Spec under Spec.Backups and a Data Volume Claim Spec under Spec.Instances. We have those separate and independent of each other so you could define each differently.
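As a sketch of defining each differently, the fragment below gives the data volume and the backup volume different sizes and Storage Classes. Both claim specs follow the standard PVC spec, so `storageClassName` is available on each. The class names and sizes are assumptions, not recommendations.

```yaml
# Fragment of a PostgresCluster spec with hypothetical storage classes.
spec:
  backups:
    pgbackrest:
      repos:
        - name: repo1
          volume:
            volumeClaimSpec:
              storageClassName: standard   # assumed class for backup storage
              accessModes:
                - ReadWriteOnce
              resources:
                requests:
                  storage: 20Gi
  instances:
    - dataVolumeClaimSpec:
        storageClassName: fast-ssd         # assumed class for fast-changing data
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
      replicas: 1
```

Leaving `storageClassName` out of either spec falls back to the cluster’s default Storage Class.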
Once I create that Postgres instance, I can check on the pods:
$ kubectl get pods --namespace postgres-operator
NAME                READY   STATUS    RESTARTS   AGE
hippo-repo-host-0   2/2     Running   0          3m23s
hippo-00-6wh4-0     4/4     Running   0          3m23s
Wait, why do I have two pods if I only have one Postgres instance with no replicas? The hippo-repo-host-0 pod is running pgBackRest, our preferred backup solution, which is connected to its own local PersistentVolume. We can check the PersistentVolumeClaims to see that in action:
$ kubectl get persistentvolumeclaims --namespace postgres-operator
NAME                   STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
hippo-00-l5gw-pgdata   Bound    pvc-9e19c77f-c111-4891-a1b5-776d23e06c18   1Gi        RWO            local-path     85s
hippo-repo1            Bound    pvc-83a217c0-bffa-4e66-8ff6-15bbd5fadf07   1Gi        RWO            local-path     86s
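Notice also that those PVCs have a status of Bound and tell us which volume they are bound to. They have a set capacity of 1Gi (as I requested in that original yaml above) and they have a specified access mode of RWO, or ReadWriteOnce, meaning the volume can be mounted read-write by a single node at a time. (Pods on that same node can share it; if you need to restrict a volume to exactly one pod, the access mode for that is ReadWriteOncePod.)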
And that StorageClass, “local-path”? That’s the default StorageClass on this Kubernetes cluster that I’m using:
$ kubectl describe storageclass local-path
Name:            local-path
IsDefaultClass:  Yes
Provisioner:     rancher.io/local-path
Because I have a default Storage Class with a provisioner, I don’t have to worry about creating a Persistent Volume by hand — the provisioner takes care of creating those based on the Persistent Volume Claims.
But what if you didn’t want to back up to another PV, but wanted to back up to some other location? PGO is built to support many different options and, out of the box, you can push your backups to:
- Any Kubernetes-supported storage class (which is what we’re using here)
- Amazon S3 (or S3 equivalents like MinIO)
- Google Cloud Storage (GCS)
- Azure Blob Storage
You can even push backups to multiple repositories at the same time — so you could take a local backup and push to remote storage of your choice.
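A two-repository setup, one local volume plus one S3 bucket, might be sketched like this. The field names reflect my reading of the PGO v5 documentation and should be checked against the version you run; the Secret name, bucket, endpoint, region, and repository path are all placeholders.

```yaml
# Fragment of a PostgresCluster spec with a local and a remote backup repo.
spec:
  backups:
    pgbackrest:
      configuration:
        - secret:
            name: pgo-s3-creds    # assumed Secret holding the S3 credentials
      global:
        repo2-path: /pgbackrest/postgres-operator/hippo/repo2   # assumed path
      repos:
        - name: repo1             # local backup on a Persistent Volume
          volume:
            volumeClaimSpec:
              accessModes:
                - ReadWriteOnce
              resources:
                requests:
                  storage: 1Gi
        - name: repo2             # remote backup in object storage
          s3:
            bucket: my-backup-bucket            # placeholder
            endpoint: s3.us-east-1.amazonaws.com   # placeholder
            region: us-east-1                   # placeholder
```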
Now let’s check out the Persistent Volumes:
$ kubectl get persistentvolumes
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                          STORAGECLASS   REASON   AGE
pvc-83a217c0-bffa-4e66-8ff6-15bbd5fadf07   1Gi        RWO            Delete           Bound    default/hippo-repo1            local-path              12m
pvc-9e19c77f-c111-4891-a1b5-776d23e06c18   1Gi        RWO            Delete           Bound    default/hippo-00-l5gw-pgdata   local-path              12m
What’s interesting here? Notice that the Capacity and Access Mode match those of the PersistentVolumeClaims. It’s also very nice that each PersistentVolume points to the PVC that has claimed it, just like each PVC points to the PersistentVolume that it has claimed.
But what’s really interesting here is the Reclaim Policy. Remember when I said that the lifecycle of the PersistentVolume was independent of the Pod and added “sort of”? This is that “sort of.”
A Persistent Volume is independent of the Pod's lifecycle, but not independent of the Persistent Volume Claim's lifecycle. When the PVC is deleted, Kubernetes will handle the PV according to the Reclaim Policy. With the Delete policy shown above, removing the PVC also removes the PV and the data on it.
So what do you do if you want to delete your postgrescluster but want to keep storage around to use for something later? You can accomplish this by changing the Reclaim Policy of those Persistent Volumes to Retain. If you do that and then delete your postgres cluster, your persistent volumes will, well, persist.
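One way to make that change is a small patch file applied to each Persistent Volume. The file name is hypothetical, and the volume name in the comment is a placeholder for one of the names listed by kubectl get persistentvolumes.

```yaml
# retain-patch.yaml (hypothetical file name)
# Apply to one volume at a time, for example:
#   kubectl patch pv <pv-name> --patch-file retain-patch.yaml
spec:
  persistentVolumeReclaimPolicy: Retain
```

After the claim is deleted, a retained volume shows a status of Released rather than Available, so an admin has to reclaim it manually before a new claim can bind to it. For volumes that have not been created yet, setting `reclaimPolicy: Retain` on the Storage Class gives newly provisioned volumes that policy from the start.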
Summary
Kubernetes was created first with stateless applications in mind, but the project has grown to embrace databases, with Kubernetes-native architecture that perfectly fits the needs of persisting data.
This is just an introduction to the ideas behind persistent storage on Kubernetes and the many options available to you running a Postgres instance on Kubernetes.
If all this is something you don’t want to handle yourself, that doesn’t mean you can’t run Postgres in Kubernetes. Our Postgres Operator has been supporting customers with stateful apps for over five years. Try our Operator today with our quickstart.