Sidecar Service Meshes with Crunchy Postgres for Kubernetes

Andrew L'Ecuyer

12 min read

One of the great new features recently added to Kubernetes - native Sidecar Containers - continues to get closer to GA with each new Kubernetes release. I was recently reviewing all of the great progress made by the Kubernetes Enhancement Proposal (KEP) on Sidecar Containers and realized this feature has already produced some exciting results. For instance, it is already making it easier than ever before to use Crunchy Postgres for Kubernetes (CPK) with two important service mesh solutions: Istio and Linkerd.

Service Meshes in Sidecars

Before we dig in, let’s review the sidecar pattern within a Kubernetes environment. More specifically, while different architectures and solutions do exist (for instance, Istio’s Ambient Service Mesh), service meshes such as Istio and Linkerd typically leverage a “sidecar” pattern. In this case a proxy container is run within each Pod to configure and enable network communications via the service mesh.

This can be seen in the following diagram, which shows the Pods for a PostgresCluster both before and after sidecars have been injected:

[Diagram: PostgresCluster Pods before and after service mesh sidecars are injected]
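
To make the pattern concrete, here is a rough, hypothetical sketch of what an injected Pod has traditionally looked like (the names and images below are illustrative only): the proxy is simply another regular container running alongside the application.

apiVersion: v1
kind: Pod
metadata:
  name: app-with-classic-sidecar
spec:
  containers:
  # The main application container.
  - name: app
    image: registry.example.com/my-app:latest
  # The injected service mesh proxy. To Kubernetes this is just
  # another container - nothing marks it as a "sidecar".
  - name: mesh-proxy
    image: registry.example.com/mesh-proxy:latest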

While this looks great on paper, the problem with this approach is that, until recently, “sidecars” were simply considered a pattern within Kubernetes rather than being tied to any specific Kubernetes feature or functionality. This effectively meant that while additional containers could be added to Pods alongside the main application container, Kubernetes itself had no formal support for - nor any true notion of - sidecar containers. In other words, Kubernetes had no way of distinguishing between an application container and a sidecar container, and treated them all the same regardless of their specific purpose within the Pod.

Additionally, while the various containers within a Pod can share process namespaces, volumes, and more, the lifecycle of a sidecar container (e.g. when it is started and stopped) could not be coordinated with the main application containers. At least, not without resorting to workarounds and hacks (such as Linkerd’s linkerd-await project).

This, in turn, becomes problematic for sidecar use cases where coordinating container lifecycles is truly needed. Service meshes fall into this exact category, since you typically want all of your networking fully and properly configured before the main application container within the Pod attempts to use the network. Additionally, any running service mesh components (e.g. proxy containers) should also cleanly terminate once they are no longer needed by the main application container.

The issue is especially pronounced with Kubernetes Jobs, whose sole purpose is to run a container through to completion (unlike a Pod for a Deployment or StatefulSet, which aims to keep a container up-and-running indefinitely). In other words, not only do you want the container that sets up and enables networking to start before the main application container running the actual Job workload, you also want that container to terminate properly once the Job completes. The popularity of the following issue on the Kubernetes issue tracker (which has now been closed!) highlights the scope of this problem, and why a fix was truly needed: Better support for sidecar containers in batch jobs.

If you have tried using CPK with Istio or Linkerd without enabling support for native Sidecar Containers, you might have seen this manifest in a few ways. For instance, you might have seen the initial backup Job triggered by CPK fail with a network error. The Job then becomes “stuck”: no retries are attempted, and it never completes (successfully or otherwise).

$ kubectl get jobs -n postgres-operator
NAME                STATUS    COMPLETIONS   DURATION   AGE
hippo-backup-56h7   Running   0/1           3m23s      3m23s
$ kubectl get pods -n postgres-operator \
    -l postgres-operator.crunchydata.com/pgbackrest-backup
NAME                      READY   STATUS   RESTARTS   AGE
hippo-backup-56h7-b85jp   1/2     Error    0          3m26s
$ kubectl logs -n postgres-operator hippo-backup-56h7-b85jp
time="2024-09-25T14:57:21Z" level=info msg="crunchy-pgbackrest starts"
time="2024-09-25T14:57:21Z" level=info msg="debug flag set to false"
time="2024-09-25T14:57:21Z" level=fatal msg="Get \"https://10.96.0.1:443/api/v1/namespaces/postgres-operator/pods?labelSelector=postgres-operator.crunchydata.com%2Fcluster%3Dhippo%2Cpostgres-operator.crunchydata.com%2Fpgbackrest%3D%2Cpostgres-operator.crunchydata.com%2Fpgbackrest-dedicated%3D\": dial tcp 10.96.0.1:443: connect: connection refused"

Alternatively, you might see the backup itself complete successfully, yet the Job never completes, since the service mesh proxy container simply continues to run indefinitely.

$ kubectl get jobs -n postgres-operator
NAME                STATUS    COMPLETIONS   DURATION   AGE
hippo-backup-5lbk   Running   0/1           4m25s      4m25s
$ kubectl get pods -n postgres-operator \
    -l postgres-operator.crunchydata.com/pgbackrest-backup
NAME                      READY   STATUS     RESTARTS   AGE
hippo-backup-5lbk-wqr7k   1/2     NotReady   0          4m49s
$ kubectl logs -n postgres-operator -c pgbackrest hippo-backup-5lbk-wqr7k
time="2024-09-25T18:39:41Z" level=info msg="crunchy-pgbackrest starts"
time="2024-09-25T18:39:41Z" level=info msg="debug flag set to false"
time="2024-09-25T18:39:41Z" level=info msg="backrest backup command requested"
time="2024-09-25T18:39:41Z" level=info msg="command to execute is [pgbackrest backup --stanza=db --repo=1]"
time="2024-09-25T18:42:24Z" level=info msg="output=[]"
time="2024-09-25T18:42:24Z" level=info msg="stderr=[WARN: option 'repo1-retention-full' is not set for 'repo1-retention-full-type=count', the repository may run out of space\n      HINT: to retain full backups indefinitely (without warning), set option 'repo1-retention-full' to the maximum.\nWARN: no prior backup exists, incr backup has been changed to full\n]"
time="2024-09-25T18:42:24Z" level=info msg="crunchy-pgbackrest ends"

Native Kubernetes Sidecar Containers

The good news is that native Kubernetes Sidecar Containers address this issue head-on. The feature introduces a special type of init container that not only runs before the main application container, but then remains running alongside it. Service meshes such as Istio and Linkerd can therefore be sure network services are fully configured and enabled before the main application container runs.

Additionally, Sidecar Containers have a lifecycle that is independent of the main application containers. This means Sidecar Containers can be started, stopped, updated and more, all without impacting the main application container. For Jobs, this means a Sidecar Container will no longer interfere with a Job properly completing.

I encourage you to read the Sidecar Containers documentation for full details about this exciting new feature, which has been in beta since Kubernetes v1.29. As you will see, creating a native Sidecar Container is as simple as adding an initContainer to the Pod with a restartPolicy of Always.
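
For example, here is a minimal sketch (again with purely illustrative names and images) of a Pod that declares its service mesh proxy as a native Sidecar Container - the proxy simply moves into initContainers and gains restartPolicy: Always.

apiVersion: v1
kind: Pod
metadata:
  name: app-with-native-sidecar
spec:
  initContainers:
  # A native Sidecar Container: an init container with
  # restartPolicy: Always starts before the main application
  # container and then keeps running for the life of the Pod.
  - name: mesh-proxy
    image: registry.example.com/mesh-proxy:latest
    restartPolicy: Always
  containers:
  # The main application container starts only after the
  # sidecar above has started.
  - name: app
    image: registry.example.com/my-app:latest

The same shape applies inside a Job’s Pod template: because Kubernetes now knows the proxy is a sidecar, it no longer prevents the Job from completing.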

And as a beta feature, Sidecar Containers are enabled by default on Kubernetes v1.29 and later, meaning it’s never been easier to try them out. You can therefore start using Sidecar Containers today to seamlessly add your Postgres databases to your Istio and Linkerd service meshes. Read on to see what this looks like in action with CPK!
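
If you’d like a quick indication of whether the SidecarContainers feature gate is enabled on your cluster before diving in, one option (assuming your API server exposes the kubernetes_feature_enabled metric, as recent versions do) is:

kubectl get --raw /metrics | grep kubernetes_feature_enabled | grep SidecarContainers

A value of 1 for the SidecarContainers entry indicates the feature gate is enabled.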

Example Istio and Linkerd Setup

For the following demonstrations I will be using a local kind cluster running Kubernetes v1.31. Helm will be used to set up and install CPK, Istio, and Linkerd. Additionally, Podman or Docker will be needed to generate the certificates required to install and test Linkerd.
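
If you want to follow along, a suitable kind cluster can be created with something like the following (the exact node image tag will vary depending on the kind release you have installed):

kind create cluster --name sidecar-demo --image kindest/node:v1.31.0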

Operator Installation

Installing Crunchy Postgres for Kubernetes via Helm is as simple as running a single helm install command.

helm install -n postgres-operator --create-namespace \
    pgo oci://registry.developers.crunchydata.com/crunchydata/pgo

This command installs the latest version directly from the Crunchy Developer registry. When you see the hippo in the Helm output, you can be confident the install was successful!
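
You can also confirm the operator is up and running before moving on (with this chart, the operator Deployment is typically named pgo):

kubectl get deployment,pods -n postgres-operator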

Istio Installation

For the first demonstration we’re going to use Istio to provide the service mesh for our PostgresCluster. Istio can be installed via Helm by running the following commands:

helm repo add istio https://istio-release.storage.googleapis.com/charts
helm install istio-base istio/base --create-namespace -n istio-system \
    --set defaultRevision=default
helm install istiod istio/istiod -n istio-system \
    --set pilot.env.ENABLE_NATIVE_SIDECARS=true

As you can see in the last command, we also set ENABLE_NATIVE_SIDECARS to true. This is what instructs Istio to leverage the new native Sidecar Containers feature.

Once Istio is installed, you should see the istiod Pod running in the istio-system namespace.

$ kubectl get pods -n istio-system
NAME                     READY   STATUS    RESTARTS   AGE
istiod-8c4cd4448-g7fnk   1/1     Running   0          75s

Next, you will need to enable Istio sidecar injection in the namespace that will be used to deploy our Postgres databases. This is what instructs Istio to inject native Sidecar Containers (i.e., init containers with a restartPolicy of Always) into each Pod created within the namespace. Since we will be using the postgres-operator namespace to deploy our PostgresCluster, run the following command:

kubectl label namespace postgres-operator istio-injection=enabled
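
If you’d like to double-check that the label was applied, you can inspect the namespace labels directly:

kubectl get namespace postgres-operator --show-labels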

With Istio now properly configured, you can finally create a simple PostgresCluster in the postgres-operator namespace:

kubectl apply -n postgres-operator --server-side -f - <<EOF
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo
spec:
  image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres:ubi8-16.4-0
  postgresVersion: 16
  instances:
    - name: instance1
      dataVolumeClaimSpec:
        accessModes:
        - "ReadWriteOnce"
        resources:
          requests:
            storage: 1Gi
  backups:
    pgbackrest:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:ubi8-2.52.1-1
      repos:
      - name: repo1
        volume:
          volumeClaimSpec:
            accessModes:
            - "ReadWriteOnce"
            resources:
              requests:
                storage: 1Gi
EOF

As you will see, the PostgresCluster now deploys successfully, with an Istio Sidecar Container injected into each Pod comprising the PostgresCluster. For instance, if you inspect the Pod for the initial backup Job, you will see a Sidecar Container for the Istio proxy. More specifically, you will see an istio-proxy initContainer with a restartPolicy of Always.

$ kubectl get pods -n postgres-operator -o yaml \
    -l postgres-operator.crunchydata.com/pgbackrest-backup
...
  initContainers:
    ...
    name: istio-proxy
    ...
    restartPolicy: Always
    ...

More importantly, this also means the initial backup Job no longer fails due to network issues (as described above), nor does it become “stuck” attempting to complete successfully!

$ kubectl get jobs -n postgres-operator
NAME                STATUS     COMPLETIONS   DURATION   AGE
hippo-backup-9h2z   Complete   1/1           2m52s      2m54s
$ kubectl get pods -n postgres-operator \
    -l postgres-operator.crunchydata.com/pgbackrest-backup
NAME                      READY   STATUS      RESTARTS   AGE
hippo-backup-9h2z-2mm76   0/2     Completed   0          2m58s
$ kubectl logs -n postgres-operator hippo-backup-9h2z-2mm76
time="2024-09-25T15:05:27Z" level=info msg="crunchy-pgbackrest starts"
time="2024-09-25T15:05:27Z" level=info msg="debug flag set to false"
time="2024-09-25T15:05:27Z" level=info msg="backrest backup command requested"
time="2024-09-25T15:05:27Z" level=info msg="command to execute is [pgbackrest backup --stanza=db --repo=1]"
time="2024-09-25T15:08:11Z" level=info msg="output=[]"
time="2024-09-25T15:08:11Z" level=info msg="stderr=[WARN: option 'repo1-retention-full' is not set for 'repo1-retention-full-type=count', the repository may run out of space\n      HINT: to retain full backups indefinitely (without warning), set option 'repo1-retention-full' to the maximum.\nWARN: no prior backup exists, incr backup has been changed to full\n]"
time="2024-09-25T15:08:11Z" level=info msg="crunchy-pgbackrest ends"

So as you can see, by simply configuring Istio to use native sidecar containers, all of your Postgres databases will now be seamlessly added to your service mesh.
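
And if you want to confirm this across the entire PostgresCluster (not just the backup Job), one quick way is to list each Pod along with its init containers - every Pod should include istio-proxy:

kubectl get pods -n postgres-operator \
    -l postgres-operator.crunchydata.com/cluster=hippo \
    -o custom-columns='NAME:.metadata.name,INIT:.spec.initContainers[*].name'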

Now let’s see what this looks like with Linkerd. Before moving on, however, let’s first clean up the Istio demonstration by deleting the cluster, removing the istio-injection label from the namespace, and uninstalling Istio.

kubectl delete -n postgres-operator postgrescluster hippo
kubectl label namespace postgres-operator istio-injection-
helm uninstall -n istio-system istiod
helm uninstall -n istio-system istio-base

Linkerd Installation

Similar to Istio, you’ll first need to install Linkerd via Helm. Before installing, however, you’ll need to generate certs required by the Linkerd installer for mTLS. Following the example in the Linkerd documentation, you can use step (as provided by the step-cli container) to generate the required certs. Note that docker can be used in place of podman when running these commands.

mkdir /tmp/linkerd
chmod 777 /tmp/linkerd
podman run -v /tmp/linkerd:/home/step --userns keep-id \
    smallstep/step-cli \
    step certificate create root.linkerd.cluster.local ca.crt ca.key \
        --profile root-ca --no-password --insecure
podman run -v /tmp/linkerd:/home/step --userns keep-id \
    smallstep/step-cli \
    step certificate create identity.linkerd.cluster.local \
    issuer.crt issuer.key --profile intermediate-ca --not-after 8760h \
    --no-password --insecure --ca ca.crt --ca-key ca.key

With the certs now generated, we can proceed with installing Linkerd:

helm repo add linkerd-edge https://helm.linkerd.io/edge
helm install linkerd-crds linkerd-edge/linkerd-crds \
  -n linkerd --create-namespace
helm install linkerd-control-plane \
  -n linkerd \
  --set-file identityTrustAnchorsPEM=/tmp/linkerd/ca.crt \
  --set-file identity.issuer.tls.crtPEM=/tmp/linkerd/issuer.crt \
  --set-file identity.issuer.tls.keyPEM=/tmp/linkerd/issuer.key \
  linkerd-edge/linkerd-control-plane
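
Before moving on, you can verify the Linkerd control plane is up and running:

kubectl get pods -n linkerd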

And we can now annotate the postgres-operator namespace to indicate to Linkerd that sidecars should be injected into any Pods within the namespace. Additionally, Linkerd requires an annotation to enable the use of native Sidecar Containers.

kubectl annotate namespace postgres-operator \
    linkerd.io/inject=enabled
kubectl annotate namespace postgres-operator \
    config.alpha.linkerd.io/proxy-enable-native-sidecar=true
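
As with the Istio label earlier, you can confirm both annotations were applied to the namespace:

kubectl describe namespace postgres-operator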

Now let’s once again create a PostgresCluster in the postgres-operator namespace:

kubectl apply -n postgres-operator --server-side -f - <<EOF
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo
spec:
  image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres:ubi8-16.4-0
  postgresVersion: 16
  instances:
    - name: instance1
      dataVolumeClaimSpec:
        accessModes:
        - "ReadWriteOnce"
        resources:
          requests:
            storage: 1Gi
  backups:
    pgbackrest:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:ubi8-2.52.1-1
      repos:
      - name: repo1
        volume:
          volumeClaimSpec:
            accessModes:
            - "ReadWriteOnce"
            resources:
              requests:
                storage: 1Gi
EOF

As you will once again see, the PostgresCluster deploys successfully, only this time with a Linkerd Sidecar Container injected into each Pod comprising the PostgresCluster. So if you again inspect the Pod for the initial backup Job, you will see a Sidecar Container for the Linkerd proxy. More specifically, you’ll now see a linkerd-proxy initContainer with a restartPolicy of Always.

$ kubectl get pods -n postgres-operator -o yaml \
    -l postgres-operator.crunchydata.com/pgbackrest-backup
...
  initContainers:
    ...
    name: linkerd-proxy
    ...
    restartPolicy: Always
    ...

Additionally, the backup Job will once again complete successfully.

$ kubectl get jobs -n postgres-operator
NAME                STATUS     COMPLETIONS   DURATION   AGE
hippo-backup-txnx   Complete   1/1           2m51s      47m
$ kubectl get pods -n postgres-operator \
    -l postgres-operator.crunchydata.com/pgbackrest-backup
NAME                      READY   STATUS      RESTARTS   AGE
hippo-backup-txnx-k7prd   0/2     Completed   0          48m
$ kubectl logs -n postgres-operator hippo-backup-txnx-k7prd
Defaulted container "pgbackrest" out of: pgbackrest, linkerd-init (init), linkerd-proxy (init)
time="2024-09-25T18:51:12Z" level=info msg="crunchy-pgbackrest starts"
time="2024-09-25T18:51:12Z" level=info msg="debug flag set to false"
time="2024-09-25T18:51:12Z" level=info msg="backrest backup command requested"
time="2024-09-25T18:51:12Z" level=info msg="command to execute is [pgbackrest backup --stanza=db --repo=1]"
time="2024-09-25T18:53:56Z" level=info msg="output=[]"
time="2024-09-25T18:53:56Z" level=info msg="stderr=[WARN: option 'repo1-retention-full' is not set for 'repo1-retention-full-type=count', the repository may run out of space\n      HINT: to retain full backups indefinitely (without warning), set option 'repo1-retention-full' to the maximum.\nWARN: no prior backup exists, incr backup has been changed to full\n]"
time="2024-09-25T18:53:56Z" level=info msg="crunchy-pgbackrest ends"

And that is it! By simply enabling native sidecar support (this time with Linkerd), your Postgres databases are once again seamlessly added to your service mesh.

Let’s now wrap up by cleaning up the PostgresCluster and Linkerd.

kubectl delete postgrescluster hippo -n postgres-operator
kubectl annotate namespace postgres-operator \
    linkerd.io/inject-
kubectl annotate namespace postgres-operator \
    config.alpha.linkerd.io/proxy-enable-native-sidecar-
helm uninstall -n linkerd linkerd-control-plane
helm uninstall -n linkerd linkerd-crds
rm -r /tmp/linkerd

Conclusion

The introduction of native Kubernetes Sidecar Containers represents a significant evolution within Kubernetes, especially for operators such as Crunchy Postgres for Kubernetes that have important functionality tied to Jobs running to completion (e.g. backups, restores, clones, and more). This new feature gives service meshes such as Istio and Linkerd the vital ability to properly coordinate with the main application container. We’re incredibly excited to see just how well this powerful new feature works in action with both Istio and Linkerd. I hope you can appreciate the importance of this new feature as well - especially as it pertains to your own Postgres deployments and service meshes!

Crunchy will be keeping a close eye on the Sidecar Containers KEP as it continues to progress across new releases of Kubernetes. In the meantime, feel free to reach out via the Crunchy Discord Server to discuss your own service mesh use cases, and/or any challenges you have faced using various service meshes for your own deployments and architectures.