Kubernetes + Postgres Cluster From Scratch on Rocky 8
Co-authored by Brian Pace
I was excited to hear that Kubernetes 1.22 was recently released with better support for cgroup v2 and (alpha) support for swap on Linux nodes. These changes potentially resolve two of my chief complaints about running Postgres under Kubernetes. Obviously it will take some time before we see uptake of these features in the wild, but I wanted to become familiar with them.
For what it's worth, I also want to eventually play with the new alpha seccomp support in Kubernetes v1.22, but will have to save that for another day. In the meantime, if you are interested in Seccomp with PostgreSQL, you can see my presentation from FOSDEM that discusses a Postgres extension I wrote, unsurprisingly called pgseccomp.
Disclaimer
I am not a Kubernetes Admin by profession, but rather an old database-curmudgeon(tm), so please be careful if you try this at home, and don't flame me if I don't approach everything in the canonical way.
The Goal
Specifically, I set out to build a three node v1.22 Kubernetes cluster with one main control-plane node and two worker-only nodes. However, I found that virtually every "recipe" I came across for doing this (using equivalent distributions, e.g. CentOS 8) would result in a non-working cluster, even when not trying to enable swap and/or cgroup v2.
And of course, once the cluster is up and running, my next desire was to install the Crunchy Data Operator v5 and deploy PostgreSQL starting from the examples.
So I enlisted some help from my friend and colleague Brian Pace, and documented my own successful recipe below.
The Journey
First, I started with a mostly vanilla virtual machine image created with Rocky Linux 8.4 installed from ISO.
- Mostly defaults
- Server install, with desktop
- Enable networking
- Enable setting time from network
- Create local user as an admin
After initial setup and rebooting into a working Rocky 8 base instance, I shut down the VM and made three copies of the qcow2 image. From there I created three VMs, each using one of the base-image copies.
On each of my three kube node VMs, I noted the MAC address for the network interface, and set up DHCP Static Mappings for the kube nodes by MAC address. I also set the desired hostnames -- kube01, kube02, and kube03.
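For reference, the cloning step looked roughly like the following; the image paths, VM names, memory/CPU sizes, and bridge name here are only placeholders for whatever your libvirt setup uses:
# copy the base image once per node (example paths)
sudo cp /var/lib/libvirt/images/rocky8-base.qcow2 /var/lib/libvirt/images/kube01.qcow2
sudo cp /var/lib/libvirt/images/rocky8-base.qcow2 /var/lib/libvirt/images/kube02.qcow2
sudo cp /var/lib/libvirt/images/rocky8-base.qcow2 /var/lib/libvirt/images/kube03.qcow2
# import each copy as its own VM, e.g. with virt-install (repeat per node)
sudo virt-install --name kube01 --memory 8192 --vcpus 4 \
  --disk path=/var/lib/libvirt/images/kube01.qcow2 --import \
  --os-variant rhel8.4 --network bridge=br0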
Left TODO
Note that, in addition to seccomp, I have also punted on enabling the firewall and on running with SELinux in enforcing mode. I hope to tackle both of those later.
The Recipe
Without further ado, what follows is my recipe.
Caution: steps below preceded by the comment "... on main control-plane node only" should only be run on the control-plane node (in my case, kube01), and those preceded by the comment "... from worker-only nodes" should only be run on the worker-only nodes. Also note that this setup is a lab environment; running a single-node control plane is never recommended for production or other critical environments.
Basic Node Setup
Unless otherwise stated, each of the steps should be performed on each host with the necessary hostname, IP, etc. modifications. Start from a fresh Rocky Linux 8 server install as outlined above, and note the MAC address for the network interface. Set up DHCP Static Mappings for the kube nodes by MAC address, and then change the following variable values to suit your setup:
### variables setup ###
# IP Address for main control-plane node
MC_IP=<your-node-IP-here>
# POD network subnet
POD_NETWORK_CIDR="10.244.0.0/16"
# Hostname for the current node
MYHOSTNAME=kube01
#MYHOSTNAME=kube02
#MYHOSTNAME=kube03
# My username
LCLUSER=jconway
Local user setup
Next, install an ssh public key for the local user:
mkdir /home/${LCLUSER}/.ssh
vi /home/${LCLUSER}/.ssh/authorized_keys
# paste desired ssh public key and save
# ssh will not be happy if permissions are not correct
chmod 700 /home/${LCLUSER}/.ssh
chmod 600 /home/${LCLUSER}/.ssh/authorized_keys
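Alternatively, if password authentication is still enabled on the node, ssh-copy-id from your workstation accomplishes the same thing and sets the permissions for you:
ssh-copy-id jconway@kube01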
Node setup
Update the system to get the latest fixes and set the hostname:
sudo dnf update
# If needed, reset the desired hostname
# This may be required, for example, if the current
# host VM was cloned from a base image
sudo hostnamectl set-hostname ${MYHOSTNAME}
Kubernetes-specific setup
Allow the kubelet to run with swap on
OUTFILE=/etc/sysconfig/kubelet
sudo out=$OUTFILE bash -c 'cat << EOF >> $out
KUBELET_EXTRA_ARGS="--fail-swap-on=false"
EOF'
Put SELinux in permissive mode for now (but this should be fixed later!)
sudo setenforce 0
sudo sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config
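A quick sanity check that SELinux is now permissive (this should print "Permissive"):
getenforce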
Set up the required sysctl params -- these persist across reboots
OUTFILE=/etc/sysctl.d/k8s.conf
sudo out=$OUTFILE bash -c 'cat << EOF >> $out
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF'
sudo sysctl --system
systemd doesn't use cgroup v2 by default; configure the system to use it by adding systemd.unified_cgroup_hierarchy=1 to the kernel command line
sudo dnf install -y grubby && \
sudo grubby \
--update-kernel=ALL \
--args="systemd.unified_cgroup_hierarchy=1"
Turn on accounting for controllers that are off by default; the cpu controller, at least, seems to be required for the kubelet service to function
OUTFILE=/etc/systemd/system.conf
sudo out=$OUTFILE bash -c 'cat << EOF >> $out
DefaultCPUAccounting=yes
DefaultIOAccounting=yes
DefaultIPAccounting=yes
DefaultBlockIOAccounting=yes
EOF'
A reboot seems to be the only way to make the /etc/systemd/system.conf changes take effect, and a reboot is of course needed for the kernel command line change anyway
sudo reboot
After the reboot, we need to redo the variables setup
# IP Address for main control-plane node
MC_IP=<your-node-IP-here>
# POD network subnet
POD_NETWORK_CIDR="10.244.0.0/16"
Verify setup
# show swap is on
swapon --show
# check for type cgroup2
mount -l|grep cgroup
# check for cpu controller
cat /sys/fs/cgroup/cgroup.subtree_control
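On my nodes the key indicators looked roughly like the lines below; swapon --show should list at least one active swap device, and the exact mount options and controller list may differ on your system:
# mount -l|grep cgroup
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,seclabel)
# cat /sys/fs/cgroup/cgroup.subtree_control
cpuset cpu io memory pids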
Install docker-ce
sudo dnf config-manager --add-repo=https://download.docker.com/linux/centos/docker-ce.repo
sudo dnf -y remove runc
sudo dnf -y install docker-ce --nobest
Tell docker to use systemd for cgroup control
sudo mkdir -p /etc/docker
OUTFILE=/etc/docker/daemon.json
sudo out=$OUTFILE bash -c 'cat << EOF >> $out
{
"exec-opts": ["native.cgroupdriver=systemd"],
"log-driver": "json-file",
"log-opts": {
"max-size": "100m"
},
"storage-driver": "overlay2"
}
EOF'
Enable and start the docker service
sudo systemctl enable --now docker
sudo systemctl status docker
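It is also worth confirming that docker picked up the systemd cgroup driver and cgroup v2:
sudo docker info | grep -i cgroup
# expect to see "Cgroup Driver: systemd" and "Cgroup Version: 2"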
Disable the firewall for now (but fix later!); the ports that need to be opened are documented in the upstream Kubernetes and CNI plugin documentation.
sudo systemctl stop firewalld
sudo systemctl disable firewalld
Create kubernetes repo file (note that the "el7" is intentional)
OUTFILE=/etc/yum.repos.d/kubernetes.repo
sudo out=$OUTFILE bash -c 'cat << EOF >> $out
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
EOF'
Install Kubernetes
sudo dnf -y install kubelet kubeadm kubectl --disableexcludes=kubernetes
Make systemd the kubelet cgroup driver
sudo mkdir -p /var/lib/kubelet
OUTFILE=/var/lib/kubelet/config.yaml
sudo out=$OUTFILE bash -c 'cat << EOF >> $out
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
cgroupDriver: systemd
EOF'
Enable and start the kubelet service (note that the kubelet service fails until the init/join step is run)
sudo systemctl enable --now kubelet
sudo systemctl status kubelet
Init Kubernetes on main control-plane node only. Don't forget to capture the "kubeadm join ..." output
sudo kubeadm init --pod-network-cidr=${POD_NETWORK_CIDR} --apiserver-advertise-address=${MC_IP} --kubernetes-version stable-1.22 --ignore-preflight-errors="Swap"
Enable root to run kubectl on main control-plane node only
sudo bash -c 'mkdir -p $HOME/.kube'
sudo bash -c 'cp -i /etc/kubernetes/admin.conf $HOME/.kube/config'
Install networking on main control-plane node only
sudo kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
Join kube cluster from worker-only nodes. Note that you should use the actual token emitted during the init on the main control-plane node. If you forgot to record it, run the following command on the main control-plane node: kubeadm token create --print-join-command
sudo kubeadm join ${MC_IP}:6443 --token wze441.4v1fexq9ak1ew8eq \
--discovery-token-ca-cert-hash sha256:44bda0ab055d721ae00a1ab8f0d45b0bf5690501209c26a810bf251688891f84 \
--ignore-preflight-errors="Swap"
View state of play on main control-plane node
sudo kubectl get nodes
sudo kubectl get deployment,svc,pods,pvc,rc,rs --all-namespaces
sudo kubectl get deployment,svc,pods,pvc,rc,rs --all-namespaces -o wide|less
Installing the Crunchy pgo Operator
At this point, you should have a fully functional Kubernetes cluster ready for you to enjoy. The next step we want to take is to install something useful, so I am going to start with the Crunchy pgo operator.
Install the kube cluster configuration on your local machine (in my case, my desktop). This assumes admin.conf has been copied from /etc/kubernetes/admin.conf into your home directory on kube01 and chowned so that your user can read it.
scp ${MC_IP}:/home/jconway/admin.conf $HOME/.kube/kube01.config
export KUBECONFIG=$HOME/.kube/kube01.config
kubectl get nodes
The output of the last command there should look something like this
NAME     STATUS   ROLES                  AGE   VERSION
kube01   Ready    control-plane,master   88d   v1.22.2
kube02   Ready    <none>                 88d   v1.22.2
kube03   Ready    <none>                 88d   v1.22.2
Grab the pgo operator examples repo from github
cd ${HOME}
git clone git@github.com:CrunchyData/postgres-operator-examples.git
cd postgres-operator-examples
kubectl apply -k kustomize/install
The output of the last command there should look something like this
namespace/postgres-operator unchanged
customresourcedefinition.apiextensions.k8s.io/postgresclusters.postgres-operator.crunchydata.com configured
serviceaccount/pgo configured
clusterrole.rbac.authorization.k8s.io/postgres-operator configured
clusterrolebinding.rbac.authorization.k8s.io/postgres-operator configured
deployment.apps/pgo configured
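Before deploying a Postgres cluster, verify that the operator pod itself came up (the pod name suffix will differ on your cluster):
kubectl get pods -n postgres-operator
# expect a single pgo-... pod in Running status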
Install an appropriate storage class
kubectl apply -f https://openebs.github.io/charts/openebs-operator.yaml
kubectl patch storageclass openebs-hostpath -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
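Confirm the new storage class is present and marked as the default:
kubectl get storageclass
# openebs-hostpath should show "(default)" next to its name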
Deploy a Crunchy PostgreSQL Pod
kubectl apply -k kustomize/postgres
kubectl get pods -n postgres-operator
The output of the last command there should look something like this
NAME                     READY   STATUS    RESTARTS   AGE
hippo-instance1-pd9w-0   3/3     Running   0          10m
hippo-repo-host-0        1/1     Running   0          10m
pgo-69949584b9-65bqw     1/1     Running   0          10m
Exec into the Postgres pod to explore as desired
kubectl exec -it -n postgres-operator -c database hippo-instance1-pd9w-0 -- bash
The operator creates a default user with the same name as the cluster (hippo in our case). To get that user's password you can execute the following:
export PGPASSWORD=$(kubectl get secret hippo-pguser-hippo -n postgres-operator -o jsonpath={.data.password} | base64 --decode)
Create a NodePort service to allow for connectivity from outside of the Kubernetes cluster.
cat hippo-np.yaml
apiVersion: v1
kind: Service
metadata:
  name: hippo-np
spec:
  type: NodePort
  selector:
    postgres-operator.crunchydata.com/cluster: hippo
    postgres-operator.crunchydata.com/role: master
  ports:
  - protocol: TCP
    port: 5432
    targetPort: 5432
    nodePort: 30032
kubectl apply -f hippo-np.yaml -n postgres-operator
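Check that the service was created and is exposing the node port (the cluster IP will differ):
kubectl get svc hippo-np -n postgres-operator
# the PORT(S) column should show 5432:30032/TCP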
Finally, connect to PostgreSQL
$ export PGSSLMODE=require
$ psql -h kube02 -p 30032 -U hippo
psql (13.3, server 13.4)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
Type "help" for help.
hippo=> select pg_is_in_recovery();
pg_is_in_recovery
-------------------
f
(1 row)
Clean up the postgres instance only (CAUTION!)
kubectl delete -k kustomize/postgres
Complete cleanup (CAUTION!)
Note: Before removing the Operator, all postgres clusters that were created by the Operator should be deleted first. Failing to do so leaves finalizers on the postgrescluster resources with no operator running to process them, which will cause deletion to hang.
kubectl get postgrescluster --all-namespaces
kubectl delete -k kustomize/install
Summary
In this article I showed you how to create a Kubernetes cluster from scratch, including getting it to run with cgroup v2 and swap turned on.
I also showed the basic deployment of the Crunchy Data pgo operator and a PostgreSQL pod.
I hope in the future to run some tests to understand how this setup behaves versus the more common (but in my opinion likely less resilient) environment with cgroup v1 and swap turned off. Stay tuned!