Multi-Cluster Postgres Federation on Kubernetes with Postgres 14 Using GitOps
One of the reasons why I like deploying Postgres on Kubernetes with PGO, the open source Postgres Operator, is that it makes it easier to set up distributed database systems. One example is using Postgres for edge computing, and you can build an edge computing architecture with both Postgres and Debezium on Kubernetes.
You can also do some fun things with logical replication and multiple read/write nodes. Last year I posted an example of how you could set up an active-active federated Postgres cluster on Kubernetes. With the release of version 5 of open source PGO, I thought it'd be neat to update the example to show how you can create an active-active federated Postgres cluster using GitOps.
Even better, we'll do this with Postgres 14, so we can take advantage of some of its logical replication improvements.
Set Up the Postgres Clusters
Similar to the previous example, let's create three Postgres clusters called hippo-east, hippo-central, and hippo-west. We can do so with the following manifests:
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo-east
spec:
  image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres:centos8-14.0-0
  postgresVersion: 14
  instances:
    - dataVolumeClaimSpec:
        accessModes:
          - 'ReadWriteOnce'
        resources:
          requests:
            storage: 1Gi
  users:
    - name: postgres
  monitoring:
    pgmonitor:
      exporter:
        image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres-exporter:ubi8-5.0.3-0
  backups:
    pgbackrest:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:centos8-2.35-0
      repos:
        - name: repo1
          volume:
            volumeClaimSpec:
              accessModes:
                - 'ReadWriteOnce'
              resources:
                requests:
                  storage: 1Gi
---
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo-central
spec:
  image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres:centos8-14.0-0
  postgresVersion: 14
  instances:
    - dataVolumeClaimSpec:
        accessModes:
          - 'ReadWriteOnce'
        resources:
          requests:
            storage: 1Gi
  users:
    - name: postgres
  monitoring:
    pgmonitor:
      exporter:
        image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres-exporter:ubi8-5.0.3-0
  backups:
    pgbackrest:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:centos8-2.35-0
      repos:
        - name: repo1
          volume:
            volumeClaimSpec:
              accessModes:
                - 'ReadWriteOnce'
              resources:
                requests:
                  storage: 1Gi
---
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo-west
spec:
  image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres:centos8-14.0-0
  postgresVersion: 14
  instances:
    - dataVolumeClaimSpec:
        accessModes:
          - 'ReadWriteOnce'
        resources:
          requests:
            storage: 1Gi
  users:
    - name: postgres
  monitoring:
    pgmonitor:
      exporter:
        image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres-exporter:ubi8-5.0.3-0
  backups:
    pgbackrest:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:centos8-2.35-0
      repos:
        - name: repo1
          volume:
            volumeClaimSpec:
              accessModes:
                - 'ReadWriteOnce'
              resources:
                requests:
                  storage: 1Gi
Note that in each of the manifests, we are telling PGO to create a Kubernetes Secret for the postgres superuser. This allows us to log in as postgres. This is for convenience: in a production cluster you would lock this down a bit further. We also enable monitoring to ensure that certain Kubernetes introspection functions are available via pgnodemx.
For convenience, let's set the password for the postgres user to be the same in each of these clusters. PGO can set custom passwords so long as the verifier attribute in the user Secret is set (setting password is for convenience).
Let's set the password to datalake. Here is an example password/verifier pair, encoded in base64, that can be used to authenticate with the password datalake:
PASSWORD='ZGF0YWxha2U='
VERIFIER='U0NSQU0tU0hBLTI1NiQ0MDk2OkZBblNrVXJMM2pINkJwMTdQNUZUdVE9PSRCZWNMVTBZWHpCVXduVGszYjNnaElaNy96UzJYdkQ4d1g1MEh6NUpMMHE0PTpweHVFdG96WVRacEFjNndJTlY1M3NDcjRBZnhrOExMZm9yNU12a1lwMjFzPQ=='
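If you want to sanity-check those encoded values, you can decode the password with a quick bit of SQL (a minimal sketch; any base64 decoder works just as well):
-- decode the base64-encoded password; this should return 'datalake'
SELECT convert_from(decode('ZGF0YWxha2U=', 'base64'), 'UTF8') AS password;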
You can then set the postgres user to use the password datalake by executing the following commands:
kubectl patch secret -n postgres-operator hippo-east-pguser-postgres -p \
"{\"data\":{\"password\":\"${PASSWORD}\",\"verifier\":\"${VERIFIER}\"}}"
kubectl patch secret -n postgres-operator hippo-central-pguser-postgres -p \
"{\"data\":{\"password\":\"${PASSWORD}\",\"verifier\":\"${VERIFIER}\"}}"
kubectl patch secret -n postgres-operator hippo-west-pguser-postgres -p \
"{\"data\":{\"password\":\"${PASSWORD}\",\"verifier\":\"${VERIFIER}\"}}"
You can test that the password was set by connecting to one or all of the Postgres clusters using the port-forward technique:
PG_CLUSTER_NAME="hippo-east"
PG_CLUSTER_PRIMARY_POD=$(kubectl get pod -n postgres-operator -o name \
-l postgres-operator.crunchydata.com/cluster=$PG_CLUSTER_NAME,postgres-operator.crunchydata.com/role=master)
kubectl -n postgres-operator port-forward "${PG_CLUSTER_PRIMARY_POD}" 5432:5432
In a separate terminal, run the following:
PGPASSWORD="datalake" psql -h localhost -U postgres -c 'SELECT version();'
which should yield something similar to:
version
-----------------------------
PostgreSQL 14.0 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.4.1 20200928 (Red Hat 8.4.1-1), 64-bit
We now need to provide a way for each Postgres cluster to identify itself. This can be accomplished using some of the helper functions available in pgnodemx, which comes with PGO. For PGO v5, the helper function to accomplish this looks like this:
CREATE EXTENSION IF NOT EXISTS pgnodemx;
CREATE OR REPLACE FUNCTION get_node_name()
RETURNS text
AS $$
SELECT val FROM kdapi_setof_kv('labels') WHERE key='postgres-operator.crunchydata.com/cluster';
$$ LANGUAGE SQL SECURITY DEFINER IMMUTABLE;
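If you're curious what pgnodemx exposes here, you can list the pod's labels directly (the exact set of labels you see will depend on your cluster):
-- list the Kubernetes labels visible to this Postgres pod
SELECT key, val FROM kdapi_setof_kv('labels');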
If you have trouble adding it, try dropping and recreating the pgnodemx extension.
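That recovery step is just the following (a minimal sketch):
-- recreate the extension, then re-run the CREATE FUNCTION statement above
DROP EXTENSION IF EXISTS pgnodemx;
CREATE EXTENSION pgnodemx;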
You will need to run the above in the postgres database on hippo-east, hippo-central, and hippo-west. You can verify the function works by executing get_node_name(), for example on hippo-east:
SELECT get_node_name();
which should yield:
get_node_name
---------------
hippo-east
(1 row)
(Just for fun, if you're on Crunchy Bridge you can use the equivalent function:
CREATE OR REPLACE FUNCTION get_node_name()
RETURNS text
AS $$
SELECT regexp_replace(pg_read_file('/etc/hostname'), '\n', '');
$$ LANGUAGE SQL SECURITY DEFINER IMMUTABLE;
).
With the basis of the Postgres clusters set up, let's now create the schema that lets us build out an active-active federated cluster!
Creating the Schema
We will use a schema similar to the one we created in the previous example, though to add a Postgres 14 twist, we'll include a multirange data type (and a Postgres 13 twist: gen_random_uuid() is available without the pgcrypto extension):
CREATE TABLE hippos (
id uuid DEFAULT gen_random_uuid() NOT NULL,
node_name text,
value datemultirange,
created_at timestamptz
) PARTITION BY LIST (node_name);
CREATE TABLE hippo_default PARTITION OF hippos (PRIMARY KEY (id)) DEFAULT;
CREATE TABLE hippo_east PARTITION OF hippos (PRIMARY KEY (id)) FOR VALUES IN ('hippo-east');
CREATE TABLE hippo_central PARTITION OF hippos (PRIMARY KEY (id)) FOR VALUES IN ('hippo-central');
CREATE TABLE hippo_west PARTITION OF hippos (PRIMARY KEY (id)) FOR VALUES IN ('hippo-west');
CREATE OR REPLACE FUNCTION add_node_name()
RETURNS trigger AS $$
BEGIN
UPDATE hippos
SET node_name = get_node_name()
WHERE node_name IS NULL;
RETURN NULL;
END
$$ LANGUAGE plpgsql;
CREATE TRIGGER add_node_name
AFTER INSERT ON hippos
FOR EACH STATEMENT
EXECUTE FUNCTION add_node_name();
What do we have here? Each partition represents the data that is stored in each Postgres cluster. This means that the hippo-east Postgres cluster will only write data to the hippo-east partition, the hippo-central cluster will only write to the hippo-central partition, and the hippo-west cluster will only write to the hippo-west partition.
You can actually bootstrap this data with PGO from a SQL file, but for the purposes of this example I wanted to spell it out.
By setting up our table in this way, we will ensure that data in each active instance is only written to its designated area locally, and then can be distributed to the other nodes. Let's look at how we can distribute the data.
Distributing Writes
We can distribute the data to each instance using logical replication. Unlike physical replication, Postgres logical replication allows for subsets of data to be sent between Postgres instances, and also allows data to be sent between writable nodes.
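One prerequisite worth noting: logical replication requires wal_level to be set to logical on the publishing side. PGO's configuration may already take care of this, but if a subscription fails to connect, it's the first thing to check:
-- this should return 'logical' on each cluster
SHOW wal_level;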
To use logical replication, you need to set up publications on the source servers, and subscriptions on the destinations.
Let's set up the publications. First, on hippo-east, create the following publication:
CREATE PUBLICATION pub_hippo_east FOR TABLE hippo_east;
Next, set up the publication on hippo-central:
CREATE PUBLICATION pub_hippo_central FOR TABLE hippo_central;
Finally, set up the publication on hippo-west:
CREATE PUBLICATION pub_hippo_west FOR TABLE hippo_west;
We can now set up the subscriptions between all of the instances. We're going to do something special with the connections: because PGO v5 comes with TLS enabled by default, we're going to use stronger TLS protection for our connections.
First, let's create the subscriptions on hippo-east:
CREATE SUBSCRIPTION sub_hippo_east_hippo_central
CONNECTION 'dbname=postgres host=hippo-central-primary.postgres-operator.svc user=postgres password=datalake sslrootcert=/pgconf/tls/ca.crt sslmode=verify-full'
PUBLICATION pub_hippo_central;
CREATE SUBSCRIPTION sub_hippo_east_hippo_west
CONNECTION 'dbname=postgres host=hippo-west-primary.postgres-operator.svc user=postgres password=datalake sslrootcert=/pgconf/tls/ca.crt sslmode=verify-full'
PUBLICATION pub_hippo_west;
Note the sslrootcert=/pgconf/tls/ca.crt sslmode=verify-full section. We need the TLS CA to perform a full verification of the source instance's identity (verify-full). The TLS CA is mounted at /pgconf/tls/ca.crt, so we can reference it from there. We can get the host name from the user Secret or from the list of standard Kubernetes FQDNs.
Let's set up the remaining subscriptions. First, set them up on hippo-central:
CREATE SUBSCRIPTION sub_hippo_central_hippo_east
CONNECTION 'dbname=postgres host=hippo-east-primary.postgres-operator.svc user=postgres password=datalake sslrootcert=/pgconf/tls/ca.crt sslmode=verify-full'
PUBLICATION pub_hippo_east;
CREATE SUBSCRIPTION sub_hippo_central_hippo_west
CONNECTION 'dbname=postgres host=hippo-west-primary.postgres-operator.svc user=postgres password=datalake sslrootcert=/pgconf/tls/ca.crt sslmode=verify-full'
PUBLICATION pub_hippo_west;
And finally on hippo-west:
CREATE SUBSCRIPTION sub_hippo_west_hippo_east
CONNECTION 'dbname=postgres host=hippo-east-primary.postgres-operator.svc user=postgres password=datalake sslrootcert=/pgconf/tls/ca.crt sslmode=verify-full'
PUBLICATION pub_hippo_east;
CREATE SUBSCRIPTION sub_hippo_west_hippo_central
CONNECTION 'dbname=postgres host=hippo-central-primary.postgres-operator.svc user=postgres password=datalake sslrootcert=/pgconf/tls/ca.crt sslmode=verify-full'
PUBLICATION pub_hippo_central;
With the partitions, publications, and subscriptions in place, now we can start the fun part: adding in the data!
Multiranges on Multi-Nodes!
The schema in place uses Postgres 14's new multirange data type. In essence, a multirange is an ordered list of noncontiguous ranges. For example:
SELECT datemultirange(
daterange(CURRENT_DATE, CURRENT_DATE + 2),
daterange(CURRENT_DATE + 5, CURRENT_DATE + 8)
);
may yield something like:
datemultirange
---------------------------------------------------
{[2021-10-12,2021-10-14),[2021-10-17,2021-10-20)}
(1 row)
Postgres will detect if your ranges overlap and will merge them within the multirange. For example:
SELECT datemultirange(
daterange(CURRENT_DATE, CURRENT_DATE + 2),
daterange(CURRENT_DATE + 1, CURRENT_DATE + 8)
);
may yield something like:
datemultirange
---------------------------
{[2021-10-12,2021-10-20)}
(1 row)
As a final bonus, you can access the individual ranges using the unnest function:
SELECT unnest(datemultirange(
daterange(CURRENT_DATE, CURRENT_DATE + 2),
daterange(CURRENT_DATE + 5, CURRENT_DATE + 8)
));
may yield something like:
unnest
-------------------------
[2021-10-12,2021-10-14)
[2021-10-17,2021-10-20)
(2 rows)
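Multiranges also support the usual range operators, such as containment and union. A couple of quick examples:
-- does the multirange contain today's date? (returns true)
SELECT datemultirange(daterange(CURRENT_DATE, CURRENT_DATE + 2)) @> CURRENT_DATE;
-- the union of two overlapping multiranges merges them into one range
SELECT datemultirange(daterange(CURRENT_DATE, CURRENT_DATE + 2)) +
       datemultirange(daterange(CURRENT_DATE + 1, CURRENT_DATE + 5));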
Anyway, I digress. Let's now insert some actual data into our instances. First, log into hippo-east and run a query like this:
INSERT INTO hippos (value, created_at)
SELECT
datemultirange(
daterange(CURRENT_DATE + n, CURRENT_DATE + n + (100 * random())::int),
daterange(CURRENT_DATE + n + 100, CURRENT_DATE + n + 100 + (200 * random())::int)
),
clock_timestamp()
FROM generate_series(1,25) n;
This creates a bunch of randomly generated datemultirange values. Notice the use of clock_timestamp() -- it returns the actual clock time, vs. CURRENT_TIMESTAMP, which returns the time of the start of the transaction.
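You can see the difference in a single statement: clock_timestamp() advances across rows, while CURRENT_TIMESTAMP stays fixed for the whole transaction:
-- clock_timestamp() differs on each row; CURRENT_TIMESTAMP repeats
SELECT clock_timestamp(), CURRENT_TIMESTAMP
FROM generate_series(1, 3);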
Go ahead and execute this SQL on hippo-central and hippo-west.
Log back into hippo-east. Let's see if the data propagated:
TABLE hippos;
You should see all 75 results! For a more consolidated viewing:
SELECT node_name, count(*)
FROM hippos
GROUP BY node_name;
should yield:
node_name | count
---------------+-------
hippo-east | 25
hippo-central | 25
hippo-west | 25
(3 rows)
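If you also want to confirm which partition each row physically landed in, the tableoid system column can tell you:
-- show the partition each row is stored in
SELECT tableoid::regclass AS partition, count(*)
FROM hippos
GROUP BY 1
ORDER BY 1;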
Run the query on hippo-central and hippo-west and see how your writes have propagated throughout your active-active cluster!
Scaling Up: Adding A New Instance
Let's say we want to add an additional node to our cluster. No problem. Let's call this node hippo-south. First, create hippo-south from a Kubernetes custom resource:
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo-south
spec:
  image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres:centos8-14.0-0
  postgresVersion: 14
  instances:
    - dataVolumeClaimSpec:
        accessModes:
          - 'ReadWriteOnce'
        resources:
          requests:
            storage: 1Gi
  users:
    - name: postgres
  monitoring:
    pgmonitor:
      exporter:
        image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres-exporter:ubi8-5.0.3-0
  backups:
    pgbackrest:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:centos8-2.35-0
      repos:
        - name: repo1
          volume:
            volumeClaimSpec:
              accessModes:
                - 'ReadWriteOnce'
              resources:
                requests:
                  storage: 1Gi
Similar to before, set the password of the postgres account to datalake:
PASSWORD='ZGF0YWxha2U='
VERIFIER='U0NSQU0tU0hBLTI1NiQ0MDk2OkZBblNrVXJMM2pINkJwMTdQNUZUdVE9PSRCZWNMVTBZWHpCVXduVGszYjNnaElaNy96UzJYdkQ4d1g1MEh6NUpMMHE0PTpweHVFdG96WVRacEFjNndJTlY1M3NDcjRBZnhrOExMZm9yNU12a1lwMjFzPQ=='
kubectl patch secret -n postgres-operator hippo-south-pguser-postgres -p \
"{\"data\":{\"password\":\"${PASSWORD}\",\"verifier\":\"${VERIFIER}\"}}"
Once hippo-south is initialized, log in and add the get_node_name helper function:
CREATE EXTENSION IF NOT EXISTS pgnodemx;
CREATE OR REPLACE FUNCTION get_node_name()
RETURNS text
AS $$
SELECT val FROM kdapi_setof_kv('labels') WHERE key='postgres-operator.crunchydata.com/cluster';
$$ LANGUAGE SQL SECURITY DEFINER IMMUTABLE;
Create the hippos data structure on the hippo-south instance. Note the addition of the hippo-south partition:
CREATE TABLE hippos (
id uuid DEFAULT gen_random_uuid() NOT NULL,
node_name text,
value datemultirange,
created_at timestamptz
) PARTITION BY LIST (node_name);
CREATE TABLE hippo_default PARTITION OF hippos (PRIMARY KEY (id)) DEFAULT;
CREATE TABLE hippo_east PARTITION OF hippos (PRIMARY KEY (id)) FOR VALUES IN ('hippo-east');
CREATE TABLE hippo_central PARTITION OF hippos (PRIMARY KEY (id)) FOR VALUES IN ('hippo-central');
CREATE TABLE hippo_west PARTITION OF hippos (PRIMARY KEY (id)) FOR VALUES IN ('hippo-west');
CREATE TABLE hippo_south PARTITION OF hippos (PRIMARY KEY (id)) FOR VALUES IN ('hippo-south');
CREATE OR REPLACE FUNCTION add_node_name()
RETURNS trigger AS $$
BEGIN
UPDATE hippos
SET node_name = get_node_name()
WHERE node_name IS NULL;
RETURN NULL;
END
$$ LANGUAGE plpgsql;
CREATE TRIGGER add_node_name
AFTER INSERT ON hippos
FOR EACH STATEMENT
EXECUTE FUNCTION add_node_name();
Now, set up hippo-south so it can be added to the cluster. This involves creating a publication on hippo-south and subscribing it to the current nodes:
CREATE PUBLICATION pub_hippo_south FOR TABLE hippo_south;
CREATE SUBSCRIPTION sub_hippo_south_hippo_east
CONNECTION 'dbname=postgres host=hippo-east-primary.postgres-operator.svc user=postgres password=datalake sslrootcert=/pgconf/tls/ca.crt sslmode=verify-full'
PUBLICATION pub_hippo_east;
CREATE SUBSCRIPTION sub_hippo_south_hippo_central
CONNECTION 'dbname=postgres host=hippo-central-primary.postgres-operator.svc user=postgres password=datalake sslrootcert=/pgconf/tls/ca.crt sslmode=verify-full'
PUBLICATION pub_hippo_central;
CREATE SUBSCRIPTION sub_hippo_south_hippo_west
CONNECTION 'dbname=postgres host=hippo-west-primary.postgres-operator.svc user=postgres password=datalake sslrootcert=/pgconf/tls/ca.crt sslmode=verify-full'
PUBLICATION pub_hippo_west;
We can test that hippo-south is in the cluster. Try running our verification query:
SELECT node_name, count(*)
FROM hippos
GROUP BY node_name;
This should yield:
node_name | count
---------------+-------
hippo-east | 25
hippo-central | 25
hippo-west | 25
(3 rows)
Excellent. Let's insert some data into hippo-south:
INSERT INTO hippos (value, created_at)
SELECT
datemultirange(
daterange(CURRENT_DATE + n, CURRENT_DATE + n + (100 * random())::int),
daterange(CURRENT_DATE + n + 100, CURRENT_DATE + n + 100 + (200 * random())::int)
),
clock_timestamp()
FROM generate_series(1,25) n;
Now we have one final step: making the other instances aware of hippo-south. Log into hippo-east and execute:
CREATE TABLE hippo_south PARTITION OF hippos (PRIMARY KEY (id)) FOR VALUES IN ('hippo-south');
CREATE SUBSCRIPTION sub_hippo_east_hippo_south
CONNECTION 'dbname=postgres host=hippo-south-primary.postgres-operator.svc user=postgres password=datalake sslrootcert=/pgconf/tls/ca.crt sslmode=verify-full'
PUBLICATION pub_hippo_south;
On hippo-central:
CREATE TABLE hippo_south PARTITION OF hippos (PRIMARY KEY (id)) FOR VALUES IN ('hippo-south');
CREATE SUBSCRIPTION sub_hippo_central_hippo_south
CONNECTION 'dbname=postgres host=hippo-south-primary.postgres-operator.svc user=postgres password=datalake sslrootcert=/pgconf/tls/ca.crt sslmode=verify-full'
PUBLICATION pub_hippo_south;
And on hippo-west:
CREATE TABLE hippo_south PARTITION OF hippos (PRIMARY KEY (id)) FOR VALUES IN ('hippo-south');
CREATE SUBSCRIPTION sub_hippo_west_hippo_south
CONNECTION 'dbname=postgres host=hippo-south-primary.postgres-operator.svc user=postgres password=datalake sslrootcert=/pgconf/tls/ca.crt sslmode=verify-full'
PUBLICATION pub_hippo_south;
You can verify that hippo-east, hippo-central, and hippo-west are receiving writes from hippo-south by executing:
SELECT node_name, count(*)
FROM hippos
GROUP BY node_name;
which should yield:
node_name | count
---------------+-------
hippo-east | 25
hippo-south | 25
hippo-central | 25
hippo-west | 25
(4 rows)
Next Steps
The above guide should get you on your way to building out Postgres clusters that can accept writes across multiple nodes and share them with each other. By using PGO, the open source Postgres Operator, you can also set up high availability, backups, and monitoring to make your cluster production ready.
Furthermore, you can bootstrap a lot of the schema work by telling PGO to set up a database according to a SQL file you provide. This makes setting up such clusters reproducible across environments, which is the GitOps way!
Kubernetes and PGO make it easier to set up distributed Postgres clusters and allow for designing and scaling different topologies for managing your data, from active-active federation to edge computing.