Introducing Crunchy Data Warehouse: A next-generation Postgres-native data warehouse. Crunchy Data Warehouse Learn more

Deploying Active-Active PostgreSQL on Kubernetes

Avatar for Dave Cramer

Dave Cramer

11 min read

Kubernetes is a very popular container orchestration framework. I’ll show you how to get Symmetric-DS working on a single Kubernetes instance.

I had previously explored how to build an active-active PostgreSQL cluster using Symmetric-DS. The steps are essentially the same on Kubernetes:

  1. Start 2 PostgreSQL pods
  2. Create a user and a database on each pod/instance
  3. Start the primary symmetric-ds pod
  4. Add the symmetric-ds triggers and routes
  5. Open registration
  6. Start the secondary symmertic-ds pod

However, there are some interesting nuances I discovered while building out this setup, which I will discuss below. Before diving into the details, I'd also like to credit my colleagues Patrick McLaughlin and Simon Nielly as co-authors for helping me to put together this guide.

Anyway, let's explore how to deploy an active-active PostgreSQL cluster on Kubernetes.

Start 2 PostgreSQL pods

We will use the Crunchy Container Suite to create two postgres instances. This demonstration is using container suite version 2.3.1

Note: I use a kubernetes namespace “demo”

kubectl create \
  -f /home/davec/cdev/src/github.com/crunchydata/crunchy-containers/conf/demo-namespace.json

kubectl config set-context $(kubectl config current-context) --namespace=demo

In the container suite there are examples, specifically in crunchy-containers/examples/kube/primary. This provides us with a Kubernetes pod description to create one instance as well as a simple way to start it. Setting this up is beyond the scope of this article, but you can find the details here. In order to create the second instance we need to modify the primary.json file contained in that directory. Below are the differences. Essentially all that changes are the names of the pods, services and the ports

diff -c primary.json secondary.json
*** primary.json	2019-05-13 18:47:56.934175762 +0000
--- secondary.json	2019-05-13 20:35:11.104767339 +0000
***************
*** 2,10 ****
      "kind": "Service",
      "apiVersion": "v1",
      "metadata": {
!         "name": "primary",
          "labels": {
!             "name": "primary",
              "cleanup": "$CCP_NAMESPACE-primary"
          }
      },
--- 2,10 ----
      "kind": "Service",
      "apiVersion": "v1",
      "metadata": {
!         "name": "secondary",
          "labels": {
!             "name": "secondary",
              "cleanup": "$CCP_NAMESPACE-primary"
          }
      },
***************
*** 12,24 ****
          "ports": [
              {
                  "protocol": "TCP",
!                 "port": 5432,
                  "targetPort": 5432,
                  "nodePort": 0
              }
          ],
          "selector": {
!             "name": "primary"
          },
          "type": "ClusterIP",
          "sessionAffinity": "None"
--- 12,24 ----
          "ports": [
              {
                  "protocol": "TCP",
!                 "port": 5433,
                  "targetPort": 5432,
                  "nodePort": 0
              }
          ],
          "selector": {
!             "name": "secondary"
          },
          "type": "ClusterIP",
          "sessionAffinity": "None"
***************
*** 29,37 ****
      "kind": "Deployment",
      "apiVersion": "extensions/v1beta1",
      "metadata": {
!         "name": "primary",
          "labels": {
!             "name": "primary",
              "cleanup": "$CCP_NAMESPACE-primary"
          }
      },
--- 29,37 ----
      "kind": "Deployment",
      "apiVersion": "extensions/v1beta1",
      "metadata": {
!         "name": "secondary",
          "labels": {
!             "name": "secondary",
              "cleanup": "$CCP_NAMESPACE-primary"
          }
      },
***************
*** 40,46 ****
          "template": {
              "metadata": {
                  "labels": {
!                     "name": "primary",
                      "cleanup": "$CCP_NAMESPACE-primary"
                  }
              },
--- 40,46 ----
          "template": {
              "metadata": {
                  "labels": {
!                     "name": "secondary",
                      "cleanup": "$CCP_NAMESPACE-primary"
                  }
              },
***************
*** 78,84 ****
                              },
                              {
                                  "name": "PGDATA_PATH_OVERRIDE",
!                                 "value": "primary"
                              },
                              {
                                  "name": "PGHOST",
--- 78,84 ----
                              },
                              {
                                  "name": "PGDATA_PATH_OVERRIDE",
!                                 "value": "secondary"
                              },
                              {
                                  "name": "PGHOST",


Starting the primary postgreSQL instance is as simple as running

run.sh

in the primary example directory.

Once you place the secondary.json pod descriptor in the directory it can be started using

expenv -f secondary.json | ${CCP_CLI?} create --namespace=${CCP_NAMESPACE?} -f -

Now we have two pods running PostgreSQL. One on port 5432, and another on port 5433

kubectl get services
NAMETYPECLUSTER-IPEXTERNAL-IPPORT(S)AGE
primaryClusterIP10.108.103.135<none>5432/TCP22h
secondaryClusterIP10.102.100.254<none>5433/TCP13d

Create a user and a database on each pod/instance

Logged in to the Kubernetes host, there are two options for creating users. PostgreSQL has a binary called createuser, but I tend to use psql.

psql -h 10.108.103.135 -U postgres

The password is password.

Execute the following SQL on both PostgreSQL instances.

CREATE USER rep WITH PASSWORD foo;

and

CREATE DATABASE sales OWNER rep;

Note to connect to the second instance the service is exposed on port 5433

psql -h 10.102.100.254 -p 5433 -U postgres

Additionally on the Primary create two simple tables to work with:

psql -h 10.108.103.135 -U rep sales
CREATE TABLE item (
    id serial primary key,
    description text,
    price numeric (8,2)
);

and

CREATE TABLE saleorder (
    id serial primary key,
    item_id int references item(id),
    price numeric(8,2)
);

Now for the Kubernetes unique bits.

Start the primary symmetric-ds pod

First we need some permanent disk space to store the configuration so we create two PersistentVolumes, one for the primary and one for the secondary. This is contained in a file called symmetric-pv.json. Note on my system /nfsfileshare is an NFS mount.

{
    "apiVersion": "v1",
    "kind": "PersistentVolume",
    "metadata": {
        "name": "symmetric-engines",
        "labels": {
            "name": "symmetric-ds"
        }
    },
    "spec": {
        "capacity": {
            "storage": "50M"
        },
        "accessModes": ["ReadWriteMany"],
        "nfs": {
            "path": "/nfsfileshare/symmetric-ds",
            "server": "<NFS_SERVER_IP>"
        },
        "persistentVolumeReclaimPolicy": "Retain"
    }
}
{
    "apiVersion": "v1",
    "kind": "PersistentVolume",
    "metadata": {
        "name": "symmetric-secondary-engines",
        "labels": {
            "name": "symmetric-secondary-ds"
        }
    },
    "spec": {
        "capacity": {
            "storage": "50M"
        },
        "accessModes": ["ReadWriteMany"],
        "nfs": {
            "path": "/nfsfileshare/symmetric-secondary-ds",
            "server": "`<NFS_SERVER_IP>`"
        },
        "persistentVolumeReclaimPolicy": "Retain"
    }
}

Execute:

kubectl create -f symmetric-pv.json

to create the persistent volume.

Next we need to claim the volume. In a file called symmetric-pvc.json add the following:

{
    "kind": "PersistentVolumeClaim",
    "apiVersion": "v1",
    "metadata": {
      "name": "symmetric-engines"
    },
    "spec": {
      "selector": {
        "matchLabels": {
          "name": "symmetric-ds"
        }
      },
      "accessModes": [
        "ReadWriteMany"
      ],
      "resources": {
        "requests": {
          "storage": "50M"
        }
      }
    }
}
{
    "kind": "PersistentVolumeClaim",
    "apiVersion": "v1",
    "metadata": {
      "name": "symmetric-secondary-engines"
    },
    "spec": {
      "selector": {
        "matchLabels": {
          "name": "symmetric-secondary-ds"
        }
      },
      "accessModes": [
        "ReadWriteMany"
      ],
      "resources": {
        "requests": {
          "storage": "50M"
        }
      }
    }
}

Execute kubectl create -f symmetric-pvc.json to create the claims.

kubectl get pv should now show: (I’ve removed some columns for brevity)

NAMEACCESS MODESRECLAIM POLICYSTATUSCLAIM
primary-pgdataRWXRetainBounddemo/primary-pgdata
symmetric-enginesRWXRetainBounddemo/symmetric-engines
symmetric-secondary-enginesRWXRetainBounddemo/symmetric-secondary-engines

In a file called symmetric-primary.json add the following: The important bits are highlighted below

  1. Ports, we expose port 31415 as a service and a container port. The container port allows communication between pods and the service allows us to access the symmetric-ds REST API.
  2. In the actual pod description below the container refers to the latest docker container from symmetric-ds. This is also where we define the mount points.
{
        "kind": "Service",
        "apiVersion": "v1",
        "metadata": {
                "name":"primary-ds"
        },
        "spec": {
            "ports": [{
 "protocol": "TCP",
 "port": 31415,
 "targetPort": 31415,
 "nodePort": 30000
            }],
            "selector": {
                "name": "primary-ds"
            },
            "type": "NodePort",
            "sessionAffinity": "None"
        }
}
{
    "kind": "Pod",
    "apiVersion": "v1",
    "metadata": {
        "name": "primary-ds",
        "namespace": "demo",
        "labels": {
            "name": "primary-ds"
        }
    },
    "spec": {
        "containers": [
            {
                "name": "symmetric-ds",
 "image": "jumpmind/symmetricds:latest",
                "ports": [
                    {
                        "containerPort": 31415,
                        "protocol": "TCP"
                    }
                ],
                "volumeMounts": [
 {
 "mountPath": "/opt/symmetric-ds/engines",
 "name": "engines",
 "readOnly": false
 },
        {
                        "mountPath": "/opt/symmetric-ds/logs",
                        "name": "logs",
                        "readOnly": false
                    }
                ]
            }
        ],
        "volumes": [
            {
                "name": "engines",
                "persistentVolumeClaim": {
                    "claimName": "symmetric-engines"
                }
            },
            {
                "name": "logs",
                "emptyDir": {}
            }
        ]
    }
}

Before we start the pod lets add an engine to the configuration. Recall in the PV descriptor we created a volume in /nfsfileshare/symmetric-ds. The folder symmetric-ds doesn’t exist yet at this stage the user should create the folder manually.

mkdir /nfsfileshare/symmetric-ds

In that new directory we add the following file: headoffice.properties

db.connection.properties=
db.password=foo
rest.api.enable=true
sync.url=http://primary-ds.demo.svc.cluster.local:31415/sync/engsales
group.id=primary
db.init.sql=
registration.url=
db.driver=org.postgresql.Driver
db.user=rep
initial.load.create.first=true
engine.name=engsales
external.id=headoffice
db.validation.query=select 1
cluster.lock.enabled=false
db.url=jdbc\:postgresql\://primary.demo.svc.cluster.local/sales?protocolVersion\=3&stringtype\=unspecified&socketTimeout\=300&tcpKeepAlive\=true

This file is almost exactly the same as a non-Kubernetes deploy except for the IP addresses. In Kubernetes we do not know the IP addresses of the pod before hand but what we do have is a DNS server which we can use to get the IP address of the Symmetric-DS pod primary-ds and the IP address of the PostgreSQL service. See the highlighted sections above.

At this point we can start the pod using:

kubectl create -f symmetric-primary.json

And view the logs using:

kubectl logs primary-ds

You should see:

_____ __ _ ____ _____
/ ___/ __ _____ __ ___ __ ___ _/ /_ ____(_)___ / __ | / ___/
__ \ / / / / _ `_ \/ _ `_ \/ _ \/_ __// __/ / __/ / / / / __ \
___/ // /_/ / // // / // // / __// / / / / / /_ / /_/ / ___/ /
/____/ __ /_//_//_/_//_//_/___/ _/ /_/ /_/__/ /_____/ /____/
/____/
+-----------------------------------------------------------------+
| Copyright (C) 2007-2019 JumpMind, Inc. |
| |
| Licensed under the GNU General Public License version 3. |
| This software comes with ABSOLUTELY NO WARRANTY. |
| See http://www.gnu.org/licenses/gpl.html |
+-----------------------------------------------------------------+
2019-05-06 16:08:21,016 INFO [startup] [SymmetricWebServer] [main] About to start SymmetricDS web server on host:port 0.0.0.0:31415
2019-05-06 16:08:21,973 INFO [sales] [AbstractSymmetricEngine] [symmetric-engine-startup-1] Initializing connection to database
2019-05-06 16:08:22,465 INFO [sales] [JdbcDatabasePlatformFactory] [symmetric-engine-startup-1] Detected database 'PostgreSQL', version '10', protocol 'postgresql'
2019-05-06 16:08:22,499 INFO [sales] [JdbcDatabasePlatformFactory] [symmetric-engine-startup-1] The IDatabasePlatform being used is org.jumpmind.db.platform.postgresql.PostgreSqlDatabasePlatform

Add the symmetric-ds triggers and routes

As per the non-kubernetes deployment we need to add tables to replicate and routes.

For simplicity the relevant tables are:

  • sym_node : Identifies the node and configures things like node id, node group, external id, sync url
  • sym_node_identity : unique identity for this node
  • sym_trigger : specify which tables are being replicated and which router to use
  • sym_router : create a “router” to route tables to synchronize

Connect to the primary

psql -h 10.108.103.135 -U rep sales

Next we have to create a node group link:

INSERT INTO sym_node_group_link (source_node_group_id, target_node_group_id, data_event_action)
VALUES ('primary','primary','P');

A route:

INSERT INTO sym_router
    (router_id,source_node_group_id,target_node_group_id,router_type,router_expression,sync_on_update,sync_on_insert,sync_on_delete,use_source_catalog_schema,create_time,last_update_by,last_update_time)
VALUES
    ('primary_2_primary','primary','primary','default',null,1,1,1,0,current_timestamp,'console',current_timestamp);

Some parameters:

INSERT INTO sym_parameter
    (external_id, node_group_id, param_key, param_value, create_time, last_update_by, last_update_time)
VALUES ('ALL','ALL','push.thread.per.server.count','10',current_timestamp,'console',current_timestamp);
INSERT INTO sym_parameter (external_id, node_group_id, param_key, param_value)
VALUES ('ALL', 'ALL', 'job.pull.period.time.ms', 2000);
INSERT INTO sym_parameter (external_id, node_group_id, param_key, param_value)
VALUES ('ALL', 'ALL', 'job.push.period.time.ms', 2000);

At this point the router is setup, now to add triggers:

INSERT INTO sym_trigger (trigger_id, source_schema_name, source_table_name, channel_id,
sync_on_update, sync_on_insert, sync_on_delete, sync_on_update_condition, sync_on_insert_condition, sync_on_delete_condition, last_update_time, create_time)
VALUES ('public.item', 'public', 'item', 'default', 1, 1, 1, '1=1', '1=1', '1=1', now(), now());

INSERT INTO sym_trigger (trigger_id, source_schema_name, source_table_name, channel_id,  sync_on_update, sync_on_insert, sync_on_delete, sync_on_update_condition,
sync_on_insert_condition, sync_on_delete_condition, last_update_time, create_time)
VALUES ('public.saleorder', 'public', 'saleorder', 'default', 1, 1, 1, '1=1', '1=1', '1=1', now(), now());

INSERT INTO sym_trigger_router (trigger_id, router_id, enabled, initial_load_order, create_time, last_update_time)
VALUES ('public.item', 'primary_2_primary', 1, 10, now(), now());

INSERT INTO sym_trigger_router (trigger_id, router_id, enabled, initial_load_order, create_time,  last_update_time)
VALUES ('public.saleorder', 'primary_2_primary', 1, 10, now(), now());

At this point we sync the symmetric-ds triggers.

kubectl exec -it primary-ds sh
cd /opt/symmetric-ds
bin/symadmin -e engsales sync-triggers
exit

Open registration on the primary

Login to the primary using:

kubectl exec -it primary-ds sh
cd /opt/symmetric-ds
bin/symadmin -e engsales open-registration primary store
exit

On the primary update the sym_node_security using psql

psql -h 10.108.103.135 -U rep sales
UPDATE sym_node_security SET initial_load_enabled=1 WHERE node_id='store';

Start the secondary Symmertic-DS pod

Create a file named symmetric-secondary.json with:

{
        "kind": "Service",
        "apiVersion": "v1",
        "metadata": {
                "name":"secondary-ds"
        },
        "spec": {
            "ports": [{
 "protocol": "TCP",
 "port": 31416,
 "targetPort": 31415,
 "nodePort": 30001
 }],
            "selector": {
                "name": "secondary-ds"
            },
            "type": "NodePort",
            "sessionAffinity": "None"
        }
}
{
    "kind": "Pod",
    "apiVersion": "v1",
    "metadata": {
        "name": "secondary-ds",
        "namespace": "demo",
        "labels": {
            "name": "secondary-ds"
        }
    },
    "spec": {
        "containers": [
            {
                "name": "symmetric-ds",
                "image": "jumpmind/symmetricds:latest",
                "ports": [
                    {
                        "containerPort": 31415,
                        "protocol": "TCP"
                    }
                ],
                "volumeMounts": [
                    {
                        "mountPath": "/opt/symmetric-ds/engines",
                        "name": "engines2",
                        "readOnly": false
                    },
                    {
                        "mountPath": "/opt/symmetric-ds/logs",
                        "name": "logs",
                        "readOnly": false
                    }
                ]
            }
        ],
        "volumes": [
            {
 "name": "engines2",
 "persistentVolumeClaim": {
 "claimName": "symmetric-secondary-engines"
                }
            },
            {
                "name": "logs",
                "emptyDir": {}
            }
        ]
    }
}

Note the directory of the engines for the secondary is /nfsfileshare/symmetric-secondary-ds.

The folder symmetric-secondary-ds doesn’t exist yet at this stage, we need to create the folder manually

mkdir /nfsfileshare/symmetric-secondary-ds

In this directory we require a properties file store.properties to define the engine. The relevant sections are highlighted below:

db.connection.properties=
rest.api.enable=true
db.password=foo
sync.url=http://secondary-ds.demo.svc.cluster.local:31416/sync/engsales
group.id=primary
db.init.sql=
db.driver=org.postgresql.Driver
db.user=rep
engine.name=engsales
external.id=store
registration.url=http://primary-ds.demo.svc.cluster.local:31415/sync/engsales
db.validation.query=select 1
cluster.lock.enabled=false
db.url=jdbc:postgresql://secondary.demo.svc.cluster.local:5433/sales?protocolVersion\=3&stringtype\=unspecified&socketTimeout\=300&tcpKeepAlive\=true

Note the registration URL is the primary-ds the others are the secondary.

Now we can start the pod with

kubectl create -f symmetric-secondary.json

View the logs with

kubectl logs -f secondary-ds

At this point you should see:

_____ __ _ ____ _____
/ ___/ __ _____ __ ___ __ ___ _/ /_ ____(_)___ / __ | / ___/
__ \ / / / / _ `_ \/ _ `_ \/ _ \/_ __// __/ / __/ / / / / __ \
___/ // /_/ / // // / // // / __// / / / / / /_ / /_/ / ___/ /
/____/ __ /_//_//_/_//_//_/___/ _/ /_/ /_/__/ /_____/ /____/
/____/
+-----------------------------------------------------------------+
| Copyright (C) 2007-2019 JumpMind, Inc. |
| |
| Licensed under the GNU General Public License version 3. |
| This software comes with ABSOLUTELY NO WARRANTY. |
| See https://www.gnu.org/licenses/gpl.html |
+-----------------------------------------------------------------+
2019-05-06 16:09:15,802 INFO [startup] [SymmetricWebServer] [main] About to start SymmetricDS web server on host:port 0.0.0.0:31415
2019-05-06 16:09:16,737 INFO [sales] [AbstractSymmetricEngine] [symmetric-engine-startup-1] Initializing connection to database

If everything worked properly on the secondary PostgreSQL instance there should be a number of sym_* tables and the item and saleorder table should have been created as well.

At this point we have two engines running, you can now experiment with changing data on either one and see the result on the other. Note this may take a few seconds to replicate.

Note there are some challenges with sequences, which I have not dealt with yet. For instance: if you add a row to item on the primary and then add a row to item on the secondary the sequences will clash. Again the solution is to alter the sequences so that one is odd and the other is even.

ALTER SEQUENCE item_id_seq RESTART 2;
ALTER SEQUENCE item_id_seq INCREMENT 2;