Deploying Active-Active PostgreSQL on Kubernetes
Kubernetes is a very popular container orchestration framework. I’ll show you how to get Symmetric-DS working on a single Kubernetes instance.
I had previously explored how to build an active-active PostgreSQL cluster using Symmetric-DS. The steps are essentially the same on Kubernetes:
- Start 2 PostgreSQL pods
- Create a user and a database on each pod/instance
- Start the primary symmetric-ds pod
- Add the symmetric-ds triggers and routes
- Open registration
- Start the secondary symmertic-ds pod
However, there are some interesting nuances I discovered while building out this setup, which I will discuss below. Before diving into the details, I'd also like to credit my colleagues Patrick McLaughlin and Simon Nielly as co-authors for helping me to put together this guide.
Anyway, let's explore how to deploy an active-active PostgreSQL cluster on Kubernetes.
Start 2 PostgreSQL pods
We will use the Crunchy Container Suite to create two postgres instances. This demonstration is using container suite version 2.3.1
Note: I use a kubernetes namespace “demo”
kubectl create \
-f /home/davec/cdev/src/github.com/crunchydata/crunchy-containers/conf/demo-namespace.json
kubectl config set-context $(kubectl config current-context) --namespace=demo
In the container suite there are examples, specifically in crunchy-containers/examples/kube/primary
. This provides us with a Kubernetes pod description to create one instance as well as a simple way to start it. Setting this up is beyond the scope of this article, but you can find the details here. In order to create the second instance we need to modify the primary.json
file contained in that directory. Below are the differences. Essentially all that changes are the names of the pods, services and the ports
diff -c primary.json secondary.json
*** primary.json 2019-05-13 18:47:56.934175762 +0000
--- secondary.json 2019-05-13 20:35:11.104767339 +0000
***************
*** 2,10 ****
"kind": "Service",
"apiVersion": "v1",
"metadata": {
! "name": "primary",
"labels": {
! "name": "primary",
"cleanup": "$CCP_NAMESPACE-primary"
}
},
--- 2,10 ----
"kind": "Service",
"apiVersion": "v1",
"metadata": {
! "name": "secondary",
"labels": {
! "name": "secondary",
"cleanup": "$CCP_NAMESPACE-primary"
}
},
***************
*** 12,24 ****
"ports": [
{
"protocol": "TCP",
! "port": 5432,
"targetPort": 5432,
"nodePort": 0
}
],
"selector": {
! "name": "primary"
},
"type": "ClusterIP",
"sessionAffinity": "None"
--- 12,24 ----
"ports": [
{
"protocol": "TCP",
! "port": 5433,
"targetPort": 5432,
"nodePort": 0
}
],
"selector": {
! "name": "secondary"
},
"type": "ClusterIP",
"sessionAffinity": "None"
***************
*** 29,37 ****
"kind": "Deployment",
"apiVersion": "extensions/v1beta1",
"metadata": {
! "name": "primary",
"labels": {
! "name": "primary",
"cleanup": "$CCP_NAMESPACE-primary"
}
},
--- 29,37 ----
"kind": "Deployment",
"apiVersion": "extensions/v1beta1",
"metadata": {
! "name": "secondary",
"labels": {
! "name": "secondary",
"cleanup": "$CCP_NAMESPACE-primary"
}
},
***************
*** 40,46 ****
"template": {
"metadata": {
"labels": {
! "name": "primary",
"cleanup": "$CCP_NAMESPACE-primary"
}
},
--- 40,46 ----
"template": {
"metadata": {
"labels": {
! "name": "secondary",
"cleanup": "$CCP_NAMESPACE-primary"
}
},
***************
*** 78,84 ****
},
{
"name": "PGDATA_PATH_OVERRIDE",
! "value": "primary"
},
{
"name": "PGHOST",
--- 78,84 ----
},
{
"name": "PGDATA_PATH_OVERRIDE",
! "value": "secondary"
},
{
"name": "PGHOST",
Starting the primary postgreSQL instance is as simple as running
run.sh
in the primary example directory.
Once you place the secondary.json pod descriptor in the directory it can be started using
expenv -f secondary.json | ${CCP_CLI?} create --namespace=${CCP_NAMESPACE?} -f -
Now we have two pods running PostgreSQL. One on port 5432, and another on port 5433
kubectl get services
NAME | TYPE | CLUSTER-IP | EXTERNAL-IP | PORT(S) | AGE |
---|---|---|---|---|---|
primary | ClusterIP | 10.108.103.135 | <none> | 5432/TCP | 22h |
secondary | ClusterIP | 10.102.100.254 | <none> | 5433/TCP | 13d |
Create a user and a database on each pod/instance
Logged in to the Kubernetes host, there are two options for creating users. PostgreSQL has a binary called createuser, but I tend to use psql.
psql -h 10.108.103.135 -U postgres
The password is password.
Execute the following SQL on both PostgreSQL instances.
CREATE USER rep WITH PASSWORD foo;
and
CREATE DATABASE sales OWNER rep;
Note to connect to the second instance the service is exposed on port 5433
psql -h 10.102.100.254 -p 5433 -U postgres
Additionally on the Primary create two simple tables to work with:
psql -h 10.108.103.135 -U rep sales
CREATE TABLE item (
id serial primary key,
description text,
price numeric (8,2)
);
and
CREATE TABLE saleorder (
id serial primary key,
item_id int references item(id),
price numeric(8,2)
);
Now for the Kubernetes unique bits.
Start the primary symmetric-ds pod
First we need some permanent disk space to store the configuration so we create two PersistentVolumes, one for the primary and one for the secondary. This is contained in a file called symmetric-pv.json
. Note on my system /nfsfileshare
is an NFS mount.
{
"apiVersion": "v1",
"kind": "PersistentVolume",
"metadata": {
"name": "symmetric-engines",
"labels": {
"name": "symmetric-ds"
}
},
"spec": {
"capacity": {
"storage": "50M"
},
"accessModes": ["ReadWriteMany"],
"nfs": {
"path": "/nfsfileshare/symmetric-ds",
"server": "<NFS_SERVER_IP>"
},
"persistentVolumeReclaimPolicy": "Retain"
}
}
{
"apiVersion": "v1",
"kind": "PersistentVolume",
"metadata": {
"name": "symmetric-secondary-engines",
"labels": {
"name": "symmetric-secondary-ds"
}
},
"spec": {
"capacity": {
"storage": "50M"
},
"accessModes": ["ReadWriteMany"],
"nfs": {
"path": "/nfsfileshare/symmetric-secondary-ds",
"server": "`<NFS_SERVER_IP>`"
},
"persistentVolumeReclaimPolicy": "Retain"
}
}
Execute:
kubectl create -f symmetric-pv.json
to create the persistent volume.
Next we need to claim the volume. In a file called symmetric-pvc.json
add the following:
{
"kind": "PersistentVolumeClaim",
"apiVersion": "v1",
"metadata": {
"name": "symmetric-engines"
},
"spec": {
"selector": {
"matchLabels": {
"name": "symmetric-ds"
}
},
"accessModes": [
"ReadWriteMany"
],
"resources": {
"requests": {
"storage": "50M"
}
}
}
}
{
"kind": "PersistentVolumeClaim",
"apiVersion": "v1",
"metadata": {
"name": "symmetric-secondary-engines"
},
"spec": {
"selector": {
"matchLabels": {
"name": "symmetric-secondary-ds"
}
},
"accessModes": [
"ReadWriteMany"
],
"resources": {
"requests": {
"storage": "50M"
}
}
}
}
Execute kubectl create -f symmetric-pvc.json
to create the claims.
kubectl get pv
should now show: (I’ve removed some columns for brevity)
NAME | ACCESS MODES | RECLAIM POLICY | STATUS | CLAIM |
---|---|---|---|---|
primary-pgdata | RWX | Retain | Bound | demo/primary-pgdata |
symmetric-engines | RWX | Retain | Bound | demo/symmetric-engines |
symmetric-secondary-engines | RWX | Retain | Bound | demo/symmetric-secondary-engines |
In a file called symmetric-primary.json add the following: The important bits are highlighted below
- Ports, we expose port 31415 as a service and a container port. The container port allows communication between pods and the service allows us to access the symmetric-ds REST API.
- In the actual pod description below the container refers to the latest docker container from symmetric-ds. This is also where we define the mount points.
{
"kind": "Service",
"apiVersion": "v1",
"metadata": {
"name":"primary-ds"
},
"spec": {
"ports": [{
"protocol": "TCP",
"port": 31415,
"targetPort": 31415,
"nodePort": 30000
}],
"selector": {
"name": "primary-ds"
},
"type": "NodePort",
"sessionAffinity": "None"
}
}
{
"kind": "Pod",
"apiVersion": "v1",
"metadata": {
"name": "primary-ds",
"namespace": "demo",
"labels": {
"name": "primary-ds"
}
},
"spec": {
"containers": [
{
"name": "symmetric-ds",
"image": "jumpmind/symmetricds:latest",
"ports": [
{
"containerPort": 31415,
"protocol": "TCP"
}
],
"volumeMounts": [
{
"mountPath": "/opt/symmetric-ds/engines",
"name": "engines",
"readOnly": false
},
{
"mountPath": "/opt/symmetric-ds/logs",
"name": "logs",
"readOnly": false
}
]
}
],
"volumes": [
{
"name": "engines",
"persistentVolumeClaim": {
"claimName": "symmetric-engines"
}
},
{
"name": "logs",
"emptyDir": {}
}
]
}
}
Before we start the pod lets add an engine to the configuration. Recall in the PV descriptor we created a volume in /nfsfileshare/symmetric-ds
. The folder symmetric-ds doesn’t exist yet at this stage the user should create the folder manually.
mkdir /nfsfileshare/symmetric-ds
In that new directory we add the following file: headoffice.properties
db.connection.properties=
db.password=foo
rest.api.enable=true
sync.url=http://primary-ds.demo.svc.cluster.local:31415/sync/engsales
group.id=primary
db.init.sql=
registration.url=
db.driver=org.postgresql.Driver
db.user=rep
initial.load.create.first=true
engine.name=engsales
external.id=headoffice
db.validation.query=select 1
cluster.lock.enabled=false
db.url=jdbc\:postgresql\://primary.demo.svc.cluster.local/sales?protocolVersion\=3&stringtype\=unspecified&socketTimeout\=300&tcpKeepAlive\=true
This file is almost exactly the same as a non-Kubernetes deploy except for the IP addresses. In Kubernetes we do not know the IP addresses of the pod before hand but what we do have is a DNS server which we can use to get the IP address of the Symmetric-DS pod primary-ds and the IP address of the PostgreSQL service. See the highlighted sections above.
At this point we can start the pod using:
kubectl create -f symmetric-primary.json
And view the logs using:
kubectl logs primary-ds
You should see:
_____ __ _ ____ _____
/ ___/ __ _____ __ ___ __ ___ _/ /_ ____(_)___ / __ | / ___/
__ \ / / / / _ `_ \/ _ `_ \/ _ \/_ __// __/ / __/ / / / / __ \
___/ // /_/ / // // / // // / __// / / / / / /_ / /_/ / ___/ /
/____/ __ /_//_//_/_//_//_/___/ _/ /_/ /_/__/ /_____/ /____/
/____/
+-----------------------------------------------------------------+
| Copyright (C) 2007-2019 JumpMind, Inc. |
| |
| Licensed under the GNU General Public License version 3. |
| This software comes with ABSOLUTELY NO WARRANTY. |
| See http://www.gnu.org/licenses/gpl.html |
+-----------------------------------------------------------------+
2019-05-06 16:08:21,016 INFO [startup] [SymmetricWebServer] [main] About to start SymmetricDS web server on host:port 0.0.0.0:31415
2019-05-06 16:08:21,973 INFO [sales] [AbstractSymmetricEngine] [symmetric-engine-startup-1] Initializing connection to database
2019-05-06 16:08:22,465 INFO [sales] [JdbcDatabasePlatformFactory] [symmetric-engine-startup-1] Detected database 'PostgreSQL', version '10', protocol 'postgresql'
2019-05-06 16:08:22,499 INFO [sales] [JdbcDatabasePlatformFactory] [symmetric-engine-startup-1] The IDatabasePlatform being used is org.jumpmind.db.platform.postgresql.PostgreSqlDatabasePlatform
Add the symmetric-ds triggers and routes
As per the non-kubernetes deployment we need to add tables to replicate and routes.
For simplicity the relevant tables are:
sym_node
: Identifies the node and configures things like node id, node group, external id, sync urlsym_node_identity
: unique identity for this nodesym_trigger
: specify which tables are being replicated and which router to usesym_router
: create a “router” to route tables to synchronize
Connect to the primary
psql -h 10.108.103.135 -U rep sales
Next we have to create a node group link:
INSERT INTO sym_node_group_link (source_node_group_id, target_node_group_id, data_event_action)
VALUES ('primary','primary','P');
A route:
INSERT INTO sym_router
(router_id,source_node_group_id,target_node_group_id,router_type,router_expression,sync_on_update,sync_on_insert,sync_on_delete,use_source_catalog_schema,create_time,last_update_by,last_update_time)
VALUES
('primary_2_primary','primary','primary','default',null,1,1,1,0,current_timestamp,'console',current_timestamp);
Some parameters:
INSERT INTO sym_parameter
(external_id, node_group_id, param_key, param_value, create_time, last_update_by, last_update_time)
VALUES ('ALL','ALL','push.thread.per.server.count','10',current_timestamp,'console',current_timestamp);
INSERT INTO sym_parameter (external_id, node_group_id, param_key, param_value)
VALUES ('ALL', 'ALL', 'job.pull.period.time.ms', 2000);
INSERT INTO sym_parameter (external_id, node_group_id, param_key, param_value)
VALUES ('ALL', 'ALL', 'job.push.period.time.ms', 2000);
At this point the router is setup, now to add triggers:
INSERT INTO sym_trigger (trigger_id, source_schema_name, source_table_name, channel_id,
sync_on_update, sync_on_insert, sync_on_delete, sync_on_update_condition, sync_on_insert_condition, sync_on_delete_condition, last_update_time, create_time)
VALUES ('public.item', 'public', 'item', 'default', 1, 1, 1, '1=1', '1=1', '1=1', now(), now());
INSERT INTO sym_trigger (trigger_id, source_schema_name, source_table_name, channel_id, sync_on_update, sync_on_insert, sync_on_delete, sync_on_update_condition,
sync_on_insert_condition, sync_on_delete_condition, last_update_time, create_time)
VALUES ('public.saleorder', 'public', 'saleorder', 'default', 1, 1, 1, '1=1', '1=1', '1=1', now(), now());
INSERT INTO sym_trigger_router (trigger_id, router_id, enabled, initial_load_order, create_time, last_update_time)
VALUES ('public.item', 'primary_2_primary', 1, 10, now(), now());
INSERT INTO sym_trigger_router (trigger_id, router_id, enabled, initial_load_order, create_time, last_update_time)
VALUES ('public.saleorder', 'primary_2_primary', 1, 10, now(), now());
At this point we sync the symmetric-ds triggers.
kubectl exec -it primary-ds sh
cd /opt/symmetric-ds
bin/symadmin -e engsales sync-triggers
exit
Open registration on the primary
Login to the primary using:
kubectl exec -it primary-ds sh
cd /opt/symmetric-ds
bin/symadmin -e engsales open-registration primary store
exit
On the primary update the sym_node_security using psql
psql -h 10.108.103.135 -U rep sales
UPDATE sym_node_security SET initial_load_enabled=1 WHERE node_id='store';
Start the secondary Symmertic-DS pod
Create a file named symmetric-secondary.json with:
{
"kind": "Service",
"apiVersion": "v1",
"metadata": {
"name":"secondary-ds"
},
"spec": {
"ports": [{
"protocol": "TCP",
"port": 31416,
"targetPort": 31415,
"nodePort": 30001
}],
"selector": {
"name": "secondary-ds"
},
"type": "NodePort",
"sessionAffinity": "None"
}
}
{
"kind": "Pod",
"apiVersion": "v1",
"metadata": {
"name": "secondary-ds",
"namespace": "demo",
"labels": {
"name": "secondary-ds"
}
},
"spec": {
"containers": [
{
"name": "symmetric-ds",
"image": "jumpmind/symmetricds:latest",
"ports": [
{
"containerPort": 31415,
"protocol": "TCP"
}
],
"volumeMounts": [
{
"mountPath": "/opt/symmetric-ds/engines",
"name": "engines2",
"readOnly": false
},
{
"mountPath": "/opt/symmetric-ds/logs",
"name": "logs",
"readOnly": false
}
]
}
],
"volumes": [
{
"name": "engines2",
"persistentVolumeClaim": {
"claimName": "symmetric-secondary-engines"
}
},
{
"name": "logs",
"emptyDir": {}
}
]
}
}
Note the directory of the engines for the secondary is /nfsfileshare/symmetric-secondary-ds.
The folder symmetric-secondary-ds doesn’t exist yet at this stage, we need to create the folder manually
mkdir /nfsfileshare/symmetric-secondary-ds
In this directory we require a properties file store.properties to define the engine. The relevant sections are highlighted below:
db.connection.properties=
rest.api.enable=true
db.password=foo
sync.url=http://secondary-ds.demo.svc.cluster.local:31416/sync/engsales
group.id=primary
db.init.sql=
db.driver=org.postgresql.Driver
db.user=rep
engine.name=engsales
external.id=store
registration.url=http://primary-ds.demo.svc.cluster.local:31415/sync/engsales
db.validation.query=select 1
cluster.lock.enabled=false
db.url=jdbc:postgresql://secondary.demo.svc.cluster.local:5433/sales?protocolVersion\=3&stringtype\=unspecified&socketTimeout\=300&tcpKeepAlive\=true
Note the registration URL is the primary-ds the others are the secondary.
Now we can start the pod with
kubectl create -f symmetric-secondary.json
View the logs with
kubectl logs -f secondary-ds
At this point you should see:
_____ __ _ ____ _____
/ ___/ __ _____ __ ___ __ ___ _/ /_ ____(_)___ / __ | / ___/
__ \ / / / / _ `_ \/ _ `_ \/ _ \/_ __// __/ / __/ / / / / __ \
___/ // /_/ / // // / // // / __// / / / / / /_ / /_/ / ___/ /
/____/ __ /_//_//_/_//_//_/___/ _/ /_/ /_/__/ /_____/ /____/
/____/
+-----------------------------------------------------------------+
| Copyright (C) 2007-2019 JumpMind, Inc. |
| |
| Licensed under the GNU General Public License version 3. |
| This software comes with ABSOLUTELY NO WARRANTY. |
| See https://www.gnu.org/licenses/gpl.html |
+-----------------------------------------------------------------+
2019-05-06 16:09:15,802 INFO [startup] [SymmetricWebServer] [main] About to start SymmetricDS web server on host:port 0.0.0.0:31415
2019-05-06 16:09:16,737 INFO [sales] [AbstractSymmetricEngine] [symmetric-engine-startup-1] Initializing connection to database
If everything worked properly on the secondary PostgreSQL instance there should be a number of sym_* tables and the item and saleorder table should have been created as well.
At this point we have two engines running, you can now experiment with changing data on either one and see the result on the other. Note this may take a few seconds to replicate.
Note there are some challenges with sequences, which I have not dealt with yet. For instance: if you add a row to item on the primary and then add a row to item on the secondary the sequences will clash. Again the solution is to alter the sequences so that one is odd and the other is even.
ALTER SEQUENCE item_id_seq RESTART 2;
ALTER SEQUENCE item_id_seq INCREMENT 2;
Related Articles
- Sidecar Service Meshes with Crunchy Postgres for Kubernetes
12 min read
- pg_incremental: Incremental Data Processing in Postgres
11 min read
- Smarter Postgres LLM with Retrieval Augmented Generation
6 min read
- Postgres Partitioning with a Default Partition
16 min read
- Iceberg ahead! Analyzing Shipping Data in Postgres
8 min read