Multi-Cloud Strategies with Crunchy Postgres for Kubernetes
Crunchy Postgres for Kubernetes supports cross-datacenter streaming replication out of the box. With so many folks asking about cross-cloud and cross-datacenter replication, we wanted to give a thorough explanation of how it works. For this post, we use streaming replication, prioritizing low latency and stability.
Cross-cloud streaming replication can be used:
- To enable multi-cloud disaster recovery
- For moving clusters between cloud providers
- For moving clusters between on-premises and cloud
Given the power of this feature, we decided to incorporate streaming replication directly into PGO. With the 5.2 release, this is easily configurable through the postgrescluster spec without the need for manual Postgres configuration to set up the streaming replication.
Set Up Cloud Environments
In this sample scenario, we will create postgresclusters in both EKS and GKE. EKS will be used as our primary environment, and GKE will be the standby. PGO will need to be deployed in EKS and GKE to create postgresclusters in both environments.
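If you do not already have PGO running in each environment, one common approach is the Kustomize-based install from the PGO quickstart. The commands below are a sketch that assumes you have cloned the postgres-operator-examples repository and have kubectl contexts named eks-primary and gke-standby (your context names will differ); check the installation docs for your PGO version.

$ git clone https://github.com/CrunchyData/postgres-operator-examples.git
$ cd postgres-operator-examples
# install PGO into the EKS (primary) cluster
$ kubectl --context eks-primary apply -k kustomize/install/namespace
$ kubectl --context eks-primary apply --server-side -k kustomize/install/default
# repeat the install against the GKE (standby) cluster
$ kubectl --context gke-standby apply -k kustomize/install/namespace
$ kubectl --context gke-standby apply --server-side -k kustomize/install/default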
The standby database needs to connect directly to the primary database over the network. This means the primary environment (EKS) needs to be able to create services with an external IP. In this example, we are using the LoadBalancer service type, which is easily configurable through the postgrescluster spec.
Both postgresclusters will need copies of the same TLS certificates to allow replication. Please look at the custom TLS section of our docs for guidance on creating custom cert secrets in the format that PGO expects. This will need to be done in both environments. In this example, we have copies of the cluster-cert and replication-cert secrets in both Kubernetes environments.
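As a rough sketch of copying those secrets, assuming they already exist in the EKS cluster and your kubectl contexts are named eks-primary and gke-standby (hypothetical names), you could do something like:

# export the cert secrets from the primary (EKS) environment
$ kubectl --context eks-primary -n postgres-operator get secret cluster-cert -o yaml > cluster-cert.yaml
$ kubectl --context eks-primary -n postgres-operator get secret replication-cert -o yaml > replication-cert.yaml
# apply the same secrets in the standby (GKE) environment
$ kubectl --context gke-standby -n postgres-operator apply -f cluster-cert.yaml
$ kubectl --context gke-standby -n postgres-operator apply -f replication-cert.yaml

You may want to strip cluster-specific metadata (resourceVersion, uid, creationTimestamp) from the exported YAML before applying it to the second cluster.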
Create Clusters
Now that our cloud environments are configured, we can create the primary and standby clusters. First, we will create the primary cluster and allow it to start up. Then we will take note of the external IP created for the primary service on the cluster. After we have the IP, we can create our standby cluster.
Primary
For the primary, we create a postgrescluster with the following spec. We have defined the custom TLS certs that we created in both environments. We also specified that the service that exposes the PostgreSQL primary instance should have the type LoadBalancer.
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: primary
  namespace: postgres-operator
spec:
  service:
    type: LoadBalancer
  postgresVersion: 14
  customTLSSecret:
    name: cluster-cert
  customReplicationTLSSecret:
    name: replication-cert
  instances:
    - name: instance1
      replicas: 1
      dataVolumeClaimSpec:
        {
          accessModes: [ReadWriteOnce],
          resources: { requests: { storage: 1Gi } },
        }
  backups:
    pgbackrest:
      repos:
        - name: repo1
          volume:
            volumeClaimSpec:
              {
                accessModes: [ReadWriteOnce],
                resources: { requests: { storage: 1Gi } },
              }
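Assuming the spec above is saved as primary.yaml and your current kubectl context points at the EKS cluster, applying it and watching the instance pods come up looks roughly like:

$ kubectl apply -f primary.yaml
$ kubectl -n postgres-operator get pods --selector=postgres-operator.crunchydata.com/cluster=primary --watch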
After you create a postgrescluster with this spec, wait for the initial backup to complete and the cluster to become ready. Once it is, your primary is ready and you can start setting up the standby. Before you switch to the GKE cluster, you will need the external IP from the primary-ha service.
$ k get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
primary-ha LoadBalancer 10.100.4.48 a078e7d173f214d9ca0e7d122052aa5a-1097707392.us-east-1.elb.amazonaws.com 5432:30985/TCP
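Rather than copying the value by hand, you can also pull it out of the service status with jsonpath. This is a sketch assuming the service is named primary-ha in the postgres-operator namespace; on AWS the load balancer is exposed as a hostname, while other clouds may populate the ip field instead:

$ kubectl -n postgres-operator get svc primary-ha -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'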
Standby
Now that we have the primary cluster, we can create our standby. Here we are using the spec.standby fields in the PostgresCluster spec. When filling out the standby spec, we have a few options. You can provide a host, a repoName, or both. In this scenario, we are using streaming replication and will need to provide a host. The host in the spec below is the external IP we copied from the primary-ha service.
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: standby
  namespace: postgres-operator
spec:
  standby:
    enabled: true
    host: a2eb494c1f05a414dafb62743c790ba1-2010516878.us-east-1.elb.amazonaws.com
  postgresVersion: 14
  customTLSSecret:
    name: cluster-cert
  customReplicationTLSSecret:
    name: replication-cert
  instances:
    - name: instance1
      replicas: 1
      dataVolumeClaimSpec:
        {
          accessModes: [ReadWriteOnce],
          resources: { requests: { storage: 1Gi } },
        }
  backups:
    pgbackrest:
      repos:
        - name: repo1
          volume:
            volumeClaimSpec:
              {
                accessModes: [ReadWriteOnce],
                resources: { requests: { storage: 1Gi } },
              }
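With kubectl pointed at the GKE cluster, applying the standby spec works the same way as the primary. Assuming it is saved as standby.yaml and a context named gke-standby (hypothetical):

$ kubectl --context gke-standby apply -f standby.yaml
$ kubectl --context gke-standby -n postgres-operator get pods --selector=postgres-operator.crunchydata.com/cluster=standby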
The standby cluster will look slightly different from the primary. You can expect the standby to have instance pods (one for every replica defined in the spec) and a repo-host pod. One thing you will not see is an initial backup on the cluster.
Verify Streaming Replication
Now that we have a standby using streaming replication, it is a good time to check that replication is configured correctly and working as expected. The first thing you should check is that any data you create is replicated over to the standby. The time this takes will depend on network latency and the size of the data. If you see data from your primary database on the standby, streaming replication is active and you are good to go.
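One quick way to check is to write a row on the primary and read it back on the standby. The pod names below are placeholders; yours will have different generated suffixes:

# on the primary (EKS)
$ kubectl exec -it primary-instance1-abcd-0 -c database -- psql -c "CREATE TABLE replication_check (id int); INSERT INTO replication_check VALUES (1);"
# shortly afterwards, on the standby (GKE)
$ kubectl exec -it standby-instance1-bkbl-0 -c database -- psql -c "SELECT * FROM replication_check;"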
If you have exec privileges in your Kubernetes cluster, there are a few commands you can use to verify data replication and streaming. In the following commands, we exec into the standby database, check that the walreceiver process is running, and check that we have a streaming status in pg_stat_wal_receiver.
$ kubectl exec -it standby-instance1-bkbl-0 -c database -- bash
bash-4.4$ ps -U postgres -x | grep walreceiver
95 ?        Ss     0:10 postgres: standby-ha: walreceiver streaming 0/A000000
bash-4.4$ psql -c "select pid,status,sender_host from pg_stat_wal_receiver;"
pid | status | sender_host
-----+-----------+-------------------------------------------------------------------------
95 | streaming | a078e7d173f214d9ca0e7d122052aa5a-1097707392.us-east-1.elb.amazonaws.com
(1 row)
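You can run a similar check from the primary side. On the primary instance pod, pg_stat_replication should show a streaming connection for the standby (again, the pod name here is a placeholder):

$ kubectl exec -it primary-instance1-abcd-0 -c database -- psql -c "select client_addr, state, sync_state from pg_stat_replication;"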
Promote the Standby
Now that you can see your data being replicated from the primary to the standby, you are ready to promote the standby in a disaster scenario. This is done by updating the spec of the standby cluster so that standby.enabled is set to false, or by removing the standby section entirely.
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: standby
spec:
  standby:
    enabled: false
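You can make this change by editing the manifest and re-applying it, or with a one-line patch. A sketch, assuming the standby cluster lives in the postgres-operator namespace:

$ kubectl -n postgres-operator patch postgrescluster standby --type merge -p '{"spec":{"standby":{"enabled":false}}}'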
After you promote the standby, it will work as a fully functioning postgrescluster that you can back up, scale, and use as you would expect. You can also use the new primary to create another standby cluster!
Conclusion
If you've been looking for a solution for streaming replication, you may have come across Brian Pace's article earlier this year on streaming replication using pgBackRest. I'm excited that with PGO 5.2, this is even easier to set up. Streaming replication adds another tool to our operator, helping customers build the disaster recovery solutions that meet their needs.