Introducing Crunchy Data Warehouse: A next-generation Postgres-native data warehouse. Crunchy Data Warehouse Learn more

Upgrading PostgreSQL from 9.4 to 10.3 with pglogical

Avatar for Douglas Hunley

Douglas Hunley

5 min read

I recently helped a customer upgrade a PostgreSQL instance from 9.4 on RHEL to 10.x on Ubuntu. While it initially sounded daunting, the use of pglogical and some planning actually made it rather straightforward. While there’s nothing new or original in this post, I still felt compelled to write it up for anyone else that might find the info useful as an example in their own endeavors.

pglogical is a logical replication system implemented entirely as a PostgreSQL extension. Fully integrated, it requires no triggers or external programs. This makes it faster than Slony, Londiste, et al. It is also (roughly) the basis upon which logical replication in PostgreSQL 10 core is built.

Installing pglogical

It will need to be installed on both the source (old Pg version) and destination servers (new Pg version).

Configuring pglogical

Tweaking the Cluster Configuration

You will need to adjust the postgresql.conf file to accommodate pglogical. On both the source and destination servers, execute the following commands:

echo "include 'pglogical.conf'" >> $PGDATA/postgresql.conf
echo "wal_level = 'logical'" >> $PGDATA/pglogical.conf
echo "max_worker_processes = 10" >> $PGDATA/pglogical.conf
echo "max_replication_slots = 10" >> $PGDATA/pglogical.conf
echo "max_wal_senders = 10" >> $PGDATA/pglogical.conf
echo "shared_preload_libraries = 'pglogical'" >> $PGDATA/pglogical.conf

NOTE: If you already have one or more values in shared_preload_libraries, simply append pglogical to the list of values already there.

Ensure the Presence of Primary Keys

Logical replication doesn’t work without primary keys. The below SQL commands will help you identify which schema/table pairs do not have a primary key:

SELECT
  n.nspname as schema,
  c.relname as table
FROM
  pg_class c
JOIN
  pg_namespace n
ON n.oid = c.relnamespace
WHERE
  c.relkind = 'r'
AND NOT EXISTS (
  SELECT 1
  FROM pg_constraint con
  WHERE con.conrelid = c.oid
  AND con.contype = 'p'
)
AND n.nspname <> ALL (
  ARRAY [
    'pg_catalog',
    'sys',
    'dbo',
    'information_schema'
  ]
);

Create the pglogical Extension

On both the source and destination Pg instances, create the pglogical extension in every database you wish to replicate by running the following command as a database superuser:

CREATE EXTENSION pglogical;

NOTE: On PostgreSQL 9.4 only you will first need to CREATE EXTENSION pglogical_origin;

Running pglogical

Ensure Global Objects Are Copied

The pglogical tool runs at the database level which means that global objects like roles are not copied. Therefore, you need to ensure these objects are created yourself.

On the source PostgreSQL server:

pg_dumpall -g -f globals.sql

Then copy globals.sql to the destination server and run:

psql -f globals.sql

Prepare the Destination Schema

At this time, pglogical does not replicate DDL, so it is necessary to ensure that both the source and destination have matching schema object definitions before attempting to replicate.

As such, for each source database that you want to replicate, you need to run a ‘schema only’ dump:

pg_dump -Fc -s -f dbname_schema.dmp dbname

Now copy the dbname_schema.dmp file(s) to the destination server, and run for each database:

pg_restore -d dbname dbname_schema.dmp

Create a Replication User

We’ll need a user that has the replication permission for this all to work. create the following user on both the source and destination PostgreSQL instances:

CREATE ROLE pglogical LOGIN REPLICATION SUPERUSER ENCRYPTED PASSWORD 'secret';

Tweak the pg_hba.conf on both the source and destination Pg instances to allow the replication user to connect:

local  replication  pglogical  trust
host   replication  pglogical  0.0.0.0/0  md5
local  dbname       pglogical  trust
host   dbname       pglogical  0.0.0.0/0  md5

NOTE: Make sure to edit 0.0.0.0/0 to match your actual CIDR or IP address and dbname to match the db you wish to replicate.

Create your Publication

Now, we’re ready to actually setup and start the replication. First, we need to SIGHUP the postmaster so it sees all the config changes we made on both the source and target Pg instances:

pg_ctl -D $(ps -efw|grep -- "[p]ost.*-D"|cut -d\- -f2|cut -d" " -f2) reload

On the source Pg instance, we need to create a publication to "push" the data to the new instance:

SELECT pglogical.create_node(node_name := 'dbname_provider', dsn := 'host=127.0.0.1 port=5432 dbname=test user=pglogical');

Adjust the port= and dbname= parameters to match your source Pg instance. If replicating more than one database, repeat this command for each database, changing dbname and dbname_provider accordingly.

Add Your Tables to the Publication

Now that we have a publication channel, we need content to publish. Let’s add that now:

  1. Add all of your tables:

    SELECT pglogical.replication_set_add_all_tables('default', '{public}'::text[]);
    
  2. Add all of your sequences:

    SELECT pglogical.replication_set_add_all_sequences(set_name := 'default', schema_names := '{public}'::text[], synchronize_data := true )
    

You should change public in both the above if you are using a different schema for your objects. If you are using multiple schemas, simply repeat the above and change public appropriately.

NOTE: The nextval of sequences will be synced roughly every 60 to 70 seconds.

Create your Subscription

Now that we have a publication channel and its content defined, we need to setup a subscriber on the target PostgreSQL instance to consume the channel:

SELECT pglogical.create_node(node_name := 'subscriber', dsn := 'host=127.0.0.1 port=5432 dbname=test user=pglogical');

Adjust the dbname= parameter to match your target Pg instance. If replicating more than one database, repeat this command for each database.

Now, tell the subscriber what to subscribe to:

SELECT pglogical.create_subscription(subscription_name := 'subscription', provider_dsn := 'host=172.28.173.18 port=5432 dbname=test user=pglogical', replication_sets := '{default}'::text[] );

Adjust host=, port=, and dbname= parameters to match your source PostgreSQL instance. If replicating more than one database, repeat this command for each database, changing dbname and subscription_name accordingly.

Conclusion

At this point, data should be replicating and (if not already) it will catch up to ‘current’ quickly. Once caught up, replication will maintain sync between the source and target instances in almost real time. You can easily determine the current state of pglogical by issuing this SQL on the subscriber:

SELECT subscription_name, status FROM pglogical.show_subscription_status();

If the query returns initializing then it is copying the original source data to the destination. If the query returns replicating then the initial synchronization has completed and replicating is now happening in real time as data changes.

When ready, you can simply stop any applications pointing at the source Pg instance, wait a few minutes to ensure replication drains any outstanding items, force an update of your sequences:

SELECT pglogical.synchronize_sequence( seqoid ) FROM pglogical.sequence_state;

and then re-point your applications at the target instance.

Post-upgrade, if you wish to clean everything up, you can use the following steps:

  1. Remove the subscription:

    SELECT pglogical.synchronize_sequence( seqoid ) FROM pglogical.sequence_state;
    
  2. Remove the subscriber:

    SELECT pglogical.drop_node('subscriber', true);
    
  3. Remove the extension:

    DROP EXTENSION pglogical CASCADE;
    
  4. Remove the user:

    DROP ROLE pglogical;
    
  5. Remove any pglogical lines in pg_hba.conf

  6. Remove $PGDATA/pglogical.conf

  7. Reload PostgreSQL

  8. Remove the OS packages using yum or apt

The original copy of this article can be found at https://hunleyd.github.io/posts/upgrading-postgresql-from-9.4-to-10.3-with-pglogical/