How To Get Started with pgBackRest and PostgreSQL 12
pgBackRest is a reliable and simple to configure backup and restore solution for PostgreSQL, which provides a powerful solution for any PostgreSQL database; be it a small project, or scaled up to enterprise-level use cases.
Many powerful features are included in pgBackRest, including parallel backup and restore, local or remote operation, full, incremental, and differential backup types, backup rotation, archive expiration, backup integrity, page checksums, backup resume, streaming compression and checksums, delta restore, and much more.
With the recent release of PostgreSQL 12 (and more recently 12.1), pgBackRest also received a number of updates and changes to take advantage of the latest features of Postgres.
On October 1st 2019, pgBackRest released version 2.18 which is the first release of pgBackRest to support PostgreSQL 12. As such, any deployment using PostgreSQL 12 where pgBackRest will be used requires version 2.18 or greater. At the time of this post, the latest version of pgBackRest is version 2.19.
In the following guide, we will explore the steps involved in configuring pgBackRest on a PostgreSQL 12 database, followed by simulating a disaster where the database files have been destroyed, and restoring a backup to regain the database.
Setting up the Demo
If following this guide for tutorial purposes, it is useful to have a similar working environment to ensure that the same settings, commands, and processes are used while performing the exercise.
For this example, we will be performing the install and configuration of PostgreSQL 12.1 and pgBackRest 2.19 on CentOS 7.
For simplicity, Hashicorp's Vagrant to start and manage the Official CentOS 7 Vagrant Box is recommended.
If using Vagrant, simply run:
cd temp_work_dir
vagrant init centos/7
vagrant up
vagrant ssh
Alternately, you can set up and configure a virtual machine manually, using the following
settings:
- 12 GB single partition ext4 disk
- 2 GB RAM
- 2 CPU
Install PostgreSQL and pgBackRest
There are several methods for installing PostgreSQL on a CentOS 7 server which
are detailed on the PostgreSQL wiki.
For the purposes of this guide, we will install both PostgreSQL 12 and the latest
version of pgBackRest using the PGDG Yum repository.
At the time of this writing, the latest versions of PostgreSQL was version 12.1 and
pgBackRest was version 2.19.
sudo yum -y install https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm
sudo yum -y install postgresql12-server postgresql12-contrib pgbackrest
Next, initialize the PostgreSQL instance with the following commands:
sudo /usr/pgsql-12/bin/postgresql-12-setup initdb
sudo systemctl enable postgresql-12.service
sudo systemctl start postgresql-12.service
The first command is only necessary to run once, and is responsible for
initializing the database in $PGDATA
directory.
The second command ensures PostgreSQL will start automatically when the
operating system is started, and the third command will start the database at
the present time.
To verify PostgreSQL has started correctly, the following command will confirm:
sudo -iu postgres
psql --version psql (PostgreSQL) 12.1
Next, verify pgBackRest was installed correctly by running the following command
either as the default user, or as the postgres
user:
$ pgbackrest
pgBackRest 2.19 - General help
Usage:
pgbackrest [options] [command]
Commands:
archive-get Get a WAL segment from the archive.
archive-push Push a WAL segment to the archive.
backup Backup a database cluster.
check Check the configuration.
expire Expire backups that exceed retention.
help Get help.
info Retrieve information about backups.
restore Restore a database cluster.
stanza-create Create the required stanza data.
stanza-delete Delete a stanza.
stanza-upgrade Upgrade a stanza.
start Allow pgBackRest processes to run.
stop Stop pgBackRest processes from running.
version Get version.
Use 'pgbackrest help [command]' for more information.
Ensuring that the basic commands return valid responses and expected versions
is generally a good practice to follow, as it ensures that the software is functioning
properly and that the versions are compatible with one another.
Configure PostgreSQL
Depending on your specific use case, you may need to configure PostgreSQL's
options to meet your specific environment needs.
Possible considerations would be:
- Using replication or configuring a cluster
- How your backups will be stored (using the 3-2-1 Method?)
- Security concerns for your database
For the purposes of this exercise, we will be following a very simplistic model
which will do best in demonstrating the process, but is not ideal for production
environments.
If you wish to configure your environment to have replica PostgreSQL instances,
for example, further information may be found in these Crunchy Blog Posts.
For the simple purposes of this demonstration, we will configure PostgreSQL
with the least number of changes to make a minimum working model.
If you’re unaware of where your configuration files are on the PostgreSQL host, you can run the following:
$ sudo -iu postgres psql -U postgres -c 'SHOW config_file'
config_file
----------------------------------------
/var/lib/pgsql/12/data/postgresql.conf
(1 row)
Edit the
postgresql.conf
file with root privileges in your preferred text editor.
The following parameters will need to be defined:
listen_addresses = '*'
# Optionally, define the address as the host IP:
listen_addresses = '10.0.1.1'
password_encryption=’scram-sha-256’
archive_mode = on
A brief aside about configured PostgreSQL settings
It is always best practice to have an understanding of the configuration changes made to a database.
For the purposes of this demonstration, three options were changed from their defaults, and it is important
to know why this has happened.
listen_addresses
:
While PostgreSQL's pg_hba.conf
is the file responsible for restricting
connections, when listen_addresses
is set to *
(wildcard), it is possible
to discover the open port on 5432
using nmap
and learn the database
exists, thereby possibly opening the server up for an exploit. Setting it
to the an IP address prevents PostgreSQL from listening on an unintended
interface, preventing this potential exploit. More information on this
specific attack vector and how to avoid it can be found in this blog
post.
password_encryption
:
Starting with the release of PostgreSQL 10, SCRAM-SHA-256 authentication
was available for use, and
is used in this example for the explicit purpose of encouraging secure connections to the database.
Specifically, this method of authentication prevents password sniffing on
untrusted connections and offers support for cryptographically hashing
passwords on the server in a secure manner.
More detailed information on this authentication method can be found
here.
archive_mode
:
Another addition to recent versions of PostgreSQL, starting with the
major release for PostgreSQL 10, a change was introduced to reduce the
number of configuration edits that were necessary to perform streaming
backup and replication (specifically affecting the parameters wal_level
,max_wal_senders
, max_replication_slots
, and hot_standby
- these are now
all set by default). The release notes regarding this change can be found
here.
Once the changes have been made to the PostgreSQL configuration file, restart
the service to allow the changes to take effect.
A restart is necessary in this specific case, as all of the values changed
particularly require the PostgreSQL service to stop and start again.
sudo systemctl restart postgresql-12.service
It is possible to check if the configuration values have been correctly
applied to the database by running the following command:
sudo -iu postgres psql
SELECT name,setting,context,source FROM pg_settings WHERE NAME IN ('listen_addresses','archive_mode','password_encryption');
name | setting | context | source
---------------------+---------------+------------+--------------------
archive_mode | on | postmaster | configuration file
listen_addresses | * | postmaster | configuration file
password_encryption | scram-sha-256 | user | configuration file
(3 rows)
Use \q
to exit from the psql prompt.
Configure pgBackRest
Configure a location for the pgBackRest backup repository:
sudo mkdir -p /var/lib/pgbackrest
sudo chmod 0750 /var/lib/pgbackrest
sudo chown -R postgres:postgres /var/lib/pgbackrest
Configure the location and permissions on the pgbackrest log location:
sudo chown -R postgres:postgres /var/log/pgbackrest
Next, modify pgBackRest's configuration files to meet the needs of the environment.
As best practice, first create a backup of any existing pgbackrest.conf
file:
sudo cp /etc/pgbackrest.conf /etc/pgbackrest.conf.backup
Generate a secure, long, and random passphrase to encrypt the repository:
openssl rand -base64 48
This generated value will be used as the repo1-cipher-pass
option.
NOTE: Once the repository has been configured and the stanza created and
checked, the repository encryption settings cannot be changed.
Next, edit the pgbackrest.conf
file as root, entering the following parameters:
[global]
repo1-cipher-pass=uUQsaa7+CCFaqXVagFzNUix3XuLe9e2uqVskqfI6wcKf8BX8y5b+8bL3oimRpV1N
repo1-cipher-type=aes-256-cbc
repo1-path=/var/lib/pgbackrest
repo1-retention-full=2
log-level-console=info
log-level-file=debug
[demo]
pg1-path=/var/lib/pgsql/12/data
The [global]
section defines the location of backups, logging settings, and encryption settings.
The [demo]
section defines a stanza for the demo
backup repository, which we will configure.
As with the PostgreSQL settings, best practices encourage an understanding of the configuration options.
More information can be found about these configuration options within the pgBackRest Configuration Guide.
Finally, initialize the pgBackRest stanza, which contains the definitions for the
location, archiving options, backup settings, and other similar configurations
for the PostgreSQL database cluster.
There is generally one stanza defined for each database cluster that needs to have backups.
The stanza-create
command must be run on the primary host after pgbackrest.conf
has been configured.
$ sudo -u postgres pgbackrest --stanza=demo stanza-create
2019-11-15 18:08:57.158 P00 INFO: stanza-create command begin 2.19: --log-level-console=info --log-level-file=debug --pg1-path=/var/lib/pgsql/12/data --repo1-cipher-pass=<redacted> --repo1-cipher-type=aes-256-cbc --repo1-path=/var/lib/pgbackrest --stanza=demo
2019-11-15 18:08:57.609 P00 INFO: stanza-create command end: completed successfully (455ms)
Pulling It All Together, Performing First Backup
Now that PostgreSQL and pgBackRest have been configured individually, a few final steps must be
performed to tie them together and perform the backup process.
First, edit the postgresql.conf
file once more, and configure the archive_command
:
archive_command = 'pgbackrest --stanza=demo archive-push %p'
This configuration option informs PostgreSQL to use pgBackRest to handle the WAL
segments, pushing them immediately to the archive.
Following this change to the configuration file, reload the PostgreSQL service:
sudo systemctl reload postgresql-12.service
Next, we will check the cluster with pgBackRest. This validates that pgBackRest
and the archive_command
settings are both accurately configured and performing
as expected.
$ sudo -iu postgres pgbackrest --stanza=demo check
2019-11-15 18:10:03.637 P00 INFO: check command begin 2.19: --log-level-console=info --log-level-file=debug --pg1-path=/var/lib/pgsql/12/data --repo1-cipher-pass=<redacted> --repo1-cipher-type=aes-256-cbc --repo1-path=/var/lib/pgbackrest --stanza=demo
2019-11-15 18:10:04.757 P00 INFO: WAL segment 000000010000000000000001 successfully archived to '/var/lib/pgbackrest/archive/demo/12-1/0000000100000000/000000010000000000000001-bddaecf52ba8c3dd83e6157fea6a4dbeb6476010.gz'
2019-11-15 18:10:04.757 P00 INFO: check command end: completed successfully (1120ms)
If any errors are produced by this command, read and inspect the output for
recommendations on how to resolve the specific issue.
Now, after much ado; perform a full backup:
$ sudo -u postgres pgbackrest --stanza=demo --type=full backup
2019-11-15 18:10:32.421 P00 INFO: backup command begin 2.19: --log-level-console=info --log-level-file=debug --pg1-path=/var/lib/pgsql/12/data --repo1-cipher-pass=<redacted> --repo1-cipher-type=aes-256-cbc --repo1-path=/var/lib/pgbackrest --repo1-retention-full=2 --stanz a=demo --type=full
2019-11-15 18:10:33.555 P00 INFO: execute non-exclusive pg_start_backup() with label "pgBackRest backup started at 2019-11-15 18:10:32": backup begins after the next regular checkpoint completes 2019-11-15 18:10:33.758 P00 INFO: backup start archive = 000000010000000000000003, lsn = 0/3000028
2019-11-15 18:10:35.930 P01 INFO: backup file /var/lib/pgsql/12/data/base/14187/1255 (632KB, 2%) checksum 60325e5cd07379af0ffe91eea27cfd4f2f07af69 [...] 2019-11-15 18:10:38.818 P00 INFO: full backup size = 24.2MB 2019-11-15 18:10:38.818 P00 INFO: execute non-exclusive pg_stop_backup() and wait for all WAL segments to archive
2019-11-15 18:10:38.920 P00 INFO: backup stop archive = 000000010000000000000003, lsn = 0/3000138
2019-11-15 18:10:39.235 P00 INFO: new backup label = 20191115-181032F
2019-11-15 18:10:39.286 P00 INFO: backup command end: completed successfully (6866ms)
And finally, confirm the backup is working:
$ sudo -u postgres pgbackrest info
stanza: demo
status: ok
cipher: aes-256-cbc
db (current)
wal archive min/max (12-1): 000000010000000000000003/000000010000000000000003
full backup: 20191115-181032F
timestamp start/stop: 2019-11-15 18:10:32 / 2019-11-15 18:10:39
wal start/stop: 000000010000000000000003 / 000000010000000000000003
database size: 24.2MB, backup size: 24.2MB
repository size: 2.9MB, repository backup size: 2.9MB
Restore a Backup
Now that a full backup is performed on a fresh database, it might be useful to test restoring from the full backup.
To do this, stop the PostgreSQL instance, and delete its data files, simulating a system administration disaster.
sudo systemctl stop postgresql-12.service
sudo find /var/lib/pgsql/12/data -mindepth 1 -delete
At this point, trying to start the database will result in a failure:
$ sudo systemctl start postgresql-12.service
## THIS WILL FAIL
Job for postgresql-12.service failed because the control process exited with error code. See "systemctl status postgresql-12.service" and "journalctl -xe" for details.
Perform a restore on the database:
sudo -iu postgres pgbackrest --stanza=demo --delta restore
Once the restore has completed, the database will start as expected:
sudo systemctl start postgresql-12.service
You can verify that pgBackRest is still working:
$ sudo -u postgres pgbackrest --stanza=demo check
2019-11-15 18:13:56.707 P00 INFO: check command begin 2.19: --log-level-console=info --log-level-file=debug --pg1-path=/var/lib/pgsql/12/data --repo1-cipher-pass=<redacted> --repo1-cipher-type=aes-256-cbc --repo1-path=/var/lib/pgbackrest --stanza=demo
2019-11-15 18:13:57.594 P00 INFO: WAL segment 000000020000000000000005 successfully archived to '/var/lib/pgbackrest/archive/demo/12-1/0000000200000000/000000020000000000000005-bd01dc079338748cd9772a7c324eed0d68d45a9c.gz'
2019-11-15 18:13:57.594 P00 INFO: check command end: completed successfully (887ms)
After any sort of disaster instance, it is always best practice to follow up any restore with a fresh backup:
sudo -u postgres pgbackrest --stanza=demo --type=full backup
In Conclusion
In conclusion, pgBackRest offers a large amount of possibilities and use-cases.
It is quite simple to install, configure, and use, simplifying Point-in-time recovery through WAL archiving.
Ensuring that backups are working and valid allows for peace of mind, should any disaster strike.
Related Articles
- Sidecar Service Meshes with Crunchy Postgres for Kubernetes
12 min read
- pg_incremental: Incremental Data Processing in Postgres
11 min read
- Smarter Postgres LLM with Retrieval Augmented Generation
6 min read
- Postgres Partitioning with a Default Partition
16 min read
- Iceberg ahead! Analyzing Shipping Data in Postgres
8 min read