PostgreSQL Unlogged Tables - Look Ma, No WAL!
While supporting customers at Crunchy Data, I sometimes see users add unlogged tables. In this post, I'm going to review some of the specifics about this.
Unlogged tables in Postgresql are a way of preventing specific tables from generating WAL (Write Ahead Log) information by adding the unlogged
keyword to a create table
command:
CREATE UNLOGGED TABLE foobar (id int);
You can also change existing tables to unlogged, and switch them back again:
ALTER TABLE mytable SET UNLOGGED; -- cheap!
ALTER TABLE mytable SET LOGGED; -- expensive!
While there are strong advantages to using unlogged tables, you must use them carefully as their disadvantages make the number of valid use cases small.
Postgres unlogged table advantages
- Massive improvements to write performance (as seen below).
- Less vacuum impact (because vacuum changes are writes that also end up in the WAL stream).
- Less total WAL (leading to less traffic, smaller backups, etc.)
Postgres unlogged table disadvantages
- Tables are truncated on crash recovery. No durability!.
- Unlogged tables can only be accessed on the primary, not on the replicas.
- Unlogged tables can NOT be used in logical replication, or physical backups.
Unlogged tables in Postgres generate no WAL and are faster to update
Let's verify that an unlogged table prevents generating WAL. First, we will create a test table using the pgbench utility, and then update it a bit. Not only will we see how long the updates take, but we will peek inside the pg_wal directory and see how much WAL was created. For this test, it's important that autovacuum remains off and no other clients are connecting, to keep things honest.
$ # Ensure that autovacuum is (temporarily!) disabled:
$ psql -c 'alter system set autovacuum=off' -c 'select pg_reload_conf()'
$ # Generate a table with one million rows:
$ pgbench -i -s 10
dropping old tables...
creating tables...
generating data (client-side)...
1000000 of 1000000 tuples (100%) done (elapsed 1.25 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done in 3.60 s (drop tables 0.04 s, create tables 0.01 s,
client-side generate 1.28 s, vacuum 1.81 s, primary keys 0.48 s).
$ ## Update one million rows:
$ touch gregtest; time psql -c 'update pgbench_accounts set aid=aid'
UPDATE 1000000
real 0m5.592s
user 0m0.002s
sys 0m0.000s
$ ## Determine how much WAL was generated by the above command:
$ DD=$(psql -Atc 'show data_directory')
$ find $DD/pg_wal -newer gregtest -size 16M | wc -l | xargs echo 16* | bc
320
To update a million rows in a normal, logged Postgres table took us 5.6 seconds, and generated roughly 320MB of WAL. How much different will an unlogged table be?
$ ## pgbench can directly create its tables as unlogged:
$ pgbench -i -s 10 --unlogged
dropping old tables...
creating tables...
generating data (client-side)...
1000000 of 1000000 tuples (100%) done (elapsed 0.73 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done in 1.56 s (drop tables 0.03 s, create tables 0.03 s, client-side generate 0.80 s, vacuum 0.15 s, primary keys 0.20 s).
$ touch gregtest; time psql -c 'update pgbench_accounts set aid=aid'
UPDATE 1000000
real 0m3.173s
user 0m0.002s
sys 0m0.000s
$ find $DD/pg_wal -newer gregtest -size 16M | wc -l | xargs echo 16* | bc
16
Not only did we perform the update twice as fast as before, both the table creation time and the amount of WAL generated shrunk as well! Since pgbench is a benchmarking utility, let's have it run for 10 seconds and see the difference in speed for the default test pgbench uses:
$ pgbench -i -s 10
$ pgbench --time 10
pgbench (15.1)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10
query mode: simple
number of clients: 1
number of threads: 1
maximum number of tries: 1
duration: 10 s
number of transactions actually processed: 1680
number of failed transactions: 0 (0.000%)
latency average = 5.949 ms
initial connection time = 7.761 ms
tps = 168.087557 (without initial connection time)
Now unlogged:
$ pgbench -i -s 10 --unlogged
$ pgbench -T 10
pgbench (15.1)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10
query mode: simple
number of clients: 1
number of threads: 1
maximum number of tries: 1
duration: 10 s
number of transactions actually processed: 4857
number of failed transactions: 0 (0.000%)
latency average = 2.059 ms
initial connection time = 2.522 ms
tps = 485.706168 (without initial connection time)
That's a big difference! 168 transactions per second versus 485 transactions per second for the unlogged version. So while the performance gains are real, let's look closer at the drawbacks.
Unlogged tables in Postgres are not persistent
The data in unlogged tables is ephemeral and lives outside of the strict data protection that Postgres usually provides. So should someone literally pull the plug on your Postgres database server, your data is 100% safe, except for unlogged tables, in which your data is 100% gone - by design! Because things are not logged, there is no way to guarantee data integrity, so the table is simply truncated. Note that a simple restart will NOT truncate the table:
$ pgbench -i -s 10 --unlogged
$ psql -Atc 'select count(*) from pgbench_accounts'
1000000
$ pg_ctl -D data restart
waiting for server to shut down.... done
server stopped
...
server started
$ psql -Atc 'select count(*) from pgbench_accounts'
1000000
However, a crash will truncate it. We can force Postgres to crash by forcibly killing one of its subsystems. In this example, we will pick on the walwriter backend. Do not do this on a production system!
$ pgbench -i -s 10 --unlogged
$ kill -9 $(psql -Atc "select pid from pg_stat_activity where backend_type = 'walwriter'")
$ psql -Atc 'select count(*) from pgbench_accounts'
psql: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed: FATAL: the database system is in recovery mode
$ psql -Atc 'select count(*) from pgbench_accounts'
0
That data is lost forever. Lucky for us, it was only random sample data generated by the pgbench program, so we can recreate it easy enough using pgbench -i
. If you are using unlogged tables, you must always be ready to either lose the data forever, or have a way to quickly repopulate the table.
Unlogged tables in Postgres are not replicated
Because streaming replication (and for that matter, log shipping of WAL files) uses information from the WAL files to update replicas, unlogged tables are not able to be replicated. They do exist on the replica, but are empty and not accessible.
$ ## Spin up a replica
$ pg_basebackup -R -c fast -D data_replica
$ ## Set a new port:
$ echo port=5555 >> data_replica/postgresql.conf
$ ## Cluster names are handy:
$ echo "cluster_name='myreplica'" >> data_replica/postgresql.conf
$ ## We do not want this replica sending WAL files to pgbackrest
$ echo "archive_mode=off" >> data_replica/postgresql.conf
$ ## Start up the replica on port 5555
$ pg_ctl -D data_replica start
$ ## Generate some WAL traffic on the primary (default port)
$ pgbench -i --unlogged; pgbench -T 3
$ ## Confirm replication is happening
$ psql -c 'select application_name, replay_lag from pg_stat_replication'
application_name | replay_lag
------------------+-----------------
myreplica | 00:00:00.001221
On the replica, the table is not simply empty, it is completely inaccessible and throws an error when we try to view it:
$ psql -p 5432 -Atc 'select count(*) from pgbench_accounts'
100000
$ psql -p 5555 -Atc 'select count(*) from pgbench_accounts'
ERROR: cannot access temporary or unlogged relations during recovery
Here, "recovery" means that the the replica is constantly recovering, or playing through the WAL stream as it comes in. Because the WAL stream does not contain information about any changes in the table, we cannot look at its contents. Indeed, the underlying data files do not even exist on the replica. While the replica is almost a byte for byte copy of the primary, unlogged tables are absolutely part of the "almost" at the start of this sentence.
Postgres disk files for a regular table
$ pgbench -i
$ PRIMARYDIR=$( psql -Atc 'show data_directory' )
$ REPLICADIR=$( psql -p 5555 -Atc 'show data_directory' )
$ FILENODE=$( psql -Atc "select pg_relation_filenode('pgbench_accounts')" )
$ find $PRIMARYDIR -ls | grep $FILENODE | sort -k9
363139 13120 -rw------- 1 greg greg 13434880 Dec 25 8:03 data/base/5/94858
363184 24 -rw------- 1 greg greg 24576 Dec 25 8:03 data/base/5/94858_fsm
363183 8 -rw------- 1 greg greg 8192 Dec 25 8:03 data/base/5/94858_vm
$ find $REPLICADIR -ls | grep $FILENODE | sort -k9
363174 13120 -rw------- 1 greg greg 13434880 Dec 25 8:03 data_replica/base/5/94858
363180 24 -rw------- 1 greg greg 24576 Dec 25 8:03 data_replica/base/5/94858_fsm
364135 8 -rw------- 1 greg greg 8192 Dec 25 8:03 data_replica/base/5/94858_vm
The fsm and vm tables stand for "free space map" and "visibility map" respectively, and contain metadata about the actual data, which as we can see is 13,434,880 bytes on both system. Unlogged tables look different:
Postgres disk files for an unlogged table
$ pgbench -i --unlogged
$ FILENODE=$( psql -Atc "select pg_relation_filenode('pgbench_accounts')" )
$ find $PRIMARYDIR -ls | grep $FILENODE | sort -k9
363180 13120 -rw------- 1 greg greg 13434880 Dec 25 8:03 data/base/5/94880
363174 24 -rw------- 1 greg greg 24576 Dec 25 8:03 data/base/5/94880_fsm
363181 0 -rw------- 1 greg greg 0 Dec 25 8:03 data/base/5/94880_init
363133 8 -rw------- 1 greg greg 8192 Dec 25 8:03 data/base/5/94880_vm
$ find data_replica -ls | grep 94880 | sort -k9
363173 0 -rw------- 1 greg greg 0 Dec 25 8:03 data_replica/base/5/94880_init
The init
file is an empty version of the "real" table called an initialization fork whose job is to replace the real file if the database crashes and starts up again, thus performing the "truncation" part of the unlogged tables lifecycle.
While we cannot access the contents of the table on the replica, all of its metadata is still in place, so things like \d
still work fine:
$ psql -p 5555 -c '\d pgbench_accounts'
Unlogged table "public.pgbench_accounts"
Column | Type | Collation | Nullable | Default
----------+---------------+-----------+----------+---------
aid | integer | | not null |
bid | integer | | |
abalance | integer | | |
filler | character(84) | | |
Indexes:
"pgbench_accounts_pkey" PRIMARY KEY, btree (aid)
Unlogged tables in Postgres are not backed up
Tools like pgBackRest exclude temporary and unlogged tables from backups. Again, this means that you must treat all your data in unlogged tables as very transient. Let's prove that backups do not copy the files:
$ psql -c 'create table aaa as select 867 as id from generate_series(5,309)'
$ psql -c 'create unlogged table bbb as select 867 as id from generate_series(5,309)'
$ psql -c 'vacuum aaa, bbb'
$ AAA=$( psql -Atc "select pg_relation_filenode('aaa')" ) ## gives 980808
$ BBB=$( psql -Atc "select pg_relation_filenode('bbb')" ) ## gives 980811
$ pgbackrest --stanza=mydemo backup
$ grep -E "$AAA|$BBB" /var/log/pgbackrest/mydemo-backup.log | cut -d\( -f1 | sort -k7
2023-12-25 13:09:37.482 P02 DETAIL: backup file /home/greg/pg/15/data_replica/base/5/980808
2023-12-25 13:09:34.492 P02 DETAIL: backup file /home/greg/pg/15/data_replica/base/5/980808_fsm
2023-12-25 13:09:37.479 P02 DETAIL: backup file /home/greg/pg/15/data_replica/base/5/980808_vm
2023-12-25 13:09:40.658 P02 DETAIL: backup file /home/greg/pg/15/data_replica/base/5/980811_init
If you peer closely at the output of our pgbackrest logs above, you will notice that table 'aaa' had three files for it backed up: the main file holding the rows, a visibility map file, and a free space map file. In contrast, the only thing that got copied for our unlogged table 'bbb' was the "init" file, which upon a restore will become a new and empty version of the "bbb" table. Also observe that our backup pulled the file from the replica, thanks to the awesome backup-standby pgBackRest feature.
Be careful, because tools that do logical backups such as pg_dump will still have the data rows, although this is not necessarily a good thing :)
$ pg_dump -t bbb | wc -l
350
Changing a Postgres logged table to unlogged is cheap
If you take a table that is logged and change it to unlogged, the process is relatively quick. While a new file on disk is written, no WAL is generated.
$ pgbench -i -s 20
$ time psql -c 'alter table pgbench_accounts set unlogged'
ALTER TABLE
real 0m1.754s
user 0m0.000s
sys 0m0.004s
Changing a Postgres unlogged table to logged is expensive
On the other hand, changing a table from unlogged to logged means writing information about the entire file to the WAL stream, which can be expensive:
$ pgbench -i -s 10 --unlogged
$ time psql -c 'alter table pgbench_accounts set logged'
ALTER TABLE
real 0m16.601s
user 0m0.004s
sys 0m0.000s
PostgreSQL unlogged tables - final judgement
As we have seen, the use of unlogged tables in Postgres can give you a very large performance improvement, but with some very large drawbacks. If the data is temporary and/or easily replaced, you may want to give unlogged tables a try. Just treat your unlogged tables as living in a Schrödinger's box, in which at any moment the table could contain all the rows, or none of the rows! :)
Related Articles
- Postgres Tuning & Performance for Analytics Data
19 min read
- Running an Async Web Query Queue with Procedures and pg_cron
6 min read
- Name Collision of the Year: Vector
9 min read
- Sidecar Service Meshes with Crunchy Postgres for Kubernetes
12 min read
- pg_incremental: Incremental Data Processing in Postgres
11 min read