When Does ALTER TABLE Require a Rewrite?
Greg Sabino Mullane
6 min readMore by this author
It is rare that a Postgres table keeps the exact same structure year after year. New columns get added. Old columns get dropped. Column data types need to change. Those are all done with the ALTER TABLE command. One big drawback to these changes is that they may force a complete table rewrite. A rewrite means a completely new copy of the table is created, and then the old one is dropped. This can take a very long time for large tables. Worse, everything else is blocked/locked from using the table, so your application may need downtime.
So which commands need a full table rewrite? Which only needs a split-second lock? The alter table documentation has some guidelines, but at the end of the day, you want to know for sure if your production application is going to require major downtime or not. Presented here is a quick recipe for safely determining if a rewrite will happen. In short make a copy of the table, modify that copy, see if the underlying file on disk for that table has changed.
Let’s look at what happens when a table is modified. For this, we will use the good old reliable pgbench_accounts table, which gets created when you run pgbench -i
. We can simulate the effects of a long-running table rewrite by putting our statement into a transaction. For this first one, let’s add a new column to it and see what locks are being held:
greg=> begin;
BEGIN
greg=*> alter table pgbench_accounts add chocolates int;
ALTER TABLE
greg=*> select locktype, mode from pg_locks
where relation::regclass::text = 'pgbench_accounts';
locktype | mode
----------+---------------------
relation | AccessExclusiveLock
The AccessExclusiveLock is on the entire table, and is a very, very strong lock that blocks almost all other access to the table from other processes. Let’s start another process and see what happens:
-- Without the lock_timeout, this update would hang forever,
-- or until the other transaction commits:
greg=> set lock_timeout TO '5s';
SET
greg=> update pgbench_accounts set bid = bid;
ERROR: canceling statement due to lock timeout
LINE 1: update pgbench_accounts set bid = bid;
-- This lock also stops SELECT statements as well!
-- (another way to not wait forever is with a statement_timeout)
greg=> set statement_timeout = '500ms';
SET
greg=> select * from pgbench_accounts limit 10;
ERROR: canceling statement due to statement timeout
This confirms that ALTER TABLE can prevent your application from accessing the table. Yet, all is not lost, watch what happens when we run it without a commit:
greg=> \timing on
Timing is on.
greg=> alter table pgbench_accounts add chocolates bigint;
ALTER TABLE
Time: 2.149 ms
Two milliseconds is quite fast. Most applications should be able to easily deal with that. One caveat is that the lock still needs to be acquired. To do this, it needs to have complete solo control of the table for a split second. For a very busy table, this may take some coordination with your application. The table rewrite may require that your application be blocked for hours, or even days!
Create a test table with CREATE TABLE LIKE
To start, we always want to work on a simulacrum of your important production table. In other words, a copy of the table structure, but without all the table data. Postgres provides a handy way to do this with the LIKE clause of the CREATE TABLE command. You may specify which optional parts of the table to copy over. For our purposes, we only need the column defaults, so we can write this:
CREATE UNLOGGED TABLE gregtest (LIKE pgbench_accounts INCLUDING defaults)
Might as well make this an unlogged table, as this will be a very temporary construct. If we compare the two tables, their structure is identical:
greg=> \d pgbench_accounts
Table "public.pgbench_accounts"
Column | Type | Collation | Nullable | Default
----------+---------------+-----------+----------+---------
aid | integer | | not null |
bid | integer | | |
abalance | integer | | |
filler | character(84) | | |
greg=> \d gregtest
Unlogged table "public.gregtest"
Column | Type | Collation | Nullable | Default
----------+---------------+-----------+----------+---------
aid | integer | | not null |
bid | integer | | |
abalance | integer | | |
filler | character(84) | | |
Was a table rewritten?
Now, we can ALTER our new table. But how do we determine if the table was rewritten? As this table has no rows, we need another way to detect it rather than seeing how long it takes. Every table in Postgres is mapped to one or more physical files in the Postgres data directory. We can see the file for the table with the pg_relation_filenode function. If the output changes, we know that it has been rewritten. First, let’s add a simple BIGINT column with no default, which should not cause a rewrite:
greg=> select pg_relation_filenode('gregtest');
pg_relation_filenode
----------------------
1495135
greg=> alter table gregtest add monkeys bigint;
ALTER TABLE
greg=> select pg_relation_filenode('gregtest');
pg_relation_filenode
----------------------
1495135
Notice that our filenode has not changed. So we still need the Access Exclusive lock, but a rewrite did not happen. Let’s run an example that does rewrite the table:
greg=> select pg_relation_filenode('gregtest');
pg_relation_filenode
----------------------
1495135
greg=> alter table gregtest add foobar4 bigint default random(1,10);
ALTER TABLE
greg=> select pg_relation_filenode('gregtest');
pg_relation_filenode
----------------------
1495140
Per the documentation, a non-static default will force a rewrite. In this case, all the existing rows will have a random number from 1 to 10 inserted into them. Although this table has zero rows, the rewrite happened anyway.
Another common use case is changing the data type of an existing column. As I know from bitter experience, changing from an int to a bigint always requires a rewrite:
greg=> select pg_relation_filenode('gregtest');
pg_relation_filenode
----------------------
1495140
greg=> alter table gregtest alter column bid type bigint;
ALTER TABLE
greg=> select pg_relation_filenode('gregtest');
pg_relation_filenode
----------------------
1495144
There are a few data type changes that do not require a rewrite, so you should always TIAS ("try it and see"). Does going from an 8-byte bigint to a 4-byte force a rewrite?
greg=> select pg_relation_filenode('gregtest');
pg_relation_filenode
----------------------
1495144
greg=> alter table gregtest alter bid type integer;
ALTER TABLE
greg=> select pg_relation_filenode('gregtest');
pg_relation_filenode
----------------------
1495147
Yep! Even though anything we put into a bigint can “fit” inside an integer, they are represented differently on disk and thus need a table rewrite.
Summary
Hopefully this helps you to learn if a rewrite is needed or not, without actually messing with your production table! Don’t forget to drop that test table when you are done. If you need to make a change that requires a table rewrite, but cannot afford the downtime, there are more complex options - such as using logical replication - that can help.
There are a few other operations that can cause a full table rewrite, but they are mostly things you probably will not encounter very often:
- VACUUM FULL
- CLUSTER
- REFRESH MATERIALIZED VIEW
- Changing a table from LOGGED to UNLOGGED, or vice-versa.