Devious SQL: Dynamic DDL in PostgreSQL
Supporting PostgreSQL DBAs is an important part of daily life here at Crunchy Data. I’ve recently run across a few use cases where utility queries based on the current state of the database are needed. A simple example could be where you have a table that is the target of logical replication and the id
column becomes out of sync with the sequence that generated the data. This would result in new rows having primary key conflicts. To correct this issue, you would need to set the sequence to generate values past the current max value in the table.
This example is part of a larger class of problems which are best solved with functionality that SQL by itself does not directly provide: Dynamic DDL. Data Definition Language (DDL) in SQL itself is notoriously non-dynamic, with strict parsing rules, predefined data types, table structures, and queries based on known and articulated columns.
So how can we bend SQL to our will and execute Dynamic DDL Postgres queries without having to manually write these queries each time? In this next installment of my Devious SQL series (see posts #1 and #2), I’ll show you some SQL approaches to get the job done.
Altering sequence restart values
Let us again consider a scenario where we want to explicitly provide the RESTART
value for a sequence via a query. This is an easy thing to express in terms of what we would like to do: we want to reset a sequence to start after the current maximum value of the table it is associated with.
Trying the naïve approach, we get:
ALTER SEQUENCE big_table_id_seq RESTART (SELECT max(id) + 1 FROM big_table);
ERROR: syntax error at or near "(", at character 41
STATEMENT: ALTER SEQUENCE big_table_id_seq RESTART (SELECT max(id) + 1 FROM big_table);
As we can see, this approach isn't supported by the PostgreSQL grammar, as it is expecting an actual value here, not a subquery (as nice as that would be).
So what are some approaches here?
Using psql
variable substitution
If we are using psql
, we have a few options on how to solve this problem. One approach is using psql
variables and first selecting the value we want into a variable, then substituting this value into the expression we pass to psql:
-- use \gset to set a psql variable with the results of this query
SELECT max(id) + 1 as big_table_max from big_table \gset
-- substitute the variable in a new query
ALTER SEQUENCE big_table_id_seq RESTART :big_table_max ;
ALTER SEQUENCE
In this example, we are using the \gset
command to capture the results of the first query and store it for use later in the psql
session. We then interpolate this variable into our expression using the :big_table_max
syntax, which will be passed directly to the PostgreSQL server.
Using psql
's \gexec
command
Another method of utilizing psql
for dynamic SQL is constructing the query as a SELECT
statement returning the statements you wish to run, then using the \gexec
command to execute the underlying queries. First let's look at making ourselves a query that returns the statement we want, then we'll run this statement using \gexec
:
SELECT 'ALTER SEQUENCE big_table_id_seq RESTART ' || max(id) + 1 as query FROM big_table;
SELECT 'ALTER SEQUENCE big_table_id_seq RESTART ' || max(id) + 1 as query FROM big_table \gexec
query
ALTER SEQUENCE big_table_id_seq RESTART 100001
ALTER SEQUENCE
A benefit of this approach compared to the variable substitution one is that this can work with more complex statements and multiple return values, so you could construct queries based on arbitrary conditions and generate more than one SQL query; the first implementation is limited to queries that return single rows at a time. This also gives you a preview of the underlying SQL statement that you will be running before you execute it against the server with \gexec
, so provides some level of safety if you were doing some sort of destructive action in the query.
Dynamic SQL without psql
Not everyone uses psql
as the interface to PostgreSQL, despite its obvious superiority :-), so are there ways to support dynamic SQL using only server-side tools? As it so happens there are several, using basically the same approach of writing a plpgsql
snippet to generate the query, then EXECUTE
to run the underlying utility statement. These roughly correlate to the approaches in the psql
section above in that they work best for single or multiple dynamic statements.
DO
blocks
To use server-side Dynamic SQL we will need to construct our queries using plpgsql
and execute the underlying text as if we were issuing the underlying query ourselves.
DO $$
BEGIN
EXECUTE format('ALTER SEQUENCE big_table_id_seq RESTART %s', (SELECT max(id) + 1 FROM big_table));
END
$$
LANGUAGE plpgsql;
DO
In this case we are using PostgreSQL's built-in format()
function which substitutes arguments similar to printf()
in C-based languages. This allows us to interpolate the subquery result we were wanting in this case, resulting in a string that PostgreSQL can EXECUTE
and giving us the result we want.
Create an exec()
function
Almost identical in function to the DO
block, we can also create a simple plpgsql
function that simply calls EXECUTE
on it input parameter like so:
CREATE OR REPLACE FUNCTION exec(raw_query text) RETURNS text AS $$
BEGIN
EXECUTE raw_query;
RETURN raw_query;
END
$$
LANGUAGE plpgsql;
SELECT exec(format('ALTER SEQUENCE big_table_id_seq RESTART %s', (SELECT max(id) + 1 FROM big_table)));
CREATE FUNCTION
exec
ALTER SEQUENCE big_table_id_seq RESTART 100001
This may seem like a fairly pointless change compared to the previous approach, as we have basically only moved our query into a parameter that we pass in, but what it buys us is the ability to call this function against a list of queries that we construct using normal SQL, giving us the option of running each in turn.
Restrictions
So what type of SQL can be run in each of these sorts of approaches, and are there any restrictions in what we can run via Dynamic SQL with these methods? The main consideration about the different approaches is related to commands that need to be run outside of an explicit transaction block.
Consider if we wanted to run a REINDEX CONCURRENTLY
on all known indexes, so we used the exec()
approach to construct a REINDEX CONCURRENTLY
statement for all indexes in the public
schema:
SELECT
exec(format('REINDEX INDEX CONCURRENTLY %I', relname))
FROM
pg_class
JOIN
pg_namespace
ON pg_class.relnamespace = pg_namespace.oid
WHERE
relkind = 'i' AND
nspname = 'public'
ERROR: REINDEX CONCURRENTLY cannot be executed from a function
CONTEXT: SQL statement "REINDEX INDEX CONCURRENTLY big_table_pkey"
PL/pgSQL function exec(text) line 3 at EXECUTE
As you can see here, this won't work as a function due to REINDEX CONCURRENTLY
needing to manage its own transaction state; in PostgreSQL, functions inherently run inside a transaction to allow the impact of a function to either completely succeed or completely fail. (Atomicity in ACID.)
Let's try this using \gexec
:
SELECT
format('REINDEX INDEX CONCURRENTLY %I', relname)
FROM
pg_class
JOIN
pg_namespace
ON pg_class.relnamespace = pg_namespace.oid
WHERE
relkind = 'i' AND
nspname = 'public'
\gexec
REINDEX
Since the \gexec
handling is done by psql
, the resulting statement is effectively run at the top-level as if it appeared literally in the SQL file.
More advanced usage
Look for a followup blog article where I go into more advanced techniques using Dynamic SQL, particularly using the Devious: longer and less direct than the most straightforward way. ↩\gexec
function or exec()
itself. Until next time, stay devious1!Footnotes
Related Articles
- Postgres Tuning & Performance for Analytics Data
19 min read
- Running an Async Web Query Queue with Procedures and pg_cron
6 min read
- Name Collision of the Year: Vector
9 min read
- Sidecar Service Meshes with Crunchy Postgres for Kubernetes
12 min read
- pg_incremental: Incremental Data Processing in Postgres
11 min read