Devious SQL: Run the Same Query Against Tables With Differing Columns
We spend time day in, day out, answering the questions that matter and coming up with solutions that make the most sense. However, sometimes a question comes up that is just so darn…interesting that even if there are sensible solutions or workarounds, it still seems like a challenge just to take the request literally. Thus was born this blog series, Devious SQL.
Devious: "longer and less direct than the most straightforward way."
The inaugural question
The basic question/request that inspired this article (and in fact this entire series) was an internal question about whether one could devise a query that ran against a table/view that might have different columns depending on the PostgreSQL version. The specific details are unimportant here, but the basic idea was intriguing. So let's polish up our toolkits and see what we can accomplish.
The naïve approach
It's always good to start with the basics, so let's just try to setup/simulate our environment; we want to have two tables that differ by the addition of a column on one. We will use the same database but two separate schemas in order to allow us to just change the search_path
to simulate running the query against two separate tables.
Here is the SQL:
CREATE SCHEMA db1;
CREATE SCHEMA db2;
CREATE TABLE db1.albums (album_name text, band_name text, release_date date, favorite_song text, rating int);
CREATE TABLE db2.albums (album_name text, band_name text, release_date date);
COPY db1.albums FROM STDIN;
Led Zeppelin Led Zeppelin 01-12-1969 Babe I'm Gonna Leave You 5
Led Zeppelin II Led Zeppelin 10-22-1969 The Lemon Song 5
Led Zeppelin III Led Zeppelin 10-05-1970 Since I've Been Loving You 5
The Wall Pink Floyd 11-30-1979 Comfortably Numb 5
Wish You Were Here Pink Floyd 09-12-1975 Wish You Were Here 5
Black Sabbath Black Sabbath 02-13-1970 N.I.B. 5
Paranoid Black Sabbath 09-18-1970 War Pigs 5
\.
INSERT INTO db2.albums SELECT album_name, band_name, release_date FROM db1.albums;
CREATE SCHEMA
CREATE SCHEMA
CREATE TABLE
CREATE TABLE
COPY 7
INSERT 0 7
As you can see, we've made a simple table to hold our database but one of the tables has additional columns that the other doesn't. Let's try the straightforward approach for our query.
-- running the query against db1-version of the table
SET search_path = db1;
SELECT album_name,favorite_song,rating FROM albums WHERE band_name = 'Led Zeppelin';
album_name favorite_song rating
Led Zeppelin Babe I'm Gonna Leave You 5
Led Zeppelin II The Lemon Song 5
Led Zeppelin III Since I've Been Loving You 5
-- running the query against db2-version of the table
SET search_path = db2;
SELECT album_name,favorite_song,rating FROM albums WHERE band_name = 'Led Zeppelin';
ERROR: column "favorite_song" does not exist
LINE 1: SELECT album_name,favorite_song,rating FROM albums WHERE ban...
^
Next attempt
Whelp, no surprise, the same query won't run against a table that lacks the column in question. But can we get a little more devious about it?
Let's think about it a bit: do we have a data type in SQL that can help us break away from the structured side of column names and deal with things a little more loosely? Such that the parser wouldn't hate us if we mention a column that doesn't exist? Answer: yes, there a few, however the one we care about is our friend JSON! There are functions that let us transform an arbitrary table row into a JSON record (aka to_json()
and the like), and then we can manipulate the entire row as a single object, extract columns from, etc.
Let's make our next attempt via casting the row to JSON and seeing about pulling the fields out dynamically. (Note: here we use the pre-PostgreSQL-14-style JSON key lookup syntax, but see Craig K.'s blog about upcoming changes.)
SET search_path = db1;
SELECT
json->>'album_name',
json->>'favorite_song',
json->>'rating'
FROM
(
SELECT to_json(albums) json
FROM albums
WHERE
band_name = 'Led Zeppelin'
) j
;
?column? ?column? ?column?
Led Zeppelin Babe I'm Gonna Leave You 5
Led Zeppelin II The Lemon Song 5
Led Zeppelin III Since I've Been Loving You 5
SET search_path = db2;
SELECT
json->>'album_name',
json->>'favorite_song',
json->>'rating'
FROM
(
SELECT to_json(albums) json
FROM albums
WHERE
band_name = 'Led Zeppelin'
) j
;
?column? ?column? ?column?
Led Zeppelin
Led Zeppelin II
Led Zeppelin III
All right, no errors! It's quittin' time!
But...
Fixing the deficiencies
We have a few deficiencies here that we could fix instead; we lost the column names, and the data types are all text, plus we don't have a default value for handling things when we run against the missing table. It would be nice to handle things like this and have the results act as much like a "real table" as possible.
So we can certainly structure this query to give it proper labels again, using AS <fieldname>
syntax for the field descriptions. And since we know the expected column data types, we can always use casting to handle things properly. And since the ->>
operator returns NULL
if a key is not found, we can end up using COALESCE()
to provide our defaults. Putting this all together, something like this should work for our example data:
SET search_path = db1;
SELECT
json->>'album_name' as album_name,
COALESCE(json->>'favorite_song', 'They all rock!') as favorite_song,
COALESCE((json->>'rating')::int, 99) as rating
FROM
(
SELECT to_json(albums) json
FROM albums
WHERE
band_name = 'Led Zeppelin'
) j
;
album_name favorite_song rating
Led Zeppelin Babe I'm Gonna Leave You 5
Led Zeppelin II The Lemon Song 5
Led Zeppelin III Since I've Been Loving You 5
SET search_path = db2;
SELECT
json->>'album_name' as album_name,
COALESCE(json->>'favorite_song', 'They all rock!') as favorite_song,
COALESCE((json->>'rating')::int, 99) as rating
FROM
(
SELECT to_json(albums) json
FROM albums
WHERE
band_name = 'Led Zeppelin'
) j
;
album_name favorite_song rating
Led Zeppelin They all rock! 99
Led Zeppelin II They all rock! 99
Led Zeppelin III They all rock! 99
Downsides
Of course there are downsides to this approach, however the mantra of this blog series is not "should we?" but "could we?" Among the downsides, you might need to create an expression index on the to_json()
call to make this performant over large amounts of data; this plus the need to manually tweak the data types and return values to be cast to the appropriate data type you expect makes this a ridiculous approach to use in actuality. Kiddies, don't try this at home.
Related Articles
- Postgres Tuning & Performance for Analytics Data
19 min read
- Running an Async Web Query Queue with Procedures and pg_cron
6 min read
- Name Collision of the Year: Vector
9 min read
- Sidecar Service Meshes with Crunchy Postgres for Kubernetes
12 min read
- pg_incremental: Incremental Data Processing in Postgres
11 min read