Postgres Query Optimization: LEFT JOIN vs UNION ALL
Introduction
The PostgreSQL optimizer is an amazing thing, getting only more amazing with each release. It is able to take information about your data definitions, your data distribution, constraints, and the specific queries and come up with the generally most efficient way to return the results of that query.
Since SQL is a declarative language, we're explicitly giving up defining how the database determines the results and trusting it to get the correct results in whatever method it deems most efficient. Sometimes we can structure queries in a certain way to get a better result than we get with the straightforward approach.
The issue
I was working with one of our clients hosted on Crunchy Bridge, looking at the top queries in pg_stat_statements
by mean time spent. The top outlier was a query of the form:
SELECT
"table_a".*,
COALESCE("table_b"."count", 0) AS table_count
FROM
table_a
LEFT JOIN
table_b ON table_a.id = table_b.a_id
WHERE
NOT bool_1 AND NOT bool_2 AND NOT bool_3
ORDER BY
table_count DESC
LIMIT 100
The average time on this query, as reported by pg_stat_statements
was in the 2-3 second range, which for this client was too long. The tables involved here were decently-sized, with millions of rows per table.
The specific query involved here utilized a table which stored the rollup counts for an additional table; this query was used to support optimizing a COUNT(*)
for each row in a corresponding table. For this specific table, the data distribution was a bit lopsided, so for a common case, the underlying COUNT(*)
would be fast; however there were enough outliers in the table that had huge numbers of counts that the general-purpose plan would not work. For this reason, we use an unlogged table to store the results of the lookups. (Since this version of PostgreSQL does not support UNLOGGED MATERIALIZED VIEWS, we opted for this approach as the generation of this table generated quite a bit of WAL.)
Here is what the plan for the unmodified query looked like:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=978034.80..978036.03 rows=10 width=2934)
-> Gather Merge (cost=978034.80..1430258.92 rows=3680894 width=2934)
Workers Planned: 7
-> Sort (cost=977034.68..978349.28 rows=525842 width=2934)
Sort Key: (COALESCE(table_b.count, '0'::bigint)) DESC
-> Parallel Hash Left Join (cost=87599.95..965671.42 rows=525842 width=2934)
Hash Cond: (table_a.id = table_b.a_id)
-> Parallel Seq Scan on table_a (cost=0.00..874743.68 rows=525842 width=2926)
Filter: ((NOT bool_1) AND (NOT bool_2) AND (NOT bool_3))
-> Parallel Hash (cost=62782.20..62782.20 rows=1985420 width=12)
-> Parallel Seq Scan on table_b (cost=0.00..62782.20 rows=1985420 width=12)
(11 rows)
Since the ORDER BY
clause utilizes an expression, we would not be able to directly use an index here, and since the whole point of this COALESCE
is to provide a default value for rows that do not exist in the joined table, we cannot use some sort of an expression index. As such, we cannot directly use an index to return the values we are interested in and we will need to resort to an exernal sort.
So how can we see about improving this? One thing to note here is that we are substituting in a constant value in our LEFT JOIN
case; basically we are assuming that if there is a missing row, the count for this is now 0
. Let us get our light bulbs ready, since we can use this fact as follows:
A LEFT JOIN
for tables A and B is the intersection of A and B plus the rows in A that do not appear in B. (This is the NULL matching id case.) We cannot index a left join directly, however the intersection case can be indexed quite nicely, and we can perhap do something better than the current case for the null join case. So rather than using a LEFT JOIN
directly, let's create a query which explicitly calculates each part of the result set and appends them, and see how we can improve performance.
For the intersection, we have the following:
SELECT
"table_a".*,
"table_b"."count" AS table_count
FROM
table_a
JOIN
table_b ON table_a.id = table_b.a_id
WHERE
NOT bool_1 AND NOT bool_2 AND NOT bool_3
ORDER BY
table_count DESC
For the rows in A not in B we have:
SELECT
"table_a".*,
0 AS table_count
FROM
table_a
LEFT JOIN
table_b ON table_a.id = table_b.a_id
WHERE
NOT bool_1 AND NOT bool_2 AND NOT bool_3
AND table_b.id IS NULL
Combining these two gives us the following query:
SELECT * FROM (
SELECT
"table_a".*,
"table_b"."count" AS table_count
FROM
table_a
JOIN
table_b ON table_a.id = table_b.a_id
WHERE
NOT bool_1 AND NOT bool_2 AND NOT bool_3
ORDER BY
table_count DESC
) intersect
UNION ALL
SELECT * FROM (
SELECT
"table_a".*,
0 AS table_count
FROM
table_a
LEFT JOIN
table_b ON table_a.id = table_b.a_id
WHERE
NOT bool_1 AND NOT bool_2 AND NOT bool_3
AND table_b.id IS NULL
) antijoin
LIMIT 100
Since in our original query there was no secondary ordering, we don't care what the order of the antijoin rows are; they are always the same constant sort value, so the natural order returned by the database is sufficient. Also, since our intersecting set is an indexed column, our ORDER BY count DESC
can be fulfilled by using a reverse btree index scan, saving us a sort node.
That gives us the following plan, which we can see now is much more performant:
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.87..19.30 rows=10 width=6116) (actual time=0.049..0.132 rows=10 loops=1)
-> Append (cost=0.87..6784732.67 rows=3680878 width=6116) (actual time=0.048..0.130 rows=10 loops=1)
-> Nested Loop (cost=0.87..5172687.85 rows=1956032 width=2934) (actual time=0.048..0.119 rows=10 loops=1)
-> Index Scan Backward using table_b_count on table_b (cost=0.43..191262.84 rows=7941680 width=12) (actual time=0.016..0.033 rows=12 loops=1)
-> Index Scan using table_a_pkey on table_a (cost=0.43..0.63 rows=1 width=2926) (actual time=0.006..0.006 rows=1 loops=12)
Index Cond: (id = table_b.a_id)
Filter: ((NOT bool_1) AND (NOT bool_2) AND (NOT bool_3))
Rows Removed by Filter: 0
-> Subquery Scan on "*SELECT* 2" (cost=0.87..1554519.81 rows=1724847 width=2934) (never executed)
-> Merge Anti Join (cost=0.87..1532959.22 rows=1724847 width=2930) (never executed)
Merge Cond: (table_a_1.id = table_b_1.a_id)
-> Index Scan using table_a_pkey on table_a table_a_1 (cost=0.43..1332577.10 rows=3680879 width=2926) (never executed)
Filter: ((NOT bool_1) AND (NOT bool_2) AND (NOT bool_3))
-> Index Only Scan using table_b_a_id on table_b table_b_1 (cost=0.43..152158.13 rows=7941680 width=4) (never executed)
Heap Fetches: 0
Planning Time: 1.107 ms
Execution Time: 0.244 ms
(17 rows)
We went from 2-3 seconds execution time to under 1ms for the exact same result set. This same technique could come in handy for any time there is a LEFT JOIN
that part of the results can be answered with an index, so rather than using a heavier ORDER BY
for all total results, you can optimize each part individually and combine.
Additionally, since we were applying a LIMIT
to the result set, you can see that the antijoin subselect is not even calculated in this case, since the number of rows we were requesting had already been fulfilled by the intersecting set. It goes without saying that skipping a calculation is always faster than running the calculation, no matter how fast.
I hope this technique comes in handy in your own query optimization journey.
Related Articles
- Name Collision of the Year: Vector
9 min read
- Sidecar Service Meshes with Crunchy Postgres for Kubernetes
12 min read
- pg_incremental: Incremental Data Processing in Postgres
11 min read
- Smarter Postgres LLM with Retrieval Augmented Generation
6 min read
- Postgres Partitioning with a Default Partition
16 min read