LATERAL JOINs
Have you ever needed to reference a table in a subquery? With something like the
following, where we try to reference the accounts table in the subquery:
SELECT
accounts.id,
accounts.name,
last_purchase.*
FROM
accounts
INNER JOIN (SELECT
*
FROM purchases
WHERE account_id = accounts.id
ORDER BY created_at DESC
LIMIT 1
) AS last_purchase ON true;
But, you had the following error:
ERROR: invalid reference to FROM-clause entry for table "accounts"
LINE 9: WHERE account_id = accounts.id
^
HINT: There is an entry for table "accounts", but it cannot be referenced from this part of the query.
Enter LATERAL
When you hear a reference to LATERAL, it is typically phrased as a
LATERAL JOIN, but LATERAL is just a keyword that allows us to reference
tables from the top-level query in a sub-query. The following is an example of
INNER JOIN using the LATERAL keyword to find the latest purchase for all
accounts (the only difference in this query and the one above is the use of the
LATERAL keyword):
SELECT
accounts.id,
accounts.name,
last_purchase.*
FROM
accounts
INNER JOIN LATERAL (SELECT
*
FROM purchases
WHERE account_id = accounts.id
ORDER BY created_at DESC
LIMIT 1
) AS last_purchase ON true;
The query above returns all records for accounts, then finds the latest
purchases for each account.id with a limit set to one. The LATERAL keyword
allows us to reference the accounts table in the subquery and filter the
purchases table based on values from the accounts table.
The LATERAL keyword can be combined with any join type or can be used as an
implicit join:
SELECT
accounts.id,
last_purchase.*
FROM
accounts, LATERAL (SELECT
*
FROM purchases
WHERE account_id = accounts.id
ORDER BY created_at DESC
LIMIT 1
) AS last_purchase;
For this example LATERAL works fine because it’s a relatively small data set.
For larger datasets, and this example GROUP BY is a better use. Just remember
that when using LATERAL, it behaves as a recursive loop. For this example, it
is evaluated for as many records as exist in accounts. For that reason, use the
following GROUP BY example if you seek performance or scale.
Using GROUP BY to solve the problem similarly
Prior to LATERAL this same query was solved using GROUP BY, like the following
query. This query uses GROUP BY within a CTE to find the maximum
purchases.created_at for each account account_id, then we join the accounts
and purchases based on their respective values.
WITH latest_purchase_per_account AS (
SELECT
account_id,
MAX(purchases.created_at) AS created_at
FROM purchases
GROUP BY 1
)
SELECT
accounts.id,
purchases.*
FROM latest_purchase_per_account
INNER JOIN accounts ON latest_purchase_per_account.account_id = accounts.id
INNER JOIN purchases ON latest_purchase_per_account.created_at = purchases.created_at
AND latest_purchase_per_account.account_id = purchases.account_id;
With a large number of rows, I anticipate the GROUP BY to perform much faster
than a LATERAL. But, each scenario can be slightly different so it’s
impossible to generalize. It’s important to know two patterns for solving a
problem.
Manipulating JSONB with LATERAL
Like most things SQL, LATERAL solves a simple problem, yet it can be used to
solve complex problems. You’ll find it commonly used in GIS functions and JSON.
Below, we will find matching sub-elements from a JSON structure using LATERAL,
with a conditional to return all of those in California:
SELECT
accounts.id,
accounts.name,
address_elements.value->>'state' AS state,
address_elements.value->>'city' AS city
FROM
accounts,
LATERAL jsonb_array_elements(accounts.addresses) AS address_elements
WHERE
address_elements.value->>'state' = 'California';
In this example, we're utilizing LATERAL with the jsonb_array_elements
function to unpack a JSON array from a column in the accounts table. We then
filter the results based on a specific type, allowing us to target certain
elements within the JSON structure.
Nested element expansion is the most common usage of LATERAL. This is because
nested elements usually exist as a limited set of values, and because LATERAL
works so well with it.
Manipulating Comma Separated Text with LATERAL
In the dataset, someone decided to store all the tags for a purchase as a comma
separated list. Suppose you want to find all purchases that have a specific tag.
You can achieve this by using the unnest function along with the
string_to_array function to split the tags field into an array and then
unnest those values into their own rows.
Here's an example query to find all purchases with the tag 'electronics':
SELECT
accounts.id AS account_id,
accounts.name AS account_name,
purchases.name AS product_name,
unnested_tags.tag
FROM
accounts
INNER JOIN purchases ON accounts.id = purchases.account_id
JOIN LATERAL unnest(REGEXP_SPLIT_TO_ARRAY(purchases.tags, E',')) AS unnested_tags(tag) ON true
WHERE
unnested_tags.tag = 'electronics';
This query performs the following operations:
- Splitting string using a regex: The
REGEXP_SPLIT_TO_ARRAYfunction splits thetagscolumn of thepurchasestable into an array using a comma as the delimiter. - Unnesting Tags: The
UNNESTfunction transforms the array of tags into separate rows - Filter by tag: Finally, the query filters the results to include only the rows with the specified tag.
This approach provides a way to work with comma-separated values in a database and allows for complex queries on individual values that are part of a delimited string.
A similar strategy can be used for counting the purchases per tag:
SELECT
unnested_tags.tag,
COUNT(*) AS purchases_per_tag
FROM
purchases,
LATERAL unnest(REGEXP_SPLIT_TO_ARRAY(purchases.tags, E',')) AS unnested_tags(tag)
GROUP BY
unnested_tags.tag;
Have fun!
LATERAL joins in Postgres provide a powerful and flexible way to perform
complex queries, particularly when referencing a table in a subquery. Whether
you're handling hierarchical data, working with JSON, or dealing with any other
scenarios where you need to reference a table in a subquery, the LATERAL
keyword can be an indispensable tool in your SQL toolkit. Explore its usage in
your queries, and you'll find it can simplify and optimize your SQL code.