Introducing Crunchy Data Warehouse: A next-generation Postgres-native data warehouse. Crunchy Data Warehouse Learn more
Tutorial Instructions
Have you ever needed to reference a table in a subquery? With something like the following, where we try to reference the accounts
table in the subquery:
SELECT
accounts.id,
accounts.name,
last_purchase.*
FROM
accounts
INNER JOIN (SELECT
*
FROM purchases
WHERE account_id = accounts.id
ORDER BY created_at DESC
LIMIT 1
) AS last_purchase ON true;
But, you had the following error:
ERROR: invalid reference to FROM-clause entry for table "accounts"
LINE 9: WHERE account_id = accounts.id
^
HINT: There is an entry for table "accounts", but it cannot be referenced from this part of the query.
When you hear a reference to LATERAL
, it is typically phrased as a LATERAL JOIN
, but LATERAL
is just a keyword that allows us to reference tables from the top-level query in a sub-query. The following is an example of INNER JOIN
using the LATERAL
keyword to find the latest purchase for all accounts (the only difference in this query and the one above is the use of the LATERAL
keyword):
SELECT
accounts.id,
accounts.name,
last_purchase.*
FROM
accounts
INNER JOIN LATERAL (SELECT
*
FROM purchases
WHERE account_id = accounts.id
ORDER BY created_at DESC
LIMIT 1
) AS last_purchase ON true;
The query above returns all records for accounts
, then finds the latest purchases
for each account.id
with a limit set to one. The LATERAL
keyword allows us to reference the accounts
table in the subquery and filter the purchases
table based on values from the accounts
table.
The LATERAL
keyword can be combined with any join type or can be used as an implicit join:
SELECT
accounts.id,
last_purchase.*
FROM
accounts, LATERAL (SELECT
*
FROM purchases
WHERE account_id = accounts.id
ORDER BY created_at DESC
LIMIT 1
) AS last_purchase;
For this example LATERAL
works fine because it’s a relatively small data set. For larger datasets, and this example GROUP BY
is a better use. Just remember that when using LATERAL
, it behaves as a recursive loop. For this example, it is evaluated for as many records as exist in accounts. For that reason, use the following GROUP BY
example if you seek performance or scale.
Prior to LATERAL
this same query was solved using GROUP BY, like the following query. This query uses GROUP BY
within a CTE to find the maximum purchases.created_at
for each account account_id
, then we join the accounts and purchases based on their respective values.
WITH latest_purchase_per_account AS (
SELECT
account_id,
MAX(purchases.created_at) AS created_at
FROM purchases
GROUP BY 1
)
SELECT
accounts.id,
purchases.*
FROM latest_purchase_per_account
INNER JOIN accounts ON latest_purchase_per_account.account_id = accounts.id
INNER JOIN purchases ON latest_purchase_per_account.created_at = purchases.created_at
AND latest_purchase_per_account.account_id = purchases.account_id;
With a large number of rows, I anticipate the GROUP BY
to perform much faster than a LATERAL
. But, each scenario can be slightly different so it’s impossible to generalize. It’s important to know two patterns for solving a problem.
Like most things SQL, LATERAL
solves a simple problem, yet it can be used to solve complex problems. You’ll find it commonly used in GIS functions and JSON. Below, we will find matching sub-elements from a JSON structure using LATERAL
, with a conditional to return all of those in California:
SELECT
accounts.id,
accounts.name,
address_elements.value->>'state' AS state,
address_elements.value->>'city' AS city
FROM
accounts,
LATERAL jsonb_array_elements(accounts.addresses) AS address_elements
WHERE
address_elements.value->>'state' = 'California';
In this example, we're utilizing LATERAL
with the jsonb_array_elements
function to unpack a JSON array from a column in the accounts
table. We then filter the results based on a specific type, allowing us to target certain elements within the JSON structure.
Nested element expansion is the most common usage of LATERAL
. This is because nested elements usually exist as a limited set of values, and because LATERAL
works so well with it.
In the dataset, someone decided to store all the tags for a purchase as a comma separated list. Suppose you want to find all purchases that have a specific tag. You can achieve this by using the unnest
function along with the string_to_array
function to split the tags
field into an array and then unnest those values into their own rows.
Here's an example query to find all purchases with the tag 'electronics':
SELECT
accounts.id AS account_id,
accounts.name AS account_name,
purchases.name AS product_name,
unnested_tags.tag
FROM
accounts
INNER JOIN purchases ON accounts.id = purchases.account_id
JOIN LATERAL unnest(REGEXP_SPLIT_TO_ARRAY(purchases.tags, E',')) AS unnested_tags(tag) ON true
WHERE
unnested_tags.tag = 'electronics';
This query performs the following operations:
REGEXP_SPLIT_TO_ARRAY
function splits the tags
column of the purchases
table into an array using a comma as the delimiter.UNNEST
function transforms the array of tags into separate rowsThis approach provides a way to work with comma-separated values in a database and allows for complex queries on individual values that are part of a delimited string.
A similar strategy can be used for counting the purchases per tag:
SELECT
unnested_tags.tag,
COUNT(*) AS purchases_per_tag
FROM
purchases,
LATERAL unnest(REGEXP_SPLIT_TO_ARRAY(purchases.tags, E',')) AS unnested_tags(tag)
GROUP BY
unnested_tags.tag;
LATERAL
joins in Postgres provide a powerful and flexible way to perform complex queries, particularly when referencing a table in a subquery. Whether you're handling hierarchical data, working with JSON, or dealing with any other scenarios where you need to reference a table in a subquery, the LATERAL
keyword can be an indispensable tool in your SQL toolkit. Explore its usage in your queries, and you'll find it can simplify and optimize your SQL code.
Loading terminal...
Loading terminal...