Introducing Crunchy Data Warehouse: A next-generation Postgres-native data warehouse. Crunchy Data Warehouse Learn more

Tutorial Instructions

LATERAL JOIN

LATERAL JOINs

Have you ever needed to reference a table in a subquery? With something like the following, where we try to reference the accounts table in the subquery:

SELECT accounts.id, accounts.name, last_purchase.* FROM accounts INNER JOIN (SELECT * FROM purchases WHERE account_id = accounts.id ORDER BY created_at DESC LIMIT 1 ) AS last_purchase ON true;

But, you had the following error:

ERROR:  invalid reference to FROM-clause entry for table "accounts"
LINE 9:  WHERE account_id = accounts.id
                            ^
HINT:  There is an entry for table "accounts", but it cannot be referenced from this part of the query.

Enter LATERAL

When you hear a reference to LATERAL, it is typically phrased as a LATERAL JOIN, but LATERAL is just a keyword that allows us to reference tables from the top-level query in a sub-query. The following is an example of INNER JOIN using the LATERAL keyword to find the latest purchase for all accounts (the only difference in this query and the one above is the use of the LATERAL keyword):

SELECT accounts.id, accounts.name, last_purchase.* FROM accounts INNER JOIN LATERAL (SELECT * FROM purchases WHERE account_id = accounts.id ORDER BY created_at DESC LIMIT 1 ) AS last_purchase ON true;

The query above returns all records for accounts, then finds the latest purchases for each account.id with a limit set to one. The LATERAL keyword allows us to reference the accounts table in the subquery and filter the purchases table based on values from the accounts table.

The LATERAL keyword can be combined with any join type or can be used as an implicit join:

SELECT accounts.id, last_purchase.* FROM accounts, LATERAL (SELECT * FROM purchases WHERE account_id = accounts.id ORDER BY created_at DESC LIMIT 1 ) AS last_purchase;

For this example LATERAL works fine because it’s a relatively small data set. For larger datasets, and this example GROUP BY is a better use. Just remember that when using LATERAL, it behaves as a recursive loop. For this example, it is evaluated for as many records as exist in accounts. For that reason, use the following GROUP BY example if you seek performance or scale.

Using GROUP BY to solve the problem similarly

Prior to LATERAL this same query was solved using GROUP BY, like the following query. This query uses GROUP BY within a CTE to find the maximum purchases.created_at for each account account_id, then we join the accounts and purchases based on their respective values.

WITH latest_purchase_per_account AS ( SELECT account_id, MAX(purchases.created_at) AS created_at FROM purchases GROUP BY 1 ) SELECT accounts.id, purchases.* FROM latest_purchase_per_account INNER JOIN accounts ON latest_purchase_per_account.account_id = accounts.id INNER JOIN purchases ON latest_purchase_per_account.created_at = purchases.created_at AND latest_purchase_per_account.account_id = purchases.account_id;

With a large number of rows, I anticipate the GROUP BY to perform much faster than a LATERAL. But, each scenario can be slightly different so it’s impossible to generalize. It’s important to know two patterns for solving a problem.

Manipulating JSONB with LATERAL

Like most things SQL, LATERAL solves a simple problem, yet it can be used to solve complex problems. You’ll find it commonly used in GIS functions and JSON. Below, we will find matching sub-elements from a JSON structure using LATERAL, with a conditional to return all of those in California:

SELECT accounts.id, accounts.name, address_elements.value->>'state' AS state, address_elements.value->>'city' AS city FROM accounts, LATERAL jsonb_array_elements(accounts.addresses) AS address_elements WHERE address_elements.value->>'state' = 'California';

In this example, we're utilizing LATERAL with the jsonb_array_elements function to unpack a JSON array from a column in the accounts table. We then filter the results based on a specific type, allowing us to target certain elements within the JSON structure.

Nested element expansion is the most common usage of LATERAL. This is because nested elements usually exist as a limited set of values, and because LATERAL works so well with it.

Manipulating Comma Separated Text with LATERAL

In the dataset, someone decided to store all the tags for a purchase as a comma separated list. Suppose you want to find all purchases that have a specific tag. You can achieve this by using the unnest function along with the string_to_array function to split the tags field into an array and then unnest those values into their own rows.

Here's an example query to find all purchases with the tag 'electronics':

SELECT accounts.id AS account_id, accounts.name AS account_name, purchases.name AS product_name, unnested_tags.tag FROM accounts INNER JOIN purchases ON accounts.id = purchases.account_id JOIN LATERAL unnest(REGEXP_SPLIT_TO_ARRAY(purchases.tags, E',')) AS unnested_tags(tag) ON true WHERE unnested_tags.tag = 'electronics';

This query performs the following operations:

  1. Splitting string using a regex: The REGEXP_SPLIT_TO_ARRAY function splits the tags column of the purchases table into an array using a comma as the delimiter.
  2. Unnesting Tags: The UNNEST function transforms the array of tags into separate rows
  3. Filter by tag: Finally, the query filters the results to include only the rows with the specified tag.

This approach provides a way to work with comma-separated values in a database and allows for complex queries on individual values that are part of a delimited string.

A similar strategy can be used for counting the purchases per tag:

SELECT unnested_tags.tag, COUNT(*) AS purchases_per_tag FROM purchases, LATERAL unnest(REGEXP_SPLIT_TO_ARRAY(purchases.tags, E',')) AS unnested_tags(tag) GROUP BY unnested_tags.tag;

Have fun!

LATERAL joins in Postgres provide a powerful and flexible way to perform complex queries, particularly when referencing a table in a subquery. Whether you're handling hierarchical data, working with JSON, or dealing with any other scenarios where you need to reference a table in a subquery, the LATERAL keyword can be an indispensable tool in your SQL toolkit. Explore its usage in your queries, and you'll find it can simplify and optimize your SQL code.

Loading terminal...

Loading terminal...