Iterators in PostgreSQL with Lateral Joins
There you are writing some SQL, having a great time. Uh oh, you need to iterate over each item in a result set and apply a function. You think, "Now I am going to have to write a stored procedure." Well today's post will give you an alternative by using lateral joins in Postgres.
What is a Lateral Join?
Lateral... See what I did there? You are probably familiar with normal database joins, which are usually used to match up a column in one table with a column in another table to bring the data from both tables together. When the keyword LATERAL is added to your join, the output will now apply the right hand part of the join to every record in the left part of the join.
Key Takeaway
Here are the two pieces of "magic" which can help you think about what a lateral provides:
- We run everything after the lateral for each row returned before the lateral.
- All the columns before the lateral are actually available to use after the lateral.
This online class has a nice clear example that uses generate_series
to clear demonstrate this effect. generate_series(x, y)
generates a set of numbers starting with x and ending with y (inclusive) with a step size of 1.
So if we look at the example SQL in the exercise you can see this in action:
SELECT *
FROM generate_series(1,4) cross join lateral generate_series(1,generate_series.generate_series) as g2;
The left side of the join is generating a series from 1 to 4 while the right side is taking the number from the left side and using it as the max number to generate in a new series. If you look at the output the effect is quite clear.
generate_series | gs2
-----------------+-----
1 | 1
2 | 1
2 | 2
3 | 1
3 | 2
3 | 3
4 | 1
4 | 2
4 | 3
4 | 4
For 1 on the left side we get a row with 1 on the right side. But then for 2 on the left side, first we get a row with 2 on the left and 1 and then we get another row with for the left and 2 for the right. This pattern continues until we get through all 4 elements generated on the left side.
You are probably saying, "That's cute and all but can you show how this might be useful in real life?"
Ask and you shall receive, let's look at some helpful queries.
Real Life Examples
I am not going to go too in depth here but one example is having a user defined function that returns more than 1 row. For example, what if you had a function that generated "top 3 next movie recommendations per user" (movie_rec will be the name of the function).
You could write SQL like this:
SELECT users.name,
recc.name,
recc.rank
FROM users CROSS JOIN LATERAL movie_rec(users.user_id) as recc(name, rank)
Where user_id is the user's id from the users table. Without the lateral in your statement, there would be no way to pass the user_id into your function.
Another great example is returning the top N features. Again if we took the movie example and wanted to look at the top 5 movies streamed by zip code of the user. Assume we have a table geo which is just geographies and a table streams which is the name and the count of all streams per zip code. We could write :
SELECT geo.zipcode,
geo.state,
movie.name
FROM geo CROSS JOIN LATERAL
(SELECT movie_name
FROM streams
WHERE geo.zipcode = streams.zipcode
ORDER BY streams.county DESC limit 5) AS movie(name);
Wrap Up
While today was about Lateral joins, I would also suggest you learn about the power of LATERAL with subqueries. If you add a LATERAL to your subqueries then each subquery can share column references. Take a look at this nice article for a good example. Just be aware you could achieve the same reuse with CTEs (but that is a topic for another day).
I hope you are intrigued enough to now go and try the Lateral Joins tutorial on our learning portal with your own two hands.
Would love to hear if you find the hands-on exercise useful or your fun adventures with Lateral joins. Leave your comments or hints below!
Thanks and happy iterating!
Related Articles
- Postgres Tuning & Performance for Analytics Data
19 min read
- Running an Async Web Query Queue with Procedures and pg_cron
6 min read
- Name Collision of the Year: Vector
9 min read
- Sidecar Service Meshes with Crunchy Postgres for Kubernetes
12 min read
- pg_incremental: Incremental Data Processing in Postgres
11 min read