Introducing Crunchy Data Warehouse: A next-generation Postgres-native data warehouse. Crunchy Data Warehouse Learn more
Tutorial Instructions
This article will contain spoilers both on how to solve Day 1's challenge "Calorie Counting" in SQL. Data has been loaded for you already in a table called calories
.
We have preloaded our data in a table calories
with a single text column, calories_count
. This data file used a blank line for a group separator, which we will need to be aware of in our solution.
Now that we have a table with one row per line, how can we group it? Grouping similar rows together is a job for window functions. In this case, we need to somehow put all rows together until we hit an empty line, then we need to start a new group. Let's create a pseudo-column and use a sequence to increment it only when we find an empty value in our calories_count
column. We will also call setval()
on this column in order to define an initial value so the currval()
function will work.
CREATE SEQUENCE aoc;
SELECT setval('aoc', 1);
SELECT calories_count,
CASE WHEN calories_count = '' THEN nextval('aoc')
ELSE currval('aoc') END
FROM calories LIMIT 10;
calories_count | currval
----------------+---------
9686 | 1
10178 | 1
3375 | 1
9638 | 1
6318 | 1
4978 | 1
5988 | 1
6712 | 1
| 2
10422 | 2
As we can see, the currval
changes when it detects a different group. Let’s use this property to find out the sum of each group. Note that since this is a text
field, we will need to convert to an int
in order to sum these fields. Additionally, since the blank line cannot convert to an int
, we will explicitly detect when the row we’re looking at is the separator and consider its value 0
.
SELECT SUM(calories_count) OVER(partition by currval) FROM
(SELECT CASE WHEN calories_count = '' THEN 0
ELSE calories_count :: int END AS calories_count,
CASE WHEN calories_count = '' THEN nextval('aoc')
ELSE currval('aoc') END
FROM calories
) x LIMIT 10;
sum
-------
56873
56873
56873
56873
56873
56873
56873
56873
43456
43456
In this case since we are using a window function this will calculate the sum of those values once for each row, however the total for the group will be the same.
Note that we don't care which row it came from, nor do we care about the duplicate entries. All we care about is the maximum value in this new table. Rather than a subselect, let's use a CTE to make things look better. Since one of the goals is to do this in as few SQL statements as possible, let's put the setval()
into the top of the CTE:
WITH setup AS (SELECT setval('aoc',1)),
x AS (SELECT CASE WHEN calories_count = '' THEN 0
ELSE calories_count :: int END AS calories_count,
CASE WHEN calories_count = '' THEN nextval('aoc')
ELSE currval('aoc') END
FROM setup, calories),
y AS (SELECT sum(calories_count) OVER(partition by currval) FROM x)
SELECT max(sum) FROM y;
max
-------
69206
(1 row)
Voila! This returns the highest combined number for all the groups.
Loading terminal...
Loading terminal...