Introducing Crunchy Data Warehouse: A next-generation Postgres-native data warehouse. Crunchy Data Warehouse Learn more
Tutorial Instructions
For day 3 of the AOC (Advent of Code), we are given a file in which each line represents items in a rucksack. There are two compartments, divided equally, and we need to find which item is in common for the sides, generate a score based on what character it is, then sum up the scores.
Let’s peek at our data first:
SELECT * FROM aoc_day3 limit 10;
items
-----------------------------------------------
lflZfgnSnlmmlgGfjGthQPtLNsQhvbHLLpSS
zrCVDVFMJTCTcCJMwCThWbtbpbWpPbtbHPLQssLsHP
rBFcrwFzFwwVDcDrzTzJfnRGjllBdGZnnZfhqmdn
FjpnFRDmbRtnbJrFJmSTsGShWVhGqGVVsmqs
ZwPvNPdzNZwfzBNLdNNNNcLvhnQhqMTVsTGSWSqGqTdVWhMT
vgLZHfvLffNLPbggnrbFpJnCbC
hzJzGjGfqmGtDQtDSvVV
plpcMBNBcCTlTgCMbvtrsSVsVJDJlrwDQr
McHBMMcTTHgJnWqnRqjzZnnRzR
ppvsGZhDGprrSjSllwfZ
So a random assortment of strings, with the limitation that all characters are
[a-zA-z]
and there are an even number of characters in each row. Our first
step is to divide things into two sections. We will use the built-in SQL
functions length()
, left()
, and right()
to accomplish this:
SELECT *, left(items, length(items)/2) AS first, right(items, length(items)/2) AS second
FROM aoc_day3;
-[ RECORD 1 ]--------------------------------------------
items | lflZfgnSnlmmlgGfjGthQPtLNsQhvbHLLpSS
first | lflZfgnSnlmmlgGfjG
second | thQPtLNsQhvbHLLpSS
-[ RECORD 2 ]--------------------------------------------
items | zrCVDVFMJTCTcCJMwCThWbtbpbWpPbtbHPLQssLsHP
first | zrCVDVFMJTCTcCJMwCThW
second | btbpbWpPbtbHPLQssLsHP
-[ RECORD 3 ]--------------------------------------------
items | rBFcrwFzFwwVDcDrzTzJfnRGjllBdGZnnZfhqmdn
first | rBFcrwFzFwwVDcDrzTzJ
second | fnRGjllBdGZnnZfhqmdn
I'm trying to not repeat any functions or actions when possible during this
challenge, but I think we can let the double length() slide. Now we need to find
letters that are in common to both the first and second sections. This can be
done in many ways, but we'll simply replace any letters in the first section
that also exists in the second with an empty character, using the
regexp_replace()
function:
WITH q AS (
SELECT *, length(items), left(items, length(items)/2) AS first, right(items, length(items)/2) AS second
FROM aoc_day3
)
,r AS (SELECT first, second, regexp_replace(first, '[^'||second||']', '', 'g') AS common FROM q)
SELECT * FROM r;
first | second | common
--------------------------+--------------------------+--------
lflZfgnSnlmmlgGfjG | thQPtLNsQhvbHLLpSS | S
zrCVDVFMJTCTcCJMwCThW | btbpbWpPbtbHPLQssLsHP | W
rBFcrwFzFwwVDcDrzTzJ | fnRGjllBdGZnnZfhqmdn | B
FjpnFRDmbRtnbJrFJm | STsGShWVhGqGVVsmqs | mm
ZwPvNPdzNZwfzBNLdNNNNcLv | hnQhqMTVsTGSWSqGqTdVWhMT | dd
vgLZHfvLffNLP | bggnrbFpJnCbC | g
hzJzGjGfqm | GtDQtDSvVV | GG
plpcMBNBcCTlTgCMb | vtrsSVsVJDJlrwDQr | ll
McHBMMcTTHgJn | WqnRqjzZnnRzR | n
ppvsGZhDGp | rrSjSllwfZ | Z
We will use the left() function again to grab a single character, then do a
simple mapping to get the score for that character. The puzzle states that a-z
are worth 1-26 and A-Z
are worth 27-52. Note that we need to use collation
'C', to ensure our "less than" check works in a case-insensitive manner. The
ascii() function has the curious effect of only converting the first letter of
any string passed to it, so we pass the duplicated letters directly to it.
Finally, we add up all the score to get our final result.
WITH q AS (
SELECT *, left(items, length(items)/2) AS first, right(items, length(items)/2) AS second
FROM aoc_day3
)
,r AS (SELECT regexp_replace(first, '[^'||second||']', '', 'g') AS common FROM q)
,s AS (
SELECT ascii(common) -
(case when left(common,1) < 'a' COLLATE "C" THEN 38 else 96 end) AS score
FROM r)
SELECT sum(score) FROM s;
sum
-----
8401
This puzzle has a second part. Using the same input, we now need to group the
lines into groups of three, and find out which character is in common across all
three lines, then generate a score for that character. The first trick is
grouping things by threes. We are going to approach this with a windowing
function. First, we want to grab out current line, and the values from the two
rows "before" it. I changed it to left(x,5)
just to make the output fit a
little better:
SELECT left(items,5) AS one,
left( LAG(items,1) over(), 5) AS two,
left( LAG(items,2) over(), 5) AS three
FROM aoc_day3;
one | two | three
-------+-------+-------
lflZf | |
zrCVD | lflZf |
rBFcr | zrCVD | lflZf
FjpnF | rBFcr | zrCVD
ZwPvN | FjpnF | rBFcr
vgLZH | ZwPvN | FjpnF
hzJzG | vgLZH | ZwPvN
plpcM | hzJzG | vgLZH
McHBM | plpcM | hzJzG
ppvsG | McHBM | plpcM
When we get to the third row, we have also captured the two rows before it. In other words, we have groups of three that we can manipulate. The next issue is to only pick every third row. As this is coming from a file via file_fdw, we can be sure that the order stays consistent. We will use a sequence to allow counting by threes and filter out the other two rows in every triad. To be on the safe side, we reset the sequence it case it already exists, to force Postgres to start counting from the number one:
CREATE SEQUENCE if not exists aoc;
SELECT setval('aoc', 1, false);
WITH p AS (SELECT items AS one, lag(items,1) over () AS two, lag(items,2) over() AS three FROM aoc_day3)
SELECT left(one,5), left(two,5), left(three,5)
FROM p
WHERE 0 = nextval('aoc')%3;
left | left | left
-------+-------+-------
rBFcr | zrCVD | lflZf
vgLZH | ZwPvN | FjpnF
McHBM | plpcM | hzJzG
Next we need to find, for each row, the character that all three share. We will do this similar to the regexpreplace before, but this time we need to find all shared characters between two of them, and then check \_that returned list against the final one. The setval is crucial here, so we add that to our statement as well:
WITH setup AS (SELECT setval('aoc',1,false))
,p AS (
SELECT items AS one, lag(items,1) over () AS two, lag(items,2) over() AS three
FROM aoc_day3, setup)
,q AS (SELECT * FROM p WHERE 0=nextval('aoc')%3)
,r AS (
SELECT *,
regexp_replace(regexp_replace(one, '[^'||two||']', '','g'), '[^'||three||']', '','g') AS common
FROM q)
SELECT left(one,5), common FROM r;
left | common
-------+--------
rBFcr | h
vgLZH | nn
McHBM | J
Finally, we simply use our code from before to grab the first character from the common column, assign it a score, and then tally the whole thing up:
WITH setup AS (SELECT setval('aoc',1,false))
,p AS (
SELECT items AS one, lag(items,1) over () AS two, lag(items,2) over() AS three
FROM aoc_day3, setup)
,q AS (SELECT * FROM p WHERE 0=nextval('aoc')%3)
,r AS (
SELECT *,
regexp_replace(regexp_replace(one, '[^'||two||']', '','g'), '[^'||three||']', '','g') AS common
FROM q)
,s AS (
SELECT ascii(common) -
(CASE WHEN left(common,1) < 'a' COLLATE "C" THEN 38 ELSE 96 END) AS score
FROM r)
SELECT sum(score) from s;
sum
-----
2641
All done! Onwards to Day 4...
Loading terminal...
Loading terminal...