Introducing Crunchy Data Warehouse: A next-generation Postgres-native data warehouse. Crunchy Data Warehouse Learn more

Tutorial Instructions

Advent of Code - Day 3

For day 3 of the AOC (Advent of Code), we are given a file in which each line represents items in a rucksack. There are two compartments, divided equally, and we need to find which item is in common for the sides, generate a score based on what character it is, then sum up the scores.

Let’s peek at our data first:

SELECT * FROM aoc_day3 limit 10;
                      items
-----------------------------------------------
 lflZfgnSnlmmlgGfjGthQPtLNsQhvbHLLpSS
 zrCVDVFMJTCTcCJMwCThWbtbpbWpPbtbHPLQssLsHP
 rBFcrwFzFwwVDcDrzTzJfnRGjllBdGZnnZfhqmdn
 FjpnFRDmbRtnbJrFJmSTsGShWVhGqGVVsmqs
 ZwPvNPdzNZwfzBNLdNNNNcLvhnQhqMTVsTGSWSqGqTdVWhMT
 vgLZHfvLffNLPbggnrbFpJnCbC
 hzJzGjGfqmGtDQtDSvVV
 plpcMBNBcCTlTgCMbvtrsSVsVJDJlrwDQr
 McHBMMcTTHgJnWqnRqjzZnnRzR
 ppvsGZhDGprrSjSllwfZ

So a random assortment of strings, with the limitation that all characters are [a-zA-z] and there are an even number of characters in each row. Our first step is to divide things into two sections. We will use the built-in SQL functions length(), left(), and right() to accomplish this:

SELECT *, left(items, length(items)/2) AS first, right(items, length(items)/2) AS second FROM aoc_day3;
-[ RECORD 1 ]--------------------------------------------
items  | lflZfgnSnlmmlgGfjGthQPtLNsQhvbHLLpSS
first  | lflZfgnSnlmmlgGfjG
second | thQPtLNsQhvbHLLpSS
-[ RECORD 2 ]--------------------------------------------
items  | zrCVDVFMJTCTcCJMwCThWbtbpbWpPbtbHPLQssLsHP
first  | zrCVDVFMJTCTcCJMwCThW
second | btbpbWpPbtbHPLQssLsHP
-[ RECORD 3 ]--------------------------------------------
items  | rBFcrwFzFwwVDcDrzTzJfnRGjllBdGZnnZfhqmdn
first  | rBFcrwFzFwwVDcDrzTzJ
second | fnRGjllBdGZnnZfhqmdn

I'm trying to not repeat any functions or actions when possible during this challenge, but I think we can let the double length() slide. Now we need to find letters that are in common to both the first and second sections. This can be done in many ways, but we'll simply replace any letters in the first section that also exists in the second with an empty character, using the regexp_replace() function:

WITH q AS ( SELECT *, length(items), left(items, length(items)/2) AS first, right(items, length(items)/2) AS second FROM aoc_day3 ) ,r AS (SELECT first, second, regexp_replace(first, '[^'||second||']', '', 'g') AS common FROM q) SELECT * FROM r;
          first           |          second          | common
--------------------------+--------------------------+--------
 lflZfgnSnlmmlgGfjG       | thQPtLNsQhvbHLLpSS       | S
 zrCVDVFMJTCTcCJMwCThW    | btbpbWpPbtbHPLQssLsHP    | W
 rBFcrwFzFwwVDcDrzTzJ     | fnRGjllBdGZnnZfhqmdn     | B
 FjpnFRDmbRtnbJrFJm       | STsGShWVhGqGVVsmqs       | mm
 ZwPvNPdzNZwfzBNLdNNNNcLv | hnQhqMTVsTGSWSqGqTdVWhMT | dd
 vgLZHfvLffNLP            | bggnrbFpJnCbC            | g
 hzJzGjGfqm               | GtDQtDSvVV               | GG
 plpcMBNBcCTlTgCMb        | vtrsSVsVJDJlrwDQr        | ll
 McHBMMcTTHgJn            | WqnRqjzZnnRzR            | n
 ppvsGZhDGp               | rrSjSllwfZ               | Z

We will use the left() function again to grab a single character, then do a simple mapping to get the score for that character. The puzzle states that a-z are worth 1-26 and A-Z are worth 27-52. Note that we need to use collation 'C', to ensure our "less than" check works in a case-insensitive manner. The ascii() function has the curious effect of only converting the first letter of any string passed to it, so we pass the duplicated letters directly to it. Finally, we add up all the score to get our final result.

WITH q AS ( SELECT *, left(items, length(items)/2) AS first, right(items, length(items)/2) AS second FROM aoc_day3 ) ,r AS (SELECT regexp_replace(first, '[^'||second||']', '', 'g') AS common FROM q) ,s AS ( SELECT ascii(common) - (case when left(common,1) < 'a' COLLATE "C" THEN 38 else 96 end) AS score FROM r) SELECT sum(score) FROM s;
 sum
-----
 8401

This puzzle has a second part. Using the same input, we now need to group the lines into groups of three, and find out which character is in common across all three lines, then generate a score for that character. The first trick is grouping things by threes. We are going to approach this with a windowing function. First, we want to grab out current line, and the values from the two rows "before" it. I changed it to left(x,5) just to make the output fit a little better:

SELECT left(items,5) AS one, left( LAG(items,1) over(), 5) AS two, left( LAG(items,2) over(), 5) AS three FROM aoc_day3;
  one  |  two  | three
-------+-------+-------
 lflZf |       | 
 zrCVD | lflZf | 
 rBFcr | zrCVD | lflZf
 FjpnF | rBFcr | zrCVD
 ZwPvN | FjpnF | rBFcr
 vgLZH | ZwPvN | FjpnF
 hzJzG | vgLZH | ZwPvN
 plpcM | hzJzG | vgLZH
 McHBM | plpcM | hzJzG
 ppvsG | McHBM | plpcM

When we get to the third row, we have also captured the two rows before it. In other words, we have groups of three that we can manipulate. The next issue is to only pick every third row. As this is coming from a file via file_fdw, we can be sure that the order stays consistent. We will use a sequence to allow counting by threes and filter out the other two rows in every triad. To be on the safe side, we reset the sequence it case it already exists, to force Postgres to start counting from the number one:

CREATE SEQUENCE if not exists aoc; SELECT setval('aoc', 1, false); WITH p AS (SELECT items AS one, lag(items,1) over () AS two, lag(items,2) over() AS three FROM aoc_day3) SELECT left(one,5), left(two,5), left(three,5) FROM p WHERE 0 = nextval('aoc')%3;
 left  | left  | left
-------+-------+-------
 rBFcr | zrCVD | lflZf
 vgLZH | ZwPvN | FjpnF
 McHBM | plpcM | hzJzG

Next we need to find, for each row, the character that all three share. We will do this similar to the regexpreplace before, but this time we need to find all shared characters between two of them, and then check \_that returned list against the final one. The setval is crucial here, so we add that to our statement as well:

WITH setup AS (SELECT setval('aoc',1,false)) ,p AS ( SELECT items AS one, lag(items,1) over () AS two, lag(items,2) over() AS three FROM aoc_day3, setup) ,q AS (SELECT * FROM p WHERE 0=nextval('aoc')%3) ,r AS ( SELECT *, regexp_replace(regexp_replace(one, '[^'||two||']', '','g'), '[^'||three||']', '','g') AS common FROM q) SELECT left(one,5), common FROM r;
 left  | common
-------+--------
 rBFcr | h
 vgLZH | nn
 McHBM | J

Finally, we simply use our code from before to grab the first character from the common column, assign it a score, and then tally the whole thing up:

WITH setup AS (SELECT setval('aoc',1,false)) ,p AS ( SELECT items AS one, lag(items,1) over () AS two, lag(items,2) over() AS three FROM aoc_day3, setup) ,q AS (SELECT * FROM p WHERE 0=nextval('aoc')%3) ,r AS ( SELECT *, regexp_replace(regexp_replace(one, '[^'||two||']', '','g'), '[^'||three||']', '','g') AS common FROM q) ,s AS ( SELECT ascii(common) - (CASE WHEN left(common,1) < 'a' COLLATE "C" THEN 38 ELSE 96 END) AS score FROM r) SELECT sum(score) from s;
 sum
-----
  2641

All done! Onwards to Day 4...

Loading terminal...

Loading terminal...