How to Solve Advent of Code 2022 Using Postgres - Day 1
Time for the annual Advent of Code challenge for 2022! If you are not familiar with AOC, every December a new puzzle is added each day, getting harder as it goes along, and the challenge is to solve it any way you can by transforming an input file into an answer. I'm going to use PostgreSQL and solve things solely with the use of SQL.
Spoiler Alert
This article will contain spoilers both on how I solved Day 1's challenge "Calorie Counting", as well as general ideas on how to approach the problem. I recommend trying to solve it yourself first, using your favorite language. This article is delayed from the actual puzzle's release. Also note that my solutions are seldom going to be the "best" solutions - they were solved as quickly as possible, and these articles will show my first solutions, with some minor reformatting and cleaning up.
AOC Day 1
I am going to attempt to solve these challenges using SQL. Specifically, using Postgres and all its wonderful tricks to get as far as I can. Let's jump right into Day 1. The first challenge is to find some information about a text file that looks like this:
4428
3773
1076
8063
10386
5705
8397
1084
7661
We need to find the group of items (an empty line starts a new group) with the highest number. The bits of Postgres tech used to solve this puzzle are:
- file_fdw, a built-in foreign data wrapper for Postgres
- The
CASE
command - A sequence to help us separate the rows into groups
- The aggregate function sum inside a window to allow advanced counting
Hands on Tutorial
We've also loaded a tutorial for Day's 1 challenge if you want to try it with a pre-loaded data set.
Reading the input
The first step is to download the input and save it to a local file. In this case, it is 2,250 lines long and stored as /tmp/aoc2022.day1.input
While we could use the COPY
command to slurp it in, this is an excellent case for the file_fdw
extension, which allows you to read in external files as if they were Postgres tables! This is done with the amazing FDW (foreign data wrapper) system. So, from inside the psql prompt, we need to install the extension, create a foreign server, and then create a foreign table that points to our text file:
CREATE EXTENSION file_fdw;
CREATE SERVER aoc2022 foreign data wrapper file_fdw;
CREATE FOREIGN TABLE aoc_day1 (calorie int)
SERVER aoc2022 options(filename '/tmp/aoc2022.day1.input', null '');
Note that we also needed to explicitly tell Postgres that an empty line should become a null. Without that option, the table would throw an error when we tried to read it. Now that we have a table with one row per line, how can we group it? Grouping similar rows together is a job for window functions. In this case, we need to somehow put all rows together until we hit a null, then we need to start a new group. Let's create a pseudo-column and use a sequence to increment it only when we find a null value in our "calorie" column:
CREATE SEQUENCE aoc;
-- Call it once so that `currval` works. The starting number can be anything.
SELECT setval('aoc', 1);
SELECT calorie, CASE WHEN calorie is null THEN nextval('aoc') ELSE currval('aoc') END
FROM aoc_day1;
calorie | currval
---------+---------
4428 | 1
3773 | 1
1076 | 1
☃ | 2
8063 | 2
10386 | 2
☃ | 3
5705 | 3
8397 | 3
1084 | 3
7661 | 3
Now we can group those together, and find out the sum of each one by using the SUM OVER()
syntax. To keep it simple, we will use an ugly subselect:
SELECT SUM(calorie) OVER(partition by currval) FROM
(SELECT calorie, CASE WHEN calorie is null THEN nextval('aoc') ELSE currval('aoc') END
FROM aoc_day1
) x;
sum
-------
9277
9277
9277
18449
18449
18449
22847
22847
22847
22847
22847
Note that we don't care which row it came from, nor do we care about the duplicate entries. All we care about is the maximum value in this new table. Rather than a subselect, let's use a CTE to make things look better. Since one of the goals is to do this in as few SQL statements as possible, let's put the setval()
into the top of the CTE:
WITH
setup AS (SELECT setval('aoc',1)),
x AS (SELECT calorie, CASE WHEN calorie is null THEN nextval('aoc') ELSE currval('aoc') END
FROM setup, aoc_day1),
y AS (SELECT sum(calorie) OVER(partition by currval) from x)
SELECT max(sum) FROM y;
max
-------
22847
Voila! This returns the highest combined number for all the groups. The actual puzzle had 2,250 lines of input, but that was no problem for Postgres. Onwards to Day 2 of the Advent of Code challenge.
Related Articles
- Postgres Tuning & Performance for Analytics Data
19 min read
- Running an Async Web Query Queue with Procedures and pg_cron
6 min read
- Name Collision of the Year: Vector
9 min read
- Sidecar Service Meshes with Crunchy Postgres for Kubernetes
12 min read
- pg_incremental: Incremental Data Processing in Postgres
11 min read