How to Solve Advent of Code 2022 Using Postgres - Day 6
Spoiler Alert
This article will contain spoilers both on how I solved 2022 Day 6's challenge "Tuning Trouble" using SQL, as well as general ideas on how to approach the problem. I recommend trying to solve it yourself first, using your favorite language. This article is delayed from the actual puzzle's release. Also note that my solutions are seldom going to be the "best" solutions - they were solved as quickly as possible, and these articles will show my first solutions, with some minor reformatting and cleaning up.
Hands on Tutorial
We've also loaded a tutorial for Day 6's challenge if you want to try it with a pre-loaded data set.
AOC Day 6
This day was a fairly easy one. The tech used this time:
- file_fdw, a built-in foreign data wrapper for Postgres
- The PL/pgSQL language, via a
DO
anonymous code block - The
regexp_split_to_table
andstrpos
functions - The
lag()
function COUNT(DISTINCT)
as a quick way to check for duplicates
The goal is to find a certain sequence of letters inside a stream of characters that looks like this:
nfddjzjjjmrjjfttzctzzhqzqbbvhhcfcpcqpcqpccsmsvsswbwzw
The actual string was 4095 characters long. Our job with this challenge is to find the first point in which four different characters appear. There are a lot of different ways to solve this. As we are using SQL, I used a quick CTE (Common Table Expression) to break it into a table in which each row holds a single character. The incredibly useful regexp_split_to_table
function is great for this. Once we have one character per row, multiple calls to the lag()
window function allows us to add characters from the previous three lines, yielding this:
WITH x AS (SELECT regexp_split_to_table(line, '') AS c FROM aoc_day6)
,y as (SELECT c AS c1, lag(c) over() as c2, lag(c,2) over() as c3, lag(c,3) over () as c4 FROM x)
SELECT * FROM y LIMIT 10;
c1 | c2 | c3 | c4
----+----+----+----
n | ☃ | ☃ | ☃
f | n | ☃ | ☃
d | f | n | ☃
d | d | f | n
j | d | d | f
z | j | d | d
j | z | j | d
j | j | z | j
j | j | j | z
m | j | j | j
(10 rows)
Then it's just a matter of making sure each of the four is unique. All we need to do is ensure that any of columns 1 through 4 is not the same as the others. For a small list like this, I just threw it into the WHERE clause. Then we can find our start-of-packet marker by using strpos
to navigate to the correct place in the stream right after where the distinct-four characters started:
WITH x AS (SELECT regexp_split_to_table(line, '') AS c FROM aoc_day6)
,y as (SELECT c AS c1, lag(c) over() as c2, lag(c,2) over() as c3, lag(c,3) over () as c4 FROM x)
,z as (
SELECT c4||c3||c2||c1 AS sopm FROM y
WHERE c1<>c2 and c1<>c3 AND c1<>c4 and c2<>c3 AND c3<>c4 AND c2<>c4 LIMIT 1
)
SELECT z.sopm AS "Start of packet marker", 3+strpos(line, sopm) AS value FROM aoc_day6, z;
Start of packet marker | value
------------------------+-------
qgds | 1210
Is this the "best" solution? Probably not. The fastest to run? Nope. But it was the quickest to write. :)
Part two was an extension of the same challenge - rather than 4 unique characters, we need to look for 14. Obviously, the solution above won't scale without a crazy amount of conditions in our WHERE clause, so a different approach is needed. Let's slide through the string, 14 characters at a time, until we find 14 characters that are unique. For every 14-character block we find, we will put them into a table with 14 rows, then run COUNT(DISTINCT)
to find out how many distinct characters are in the string. If there are 14, we have found our answer.
Looping generally requires either a recursive function, or a language such as plpgsql. While it is possible to solve many of these solutions using recursive SQL, they are really painful and annoying to write. A task like this is overkill for a full plpgsql function, as we only need to run it once. Luckily, Postgres provides a way to do "one off" functions, the DO command. By default, the language used by DO
is PG/pgSQL, but you can specify others. We will use it to loop through and pick a different starting character, and keep looping a crazy number of times, until we find an answer. Today's theme, as you may have guessed, is "quick and dirty" - and DO
commands are great for that! Here is the final SQL:
DO $$
DECLARE
sopm text; -- start of packetmarker
myanswer text;
BEGIN
FOR q IN 1..10000 LOOP
WITH x AS (SELECT substring(line FROM q FOR 14) as c FROM aoc_day6)
,y AS (SELECT regexp_split_to_table(c,'') AS d FROM x)
,z AS (SELECT x.c, COUNT(DISTINCT d) FROM x,y GROUP BY x.c)
SELECT into sopm z.c FROM z WHERE count=14;
IF sopm IS NOT NULL THEN
SELECT INTO myanswer 13+strpos(line, sopm) FROM aoc_day6;
RAISE NOTICE 'Characters before start-of-message marker: %', myanswer;
EXIT;
END IF;
END LOOP;
END
$$;
NOTICE: Characters before start-of-message marker: 3476
Hopefully Day 7 is a little more challenging for a pure Postgres solution (spoiler: it is).
Related Articles
- Name Collision of the Year: Vector
9 min read
- Sidecar Service Meshes with Crunchy Postgres for Kubernetes
12 min read
- pg_incremental: Incremental Data Processing in Postgres
11 min read
- Smarter Postgres LLM with Retrieval Augmented Generation
6 min read
- Postgres Partitioning with a Default Partition
16 min read