How to Solve Advent of Code 2022 Using Postgres...

Spoiler Alert

This article will contain spoilers both on how I solved 2022 Day 11's challenge "Monkey in the Middle" using SQL, as well as general ideas on how to approach the problem. I recommend trying to solve it yourself first, using your favorite language. This article is delayed from the actual puzzle's release. Also note that my solutions are seldom going to be the "best" solutions - they were solved as quickly as possible, and these articles will show my first solutions, with some minor reformatting and cleaning up.

Hands on Tutorial

We've also loaded a tutorial for Day 11's challenge if you want to try it with a pre-loaded data set.

AOC Day 11

This challenge involves a troop of monkeys who have stolen your items and are throwing them around to each other. For this challenge, some of the Postgres / SQL tech we used are:

file_fdw
Yet more non-standard use of sequences via setval and currval
String manipulation functions such as substring, replace, regexp_match, and regexp_substr
Window functions
A handy plpgsql function

The input file describes the attributes of each monkey. It's a very custom format, so we read the whole line as-is after our usual setup:

CREATE EXTENSION if not exists file_fdw;

CREATE SERVER if not exists aoc2022 FOREIGN DATA WRAPPER file_fdw;

DROP SCHEMA if exists aoc_monkeys CASCADE;
CREATE SCHEMA aoc_monkeys;
SET search_path = aoc_monkeys;

DROP FOREIGN TABLE if exists aoc_day11;

CREATE FOREIGN TABLE aoc_day11 (line text)
  SERVER aoc2022 options(filename '/tmp/aoc2022.day11.input');

SELECT * FROM aoc_day11 LIMIT 10;

                     line
----------------------------------------------
 Monkey 0:
   Starting items: 52, 60, 85, 69, 75, 75
   Operation: new = old * 17
   Test: divisible by 13
     If true: throw to monkey 6
     If false: throw to monkey 7

 Monkey 1:
   Starting items: 96, 82, 61, 99, 82, 84, 85
   Operation: new = old + 8
(10 rows)

As we can see, the first challenge will be in transforming the attributes for each monkey in to a more manageable table format. We are going to use a sequence that changes when we see a "Monkey" line, and multiple CASE statements to transform the various lines into the raw information we need from each. We do this for every line, even though each line will only give us one of the attributes for the monkey.

DROP SEQUENCE if exists aoc;
CREATE SEQUENCE aoc MINVALUE 0; -- Needed as monkey numbers start at 0!

WITH x AS (SELECT line FROM aoc_day11)
,y AS (SELECT case when line ~ 'Monkey' then
         setval('aoc',substring(line, '([0-9]+)')::int) else -1 end AS monkey
  ,currval('aoc') AS num
  ,case when line ~ 'Starting'  then substring(line, ': (.+)') else '' end AS items
  ,case when line ~ 'Operation' then substring(line, '= old (.+)') else '' end AS operation
  ,case when line ~ 'Test'      then substring(line, 'divisible by (\d+)')::int else 0 end AS test
  ,case when line ~ 'If true'   then substring(line, 'monkey (\d+)')::int else -1 end AS yesmonkey
  ,case when line ~ 'If false'  then substring(line, 'monkey (\d+)')::int else -1 end AS nomonkey
FROM x)
SELECT * FROM y LIMIT 10;

 monkey | num |           items            | operation | test | yesmonkey | nomonkey
--------+-----+----------------------------+-----------+------+-----------+----------
      0 |   0 |                            |           |    0 |        -1 |       -1
     -1 |   0 | 52, 60, 85, 69, 75, 75     |           |    0 |        -1 |       -1
     -1 |   0 |                            | * 17      |    0 |        -1 |       -1
     -1 |   0 |                            |           |   13 |        -1 |       -1
     -1 |   0 |                            |           |    0 |         6 |       -1
     -1 |   0 |                            |           |    0 |        -1 |        7
     -1 |   0 |                            |           |    0 |        -1 |       -1
      1 |   1 |                            |           |    0 |        -1 |       -1
     -1 |   1 | 96, 82, 61, 99, 82, 84, 85 |           |    0 |        -1 |       -1
     -1 |   1 |                            | + 8       |    0 |        -1 |       -1
(10 rows)

Now we need combine all those attributes into one row for each monkey. Sharp readers may have noted that the default values for our CASE statements are an empty string, 0, or -1. Why? Because this ensures that any "real" values are always greater than the defaults. Thus, we can sort the values and have the "real" ones come out higher than the default ones. We can use the MAX function to grab each attribute's value for each monkey. We also need to group it by monkey, so we use a DISTINCT num to ensure one monkey per row, and then use our MAX over a window. In this case, we need our window of rows we care about to span each numbered monkey, so we create our window as (PARTITION BY num). Note that rather than writing that after ever line (e.g. MAX(operation) OVER (PARTITION BY num)), we simply create a global window named w that each item in the SELECT clause can refer to.

WITH x AS (SELECT line FROM aoc_day11)
,y AS (SELECT case when line ~ 'Monkey' then
         setval('aoc',substring(line, '([0-9]+)')::int) else -1 end AS monkey
  ,currval('aoc') AS num
  ,case when line ~ 'Starting'  then substring(line, ': (.+)') else '' end AS items
  ,case when line ~ 'Operation' then substring(line, '= old (.+)') else '' end AS operation
  ,case when line ~ 'Test'      then substring(line, 'divisible by (\d+)')::int else 0 end AS test
  ,case when line ~ 'If true'   then substring(line, 'monkey (\d+)')::int else -1 end AS yesmonkey
  ,case when line ~ 'If false'  then substring(line, 'monkey (\d+)')::int else -1 end AS nomonkey
FROM x)
SELECT DISTINCT num
  ,REPLACE(max(items) OVER w ,',','') AS items
  ,max(test) OVER w
  ,max(operation) OVER w
  ,max(yesmonkey) OVER w
  ,max(nomonkey) OVER w
FROM y WINDOW w AS (partition by num) ORDER BY num;

The output looks nice: one line per monkey, and we have captured all the important information for each of the eight monkeys in the input file. We also replaced all the "items" commas with a space, to make things easier later on.

 num |          items          | max |  max  | max | max
-----+-------------------------+-----+-------+-----+-----
   0 | 52 60 85 69 75 75       |  13 | * 17  |   6 |   7
   1 | 96 82 61 99 82 84 85    |   7 | + 8   |   0 |   7
   2 | 95 79                   |  19 | + 6   |   5 |   3
   3 | 88 50 82 65 77          |   2 | * 19  |   4 |   1
   4 | 66 90 59 90 87 63 53 88 |   5 | + 7   |   1 |   0
   5 | 92 75 62                |   3 | * old |   3 |   4
   6 | 94 86 76 67             |  11 | + 1   |   5 |   2
   7 | 57                      |  17 | + 2   |   6 |   2
(8 rows)

While a recusive query is theoretically possible, there are only so many hours in a day, so we'll simply insert things into a table that we can manipulate with custom functions:

DROP TABLE IF EXISTS monkey;
CREATE TABLE monkey (
  num       INT,
  items     TEXT,
  op        TEXT,
  test      INT,
  yes       INT,
  no        INT,
  monkeysee INT
);

Now we can re-run our query, and change the SELECT to an INSERT:

WITH x AS (SELECT line FROM aoc_day11)
,y AS (SELECT case when line ~ 'Monkey' then
         setval('aoc',substring(line, '([0-9]+)')::int) else -1 end AS monkey
  ,currval('aoc') AS num
  ,case when line ~ 'Starting'  then substring(line, ': (.+)') else '' end AS items
  ,case when line ~ 'Operation' then substring(line, '= old (.+)') else '' end AS operation
  ,case when line ~ 'Test'      then substring(line, 'divisible by (\d+)')::int else 0 end AS test
  ,case when line ~ 'If true'   then substring(line, 'monkey (\d+)')::int else -1 end AS yesmonkey
  ,case when line ~ 'If false'  then substring(line, 'monkey (\d+)')::int else -1 end AS nomonkey
FROM x)
INSERT INTO monkey
  SELECT DISTINCT num
    ,REPLACE(max(items) OVER w ,',','') AS items
    ,max(test) OVER w
    ,max(operation) OVER w
    ,max(yesmonkey) OVER w
    ,max(nomonkey) OVER w
  FROM y WINDOW w AS (partition by num) ORDER BY num;

Finally, we are ready to create a function that will execute a given number of rounds, in which the monkeys throw items to each other, and then give a final value based on how many items the top two monkeys threw. Rather than walk through how the function works bit by bit, I used some heavy inline comments:

CREATE or replace FUNCTION monkey_business(maxrounds int) --version 1
  returns void language plpgsql as $$
DECLARE
  maxmonkey int;
  myrec record;
  myops text[]; mytests int[]; myitems text[]; monkeysee int[]; myway int[]; hiway int[];
  thismonkey int = 0;
  current_item int;
  round int = 1;
  yesno bool;
  newmonkey int;
  x int; y int;
BEGIN

  -- We need to know when we are at the "last" monkey:
  SELECT INTO maxmonkey MAX(num) FROM monkey;

  -- Load monkey attributes into arrays
  FOR myrec IN SELECT * FROM monkey ORDER BY num LOOP
    myops[myrec.num] = myrec.op;
    mytests[myrec.num] = myrec.test;
    myitems[myrec.num] = myrec.items;
    myway[myrec.num] = myrec.yes;
    hiway[myrec.num] = myrec.no;
    monkeysee[myrec.num] = 0;
  END LOOP;

  -- Make a cheap infinite loop with a bailout at 1 million rounds:
  FOR x IN 1..1000000 LOOP
    -- If the current monkey is not holding anything, try another one
    -- If all monkeys are done, start another round
    IF myitems[thismonkey] !~ '\d' THEN
      thismonkey = thismonkey + 1;
      IF thismonkey > maxmonkey THEN round = round + 1; thismonkey = 0; END IF;
      IF round > maxrounds THEN EXIT; END IF;
      CONTINUE;
    END IF;

    -- Get the next item from the current monkey, store it in 'current_item'
    current_item = (regexp_match(myitems[thismonkey], '\d+'))[1];

    -- Tidy up the list
    myitems[thismonkey] = regexp_replace(myitems[thismonkey], ' *\d+', '');

    -- Increment by one how may times this monkey has seen an item
    monkeysee[thismonkey] = monkeysee[thismonkey] + 1;

    -- Modify the item based on this monkey's rules
    IF myops[thismonkey] ~ '\* \d+' THEN
      current_item = current_item * (regexp_match(myops[thismonkey], '\d+'))[1]::int;
    ELSIF myops[thismonkey] ~ '\+ \d+' THEN
      current_item = current_item + (regexp_match(myops[thismonkey], '\d+'))[1]::int;
    ELSIF myops[thismonkey] ~ '\* old' THEN
      current_item = current_item * current_item;
    END IF;

    -- Divide by three per the puzzle's rules
    current_item = current_item / 3;

    -- Determine which monkey gets the item next
    yesno = 0 = current_item % mytests[thismonkey];
    newmonkey = CASE WHEN yesno THEN myway[thismonkey] ELSE hiway[thismonkey] END;

    -- Give the item to the new monkey
    myitems[newmonkey] = myitems[newmonkey] || ' ' || current_item;

  END LOOP;

  -- Find the two most active monkeys
  SELECT INTO x MAX(ms) FROM unnest(monkeysee) ms;
  SELECT INTO y ms FROM unnest(monkeysee) ms ORDER BY ms DESC LIMIT 1 OFFSET 1;

  RAISE NOTICE 'Monkey answer for % and %: %', x,y, x::bigint*y::bigint;

  return;
END
$$;

SELECT monkey_business(20);

Running the above gives the correct answer very quickly. For compatibility I used the regexp_match function, but if you are using Postgres 15 or newer you can use the better regexp_substr function:

-- >= v15:   current_item = regexp_substr(myitems[thismonkey], '\d+', 1);
-- <= v14:   current_item = (regexp_match(myitems[thismonkey], '\d+'))[1];

Part Two gets a lot more involved, with the changes being a lot more rounds (from 20 to 10,000) and removal of the current_items = current_items / 3 bit. Simple, right? Of course not, the AOC challenges are more devious than that! Here's the first attempt with just the divided by three part removed:

SELECT monkey_business(10000);

ERROR:  integer out of range

No problem, we will just switch the data type of the current_item variable from an INT to BIGINT!

SELECT monkey_business(10000);

ERROR:  bigint out of range

Okay, so we are dealing with some very large numbers here. Surely a NUMERIC will suffice?

SELECT monkey_business(10000);

ERROR:  value overflows numeric format

Well, that's not an error you see too often! Obviously these numbers are way too big, so we need another approach. Taking a closer look at the problem and the sample data, I noticed a few things:

The actual items have no identity - we are simply passing numbers representing those items around, and the values change mid flight.
All the tests are simple "divisible by", which means there are a range of solutions that give the same answer
All the divisor test numbers are prime
Thus, the numbers do not need to stay large - we can shrink them at each round, as long as they still meet the same requirements, e.g. true/false for each "divisible by" test.

So, we need a magic number that all the answers can be divided by. In this case, we simply multiply all those prime numbers together. In our setup loop at the top of the function, we add:

bigmodulo = bigmodulo * myrec.test;

Then instead of dividing by three each round, we can shrink the number using modulo math:

current_item = current_item % bigmodulo;

This worked great, and I was able to move current_item back to a BIGINT and ran 10,000 rounds in a little under 3 seconds! Here is the final version of the function:

CREATE or replace FUNCTION monkey_business(maxrounds int) --version 2
  returns void language plpgsql as $$
DECLARE
  maxmonkey int;
  myrec record;
  myops text[]; mytests int[]; myitems text[]; monkeysee int[]; myway int[]; hiway int[];
  thismonkey int = 0;
  current_item bigint;
  round int = 1;
  yesno bool;
  newmonkey int;
  x int; y int;
  bigmodulo int = 1;
BEGIN

  SELECT INTO maxmonkey MAX(num) FROM monkey;

  -- Load monkey attributes into arrays
  FOR myrec IN SELECT * FROM monkey ORDER BY num LOOP
    bigmodulo = bigmodulo * myrec.test;
    myops[myrec.num] = myrec.op;
    mytests[myrec.num] = myrec.test;
    myitems[myrec.num] = myrec.items;
    myway[myrec.num] = myrec.yes;
    hiway[myrec.num] = myrec.no;
    monkeysee[myrec.num] = 0;
  END LOOP;

  FOR x IN 1..1000000 LOOP
    -- If the current monkey is not holding anything, try another one
    -- If all monkeys are done, start another round
    IF myitems[thismonkey] !~ '\d' THEN
      thismonkey = thismonkey + 1;
      IF thismonkey > maxmonkey THEN round = round + 1; thismonkey = 0; END IF;
      IF round > maxrounds THEN EXIT; END IF;
      CONTINUE;
    END IF;

    -- Get the next item from the current monkey, store it in 'current_item'
    -- v15+: current_item = regexp_substr(myitems[thismonkey], '\d+', 1);
    current_item = (regexp_match(myitems[thismonkey], '\d+'))[1];

    -- Tidy up the list
    myitems[thismonkey] = regexp_replace(myitems[thismonkey], ' *\d+', '');

    -- Increment by one how may times this monkey has seen an item
    monkeysee[thismonkey] = monkeysee[thismonkey] + 1;

    -- Modify the item based on this monkey's rules
    IF myops[thismonkey] ~ '\* \d+' THEN
      current_item = current_item * (regexp_match(myops[thismonkey], '\d+'))[1]::int;
    ELSIF myops[thismonkey] ~ '\+ \d+' THEN
      current_item = current_item + (regexp_match(myops[thismonkey], '\d+'))[1]::int;
    ELSIF myops[thismonkey] ~ '\* old' THEN
      current_item = current_item * current_item;
    END IF;

    IF maxrounds <= 20 THEN current_item = current_item / 3; END IF;

    -- Determine which monkey we are going to next
    yesno = 0 = current_item % mytests[thismonkey];
    newmonkey = CASE WHEN yesno THEN myway[thismonkey] ELSE hiway[thismonkey] END;

    current_item = current_item % bigmodulo;

    -- Give the item to the new monkey
    myitems[newmonkey] = myitems[newmonkey] || ' ' || current_item;

  END LOOP;

  -- Find the two most active monkeys
  SELECT INTO x MAX(ms) FROM unnest(monkeysee) ms;
  SELECT INTO y ms FROM unnest(monkeysee) ms ORDER BY ms DESC LIMIT 1 OFFSET 1;

  RAISE NOTICE 'Monkey answer for % and %: %', x,y, x::bigint*y::bigint;

  return;
END
$$;

Hope you enjoyed! Will post some more solutions as time permits.

Latest Articles

How to Solve Advent of Code 2022 Using Postgres - Day 11

Spoiler Alert

Hands on Tutorial

AOC Day 11

Related Articles