Introducing Crunchy Data Warehouse: A next-generation Postgres-native data warehouse. Crunchy Data Warehouse Learn more
Tutorial Instructions
For the AOC Day 15 challenge, we are presented with an input file that contains the locations of many sensors, along with the closest beacon to that sensor. The first goal is to find out how many beacons CANNOT be in a certain row. The distance from the sensor to the beacon is measured in taxi distance also known as the snake distance or the Manhattan distance. See the Wikipedia entry if you are unfamiliar with it, as it will help to understand this problem. The file consists of lines like these:
Sensor at x=1555825, y=18926: closest beacon is at x=1498426, y=-85030
Sensor at x=697941, y=3552290: closest beacon is at x=595451, y=3788543
Sensor at x=3997971, y=2461001: closest beacon is at x=3951198, y=2418718
As always, we are going to employ a handful of Postgres / SQL tools to help us solve this, including:
The input is in a fairly standard form - we want to first extract all the numbers from each line to eventually put them into a table. This is an excellent job for the regexp_substr() function:
WITH x AS (SELECT nextval('aoc') AS id, line FROM aoc_day15)
,y AS (
SELECT id
,regexp_substr(line, '=(\-?\d+)', 1, 1, '', 1)::int AS s1
,regexp_substr(line, '=(\-?\d+)', 1, 2, '', 1)::int AS s2
,regexp_substr(line, '=(\-?\d+)', 1, 3, '', 1)::int AS b1
,regexp_substr(line, '=(\-?\d+)', 1, 4, '', 1)::int AS b2
FROM x)
,coords as (SELECT *, abs(s1-b1) + abs(s2-b2) AS taxi FROM y)
SELECT * FROM coords LIMIT 10;
However, your version of Postgres may not support regexp_substr (needs to be at least Postgres 15), so older versions will need to use regexp_match instead:
WITH x AS (SELECT nextval('aoc') AS id, line FROM aoc_day15)
,y AS (
SELECT id
,(regexp_match(line, 'x=(\-?\d+)'))[1]::int AS s1
,(regexp_match(line, 'y=(\-?\d+)'))[1]::int AS s2
,(regexp_match(line, 'beacon is at x=(\-?\d+)'))[1]::int AS b1
,(regexp_match(line, 'y=(\-?\d+)$'))[1]::int AS b2
FROM x)
,coords as (SELECT *, abs(s1-b1) + abs(s2-b2) AS taxi FROM y)
SELECT * FROM coords LIMIT 10;
For both, we use a simple regex to grab the number: =(\-?\d+) The parens specify what part we want to capture, we put a backslash before the minus sign, and use a question mark to make it optional (e.g. 0 or 1 occurrences), and end with a backslash-d to indicate a digit (the plus sign indicates one or more). We then use all four numbers to create the snake distance for each line. The query breaks things into place nicely:
id | s1 | s2 | b1 | b2 | taxi
----+---------+---------+---------+---------+---------
2 | 1555825 | 18926 | 1498426 | -85030 | 161355
3 | 697941 | 3552290 | 595451 | 3788543 | 338743
4 | 3997971 | 2461001 | 3951198 | 2418718 | 89056
5 | 3818312 | 282332 | 4823751 | 1061753 | 1784860
6 | 2874142 | 3053631 | 3074353 | 3516891 | 663471
7 | 1704479 | 2132468 | 1749091 | 2000000 | 177080
8 | 3904910 | 2080560 | 3951198 | 2418718 | 384446
9 | 657061 | 3898803 | 595451 | 3788543 | 171870
10 | 3084398 | 3751092 | 3074353 | 3516891 | 244246
11 | 2582061 | 972407 | 1749091 | 2000000 | 1860563
(10 rows)
For part one, we need to only look at row 2,000,000 and determine how many positions there are which cannot have the beacon we are looking for.We basically need to find all places on the line at 2,000,000 that are within the taxi distance of any sensor. For this, we are going to make use of the int4range() function to build a list of line segments at row 2 million, and then filter out the duplicates. First, we use a CASE() statement to only worry about sensors within range. If they are in range, we pull out which columns intersect row 2,000,000 based on the taxi distance for the sensor. If you have a constant like 2 million, that's also a great use of CTEs:
WITH x AS (SELECT nextval('aoc') as id, line from aoc_day15)
,y AS (
SELECT id
,(regexp_match(line, 'x=(\-?\d+)'))[1]::int AS s1
,(regexp_match(line, 'y=(\-?\d+)'))[1]::int AS s2
,(regexp_match(line, 'beacon is at x=(\-?\d+)'))[1]::int AS b1
,(regexp_match(line, 'y=(\-?\d+)$'))[1]::int AS b2
FROM x)
,coords AS (SELECT *, abs(s1-b1) + abs(s2-b2) AS taxi FROM y)
,numbers AS (SELECT 2000000 AS magicrow)
,inty AS (
SELECT id, s1, s2, taxi,
CASE WHEN abs(s2 - magicrow) <= taxi
THEN int4range(s1 - (taxi-abs(s2-magicrow)), s1 + taxi-(abs(s2-magicrow)), '[]')
ELSE int4range(0,0)
END AS cannotbe
FROM coords, numbers)
SELECT * FROM inty limit 10;
id | s1 | s2 | taxi | cannotbe
----+---------+---------+---------+-------------------
35 | 1555825 | 18926 | 161355 | empty
36 | 697941 | 3552290 | 338743 | empty
37 | 3997971 | 2461001 | 89056 | empty
38 | 3818312 | 282332 | 1784860 | [3751120,3885505)
39 | 2874142 | 3053631 | 663471 | empty
40 | 1704479 | 2132468 | 177080 | [1659867,1749092)
41 | 3904910 | 2080560 | 384446 | [3601024,4208797)
42 | 657061 | 3898803 | 171870 | empty
43 | 3084398 | 3751092 | 244246 | empty
44 | 2582061 | 972407 | 1860563 | [1749091,3415032)
(10 rows)
Now that we have these into a collection of ranges, we can use the awesome range_agg()
function to combine all these ranges into a single collection of non-overlapping ranges. We pass that list to unnest to combine it back into a single list of values. In our case, this is a single range, but it could have been many. It looks like this:
WITH x AS (SELECT nextval('aoc') as id, line from aoc_day15)
,y AS (
SELECT id
,(regexp_match(line, 'x=(\-?\d+)'))[1]::int AS s1
,(regexp_match(line, 'y=(\-?\d+)'))[1]::int AS s2
,(regexp_match(line, 'beacon is at x=(\-?\d+)'))[1]::int AS b1
,(regexp_match(line, 'y=(\-?\d+)$'))[1]::int AS b2
FROM x)
,coords AS (SELECT *, abs(s1-b1) + abs(s2-b2) AS taxi FROM y)
,numbers AS (SELECT 2000000 AS magicrow)
,inty AS (
SELECT id, s1, s2, taxi,
CASE WHEN abs(s2 - magicrow) <= taxi
THEN int4range(s1 - (taxi-abs(s2-magicrow)), s1 + taxi-(abs(s2-magicrow)), '[]')
ELSE int4range(0,0)
END AS cannotbe
FROM coords, numbers)
,aggy AS (SELECT unnest(range_agg(cannotbe)) AS r FROM inty)
SELECT * FROM aggy;
r
-------------------
[-671176,4208797)
(1 row)
At this point, we are ready to solve the puzzle, so we extract all the non-overlapping ranges, figure out how "wide" they are by subtracting their lower bounds from their upper bounds, then add those all up together to get our answer. We also subtract one, as our final combined range is inclusive on one end only.
WITH x AS (SELECT nextval('aoc') as id, line from aoc_day15)
,y AS (
SELECT id
,(regexp_match(line, 'x=(\-?\d+)'))[1]::int AS s1
,(regexp_match(line, 'y=(\-?\d+)'))[1]::int AS s2
,(regexp_match(line, 'beacon is at x=(\-?\d+)'))[1]::int AS b1
,(regexp_match(line, 'y=(\-?\d+)$'))[1]::int AS b2
FROM x)
,coords AS (SELECT *, abs(s1-b1) + abs(s2-b2) AS taxi FROM y)
,numbers AS (SELECT 2000000 AS magicrow)
,inty AS (
SELECT id, s1, s2, taxi,
CASE WHEN abs(s2 - magicrow) <= taxi
THEN int4range(s1 - (taxi-abs(s2-magicrow)), s1 + taxi-(abs(s2-magicrow)), '[]')
ELSE int4range(0,0)
END AS cannotbe
FROM coords, numbers)
,aggy AS (SELECT unnest(range_agg(cannotbe)) AS r FROM inty)
,rangesums AS (SELECT upper(r)-lower(r) AS mysum FROM aggy)
SELECT sum( upper(r)-lower(r) ) -1 AS partone FROM aggy;
partone
---------
4879972
(1 row)
Part Two gets a lot more challenging. We are tasked with finding the spot on the grid that is not within range of any sensor. In other words, there is a spot on the grid somewhere that is not within the taxi range of any of our rows. Sounds pretty simple, right? Yes...but only up to a point. At first I tried a brute force solution of walking through the grid, and then walking through all the sensor ranges, but both of those were quickly ruled out due to the size of the grid. As an example, here is one of the lines from the input file. Look closely at the size of the "diamond" that is generated by the sensor expanding out to the taxi distance:
-- Sensor at x=2582061, y=972407: closest beacon is at x=1749091, y=2000000
id | s1 | s2 | b1 | b2 | taxi
----+---------+---------+----------+---------+---------
11 | 2582061 | 972407 | 1749091 | 2000000 | 1860563
That diamond has a radius of 1.8 million! So this grid is entirely too large to do anything resembling a brute force approach. Not going to lie here - I struggled with this a little, until I found a good approach. The key is to realize that any open space must be bordered by more than one sensor "edge". So, we can walk the edges of each sensor (sensor plus taxi distance along the whole diamond), and look for any spot just outside the edge that is NOT in range of any other sensor. Still not cheap to do in SQL, but way better than any alternative!
Once again, we are going to put things into a grid table with rows and columns. All we care about is the location of the sensors: now that we have the taxi distance, we can discard the beacon's locations. So we will store the location of each sensor, and its taxi distance. Using the SQL above, we can quickly populate the table:
CREATE UNLOGGED TABLE sensorfarm(col int, row int, taxi int, item text);
CREATE UNIQUE INDEX aa on sensorfarm(col,row);
WITH x AS (SELECT nextval('aoc') AS id, line FROM aoc_day15)
,y AS (
SELECT id
,(regexp_match(line, 'x=(\-?\d+)'))[1]::int AS s1
,(regexp_match(line, 'y=(\-?\d+)'))[1]::int AS s2
,(regexp_match(line, 'beacon is at x=(\-?\d+)'))[1]::int AS b1
,(regexp_match(line, 'y=(\-?\d+)$'))[1]::int AS b2
FROM x)
,coords as (SELECT *, abs(s1-b1) + abs(s2-b2) AS taxi FROM y)
INSERT into sensorfarm(col,row,taxi) select s1,s2,taxi FROM coords;
At one point, I tried using arrays, but they were slow and ran into some Postgres limits:
ERROR: array size exceeds the maximum allowed (1073741823)
For the final function, I was able to implement the logic above: walk the outside of the diamond, rule out things that are out of bounds (the problem says it will be in the zero to four million row/column range), and check all the other diamonds for overlap. The full function is at the bottom. Let's look at it section by section:
CREATE or replace FUNCTION diamond_walker()
returns void language plpgsql as $$
DECLARE
maxrange int = 4000000;
myrec record;
item int = 0;
mysensor record;
current_top_row int; current_bottom_row int;
current_left_col int; current_right_col int;
northwest bool; northeast bool; southwest bool; southeast bool;
oob int = 0;
BEGIN
FOR myrec IN
SELECT row_number() OVER (ORDER BY taxi), row, col, taxi FROM sensorfarm
LOOP
This is a standard plpgsql function, which takes no arguments, and returns null, as we are going to throw an exception when we find the value we are looking for. We DECLARE all of the variables we are going to use, then start the main loop, in which we iterate over each sensor in our sensorfarm table. The row_number
is not needed to solve things, but helps give us some nice output as we are searching.
current_top_row = myrec.row;
current_bottom_row = myrec.row;
current_left_col = myrec.col - (myrec.taxi + 1);
current_right_col = myrec.col + (myrec.taxi + 1);
We are going to walk the outside of the diamond formed by the sensor and its taxi distance. We start with the same row as the sensor, but shoot out bot columns as wide as the taxi distance - in other words, the widest part of the diamond.
-- Walk the outside of the diamond for this sensor
FOR t IN 1 .. myrec.taxi+1 LOOP
-- Every once in a while, give a status report
IF (item % 10000) = 0 THEN
RAISE NOTICE
'%Trying sensor #% at % / % taxi is % [% candidates] [OOB: %]', E'\033c',
myrec.row_number,
to_char(myrec.col, 'FM999,999,999'), to_char(myrec.row, 'FM9,999,999'),
to_char(myrec.taxi, 'FM9,999,999'), to_char(item, 'FM999,999,999'),
to_char(oob, 'FM999,999,999');
END IF;
item = item + 1;
Now we start the loop in which we will walk the outside of the diamond, checking each point along the way. As this can take a very long time to run, I added some output so that clears the screen (that magic 033c
code, then does some pretty output showing which row we are on, and some totals on about how many places we have checked, and how many were declared out of bounds (explained below).
-- There are cases where the whole diamond has wandered out of range
-- If that happens, we abandon the entire loop
if current_left_col > maxrange
or current_right_col < 0
or current_top_row > maxrange
or current_bottom_row < 0
THEN exit;
END IF;
It may be that our diamond is no longer in the range that we are considering, so we have a short-circuit here to simply abandon this entire sensor if that should happen.
-- Set each direction as not found
northwest = false; northeast = false; southwest = false; southeast = false;
-- Skip if this candidate is out of range
IF current_left_col < 0
THEN northwest = true; southwest = true; oob = oob + 1;
END IF;
IF current_right_col > maxrange
THEN northeast = true; southeast = true; oob = oob + 1;
END IF;
IF current_top_row < 0
THEN northwest = true; northeast = true; oob = oob + 1;
END IF;
IF current_bottom_row > maxrange
THEN southwest = true; southeast = true; oob = oob + 1;
END IF;
We are going to track each of the four edges of the diamond: northeast, northwest, southeast, and southwest. These variables are all booleans, and start as false. If we find something that disqualifies them as a potential candidate for "winning", we mark it as true. To start, we do a quick check to see if any of the four edges are out of range. If they are, we mark the appropriate boolean and also increment our out of bounds variable.
-- Do any other diamonds contain this spot?
FOR mysensor in SELECT col,row,taxi FROM sensorfarm WHERE ctid <> myrec.ctid LOOP
-- Does this sensor cover our "northwest" pixel?
IF not northwest AND abs(mysensor.col - current_left_col)
+ abs(mysensor.row - current_top_row) <= mysensor.taxi
THEN northwest = true;
END IF;
-- Does this sensor cover our "northeast" pixel?
IF not northeast AND abs(mysensor.col-current_right_col)
+ abs(mysensor.row-current_top_row) <= mysensor.taxi
THEN northeast = true;
END IF;
-- Does this sensor cover our "southwest" pixel?
IF not southwest AND abs(mysensor.col - current_left_col)
+ abs(mysensor.row - current_bottom_row) <= mysensor.taxi
THEN southwest = true;
END IF;
-- Does this sensor cover our "southeast" pixel?
IF not southeast AND abs(mysensor.col - current_right_col)
+ abs(mysensor.row - current_bottom_row) <= mysensor.taxi
THEN southeast = true;
END IF;
-- Stop checking other sensors if all four directions are covered
IF northwest and northeast and southwest and southeast
THEN exit;
END IF;
END LOOP;
This bit of code is the meat of our function. We want to iterate through all of the other sensors to see if, for each of our edges, that sensor overlaps the spot we are looking at. Note that we use the special ctid
column (the physical location of the row, more or less) to exclude comparing our current sensor to itself. If we find that another sensor has covered any of the four edges, we immediately exit the loop as we know this spot is not a winner.
-- If any of our four directions had no coverage, they are the winner
IF not northeast or not northwest or not southeast or not southwest
THEN
mysensor.col = CASE when northeast or southeast
THEN current_right_col else current_left_col END;
mysensor.row = CASE when northeast or northwest
THEN current_top_row else current_bottom_row END;
RAISE 'Winner from row %, edges of sensor at %,%. Spot %,% has a value of %',
myrec.row_number, myrec.col, myrec.row, mysensor.col, mysensor.row,
mysensor.col * maxrange::bigint + mysensor.row;
END IF;
Finally, we can check if we won, as indicated by the fact that one of our edges is still false. If so, we throw an exception and give the information about the winning spot.
current_left_col = current_left_col + 1;
current_right_col = current_right_col - 1;
current_top_row = current_top_row - 1;
current_bottom_row = current_bottom_row + 1;
END LOOP; -- outside of the diamond
END LOOP; -- each sensor
return; END; $$;
The final bit of the function is just cleanup. We move our current positions up, down, left, and right, as we continue to walk the edge of the diamond. Then we end the diamond loop, the sensor, loop, and the function ends. Here is what it looks like as it runs:
And here is the final output, showing the correct solution to part two:
select diamond_walker();
Trying sensor #21 at 3,505,508 / 2,805,537 taxi is 532,165 [4,300,000 candidates] [OOB: 1,865,312]
ERROR: Winner from edges of sensor at 3505508,2805537. Spot 3879585,2647448 has a value of 15518342647448
CONTEXT: PL/pgSQL function diamond_walker() line 92 at RAISE
The complete function:
CREATE or replace FUNCTION diamond_walker()
returns void language plpgsql as $$
DECLARE
maxrange int = 4000000;
myrec record;
item int = 0;
mysensor record;
current_top_row int; current_bottom_row int;
current_left_col int; current_right_col int;
northwest bool; northeast bool; southwest bool; southeast bool;
oob int = 0;
BEGIN
FOR myrec IN
SELECT row_number() OVER (ORDER BY taxi), row, col, taxi FROM sensorfarm
LOOP
current_top_row = myrec.row;
current_bottom_row = myrec.row;
current_left_col = myrec.col - (myrec.taxi + 1);
current_right_col = myrec.col + (myrec.taxi + 1);
/* Walk the outside of the diamond */
FOR t IN 1 .. myrec.taxi+1 LOOP
/* Every once in a while, give a status report */
IF (item % 10000) = 0 THEN
RAISE NOTICE
'%Trying sensor #% at % / % taxi is % [% candidates] [OOB: %]', E'\033c',
myrec.row_number,
to_char(myrec.col, 'FM999,999,999'), to_char(myrec.row, 'FM9,999,999'),
to_char(myrec.taxi, 'FM9,999,999'), to_char(item, 'FM999,999,999'),
to_char(oob, 'FM999,999,999');
END IF;
item = item + 1;
/* There are cases where the whole diamond has wandered out of range
If that happens, we abandon the entire loop */
if current_left_col > maxrange
or current_right_col < 0
or current_top_row > maxrange
or current_bottom_row < 0
THEN exit;
END IF;
/* Set each direction as not found */
northwest = false; northeast = false; southwest = false; southeast = false;
/* Skip if this candidate is out of range */
IF current_left_col < 0
THEN northwest = true; southwest = true; oob = oob + 1;
END IF;
IF current_right_col > maxrange
THEN northeast = true; southeast = true; oob = oob + 1;
END IF;
IF current_top_row < 0
THEN northwest = true; northeast = true; oob = oob + 1;
END IF;
IF current_bottom_row > maxrange
THEN southwest = true; southeast = true; oob = oob + 1;
END IF;
/* Do any other diamonds contain this spot? */
FOR mysensor in SELECT col,row,taxi FROM sensorfarm WHERE ctid <> myrec.ctid LOOP
/* Does this sensor cover our "northwest" pixel? */
IF not northwest AND abs(mysensor.col - current_left_col)
+ abs(mysensor.row - current_top_row) <= mysensor.taxi
THEN northwest = true;
END IF;
/* Does this sensor cover our "northeast" pixel? */
IF not northeast AND abs(mysensor.col-current_right_col)
+ abs(mysensor.row-current_top_row) <= mysensor.taxi
THEN northeast = true;
END IF;
/* Does this sensor cover our "southwest" pixel? */
IF not southwest AND abs(mysensor.col - current_left_col)
+ abs(mysensor.row - current_bottom_row) <= mysensor.taxi
THEN southwest = true;
END IF;
/* Does this sensor cover our "southeast" pixel? */
IF not southeast AND abs(mysensor.col - current_right_col)
+ abs(mysensor.row - current_bottom_row) <= mysensor.taxi
THEN southeast = true;
END IF;
/* Stop checking other sensors if all four directions are covered */
IF northwest and northeast and southwest and southeast
THEN exit;
END IF;
END LOOP;
/* If any of our four directions had no coverage, they are the winner */
IF not northeast or not northwest or not southeast or not southwest
THEN
mysensor.col = CASE when northeast or southeast
THEN current_right_col else current_left_col END;
mysensor.row = CASE when northeast or northwest
THEN current_top_row else current_bottom_row END;
RAISE 'Winner from row %, edges of sensor at %,%. Spot %,% has a value of %',
myrec.row_number, myrec.col, myrec.row, mysensor.col, mysensor.row,
mysensor.col * maxrange::bigint + mysensor.row;
END IF;
current_left_col = current_left_col + 1;
current_right_col = current_right_col - 1;
current_top_row = current_top_row - 1;
current_bottom_row = current_bottom_row + 1;
END LOOP; /* outside of the diamond */
END LOOP; /* each sensor */
return; END; $$;
This function is preloaded, so if you are brave enough, and have lots of time on your hands, you can try it out:
SELECT diamond_walker();
That one was a doozy, but we got it solved! Stay tuned for more solutions soon.
Loading terminal...
Loading terminal...