Better Range Types in Postgres 14: Turning 100 Lines of SQL Into 3
I can talk about the benefits of PostgreSQL for application development and operations all day. But there two enduring topics that are close to my heart: SCRAM (you need to update your passwords to use SCRAM) and range types.
I've been stoked about range types since they were released in PostgreSQL 9.2. Before I joined Crunchy Data, I deployed them heavily in productionto help manage a mature scheduling and space booking application. Performance wise, range types provided a fast way to perform sophisticated lookups over temporal data sets!
With all the power and flexibility of range types, there was an unfortunate limitation: working with non overlapping (or noncontiguous) ranges. Now don't get me wrong, it afforded me the opportunity to write some very advanced recursive SQL and build some cool real-time data management systems. However, this was at the cost of convenience and maintainability.
Almost a decade after range types were first introduced, PostgreSQL 14 now adds the ability to write some "boring SQL" to further unlock the power of working with range data. Meet the "multi range" data type.
PostgreSQL 14: Déjà Vu All Over Again
It feels like déjà vu all over again: PostgreSQL 9.2 introduced JSON and range types. PostgreSQL 14 adds better JSON and multirange types.
Recall that Postgres comes with several standard range types:
int4range
,int8range
: ranges ofint
andbigint
types.numrange
: ranges of numeric types.tstzrange
,daterange
,tsrange
, : ranges oftimestamptz
,date
, andtime
types.
and recall that creating a range type is as easy as:
SELECT daterange(CURRENT_DATE, CURRENT_DATE + 1);
Postgres 14 now adds support for multi range types. These are data types that support grouping together non contiguous ranges. Here is a quick example of what that looks like:
SELECT datemultirange(
daterange(CURRENT_DATE, CURRENT_DATE + 2),
daterange(CURRENT_DATE + 5, CURRENT_DATE + 8)
);
yielding:
datemultirange
---------------------------------------------------
{[2021-06-08,2021-06-10),[2021-06-13,2021-06-16)}
(1 row)
What's cool about this, is now I can query inside these multirange types and determine if there is a particular value in it, or if there is an overlapping range:
SELECT CURRENT_DATE + 1 <@ datemultirange(
daterange(CURRENT_DATE, CURRENT_DATE + 2),
daterange(CURRENT_DATE + 5, CURRENT_DATE + 8)
);
yielding:
?column?
----------
t
(1 row)
SELECT CURRENT_DATE + 3 <@ datemultirange(
daterange(CURRENT_DATE, CURRENT_DATE + 2),
daterange(CURRENT_DATE + 5, CURRENT_DATE + 8)
);
yielding:
?column?
----------
f
(1 row)
Let's look at a range that overlaps the non contiguous parts!
SELECT daterange(CURRENT_DATE + 1, CURRENT_DATE + 5) && datemultirange(
daterange(CURRENT_DATE, CURRENT_DATE + 2),
daterange(CURRENT_DATE + 5, CURRENT_DATE + 8)
);
yielding:
?column?
----------
t
(1 row)
And a range that does not overlap the non contiguous parts:
SELECT daterange(CURRENT_DATE + 3, CURRENT_DATE + 4) && datemultirange(
daterange(CURRENT_DATE, CURRENT_DATE + 2),
daterange(CURRENT_DATE + 5, CURRENT_DATE + 8)
);
yielding:
?column?
----------
f
(1 row)
Cool. You can create multi ranges from the existing range types by adding the multirange
suffix to the end of the range type name, e.g.:
int4multirange
,int8multirange
: ranges ofint
andbigint
range types.nummultirange
: multirange of numeric range types.tstzmultirange
,datemultirange
,tsmultirange
, : multirange oftimestamptz
,date
, andtime
range types.
However, this only scratches the surface of the power of the multi range type. I think the coolest part is how this can greatly simplify an overly complex recursive query with a single aggregate: range_agg
.
In other words, let's take 200 lines of SQL and turn it into 3.
A Real World Example: Availability
Let's build a similar example to what we had in my previous blog post on searching availability using range types. We'll change up the example a bit to days with appointments:
CREATE TABLE appointments (
id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
appointment_dates daterange NOT NULL,
EXCLUDE USING gist (appointment_dates WITH &&)
);
Let's create a set of appointment dates similar to what we had in the previous example:
INSERT INTO appointments (appointment_dates)
VALUES
(daterange('2021-06-02', '2021-06-02', '[]')),
(daterange('2021-06-06', '2021-06-09', '[]')),
(daterange('2021-06-11', '2021-06-12', '[]')),
(daterange('2021-06-16', '2021-06-17', '[]')),
(daterange('2021-06-25', '2021-06-27', '[]'));
In other words, the appointment dates are:
- June 2
- June 6 - June 9
- June 11 - June 12
- June 16 - June 17
- June 25 - June 27
Now, let's say we want to return all the dates that we are booked for an appointment in the month of June. We can accomplish this with a single query:
SELECT appointment_dates
FROM appointments
WHERE appointment_dates && daterange('2021-06-01', '2021-06-30', '[]');
which yields:
appointment_dates
-------------------------
[2021-06-02,2021-06-03)
[2021-06-06,2021-06-10)
[2021-06-11,2021-06-13)
[2021-06-16,2021-06-18)
[2021-06-25,2021-06-28)
If we wanted to aggregate all of these dates into a single multirange, we can do so with range_agg
:
SELECT range_agg(appointment_dates)
FROM appointments
WHERE appointment_dates && daterange('2021-06-01', '2021-06-30', '[]');
which yields:
range_agg
---------------------------------------------------------------------------------------------------------------------------
{[2021-06-02,2021-06-03),[2021-06-06,2021-06-10),[2021-06-11,2021-06-13),[2021-06-16,2021-06-18),[2021-06-25,2021-06-28)}
Recall that the problem that the previous example wanted to solve was "finding what dates I'm available for an appointment within a given month". To solve that before, we wrote a fairly involved recursive query. With multi range types, we can simplify this to:
SELECT datemultirange(daterange('2021-06-01', '2021-06-30', '[]')) -
range_agg(appointment_dates) AS availability
FROM appointments
WHERE appointment_dates && daterange('2021-06-01', '2021-06-30', '[]');
which yields:
availability
---------------------------------------------------------------------------------------------------------------------------------------------------
{[2021-06-01,2021-06-02),[2021-06-03,2021-06-06),[2021-06-10,2021-06-11),[2021-06-13,2021-06-16),[2021-06-18,2021-06-25),[2021-06-28,2021-07-01)}
Wow! That's way simpler. While the previous method I showed works, this is way simpler, less error prone and provides many advantages, including:
- Easier to understand: This is only 3 lines of straightforward SQL, vs. a complex recursive query.
- Easier to maintain: With less SQL comes less maintenance, say if there is an update to your schema.
- Less storage: due to the cost of the query, the previous methods requires you to "precompute" the availability on larger systems. With the efficiency of the range aggregate (
range_agg
) and multi data type difference (-
) operator, this lessens, if not eliminates the need to precompute your availability.
Next Steps
Postgres 14 is now in beta. I strongly suggest downloading it, kicking the tires, and seeing what other interesting ways you can deploy some of its new features.
If you're already using range types in non trivial ways, I suggest you play around with the new multi range types: you may be able to simplify your queries without sacrificing performance!
Related Articles
- Postgres Tuning & Performance for Analytics Data
19 min read
- Running an Async Web Query Queue with Procedures and pg_cron
6 min read
- Name Collision of the Year: Vector
9 min read
- Sidecar Service Meshes with Crunchy Postgres for Kubernetes
12 min read
- pg_incremental: Incremental Data Processing in Postgres
11 min read