Projecting Monthly Revenue Run Rate in Postgres
Monthly recurring revenue (MRR) and annual recurring revenue (ARR) are important metrics for a subscription-based business model. It allows for the business to project its available capital to make important decisions around expansion, hiring and more.
In an on-demand subscription model, MRR can fluctuate on any given day. This is especially true in the cloud-services world, where people are constantly spinning up and down services. That's why it makes sense to try and capture what the subscription revenue "run rate" is: this is a projection of how much money you are earning over a period of time based upon current service usage.
This exercise recently came up with a friend who was looking for a little help with analyzing and reporting on their SaaS business which is running on Crunchy Bridge. Because sometimes SQL is just fun I decided to roll up my sleeves and see what business insights we could get with a single query.
The twist here from my history is that this SaaS business was more of a utility model that you can start/stop subscriptions vs. having some defined plan for each month. (Think metered billing, apparently something my colleague Craig said was a big topic for Add-on Marketplaces in his Heroku days). Before I started running Postgres on Kubernetes, I was an application developer / a SQL wonk of sorts and just so happened to work for a business that had a subscription model. Thus, I was no stranger to the MRR/ARR queries.
So let's dive in.
The Query to Project Monthly Revenue Run Rate
So here it is:
SELECT
to_char(rates.day, 'Mon DD, YYYY') AS day,
to_char(rates.mrr_run_rate, '$99,999,999,999D99') AS mrr_run_rate,
to_char(12 * rates.mrr_run_rate, '$99,999,999,999D99') AS arr_run_rate,
to_char(CASE
WHEN COALESCE(lead(rates.mrr_run_rate, 30) OVER (ORDER BY rates.day DESC), 0) <> 0 THEN
(rates.mrr_run_rate - lead(rates.mrr_run_rate, 30) OVER (ORDER BY rates.day DESC)) /
lead(rates.mrr_run_rate, 30) OVER (ORDER BY rates.day DESC)
ELSE 0
END * 100, '9,999,999D99%') AS mrr_run_rate_mom
FROM (
SELECT
dates.day,
SUM(
CASE
WHEN tstzrange(dates.day, dates.day + '1 day'::interval) && subscription.period THEN
subscription.rate
ELSE
0
END) / 100::numeric AS mrr_run_rate
FROM subscription,
LATERAL (
SELECT *
FROM generate_series('2021-01-01'::date, CURRENT_TIMESTAMP, '1 day'::interval) AS day
) dates
WHERE subscription.org_id not in (
SELECT id
FROM account
WHERE billable
)
GROUP BY dates.day
) rates
ORDER BY rates.day DESC;
Upon first inspection, there is a lot going on here. What I have found through the years is that it's best to work through SQL queries inside-out. Let's start here:
FROM subscription
This is the heart of the query, pulling the data about all of the subscriptions. There are a few columns from this table that are required for this query, including:
rate
, the value of the subscription, measured monthly and stored in cents.period
, the duration of the subscription from start to finished. This is stored in atstzrange
range type.org_id
, which we use to ensure the org is billable
Let's move to the next part, building up the query:
FROM subscription,
LATERAL (
SELECT *
FROM generate_series('2021-01-01'::date, CURRENT_TIMESTAMP, '1 day'::interval) AS day
) dates
There are a bunch of things going on in this query. First, let's go inside and look at:
SELECT *
FROM generate_series('2021-01-01'::date, CURRENT_TIMESTAMP, '1 day'::interval) AS day
This generates a bunch of dates going back to Jan 1, 2021 to present day. We are going to use this list of dates to inspect the state of a subscription on any given day and calculate the run rate. We accomplish that using a LATERAL join, which allows us to cross-reference the subscription to a date regardless if it was active on that particular day.
Let's continue working our way inside out. Let's look at the WHERE
clause:
WHERE subscription.team_id not in (
SELECT id
FROM account
WHERE billable
)
This clause allows us to exclude any not billable subscriptions, this could be the case for demo accounts or trials a sales team is running.
We'll take a quick peek ahead at the GROUP BY
:
GROUP BY dates.day
We will be performing an aggregate query by summing up the projected monthly revenue on a given day, so we will need to aggregate that information by day!
Let's now look at the SELECT
list:
SELECT
dates.day,
SUM(
CASE
WHEN tstzrange(dates.day, dates.day + '1 day'::interval) && subscription.period THEN
subscription.rate
ELSE
0
END) / 100::numeric AS mrr_run_rate
Keeping the day that we are looking at the MRR run rate is important for display purposes, which is why it's included in the SELECT
list. However, the key part of this query is this CASE
statement:
CASE
WHEN tstzrange(dates.day, dates.day + '1 day'::interval) && subscription.period THEN
subscription.rate
ELSE
0
END
What's happening here? First, we're checking if a subscription happened to be active on a given day. We can use a range type "overlap" query there: we construct a range consisting of the present day and see if it happens to overlap with the subscription. If it does, we can include the subscription in the MRR projection. Otherwise, we're going to return 0
.
Given this is a projection, we don't necessarily need to look at how long a subscription was active for that given day, we just want a directional sense of what our MRR is. And now we have analyzed the heart of our query:
SELECT
dates.day,
SUM(
CASE
WHEN tstzrange(dates.day, dates.day + '1 day'::interval) && subscription.period THEN
subscription.rate
ELSE
0
END) / 100::numeric AS mrr_run_rate
FROM subscription,
LATERAL (
SELECT *
FROM generate_series('2021-01-01'::date, CURRENT_TIMESTAMP, '1 day'::interval) AS day
) dates
WHERE subscription.team_id not in (
SELECT id
FROM account
WHERE billable
)
GROUP BY dates.day
which gives us our MRR rate!
Formatting The Report + ARR
Our full query had an additional part, an outer layer that look liked:
SELECT
to_char(rates.day, 'Mon DD, YYYY') AS day,
to_char(rates.mrr_run_rate, '$99,999,999,999D99') AS mrr_run_rate,
to_char(12 * rates.mrr_run_rate, '$99,999,999,999D99') AS arr_run_rate
FROM (
-- ...
) rates
ORDER BY rates.day DESC;
What's happening here?
First, for convenience we're ordering our data from most recent to oldest: typically that is what we're most interested in when looking at the current MRR projection. This is handled with the ORDER BY
clause:
ORDER BY rates.day DESC;
Now, we have a series of calls to the to_char
formatting function:
SELECT
to_char(rates.day, 'Mon DD, YYYY') AS day,
to_char(rates.mrr_run_rate, '$99,999,999,999D99') AS mrr_run_rate,
to_char(12 * rates.mrr_run_rate, '$99,999,999,999D99') AS arr_run_rate
All of these allow for some nicer display of data, e.g:
SELECT
to_char(CURRENT_TIMESTAMP, 'Mon DD, YYYY') AS day,
to_char(123456789, '$99,999,999,999D99') AS mrr_run_rate,
to_char(12 * 123456789, '$99,999,999,999D99') AS arr_run_rate
yields:
day | mrr_run_rate | arr_run_rate
--------------+---------------------+---------------------
Oct 15, 2021 | $ 123,456,789.00 | $ 1,481,481,468.00
Projecting the ARR run rate is fairly straightforward: take the MRR run rate and multiply by 12
.
We could have done this in the inner part of the query, but for readability purposes it's a bit easier to write the query like this and you will not pay a penalty for it.
And voilà, this gives you a nice, presentable MRR/ARR run rate chart!
Bonus: Getting Month-over-Month MRR Growth
As part of this exercise, I was also tasked with finding the month-over-month (MoM) MRR growth. This yield the following query:
SELECT
to_char(rates.day, 'Mon DD, YYYY') AS day,
to_char(rates.mrr_run_rate, '$99,999,999,999D99') AS mrr_run_rate,
to_char(12 * rates.mrr_run_rate, '$99,999,999,999D99') AS arr_run_rate,
to_char(CASE
WHEN COALESCE(lead(rates.mrr_run_rate, 30) OVER (ORDER BY rates.day DESC), 0) <> 0 THEN
(rates.mrr_run_rate - lead(rates.mrr_run_rate, 30) OVER (ORDER BY rates.day DESC)) /
lead(rates.mrr_run_rate, 30) OVER (ORDER BY rates.day DESC)
ELSE 0
END * 100, '9,999,999D99%') AS mrr_run_rate_mom
FROM (
SELECT
dates.day,
SUM(
CASE
WHEN tstzrange(dates.day, dates.day + '1 day'::interval) && subscription.period THEN
subscription.rate
ELSE
0
END) / 100::numeric AS mrr_run_rate
FROM subscription,
LATERAL (
SELECT *
FROM generate_series('2021-01-01'::date, CURRENT_TIMESTAMP, '1 day'::interval) AS day
) dates
WHERE subscription.team_id not in (
SELECT id
FROM account
WHERE billable
)
GROUP BY dates.day
) rates
ORDER BY rates.day DESC;
The key portion is this:
to_char(CASE
WHEN COALESCE(lead(rates.mrr_run_rate, 30) OVER (ORDER BY rates.day DESC), 0) <> 0 THEN
(rates.mrr_run_rate - lead(rates.mrr_run_rate, 30) OVER (ORDER BY rates.day DESC)) /
lead(rates.mrr_run_rate, 30) OVER (ORDER BY rates.day DESC)
ELSE 0
END * 100, '9,999,999D99%') AS mrr_run_rate_mom
and really, the portion is the lead window function:
lead(rates.mrr_run_rate, 30) OVER (ORDER BY rates.day DESC)
What this function call does is that it looks 30 rows back in the query to find what the MRR run rate was on that given day. Once we know that value, we can calculate the MoM growth:
(rates.mrr_run_rate - lead(rates.mrr_run_rate, 30) OVER (ORDER BY rates.day DESC)) /
lead(rates.mrr_run_rate, 30) OVER (ORDER BY rates.day DESC)
Why do we have a COALESCE
statement?
COALESCE(lead(rates.mrr_run_rate, 30) OVER (ORDER BY rates.day DESC), 0)
The way this query is constructed, we will get to a point where 30 previous rows will not exist, which makes it impossible to find the MoM value. While this makes the data NULL
, I use COALESCE to check for 0
values as well, since we can't divide by 0
. If we wanted to determine the MoM growth for, say, January 2, 2021, we would need to also include the previous 30 dates in the query, and then ultimately exclude those from the display.
The Power of Postgres
Being able to get data insights like the above is one of the many reasons why I love working with Postgres so much. This example combines many of my favorite PostgreSQL features: date/time support, range types, conditional expressions, window functions, LATERAL joins, and more. Instead of having to pull my raw data into a spreadsheet or a different programming language, I can get the insights I need directly with a few lines of SQL.
Related Articles
- Postgres Tuning & Performance for Analytics Data
19 min read
- Running an Async Web Query Queue with Procedures and pg_cron
6 min read
- Name Collision of the Year: Vector
9 min read
- Sidecar Service Meshes with Crunchy Postgres for Kubernetes
12 min read
- pg_incremental: Incremental Data Processing in Postgres
11 min read