Fun with SQL in Postgres: Finding Revenue Accrued Per Day
I recently wrote an example of how you can project monthly recurring revenue (MRR) in Postgres. This is a helpful metric to understand how a subscription-based business is doing and can help inform all sorts of financial and operational decisions at the company.
Since writing that example, my same friend running their SaaS business on Crunchy Bridge wanted to find out how much revenue they were accruing per day over the course of a month. When a new month started, the accrued revenue would reset. Think of this metric, as a way to know what you are actually billing in that month, so you have a sense of what your short term cash flow would be.
So the data my friend is looking for would look something like this:
date | revenue (k)
------------+-------------
2021-11-02 | 200
2021-11-01 | 100
2021-10-31 | 3100
2021-10-30 | 3000
As before, my friend has a utility model model: while it is a subscription-based business, the subscriptions can start/stop at any time. Thus, trying to calculate the accrual is a bit tricky.
Fortunately, Postgres is well prepared to handle this. Let's see how we can do it.
Finding Revenue Accrued Per Day
Recall that our original query to calculate the projected MRR/ARR looks like this:
SELECT
to_char(rates.day, 'Mon DD, YYYY') AS day,
to_char(rates.mrr_run_rate, '$99,999,999,999D99') AS mrr_run_rate,
to_char(12 * rates.mrr_run_rate, '$99,999,999,999D99') AS arr_run_rate,
to_char(CASE
WHEN COALESCE(lead(rates.mrr_run_rate, 30) OVER (ORDER BY rates.day DESC), 0) <> 0 THEN
(rates.mrr_run_rate - lead(rates.mrr_run_rate, 30) OVER (ORDER BY rates.day DESC)) /
lead(rates.mrr_run_rate, 30) OVER (ORDER BY rates.day DESC)
ELSE 0
END * 100, '9,999,999D99%') AS mrr_run_rate_mom
FROM (
SELECT
dates.day,
SUM(
CASE
WHEN tstzrange(dates.day, dates.day + '1 day'::interval) && subscription.period THEN
subscription.rate
ELSE
0
END) / 100::numeric AS mrr_run_rate
FROM subscription,
LATERAL (
SELECT *
FROM generate_series('2021-01-01'::date, CURRENT_TIMESTAMP, '1 day'::interval) AS day
) dates
WHERE subscription.org_id not in (
SELECT id
FROM account
WHERE billable
)
GROUP BY dates.day
) rates
ORDER BY rates.day DESC;
Let's modify the query to find revenue accrued per day over a given month:
SELECT
to_char(rates.month_accrual, '$99,999,999,999D99') AS monthly_accrual_to_date
FROM (
SELECT
dates.day,
SUM(
subscription.rate * CASE
WHEN upper_inf(subscription.period) AND tstzrange(date_trunc('month', dates.day), dates.day + '1 day'::interval) && subscription.period
THEN EXTRACT('epoch' FROM dates.day + '1 day'::interval - GREATEST(lower(period), date_trunc('month', dates.day)))::numeric
WHEN tstzrange(date_trunc('month', dates.day), dates.day + '1 day'::interval) && subscription.period THEN
EXTRACT('epoch' FROM LEAST(upper(period), dates.day + '1 day'::interval) - GREATEST(lower(period), date_trunc('month', dates.day)))::numeric
ELSE 0
END
) / (86400 * 30) / 100::numeric AS month_accrual
FROM subscription,
LATERAL (
SELECT *
FROM generate_series('2021-01-01'::date, CURRENT_TIMESTAMP, '1 day'::interval) AS day
) dates
WHERE subscription.org_id not in (
SELECT id
FROM account
WHERE billable
)
GROUP BY dates.day
) rates
ORDER BY rates.day DESC;
So what's going on? The heart of the query is this portion here:
SUM(
subscription.rate * CASE
WHEN upper_inf(subscription.period) AND tstzrange(date_trunc('month', dates.day), dates.day + '1 day'::interval) && subscription.period
THEN EXTRACT('epoch' FROM dates.day + '1 day'::interval - GREATEST(lower(period), date_trunc('month', dates.day)))::numeric
WHEN tstzrange(date_trunc('month', dates.day), dates.day + '1 day'::interval) && subscription.period
THEN
EXTRACT('epoch' FROM LEAST(upper(period), dates.day + '1 day'::interval) - GREATEST(lower(period), date_trunc('month', dates.day)))::numeric
ELSE 0
END
) / (86400 * 30) / 100::numeric AS month_accrual
Let's break it down.
Recall that we are trying to get the amount of revenue accrued on a given day over the course of a month. In our utility model, we need to consider that a subscription can start and stop within a day, or may have not started yet on that day, or may have already stopped. That's a lot of cases to consider!
We go through each day in the query (e.g. 11/01/2021, 11/02/2021 etc.) and determine how much revenue was accrued on that day. Here is how the logic works in the CASE
statement works:
- First, the query checks to see if a subscription is currently active (i.e. the range it is active in has an "infinite" upper bound). If the subscription is active, let's then check to see if it is active on the current day that we are evaluating. If it is, calculate how much revenue was accrued.
- Otherwise, if the subscription is no longer active, check to see if the subscription was active on the day in question. If it is, calculated how much revenue was accrued.
Let's go one step further into the code. We evaluate the first case with this statement:
WHEN upper_inf(subscription.period) AND tstzrange(date_trunc('month', dates.day), dates.day + '1 day'::interval) && subscription.period
This checks to see if the subscription is still active (upper_inf(subscription.period)
) and the current day we're evaluating (e.g. November 2, 2021) overlaps with the subscription (tstzrange(date_trunc('month', dates.day), dates.day + '1 day'::interval) && subscription.period
). If it does, we perform the revenue accrual calculation:
THEN EXTRACT('epoch' FROM dates.day + '1 day'::interval - GREATEST(lower(period), date_trunc('month', dates.day)))::numeric
What does this do? We're trying to get the number of seconds this subscription was for the month in question up until this day. Huh?
Let's break this down:
dates.day + '1 day'::interval - GREATEST(lower(period), date_trunc('month', dates.day))
The first thing we do is calculate the upper part of our day, i.e. we start from the end of the current day (dates.day + '1 day'::interval
). At the other end, we want to subtract either from the beginning of the month or when the subscription started, whichever was more recent (GREATEST(lower(period), date_trunc('month', dates.day)
).
Finally, we need to get the total number of seconds this subscription was active for. We can do that using the PostgreSQL EXTRACT
function and doing it from the 'epoch'
. Thus, we end up with:
EXTRACT('epoch' FROM dates.day + '1 day'::interval - GREATEST(lower(period), date_trunc('month', dates.day)))::numeric
Now let's look at the other condition: a subscription that is no longer active:
WHEN tstzrange(date_trunc('month', dates.day), dates.day + '1 day'::interval) && subscription.period
THEN
EXTRACT('epoch' FROM LEAST(upper(period), dates.day + '1 day'::interval) - GREATEST(lower(period), date_trunc('month', dates.day)))::numeric
Our WHEN
clause checks to see if the subscription overlapped on the day in question. The calculation is then similar to the one for the subscription that is still ongoing.
Now we have how much revenue was accrued for a subscription on a given day. We can multiply that by the value of the subscription and sum it up for the given day:
SUM(
subscription.rate * CASE ...
) / (86400 * 30) / 100::numeric AS month_accrual AS month_accrual
...
GROUP BY days.day
However, we're not done yet. Notice this part:
SUM(...) / (86400 * 30) / 100::numeric AS month_accrual
Why are we dividing by these numbers?
Months have different numbers of days. For convenience, my friend's model normalizes each month to 30 days. Additionally, while we calculate the total number of seconds each subscription lasted, the rate
is actually stored as a monthly value, so we need to extract that by divinding by the total number of seconds in a month (86400
). Finally, the rate
is stored in cents; dividing by 100 moves it to dollars.
Last but not least, we can pretty up the display of the monthly accrual using the to_char
function:
SELECT to_char(rates.month_accrual, '$99,999,999,999D99') AS monthly_accrual_to_date
Conclusion
There is more than one way to calculate a running accrual of revenue using Postgres. This method made sense for my friend's utility model. It may make more sense to use window functions in some other models.
That said, this again showcases how powerful PostgreSQL is for all types of reporting. Instead of having to pull the raw data into a spreadsheet or an application to perform the calculations, you can write a query that does all the work and the only data you have to transfer are the results!
Related Articles
- Running an Async Web Query Queue with Procedures and pg_cron
6 min read
- Name Collision of the Year: Vector
9 min read
- Sidecar Service Meshes with Crunchy Postgres for Kubernetes
12 min read
- pg_incremental: Incremental Data Processing in Postgres
11 min read
- Smarter Postgres LLM with Retrieval Augmented Generation
6 min read