Easy Totals and Subtotals in Postgres with Rollup and Cube
Postgres is being used more and more for analytical workloads. There’s a few hidden gems I recently ran across that are really handy for doing SQL for data analysis, ROLLUP
and CUBE
. Rollup and cube don’t get a lot of attention, but follow along with me in this post to see how they can save you a few steps and enhance your date binning and summary reporting.
We also have a web based tutorial that covers Postgres Functions for Rolling Up Data by Date if you want to try it yourself with a sample data set.
Superpowered Group By
Before we dig into rollup and cube, let’s look at GROUP BY
statements. Here’s an example query where I want to get totals for all my months of sales and categories.
Using to_char
with date_trunc
I can roll up things by month. With GROUP BY
I can rollup data by category.
-- Get totals for each month and category
SELECT
to_char(date_trunc('month', order_date), 'FMMonth YYYY') AS month,
category,
COUNT(*) AS total_orders,
SUM(total_amount) AS total_amount
FROM
orders
GROUP BY
date_trunc('month', order_date), category
ORDER BY
date_trunc('month', order_date), category;
month | category | total_orders | total_amount
----------------+-------------+--------------+--------------
October 2021 | Books | 3 | 2375.73
October 2021 | Clothing | 18 | 13770.09
October 2021 | Computers | 17 | 13005.87
If I wanted to get subtotals for years, I would have to pull this into Excel or write separate select statements and unions. You would have to add a lot more SQL in snippets and section like this:
-- Get total for each month
....
UNION ALL
SELECT
to_char(date_trunc('month', order_date), 'FMMonth YYYY') AS month,
NULL AS category,
COUNT(*) AS total_orders,
SUM(total_amount) AS total_amount
FROM
orders
GROUP BY
date_trunc('month', order_date)
ORDER BY
date_trunc('month', order_date)
-- Get grand total across all months and categories
...
UNION ALL
SELECT
NULL AS month,
NULL AS category,
COUNT(*) AS total_orders,
SUM(total_amount) AS total_amount
FROM
orders
ROLLUP
and CUBE
will do the subtotals all for you though! Let’s take a closer look.
GROUP BY ROLLUP
ROLLUP
is an extension you can add to the GROUP BY
clause. When you use it, ROLLUP
will give you both individual bins of totals and a sub-total. Here’s an example where I just add ROLLUP
to my group by.
SELECT
to_char(date_trunc('month', order_date), 'FMMonth YYYY') AS month,
category,
COUNT(*) AS total_orders,
SUM(total_amount) AS total_amount
FROM
orders
GROUP BY
ROLLUP (date_trunc('month', order_date), category)
ORDER BY
date_trunc('month', order_date), category;
If you see in this example, the sub-total has a null value for the category.
month | category | total_orders | total_amount
----------------+-------------+--------------+--------------
October 2021 | Books | 3 | 2375.73
October 2021 | Clothing | 18 | 13770.09
October 2021 | Computers | 17 | 13005.87
October 2021 | Electronics | 25 | 16358.96
October 2021 | | 63 | 45510.65
These null values represent the subtotals.
You can probably see that this is a really handy function for reporting. Rollup will give you big batches of things, including things with NULL
values. If you want to do a quick survey of your products or data by certain categories, rollup is a great tool. You can combine that with date_trunc
to get a rollup of categories by any date bin.
GROUP BY CUBE
The CUBE
function takes the rollup one step further and does subtotals and grand totals across all the dimensions you’ve queried. Very similar to ROLLUP, we can look at both dates and categories of sales. Again, in this example, I just add CUBE to the GROUP BY statement.
SELECT
to_char(date_trunc('month', order_date), 'FMMonth YYYY') AS month,
category,
COUNT(*) AS total_orders,
SUM(total_amount) AS total_amount
FROM
orders
GROUP BY
CUBE (date_trunc('month', order_date), category)
ORDER BY
date_trunc('month', order_date), category;
month | category | total_orders | total_amount
----------------+-------------+--------------+--------------
October 2024 | Books | 9 | 5574.92
October 2024 | Clothing | 19 | 11856.80
October 2024 | Computers | 22 | 13002.10
October 2024 | Electronics | 50 | 34251.83
October 2024 | | 100 | 64685.65
| Books | 521 | 328242.79
| Clothing | 1133 | 739866.25
| Computers | 1069 | 680817.70
| Electronics | 2709 | 1707713.80
| | 5432 | 3456640.54
Like ROLLUP these sub-totals aren’t labeled, they have null values representing the totals, like this:
Label ROLLUP and CUBE totals with COALESCE
I find these null values for the sub-totals kind of strange. If you’re like more, or sharing these raw reports with several people, you might want labels instead of these null values. You can use the COALESCE
function to do some basic renaming of the null values. COALESCE is commonly used in cases like these when you want to handle NULL
values in queries.
Here’s a sample where COALESCE comes before each category and time bin so when we add the group by cube below, the sub-totals are labeled.
SELECT
COALESCE(to_char(date_trunc('month', order_date), 'FMMonth YYYY'), 'Grand Total') AS month,
COALESCE(category, 'Subtotal') AS category,
COUNT(*) AS total_orders,
SUM(total_amount) AS total_amount
FROM
orders
GROUP BY
CUBE (date_trunc('month', order_date), category)
ORDER BY
date_trunc('month', order_date), category;
month | category | total_orders | total_amount
----------------+-------------+--------------+--------------
October 2024 | Books | 9 | 5574.92
October 2024 | Clothing | 19 | 11856.80
October 2024 | Computers | 22 | 13002.10
October 2024 | Electronics | 50 | 34251.83
October 2024 | Subtotal | 100 | 64685.65
Grand Total | Books | 521 | 328242.79
Grand Total | Clothing | 1133 | 739866.25
Grand Total | Computers | 1069 | 680817.70
Grand Total | Electronics | 2709 | 1707713.80
Grand Total | Subtotal | 5432 | 3456640.54
Summary
If you need to do date binning or rollups for your data by date, check out rollup and cube. They’re super easy additions to the GROUP BY function that will do your subtotals and grand totals.
Related Articles
- Postgres Tuning & Performance for Analytics Data
19 min read
- Running an Async Web Query Queue with Procedures and pg_cron
6 min read
- Name Collision of the Year: Vector
9 min read
- Sidecar Service Meshes with Crunchy Postgres for Kubernetes
12 min read
- pg_incremental: Incremental Data Processing in Postgres
11 min read