Bob Pacheco
Postgres clusters should be updated regularly and receive routine maintenance. This regular maintenance is often referred to as “Day 2 operations” and can include a wide variety of tasks like restarting services, resetting passwords, or updating versions. Performing Day 2 operations can be complex and time-consuming, especially if you are supporting a large number of Postgres clusters.
With the adoption of GitOps and a little help from continuous delivery tools like Argo CD
Paul Ramsey
Over and over when I look at applications for performance, the lesson I learn and re-learn is, do more things right inside the database.
Create, read, update, delete! All the things you do to a table or collection of tables to work with your ever-changing data.
Most CRUD examples, and most CRUD thinking, tend to focus on one table at a time. That's easy to understand. It's also unrealistic. Even the simplest application will be working with several interlinked normalized tables.
Here are our working example tables.
Greg Sabino Mullane
This article will contain spoilers both on how I solved 2022 Day 15's challenge "Beacon Exclusion Zone" using SQL and on general ideas for how to approach the problem. I recommend trying to solve it yourself first, using your favorite language. This article is delayed from the actual puzzle's release. Also note that my solutions are seldom going to be the "best" solutions - they were solved as quickly as possible, and these articles will show my first solutions, with some minor reformatting and cleaning up.
Craig Kerstiens
Developers who have been around SQL long enough know that there is often more than one way to get the same result. Today I wanted to look at a few different ways to aggregate and show a total with paid and unpaid status. First, we will use a common table expression (CTE), which is a nice method for organizing subqueries. Second, we will use CASE statements within an aggregate to filter out the values we want. Lastly, we will use FILTER to clean up the syntax, while effectively doing the same thing as the CASE statement.
We've loaded a sample data set and a hands-on tutorial in our browser-based Postgres playground.
The report we're going to work to generate is a monthly report of revenue from an invoices table. We'll want our end report to look something like:
    mnth    | billed  | uncollected | collected
------------+---------+-------------+-----------
 2023-02-01 | 1498.06 |     1498.06 |         0
 2023-01-01 | 2993.95 |     1483.04 |   1510.91
 2022-12-01 | 1413.17 |      382.84 |   1030.33
 2022-11-01 | 1378.18 |      197.52 |   1180.66
 2022-10-01 | 1342.91 |      185.03 |   1157.88
 2022-09-01 | 1299.90 |       88.01 |   1211.89
 2022-08-01 | 1261.97 |       85.29 |   1176.68
First, let’s look at the underlying data. Show the details of the invoices table:
Table "public.invoices"
Column | Type | Collation | Nullable | Default
--------------------+--------------------------------+-----------+----------+--------------------------------------
id | bigint | | not null | nextval('invoices_id_seq'::regclass)
account_id | integer | | |
net_total_in_cents | integer | | |
invoice_period | daterange | | |
status | text | | |
created_at | timestamp(6) without time zone | | not null |
updated_at | timestamp(6) without time zone | | not null |
Indexes:
"invoices_pkey" PRIMARY KEY, btree (id)
"index_invoices_on_account_id" btree (account_id)
We’ve made a few stylistic choices here for this invoices table.

The invoice_period is a date range. The lower value on the range is the start of the period, and the upper value is the end of the period. To extract the first value of this range, use lower(invoice_period).

We name the field net_total_in_cents because it contains cents instead of dollars, and we store it as an integer instead of a float. This prevents fractional cents.
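For example, a quick query against this table (a minimal sketch using only the columns described above) shows how lower() and upper() pull the endpoints out of the range, and how dividing by 100.0 converts the cents back to dollars:

SELECT lower(invoice_period) AS period_start,
       upper(invoice_period) AS period_end,
       net_total_in_cents / 100.0 AS net_total_dollars
FROM invoices
LIMIT 1;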
Now, let’s look at a few records:
SELECT * FROM invoices LIMIT 3;
Which returns:
 id | account_id | net_total_in_cents |     invoice_period      | status |         created_at         |         updated_at
----+------------+--------------------+-------------------------+--------+----------------------------+----------------------------
  1 |          4 |               1072 | [2022-02-01,2022-02-28) | paid   | 2023-02-01 17:28:37.420197 | 2023-02-01 17:28:37.420197
  2 |          6 |                955 | [2022-02-01,2022-02-28) | paid   | 2023-02-01 17:28:37.422361 | 2023-02-01 17:28:37.422361
  3 |          7 |                322 | [2022-02-01,2022-02-28) | paid   | 2023-02-01 17:28:37.423726 | 2023-02-01 17:28:37.423726
(3 rows)
Let’s start with an example that uses CTEs (Common Table Expressions), sometimes referred to as WITH clauses.

At first, the following can look complex, but just think of it as three different queries combined into a single query. The totals, collected, and invoiced queries find their respective values for each month, then the final query joins those values together based on the mnth value.
WITH totals AS (
    SELECT
        lower(i.invoice_period) AS mnth,
        SUM(i.net_total_in_cents) / 100.0 AS net_total
    FROM invoices i
    GROUP BY 1
), collected AS (
    SELECT
        lower(i.invoice_period) AS mnth,
        SUM(i.net_total_in_cents) / 100.0 AS amount
    FROM invoices i
    WHERE status = 'paid'
    GROUP BY 1
), invoiced AS (
    SELECT
        lower(i.invoice_period) AS mnth,
        SUM(i.net_total_in_cents) / 100.0 AS amount
    FROM invoices i
    WHERE status = 'invoiced'
    GROUP BY 1
)
SELECT totals.mnth,
       totals.net_total AS billed,
       COALESCE(invoiced.amount, 0) AS uncollected,
       COALESCE(collected.amount, 0) AS collected
FROM totals
LEFT JOIN invoiced ON totals.mnth = invoiced.mnth
LEFT JOIN collected ON totals.mnth = collected.mnth
ORDER BY 1 DESC;
The output will show one record per month with the total billed amount, uncollected amount, and collected amount for the invoices. To dissect this SQL, you can pluck each of the WITH queries out and run them individually to see what they return.
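For example, running the collected query from the WITH clause on its own (with an ORDER BY added for readability) returns the paid amount per month:

SELECT lower(i.invoice_period) AS mnth,
       SUM(i.net_total_in_cents) / 100.0 AS amount
FROM invoices i
WHERE status = 'paid'
GROUP BY 1
ORDER BY 1 DESC;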
Another option here is leveraging the CASE statement to filter how we want to aggregate. For this scenario, the query will be significantly shorter than the CTE version.
SELECT lower(i.invoice_period) AS mnth,
       SUM(i.net_total_in_cents) / 100.0 AS billed,
       SUM(CASE WHEN status = 'invoiced' THEN i.net_total_in_cents END) / 100.0 AS uncollected,
       SUM(CASE WHEN status = 'paid' THEN i.net_total_in_cents END) / 100.0 AS collected
FROM invoices i
GROUP BY 1
ORDER BY 1 DESC;
Think of CASE as an IF/THEN statement that returns the net_total_in_cents for the SUM when the condition is met. When the condition is not met, the CASE expression returns NULL, and SUM ignores NULLs, so those rows simply drop out of that aggregate. These types of CASE statements are fairly common in data analysis.
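Here is a minimal, standalone sketch of that behavior (the values are made up and are not from our invoices table):

-- Only rows matching the CASE condition contribute to the SUM; the
-- 'invoiced' row yields NULL and is skipped, so the result is 150.
SELECT SUM(CASE WHEN s = 'paid' THEN cents END) AS paid_cents
FROM (VALUES ('paid', 100), ('invoiced', 250), ('paid', 50)) AS t(s, cents);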
But there is another option: FILTER. FILTER is functionally similar to the CASE statement, but makes the SQL a bit more readable:
SELECT lower(i.invoice_period) AS mnth,
       SUM(i.net_total_in_cents) / 100.0 AS billed,
       SUM(i.net_total_in_cents) FILTER (WHERE status = 'invoiced') / 100.0 AS uncollected,
       SUM(i.net_total_in_cents) FILTER (WHERE status = 'paid') / 100.0 AS collected
FROM invoices i
GROUP BY 1
ORDER BY 1 DESC;
From the standpoint of keeping your Postgres syntax easy to read and streamlined, FILTER is a good tool to keep in your pocket!

Each of these approaches can work just fine, but in this scenario, using FILTER keeps the SQL the cleanest. Experts may use any of these approaches, depending on the situation. Generally, our recommendation is: choose the approach that makes your SQL readable.
Craig Kerstiens
Postgres is a great database with a ton of features including really rich indexing. Postgres itself maintains all sorts of data under the covers about things like cache hits
Elizabeth Christensen
Custom data types are one of the many features that make PostgreSQL flexible for developing a huge variety of business and application use cases. Data types help you primarily with data integrity, ensuring your data is stored in the database the way you want it to be. A common surprise gift of using strict data types is that you can end up reducing your long-term data maintenance.
There are two main ways to customize data types in Postgres:
DOMAIN
Ben Blattberg
Kubernetes was developed originally as an orchestration system for stateless applications. Today, Kubernetes is the backbone of countless full stack applications with, notably, a database as part of the stack. So, a question we often hear is:
How can Kubernetes be the foundation of that most stateful application of all, the database?
Greg Sabino Mullane
This article will contain spoilers both on how I solved 2022 Day 14's challenge "Regolith Reservoir" using SQL and on general ideas for how to approach the problem. I recommend trying to solve it yourself first, using your favorite language. This article is delayed from the actual puzzle's release. Also note that my solutions are seldom going to be the "best" solutions - they were solved as quickly as possible, and these articles will show my first solutions, with some minor reformatting and cleaning up.
Greg Sabino Mullane
This article will contain spoilers both on how I solved 2022 Day 13's challenge "Distress Signal" using SQL and on general ideas for how to approach the problem. I recommend trying to solve it yourself first, using your favorite language. This article is delayed from the actual puzzle's release. Also note that my solutions are seldom going to be the "best" solutions - they were solved as quickly as possible, and these articles will show my first solutions, with some minor reformatting and cleaning up.
Greg Sabino Mullane
This article will contain spoilers both on how I solved 2022 Day 12's challenge "Hill Climbing Algorithm" using SQL and on general ideas for how to approach the problem. I recommend trying to solve it yourself first, using your favorite language. This article is delayed from the actual puzzle's release. Also note that my solutions are seldom going to be the "best" solutions - they were solved as quickly as possible, and these articles will show my first solutions, with some minor reformatting and cleaning up.